LiteLLM Proxy Server (LLM Gateway)

Edit the configuration file: config.yaml

model_list:
  - model_name: qwen-coder
    litellm_params:
      model: ollama/qwen2.5-coder:7b
  - model_name: bge-m3
    litellm_params:
      model: ollama/bge-m3
  - model_name: llava
    litellm_params:
      model: ollama/llava:7b
      api_base: "http://localhost:11434"
      # api_base: http://127.0.0.1:11434/v1 # ❌ 500 Internal Server Error
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4-32k
      api_base: http://172.16.33.66:9997/v1
      api_key: none
  - model_name: bge
    litellm_params:
      model: openai/bge-m3
      api_base: http://172.16.33.66:9997/v1
      api_key: none
general_settings:
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)

Deploy from the command line

# Langfuse integration (export these so the proxy process can see them)
export LANGFUSE_PUBLIC_KEY=pk-lf-fd5d8fba-5134-4037-884d-d6780894a65a
export LANGFUSE_SECRET_KEY=sk-lf-10122a92-da11-4423-b3f7-ad10e5f268fc
export LANGFUSE_HOST=http://127.0.0.1:3000

litellm --config config.yaml
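
Once the proxy is running, you can verify which models it exposes through the OpenAI-compatible /v1/models route. A minimal sketch, assuming the openai Python package (v1+) is installed and the sk-1234 master key from the config above:

from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="sk-1234")

# Print the model_name entries exposed by the proxy (qwen-coder, bge-m3, llava, gpt-4, bge).
for model in client.models.list():
    print(model.id)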

Docker deployment

NO DB

docker run --name litellm \
    -v $(pwd)/litellm_config.yaml:/app/config.yaml \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-stable \
    --config /app/config.yaml \
    --detailed_debug

DB

# Get the code
git clone https://github.com/BerriAI/litellm

# Go to folder
cd litellm

# Add the master key - you can change this after setup
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env

# Add the litellm salt key - you cannot change this after adding a model
# It is used to encrypt / decrypt your LLM API Key credentials
# We recommend - https://1password.com/password-generator/
# password generator to get a random hash for litellm salt key
echo 'LITELLM_SALT_KEY="sk-1234"' >> .env

source .env

# Start
docker-compose up
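
After the containers are up, the proxy's health routes are a quick way to confirm it is ready before sending traffic. A minimal sketch with requests, assuming the health endpoints documented for the LiteLLM proxy (/health/readiness and /health):

import requests

headers = {"Authorization": "Bearer sk-1234"}

# /health/readiness reports whether the proxy has finished loading its config.
print(requests.get("http://127.0.0.1:4000/health/readiness", headers=headers).json())

# /health checks every model in config.yaml and lists healthy / unhealthy endpoints.
print(requests.get("http://127.0.0.1:4000/health", headers=headers).json())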

Model tests

Chat Completions

curl -X POST 'http://127.0.0.1:4000/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer sk-1234' \
    --data-raw '{
        "model": "gpt-4",
        "messages": [ 
            { "role": "system", "content": "你是一名数学家。" }, 
            { "role": "user", "content": "1 + 1 = ?"} 
        ]
    }'
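
The same request can also be sent with the OpenAI Python client pointed at the proxy. A minimal sketch, assuming the openai package (v1+) is installed:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-4",  # the model_name defined in config.yaml
    messages=[
        {"role": "system", "content": "你是一名数学家。"},
        {"role": "user", "content": "1 + 1 = ?"},
    ],
)
print(response.choices[0].message.content)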

Completions

curl -X POST 'http://127.0.0.1:4000/v1/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer sk-1234' \
    --data-raw '{
        "model": "gpt-4",
        "prompt": "天空为什么是蓝色的?",
        "max_tokens": 256,
        "temperature": 0
    }'

Embeddings

curl -X POST http://127.0.0.1:4000/v1/embeddings \
    -H "Content-Type: application/json" \
    -H 'Authorization: Bearer sk-1234' \
    -d '{
        "model": "bge",
        "input": "Hello!"
    }'
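
The equivalent embeddings call through the OpenAI client, as a sketch under the same assumptions:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="sk-1234")

result = client.embeddings.create(model="bge", input="Hello!")
print(len(result.data[0].embedding))  # dimension of the bge-m3 vector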

Visual question answering (llava)

curl "http://127.0.0.1:4000/v1/chat/completions" \
  -X POST \
  -H "Content-Type: application/json" \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "llava",
    "messages": [
      {
        "role": "user", 
        "content": [
          { "type": "text", "text": "图像上面有几只动物(用中文回答)" },      
          { "type": "image_url", "image_url": { "url": "'$(base64 -i /Users/junjian/截屏/catdog.jpg)'" } }
        ]
      }
    ]
  }'

http://127.0.0.1:4000/v1/chat/completions can also be replaced with http://127.0.0.1:4000/chat/completions.

LiteLLM Proxy Server UI

Log in at http://localhost:4000/ui to open the LiteLLM Proxy Server UI.

Model list

Add a model

  • Provider: OpenAI-Compatible Endpoints (Together AI, etc.)
  • Public Model Name: gpt-4
  • LiteLLM Model Name(s): gpt-4-32k
  • API Key: NONE
  • API Base: http://172.16.33.66:9997/v1
  • LiteLLM Params: Pass JSON of litellm supported params for the litellm.completion() call

Model analytics

LiteLLM Python SDK

Installation

pip install litellm

OpenAI-Compatible Endpoints

import os
import litellm


os.environ['LITELLM_LOG'] = 'DEBUG'
os.environ["OPENAI_API_BASE"] = "http://172.16.33.66:9997/v1"
os.environ["OPENAI_API_KEY"] = "NONE"

openai_response = litellm.completion(
  model="gpt-4-32k",
  messages=[
    {"role": "system", "content": "您是人工智能助手。"}, 
    {"role": "user", "content": "介绍一下自己。"}
  ]
)

print(openai_response)
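
Streaming works through the same call by passing stream=True; a minimal sketch, reusing the environment variables set above, that prints tokens as they arrive:

# Request a streaming response; LiteLLM yields OpenAI-style chunks.
stream = litellm.completion(
  model="gpt-4-32k",
  messages=[{"role": "user", "content": "介绍一下自己。"}],
  stream=True,
)

for chunk in stream:
  delta = chunk.choices[0].delta.content
  if delta:  # the final chunk may carry no content
    print(delta, end="", flush=True)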

Visual question answering (llava)

import litellm


# model = "ollama/minicpm-v:8b" # ❌ 没有成功,可能是 LiteLLM 组织的提示词有问题吧。
model = "ollama/llava:7b"

# Reference: https://github.com/BerriAI/litellm/blob/main/litellm/llms/ollama.py#L169
# ollama wants plain base64 jpeg/png files as images.  strip any leading dataURI
# and convert to jpeg if necessary.
def _convert_image(image):
    import base64
    import io

    try:
        from PIL import Image
    except:
        raise Exception(
            "ollama image conversion failed please run `pip install Pillow`"
        )

    image_data = Image.open(image)
    jpeg_image = io.BytesIO()
    image_data.convert("RGB").save(jpeg_image, "JPEG")
    jpeg_image.seek(0)
    return base64.b64encode(jpeg_image.getvalue()).decode("utf-8")

is_file = True
if is_file:
  image_path = "/Users/junjian/截屏/catdog.jpg"
  base64_image = _convert_image(image_path)
  url = f"data:image/jpeg;base64,{base64_image}"
else:
  url = "http://book.d2l.ai/_images/catdog.jpg"

response = litellm.completion(
  model=model,
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "图像上面有几只动物(用中文回答)"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": url
          }
        },
      ]
    }
  ],
)

print(response.choices[0].message.content)

Example output:

这张图片显示了一只狗和一只猫。狗是一种可爱的家庭宠物,而猫则是另一种广受喜爱的家庭宠物。两者在一起看着相机,表现出一种互动感的关系。

LiteLLM & Langfuse Integration

LiteLLM Proxy Server (LLM Gateway)

Deploy Langfuse

Edit docker-compose.yml

networks:
  shared-network:
    name: shared-network

services:
  langfuse-server:
    image: langfuse/langfuse:2
    depends_on:
      db:
        condition: service_healthy
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/postgres
      - NEXTAUTH_SECRET=mysecret
      - SALT=mysalt
      - ENCRYPTION_KEY=0000000000000000000000000000000000000000000000000000000000000000 # generate via `openssl rand -hex 32`
      - NEXTAUTH_URL=http://localhost:3000
      - TELEMETRY_ENABLED=${TELEMETRY_ENABLED:-true}
      - LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES=${LANGFUSE_ENABLE_EXPERIMENTAL_FEATURES:-false}
    networks:
      - shared-network

  db:
    image: postgres
    restart: always
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 3s
      timeout: 3s
      retries: 10
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=postgres
    ports:
      - 5432:5432
    volumes:
      - database_data:/var/lib/postgresql/data
    networks:
      - shared-network

volumes:
  database_data:
    driver: local

  • Create a shared network named shared-network
  • Attach every service to the shared-network network

Start the services

docker-compose up -d
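
Once Langfuse is up and you have created a project, the API keys can be verified from Python. A minimal sketch, assuming the langfuse Python SDK (v2) and the project keys shown above:

from langfuse import Langfuse

langfuse = Langfuse(
    host="http://localhost:3000",
    public_key="pk-lf-fd5d8fba-5134-4037-884d-d6780894a65a",
    secret_key="sk-lf-10122a92-da11-4423-b3f7-ad10e5f268fc",
)

# auth_check() returns True when the host and key pair are valid.
print(langfuse.auth_check())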

Deploy LiteLLM

Edit the configuration file config.yaml

model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4-32k
      api_base: http://172.16.33.66:9997/v1
      api_key: NONE
  - model_name: bge
    litellm_params:
      model: openai/bge-m3
      api_base: http://172.16.33.66:9997/v1
      api_key: NONE
general_settings: 
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
litellm_settings:
  success_callback: ["langfuse"]
  failure_callback: ["langfuse"]

Edit docker-compose.yml

version: "3.11"

networks:
  shared-network:
    name: shared-network

services:
  litellm:
    build:
      context: .
      args:
        target: runtime
    image: ghcr.io/berriai/litellm:main-stable
    volumes:
      - ./config.yaml:/etc/litellm/config.yaml
    ports:
      - "4000:4000" # Map the container port to the host, change the host port if necessary
    command:
      - '--config=/etc/litellm/config.yaml'
      - '--debug'
    environment:
        DATABASE_URL: "postgresql://llmproxy:dbpassword9090@db:5434/litellm"
        STORE_MODEL_IN_DB: "True" # allows adding models to proxy via UI
        LANGFUSE_PUBLIC_KEY: "pk-lf-fd5d8fba-5134-4037-884d-d6780894a65a"
        LANGFUSE_SECRET_KEY: "sk-lf-10122a92-da11-4423-b3f7-ad10e5f268fc"
        LANGFUSE_HOST: "http://langfuse-server:3000"
    env_file:
      - .env # Load local .env file
    networks:
      - shared-network

  db:
    image: postgres
    restart: always
    environment:
      PGPORT: 5434
      POSTGRES_DB: litellm
      POSTGRES_USER: llmproxy
      POSTGRES_PASSWORD: dbpassword9090
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d litellm -U llmproxy"]
      interval: 1s
      timeout: 5s
      retries: 10
    networks:
      - shared-network
  
  prometheus:
    image: prom/prometheus
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
    restart: always
    networks:
      - shared-network

volumes:
  prometheus_data:
    driver: local

  • Create a shared network named shared-network
  • Attach every service to the shared-network network
  • services.litellm
    • environment
      • Set LANGFUSE_HOST to http://langfuse-server:3000
      • Set LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY to the Langfuse public and secret keys
    • volumes
      • Mount config.yaml to /etc/litellm/config.yaml
    • command
      • Enable debug mode with --debug
      • Point to the config file with --config=/etc/litellm/config.yaml
  • db.environment
    • Change the default PostgreSQL port with PGPORT: 5434 to avoid a conflict with the Langfuse database port 🛑

Start the services

docker-compose up -d

You can now run the model tests above (the curl commands).
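
To make traces easier to find, requests through the proxy can also carry an optional metadata object (custom generation name, trace user, etc.). A hedged sketch, assuming the metadata fields described in LiteLLM's Langfuse logging docs; the names used here are only illustrative:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:4000/v1", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "介绍一下自己。"}],
    # Non-OpenAI fields are forwarded to LiteLLM; metadata is attached to the Langfuse trace.
    extra_body={
        "metadata": {
            "generation_name": "proxy-smoke-test",  # hypothetical name shown on the trace in Langfuse
            "trace_user_id": "demo-user",           # hypothetical user id for filtering traces
        }
    },
)
print(response.choices[0].message.content)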

LiteLLM Python SDK

Edit main.py

import os
import litellm
 

os.environ["LANGFUSE_HOST"]="http://localhost:3000"
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-fd5d8fba-5134-4037-884d-d6780894a65a"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-10122a92-da11-4423-b3f7-ad10e5f268fc"
 
os.environ["OPENAI_API_BASE"] = "http://172.16.33.66:9997/v1"
os.environ["OPENAI_API_KEY"] = "NONE"

os.environ['LITELLM_LOG'] = 'DEBUG'

litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

openai_response = litellm.completion(
  model="gpt-4-32k",
  messages=[
    {"role": "system", "content": "您是人工智能助手。"},
    {"role": "user", "content": "介绍一下自己。"}
  ]
)

print(openai_response)

Run main.py

View the Langfuse Dashboard

Log in at http://localhost:3000 to open the Langfuse Dashboard.

❌ Forgetting to turn off the VPN proxy caused a baffling problem

Spent half a day debugging it 😅

INFO:     127.0.0.1:65383 - "POST /v1/chat/completions HTTP/1.1" 200 OK
16:30:59 - LiteLLM Proxy:ERROR: proxy_server.py:2586 - litellm.proxy.proxy_server.async_data_generator(): Exception occured - b''
Traceback (most recent call last):
  File "/opt/miniconda/lib/python3.10/site-packages/litellm/proxy/proxy_server.py", line 2565, in async_data_generator
    async for chunk in response:
  File "/opt/miniconda/lib/python3.10/site-packages/litellm/llms/ollama.py", line 426, in ollama_async_streaming
    raise e  # don't use verbose_logger.exception, if exception is raised
  File "/opt/miniconda/lib/python3.10/site-packages/litellm/llms/ollama.py", line 381, in ollama_async_streaming
    raise OllamaError(
litellm.llms.ollama.OllamaError: b''
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/opt/miniconda/lib/python3.10/site-packages/starlette/responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/opt/miniconda/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
    await func()
  File "/opt/miniconda/lib/python3.10/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
    message = await receive()
  File "/opt/miniconda/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 596, in receive
    await self.message_event.wait()
  File "/opt/miniconda/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 16e0d7ac0

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/opt/miniconda/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |   File "/opt/miniconda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
  |     return await self.app(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
  |     raise exc
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
  |     await self.app(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
  |     await route.handle(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
  |     await self.app(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
  |     raise exc
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/routing.py", line 75, in app
  |     await response(scope, receive, send)
  |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/responses.py", line 258, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/opt/miniconda/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 680, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/opt/miniconda/lib/python3.10/site-packages/litellm/proxy/proxy_server.py", line 2565, in async_data_generator
    |     async for chunk in response:
    |   File "/opt/miniconda/lib/python3.10/site-packages/litellm/llms/ollama.py", line 426, in ollama_async_streaming
    |     raise e  # don't use verbose_logger.exception, if exception is raised
    |   File "/opt/miniconda/lib/python3.10/site-packages/litellm/llms/ollama.py", line 381, in ollama_async_streaming
    |     raise OllamaError(
    | litellm.llms.ollama.OllamaError: b''
    | 
    | During handling of the above exception, another exception occurred:
    | 
    | Traceback (most recent call last):
    |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
    |     await func()
    |   File "/opt/miniconda/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/opt/miniconda/lib/python3.10/site-packages/litellm/proxy/proxy_server.py", line 2607, in async_data_generator
    |     proxy_exception = ProxyException(
    |   File "/opt/miniconda/lib/python3.10/site-packages/litellm/proxy/_types.py", line 1839, in __init__
    |     "No healthy deployment available" in self.message
    | TypeError: a bytes-like object is required, not 'str'
    +------------------------------------

References