2 篇文章带有标签 “llm-deployment”

2024年1月16日星期二

使用 FastChat 在 CUDA 上部署 LLM

安装 FastChat & vLLM

pip install "fschat[model_worker,webui]"

安装 FlashAttention

Turing GPU T4 不支持 FlashAttention 2，需要使用 FlashAttention 1.x 。
Turing GPU T4 不支持 bf16，需要使用 fp16 。

安装 vLLM

pip install vllm -i https://mirrors.aliyun.com/pypi/simple/

升级 FastChat & vLLM

git pull
pip install -e ".[model_worker,webui]"
pip install -U vllm

部署 LLM

运行 Controller

python -m fastchat.serve.controller

运行 OpenAI API Server

python -m fastchat.serve.openai_api_server

运行 Model Worker Qwen-1_8B-Chat export CUDA_VISIBLE_DEVIC

2024-01-16 08:00

2024年1月11日星期四

在 MacBook Pro M2 Max 上安装 FastChat

FastChat

FastChat 是一个开放平台，用于训练、服务和评估基于大型语言模型的聊天机器人。

FastChat Server 架构图

安装 FastChat

克隆代码

git clone https://github.com/lm-sys/FastChat
cd FastChat

创建虚拟环境

python -m venv env
source env/bin/activate

安装

pip install --upgrade pip
pip install -e ".[model_worker,webui]"

升级 FastChat

git pull
pip install -e ".[model_worker,webui]"

创建大模型链接 LLM Qwen mkdir Qwen ln -s /Users/junjian/HuggingFace/Qwen/Qwen-14B-Chat Qwen/Qwen-14B-Chat ln -s /Users/junjian/HuggingFace/Qwen/Qwen-1_8B Qwen/Qwen-1_8B ln -s /Users/junjian/HuggingFace/Qwen/Qwen-1_8B-Chat Qwen/Qwen-1_8B-Chat ln

2024-01-11 08:00

fastchat qwen deepseek chatglm bge llm-deployment openai-api mps macbook-pro-m2-max

2 篇文章带有标签 “llm-deployment”

2024年1月16日 星期二

使用 FastChat 在 CUDA 上部署 LLM

2024年1月11日 星期四

在 MacBook Pro M2 Max 上安装 FastChat

2024年1月16日星期二

2024年1月11日星期四