7 篇文章带有标签 “FastChat”

2024年1月17日星期三

LLM 的基准测试

Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing GPUs for now.
Datatype fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs).
All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800.

Turing GPU T4 不支持，需要使用 FlashAttention 1.x，否则会报错 ❌：

data: {
  "text": "**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\n\n(FlashAttention only supports Ampere GPUs or newer.)", 
  "error_code": 50001
}

2024年1月17日 4 分钟 958 字

LLM Benchmark 测速 wrk Qwen FastChat vLLM TeslaT4

2024年1月16日星期二

使用 FastChat 在 CUDA 上部署 LLM

pip install "fschat[model_worker,webui]"

如果不支持 bfloat16，则降至 float16

📌 模型变动需要重新部署聊天机器人

2024年1月16日 1 分钟 277 字

FastChat vLLM CUDA

2024年1月11日星期四

在 MacBook Pro M2 Max 上安装 FastChat

FastChat 是一个开放平台，用于训练、服务和评估基于大型语言模型的聊天机器人。

DeepSeek

mkdir deepseek-ai
ln -s /Users/junjian/HuggingFace/deepseek-ai/deepseek-llm-7b-chat deepseek-ai/deepseek-llm-7b-chat
ln -s /Users/junjian/HuggingFace/deepseek-ai/deepseek-coder-1.3b-instruct deepseek-ai/deepseek-coder-1.3b-instruct

ChatGLM

mkdir THUDM
ln -s /Users/junjian/HuggingFace/THUDM/chatglm3-6b THUDM/chatglm3-6b

deepseek-ai/deepseek-llm-7b-chat

python -m fastchat.serve.model_worker \
    --model-path deepseek-ai/deepseek-llm-7b-chat --port 21002 \
    --worker-address http://localhost:21002 \
    --device mps

2024年1月11日 1 分钟 129 字

FastChat Qwen DeepSeek ChatGLM OpenAI MacBookProM2Max

2024年1月9日星期二

基于 PyCharm 使用 Tabby 和 CodeGPT 插件搭建免费的 GitHub Copilot

启动服务 Controller

python -m fastchat.serve.controller

启动服务 Model Worker

python -m fastchat.serve.model_worker \
  --model-path THUDM/chatglm3-6b --port 21002 \
  --worker-address http://localhost:21002 \
  --model-names chatglm3-6b,gpt-3.5-turbo

启动服务 OpenAI API Server

python -m fastchat.serve.openai_api_server --port 8000

2024年1月9日 1 分钟 200 字

GitHubCopilot PyCharm Tabby CodeGPT FastChat OpenAI CodeLLM LLM

2023年12月28日星期四

Langchain‐Chatchat 和 FastChat 结合

THUDM/chatglm3-6b

fatal: fetch-pack: invalid index-pack output

Cloning into 'Langchain-Chatchat'...
remote: Enumerating objects: 8958, done.
remote: Counting objects: 100% (270/270), done.
remote: Compressing objects: 100% (168/168), done.
error: 6146 bytes of body are still expectediB | 367.00 KiB/s 
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

这个错误可能是由于网络问题或者 Git 服务器问题导致的。当 Git 在克隆仓库时，它需要从服务器下载一些数据。如果在这个过程中连接被中断，或者服务器发送的数据有问题，就可能会出现这个错误。

你可以尝试以下几种解决方法：

2023年12月28日 1 分钟 405 字

Langchain‐Chatchat FastChat OpenAI LLM

2023年12月25日星期一

Qwen (通义千问)

Qwen

命令行聊天

python cli_demo.py

Web 聊天

python web_demo.py

Model Worker

python -m fastchat.serve.model_worker \
    --model-path Qwen/Qwen-1_8B-Chat --port 21002 \
    --worker-address http://localhost:21002 \
    --device mps

OpenAI API Server

python -m fastchat.serve.openai_api_server --port 8000

Web Server

python -m fastchat.serve.gradio_web_server --host 0.0.0.0 --port 8001

使用 Web 聊天的时候出现乱码，感觉 ChatML 格式的问题。

2023年12月25日 1 分钟 100 字

QWen FastChat MacBookProM2Max

2023年10月24日星期二

* [Chatbot Arena](https://chat.lmsys.org/) * [FastChat](https://github.com/lm-sys/FastChat) * [LMSYS BLOG](https://lmsys.org/blog/) * [Use AutoGen for Local LLMs](https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/)

这种方式安装比较容易调试，适合开发者。

克隆代码

git clone https://github.com/lm-sys/FastChat.git
cd FastChat

创建环境

python -m venv env
source env/bin/activate

安装

pip install --upgrade pip  # enable PEP 660 support
pip install -e ".[model_worker,webui]"
pip install -U transformers==4.33.0 # AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'

2023年10月24日 2 分钟 654 字

FastChat LangChain Vicuna ChatGLM2-6B 分布式

7 篇文章带有标签 “FastChat”

2024年1月17日 星期三

LLM 的基准测试

2024年1月16日 星期二

使用 FastChat 在 CUDA 上部署 LLM

2024年1月11日 星期四

在 MacBook Pro M2 Max 上安装 FastChat

2024年1月9日 星期二