Setting Up an LLM Inference Service on the Huawei Atlas 800I A2 Server
Category: Atlas800, MindIE | Tags: Atlas800, NPU, 910B4, MindIE, LLM
Comparison of the ecosystem layers of Huawei Ascend NPUs and NVIDIA GPUs:
| NPU | GPU |
| --- | --- |
| CANN | CUDA |
| MindSpore | PyTorch |
| MindFormer | Transformers |
| MindIE | vLLM |
Download the Models
cd /home/luruan/disk1/models
Large language models
- Qwen1.5-7B
git clone https://www.modelscope.cn/Qwen/Qwen1.5-7B-Chat.git
- Qwen2-7B ❌
git clone https://www.modelscope.cn/Qwen/Qwen2-7B-Instruct.git
- Qwen2-72B
git clone https://www.modelscope.cn/Qwen/Qwen2-72B-Instruct.git
Code models
- DeepSeek-Coder-6.7B
git clone https://www.modelscope.cn/deepseek-ai/deepseek-coder-6.7b-instruct.git
- StarCoder2-15B ❌
git clone https://www.modelscope.cn/AI-ModelScope/starcoder2-15b.git
- CodeGeeX2-6B ❌
git clone https://www.modelscope.cn/ZhipuAI/codegeex2-6b.git
The sentencepiece package is missing.
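A quick way to check for the dependency before loading the tokenizer (a minimal sketch; if the import fails, install the package from PyPI):

```python
# Check whether the sentencepiece dependency needed by this tokenizer is available;
# if not, it can be installed with `pip install sentencepiece`.
try:
    import sentencepiece
    print("sentencepiece", sentencepiece.__version__, "is installed")
except ImportError:
    print("sentencepiece is missing: run `pip install sentencepiece`")
```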
Model Format Conversion (bin → safetensors)
MindIE does not support models stored in the bin format, so the weights need to be converted to the safetensors format with the following commands:
cd /home/disk1/models/deepseek-coder-6.7b-instruct/
python /home/luruan/cann8.0.rc2/examples/convert/convert_weights.py \
--model_path /home/disk1/models/deepseek-coder-6.7b-instruct/
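If that CANN example script is not available in your environment, a rough equivalent is to reload the weights and re-save them with safetensors serialization (a sketch, assuming the transformers and safetensors packages are installed; the output directory name is arbitrary):

```python
# Sketch: convert pytorch_model-*.bin weights to model-*.safetensors via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "/home/disk1/models/deepseek-coder-6.7b-instruct/"
dst = src.rstrip("/") + "-safetensors/"   # arbitrary output directory

model = AutoModelForCausalLM.from_pretrained(src, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(src)

# safe_serialization=True writes safetensors shards instead of .bin files
model.save_pretrained(dst, safe_serialization=True)
tokenizer.save_pretrained(dst)
```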
Modify the Model Configuration
Edit the configuration file Qwen2-7B-Instruct/config.json:
{
  "torch_dtype": "float16"
}
The NPU does not support bfloat16, so some models' configuration files need torch_dtype changed to float16.
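For example, the dtype can be patched in place with a small script (a minimal sketch; the path points at the Qwen2-7B-Instruct directory downloaded earlier):

```python
# Flip torch_dtype from bfloat16 to float16 in a model's config.json.
import json

config_path = "/home/luruan/disk1/models/Qwen2-7B-Instruct/config.json"

with open(config_path, encoding="utf-8") as f:
    config = json.load(f)

if config.get("torch_dtype") == "bfloat16":
    config["torch_dtype"] = "float16"
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    print("torch_dtype changed to float16")
```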
npu-smi
The NPU does not support card sharing: a single card cannot be mounted into multiple containers at the same time (a quick programmatic check is sketched after the sample output below).
sudo npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2 Version: 24.1.rc2 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B4 | OK | 99.1 43 0 / 0 |
| 0 | 0000:C1:00.0 | 0 0 / 0 30209/ 32768 |
+===========================+===============+====================================================+
| 1 910B4 | OK | 92.1 41 0 / 0 |
| 0 | 0000:C2:00.0 | 0 0 / 0 30208/ 32768 |
+===========================+===============+====================================================+
| 2 910B4 | OK | 90.5 41 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 30206/ 32768 |
+===========================+===============+====================================================+
| 3 910B4 | OK | 86.4 43 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 30207/ 32768 |
+===========================+===============+====================================================+
| 4 910B4 | OK | 87.4 45 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 30207/ 32768 |
+===========================+===============+====================================================+
| 5 910B4 | OK | 90.4 46 0 / 0 |
| 0 | 0000:02:00.0 | 0 0 / 0 30206/ 32768 |
+===========================+===============+====================================================+
| 6 910B4 | OK | 88.0 45 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 30207/ 32768 |
+===========================+===============+====================================================+
| 7 910B4 | OK | 92.0 47 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 30207/ 32768 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| 0 0 | 518867 | mindieservice_b | 27474 |
+===========================+===============+====================================================+
| 1 0 | 518868 | mindieservice_b | 27474 |
+===========================+===============+====================================================+
| 2 0 | 518869 | mindieservice_b | 27474 |
+===========================+===============+====================================================+
| 3 0 | 518870 | mindieservice_b | 27474 |
+===========================+===============+====================================================+
| 4 0 | 518871 | mindieservice_b | 27474 |
+===========================+===============+====================================================+
| 5 0 | 518872 | mindieservice_b | 27474 |
+===========================+===============+====================================================+
| 6 0 | 518873 | mindieservice_b | 27474 |
+===========================+===============+====================================================+
| 7 0 | 518874 | mindieservice_b | 27474 |
+===========================+===============+====================================================+
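Because cards cannot be shared, it is worth confirming that no MindIE instance is already holding them before launching a new container. A minimal sketch that simply greps the npu-smi output above for the mindieservice_b process name:

```python
# Check whether a MindIE service process is already running on the NPUs.
import subprocess

out = subprocess.run(["npu-smi", "info"], capture_output=True, text=True, check=True)
if "mindieservice_b" in out.stdout:
    print("NPUs are already occupied by a MindIE service; stop it before starting another container.")
else:
    print("No MindIE process found on the NPUs.")
```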
MindIE Inference Service
Download the image
wget https://modelers.cn/coderepo/web/v1/file/xieyuxiang/mindie_1.0.RC2_image/main/media/mindie-service_1.0.RC2.tar.gz
Load the image
sudo docker load -i mindie-service_1.0.RC2.tar.gz
Deploy the LLM Inference Service (MindIE)
sudo docker run --net=host --ipc=host -it \
--name MindIE2 \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
-v /usr/local/sbin:/usr/local/sbin:ro \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /home/luruan:/home/luruan \
-v /disk1/:/home/disk1 \
mindie-service:1.0.RC2 /bin/bash
Enter the MindIE executable directory:
cd /usr/local/Ascend/mindie/latest/mindie-service/bin
Configuration file: ../conf/config.json
{
    "OtherParam" :
    {
        "ResourceParam" :
        {
            "cacheBlockSize" : 128
        },
        "LogParam" :
        {
            "logLevel" : "Info",
            "logPath" : "logs/mindservice.log"
        },
        "ServeParam" :
        {
            "ipAddress" : "127.0.0.1",
            "managementIpAddress" : "127.0.0.2",
            "port" : 1025,
            "managementPort" : 1026,
            "maxLinkNum" : 1000,
            "httpsEnabled" : false,
            "tlsCaPath" : "security/ca/",
            "tlsCaFile" : ["ca.pem"],
            "tlsCert" : "security/certs/server.pem",
            "tlsPk" : "security/keys/server.key.pem",
            "tlsPkPwd" : "security/pass/mindie_server_key_pwd.txt",
            "tlsCrl" : "security/certs/server_crl.pem",
            "managementTlsCaFile" : ["management_ca.pem"],
            "managementTlsCert" : "security/certs/management_server.pem",
            "managementTlsPk" : "security/keys/management_server.key.pem",
            "managementTlsPkPwd" : "security/pass/management_mindie_server_key_pwd.txt",
            "managementTlsCrl" : "security/certs/management_server_crl.pem",
            "kmcKsfMaster" : "tools/pmt/master/ksfa",
            "kmcKsfStandby" : "tools/pmt/standby/ksfb",
            "multiNodesInferPort" : 1120,
            "interNodeTLSEnabled" : true,
            "interNodeTlsCaFile" : "security/ca/ca.pem",
            "interNodeTlsCert" : "security/certs/server.pem",
            "interNodeTlsPk" : "security/keys/server.key.pem",
            "interNodeTlsPkPwd" : "security/pass/mindie_server_key_pwd.txt",
            "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
            "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb"
        }
    },
    "WorkFlowParam" :
    {
        "TemplateParam" :
        {
            "templateType" : "Standard",
            "templateName" : "Standard_llama"
        }
    },
    "ModelDeployParam" :
    {
        "engineName" : "mindieservice_llm_engine",
        "modelInstanceNumber" : 1,
        "tokenizerProcessNumber" : 8,
        "maxSeqLen" : 8192,
        "npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
        "multiNodesInferEnabled" : false,
        "ModelParam" : [
            {
                "modelInstanceType" : "Standard",
                "modelName" : "qwen",
                "modelWeightPath" : "/home/luruan/disk1/models/Qwen1.5-7B-Chat",
                "worldSize" : 8,
                "cpuMemSize" : 5,
                "npuMemSize" : 8,
                "backendType" : "atb",
                "pluginParams" : ""
            }
        ]
    },
    "ScheduleParam" :
    {
        "maxPrefillBatchSize" : 50,
        "maxPrefillTokens" : 8192,
        "prefillTimeMsPerReq" : 150,
        "prefillPolicyType" : 0,
        "decodeTimeMsPerReq" : 50,
        "decodePolicyType" : 0,
        "maxBatchSize" : 200,
        "maxIterTimes" : 8000,
        "maxPreemptCount" : 0,
        "supportSelectBatch" : false,
        "maxQueueDelayMicroseconds" : 5000
    }
}
- ModelDeployParam
  - maxSeqLen: maximum sequence length; input length + output length <= maxSeqLen
  - npuMemSize: upper limit on the NPU memory that can be allocated for the KV cache
  - tokenizerProcessNumber: number of parallel tokenizer processes
  - worldSize: number of NPUs used by one model instance (matches the length of npuDeviceIds)
  - modelName: model name
  - modelWeightPath: path to the model weights
  - backendType: must be atb for LLM inference
- ScheduleParam
  - maxPrefillTokens: maximum number of input tokens in a prefill batch
  - maxBatchSize: maximum decode batch size
  - maxIterTimes: maximum number of decode iterations, i.e. the longest output that can be generated for a single request
In other words, maxSeqLen must cover the input tokens (bounded by maxPrefillTokens) plus the output tokens (bounded by maxIterTimes); see the sanity-check sketch below.
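A minimal sketch of that check, assuming the config.json layout shown above and example request sizes:

```python
# Verify that an example request fits the configured sequence/scheduling limits.
import json

with open("../conf/config.json", encoding="utf-8") as f:
    conf = json.load(f)

max_seq_len = conf["ModelDeployParam"]["maxSeqLen"]
sched = conf["ScheduleParam"]

prompt_tokens = 1024      # example input length
max_new_tokens = 2048     # example generation budget

assert prompt_tokens <= sched["maxPrefillTokens"], "prompt exceeds maxPrefillTokens"
assert max_new_tokens <= sched["maxIterTimes"], "generation budget exceeds maxIterTimes"
assert prompt_tokens + max_new_tokens <= max_seq_len, "request exceeds maxSeqLen"
print("request fits within maxSeqLen =", max_seq_len)
```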
Start the model inference service:
./mindieservice_daemon
Deploy the LLM Inference Service (vLLM)
vLLM
Testing
curl
curl 'http://localhost:1025/v1/chat/completions' \
-H "Content-Type: application/json" \
-d '{
"model": "qwen",
"messages": [
{ "role": "system", "content": "你是位人工智能专家。" },
{ "role": "user", "content": "解释人工智能" }
]
}'
MindIE does not provide a corresponding /v1/completions endpoint.
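The same chat request can also be issued from Python (a sketch using the requests library, assuming the address, port, and model name from the configuration above and an OpenAI-style response body):

```python
# Call the MindIE /v1/chat/completions endpoint from Python.
import requests

payload = {
    "model": "qwen",
    "messages": [
        {"role": "system", "content": "You are an expert in artificial intelligence."},
        {"role": "user", "content": "Explain artificial intelligence."},
    ],
}

resp = requests.post("http://127.0.0.1:1025/v1/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```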
References
- GPU Performance (Data Sheets) Quick Reference (2023)
- Advanced GPU Notes (1): High-Performance GPU Server Hardware Topology and Cluster Networking (2023)
- Advanced GPU Notes (2): Huawei Ascend 910B GPU (2023)
- Advanced GPU Notes (3): The Evolution of Huawei NPUs/GPUs (2024)
- Advanced GPU Notes (4): NVIDIA GH200 Chips, Servers, and Cluster Networking (2024)
- Getting Started with the Ascend NPU Architecture and the CANN Platform
- LLM Domestic-Hardware Adaptation 1: A Summary of Huawei's Ascend AI Full-Stack Software and Hardware Platform
- LLM Domestic-Hardware Adaptation 8: Deploying Qwen-72B with the Ascend MindIE Inference Toolkit (Inference Engine and Serving)
- First Impressions of Huawei MindIE
- How to Install a Root Certificate on macOS