安装
git clone https://github.com/wg/wrk.git
cd wrk
#使用多线程(机器的处理器核数)加速编译,
make -j $(nproc)
cp wrk /usr/local/bin/
Tabby Server
服务器:NVIDIA T4 16GB X 4
部署
- 模型:TabbyML/DeepseekCoder-6.7B
docker run -d --gpus all -p 8080:8080 \
-v /data/zhw/tabby/data:/data \
tabbyml/tabby:latest \
serve --model TabbyML/DeepseekCoder-6.7B \
--device cuda --parallelism 4
- 模型:TabbyML/DeepseekCoder-1.3B
docker run -d --gpus all -p 8080:8080 \
-v /data/zhw/tabby/data:/data \
tabbyml/tabby:latest \
serve --model TabbyML/DeepseekCoder-1.3B \
--device cuda --parallelism 12
curl 测试
curl http://127.0.0.1:8080/v1/completions -H "Content-Type: application/json" -d '{
"language": "python",
"segments": {
"prefix": "#实现一个快速排序\n def "
}
}'|jq
{
"id": "cmpl-6ef400f4-86da-43cc-b27a-eeae9394c316",
"choices": [
{
"index": 0,
"text": "quick_sort(arr):\n if len(arr) <= 1:\n return arr\n pivot = arr[0]\n left = [x for x in arr[1:] if x <= pivot]\n right = [x for x in arr[1:] if x > pivot]\n return quick_sort(left) + [pivot] + quick_sort(right)"
}
]
}
输入 12 个 Tokens
#实现一个快速排序
def
输出 74 个 Tokens
quick_sort(arry):
if len(arr) <= 1:
return arr
pivot = arr[0]
left = [x for x in arr[1:] if x <= pivot]
right = [x for x in arr[1:] if x > pivot]
return quick_sort(left) + [pivot] + quick_sort(right)
准备
编辑测试脚本 post_json.lua
wrk.method = "POST"
wrk.body = "{\"language\": \"python\", \"segments\": {\"prefix\": \"#Implement a quick sort\\n def \"}}"
wrk.headers["Content-Type"] = "application/json"
\\n 不能写为 \n,否则会报错:Failed to parse the request body as JSON: segments.prefix: control character (\u0000-\u001F) found while parsing a string at line 2 column 0
监控 8080 端口
sudo tcpdump -i any -A 'tcp port 8080 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -w -
基准测试
基准测试工具:wrk
持续时间:1 分钟
超时时间:10 秒
总结
TabbyML/DeepseekCoder-6.7B(并行 4)
|
并发连接数 |
线程数 |
平均延迟 |
最大延迟 |
完成请求数 |
超时请求数 |
平均每秒请求数 |
|
1 |
1 |
3.75s |
3.81s |
16 |
0 |
0.27 |
|
2 |
2 |
5.00s |
5.11s |
24 |
0 |
0.40 |
|
3 |
3 |
5.80s |
5.99s |
30 |
0 |
0.50 |
| 👍 |
4 |
4 |
5.43s |
5.62s |
43 |
0 |
0.72 |
|
5 |
5 |
6.12s |
6.30s |
40 |
9 |
0.67 |
|
6 |
6 |
7.42s |
9.14s |
41 |
9 |
0.68 |
|
8 |
8 |
7.29s |
9.85s |
40 |
34 |
0.67 |
TabbyML/DeepseekCoder-1.3B(并行 12)
|
并发连接数 |
线程数 |
平均延迟 |
最大延迟 |
完成请求数 |
超时请求数 |
平均每秒请求数 |
|
1 |
1 |
1.03s |
1.11s |
57 |
0 |
0.95 |
|
4 |
4 |
1.46s |
1.79s |
161 |
0 |
2.68 |
|
8 |
8 |
1.95s |
2.18s |
241 |
0 |
4.02 |
| 👍 |
12 |
12 |
2.79s |
3.06s |
251 |
0 |
4.18 |
|
16 |
16 |
4.19s |
6.40s |
221 |
0 |
3.68 |
|
20 |
20 |
6.43s |
8.16s |
177 |
0 |
2.95 |
|
24 |
24 |
9.01s |
10.00s |
143 |
42 |
2.38 |
测试数据
TabbyML/DeepseekCoder-6.7B
- 1 个并发连接,1 个线程,持续 1 分钟,超时时间 10 秒
wrk -c1 -t1 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.75s 29.55ms 3.81s 68.75%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 3.75s
75% 3.78s
90% 3.79s
99% 3.81s
16 requests in 1.00m, 9.05KB read
Requests/sec: 0.27
Transfer/sec: 154.21B
- 2 个并发连接,2 个线程,持续 1 分钟,超时时间 10 秒
wrk -c2 -t2 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
2 threads and 2 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.00s 43.20ms 5.11s 87.50%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 5.00s
75% 5.01s
90% 5.01s
99% 5.11s
24 requests in 1.00m, 13.57KB read
Requests/sec: 0.40
Transfer/sec: 231.57B
- 3 个并发连接,3 个线程,持续 1 分钟,超时时间 10 秒
wrk -c3 -t3 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
3 threads and 3 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.80s 91.00ms 5.99s 73.33%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 5.81s
75% 5.84s
90% 5.88s
99% 5.99s
30 requests in 1.00m, 16.96KB read
Requests/sec: 0.50
Transfer/sec: 289.48B
- 4 个并发连接,4 个线程,持续 1 分钟,超时时间 10 秒
wrk -c4 -t4 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
4 threads and 4 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.43s 143.39ms 5.62s 72.09%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 5.46s
75% 5.52s
90% 5.60s
99% 5.62s
43 requests in 1.00m, 24.31KB read
Requests/sec: 0.72
Transfer/sec: 414.28B
- 5 个并发连接,5 个线程,持续 1 分钟,超时时间 10 秒
wrk -c5 -t5 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
5 threads and 5 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 6.12s 98.11ms 6.30s 64.52%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 6.13s
75% 6.17s
90% 6.25s
99% 6.30s
40 requests in 1.00m, 22.62KB read
Socket errors: connect 0, read 0, write 0, timeout 9
Requests/sec: 0.67
Transfer/sec: 385.36B
- 6 个并发连接,6 个线程,持续 1 分钟,超时时间 10 秒
wrk -c6 -t6 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
6 threads and 6 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 7.42s 1.62s 9.14s 71.88%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 7.83s
75% 9.01s
90% 9.12s
99% 9.14s
41 requests in 1.00m, 22.94KB read
Socket errors: connect 0, read 0, write 0, timeout 9
Requests/sec: 0.68
Transfer/sec: 390.83B
- 8 个并发连接,8 个线程,持续 1 分钟,超时时间 10 秒
wrk -c8 -t8 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
8 threads and 8 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 7.29s 1.89s 9.85s 66.67%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 6.29s
75% 9.58s
90% 9.85s
99% 9.85s
40 requests in 1.00m, 22.47KB read
Socket errors: connect 0, read 0, write 0, timeout 34
Requests/sec: 0.67
Transfer/sec: 382.82B
TabbyML/DeepseekCoder-1.3B
- 1 个并发连接,1 个线程,持续 1 分钟,超时时间 10 秒
wrk -c1 -t1 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.03s 12.40ms 1.11s 89.47%
Req/Sec 0.04 0.19 1.00 96.49%
Latency Distribution
50% 1.03s
75% 1.04s
90% 1.04s
99% 1.11s
57 requests in 1.00m, 32.40KB read
Requests/sec: 0.95
Transfer/sec: 552.85B
- 4 个并发连接,4 个线程,持续 1 分钟,超时时间 10 秒
wrk -c4 -t4 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
4 threads and 4 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.46s 93.34ms 1.79s 85.09%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 1.40s
75% 1.55s
90% 1.56s
99% 1.79s
161 requests in 1.00m, 92.53KB read
Requests/sec: 2.68
Transfer/sec: 1.54KB
- 8 个并发连接,8 个线程,持续 1 分钟,超时时间 10 秒
wrk -c8 -t8 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
8 threads and 8 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.95s 115.96ms 2.18s 59.34%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 1.98s
75% 2.05s
90% 2.09s
99% 2.16s
241 requests in 1.00m, 138.75KB read
Requests/sec: 4.02
Transfer/sec: 2.31KB
- 12 个并发连接,12 个线程,持续 1 分钟,超时时间 10 秒
wrk -c12 -t12 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
12 threads and 12 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.79s 173.15ms 3.06s 58.57%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 2.78s
75% 2.95s
90% 3.01s
99% 3.05s
251 requests in 1.00m, 144.62KB read
Requests/sec: 4.18
Transfer/sec: 2.41KB
- 16 个并发连接,16 个线程,持续 1 分钟,超时时间 10 秒
wrk -c16 -t16 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
16 threads and 16 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.19s 874.57ms 6.40s 61.99%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 3.82s
75% 5.06s
90% 5.49s
99% 6.04s
221 requests in 1.00m, 127.01KB read
Requests/sec: 3.68
Transfer/sec: 2.11KB
- 20 个并发连接,20 个线程,持续 1 分钟,超时时间 10 秒
wrk -c20 -t20 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
20 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 6.43s 1.25s 8.16s 61.58%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 6.98s
75% 7.38s
90% 7.72s
99% 8.09s
177 requests in 1.00m, 102.04KB read
Requests/sec: 2.95
Transfer/sec: 1.70KB
- 24 个并发连接,24 个线程,持续 1 分钟,超时时间 10 秒
wrk -c24 -t24 -d1m --timeout 10s --latency -s post_json.lua http://127.0.0.1:8080/v1/completions
Running 1m test @ http://127.0.0.1:8080/v1/completions
24 threads and 24 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.01s 1.39s 10.00s 88.12%
Req/Sec 0.00 0.00 0.00 100.00%
Latency Distribution
50% 9.52s
75% 9.74s
90% 9.82s
99% 10.00s
143 requests in 1.00m, 82.42KB read
Socket errors: connect 0, read 0, write 0, timeout 42
Requests/sec: 2.38
Transfer/sec: 1.37KB