| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mistralai/Mistral-7B-Instruct-v0.2 | 65.71 | 63.14 | 84.88 | 60.78 | 68.26 | 77.19 | 40.03 |
| 01-ai/Yi-34B-Chat | 65.32 | 65.44 | 84.16 | 74.9 | 55.37 | 80.11 | 31.92 |
| Qwen/Qwen1.5-14B-Chat | 62.37 | 58.79 | 82.33 | 68.52 | 60.38 | 73.32 | 30.86 |
| 01-ai/Yi-6B-200K | 56.76 | 53.75 | 75.57 | 64.65 | 41.56 | 73.64 | 31.39 |
| Qwen/Qwen1.5-7B-Chat | 55.15 | 55.89 | 78.56 | 61.65 | 53.54 | 67.72 | 13.57 |
| 01-ai/Yi-6B | 54.08 | 55.55 | 76.57 | 64.11 | 41.96 | 74.19 | 12.13 |
| deepseek-ai/deepseek-llm-7b-chat | 59.38 | 55.8 | 79.38 | 51.75 | 47.98 | 74.82 | 46.55 |
| internlm/internlm-20b-chat | 55.53 | 55.38 | 78.58 | 58.53 | 43.22 | 78.77 | 18.73 |
| deepseek-ai/deepseek-coder-7b-instruct-v1.5 | 50.89 | 48.55 | 72.35 | 50.45 | 46.73 | 66.85 | 20.39 |