How Much GPU Memory Does Large-Model Inference Need?

Calculating the GPU memory needed to load a model

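| Model parameters (B) | Bits per parameter | Memory needed to load (GB) |
|---|---|---|
| 0.5 | 16 | 1 |
| 1.5 | 16 | 3 |
| 7 | 16 | 14 |
| 9 | 16 | 18 |
| 22 | 16 | 44 |
| 72 | 16 | 144 |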
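
The load figures above are consistent with a simple rule: parameter count times bytes per parameter. The sketch below reproduces that arithmetic. The function name `load_memory_gb` is hypothetical, and it assumes 1 GB = 10^9 bytes and counts weights only, with no activations or framework overhead.

```python
def load_memory_gb(params_billion: float, bits_per_param: int = 16) -> float:
    """Approximate memory (GB) needed to hold the model weights alone.

    Assumption: 1 GB = 1e9 bytes, so the "billion" in the parameter
    count cancels against the bytes-per-GB factor.
    """
    bytes_per_param = bits_per_param / 8
    return params_billion * bytes_per_param


# Model sizes taken from the table above, all at 16 bits per parameter.
for size in (0.5, 1.5, 7, 9, 22, 72):
    print(f"{size}B at 16 bits: {load_memory_gb(size):g} GB")
```

At 16 bits this yields 1, 3, 14, 18, 44, and 144 GB for the six model sizes listed. Passing a different `bits_per_param` gives a rough estimate for other precisions under the same assumptions.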

Calculating the GPU memory needed to support different context lengths

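| Model parameters (B) | Load memory (GB) | Context length (tokens) | Context memory (GB) | Total memory (GB) |
|---|---|---|---|---|
| 1.5 | 3 | 4000 | 3.61 | 6.61 |
| 1.5 | 3 | 8000 | 7.21 | 10.21 |
| 1.5 | 3 | 16000 | 14.43 | 17.43 |
| 1.5 | 3 | 32000 | 28.86 | 31.86 |
| 1.5 | 3 | 64000 | 57.71 | 60.71 |
| 1.5 | 3 | 128000 | 115.42 | 118.42 |
| 7 | 14 | 4000 | 3.61 | 17.61 |
| 7 | 14 | 8000 | 7.21 | 21.21 |
| 7 | 14 | 16000 | 14.43 | 28.43 |
| 7 | 14 | 32000 | 28.86 | 42.86 |
| 7 | 14 | 64000 | 57.71 | 71.71 |
| 7 | 14 | 128000 | 115.42 | 129.42 |
| 9 | 18 | 4000 | 3.61 | 21.61 |
| 9 | 18 | 8000 | 7.21 | 25.21 |
| 9 | 18 | 16000 | 14.43 | 32.43 |
| 9 | 18 | 32000 | 28.86 | 46.86 |
| 9 | 18 | 64000 | 57.71 | 75.71 |
| 9 | 18 | 128000 | 115.42 | 133.42 |
| 22 | 44 | 4000 | 3.61 | 47.61 |
| 22 | 44 | 8000 | 7.21 | 51.21 |
| 22 | 44 | 16000 | 14.43 | 58.43 |
| 22 | 44 | 32000 | 28.86 | 72.86 |
| 22 | 44 | 64000 | 57.71 | 101.71 |
| 22 | 44 | 128000 | 115.42 | 159.42 |
| 72 | 134.74 | 4000 | 9.82 | 144.56 |
| 72 | 134.74 | 8000 | 19.64 | 154.38 |
| 72 | 134.74 | 16000 | 39.28 | 174.02 |
| 72 | 134.74 | 32000 | 78.55 | 213.29 |
| 72 | 134.74 | 64000 | 157.11 | 291.85 |
| 72 | 134.74 | 128000 | 314.22 | 448.96 |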
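
In the table above, the total is the load memory plus a context cost that grows in proportion to the token count. The following is a minimal sketch of that relationship, not the method used to produce the table. It assumes linear scaling and back-derives the per-token rates from the 128,000-token rows; the constant and function names are hypothetical.

```python
# Per-token context cost in GB, back-derived from the 128,000-token rows.
# Assumption: context memory grows linearly with the number of tokens.
GB_PER_TOKEN_SMALL = 115.42 / 128_000  # rate shared by the 1.5B to 22B rows
GB_PER_TOKEN_72B = 314.22 / 128_000    # rate for the 72B rows


def total_memory_gb(load_gb: float, context_tokens: int, gb_per_token: float) -> float:
    """Load memory plus the estimated memory for the requested context."""
    context_gb = context_tokens * gb_per_token
    return load_gb + context_gb


# 7B model (14 GB to load) with a 32,000-token context.
print(round(total_memory_gb(14, 32_000, GB_PER_TOKEN_SMALL), 2))

# 72B model (134.74 GB to load) with a 4,000-token context.
print(round(total_memory_gb(134.74, 4_000, GB_PER_TOKEN_72B), 2))
```

Both results should agree with the corresponding table rows (42.86 and 144.56) to within rounding of about 0.01 GB.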

Qwen2 efficiency evaluation data

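| Model parameters (B) | GPUs | Context length (tokens) | Memory used (GB) | Context length delta (tokens) | Memory delta (GB) | Memory per token (MB) |
|---|---|---|---|---|---|---|
| 0.5 | 1 | 1 | 1.17 | | | |
| 0.5 | 1 | 6144 | 6.42 | 6143 | 5.25 | 0.88 |
| 0.5 | 1 | 14336 | 13.48 | 14335 | 12.31 | 0.88 |
| 0.5 | 1 | 30720 | 27.61 | 30719 | 26.44 | 0.88 |
| 1.5 | 1 | 1 | 3.44 | | | |
| 1.5 | 1 | 6144 | 8.74 | 6143 | 5.3 | 0.88 |
| 1.5 | 1 | 14336 | 15.92 | 14335 | 12.48 | 0.89 |
| 1.5 | 1 | 30720 | 30.31 | 30719 | 26.87 | 0.90 |
| 7 | 1 | 1 | 14.92 | | | |
| 7 | 1 | 6144 | 20.26 | 6143 | 5.34 | 0.89 |
| 7 | 1 | 14336 | 27.71 | 14335 | 12.79 | 0.91 |
| 7 | 1 | 30720 | 42.62 | 30719 | 27.7 | 0.92 |
| 72 | 2 | 1 | 134.74 | | | |
| 72 | 2 | 6144 | 144.38 | 6143 | 9.64 | 1.61 |
| 72 | 3 | 14336 | 169.93 | 14335 | 35.19 | 2.51 |
| 72 | 3 | 30720 | 209.03 | 30719 | 74.29 | 2.48 |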
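
The per-token column can be recomputed from the other columns: subtract the 1-token baseline from each measurement, then divide the memory difference by the token difference. The sketch below does this under the assumption that 1 GB = 1024 MB, which is the conversion that matches the tabulated values. The function name `mb_per_token` is hypothetical.

```python
def mb_per_token(baseline_gb: float, measured_gb: float,
                 context_tokens: int, baseline_tokens: int = 1) -> float:
    """Memory cost per additional context token, in MB.

    Assumption: 1 GB = 1024 MB, which matches the figures in the table.
    """
    delta_gb = measured_gb - baseline_gb
    delta_tokens = context_tokens - baseline_tokens
    return delta_gb * 1024 / delta_tokens


# 7B model: 14.92 GB at 1 token, 42.62 GB at 30,720 tokens.
print(round(mb_per_token(14.92, 42.62, 30_720), 2))   # 0.92

# 72B model: 134.74 GB at 1 token, 209.03 GB at 30,720 tokens.
print(round(mb_per_token(134.74, 209.03, 30_720), 2))  # 2.48
```

Using the 1-token run as the baseline separates the roughly fixed cost of the weights from the cost that scales with context length.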
