Benchmarking LLMs
Installing FastChat & vLLM
Installing FastChat
FlashAttention-2 currently supports:
- Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing GPUs for now.
- Datatype fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs).
- All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800.
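The GPU requirement above can be checked before installing anything. The following is a minimal sketch, not part of FastChat or FlashAttention: the helper names `flash_attention_major_version` and `detect_capability` are hypothetical, and the mapping from architecture names to CUDA compute capabilities (Turing = 7.5, Ampere/Ada = 8.x, Hopper = 9.0) is an assumption taken from NVIDIA's published tables rather than from the list above. Architectures not named in the list are treated as unknown.

```python
def flash_attention_major_version(capability):
    """Map a CUDA compute capability to the FlashAttention major version to use.

    Returns 2 for Ampere/Ada/Hopper, 1 for Turing, and None for anything
    the support list above does not mention.
    """
    major, minor = capability
    # Assumption: Ampere and Ada report 8.x, Hopper reports 9.0.
    if major == 8 or (major, minor) == (9, 0):
        return 2
    # Assumption: Turing (T4, RTX 2080) reports 7.5.
    if (major, minor) == (7, 5):
        return 1
    return None


def detect_capability(device_index=0):
    """Return (major, minor) for a local GPU, or None if it cannot be queried."""
    try:
        import torch  # optional dependency; only needed for detection
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)
```

For example, `flash_attention_major_version((7, 5))` returns `1`, which corresponds to the T4 case: FlashAttention 1.x is the version to install. On a machine with PyTorch and a CUDA device, `detect_capability()` supplies the tuple to pass in.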
Turing GPUs such as the T4 are not supported by FlashAttention-2; use FlashAttention 1.x on them, otherwise an error is raised ❌:


