---
layout: single
title:  "whisper.cpp Hands-On Guide (Jetson Thor)"
date:   2025-10-19 10:00:00 +0800
categories: [Hardware Acceleration, AI and Large Models]
tags: [JetsonThor, Jetson, Thor, whisper, whisper.cpp, TTS, NVIDIA]
---

<!-- more -->

## Building [whisper.cpp](https://github.com/ggml-org/whisper.cpp)

### Clone the Repository

```bash
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
```

### Build whisper.cpp

```bash
cmake -B build -DGGML_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES="110"
cmake --build build -j --config Release
```

### Download Models

```bash
sh ./models/download-ggml-model.sh small
sh ./models/download-ggml-model.sh large-v3-turbo
```

The download script accepts the following model names:

- tiny.en
- tiny
- base.en
- base
- small.en
- small
- medium.en
- medium
- large-v1
- large-v2
- large-v3
- large-v3-turbo

### Run whisper-cli

```bash
./build/bin/whisper-cli -f samples/jfk.wav
./build/bin/whisper-cli -m /models/whisper.cpp/models/ggml-large-v3-turbo.bin -f samples/jfk.wav
```
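
For transcribing many files in a batch, the CLI can be driven from a script. Below is a minimal `subprocess` wrapper sketch; the default binary path matches the build above, and the model path is whichever `ggml-*.bin` you downloaded (adjust both to your layout):

```python
import subprocess

def run_whisper_cli(audio_path, model=None,
                    binary="./build/bin/whisper-cli", extra_args=()):
    """Run whisper-cli on a single audio file and return its stdout.

    model:      optional path to a ggml model file (passed via -m)
    extra_args: any additional whisper-cli flags, e.g. ("-l", "zh")
    """
    cmd = [binary]
    if model:
        cmd += ["-m", model]
    cmd += ["-f", audio_path, *extra_args]
    # check=True raises CalledProcessError on a non-zero exit code
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

A batch run is then just a loop over `run_whisper_cli(path)` for each file.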


## whisper-server
- [whisper.cpp/examples/server](https://github.com/ggml-org/whisper.cpp/tree/master/examples/server)

### Run whisper-server

```bash
./build/bin/whisper-server \
    --model /models/whisper.cpp/models/ggml-large-v3-turbo.bin \
    --host 0.0.0.0 \
    --port 8080 \
    --flash-attn \
    --language auto
```
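
Loading a large model can take a while before the server begins accepting connections. A small generic helper (not part of whisper.cpp) that polls the TCP port until it opens can make scripted startup more robust:

```python
import socket
import time

def wait_for_server(host="127.0.0.1", port=8080, timeout=60.0):
    """Poll until the given TCP port accepts connections, or give up.

    Returns True once a connection succeeds, False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # server not up yet; retry shortly
    return False
```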

### Test the Server

```bash
curl 127.0.0.1:8080/inference \
    -H "Content-Type: multipart/form-data" \
    -F file="@samples/jfk.wav" \
    -F temperature="0.0" \
    -F temperature_inc="0.2" \
    -F response_format="json"
```

```json
{
  "text": " And so, my fellow Americans, ask not what your country can do for you, ask what you can do for your country.\n"
}
```
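
The same request can be issued from Python without extra dependencies. Below is a stdlib-only sketch that hand-builds the multipart body; the endpoint and form-field names follow the curl example above, and error handling is omitted:

```python
import json
import urllib.request
import uuid

def transcribe(path, host="127.0.0.1", port=8080, language=None):
    """POST an audio file to whisper-server's /inference endpoint and
    return the transcript text from the JSON response."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as f:
        audio = f.read()

    parts = []

    def field(name, value):
        # One ordinary (non-file) multipart form field
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )

    field("response_format", "json")
    if language:
        field("language", language)
    # The file field carries the raw audio bytes
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="file"; filename="{path}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    body = b"".join(parts)

    req = urllib.request.Request(
        f"http://{host}:{port}/inference",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

For example, `transcribe("samples/jfk.wav")` returns the transcript string shown in the JSON above.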


```bash
curl 127.0.0.1:8080/inference \
    -H "Content-Type: multipart/form-data" \
    -F file="@/models/iic/SenseVoiceSmall/example/zh.mp3" \
    -F response_format="json"
```

```json
{"text":"開放時間早上9點至下午5點。\n"}
```

> ⚠️ With automatic language detection, the audio is transcribed as Traditional Chinese; Simplified Chinese has to be specified manually.

### Transcribing Simplified Chinese

When running whisper-server, use the `--language zh` option to specify Simplified Chinese.

```bash
./build/bin/whisper-server \
    --model /models/whisper.cpp/models/ggml-large-v3-turbo.bin \
    --host 0.0.0.0 \
    --port 8080 \
    --flash-attn \
    --language zh
```

```bash
curl 127.0.0.1:8080/inference \
    -H "Content-Type: multipart/form-data" \
    -F file="@/models/iic/SenseVoiceSmall/example/zh.mp3" \
    -F response_format="json"
```

```json
{"text":"开放时间早上九点至下午五点\n"}
```


## References
- [whisper.cpp/examples/bench](https://github.com/ggml-org/whisper.cpp/tree/master/examples/bench)
- [iic/Whisper-large-v3-turbo](https://www.modelscope.cn/models/iic/Whisper-large-v3-turbo)
- [openai-mirror/whisper-small](https://www.modelscope.cn/models/openai-mirror/whisper-small/summary)
