SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Category: SeamlessM4T
Tags: SeamlessM4T, ASR, TTS, conda, MacBookProM2Max
Contents
- SeamlessM4T Overview
- Installing Seamless Communication
- Preparing the Data
- Languages List
- SeamlessM4T Command Line (m4t_predict)
- SeamlessM4T Web UI
SeamlessM4T Overview
SeamlessM4T supports five tasks:
- ASR: Automatic speech recognition for 96 languages.
- S2ST: Speech-to-Speech translation from 100 source speech languages into 35 target speech languages.
- S2TT: Speech-to-text translation from 100 source speech languages into 95 target text languages.
- T2ST: Text-to-Speech translation from 95 source text languages into 35 target speech languages.
- T2TT: Text-to-text translation (MT) from 95 source text languages into 95 target text languages.
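The five tasks differ only in the modality of their input and output. A minimal sketch of that mapping, assuming the lowercase task names passed to `--task` in the commands later in this post. `TASK_MODALITIES`, `tasks_with_input` and `tasks_with_output` are hypothetical names used for illustration, not part of seamless_communication.

```python
# Input/output modality for each SeamlessM4T task, taken from the task list above.
# All names here are illustrative; none of them come from the library.
TASK_MODALITIES = {
    "asr": ("speech", "text"),
    "s2st": ("speech", "speech"),
    "s2tt": ("speech", "text"),
    "t2st": ("text", "speech"),
    "t2tt": ("text", "text"),
}


def tasks_with_input(modality):
    """Return the tasks whose input has the given modality, sorted by name."""
    return sorted(t for t, (src, _) in TASK_MODALITIES.items() if src == modality)


def tasks_with_output(modality):
    """Return the tasks whose output has the given modality, sorted by name."""
    return sorted(t for t, (_, tgt) in TASK_MODALITIES.items() if tgt == modality)
```

For example, `tasks_with_input("speech")` lists the three tasks that take a WAV file, and `tasks_with_output("speech")` lists the two that generate audio.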
Installing Seamless Communication
Clone the repository
git clone https://github.com/facebookresearch/seamless_communication
cd seamless_communication
Create a virtual environment
conda create -n seamless-m4t python==3.10.9 -y
conda activate seamless-m4t
Adding MPS support
In testing, MPS ran into problems with three tasks: S2ST, S2TT, and ASR. In other words, the problems appear whenever the input is speech.
cli/m4t/predict/predict.py
import torch

# Prefer CUDA, then Apple's MPS backend, and fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    dtype = torch.float16
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    dtype = torch.float32
else:
    device = torch.device("cpu")
    dtype = torch.float32
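Since the speech-input tasks misbehaved on MPS, one option is to make the device choice depend on the task. The following is only a sketch under that assumption: `pick_device` and `SPEECH_INPUT_TASKS` are hypothetical names, not code from predict.py, and the function works on plain booleans and strings so that it can be tried without PyTorch installed.

```python
# Tasks whose input is speech; these showed problems on MPS in the tests above.
SPEECH_INPUT_TASKS = {"s2st", "s2tt", "asr"}


def pick_device(task, cuda_available, mps_available):
    """Return (device_name, dtype_name) for a task.

    Same CUDA > MPS > CPU order as the snippet above, except that MPS is
    skipped for speech-input tasks (an assumption, not upstream behaviour).
    """
    if cuda_available:
        return ("cuda:0", "float16")
    if mps_available and task.lower() not in SPEECH_INPUT_TASKS:
        return ("mps", "float32")
    return ("cpu", "float32")
```

In a real script the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the returned names would be turned into objects with `torch.device(name)` and `getattr(torch, dtype_name)`.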
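Set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1
This works around operators that are not implemented for MPS. Without it, inference fails with the following error (❌):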
NotImplementedError: The operator 'aten::_weight_norm_interface' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Set the environment variable (reactivate the environment afterwards for it to take effect)
conda env config vars set PYTORCH_ENABLE_MPS_FALLBACK=1
List the environment variables
conda env config vars list
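A script can also guard against the variable being absent, for instance when it is run outside the conda environment. A minimal sketch, assuming the flag needs to be in place before PyTorch is imported (commonly recommended, but treat it as an assumption here). `ensure_mps_fallback` is a hypothetical helper, not part of PyTorch or seamless_communication.

```python
import os

FALLBACK_VAR = "PYTORCH_ENABLE_MPS_FALLBACK"


def ensure_mps_fallback(env=None):
    """Set the fallback flag if it is unset; return True if it ends up as "1".

    `env` defaults to os.environ; any dict-like mapping can be passed instead.
    An existing value is left untouched, so an explicit "0" is respected.
    """
    env = os.environ if env is None else env
    env.setdefault(FALLBACK_VAR, "1")
    return env[FALLBACK_VAR] == "1"
```

Calling `ensure_mps_fallback()` at the very top of the inference script, before `import torch`, would have the same effect as the conda setting for that process only.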
Install Seamless Communication
pip install .
conda install -c conda-forge libsndfile==1.0.31 -y
mkdir -p /opt/homebrew/opt/libsndfile/lib/
ln -s /opt/miniconda/envs/seamless-m4t/lib/libsndfile.1.0.31.dylib /opt/homebrew/opt/libsndfile/lib/libsndfile.1.dylib
Preparing the Data
CHINESE_TEXT="荷兰发布了一份主题为“宣布即将对先进半导体制造设备采取的出口管制措施”的公告表示,鉴于技术的发展和地缘政治的背景,政府已经得出结论,有必要扩大现有的特定半导体制造设备的出口管制。"
ENGLISH_TEXT="The Netherlands issued an announcement titled \"Announcement of Upcoming Export Control Measures on Advanced Semiconductor Manufacturing Equipment\" stating that given the development of technology and the geopolitical context, the government has concluded that it is necessary to expand existing specific semiconductor manufacturing Export controls on equipment."
- Chinese audio file: chinese.wav
- English audio file: english.wav
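Before running inference it can help to confirm that the two recordings are what you expect. The sketch below reads the channel count, sample rate and duration with Python's standard `wave` module. `describe_wav` is a hypothetical helper, and it assumes uncompressed PCM WAV files, which is all the `wave` module can read.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate (Hz) and duration (s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

For the files above this would be called as `describe_wav("chinese.wav")` and `describe_wav("english.wav")`.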
Languages List
Source Languages (S2ST / S2TT)
Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name |
---|---|---|---|---|---|---|---|---|---|---|---|
afr | Afrikaans | cym | Welsh | hye | Armenian | lit | Lithuanian | oci | Occitan | swh | Swahili |
amh | Amharic | dan | Danish | ibo | Igbo | ltz | Luxembourgish | ory | Odia | tam | Tamil |
arb | Modern Standard Arabic | deu | German | ind | Indonesian | lug | Ganda | pan | Punjabi | tel | Telugu |
ary | Moroccan Arabic | ell | Greek | isl | Icelandic | luo | Luo | pbt | Southern Pashto | tgk | Tajik |
arz | Egyptian Arabic | eng | English | ita | Italian | lvs | Standard Latvian | pes | Western Persian | tgl | Tagalog |
asm | Assamese | est | Estonian | jav | Javanese | mai | Maithili | pol | Polish | tha | Thai |
ast | Asturian | eus | Basque | jpn | Japanese | mal | Malayalam | por | Portuguese | tur | Turkish |
azj | North Azerbaijani | fin | Finnish | kam | Kamba | mar | Marathi | ron | Romanian | ukr | Ukrainian |
bel | Belarusian | fra | French | kan | Kannada | mkd | Macedonian | rus | Russian | urd | Urdu |
ben | Bengali | gaz | West Central Oromo | kat | Georgian | mlt | Maltese | slk | Slovak | uzn | Northern Uzbek |
bos | Bosnian | gle | Irish | kaz | Kazakh | mni | Meitei | slv | Slovenian | vie | Vietnamese |
bul | Bulgarian | glg | Galician | kea | Kabuverdianu | mya | Burmese | sna | Shona | xho | Xhosa |
cat | Catalan | guj | Gujarati | khk | Halh Mongolian | nld | Dutch | snd | Sindhi | yor | Yoruba |
ceb | Cebuano | heb | Hebrew | khm | Khmer | nno | Norwegian Nynorsk | som | Somali | yue | Cantonese |
ces | Czech | hin | Hindi | kir | Kyrgyz | nob | Norwegian Bokmål | spa | Spanish | zlm | Colloquial Malay |
ckb | Central Kurdish | hrv | Croatian | kor | Korean | npi | Nepali | srp | Serbian | zsm | Standard Malay |
cmn | Mandarin Chinese | hun | Hungarian | lao | Lao | nya | Nyanja | swe | Swedish | zul | Zulu |
Source Languages (T2TT / T2ST)
Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name |
---|---|---|---|---|---|---|---|---|---|---|---|
afr | Afrikaans | cym | Welsh | hye | Armenian | lit | Lithuanian | | | swh | Swahili |
amh | Amharic | dan | Danish | ibo | Igbo | | | ory | Odia | tam | Tamil |
arb | Modern Standard Arabic | deu | German | ind | Indonesian | lug | Ganda | pan | Punjabi | tel | Telugu |
ary | Moroccan Arabic | ell | Greek | isl | Icelandic | luo | Luo | pbt | Southern Pashto | tgk | Tajik |
arz | Egyptian Arabic | eng | English | ita | Italian | lvs | Standard Latvian | pes | Western Persian | tgl | Tagalog |
asm | Assamese | est | Estonian | jav | Javanese | mai | Maithili | pol | Polish | tha | Thai |
| | | eus | Basque | jpn | Japanese | mal | Malayalam | por | Portuguese | tur | Turkish |
azj | North Azerbaijani | fin | Finnish | | | mar | Marathi | ron | Romanian | ukr | Ukrainian |
bel | Belarusian | fra | French | kan | Kannada | mkd | Macedonian | rus | Russian | urd | Urdu |
ben | Bengali | gaz | West Central Oromo | kat | Georgian | mlt | Maltese | slk | Slovak | uzn | Northern Uzbek |
bos | Bosnian | gle | Irish | kaz | Kazakh | mni | Meitei | slv | Slovenian | vie | Vietnamese |
bul | Bulgarian | glg | Galician | | | mya | Burmese | sna | Shona | | |
cat | Catalan | guj | Gujarati | khk | Halh Mongolian | nld | Dutch | snd | Sindhi | yor | Yoruba |
ceb | Cebuano | heb | Hebrew | khm | Khmer | nno | Norwegian Nynorsk | som | Somali | yue | Cantonese |
ces | Czech | hin | Hindi | kir | Kyrgyz | nob | Norwegian Bokmål | spa | Spanish | | |
ckb | Central Kurdish | hrv | Croatian | kor | Korean | npi | Nepali | srp | Serbian | zsm | Standard Malay |
cmn | Mandarin Chinese | hun | Hungarian | lao | Lao | nya | Nyanja | swe | Swedish | zul | Zulu |
Target Languages (S2ST / T2ST)
Code | Language Name | Chinese Name | Code | Language Name | Chinese Name | Code | Language Name | Chinese Name |
---|---|---|---|---|---|---|---|---|
eng | English | 英语 | hin | Hindi | 印地语 | slk | Slovak | 斯洛伐克语 |
arb | Modern Standard Arabic | 现代标准阿拉伯语 | ind | Indonesian | 印度尼西亚语 | spa | Spanish | 西班牙语 |
ben | Bengali | 孟加拉语 | ita | Italian | 意大利语 | swe | Swedish | 瑞典语 |
cat | Catalan | 加泰罗尼亚语 | jpn | Japanese | 日语 | swh | Swahili | 斯瓦希里语 |
ces | Czech | 捷克语 | kor | Korean | 韩语 | tel | Telugu | 泰卢固语 |
cmn | Mandarin Chinese | 普通话 | mlt | Maltese | | tgl | Tagalog | 他加禄语 |
cym | Welsh | 威尔士语 | nld | Dutch | 荷兰语 | tha | Thai | 泰语 |
dan | Danish | 丹麦语 | pes | Western Persian | 波斯语 | tur | Turkish | 土耳其语 |
deu | German | 德语 | pol | Polish | 波兰语 | ukr | Ukrainian | 乌克兰语 |
est | Estonian | 爱沙尼亚语 | por | Portuguese | 葡萄牙语 | urd | Urdu | 乌尔都语 |
fin | Finnish | 芬兰语 | ron | Romanian | 罗马尼亚语 | uzn | Northern Uzbek | 北乌兹别克语 |
fra | French | 法语 | rus | Russian | 俄语 | vie | Vietnamese | 越南语 |
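The target-speech table is much shorter than the source tables, so it is easy to request audio in a language that is only supported as text. A small sketch of a lookup built from the codes in the table above. `SPEECH_TARGET_LANGS` and `supports_speech_output` are hypothetical names, and the set simply copies the table rather than querying the model.

```python
# Codes copied from the "Target Languages (S2ST / T2ST)" table above.
SPEECH_TARGET_LANGS = frozenset("""
eng arb ben cat ces cmn cym dan deu est fin fra
hin ind ita jpn kor mlt nld pes pol por ron rus
slk spa swe swh tel tgl tha tur ukr urd uzn vie
""".split())


def supports_speech_output(code):
    """Return True if the language code appears in the target-speech table."""
    return code.strip().lower() in SPEECH_TARGET_LANGS
```

For example, `supports_speech_output("cmn")` is true, while `supports_speech_output("yue")` is false because Cantonese appears only in the source tables.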
SeamlessM4T Command Line (m4t_predict)
m4t_predict -h
usage: m4t_predict [-h] [--task TASK] [--tgt_lang TGT_LANG] [--src_lang SRC_LANG] [--output_path OUTPUT_PATH] [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME] [--text_generation_beam_size TEXT_GENERATION_BEAM_SIZE]
                   [--text_generation_max_len_a TEXT_GENERATION_MAX_LEN_A] [--text_generation_max_len_b TEXT_GENERATION_MAX_LEN_B] [--text_generation_ngram_blocking TEXT_GENERATION_NGRAM_BLOCKING]
                   [--no_repeat_ngram_size NO_REPEAT_NGRAM_SIZE] [--unit_generation_beam_size UNIT_GENERATION_BEAM_SIZE] [--unit_generation_max_len_a UNIT_GENERATION_MAX_LEN_A]
                   [--unit_generation_max_len_b UNIT_GENERATION_MAX_LEN_B] [--unit_generation_ngram_blocking UNIT_GENERATION_NGRAM_BLOCKING] [--unit_generation_ngram_filtering UNIT_GENERATION_NGRAM_FILTERING]
                   [--text_unk_blocking TEXT_UNK_BLOCKING]
                   input
M4T inference on supported tasks using Translator.
positional arguments:
  input                 Audio WAV file path or text input.
options:
  -h, --help            show this help message and exit
  --task TASK           Task type
  --tgt_lang TGT_LANG   Target language to translate/transcribe into.
  --src_lang SRC_LANG   Source language, only required if input is text.
  --output_path OUTPUT_PATH
                        Path to save the generated audio.
  --model_name MODEL_NAME
                        Base model name (`seamlessM4T_medium`, `seamlessM4T_large`, `seamlessM4T_v2_large`)
- --src_lang: not needed for S2ST / S2TT / ASR; required for T2ST / T2TT
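The rule above can be checked before the command is launched. The sketch below builds an argument list using only flags that appear in the help output. `build_m4t_predict_args` is a hypothetical helper, and requiring `--output_path` for S2ST and T2ST is an assumption based on the examples that follow, which always pass it for those tasks.

```python
TEXT_INPUT_TASKS = {"t2st", "t2tt"}      # --src_lang is required for these
SPEECH_OUTPUT_TASKS = {"s2st", "t2st"}   # assumed to need --output_path


def build_m4t_predict_args(input_value, task, tgt_lang, src_lang=None, output_path=None):
    """Return an argv list for m4t_predict, validating the task-specific flags."""
    task = task.lower()
    if task in TEXT_INPUT_TASKS and not src_lang:
        raise ValueError(f"--src_lang is required for task {task}")
    if task in SPEECH_OUTPUT_TASKS and not output_path:
        raise ValueError(f"--output_path is required for task {task}")
    args = ["m4t_predict", input_value, "--task", task, "--tgt_lang", tgt_lang]
    if src_lang:
        args += ["--src_lang", src_lang]
    if output_path:
        args += ["--output_path", output_path]
    return args
```

Passing the resulting list to `subprocess.run(args, check=True)` keeps the input text as a single argument even when it contains spaces, so no shell quoting is involved.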
S2ST: Speech-to-Speech translation
Chinese → English
m4t_predict chinese.wav --task s2st --tgt_lang eng --output_path eng.wav
English → Chinese
m4t_predict english.wav --task s2st --tgt_lang cmn --output_path cmn.wav
S2TT: Speech-to-text translation
Chinese → Chinese
m4t_predict chinese.wav --task s2tt --tgt_lang cmn
荷兰发布了一份主题为 宣布即将对先进半导体制造设备采取的出口管制措施 的公告 表示 鉴于技术的发展和地缘政治的背景 政府已经得出结论 有必要扩大现有的特定半导体制造设备的出口管制
Chinese → English
m4t_predict chinese.wav --task s2tt --tgt_lang eng
The announcement, titled "Announcing Imminent Export Control Measures for Advanced Semiconductor Manufacturing Equipment", said that given the development of technology and geopolitics, the government has concluded that it is necessary to expand export controls on existing specific semiconductor manufacturing equipment.
English → Chinese (❌ the output is still English)
m4t_predict english.wav --task s2tt --tgt_lang cmn
The announcement titled announcing export control measures for advanced semiconductor manufacturing equipment that given the development of technology and geopolitics, the government has concluded that it is necessary to expand export controls on specific semiconductor manufacturing equipment.
T2ST: Text-to-Speech translation
Chinese → Chinese
m4t_predict "$CHINESE_TEXT" --task t2st --src_lang cmn --tgt_lang cmn --output_path cmn.wav
荷兰发布了一份主题为 ⁇ 宣布即将对先进半导体制造设备采取的出口管制措施 ⁇ 的公告表示,鉴于技术的发展和地缘政治的背景,政府已经得出结论,有必要扩大现有的特定半导体制造设备的出口管制 ⁇
Chinese → English
m4t_predict "$CHINESE_TEXT" --task t2st --src_lang cmn --tgt_lang eng --output_path eng.wav
The Netherlands issued an announcement titled "Announcing Imminent Export Control Measures for Advanced Semiconductor Manufacturing Equipment" stating that, given the technological development and geopolitical context, the government has concluded that it is necessary to expand existing export controls for certain semiconductor manufacturing equipment.
English → Chinese
m4t_predict "$ENGLISH_TEXT" --task t2st --src_lang eng --tgt_lang cmn --output_path cmn.wav
荷兰发布了题为 ⁇ 关于先进半导体制造设备即将实施出口管制措施的公告 ⁇ ,该公告指出,鉴于技术发展和地缘政治背景,政府得出结论,有必要扩大现有的特定半导体制造设备出口管制 ⁇
T2TT: Text-to-text translation
Chinese → English
m4t_predict "$CHINESE_TEXT" --task t2tt --src_lang cmn --tgt_lang eng
The Netherlands issued an announcement titled "Announcing Imminent Export Control Measures for Advanced Semiconductor Manufacturing Equipment" stating that, given the technological development and geopolitical context, the government has concluded that it is necessary to expand existing export controls for certain semiconductor manufacturing equipment.
Chinese → French
m4t_predict "$CHINESE_TEXT" --task t2tt --src_lang cmn --tgt_lang fra
Les Pays-Bas ont publié un thème pour ⁇ annoncer les prochaines mesures de contrôle des exportations prises sur des équipements de fabrication de semi-conducteurs ⁇ l'annonce indique que, compte tenu du développement technologique et du contexte géopolitique, le gouvernement a conclu qu'il est nécessaire d'étendre les exportations existantes de certains équipements de fabrication de semi-conducteurs ⁇
English → Chinese
m4t_predict "$ENGLISH_TEXT" --task t2tt --src_lang eng --tgt_lang cmn
荷兰发布了题为 ⁇ 关于先进半导体制造设备即将实施出口管制措施的公告 ⁇ ,该公告指出,鉴于技术发展和地缘政治背景,政府得出结论,有必要扩大现有的特定半导体制造设备出口管制 ⁇
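Several of the text outputs above contain ` ⁇ ` where the input had “, ” or 。. A plausible explanation, and only an assumption here, is that those characters are not in the model's text vocabulary and come back as unknown-token markers. The sketch below just counts the markers so that affected outputs can be flagged. `count_unknown_markers` is a hypothetical helper.

```python
UNKNOWN_MARKER = "\u2047"  # the double question mark character seen in the outputs above


def count_unknown_markers(text):
    """Return how many unknown-token markers appear in a model output string."""
    return text.count(UNKNOWN_MARKER)


def has_unknown_markers(text):
    """Return True if the output contains at least one unknown-token marker."""
    return UNKNOWN_MARKER in text
```

Applied to the English → Chinese result above, `count_unknown_markers` returns 3, one for each quotation mark and one for the final full stop.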
ASR: Automatic speech recognition
Chinese → Chinese
m4t_predict chinese.wav --task asr --tgt_lang cmn
荷兰发布了一份主题为 宣布即将对先进半导体制造设备采取的出口管制措施 的公告 表示 鉴于技术的发展和地缘政治的背景 政府已经得出结论 有必要扩大现有的特定半导体制造设备的出口管制
Chinese → English
m4t_predict chinese.wav --task asr --tgt_lang eng
The announcement, titled "Announcing Imminent Export Control Measures for Advanced Semiconductor Manufacturing Equipment", said that given the development of technology and geopolitics, the government has concluded that it is necessary to expand export controls on existing specific semiconductor manufacturing equipment.
English → English
m4t_predict english.wav --task asr --tgt_lang eng
The announcement, titled "Announcing Important Export Control Measures for Advanced Semiconductor Manufacturing Equipment", said that given the development of technology and geopolitics, the government has decided that it is necessary to expand export controls on specific semiconductor manufacturing equipment.
SeamlessM4T Web UI
Install the gradio dependency
pip install gradio
Set the environment variable
conda env config vars set CHECKPOINTS_PATH=/Users/junjian/GitHub/facebookresearch/seamless_communication/seamless-m4t-v2-large
Run the app
python demo/m4tv2/app.py