SeamlessM4T

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Seamless Communication

  • ASR: Automatic speech recognition for 96 languages.
  • S2ST: Speech-to-Speech translation from 100 source speech languages into 35 target speech languages.
  • S2TT: Speech-to-text translation from 100 source speech languages into 95 target text languages.
  • T2ST: Text-to-Speech translation from 95 source text languages into 35 target speech languages.
  • T2TT: Text-to-text translation (MT) from 95 source text languages into 95 target text languages.

SeamlessM4T Overview

Installing Seamless Communication

Clone the repository

git clone https://github.com/facebookresearch/seamless_communication
cd seamless_communication

Create a virtual environment

conda create -n seamless-m4t python==3.10.9 -y
conda activate seamless-m4t

Adding MPS support

In testing, the S2ST, S2TT, and ASR tasks all had problems on MPS; that is, every task that takes speech as input is affected.

cli/m4t/predict/predict.py

    if torch.cuda.is_available():
        device = torch.device("cuda:0")
        dtype = torch.float16
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
        dtype = torch.float32
    else:
        device = torch.device("cpu")
        dtype = torch.float32
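The branch order above matters: CUDA is preferred, MPS comes second, and the CPU is the last resort, with float16 used only on CUDA. A minimal sketch of the same decision as a pure function, so it can be checked without a GPU (the name `pick_device` is hypothetical and not part of seamless_communication):

```python
from typing import Tuple


def pick_device(cuda_ok: bool, mps_ok: bool) -> Tuple[str, str]:
    """Mirror the branch order of the snippet above.

    Returns (device string, dtype name). Only the CUDA branch uses
    float16; the MPS and CPU branches stay in float32.
    """
    if cuda_ok:
        return "cuda:0", "float16"
    if mps_ok:
        return "mps", "float32"
    return "cpu", "float32"


if __name__ == "__main__":
    # A Mac without CUDA but with MPS available.
    print(pick_device(cuda_ok=False, mps_ok=True))
```

With PyTorch installed, the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, as in the snippet above.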

Set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to work around operators that are not implemented on MPS ❌

NotImplementedError: The operator 'aten::_weight_norm_interface' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Set the environment variable (reactivate the conda environment afterwards so that it takes effect)

conda env config vars set PYTORCH_ENABLE_MPS_FALLBACK=1

List the environment variables

conda env config vars list
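A quick way to confirm that the variable is actually visible to the Python process that will run the model is to read it from the process environment. This is only a sketch; the helper name `mps_fallback_enabled` is made up for illustration:

```python
import os
from typing import Mapping, Optional


def mps_fallback_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when PYTORCH_ENABLE_MPS_FALLBACK is set to "1"."""
    env = os.environ if env is None else env
    return env.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1"


if __name__ == "__main__":
    if mps_fallback_enabled():
        print("MPS fallback is enabled for this process.")
    else:
        print("PYTORCH_ENABLE_MPS_FALLBACK is not set to 1 in this process.")
```

If the check prints the second message right after `conda env config vars set`, the environment most likely has not been reactivated yet.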

Install Seamless Communication

pip install .
conda install -c conda-forge libsndfile==1.0.31 -y

mkdir -p /opt/homebrew/opt/libsndfile/lib/
ln -s /opt/miniconda/envs/seamless-m4t/lib/libsndfile.1.0.31.dylib /opt/homebrew/opt/libsndfile/lib/libsndfile.1.dylib
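Both paths in the `ln -s` command are specific to the author's machine, so the link is easy to get wrong on another setup. The sketch below checks whether such a link exists and whether its target is still there; `describe_link` is a hypothetical helper, not something the project ships:

```python
import os
from pathlib import Path


def describe_link(link: Path) -> str:
    """Classify a path as ok, dangling, missing, or not a symlink."""
    if not link.is_symlink():
        return "not a symlink" if link.exists() else "missing"
    target = os.readlink(link)
    # Path.exists() follows symlinks, so False here means the target is gone.
    if not link.exists():
        return f"dangling -> {target}"
    return f"ok -> {target}"


if __name__ == "__main__":
    # Path copied from the ln -s command above.
    print(describe_link(Path("/opt/homebrew/opt/libsndfile/lib/libsndfile.1.dylib")))
```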

Prepare the data

CHINESE_TEXT="荷兰发布了一份主题为“宣布即将对先进半导体制造设备采取的出口管制措施”的公告表示,鉴于技术的发展和地缘政治的背景,政府已经得出结论,有必要扩大现有的特定半导体制造设备的出口管制。"
ENGLISH_TEXT="The Netherlands issued an announcement titled \"Announcement of Upcoming Export Control Measures on Advanced Semiconductor Manufacturing Equipment\" stating that given the development of technology and the geopolitical context, the government has concluded that it is necessary to expand existing specific semiconductor manufacturing Export controls on equipment."
  • Chinese speech file: chinese.wav
  • English speech file: english.wav

Languages List

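Source Languages (S2ST / S2TT)

| Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name |
|------|---------------|------|---------------|------|---------------|------|---------------|------|---------------|------|---------------|
| afr | Afrikaans | cym | Welsh | hye | Armenian | lit | Lithuanian | oci | Occitan | swh | Swahili |
| amh | Amharic | dan | Danish | ibo | Igbo | ltz | Luxembourgish | ory | Odia | tam | Tamil |
| arb | Modern Standard Arabic | deu | German | ind | Indonesian | lug | Ganda | pan | Punjabi | tel | Telugu |
| ary | Moroccan Arabic | ell | Greek | isl | Icelandic | luo | Luo | pbt | Southern Pashto | tgk | Tajik |
| arz | Egyptian Arabic | eng | English | ita | Italian | lvs | Standard Latvian | pes | Western Persian | tgl | Tagalog |
| asm | Assamese | est | Estonian | jav | Javanese | mai | Maithili | pol | Polish | tha | Thai |
| ast | Asturian | eus | Basque | jpn | Japanese | mal | Malayalam | por | Portuguese | tur | Turkish |
| azj | North Azerbaijani | fin | Finnish | kam | Kamba | mar | Marathi | ron | Romanian | ukr | Ukrainian |
| bel | Belarusian | fra | French | kan | Kannada | mkd | Macedonian | rus | Russian | urd | Urdu |
| ben | Bengali | gaz | West Central Oromo | kat | Georgian | mlt | Maltese | slk | Slovak | uzn | Northern Uzbek |
| bos | Bosnian | gle | Irish | kaz | Kazakh | mni | Meitei | slv | Slovenian | vie | Vietnamese |
| bul | Bulgarian | glg | Galician | kea | Kabuverdianu | mya | Burmese | sna | Shona | xho | Xhosa |
| cat | Catalan | guj | Gujarati | khk | Halh Mongolian | nld | Dutch | snd | Sindhi | yor | Yoruba |
| ceb | Cebuano | heb | Hebrew | khm | Khmer | nno | Norwegian Nynorsk | som | Somali | yue | Cantonese |
| ces | Czech | hin | Hindi | kir | Kyrgyz | nob | Norwegian Bokmål | spa | Spanish | zlm | Colloquial Malay |
| ckb | Central Kurdish | hrv | Croatian | kor | Korean | npi | Nepali | srp | Serbian | zsm | Standard Malay |
| cmn | Mandarin Chinese | hun | Hungarian | lao | Lao | nya | Nyanja | swe | Swedish | zul | Zulu |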

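Source Languages (T2TT / T2ST)

| Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name | Code | Language Name |
|------|---------------|------|---------------|------|---------------|------|---------------|------|---------------|------|---------------|
| afr | Afrikaans | cym | Welsh | hye | Armenian | lit | Lithuanian | | | swh | Swahili |
| amh | Amharic | dan | Danish | ibo | Igbo | | | ory | Odia | tam | Tamil |
| arb | Modern Standard Arabic | deu | German | ind | Indonesian | lug | Ganda | pan | Punjabi | tel | Telugu |
| ary | Moroccan Arabic | ell | Greek | isl | Icelandic | luo | Luo | pbt | Southern Pashto | tgk | Tajik |
| arz | Egyptian Arabic | eng | English | ita | Italian | lvs | Standard Latvian | pes | Western Persian | tgl | Tagalog |
| asm | Assamese | est | Estonian | jav | Javanese | mai | Maithili | pol | Polish | tha | Thai |
| | | eus | Basque | jpn | Japanese | mal | Malayalam | por | Portuguese | tur | Turkish |
| azj | North Azerbaijani | fin | Finnish | | | mar | Marathi | ron | Romanian | ukr | Ukrainian |
| bel | Belarusian | fra | French | kan | Kannada | mkd | Macedonian | rus | Russian | urd | Urdu |
| ben | Bengali | gaz | West Central Oromo | kat | Georgian | mlt | Maltese | slk | Slovak | uzn | Northern Uzbek |
| bos | Bosnian | gle | Irish | kaz | Kazakh | mni | Meitei | slv | Slovenian | vie | Vietnamese |
| bul | Bulgarian | glg | Galician | | | mya | Burmese | sna | Shona | | |
| cat | Catalan | guj | Gujarati | khk | Halh Mongolian | nld | Dutch | snd | Sindhi | yor | Yoruba |
| ceb | Cebuano | heb | Hebrew | khm | Khmer | nno | Norwegian Nynorsk | som | Somali | yue | Cantonese |
| ces | Czech | hin | Hindi | kir | Kyrgyz | nob | Norwegian Bokmål | spa | Spanish | | |
| ckb | Central Kurdish | hrv | Croatian | kor | Korean | npi | Nepali | srp | Serbian | zsm | Standard Malay |
| cmn | Mandarin Chinese | hun | Hungarian | lao | Lao | nya | Nyanja | swe | Swedish | zul | Zulu |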
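The two source-language lists are nearly identical, so the differences are easy to miss by eye. A short sketch that computes which codes are accepted as speech input but not as text input, using only the codes listed in the two tables (the variable names are illustrative):

```python
# Codes copied from the "Source Languages (S2ST / S2TT)" table.
SPEECH_SOURCE = set("""
afr amh arb ary arz asm ast azj bel ben bos bul cat ceb ces ckb cmn
cym dan deu ell eng est eus fin fra gaz gle glg guj heb hin hrv hun
hye ibo ind isl ita jav jpn kam kan kat kaz kea khk khm kir kor lao
lit ltz lug luo lvs mai mal mar mkd mlt mni mya nld nno nob npi nya
oci ory pan pbt pes pol por ron rus slk slv sna snd som spa srp swe
swh tam tel tgk tgl tha tur ukr urd uzn vie xho yor yue zlm zsm zul
""".split())

# Codes copied from the "Source Languages (T2TT / T2ST)" table.
TEXT_SOURCE = set("""
afr amh arb ary arz asm azj bel ben bos bul cat ceb ces ckb cmn
cym dan deu ell eng est eus fin fra gaz gle glg guj heb hin hrv hun
hye ibo ind isl ita jav jpn kan kat kaz khk khm kir kor lao
lit lug luo lvs mai mal mar mkd mlt mni mya nld nno nob npi nya
ory pan pbt pes pol por ron rus slk slv sna snd som spa srp swe
swh tam tel tgk tgl tha tur ukr urd uzn vie yor yue zsm zul
""".split())

# Languages that appear only in the speech-input table.
speech_only = sorted(SPEECH_SOURCE - TEXT_SOURCE)

if __name__ == "__main__":
    print(len(SPEECH_SOURCE), len(TEXT_SOURCE))
    print(speech_only)
```

According to these tables, every text source language is also a speech source language, and seven codes are speech-only.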

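Target Languages (S2ST / T2ST)

| Code | Language Name | Chinese Name | Code | Language Name | Chinese Name | Code | Language Name | Chinese Name |
|------|---------------|--------------|------|---------------|--------------|------|---------------|--------------|
| eng | English | 英语 | hin | Hindi | 印地语 | slk | Slovak | 斯洛伐克语 |
| arb | Modern Standard Arabic | 现代标准阿拉伯语 | ind | Indonesian | 印度尼西亚语 | spa | Spanish | 西班牙语 |
| ben | Bengali | 孟加拉语 | ita | Italian | 意大利语 | swe | Swedish | 瑞典语 |
| cat | Catalan | 加泰罗尼亚语 | jpn | Japanese | 日语 | swh | Swahili | 斯瓦希里语 |
| ces | Czech | 捷克语 | kor | Korean | 韩语 | tel | Telugu | 泰卢固语 |
| cmn | Mandarin Chinese | 普通话 | mlt | Maltese | | tgl | Tagalog | 他加禄语 |
| cym | Welsh | 威尔士语 | nld | Dutch | 荷兰语 | tha | Thai | 泰语 |
| dan | Danish | 丹麦语 | pes | Western Persian | 波斯语 | tur | Turkish | 土耳其语 |
| deu | German | 德语 | pol | Polish | 波兰语 | ukr | Ukrainian | 乌克兰语 |
| est | Estonian | 爱沙尼亚语 | por | Portuguese | 葡萄牙语 | urd | Urdu | 乌尔都语 |
| fin | Finnish | 芬兰语 | ron | Romanian | 罗马尼亚语 | uzn | Northern Uzbek | 北乌兹别克语 |
| fra | French | 法语 | rus | Russian | 俄语 | vie | Vietnamese | 越南语 |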
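Speech output is available for far fewer languages than text output, so it can help to validate the target code before starting a long S2ST or T2ST run. A minimal sketch based only on the 36 codes in the table above; `check_speech_target` is a hypothetical helper, not part of the m4t_predict CLI:

```python
# Codes copied from the "Target Languages (S2ST / T2ST)" table.
SPEECH_TARGETS = frozenset("""
eng arb ben cat ces cmn cym dan deu est fin fra
hin ind ita jpn kor mlt nld pes pol por ron rus
slk spa swe swh tel tgl tha tur ukr urd uzn vie
""".split())


def check_speech_target(tgt_lang: str) -> str:
    """Return the normalized code, or raise if it is not in the table."""
    code = tgt_lang.strip().lower()
    if code not in SPEECH_TARGETS:
        raise ValueError(f"{code!r} is not a listed speech target language")
    return code


if __name__ == "__main__":
    print(check_speech_target("CMN"))
```

For example, `yue` (Cantonese) appears in both source tables but not in this one, so the helper rejects it as a speech target.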

SeamlessM4T command line (m4t_predict)

m4t_predict -h
usage: m4t_predict [-h] [--task TASK] [--tgt_lang TGT_LANG] [--src_lang SRC_LANG] [--output_path OUTPUT_PATH] [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME] [--text_generation_beam_size TEXT_GENERATION_BEAM_SIZE]
                   [--text_generation_max_len_a TEXT_GENERATION_MAX_LEN_A] [--text_generation_max_len_b TEXT_GENERATION_MAX_LEN_B] [--text_generation_ngram_blocking TEXT_GENERATION_NGRAM_BLOCKING]
                   [--no_repeat_ngram_size NO_REPEAT_NGRAM_SIZE] [--unit_generation_beam_size UNIT_GENERATION_BEAM_SIZE] [--unit_generation_max_len_a UNIT_GENERATION_MAX_LEN_A]
                   [--unit_generation_max_len_b UNIT_GENERATION_MAX_LEN_B] [--unit_generation_ngram_blocking UNIT_GENERATION_NGRAM_BLOCKING] [--unit_generation_ngram_filtering UNIT_GENERATION_NGRAM_FILTERING]
                   [--text_unk_blocking TEXT_UNK_BLOCKING]
                   input

M4T inference on supported tasks using Translator.

positional arguments:
  input                 Audio WAV file path or text input.

options:
  -h, --help            show this help message and exit
  --task TASK           Task type
  --tgt_lang TGT_LANG   Target language to translate/transcribe into.
  --src_lang SRC_LANG   Source language, only required if input is text.
  --output_path OUTPUT_PATH
                        Path to save the generated audio.
  --model_name MODEL_NAME
                        Base model name (`seamlessM4T_medium`, `seamlessM4T_large`, `seamlessM4T_v2_large`)
  • --src_lang: not required for S2ST / S2TT / ASR; required for T2ST / T2TT
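The rule for `--src_lang` can be enforced before the model is even loaded. The sketch below assembles an `m4t_predict` argument list using only flags that appear in the help text above. The function name `build_m4t_predict` is hypothetical, and requiring `--output_path` for speech-output tasks is an assumption that mirrors the examples in this post rather than documented behavior:

```python
import shlex
from typing import List, Optional

ALL_TASKS = {"asr", "s2st", "s2tt", "t2st", "t2tt"}
TEXT_INPUT_TASKS = {"t2st", "t2tt"}      # these need --src_lang
SPEECH_OUTPUT_TASKS = {"s2st", "t2st"}   # assumed to need --output_path


def build_m4t_predict(
    input_: str,
    task: str,
    tgt_lang: str,
    src_lang: Optional[str] = None,
    output_path: Optional[str] = None,
) -> List[str]:
    """Build the argv list for m4t_predict and validate the task options."""
    task = task.lower()
    if task not in ALL_TASKS:
        raise ValueError(f"unknown task: {task}")
    if task in TEXT_INPUT_TASKS and src_lang is None:
        raise ValueError(f"{task} takes text input, so src_lang is required")
    if task in SPEECH_OUTPUT_TASKS and output_path is None:
        raise ValueError(f"{task} produces audio, so output_path is required")
    argv = ["m4t_predict", input_, "--task", task, "--tgt_lang", tgt_lang]
    if src_lang is not None:
        argv += ["--src_lang", src_lang]
    if output_path is not None:
        argv += ["--output_path", output_path]
    return argv


if __name__ == "__main__":
    argv = build_m4t_predict("chinese.wav", "s2st", "eng", output_path="eng.wav")
    print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run` runs the command without a shell, so text inputs that contain spaces or quotation marks reach the CLI as a single argument.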

S2ST: Speech-to-Speech translation

Chinese → English

m4t_predict chinese.wav --task s2st --tgt_lang eng --output_path eng.wav

English → Chinese

m4t_predict english.wav --task s2st --tgt_lang cmn --output_path cmn.wav

S2TT: Speech-to-text translation

Chinese → Chinese

m4t_predict chinese.wav --task s2tt --tgt_lang cmn
荷兰发布了一份主题为 宣布即将对先进半导体制造设备采取的出口管制措施 的公告 表示 鉴于技术的发展和地缘政治的背景 政府已经得出结论 有必要扩大现有的特定半导体制造设备的出口管制

Chinese → English

m4t_predict chinese.wav --task s2tt --tgt_lang eng
The announcement, titled "Announcing Imminent Export Control Measures for Advanced Semiconductor Manufacturing Equipment", said that given the development of technology and geopolitics, the government has concluded that it is necessary to expand export controls on existing specific semiconductor manufacturing equipment.

English → Chinese (❌: the output is still English)

m4t_predict english.wav --task s2tt --tgt_lang cmn
The announcement titled announcing export control measures for advanced semiconductor manufacturing equipment that given the development of technology and geopolitics, the government has concluded that it is necessary to expand export controls on specific semiconductor manufacturing equipment.
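The output above is English even though `--tgt_lang` was `cmn`. When running many files, a rough heuristic can flag this kind of failure automatically by measuring how much of the output consists of Chinese characters. This is only a sketch: the helper names and the 0.5 threshold are arbitrary choices for illustration, and the check covers only the basic CJK Unified Ideographs block:

```python
def cjk_ratio(text: str) -> float:
    """Fraction of non-space characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def looks_like_chinese(text: str, threshold: float = 0.5) -> bool:
    """Heuristic check; the threshold is an arbitrary illustrative value."""
    return cjk_ratio(text) >= threshold


if __name__ == "__main__":
    print(looks_like_chinese("政府已经得出结论"))
    print(looks_like_chinese("the government has concluded"))
```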

T2ST: Text-to-Speech translation

Chinese → Chinese

m4t_predict "$CHINESE_TEXT" --task t2st --src_lang cmn --tgt_lang cmn --output_path cmn.wav
荷兰发布了一份主题为 ⁇ 宣布即将对先进半导体制造设备采取的出口管制措施 ⁇ 的公告表示,鉴于技术的发展和地缘政治的背景,政府已经得出结论,有必要扩大现有的特定半导体制造设备的出口管制 ⁇

Chinese → English

m4t_predict "$CHINESE_TEXT" --task t2st --src_lang cmn --tgt_lang eng --output_path eng.wav
The Netherlands issued an announcement titled "Announcing Imminent Export Control Measures for Advanced Semiconductor Manufacturing Equipment" stating that, given the technological development and geopolitical context, the government has concluded that it is necessary to expand existing export controls for certain semiconductor manufacturing equipment.

English → Chinese

m4t_predict "$ENGLISH_TEXT" --task t2st --src_lang eng --tgt_lang cmn --output_path cmn.wav
荷兰发布了题为 ⁇ 关于先进半导体制造设备即将实施出口管制措施的公告 ⁇ ,该公告指出,鉴于技术发展和地缘政治背景,政府得出结论,有必要扩大现有的特定半导体制造设备出口管制 ⁇ 

T2TT: Text-to-text translation

Chinese → English

m4t_predict "$CHINESE_TEXT" --task t2tt --src_lang cmn --tgt_lang eng
The Netherlands issued an announcement titled "Announcing Imminent Export Control Measures for Advanced Semiconductor Manufacturing Equipment" stating that, given the technological development and geopolitical context, the government has concluded that it is necessary to expand existing export controls for certain semiconductor manufacturing equipment.

Chinese → French

m4t_predict "$CHINESE_TEXT" --task t2tt --src_lang cmn --tgt_lang fra
Les Pays-Bas ont publié un thème pour  ⁇  annoncer les prochaines mesures de contrôle des exportations prises sur des équipements de fabrication de semi-conducteurs  ⁇  l'annonce indique que, compte tenu du développement technologique et du contexte géopolitique, le gouvernement a conclu qu'il est nécessaire d'étendre les exportations existantes de certains équipements de fabrication de semi-conducteurs  ⁇

English → Chinese

m4t_predict "$ENGLISH_TEXT" --task t2tt --src_lang eng --tgt_lang cmn
荷兰发布了题为 ⁇ 关于先进半导体制造设备即将实施出口管制措施的公告 ⁇ ,该公告指出,鉴于技术发展和地缘政治背景,政府得出结论,有必要扩大现有的特定半导体制造设备出口管制 ⁇

ASR: Automatic speech recognition

Chinese → Chinese

m4t_predict chinese.wav --task asr --tgt_lang cmn
荷兰发布了一份主题为 宣布即将对先进半导体制造设备采取的出口管制措施 的公告 表示 鉴于技术的发展和地缘政治的背景 政府已经得出结论 有必要扩大现有的特定半导体制造设备的出口管制

Chinese → English

m4t_predict chinese.wav --task asr --tgt_lang eng
The announcement, titled "Announcing Imminent Export Control Measures for Advanced Semiconductor Manufacturing Equipment", said that given the development of technology and geopolitics, the government has concluded that it is necessary to expand export controls on existing specific semiconductor manufacturing equipment.

English → English

m4t_predict english.wav --task asr --tgt_lang eng
The announcement, titled "Announcing Important Export Control Measures for Advanced Semiconductor Manufacturing Equipment", said that given the development of technology and geopolitics, the government has decided that it is necessary to expand export controls on specific semiconductor manufacturing equipment.

SeamlessM4T Web UI

Install the gradio dependency

pip install gradio

Set the environment variable

conda env config vars set CHECKPOINTS_PATH=/Users/junjian/GitHub/facebookresearch/seamless_communication/seamless-m4t-v2-large

Run the app

python demo/m4tv2/app.py
