Last Updated on December 2, 2023 by SPN Editor
Meta AI has unveiled SeamlessM4T, a translator engineered for real-time multilingual communication. The model goes beyond the constraints of traditional systems by providing translation and transcription in almost 100 languages. SeamlessM4T handles a range of tasks, including speech-to-text, speech-to-speech, text-to-speech, and text-to-text translation, and marks a substantial step forward in near-instantaneous translation and transcription.
The M4T in SeamlessM4T stands for Massively Multilingual and Multimodal Machine Translation. It is an AI model capable of multilingual, multimodal translation and transcription, and it brings together the insights and capabilities of Meta’s No Language Left Behind (NLLB), Universal Speech Translator, and Massively Multilingual Speech initiatives in a single model.
The model performs several tasks across speech and text: speech-to-text, speech-to-speech, text-to-speech, and text-to-text translation, as well as speech recognition. Handling all of them in one unified system reduces errors and delays, improving the efficiency and quality of the translation process.
On the input side, the model accommodates up to 100 languages, depending on the task. SeamlessM4T also identifies the source language(s) on its own, eliminating the need for a separate language identification model. As a single consolidated model, it can reduce latency compared to cascaded systems, which chain separate speech recognition, translation, and speech synthesis models one after another.
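To see why a single model can cut delays and errors relative to a cascade, consider a rough back-of-the-envelope sketch in Python. The stage names, latencies, and accuracy figures below are hypothetical placeholders chosen purely for illustration, not measurements of SeamlessM4T or any real system, and the sketch assumes that stages run strictly in sequence and fail independently of one another.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    latency_s: float  # hypothetical processing time, in seconds
    accuracy: float   # hypothetical probability that the stage output is correct


def pipeline_latency(stages):
    """Stages run one after another, so their latencies add up."""
    return sum(stage.latency_s for stage in stages)


def pipeline_accuracy(stages):
    """Each stage consumes the previous stage's output, so errors compound.

    Assumes stage errors are independent, which is a simplification.
    """
    accuracy = 1.0
    for stage in stages:
        accuracy *= stage.accuracy
    return accuracy


# Illustrative numbers only; none of these are measurements of a real system.
cascaded = [
    Stage("speech recognition", latency_s=0.8, accuracy=0.95),
    Stage("text translation", latency_s=0.5, accuracy=0.95),
    Stage("speech synthesis", latency_s=0.7, accuracy=0.95),
]

if __name__ == "__main__":
    print(f"cascade latency:  {pipeline_latency(cascaded):.2f} s")
    print(f"cascade accuracy: {pipeline_accuracy(cascaded):.3f}")
```

Under these assumptions, every extra stage adds its own delay and multiplies in another chance of error, which is the overhead a single end-to-end model is meant to avoid.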
SeamlessM4T was evaluated across all languages using both automatic metrics (ASR-BLEU, BLASER 2) and human evaluation. It was also assessed for robustness, bias, and added toxicity, where it significantly outperformed previous state-of-the-art models.
The model has also shown strong accuracy in these evaluations.
Compared to strong cascaded models, SeamlessM4T improved the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech.
Furthermore, the system handles background noise and speaker variation in speech-to-text tasks better than the current state-of-the-art model, with improvements of 38 percent and 49 percent, respectively.
These results demonstrate that SeamlessM4T significantly surpasses previous state-of-the-art models in terms of accuracy.
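For readers unfamiliar with the metrics quoted above, here is a simplified sketch of how BLEU and ASR-BLEU are computed. BLEU scores a translation by its n-gram overlap with a reference translation, while ASR-BLEU first transcribes the translated speech with a speech recognizer and then applies BLEU to the transcript. This is a single-reference, sentence-level version without smoothing; published results are corpus-level and are normally computed with standard tooling such as sacreBLEU. The `transcribe` argument is a hypothetical stand-in for a real speech recognizer.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU on a 0-100 scale.

    No smoothing is applied: if any n-gram order has no match, the score is 0.
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: an n-gram is credited at most as many times
        # as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses that are shorter than the reference.
    penalty = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100.0 * penalty * math.exp(sum(log_precisions) / max_n)


def asr_bleu(translated_audio, reference_text, transcribe):
    """ASR-BLEU: transcribe the translated speech, then score the transcript.

    `transcribe` is a caller-supplied function (hypothetical here) that maps
    audio to text.
    """
    return bleu(transcribe(translated_audio), reference_text)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(bleu("the cat sat on a mat", reference))
```

A gain of 1.3 BLEU points therefore means the system's output shares measurably more word sequences with the reference translations, on the same 0 to 100 scale.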
How SeamlessM4T works as a universal real-time translator
The Seamless translator is a new AI tool that enables real-time translation across nearly 100 languages while preserving the speaker’s voice style, emotion, and prosody. It consists of three models:
SeamlessExpressive: Preserves the vocal style and emotional nuances of the speaker’s voice during translation.
SeamlessStreaming: Enables near real-time translation with about two seconds of latency across nearly 100 languages.
SeamlessM4T v2: Serves as the foundation for the other two models, providing improved consistency between text and speech output.
These models could transform global communication, enabling real-time multilingual conversations as well as automatically dubbed videos and podcasts. They could also help break down language barriers for immigrants and others who struggle with communication.
However, there are concerns about potential misuse of the technology for voice phishing scams, deepfakes, and other harmful applications. To promote safety and responsible use, several measures have been implemented, including audio watermarking and new techniques to reduce hallucinated toxic outputs.
In line with Meta’s commitment to open research and collaboration, the Seamless Communication models have been publicly released on Hugging Face and GitHub. By making these models freely available, Meta aims to enable researchers and developers to build upon and extend this work to help connect people across languages and cultures.
The researchers believe that Seamless could lead to a significant change in how machine-assisted cross-lingual communication is accomplished.