Meta Introduces SeamlessM4T AI Translation Model

SeamlessM4T is an Artificial Intelligence (AI) model from Meta that provides translation and transcription across multiple languages and modalities.

The model performs speech-to-text, speech-to-speech, text-to-speech, and text-to-text translation for up to 100 languages, depending on the task.

"Today, we’re introducing SeamlessM4T, the first all-in-one multimodal and multilingual AI translation model that allows people to communicate effortlessly through speech and text across different languages," said Meta.

SeamlessM4T covers:

  • 101 languages for speech input.
  • 96 languages for text input/output.
  • 35 languages for speech output.

This unified model enables multiple tasks without relying on multiple separate models:
  1. Speech-to-speech translation (S2ST)
  2. Speech-to-text translation (S2TT)
  3. Text-to-speech translation (T2ST)
  4. Text-to-text translation (T2TT)
  5. Automatic speech recognition (ASR)
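
As a rough illustration of how the five tasks relate to the coverage figures above, the following Python sketch maps each task code to its input and output modality and looks up the matching language counts. It is not Meta's API: the names `Modality`, `Task`, `TASKS`, `COVERAGE`, and `language_coverage` are hypothetical, the modality of each task is inferred from its name, and the counts are the headline numbers quoted in this article, treated as upper bounds.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"


@dataclass(frozen=True)
class Task:
    code: str
    source: Modality
    target: Modality
    translates: bool  # False for ASR, which transcribes within one language


# Task codes as listed in the article; modalities are inferred from the names.
TASKS = {
    "S2ST": Task("S2ST", Modality.SPEECH, Modality.SPEECH, True),
    "S2TT": Task("S2TT", Modality.SPEECH, Modality.TEXT, True),
    "T2ST": Task("T2ST", Modality.TEXT, Modality.SPEECH, True),
    "T2TT": Task("T2TT", Modality.TEXT, Modality.TEXT, True),
    "ASR": Task("ASR", Modality.SPEECH, Modality.TEXT, False),
}

# Language counts quoted in the article, keyed by (modality, direction).
COVERAGE = {
    (Modality.SPEECH, "input"): 101,
    (Modality.TEXT, "input"): 96,
    (Modality.TEXT, "output"): 96,
    (Modality.SPEECH, "output"): 35,
}


def language_coverage(task_code: str) -> tuple[int, int]:
    """Return (source_languages, target_languages) upper bounds for a task."""
    task = TASKS[task_code]
    return COVERAGE[(task.source, "input")], COVERAGE[(task.target, "output")]


if __name__ == "__main__":
    for code in TASKS:
        src, tgt = language_coverage(code)
        print(f"{code}: up to {src} source languages, up to {tgt} target languages")
```

Read this way, speech output is the narrowest part of the system: any task that ends in speech (S2ST, T2ST) is limited to the 35 speech-output languages, which is consistent with the "depending on the task" caveat.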

SeamlessAlign, which Meta describes as the largest open multimodal translation dataset to date, contains 270,000 hours of mined speech and text alignments.

Creating a universal language translator, akin to the fictional Babel Fish in The Hitchhiker's Guide to the Galaxy, is challenging because existing speech-to-speech and speech-to-text systems cover only a small fraction of the world's languages.

SeamlessM4T's single-system approach reduces errors and delays, increasing translation efficiency and quality and enabling effective communication between speakers of different languages.

"This is just the latest step in our ongoing endeavor to create AI-powered technology that facilitates communication between individuals who speak different languages," said Meta.

