Thousands of Languages Supported

The limitations of existing speech recognition and generation technology could accelerate the extinction of many of the world's languages. We want to make it easier for people to use technology and access information in their native language, so today we're releasing a set of artificial intelligence (AI) models that may help.

Our Massively Multilingual Speech (MMS) models expand text-to-speech and speech-to-text support from around 100 languages to more than 1,100, over ten times as many as before, and can identify more than 4,000 spoken languages, roughly 40 times more than before.

Speech technology that works in a person's preferred language and can understand any voice has numerous applications, ranging from virtual and augmented reality to messaging services.

We're making our models and code available to the research community so that others can build on them, help preserve the world's languages, and bring people from different cultures closer together.

Our Method

The largest existing speech datasets cover at most about 100 languages, so our first challenge was to collect audio data for over a thousand languages. To do so, we turned to religious texts, such as the Bible, which have been translated into many languages and whose translations have been widely studied in text-based translation research.

Publicly accessible audio recordings of people reading these translations exist in many languages. As part of the MMS project, we compiled a dataset of New Testament readings in more than 1,100 languages, providing an average of 32 hours of data per language.
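To give a feel for the scale of such a corpus, here is a minimal sketch of summarizing per-language audio durations. The language codes and hour counts below are invented placeholders; only the roughly 32-hour-per-language average comes from the text above.

```python
# Sketch: summarizing a speech corpus by per-language audio duration.
# The language codes and hour counts are hypothetical examples, not
# actual MMS corpus figures.

def corpus_stats(hours_by_language):
    """Return (language count, total hours, average hours per language)."""
    n = len(hours_by_language)
    total = sum(hours_by_language.values())
    return n, total, total / n if n else 0.0

# Hypothetical per-language durations (hours of New Testament readings).
corpus = {"eng": 40.0, "spa": 35.5, "swh": 28.0, "quz": 24.5}

n_langs, total_hours, avg_hours = corpus_stats(corpus)
print(f"{n_langs} languages, {total_hours:.1f} h total, {avg_hours:.1f} h avg")
# → 4 languages, 128.0 h total, 32.0 h avg
```

Scaled to the real corpus, 1,100+ languages at an average of 32 hours each amounts to tens of thousands of hours of speech.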

Moving forward 

Going forward, we plan to expand MMS's coverage to even more languages and to tackle the challenge of handling dialects, which remains difficult for current speech technology.