Facebook on Monday introduced M2M-100, the first open-source multilingual machine-translation model that can translate between any pair of 100 languages without relying on English data.
Previous models relied on English translation data when translating, as it was the most widely available. Facebook's model translates directly from one language to another, which preserves meaning better, the company said in a statement.
The programme outperforms English-centric systems by 10 points on the BLEU metric, which is used for evaluating machine translations.
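For context, BLEU scores a machine translation by how many of its n-grams overlap with a human reference translation, scaled by a brevity penalty. A rough single-sentence sketch in Python (a simplification for illustration; real evaluations use the corpus-level metric with multiple references and standard smoothing) might look like:

```python
import math
from collections import Counter

def bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty.
    Illustrative sketch only, not the official corpus-level metric."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        # clip each hypothesis n-gram count by its count in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        # smooth only when nothing matched, so an exact match still scores 1.0
        p = overlap / total if overlap > 0 else 1.0 / (2 * total)
        log_precisions.append(math.log(p))
    # penalise hypotheses shorter than the reference
    brevity = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

On this 0-to-1 scale, a "10 point" gap corresponds to 0.10 when scores are reported as percentages; an identical hypothesis scores 1.0, and a near-miss such as swapping one word lands somewhere in between.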
M2M-100 is trained on a total of 2,200 language directions. The model will improve the quality of translations for billions of people, especially speakers of low-resource languages, the company added.
Typical translation systems require building separate AI models for each language and task, but this approach doesn’t scale effectively on Facebook, where people post content in more than 160 languages across billions of posts. Advanced multilingual systems can process multiple languages at once, but compromise on accuracy by relying on English data to bridge the gap between the source and target languages.
Facebook used novel mining strategies to create translation data, building a dataset of 7.5 billion sentence pairs spanning 100 languages.
For years, AI researchers have been working toward building a single universal model that can understand all languages across different tasks. A single model that supports all languages, dialects, and modalities will help serve more people, keep translations up to date, and create new experiences for billions of people equally.