
A real-time translator for 200 languages: Meta wants to make a linguistic leap with artificial intelligence


Meta, the company formerly known as Facebook, has set out to build an artificial intelligence-based language translation platform. The real novelty is not the platform itself, but its support for a long list of languages that other tools of this kind had not covered until now.

200 languages. The system developed by Meta, called NLLB-200 (from “No Language Left Behind”), is especially ambitious because it supports more than 40,000 different translation directions resulting from combining those 200 supported languages. Google Translate, for example, supports 133 languages.

“Low-resource languages”. The offering includes languages that are much less common in translators and that have fewer than one million pairs of translated sentences available to train the system. Among them are several languages spoken in Africa and India that are not supported by commercial translation tools.

Artificial intelligence + human validation. The study published by Meta AI describes how NLLB combines a human-validated evaluation benchmark (FLORES-200) with a mechanism for creating training sentence pairs and various modeling techniques to improve translation quality.

How do you achieve a decent translation? Machine translation systems often make notable errors. To avoid problems, Meta created a test dataset with 3,001 sentence pairs for each language covered by the model, each of them translated from English into the target language not by the machine, but by professional human translators who are native speakers of that target language.


The results are promising. From there, the team compared the machine translations against the human references using a popular metric in this field, BLEU (BiLingual Evaluation Understudy). This metric assigns a score to each translation, and it showed that Meta's model improves on the results of the best existing machine translation systems by 44%.
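To give an idea of how a BLEU comparison of this kind is computed, here is a minimal sketch using the open-source sacrebleu package. The sentences below are invented placeholders, not data from the NLLB study or FLORES-200.

```python
# pip install sacrebleu
import sacrebleu

# Hypothetical machine translation outputs and the corresponding
# human reference translations (placeholder sentences only).
system_outputs = [
    "The cat is sleeping on the sofa.",
    "She bought fresh bread at the market.",
]
human_references = [
    "The cat sleeps on the sofa.",
    "She bought fresh bread at the market.",
]

# corpus_bleu takes the list of hypotheses and a list of reference lists
# (one inner list per reference set); higher scores mean closer matches.
bleu = sacrebleu.corpus_bleu(system_outputs, [human_references])
print(f"BLEU score: {bleu.score:.2f}")
```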

But beware of automatic translation. The results are promising but, as a Microsoft expert in this field points out, they are not definitive. In translations involving low-resource languages, errors can be hard to detect (a translation may confidently assert something that is not true, for example), so it will be important to treat those translations with some caution. It is also important to integrate toxicity lists to detect and avoid profanity and potentially offensive content.
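As a rough illustration of what a toxicity-list check can look like, here is a minimal sketch in Python. The term list and the review policy are invented assumptions for the example and are unrelated to the lists Meta actually uses.

```python
# Minimal sketch of a toxicity-list check on translated output.
# The term list is a made-up placeholder; real systems rely on curated,
# per-language lists that are far more comprehensive.
TOXIC_TERMS = {"badword1", "badword2"}

def flag_toxic(translation: str) -> bool:
    """Return True if the translation contains any listed term."""
    tokens = {token.strip(".,!?;:").lower() for token in translation.split()}
    return bool(tokens & TOXIC_TERMS)

if __name__ == "__main__":
    candidate = "This sentence contains badword1 somewhere."
    if flag_toxic(candidate):
        print("Translation flagged for human review.")
    else:
        print("Translation passed the toxicity check.")
```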


An open source project. Another notable feature of this system is that the code will be open and the research tools will be published, which could mean that even more languages end up being added to this translation engine.
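Because the models and code have been released publicly, a translation call can be sketched with the Hugging Face transformers library. The checkpoint name and language codes below are taken from the publicly distributed NLLB-200 checkpoints, but treat the exact identifiers as assumptions to verify against the official release.

```python
# pip install transformers sentencepiece
# Sketch of running a released NLLB-200 checkpoint; model name and
# language codes are assumptions based on the public release.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/nllb-200-distilled-600M"  # smallest distilled checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

text = "No language will be left behind."
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start generating in the target language (Spanish here).
target_lang_id = tokenizer.convert_tokens_to_ids("spa_Latn")
generated = model.generate(**inputs, forced_bos_token_id=target_lang_id, max_length=64)

print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```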

More knowledge for everyone. The practical application of this translation system is obvious: giving more and more people access to Internet content written in languages they do not speak. Wikipedia is a good example, and in fact Meta has partnered with the Wikimedia Foundation to support the online encyclopedia's translation systems.
