110 new languages to be added to Google Translate

Google Translate breaks down language barriers, helping people communicate and better understand the world around them. The Internet giant regularly applies its latest technology to bring the tool to more people: in 2022, it added 24 new languages using Zero-Shot Machine Translation, an approach in which a machine learning model learns to translate into another language without ever seeing an example.

Google also announced the 1,000 Languages Initiative, a commitment to build artificial intelligence models that will support the world’s 1,000 most spoken languages.

Today, the company is using artificial intelligence to expand the variety of languages supported. Thanks to the large language model PaLM 2, the team is starting to add 110 new languages to Google Translate, the largest expansion ever.

Translation support for more than half a billion people

From Cantonese to Kekchi, the new languages are spoken by more than 614 million people, opening up translation for about 8% of the world’s population. Some are major world languages with more than 100 million speakers. Others are spoken by small indigenous communities, and a few have almost no native speakers but are the focus of active revitalization efforts. About a quarter of the new languages come from Africa, the largest expansion of African languages to date, including Fon, Kikongo, Luo, Ga, Swati, Venda, and Wolof.

Here are some of the new languages that will be supported in Google Translate:

  • Afar is a tonal language spoken in Djibouti, Eritrea, and Ethiopia. Of all the languages launched this time, Afar had the most input from the volunteer community.

  • Cantonese has long been one of the most requested languages for Google Translate. Because it often overlaps with Mandarin in writing, it is difficult to find data and train models.

  • Crimean Tatar is a Turkic language and the native language of the Crimean Tatars. UNESCO classifies it as a language in need of additional protection. In January 2023, Ukraine established the National Commission for the Crimean Tatar Language to protect it.

  • Manx is the Celtic language of the Isle of Man. It almost disappeared with the death of the last native speaker in 1974. But thanks to a movement to revive the language on the island, it is now spoken by thousands of people.

  • NKo is a standardized form of the West African Manding languages that unifies many dialects into one common language. Its unique alphabet was invented in 1949, and today it has an active research community that develops resources and technologies for it.

  • Punjabi (Shahmukhi) is the variety of Punjabi written in the Perso-Arabic script (Shahmukhi) and is the most widely spoken language in Pakistan.

  • Tamazight (Amazigh) is a Berber language spoken across North Africa. Although there are many dialects, the written form is generally mutually intelligible. It is written in the Latin script and the Tifinagh script, both of which are supported by Google Translate.

  • Tok Pisin is an English-based creole and the lingua franca of Papua New Guinea. If you speak English, try translating into Tok Pisin: you might be able to make out the meaning!

How Google chooses new languages

There are many factors to consider when adding new languages to Translate, from which language varieties are offered to which specific spellings are used.

Languages vary enormously: regional varieties, dialects, and different spelling standards. In fact, many languages have no single standard form, so it is impossible to choose one “right” variety. Google’s approach is to prioritize the most commonly used varieties of each language. For example, Romani has many dialects across Europe. The models produce text that is closest to Southern Vlax Romani, a variety widely used online, but that also mixes in elements from other varieties, such as Northern Vlax and Balkan Romani.

PaLM 2 is a key piece of the puzzle, helping Translate learn closely related languages more efficiently, including languages close to Hindi, such as Awadhi and Marwadi, and French creoles, such as Seychellois Creole and Mauritian Creole. As technology advances and as developers continue to work with linguistic experts and native speakers, more language varieties and spelling conventions will be supported over time.

Visit the Help Center to learn more about these newly supported languages, and start translating at translate.google.com or in the Google Translate app on Android and iOS.

