The model aims to help computer systems understand as many as 16 Indian languages and to address the linguistic complexities involved in translation, transliteration, and understanding the sentiment behind a user's search.
Search engine Google, which has been working on localising the internet so that Indians can use it in their native languages and dialects, is now looking to help startups, researchers, and others building Indian language technologies (LT). LT refers to the computational processing of the written or spoken form of a language, with the aim of easing interaction with computer systems and handling large amounts of textual information. To this end, Google on Thursday announced a multilingual model called Multilingual Representations for Indian Languages (MuRIL). The model aims to help computer systems understand as many as 16 Indian languages and to address the linguistic complexities involved in translation, transliteration, and understanding the sentiment behind a user's search.
For example, “the sentence Achha hua account bandh nahi hua would previously be interpreted as having a negative meaning, but MuRIL correctly identifies this as a positive statement. Or take the ability to classify a person versus a place: Shirdi ke sai baba would previously be interpreted as a place, which is wrong, but MuRIL correctly interprets it as a person,” the company said in a blog post.
MuRIL is free, open source, and currently available to download from TensorFlow Hub. Google said it hopes the model will be “the next big evolution for Indian language understanding, forming a better foundation for researchers, students, startups, and anyone else interested in building Indian language technologies.” The model also supports transliterated text, such as Hindi written in Roman script, which Google said was missing from previous models of its kind. The company added that support for 16 languages is the “highest coverage for Indian languages among any other publicly available model of its kind.”
Google also announced new features for its users. These include easy toggling between search results in English and Tamil, Telugu, Bangla, and Marathi, in addition to Hindi; showing relevant content in Indian languages, including Hindi, Bangla, Marathi, Tamil, and Telugu, even if the user types the query in English; and allowing users to use Google Maps in one of nine Indian languages.
The Indian language internet user base is likely to grow at a CAGR of 18 per cent to reach 536 million by 2021, while the English internet user base is expected to grow at 3 per cent to reach 199 million. Indian language internet users are expected to account for close to 75 per cent of the country’s internet user base by 2021, according to a 2017 KPMG report. The growth is expected to be driven by the penetration of internet-enabled devices, affordable high-speed internet, and rising digital literacy, among other factors. Indian startups leveraging natural language processing, speech recognition, and conversational AI include Reverie Language Technologies, Niki.ai, Gnani.ai, and Manthan, among others.
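The user-base projections cited above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the 2021 figures and growth rates quoted from the KPMG report. The function names are illustrative, and the five-year growth window (a 2016 base year) is an assumption, since the article does not state the base year.

```python
def share_of_total(part: float, rest: float) -> float:
    """Return part as a fraction of (part + rest)."""
    return part / (part + rest)


def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by a compound annual growth rate."""
    return final_value / (1 + cagr) ** years


# Figures quoted in the article (millions of users, projected for 2021).
indian_language_users_2021 = 536.0
english_users_2021 = 199.0

# ASSUMPTION: a five-year window (2016 to 2021); the base year is not in the article.
ASSUMED_YEARS = 5

indian_share = share_of_total(indian_language_users_2021, english_users_2021)
indian_base = implied_base(indian_language_users_2021, 0.18, ASSUMED_YEARS)
english_base = implied_base(english_users_2021, 0.03, ASSUMED_YEARS)

print(f"Indian language share in 2021: {indian_share:.1%}")
print(f"Implied Indian language base: {indian_base:.0f} million")
print(f"Implied English base: {english_base:.0f} million")
```

On these two figures alone, the Indian language share comes out at roughly 73 per cent, which is in line with the report's "close to 75 per cent". The implied starting values depend entirely on the assumed five-year window and should not be read as figures from the report.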