Breaking down language barriers: Future of cloud computing

Building India-specific Indic-language LLMs, therefore, is an essential step

Datasets and the architecture of AI models are vital for building Large Language Models

By Tarun Dua

India’s linguistic landscape is remarkably diverse. Our country is home to a mind-boggling 19,500 languages and dialects, including 22 officially recognized languages under the Eighth Schedule of the Indian Constitution. Major languages like Hindi, Bengali, Telugu, Marathi, Gujarati, Tamil, and Urdu are spoken by crores of Indians, and by a rapidly growing global diaspora, each with its own literary traditions, regional variations and, in most cases, its own script.

This linguistic plurality of our nation is built into our culture and tradition, and it is something we all treasure. However, this diversity also presents unique challenges, often creating barriers in educational and administrative systems. It also complicates delivery of government services and limits access to information and opportunities for those not fluent in English, or in the dominant regional or national languages. 

Is there a way to overcome the challenges that language barriers create while preserving our diversity? This is where Indic-language LLMs offer an answer.

In late 2022, when Large Language Models (LLMs) burst into prominence with the massively viral launch of ChatGPT, users got a glimpse into how humans may interact with machines in the future — through natural language instead of code. 

While the launch of ChatGPT marked a significant milestone in AI development, it is in the open-source AI landscape that progress has truly taken off. The open-source community is highly collaborative, and developers can easily build on each other’s work. This has led to the creation of a number of powerful tools and models for AI development. Additionally, platforms such as Hugging Face and Indian-born E2E Cloud have evolved to enable developers, startups and enterprises to build, deploy, and run open-source AI models in the cloud with ease, without relying on proprietary closed-source models.

Why Build Indic Language-Enabled LLMs?

One of the biggest challenges with proprietary AI models developed in the US or EU is that they are trained on datasets that reflect the linguistic and cultural trends and biases of their originating communities, and as such, are likely to lack an understanding of our country’s diverse cultural landscape. 

Additionally, the structure of our languages differs significantly from English, leading most popular AI models to struggle with basic tasks when prompted in Indian languages. While proprietary models developed in the US rely heavily on text prompts, an Indic-language LLM might end up prioritizing audio as a key interface.

Building India-specific Indic-language LLMs, therefore, is an essential step for us to unlock the benefits of this powerful technology for the Indian user base. While the core techniques of building and training Large Language Models are universally relevant across languages, the datasets used, the way the models are fine-tuned and tested, and the way their performance is evaluated all change entirely when the language changes.

The Japanese government has taken a similar step by investing hundreds of millions of dollars alongside major tech companies to develop AI systems. These LLMs are specifically designed for the Japanese language, rather than relying on translations of the English version.

In India, one of the foremost initiatives on this front is Bhashini. Launched by the Honourable Prime Minister of India, Narendra Modi, as part of India’s National Language Translation Mission in July 2022, the project aims to provide language technology solutions as digital public goods, and leverages AI/ML and NLP technologies. With over 1000 pre-trained AI models accessible through Open Bhashini APIs, it is actively pushing for the development of Indic-language AI among startups, academia, and government bodies. Similar initiatives are underway at some Indian startups, universities and enterprises as well.

Cloud GPUs for Delivering Foundational Models

Datasets and the architecture of AI models are vital for building Large Language Models (LLMs), but the role of cloud infrastructure, particularly advanced GPUs, is equally crucial. Cloud GPUs facilitate not only the creation of these models but also their deployment and operation. Without them, building a foundational language model becomes exceedingly challenging, if not nearly impossible.

In fact, newer and highly sophisticated platforms have emerged to facilitate this, the most recent being the extremely powerful HGX 8xH100, billed as the AI supercomputer. The HGX 8xH100 is built on the H100 GPU, which is billed as the world’s first GPU capable of accelerating trillion-parameter AI models. Building a language model requires ready access to a platform like this, and this is where India-focussed hyperscale cloud providers play a significant role. They have rapidly evolved to offer developers and researchers the capabilities required to build Indic-language LLMs.

Also, as these LLMs evolve and grow, cloud computing platforms will be used to deliver them. End-users and businesses alike will use them to ease some of the challenges that arise from India’s linguistic diversity. Businesses, especially those that wish to cater to India’s regional populations, would be able to deliver their services to customers in a language they are comfortable with. Government bodies would also be able to deliver services more seamlessly. This will have a transformative impact on bringing digital inclusivity to crores of Indians reading and writing in varied languages and dialects.

Looking ahead to a future where interactions between humans and machines are inherently more intuitive and conducted in the user’s preferred language, it’s certain that an array of AI-driven language technologies will emerge. This advancement will predominantly unfold on cloud computing platforms, serving as the primary stage for this technological evolution. It’s this prospect of transformative progress that fuels my enthusiasm and motivates our continued efforts in development.

The author is CEO, E2E Networks Ltd

This article was first uploaded on November thirty, twenty twenty-three, at zero minutes past eight in the morning.