BharatGen will unveil a multilingual foundation model supporting all 22 scheduled Indian languages at the India AI Impact Summit. The model, which incorporates reasoning, coding, and mathematics capabilities, will be released initially as a pre-trained checkpoint, enabling developers and enterprises to fine-tune it for specific use cases, said Rishi Bal.
BharatGen, a Section 8 non-profit housed at the Indian Institute of Technology Bombay, has received around Rs 235 crore in seed funding from the Department of Science and Technology and approximately Rs 1,000 crore from the Ministry of Electronics and Information Technology (MeitY). According to Bal, nearly 95% of the MeitY allocation has gone towards GPU compute.
“Most of the capital in building large models is used for compute,” Bal said, adding that sustained access to GPUs is foundational if the country wants to build competitive AI systems domestically.
The funding is part of the Rs 10,300-crore IndiaAI Mission, under which the government is supporting foundational model developers, compute infrastructure, and dataset creation. BharatGen was approved under the mission last year following an application process with MeitY.
Bal said BharatGen has had access to an additional 4,000 GPUs over the past two months as part of the AI Mission, which significantly shortened training timelines. “We have requested around 4,000 GPUs for a two-month window. This is roughly the scale we’ve been operating at,” he said.
Mixture-of-Experts
The 17-billion-parameter model uses a mixture-of-experts architecture, which activates only a subset of its parameters for a given input rather than the entire network each time. “That allows us to be more efficient both in training and inference,” Bal said.
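For readers unfamiliar with the idea, sparse activation in a mixture-of-experts layer can be illustrated with a toy sketch. This is not BharatGen's implementation; the expert count, the top-2 routing, the tiny dimensions, and every name below (make_expert, moe_forward, router) are assumptions chosen purely for illustration. A router scores each expert for an input, and only the highest-scoring few are actually run.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, dim):
    """Build a toy expert: a random dim x dim linear map (hypothetical stand-in)."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_forward(x, router, experts, top_k):
    """Route x to the top_k experts and mix their outputs by gate weight."""
    # One router score (logit) per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # Keep only the top_k highest-scoring experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Normalise the scores of the chosen experts into mixing weights.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# Illustrative sizes only; real models are vastly larger.
DIM, NUM_EXPERTS, TOP_K = 4, 8, 2
rng = random.Random(0)
experts = [make_expert(rng, DIM) for _ in range(NUM_EXPERTS)]
router = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, active = moe_forward(token, router, experts, TOP_K)
print(f"active experts: {active} ({len(active)} of {NUM_EXPERTS})")
```

In this sketch only two of the eight experts do any work for a given input, which is the source of the training and inference savings Bal describes: total parameter count can grow while the compute spent per input stays roughly fixed.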
Bal believes India cannot simply replicate the US or Chinese playbook to achieve a global AI presence. “The American model is heavily venture-capital driven and subsidised by large tech firms. The Chinese model combines state backing with a closed domestic market. We have to find an Indian way to succeed,” he said.
Beyond Text
BharatGen is also developing speech recognition and text-to-speech systems across Indian languages. This includes lightweight models that can run on minimal hardware, as well as more capable systems integrated with large language models. “The future of AI in India is likely to be voice-first,” Bal said.
He noted that linguistic diversity poses both a challenge and an opportunity. “If you drive from Delhi to Patna, Hindi changes every 100 kilometres. Capturing dialects and domain-specific vocabulary, whether in agriculture or retail, is a lot of work.” While platforms such as Bhashini provide baseline datasets, Bal said more grassroots data collection is necessary.
