By Sunil Gupta
In the last few years, AI has quietly moved from the lab to the balance sheet. McKinsey’s latest global survey finds that 72% of organisations now use AI in at least one business function, and 65% are already using generative AI regularly – roughly double the share in 2023. Menlo Ventures reports that enterprise gen-AI spending jumped from $2.3 billion in 2023 to $13.8 billion in 2024, a six-fold surge as companies move from pilots to production deployments, especially at the application layer.
As AI becomes embedded in a variety of domains, the real contest is no longer about who can train the largest model. It is about which models you run, for which sector, on whose infrastructure. That is the inference layer – the part of the stack where models are served, make decisions, and answer the real questions – and that is why verticalised AI really starts to matter.
Verticalised AI: From generic intelligence to sector fluency
Gartner reports that enterprise adoption is shifting from experiments with generic LLMs to domain-specific generative AI tailored to particular industries and functions. Venture investors like Bessemer describe “Vertical AI” as the future – AI built from the ground up for sectors such as healthcare, finance, retail, and manufacturing, not for “everyone and everything.”
Let’s consider one example: tuberculosis screening in India’s public health system. Radiologists are scarce, X-ray volumes are high, and every missed case has serious human and economic costs. A recent health technology assessment commissioned by the Department of Health Research evaluated AI-assisted chest X-ray software for TB screening and found that these tools improved case detection while reducing overall screening costs compared with manual reading – in effect making AI a demonstrably cost-saving intervention in high-burden settings.
This is verticalised AI in action: a model steeped in one domain, tuned to one workflow, delivering tangible outcomes. Verticalised inferencing is where this sector fluency shows up in production. It is also where AI becomes real for boards and regulators, because ROI shows up in fewer losses, faster processes, better triage, and more productive staff.
From frontier models to right-sized AI for India
Once the use case is established, the next question is how big that model really needs to be to serve India sustainably. This is a different axis altogether. Now, it is no longer about what the system knows, but about how much compute, power, and money it takes to run. Frontier models with tens or hundreds of billions of parameters are remarkable, but they demand premium AI accelerators, large memory pools, and power-hungry clusters, which makes every inference costly. That is why so much recent work focuses on distillation and compression: using a large “teacher” model to train a smaller “student” that keeps most of the capability while cutting compute and latency.
India’s own Indic ecosystem reflects this shift. Sarvam-1 is a 2-billion-parameter model optimised specifically for 10 major Indian languages plus English, yet it matches or beats larger global models in those languages by focusing on high-quality local data rather than sheer size. In a country with tight budgets, hot climates, and huge linguistic diversity, this “right-sized AI” approach is an elegant and sustainable way to put capable models into every data centre, region, and edge site where India will actually use them.
Affordable, low-latency AI for India’s MSME backbone
AI can unlock over $500 billion in economic value for India’s MSME sector if its adoption scales, according to a WEF playbook. However, the same playbook warns that small businesses face real barriers, including limited capital, thin margins, scarce digital skills, and a fear of complex technology. MSMEs see AI not as an R&D experiment but as a cost that lands on their monthly bill. Adoption will scale only if three conditions hold:
1. Inference is affordable: They can pay by usage, without long-term lock-ins or hardware purchases.
2. Access is simple: They can call AI over an API or plug it into familiar tools, without managing clusters.
3. Latency is low: Responses arrive quickly enough to sit in the flow of work – in an accounting workflow, a logistics control room, or a customer chat.
Inference-first design makes this possible. When we talk about “AI as infrastructure”, this is what it should mean for small businesses: AI that feels like electricity. Always there. Metered fairly. Available through simple interfaces, not complex control panels.
Offering Indian MSMEs affordable, accessible, low-latency AI services in the nation’s languages will help boost productivity while democratising access to intelligence in a way few large economies have managed. Yet if AI inferencing is to become a basic utility for MSMEs, it must also be a trusted one, which means the pipes, platforms, and models behind it must be designed for India’s data protection regime from day one.
DPDP-ready, in-country AI inferencing
India’s Digital Personal Data Protection Act makes one thing very clear: you cannot process personal data without consent that is free, specific, informed, unconditional, and unambiguous, given through a clear affirmative action. Commentaries on the law underline that notices must be simple, itemised, and available in India’s 22 scheduled languages, and that withdrawal of consent must be easy. AI inference falls squarely inside this definition of “processing” because prompts, chat logs, and outputs often contain personal and financial information.
For large enterprises and public systems, that makes DPDP-compliant, in-country inferencing and logging non-negotiable. It is no longer enough to store databases in India while calling models and keeping telemetry elsewhere; the full inference path must sit under Indian law and Indian accountability from day one.
The way forward
The IndiaAI Mission has already approved more than Rs 10,371 crore to build public AI infrastructure, including an 18,000+ GPU pool through public-private partnerships. This gives us the hardware backbone for an India-first AI decade. What we build on top of it matters more.
The next step is not just more capacity, but a clear choice. We can rent intelligence from distant clouds, or we can design an inference fabric of our own: sector-fluent, right-sized, affordable, and DPDP-compliant by design. If we choose the latter, India does not simply join the AI race. India helps define what a responsible, large-scale inference economy looks like.
The writer is co-founder, MD & CEO, Yotta Data Services
Disclaimer: The views expressed are the author’s own and do not reflect the official policy or position of Financial Express.
