By Akhilesh Tilotia
If you want to know which villages in India have an electric connection, where should you look? Which official dataset would give you reliable, current information? How can daily operational data from a thermal power plant help reduce air pollution while simultaneously boosting efficiency at cement plants?
India is sitting on a goldmine of public data. With the growing adoption of artificial intelligence (AI) and digital tools, it is time to unlock its true potential.
Digital India and the data exhaust
The Digital India initiative and the digitisation of public services have dramatically increased the volume of data generated by government departments, regulators, statutory bodies, and public institutions. Many of these datasets, covering education, health, environment, infrastructure, taxation, and more, are already collected at a granular level, and some are updated very frequently. Meanwhile, companies and individuals are generating large volumes of publicly accessible digital information. Taken together, this ecosystem of “alternative data” is expanding rapidly.
For over a decade now, investors and businesses have used such data to gain information advantages — famously, by analysing satellite images of retail parking lots to forecast store revenues. Over time, alternative data sources have broadened to include transaction records, regulatory filings, and scraping of open databases. India now hosts thousands of such datasets in the public domain.
AI enters the arena
The emergence of AI has added a powerful new player to the world of data analysis. What was once the domain of seasoned analysts is now being democratised by AI models that can process vast volumes of data, identify patterns, and generate actionable insights.
Until recently, a young analyst starting to track companies in a sector would spend countless hours understanding the industry’s structure and dynamics, working out which data was needed to track each entity and where to find it, and then keeping those datasets meticulously up to date. It could take years to get a good grasp of the detailed value chain (suppliers, customers) and the other relevant stakeholders (government, industry peers, etc.). AI tools can drastically shorten this learning curve, enabling junior analysts to extract deeper insights with greater efficiency.
However, AI’s promise depends fundamentally on the availability and quality of data. Like human intelligence, AI is subject to the rule of “garbage in, garbage out”: poor-quality, incomplete, or ambiguous data results in misleading outputs or, worse, hallucinations. Clean, structured data pipelines and deep, secure data lakes are therefore crucial.
The role of public policy
While private firms will compete to develop proprietary AI models, public policy has an indispensable role to play in improving access to clean and credible datasets. Specifically, data funded by public resources — be it through taxpayer money or government-administered systems — should be made openly available, subject to appropriate safeguards for privacy and security.
Consider the benefits of publishing granular, high-frequency public datasets. Data from the Unified District Information System for Education (UDISE) can indicate whether schools in remote villages report having electricity connections, offering an independent check on rural electrification claims. Similarly, daily operational data from thermal power plants can help estimate how much fly ash they generate; cement companies, which use fly ash as a raw material, can then plan procurement more efficiently.
These are just two examples. The economic value hidden in publicly held datasets is immense — and often unlocked in unexpected ways.
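To make the fly ash example concrete, here is a minimal sketch in Python of how a cement buyer might turn a plant's daily coal consumption into a rough estimate of available fly ash. The record layout, the plant names, and the tonnages are invented for illustration. The two ratios (the share of ash in coal, and the share of that ash leaving as fly ash) are assumed placeholder values, not figures from any official dataset, and would need to be replaced with plant-specific numbers.

```python
from dataclasses import dataclass


@dataclass
class DailyPlantReport:
    """One day of operations for a coal-fired plant (hypothetical schema)."""
    plant: str
    coal_burned_tonnes: float


def estimate_fly_ash_tonnes(
    coal_burned_tonnes: float,
    ash_content: float = 0.35,       # ASSUMPTION: ash as a share of coal weight
    fly_ash_fraction: float = 0.80,  # ASSUMPTION: share of ash leaving as fly ash
) -> float:
    """Rough estimate: coal burned x ash content x fly ash share of total ash."""
    if coal_burned_tonnes < 0:
        raise ValueError("coal_burned_tonnes must be non-negative")
    for name, value in (("ash_content", ash_content),
                        ("fly_ash_fraction", fly_ash_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return coal_burned_tonnes * ash_content * fly_ash_fraction


if __name__ == "__main__":
    # Invented sample data standing in for a daily public release.
    reports = [
        DailyPlantReport("Plant A", 10_000.0),
        DailyPlantReport("Plant B", 4_000.0),
    ]
    for report in reports:
        tonnes = estimate_fly_ash_tonnes(report.coal_burned_tonnes)
        print(f"{report.plant}: about {tonnes:,.0f} tonnes of fly ash")
```

The estimate is only as good as its inputs, which is the article's point: if the coal consumption figures were published daily in a machine-readable form, a calculation like this could be refreshed every morning rather than once a year.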
Time for a data unlock
India already releases consolidated reports, such as annual tax collections or monthly inflation indices. Richer insights can emerge when more granular data is made available more frequently. As AI models evolve, they need wide, varied, and verifiable inputs to train on. With access to robust public datasets, researchers, entrepreneurs, and investors can build tools that deliver sharper insights, enhance governance, and even predict macroeconomic inflection points.
We have seen this before. In an earlier article in this newspaper (India’s fiscal contract: more incomes in the tax net, April 19, 2024), we noted that publicly available tax data revealed that individual taxpayers’ contribution to tax-to-GDP rose steadily even though average tax rates remained flat, while corporate tax contributions stagnated despite lower effective rates. Imagine what AI could surface if given access to more granular, real-time data across sectors.
A simple but powerful policy reform
The ask is straightforward: publish publicly funded datasets in machine-readable formats, update them frequently enough to be useful, and ensure that privacy and security are not compromised.
Doing so will not just improve transparency: it will catalyse innovation, enhance productivity, and give India a powerful competitive advantage in the global AI economy.
The writer is co-founder, Thurro.
Disclaimer: Views expressed are personal and do not reflect the official position or policy of FinancialExpress.com.
