In his latest book, Nexus: A Brief History of Information Networks from the Stone Age to AI, Yuval Noah Harari shares an insight from Geoffrey Hinton, a former vice president and engineering fellow at Google who is widely regarded as the godfather of artificial intelligence (AI), and it is one that in hindsight looks quite strategic. Hinton says that Google’s decades of offering free services to users were never really about altruism. They were about one thing: data.
Fast forward to the present, and it is clear the claim was not an empty one, given what that data has delivered not only to Google but to all big tech firms today. The government is close to releasing the rules governing the Digital Personal Data Protection (DPDP) Act, enacted in 2023 and seen as landmark legislation aimed at safeguarding individual privacy and regulating the often opaque use of personal data by corporations and platforms. The question is whether the DPDP Act will succeed in checking or curbing this practice, especially with the rise of generative AI (GenAI) platforms such as ChatGPT, Gemini, and others.
Let’s examine. The DPDP Act aims to establish a structured framework for how personal data is collected, processed, and protected. It mandates that organisations, termed data fiduciaries under the law, obtain informed, specific, and explicit consent from data principals (users) before processing their data. The law enshrines rights to data access, correction, deletion, and grievance redress, and imposes rigorous accountability on entities handling sensitive data. Further, it emphasises purpose limitation, meaning that data collected for one purpose cannot be freely repurposed without fresh consent, and mandates strict security standards to protect data against misuse or breaches.
Herein lies the rub. GenAI platforms like ChatGPT operate on fundamentally different principles. They collect vast amounts of user inputs, such as questions, prompts, and dialogues, to train and refine AI models that generate responses, predictions, or recommendations. In many cases, users are unaware of the extent to which their data is retained, reused, or mined to improve the platform’s intelligence or to create new services. The consent mechanisms employed are often broad, linking to comprehensive terms of service that few users scrutinise. This ambiguity, and the dynamic nature of AI training data sets, risks rendering DPDP’s core principles of informed and specific consent practically ineffective.
Consider a user interacting with a GenAI chatbot. The user may submit queries involving sensitive personal information. Under DPDP, the platform is legally required to clearly state how this data will be used, obtain explicit consent for any purpose beyond immediate response, and allow the user control over their data. However, GenAI systems typically store these conversations as training inputs to improve machine learning models without granular consent management or data segregation. This lack of transparency and control creates a scenario where DPDP safeguards could be circumvented—data intended only for one interaction may be reused for broader training, potentially exposing it to risks or uses never approved by the user.
Clearly, the mushrooming presence of GenAI platforms poses a compliance challenge that the DPDP Act’s current provisions may not fully address. That said, this does not mean that all uses of personal data within AI and digital ecosystems are equally threatened by the disconnect. One area where the law would continue to retain powerful teeth is targeted marketing and direct consumer communication. Platforms and companies that collect data for marketing purposes, be it targeted advertising, product recommendations, or personalised sales outreach, would still need to comply with DPDP rules. Targeted marketing is an easier use case to monitor because it typically involves distinct, purpose-driven data flows that can be audited.
The real challenge, however, would lie in the blurred line between using data for AI training and using it for marketing. Data collected ostensibly for improving AI outputs can indirectly feed consumer profiling, fuelling marketing algorithms that target users with tailored advertisements or offers. This means the DPDP Act would retain influence, but enforcing it would require sophisticated tools and cross-sector coordination to ensure transparency and compliance.
So, while the DPDP Act is certainly an important step in data governance, seeking to calibrate the balance between technological innovation and individual privacy rights, the exponential growth of GenAI platforms has introduced complexities that expose gaps in its current implementation.
The ministry of electronics and information technology’s recent AI Governance Guidelines report acknowledges precisely this tension. It recognises that the DPDP Act’s stringent consent and purpose limitation norms do not neatly align with the fluid, multipurpose data training pipelines of GenAI. To tackle this, it proposes a calibrated approach involving enhanced transparency from AI service providers, strengthened grievance mechanisms, and evolving consent management and data portability tools. It also indicates openness to amendments and cross-sector regulatory coordination, with voluntary compliance frameworks that could become mandatory as the ecosystem matures.
However, it remains to be seen how far such measures will succeed in practice. Given the global architecture of GenAI platforms and their rapidly evolving data models, ensuring meaningful personal data protection may prove far more challenging than anticipated. Whatever course the future takes, one thing seems clear: for the government and citizens alike, the DPDP Act may not be as effective as it was once thought to be.
