By Siddharth Pai


The company Adversa.AI is an “ethical hacking” firm focused on artificial intelligence solutions. Its website (bit.ly/3CFCaCv) claims as follows: “Adversa AI is the world-leading Trusted AI Research and Deployment Startup working on applied security measures for artificial intelligence. With a team of multi-disciplinary experts in mathematics, data science, cybersecurity, neuroscience, and psychology Adversa AI is uniquely able to provide holistic, end-to-end support for the entire AI Trust Risk & Security Management.”

Some weeks ago, Wired magazine ran a sensational special (bit.ly/3Ng5yUN) about the hacking of ChatGPT. It turns out that Adversa.AI has been able to build a “universal” jailbreak that can crack open multiple Large Language Models (LLMs), including GPT-4, Google’s Bard, Anthropic’s Claude, and Microsoft’s Bing Chat system. According to Wired, when OpenAI released GPT-4, its latest text-generating chatbot model, in March, it took Alex Polyakov, the boss of Adversa.AI, less than two hours to break into it. From there it was easy to get the model to spew out hateful statements against different classes of people, generate phishing emails, and endorse violence.

I urge you to visit Adversa.AI’s website, which goes into detail about how the firm created this universal “jailbreak” for generative AI systems. The efficiency with which a single set of commands can flummox these models is genuinely surprising, and it gives us an object lesson in the vulnerability of these systems (bit.ly/3PvQi8O). If you have the time, you can dally with Dall-E (pun intended) online, as I did, and ask a generative AI system about the dangers represented by such jailbreaks. Here are a few:

Unravelling Pandora’s box: Generative AI systems have garnered significant attention due to their ability to create, replicate, and learn from vast amounts of data. These AI algorithms have revolutionised various fields, from art and music to language and content creation. However, these systems are carefully designed, trained, and controlled to prevent malicious use or unintended consequences. Jailbreaking generative AI involves tampering with these safeguards, thereby opening Pandora’s digital box.

Destruction of integrity: One of the foremost dangers of jailbreaking generative AI lies in compromising the integrity of the AI system. By circumventing security measures and altering the underlying architecture, individuals can manipulate the system to generate misleading or harmful outputs. This can have severe implications in areas such as fake news propagation, identity theft, or the creation of highly realistic deep-fake videos, which could undermine public trust and wreak havoc on societal stability.

Ethical quagmires: Jailbreaking generative AI also raises complex ethical dilemmas. When individuals gain unrestricted access to these AI systems, they can exploit them for personal gain or malicious purposes. Unauthorised modification can lead to the generation of offensive, discriminatory, or violent content, amplifying existing biases and perpetuating harmful stereotypes. This not only erodes the ethical foundations of AI development but also exacerbates societal divisions and prejudices.

Security breaches: Unauthorised tampering with generative AI systems opens the floodgates to potential security breaches. By jailbreaking AI algorithms, individuals bypass the safety mechanisms put in place by developers, leaving the systems vulnerable to exploitation by cybercriminals. These compromised systems can then be utilised to launch large-scale attacks, compromise sensitive data, or even manipulate financial markets. The consequences of such breaches are far-reaching, affecting individuals, organisations, and economies on a global scale.

Intellectual property concerns: Jailbreaking generative AI poses significant challenges to intellectual property rights. AI models developed by companies or research institutions are often protected by copyrights and patents, ensuring fair use, attribution, and the proper distribution of rewards. When individuals jailbreak these systems, they undermine these protections and potentially profit from unauthorised use or distribution of copyrighted content. This not only stifles innovation and disincentivises AI research but also jeopardises the economic viability of AI development.

Regulatory nightmares: The rise of jailbreaking generative AI presents regulatory challenges for governments worldwide. Establishing effective guidelines and policies to curb unauthorised access and modification of AI systems is a complex task. Striking the right balance between innovation and security is crucial, as overly restrictive regulations may impede progress, while lax measures can enable misuse and endanger public safety. Policymakers must navigate this intricate landscape to safeguard society from the perils of unchecked jailbreaking.

The allure of unleashing the full potential of generative AI is strong, and consultants are having a field day playing futurist to all manner of organisations, since generative AI (unlike the far more potent AI already lurking in our cell phones, for example) has caught the imagination of the laity, many of whom hold responsible jobs and need to respond to the opportunities and threats it poses to their organisations. A slick McKinsey study doing the rounds on WhatsApp goes into detail on what the technology can achieve in sectors such as consumer packaged goods, banking, and pharmaceuticals. It promises significant increases in productivity if generative AI is employed early, and arrives at a net increase of 3.3% in global productivity (provided, of course, that human labour hours are ‘redeployed effectively’).

But all is not rosy; the dangers of jailbreaking these systems cannot be overlooked. The consequences, ranging from compromised security to ethical dilemmas and legal ramifications, pose a significant threat to individuals, organisations, and society at large. By adhering to established ethical frameworks, collaborating on responsible AI development, and prioritising transparency and accountability, we can unlock the true potential of AI while mitigating the risks associated with jailbreaking generative AI.

Can the world do this? The cynic in me doesn’t think so.

The author is a technology consultant and venture capitalist

By invitation