– By Jayachandran Ramachandran
Recent advances in Generative AI have created new avenues for reimagining business processes and identifying use cases that can create immense impact in every business function and stream of work. Every organization is striving to adopt GenAI technology to reinvent business models, improve operations, enhance customer experience, and maximize benefits. The speed of this innovation has brought back the issue of "AI alignment", one of the key challenges hindering large-scale AI adoption in every walk of life.
AI alignment is about ensuring AI systems do what we intend them to do, without unintended consequences. It is a field of ongoing research aimed at ensuring AI meets its goals and objectives ethically, without endangering the consumers of its applications. It is a foregone conclusion that AI will become more powerful in the coming years. With such rapid progress on the horizon, the big question is: "Does AI align with human values and norms without causing societal and ethical issues?" Many research groups have raised concerns, and some have even called for a brief pause in development while proper guidelines and regulatory frameworks are put in place.
AI alignment problem and challenges
The share of AI in every device and application we use is increasing day by day. The smartphone is a classic example, with capabilities such as fingerprint recognition, face detection, camera filters, auto-suggest/correct, voice assistants and many more. Often, AI mediates between humans and the external world without us being aware of it. As AI continues to augment human intelligence and drive much of our decision-making, the lines between AI and humans are blurring at a rapid rate. For example, we are evolving from being consumers of traditional chatbots to users of co-pilots, and soon we will be in autopilot mode!
While on the surface AI systems may seem to work as intended, there is no guarantee that their output is valid for all scenarios. The challenge is that it is difficult to anticipate every possible scenario, and the desired behavior for each, while training AI systems. AI systems are only as good as the data they are trained on and the algorithms they use.
Training AI models for all possible scenarios
Machine learning involves multiple ways of training AI models, such as supervised, semi-supervised, unsupervised, and reinforcement learning. In supervised learning, the algorithm learns from labeled data and optimizes an objective function that defines success. The challenge is to obtain an "all-encompassing" dataset that represents every permutation and combination of scenarios. This is practically impossible, as new scenarios keep emerging and AI models cannot be retrained in real time to accommodate constant change. The problem is no different in other forms of training, such as semi-supervised and unsupervised learning.
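As a rough illustration of this coverage gap, consider the following sketch (hypothetical data; scikit-learn assumed as the toolkit): a classifier trained on a narrow slice of scenarios still produces confident predictions on inputs far outside anything it has seen.

```python
# Minimal sketch: a supervised model optimizes its objective on the data it sees,
# but offers no guarantee on scenarios absent from the training set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training data covering only a narrow slice of scenarios
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)  # optimizes log-loss on seen data

# "New" scenarios far outside the training distribution
X_new = rng.normal(loc=5.0, scale=3.0, size=(100, 2))
print("In-distribution accuracy:", model.score(X_train, y_train))
print("Confidence on unseen scenarios:", model.predict_proba(X_new).max(axis=1).mean())
# The model still emits highly confident predictions on X_new, even though nothing
# in its training certifies that those predictions are valid there.
```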
Exploiting the ‘reward shaping’ method
Reinforcement learning provides an alternative to having large volumes of labelled data. An agent learns by interacting with its environment through trial and error, guided by an objective function that maximizes rewards (or minimizes penalties) earned through its actions. Many gaming applications, robotic systems, and product recommendation systems use this approach.
The current generation of Generative AI models uses RLHF (Reinforcement Learning from Human Feedback). Reward shaping is an ongoing activity and is required to fine-tune the models. However, there are many challenges with this approach as well. When the objective of an RL model is to maximize reward points, will it take shortcuts and bypass defined constraints, leading to a failed reward framework? Will it forget its end objective, exploit the system, and behave in a way that simply accumulates points? Defining a robust reward framework is difficult even in lab settings; real-world settings are far more complex, and the risks posed by such systems are unprecedented. This can lead to the misalignment of AI systems that had previously been aligned in simulated environments or lab settings.
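To make the shortcut risk concrete, here is a purely illustrative toy calculation (all rewards and counts invented): if a shaped intermediate bonus can be farmed repeatedly, a policy that exploits it can outscore a policy that actually completes the task.

```python
# Toy illustration of reward hacking: a shaped intermediate bonus that can be
# farmed repeatedly ends up outscoring genuinely finishing the task.
TASK_REWARD = 10.0        # reward for truly completing the task
SHAPING_BONUS = 0.5       # bonus intended to reward intermediate progress
STEPS_PER_EPISODE = 100

def honest_policy_return() -> float:
    # Completes the task once, collecting a few legitimate shaping bonuses on the way.
    return 5 * SHAPING_BONUS + TASK_REWARD

def exploit_policy_return() -> float:
    # Loops on the state that triggers the shaping bonus and never finishes the task.
    return STEPS_PER_EPISODE * SHAPING_BONUS

print("Honest policy return :", honest_policy_return())   # 12.5
print("Exploit policy return:", exploit_policy_return())  # 50.0
# A reward-maximizing learner would prefer the exploit policy, i.e. the reward
# framework fails even though every individual rule seemed reasonable.
```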
Some examples of misaligned systems that take shortcuts to achieve their objective, leading to undesirable output or unintended impact:
- A system increases the number of visits to the website and enhances customer engagement, but does not result in sales.
- A news feed recommends more negative news because it attracts more eyeballs than positive news, leading to societal issues.
- If a passenger sets a time constraint to reach a destination, an autonomous car may drive rashly because its objective is to meet that constraint.
- A chatbot provides a toxic answer to a toxic question.
- A coding assistant given bad-quality code as input produces bad-quality code as output.
- A brokerage application advises its client to take mindless investment risks to maximize profits.
- A model provides hallucinated responses that are plausible enough to pass as fact.
There can be many such examples in every walk of life. Without the complete context of evolving human values, societal norms and ethical considerations, AI systems can become a bane rather than a boon. They may learn the wrong things instead of the right things.
How do we evolve and overcome the alignment issues?
Solving the alignment problem is an ongoing area of research, and multiple approaches are being recommended and explored for training AI models.
In the case of Reinforcement Learning-based AI systems:
Qualitative over quantitative: A qualitative reward mechanism may be better than a quantitative one for AI systems trained using RL. When humans make decisions, we do not attribute a score to every task or action; it is more of a relative ranking while weighing the options, such as "like it", "it is ok", "not great", "dislike", etc. Such qualitative assessments help create a broader context for learning than narrow points-based systems.
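One way such qualitative feedback can be operationalized, broadly in the spirit of reward modelling for RLHF, is to learn a scoring function from pairwise "A is preferred over B" judgements rather than absolute points. A minimal sketch, assuming toy features and PyTorch:

```python
# Minimal sketch of learning a reward model from qualitative pairwise preferences
# (Bradley-Terry style) instead of hand-assigned numeric scores.
import torch
import torch.nn as nn

torch.manual_seed(0)

reward_model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Hypothetical data: each pair holds features of a preferred and a rejected response.
preferred = torch.randn(64, 4) + 0.5
rejected = torch.randn(64, 4) - 0.5

for _ in range(200):
    r_pref = reward_model(preferred)
    r_rej = reward_model(rejected)
    # Maximize the probability that the preferred item outscores the rejected one.
    loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("Mean reward (preferred):", reward_model(preferred).mean().item())
print("Mean reward (rejected) :", reward_model(rejected).mean().item())
```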
Break down tasks into smaller ones: Transform a single objective function into a combination of multiple smaller objective functions, with qualitative rewards for each task. For instance, a co-pilot for an e-commerce customer would help the customer search for products, identify the most relevant ones, and collect the best offers and payment options. Each of these is a sub-task of the larger task of converting a browsing customer into a buying customer.
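A simple sketch of that decomposition (task names, labels, and weights are hypothetical): the overall training signal becomes an aggregate of qualitative assessments of each sub-task rather than a single score.

```python
# Sketch: one large objective expressed as a set of sub-tasks, each judged
# qualitatively, then aggregated. Task names, labels, and weights are hypothetical.
QUALITATIVE_SCALE = {"dislike": 0.0, "not great": 0.25, "it is ok": 0.5, "like it": 1.0}

def episode_reward(assessments: dict[str, str]) -> float:
    """Aggregate qualitative per-sub-task feedback into a single training signal."""
    weights = {"search": 0.2, "relevance": 0.3, "offers": 0.2, "checkout": 0.3}
    return sum(weights[task] * QUALITATIVE_SCALE[label]
               for task, label in assessments.items())

# Example: the co-pilot searched well and found relevant products,
# but the offers and checkout experience were weaker.
print(episode_reward({"search": "like it", "relevance": "like it",
                      "offers": "it is ok", "checkout": "not great"}))
```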
Inverse reinforcement learning: Adopt an inverse reinforcement learning-driven approach, in which the agent infers and creates its own reward function from observed behaviour in specific situations and scenarios.
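A highly condensed sketch of the feature-matching idea behind many IRL methods (toy data, linear reward assumed): reward weights are pushed toward the state features that demonstrated behaviour visits more often than the agent's current behaviour.

```python
# Condensed sketch of the feature-matching idea behind inverse RL: infer reward
# weights w for a linear reward R(s) = w . phi(s) from demonstrated behaviour.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vectors of states visited by an expert vs. the current agent.
expert_features = rng.normal(loc=[1.0, 0.2, 0.8], scale=0.1, size=(50, 3))
agent_features = rng.normal(loc=[0.3, 0.7, 0.1], scale=0.1, size=(50, 3))

mu_expert = expert_features.mean(axis=0)  # expert feature expectations
mu_agent = agent_features.mean(axis=0)    # current policy's feature expectations

# One gradient step toward features the expert prefers; a full IRL loop would
# alternate this update with re-planning the agent's policy under the new reward.
w = 0.1 * (mu_expert - mu_agent)
print("Inferred reward direction:", np.round(w, 3))
```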
Intrinsic motivation: Build agents that are novelty seeking and internally motivated, without any stimulus from the external environment. Let the systems make mistakes and learn about the real world through trial and error.
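One common, much-simplified way to encode such intrinsic motivation is a count-based novelty bonus: rarely visited states earn extra internal reward, independent of any external stimulus. A toy sketch:

```python
# Toy sketch of an intrinsically motivated agent: a count-based novelty bonus
# rewards rarely visited states, with no reward coming from the external environment.
from collections import Counter
import math

visit_counts: Counter = Counter()

def intrinsic_reward(state) -> float:
    """Higher reward for novel states; decays as the state becomes familiar."""
    visit_counts[state] += 1
    return 1.0 / math.sqrt(visit_counts[state])

total = sum(intrinsic_reward(s) for s in ["A", "A", "B", "A", "C", "C", "D"])
print("Curiosity-driven return:", round(total, 2))
# Novel states (B, C, D on first visit) contribute more than repeated visits to A.
```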
In the case of Supervised Learning-based AI systems:
Data quality: Improve the quality of the training data and identify what-if scenarios and edge cases. The focus should be on quality rather than quantity, on scenario coverage, and on training data that is representative of real-world data.
Calculated tradeoffs: Strike the right tradeoff between overfitting and underfitting the model.
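One standard way to operationalize this tradeoff is to choose model complexity, here a regularization strength with scikit-learn assumed as the toolkit, by comparing validation rather than training performance:

```python
# Sketch: tuning the overfit/underfit tradeoff by validating regularization strength.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))                          # few samples, many features
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=60)     # hypothetical noisy target

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for alpha in [0.001, 1.0, 1000.0]:                     # low alpha risks overfit, high risks underfit
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:>7}: train R2={model.score(X_train, y_train):.3f}, "
          f"val R2={model.score(X_val, y_val):.3f}")
# Choose the alpha with the best validation score, not the best training score.
```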
All these approaches need a multi-disciplinary team consisting of practitioners from data engineering, data science, cognitive neuroscience, behavioral science, sociology, anthropology, etc. While remaining cautious about being overtaken by, and perhaps becoming subservient to, AI systems, we are entering a world of co-evolution in which humans and AI will evolve together. AI systems will be expected to embody heterogeneous human values, morality and societal norms, and to be held to higher standards than humans.
It’s better to aspire for more from AI systems than what we already have from humans! While human alignment is critical for universal adoption of AI, at an enterprise level we need business alignment, technology alignment, process alignment, cultural alignment, and regulatory alignment. The future of Generative AI is all about alignment.
(Jayachandran Ramachandran is the Senior Vice President (AI Labs) at Course5 Intelligence.)
(Disclaimer: Views expressed are personal and do not reflect the official position or policy of Financial Express Online. Reproducing this content without permission is prohibited.)
