New AI Models: Making a wider connection

Multimodal AI can improve customer understanding

Multimodal AI integrates and processes multiple modalities simultaneously

By Uma Ganesh

The impact and advantages of AI applications are expanding day by day, and businesses are trying to build unique value propositions that give them an edge in the marketplace. Most current AI systems are unimodal: they use one type of data and algorithms developed for that specific modality. ChatGPT, for instance, is built around text content, and the output it produces is also in text format.
Multimodal AI integrates and processes multiple modalities simultaneously and can produce more than one type of output. This new paradigm of AI sits at the intersection of computer vision, natural language processing (NLP) and audio processing, and promises a radical change in human-machine interaction. It can create new value for businesses by processing and generating insights from multiple data types.
Multimodal AI is essential for the development of robots, enabling them to interact with humans, other robots and the environment. This is made possible because data from multiple devices such as cameras, sensors, GPS and microphones are combined to guide the robots.
With multimodal AI, the understanding of the customer profile could become more refined by combining customer feedback, voice conversations, sentiment analysis of social media interactions and usage patterns on the website. The combination of voice recognition, NLP and generative AI could produce efficient summaries and notes of meeting proceedings. In healthcare, by blending medical images and patient history with genetic information and diagnostic data, a multimodal AI solution could make patient treatment more focussed.
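The idea of combining signals from different channels into one refined customer view can be sketched as a simple late-fusion step: each modality is scored on its own, and the scores are then merged. The function names, word lists and weights below are hypothetical, chosen purely for illustration; real systems would use trained models for each modality.

```python
# A minimal late-fusion sketch (hypothetical names and weights):
# score each modality separately, then combine into one estimate.

def text_sentiment_score(feedback: str) -> float:
    """Toy sentiment scorer: share of positive words among sentiment words."""
    positive = {"great", "good", "love", "helpful"}
    negative = {"bad", "slow", "poor", "broken"}
    words = feedback.lower().split()
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    total = pos + neg
    return 0.5 if total == 0 else pos / total

def behaviour_score(pages_viewed: int, minutes_on_site: float) -> float:
    """Toy website-usage signal, capped at 1.0."""
    raw = pages_viewed * 0.1 + minutes_on_site * 0.02
    return min(raw, 1.0)

def fused_profile(feedback: str, pages_viewed: int, minutes_on_site: float,
                  w_text: float = 0.6, w_behaviour: float = 0.4) -> float:
    """Weighted late fusion of the two unimodal scores."""
    return (w_text * text_sentiment_score(feedback)
            + w_behaviour * behaviour_score(pages_viewed, minutes_on_site))

score = fused_profile("great support but slow checkout",
                      pages_viewed=12, minutes_on_site=30.0)
```

Late fusion keeps each modality's pipeline independent, which is why it is often the easiest starting point; deeper multimodal models instead learn a joint representation of all inputs at once.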
At present, designing multimodal AI applications is more challenging than designing unimodal ones. Firstly, clean datasets that cover multiple modalities are not easily available, and without such datasets, training large-scale models would not be possible. Further, the data from each modality required to construct the models comes in different formats and representation methods. The effort involved in data synchronisation, alignment and ensuring consistent data quality can therefore be complex and time consuming.
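The synchronisation problem mentioned above can be made concrete with a small sketch: records from two modalities arrive with their own timestamps, and each text event must be paired with the sensor reading closest in time. The data and function below are hypothetical, for illustration only.

```python
# A minimal cross-modality alignment sketch (hypothetical data):
# pair each timestamped text event with the nearest sensor reading.

from bisect import bisect_left

def align(text_events, sensor_readings):
    """Pair each (timestamp, text) with the value of the nearest
    (timestamp, value) sensor reading."""
    sensor_readings = sorted(sensor_readings)
    times = [t for t, _ in sensor_readings]
    pairs = []
    for t, text in text_events:
        i = bisect_left(times, t)
        # consider the readings just before and just after t
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        j = min(candidates, key=lambda k: abs(times[k] - t))
        pairs.append((text, sensor_readings[j][1]))
    return pairs

aligned = align(
    text_events=[(1.0, "door opened"), (4.2, "alarm raised")],
    sensor_readings=[(0.9, 20.1), (2.0, 20.3), (4.0, 25.7)],
)
```

Nearest-timestamp matching is only the simplest alignment strategy; continuous streams such as audio and video usually need resampling to a common clock before they can be fused.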
Additionally, while building AI models, the use of multiple algorithms and types of computation, one for each modality, could pose challenges when deploying at scale.
Ongoing research studies are addressing the challenges outlined above, and it is only a matter of time before new tools and techniques are available to migrate from unimodal to multimodal AI. As the models become more sophisticated, innovative applications will emerge. As human-computer interactions improve, recommendations and decision-making support will become more impactful. In particular, multimodal AI that combines generative and predictive AI could make businesses more proactive and resilient.

The author is chairperson, Global Talent Track, a corporate training solutions company



This article was first uploaded on March 10, 2024, at 2:18 pm.