The digital revolution driven by artificial intelligence (AI) appears to be shaping how global technological progress will unfold. With machine learning (ML) models considered central to it, researchers have raised the question of how these models can handle data from multiple modalities. To meet this challenge, multimodal machine learning (MML) mechanisms are being developed to process massive amounts of data from modalities such as audio, text, images and video, among others.

“I think MML refers to the field of AI that combines and analyses information from data modalities. By leveraging data sources, the models can gain an understanding of content, leading to improved performance in tasks such as image captioning, speech recognition, and sentiment analysis,” Vipin Vindal, CEO, Quarks Technosoft, an IT consulting and services provider, told FE Blockchain.

MML is understood to use artificial neural networks to process information, which allows these models to improve as they learn. According to Roboflow, a computer vision software company, an MML system comprises three parts: unimodal encoders, one for each individual modality; a fusion network that combines the features extracted from every input modality; and a classifier that makes predictions from the fused data. The company also cited text-to-image generation, visual question answering and natural language for visual reasoning as MML use cases.
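To make the three-part structure described by Roboflow more concrete, here is a minimal Python sketch under stated assumptions. The weights are hypothetical, hand-picked numbers rather than trained parameters, each encoder is assumed to be a single linear layer followed by ReLU, and fusion is assumed to be plain concatenation. None of this is taken from Roboflow or from any production MML system; it only illustrates how data flows from encoders to fusion to classifier.

```python
import math
from typing import Dict, List

Vector = List[float]


def linear(weights: List[Vector], x: Vector) -> Vector:
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]


# Hypothetical unimodal encoders: each maps one modality's raw features
# to a 2-dimensional embedding. Weights are illustrative, not trained.
ENCODERS: Dict[str, List[Vector]] = {
    "text": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],  # 3 text features -> 2
    "image": [[1.0, 0.0], [0.0, 1.0]],             # 2 image features -> 2
}

# Hypothetical classifier: 2 classes x 4 fused features (2 image + 2 text).
CLASSIFIER: List[Vector] = [
    [1.0, -1.0, 0.5, 0.5],
    [-0.5, 1.0, 1.0, -1.0],
]


def encode(modality: str, features: Vector) -> Vector:
    """Unimodal encoder: project one modality into its embedding space."""
    return relu(linear(ENCODERS[modality], features))


def fuse(embeddings: List[Vector]) -> Vector:
    """Fusion step, assumed here to be simple concatenation of embeddings."""
    fused: Vector = []
    for emb in embeddings:
        fused.extend(emb)
    return fused


def softmax(logits: Vector) -> Vector:
    """Turn classifier scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(inputs: Dict[str, Vector]) -> Vector:
    """Encode each modality, fuse the embeddings, then classify."""
    # Sort modality names so the fused vector has a fixed layout.
    embeddings = [encode(m, inputs[m]) for m in sorted(inputs)]
    return softmax(linear(CLASSIFIER, fuse(embeddings)))


if __name__ == "__main__":
    print(predict({"text": [1.0, 0.5, 2.0], "image": [0.2, 0.9]}))
```

Concatenation is only one possible fusion strategy; real systems may instead use attention or other learned fusion layers, and all three components would be trained jointly rather than hand-set as they are here.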

“MML can enhance depth and accuracy of risk assessment. In brand protection, this technology can enable real-time monitoring, identifying potential threats to brand reputation across multiple channels. Overall, advantages of MML should make it a tool in augmenting risk management effectiveness and resilience,” Sanjay Kaushik, managing director, Netrika Consulting, a digital forensics company, said.

In terms of benefits, AIMultiple, a technology industry analyst, highlighted that MML can make a model's abilities more human-like and increase its accuracy. However, according to Serokell, a software development company, MML also faces challenges, including how to represent the collected data, how to align different modalities and how to interpret the results.

Organisations reportedly using MML include Meta and Google, as well as researchers in Japan, among others. For example, Meta is working on an MML-backed digital assistant designed to interact in a more human-like way, which can convert images into text and vice versa. Researchers from Yahoo! Japan, the University of Tokyo and Mantra, an ML-based company, are also believed to have created an MML model that translates the text in the speech bubbles of Japanese comics.

According to Fortune Business Insights, a market research firm, the global ML market is projected to be worth $26.03 billion in 2023 and to reach $225.91 billion by 2030, a compound annual growth rate (CAGR) of 36.2% over the forecast period. ABI Research, a technology intelligence firm, estimated that the total number of projects installed with MML applications would reach 514.1 million in 2023. Infosys, a technology services firm, predicts that the future of MML will centre on improving human-machine communication. Applications expected to benefit from MML include advanced computer vision, cross-modal transfer learning and context-aware systems, among others.

“In summary, MML is a field that can leverage information to improve performance, enhance understanding, and unlock possibilities in applications. The future of MML can involve advancements in deep learning, ethical considerations, multimodal reinforcement learning, and interdisciplinary research,” Sumit Ghosh, co-founder and CEO, Chingari, a Web3.0 video application, concluded.