At its annual I/O developer conference, Google announced Gemini Omni as its next-generation AI model capable of generating and editing highly realistic video outputs from virtually any form of media input.
Touted by the tech giant as the “next big step” in AI, Gemini Omni is designed as a comprehensive world model. Unlike previous text-to-video platforms, such as Google’s own Veo, Omni is a true multimodal system built on the company’s core Gemini architecture. It utilises advanced reasoning and simulated physics to interpret text, images, and existing video clips simultaneously, producing highly consistent and sophisticated cinematic results grounded in real-world logic.
At launch, the tool allows users to generate and modify videos using mixed media inputs. Google confirmed that features allowing Omni to output standalone text and images will come in a future update.
Gemini Omni brings powerful editing features
Beyond generating content from scratch, Gemini Omni introduces advanced video editing capabilities that could fundamentally change how digital video is produced. Users can feed an Omni-generated clip back into the interface and make complex changes via simple text prompts. They can also upload their own original footage to alter or swap out specific elements.
While the technology offers creators considerable flexibility, the ability to alter footage so effortlessly has raised immediate concerns about the potential spread of deepfakes and misinformation.
To combat these risks, Google announced built-in safety guardrails. All content generated or altered by Gemini Omni will automatically embed Google’s proprietary SynthID watermark, so that the media can be identified as having been created or modified by AI.
Gemini Omni: Rollout and integration
Google is deploying Gemini Omni across its consumer and developer ecosystems using a phased approach. The model will be split into two tiers:
– Omni Flash
– Omni Pro
Omni Flash is available immediately, while the more powerful Omni Pro model remains in development for a future release.
Omni Flash features are available to select paid subscribers on YouTube Shorts and Google Flow. The technology has also been integrated into a newly redesigned Gemini consumer app, allowing users to apply built-in templates directly to their mobile camera rolls or create personalised digital avatars that mimic their exact look and voice.
Google expects to roll out Gemini Omni to enterprise clients and developers via specialised application programming interfaces (APIs) in the coming weeks, opening the door for widespread third-party software integration.
