Anthropic Claude Opus 4.5 released: How it compares to ChatGPT 5.1 and Google Gemini 3.0

On a proprietary 2-hour take-home exam designed for prospective engineering hires, Claude Opus 4.5 outperformed even top human candidates in technical skills and judgment under pressure. 

anthropic claude 4.5
Opus 4.5 touted as the company's most robustly aligned model yet.

After the AI salvo fired by Google and OpenAI in the last few weeks, Anthropic has jumped into the arena with Claude Opus 4.5. Anthropic says it is the world’s most advanced model for coding, autonomous agents, and computer-use tasks. The release marks a significant leap in AI capabilities, directly targeting rivals like OpenAI’s ChatGPT and Google’s Gemini with better benchmark scoring in real-world engineering and agentic performance. 

At the core of this launch is Opus 4.5’s performance on SWE-bench Verified, a benchmark simulating real-world software engineering challenges. The model achieved an impressive 80.9% accuracy, which is the first to surpass the 80% threshold, thus outshining Google’s Gemini 3 Pro at 76.2% and OpenAI’s GPT-5.1 Codex Max at 77.9%. 

This isn’t just an incremental upgrade – it’s a milestone for AI to accelerate it’s role in code generation and debugging, potentially automating routine tasks that once required hours of human effort.

Anthropic Claude Opus 4.5: What does it offer

On a proprietary 2-hour take-home exam designed for prospective engineering hires, Claude Opus 4.5 outperformed even top human candidates in technical skills and judgment under pressure. 

“The take-home test is designed to assess technical ability and judgment under time pressure,” Anthropic noted in its release. “It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over the years. But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession.” 

For agentic AI, systems that act independently to complete multi-step tasks, Opus 4.5 dominates the τ2-bench evaluation. In a simulated scenario as an airline service agent handling a distressed customer, the model creatively upgraded the cabin class first before legitimately modifying flights, solving the issue where competitors might rigidly refuse changes to basic economy bookings. This demonstrates enhanced reasoning and adaptability, making it ideal for applications in customer support, virtual assistants, and automated workflows.

Claude Opus 4.5 prioritises safety

Safety remains a priority of Anthropic’s approach, with Opus 4.5 touted as the company’s most robustly aligned model yet. It shows marked improvements in resisting prompt injection attacks, deceptive inputs that trick AIs into harmful actions. 

“With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behaviour,” the firm stated. “Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry.”

Claude Opus 4.5 is rolling out via the Claude app on Android and iOS, the Claude website, and directly to developers through APIs. Premium access for enterprise users will start around $20 per month, similar to prior Opus iterations. Free tiers will offer limited usage to entice individual creators and hobbyists.

This article was first uploaded on November twenty-five, twenty twenty-five, at three minutes past eight in the night.