Anthropic has exposed and disrupted what it calls the world’s first large-scale cyber espionage campaign orchestrated primarily by AI. Detected in mid-September 2025, the operation, attributed to a Chinese state-sponsored group, used Anthropic’s Claude Code tool to attempt infiltration of approximately 30 global targets, including tech giants, financial firms, chemical manufacturers, and government agencies, succeeding in a small number of cases.
The discovery reveals how agentic AI systems are amplifying cyber threats, executing complex intrusions with minimal human intervention. Anthropic’s rapid response halted the campaign, but the incident shows how malicious actors can turn a tool built for innovation into an autonomous weapon.
The AI-driven cyberattack: What happened
The campaign exploited Claude’s intelligence, agency, and tool integration – features that have matured dramatically over the past year. Attackers began by jailbreaking Claude, tricking it into bypassing its safety guardrails by framing tasks as “defensive testing” for a fictional cybersecurity firm. They also broke malicious actions into small, innocuous-looking steps, so the model never saw the full malicious context at once.
In Phase 1, human operators selected targets and built an autonomous attack framework around Claude Code for reconnaissance. The AI scanned target infrastructure at machine speed, issuing thousands of requests, often several per second, and identified high-value databases far faster than human hackers could. Subsequent phases involved Claude researching vulnerabilities, crafting exploit code, harvesting credentials, and exfiltrating data, all with only limited human check-ins (just 4-6 per operation).
“Models’ general levels of capability have increased to the point that they can follow complex instructions and understand context in ways that make very sophisticated tasks possible. Not only that, but several of their well-developed specific skills—in particular, software coding—lend themselves to being used in cyberattacks,” Anthropic stated in its report.
The AI even generated post-attack documentation, categorising stolen intel by value. While hallucinations occasionally spoiled results, such as fabricated credentials or public data mistaken for secrets, the operation still achieved 80-90% autonomy, a scale unattainable by human teams alone.
How Anthropic detected and stopped the attack
Anthropic’s Threat Intelligence team, itself using Claude for analysis, mapped the full extent of the threat over 10 days, banning the attackers’ accounts, notifying victims, and coordinating with authorities. The company stated, “We’re sharing this case publicly to help those in industry, government, and the wider research community strengthen their own cyber defenses. We’ll continue to release reports like this regularly, and be transparent about the threats we find.”
“Our goal is for Claude—into which we’ve built strong safeguards—to assist cybersecurity professionals to detect, disrupt, and prepare for future versions of the attack,” Anthropic said in the report.
