Anthropic Exposes Coordinated AI Distillation Attacks from Three Labs

The AI safety and research company Anthropic has identified industrial-scale campaigns by three AI laboratories to illicitly extract capabilities from its flagship model, Claude. The operations involved over 16 million interactions across approximately 24,000 fraudulent accounts, raising significant concerns about national security and the proliferation of unsafe AI systems.

The companies identified are DeepSeek, Moonshot AI, and MiniMax. Anthropic stated that these labs used a technique known as "distillation" to train their own models on Claude's outputs, effectively stealing sophisticated capabilities in a fraction of the time and cost required for independent development.

Key Takeaways

Anthropic identified three AI labs—DeepSeek, Moonshot AI, and MiniMax—conducting illicit "distillation attacks."
The campaigns generated over 16 million exchanges using 24,000 fraudulent accounts to access the Claude AI model.
This practice poses national security risks by creating powerful AI models that lack the safety features built into the original systems.
The attacks were designed to copy Claude's most advanced capabilities, including reasoning, coding, and tool use.
Anthropic is calling for a coordinated response from the AI industry, cloud providers, and policymakers to address the threat.

Uncovering Industrial-Scale AI Capability Theft

Anthropic announced on February 23, 2026, that it had uncovered and stopped three separate, large-scale campaigns aimed at systematically copying the core functionalities of its Claude AI. The company attributed the attacks to three specific AI laboratories: DeepSeek, Moonshot AI, and MiniMax.

These campaigns were not casual use but highly organized efforts. According to the investigation, the labs created thousands of fraudulent accounts and used proxy services to hide their activities and bypass regional access restrictions. Anthropic does not offer commercial access to Claude in China, where these labs are based.

What is AI Distillation?

Distillation is a common and legitimate technique in machine learning. It involves training a smaller, less capable AI model on the outputs of a larger, more powerful one. Legitimate uses include creating more efficient or specialized versions of a company's own models. However, when used by a competitor, it becomes a method to acquire advanced capabilities without undertaking the original research and development, a practice that violates terms of service and intellectual property norms.

The sheer volume of the interactions—over 16 million in total—and the specific nature of the prompts indicated a clear intent to extract training data rather than use the service for its intended purpose. Anthropic has called this a growing threat that requires immediate and coordinated action across the industry.

The National Security Implications

The primary concern raised by Anthropic extends beyond corporate espionage. The company warns that illicitly distilled models present a significant national security risk. AI systems like Claude are developed with extensive safeguards to prevent misuse, such as aiding in the development of bioweapons or executing malicious cyberattacks.

When a model's capabilities are distilled, these crucial safety features are often stripped away. This means that powerful, unprotected AI capabilities could be integrated into military, intelligence, and surveillance systems by authoritarian governments.

By the Numbers: The Scale of the Attacks

Total Interactions: Over 16 million
Fraudulent Accounts: Approximately 24,000
MiniMax: Over 13 million exchanges
Moonshot AI: Over 3.4 million exchanges
DeepSeek: Over 150,000 exchanges

These stripped-down models could be used for offensive cyber operations, sophisticated disinformation campaigns, or mass surveillance. The risk is compounded if these distilled models are open-sourced, as the dangerous capabilities could then proliferate uncontrollably beyond the control of any single entity.

"Foreign labs that distill American models can then feed these unprotected capabilities into military, intelligence, and surveillance systems—enabling authoritarian governments to deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance."

Furthermore, these attacks undermine the effectiveness of export controls designed to maintain a competitive advantage in AI technology for the United States and its allies. The apparent rapid progress of some foreign labs may not be due to genuine innovation but rather to their success in illicitly extracting capabilities from leading American models.

A Closer Look at the Attackers

Anthropic provided details on the specific methods and targets of each of the three campaigns. The company stated it attributed the operations to each lab with high confidence using IP addresses, request metadata, and other infrastructure indicators.

DeepSeek's Coordinated Campaign

DeepSeek's operation involved over 150,000 exchanges and was notable for its synchronized traffic across multiple accounts. This suggested a coordinated "load balancing" effort to maximize data extraction while avoiding detection. Their prompts specifically targeted Claude's reasoning capabilities and its ability to act as a reward model for reinforcement learning.

A key technique involved asking Claude to explain its internal reasoning step-by-step, effectively generating high-quality "chain-of-thought" training data. Investigators also observed DeepSeek using Claude to generate censorship-safe responses to politically sensitive topics, likely to train its own models to avoid discussing censored subjects.

Moonshot AI's Multi-Faceted Approach

The campaign from Moonshot AI was larger, with over 3.4 million exchanges. It used hundreds of fraudulent accounts across different access pathways, making the coordinated effort more difficult to detect. This operation focused on extracting agentic reasoning, tool use, and coding capabilities. In later stages, the attacks became more targeted, attempting to reconstruct Claude's internal reasoning traces.

MiniMax's Real-Time Pivot

The largest campaign, attributed to MiniMax, generated over 13 million exchanges focused on agentic coding and tool use. Anthropic detected this operation while it was still active, providing unique insight into the attack's lifecycle. When Anthropic released an updated model during the campaign, MiniMax pivoted its operations within 24 hours, redirecting nearly half of its traffic to begin extracting capabilities from the new system immediately.

How the Attacks Were Executed

To circumvent regional blocks, the labs utilized commercial proxy services that resell access to frontier AI models. These services operate what Anthropic calls "hydra cluster" architectures—vast networks of fraudulent accounts that distribute traffic across multiple platforms to avoid single points of failure. In one instance, a single proxy network managed over 20,000 fraudulent accounts.

The attacks are identified by their distinct patterns. While a single prompt might seem harmless, the repetition of thousands of similar, structured prompts across hundreds of accounts reveals the intent. Hallmarks of a distillation attack include:

Massive volume concentrated on a few key capabilities.
Highly repetitive prompt structures.
Content that maps directly to what is most valuable for AI model training.

Anthropic's Response and Call to Action

In response to these findings, Anthropic is investing heavily in new defense mechanisms. These include advanced detection systems to identify distillation patterns and behavioral fingerprinting to spot coordinated activity. The company is also strengthening verification processes for new accounts, a common vector for creating fraudulent access.

However, the company emphasizes that this is not a problem one firm can solve alone. It is actively sharing technical indicators with other AI labs, cloud providers, and government authorities to create a more comprehensive defense against these threats.

By publishing its findings, Anthropic aims to raise awareness across the entire AI ecosystem. The announcement serves as a call for a unified front among industry players and policymakers to develop robust standards and enforcement mechanisms to prevent the illicit proliferation of powerful, unsafe AI technologies.

Key Takeaways

Anthropic identified three AI labs—DeepSeek, Moonshot AI, and MiniMax—conducting illicit "distillation attacks."
The campaigns generated over 16 million exchanges using 24,000 fraudulent accounts to access the Claude AI model.
This practice poses national security risks by creating powerful AI models that lack the safety features built into the original systems.
The attacks were designed to copy Claude's most advanced capabilities, including reasoning, coding, and tool use.
Anthropic is calling for a coordinated response from the AI industry, cloud providers, and policymakers to address the threat.

Uncovering Industrial-Scale AI Capability Theft

What is AI Distillation?

The National Security Implications

By the Numbers: The Scale of the Attacks

Total Interactions: Over 16 million
Fraudulent Accounts: Approximately 24,000
MiniMax: Over 13 million exchanges
Moonshot AI: Over 3.4 million exchanges
DeepSeek: Over 150,000 exchanges

"Foreign labs that distill American models can then feed these unprotected capabilities into military, intelligence, and surveillance systems—enabling authoritarian governments to deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance."

Key Takeaways

Uncovering Industrial-Scale AI Capability Theft

What is AI Distillation?

The National Security Implications

By the Numbers: The Scale of the Attacks

A Closer Look at the Attackers

DeepSeek's Coordinated Campaign

Moonshot AI's Multi-Faceted Approach

MiniMax's Real-Time Pivot

How the Attacks Were Executed

Anthropic's Response and Call to Action

Related Articles

Wiz Finalizes Google Merger, Affirms Multi-Cloud Strategy

AI-Generated Fake War Footage Floods Social Media

AI Models Corrupted by Just 6,000 Lines of Flawed Code

OpenAI Acquires Promptfoo to Bolster AI Security

Key Takeaways

Uncovering Industrial-Scale AI Capability Theft

What is AI Distillation?

The National Security Implications

By the Numbers: The Scale of the Attacks

A Closer Look at the Attackers

DeepSeek's Coordinated Campaign

Moonshot AI's Multi-Faceted Approach

MiniMax's Real-Time Pivot

How the Attacks Were Executed

Anthropic's Response and Call to Action