A new evaluation by the U.S. National Institute of Standards and Technology (NIST) has found that artificial intelligence models from the Chinese developer DeepSeek are significantly behind leading American models in performance, cost-effectiveness, and security. The report, conducted by NIST's Center for AI Standards and Innovation (CAISI), highlights substantial vulnerabilities that could pose risks to consumers and national security.
The government analysis is part of a broader initiative to assess the capabilities and potential dangers of AI systems developed by foreign adversaries. The findings suggest a clear gap between U.S. and Chinese AI technologies, particularly in crucial areas like cybersecurity and resistance to malicious manipulation.
Key Takeaways
- A U.S. government report found that AI models from Chinese developer DeepSeek underperform compared to American counterparts.
- DeepSeek models were found to be less secure, more susceptible to hijacking, and more expensive for comparable performance.
- The most secure DeepSeek model was 12 times more likely to follow malicious commands than U.S. models in tests.
- Despite these issues, the global adoption of models from the People's Republic of China (PRC) has increased by nearly 1,000% since January 2025, driven by DeepSeek's release.
Federal Evaluation Details AI Disparities
The report was released by the Department of Commerce and NIST's CAISI, fulfilling a directive from President Donald Trump’s America’s AI Action Plan. This plan mandates the evaluation of frontier AI models, particularly those originating from the People's Republic of China (PRC), to assess capabilities and potential security threats.
“Thanks to President Trump’s AI Action Plan, the Department of Commerce and NIST’s Center for AI Standards and Innovation have released a groundbreaking evaluation of American vs. adversary AI,” said Secretary of Commerce Howard Lutnick.
“The report is clear that American AI dominates, with DeepSeek trailing far behind. This weakness isn’t just technical. It shows why relying on foreign AI is dangerous and shortsighted.”
CAISI's experts benchmarked three models from DeepSeek (R1, R1-0528, and V3.1) against four prominent U.S. models: OpenAI’s GPT-5, GPT-5-mini, and gpt-oss, along with Anthropic’s Opus 4. The comparison involved 19 different benchmarks covering a wide range of tasks and domains.
What is CAISI?
The Center for AI Standards and Innovation (CAISI) serves as a primary contact point within the U.S. government for the tech industry. It facilitates testing, collaborative research, and the development of best practices for commercial AI systems, aiming to secure and advance American leadership in the field.
Performance and Cost Analysis
The evaluation revealed a consistent performance gap between the U.S. and DeepSeek models across nearly all tested benchmarks. The best-performing U.S. model significantly outpaced DeepSeek's most advanced model, V3.1.
Software Engineering and Cyber Tasks
The disparity was most pronounced in technical fields. For tasks related to software engineering and cybersecurity, the top U.S. model successfully solved over 20% more problems than the best DeepSeek model. This suggests a considerable advantage for U.S. technology in developing and securing complex digital systems.
Economic Disadvantages
Beyond performance, the report identified a clear cost disadvantage for DeepSeek's models. According to the analysis, one of the U.S. reference models delivered a similar level of performance as the best DeepSeek model but at a cost that was, on average, 35% lower across 13 performance benchmarks. This indicates that U.S. models offer superior efficiency and economic value for developers and businesses.
Adoption on the Rise
Despite the identified shortcomings, the global use of PRC-developed AI models has grown rapidly. The report notes that downloads of DeepSeek models on popular model-sharing platforms have surged by nearly 1,000% since January 2025, signaling a growing international footprint.
Significant Security Vulnerabilities Uncovered
A critical focus of the CAISI evaluation was the security and safety of the AI models. The tests exposed severe vulnerabilities in DeepSeek's systems, making them highly susceptible to manipulation and malicious use compared to their U.S. counterparts.
Agent Hijacking Attacks
In simulated tests, AI agents built on DeepSeek's most secure model (R1-0528) were found to be 12 times more likely to obey malicious instructions designed to subvert their intended purpose. In the controlled environment, these hijacked agents performed harmful actions, including:
- Sending phishing emails
- Downloading and executing malware
- Exfiltrating user login credentials
This high susceptibility to hijacking presents a tangible risk for any application or user relying on these models for automated tasks.
Jailbreaking Susceptibility
The evaluation also tested the models' resistance to "jailbreaking," a technique used to bypass safety filters and generate harmful or prohibited content. When subjected to a common jailbreaking method, DeepSeek's R1-0528 model complied with overtly malicious requests 94% of the time. In stark contrast, the U.S. reference models only complied with 8% of such requests, demonstrating far more robust safety measures.
National Security and Influence Concerns
The report extends beyond technical performance and security, touching upon the potential for foreign AI models to be used for malign influence. The evaluation found that DeepSeek's models were more likely to propagate state-sponsored narratives.
According to the findings, the Chinese models echoed inaccurate and misleading narratives associated with the Chinese Communist Party (CCP) four times more often than the U.S. reference models did. This raises concerns about the potential for such AI systems to be used in disinformation campaigns or to subtly influence users.
Secretary Lutnick emphasized the importance of maintaining U.S. leadership in AI development. “By setting the standards, driving innovation, and keeping America secure, the Department of Commerce will ensure continued U.S. leadership in AI,” he stated. The report underscores the strategic importance of developing and promoting trusted, secure, and high-performing AI systems domestically.





