
NIST says China’s DeepSeek trails U.S. rivals in cybersecurity and reasoning and shows censorship and data‑sharing risks; experts urge cautious, controlled use.
A U.S. government assessment has raised red flags about Chinese AI vendor DeepSeek, finding its models underperform U.S. rivals on key security and reasoning measures and exhibit politically aligned censorship and data‑sharing behaviors.
Released Sept. 30 by the National Institute of Standards and Technology’s Center for AI Standards and Innovation (CAISI), the report says DeepSeek systems lag OpenAI’s GPT‑5 and Anthropic’s Claude Opus 4 across several benchmarks. NIST found the models are more vulnerable to “agent hijacking” attacks, in which an attacker attempts to capture credentials or redirect an agent’s actions, and more likely to follow malicious instructions.
CAISI said it produced the evaluation in response to President Donald Trump’s AI Action Plan, which asked the agency to assess models from China.
The report lands about a year after DeepSeek‑R1 briefly stunned the market by approaching Western performance while using far less compute and capital. Although DeepSeek’s open‑source releases never matched the popularity of ChatGPT, they drew a global user base and influenced rivals—reducing momentum for some open‑weight alternatives like Meta’s Llama and nudging OpenAI to offer more open options earlier in the year.
Performance is not uniformly weak, NIST noted. On science Q&A and knowledge tests, DeepSeek models performed comparably to U.S. peers; DeepSeek V3.1 ranked near the top on scientific, mathematical and reasoning tasks.
But American models generally led in software engineering and cybersecurity, according to CAISI’s findings.
Kashyap Kompella, CEO of RPA2AI Research, said the assessment shows how large language models tend to reflect the values and constraints of their creators. He argued censorship is structurally embedded in China‑based systems due to domestic rules and remains even when models are open‑sourced or locally hosted. He also noted that U.S. models are constrained by corporate guardrails and commercial considerations.
Some industry analysts questioned DeepSeek’s earlier performance claims. David Nicholson of Futurum Group said projections that enterprises would retool around custom GPU software stacks to replicate DeepSeek’s gains are unrealistic. The more pressing issue, he said, is how organizations can ensure security as they rely more on LLMs and generate fresh data continuously.
For companies that still want to test DeepSeek, Nicholson advised containing risk by accessing the models through enterprise‑grade platforms such as Amazon Bedrock or Microsoft Azure and operating within tightly controlled environments.
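As a rough illustration of that approach, the sketch below calls a DeepSeek model through Amazon Bedrock's Converse API with boto3, so traffic stays inside an AWS‑managed environment rather than going to DeepSeek's own endpoints. The model identifier and inference settings here are assumptions for illustration; the IDs actually available vary by account and region, so verify them in the Bedrock console before use.

```python
"""Minimal sketch: querying a DeepSeek model via Amazon Bedrock (assumed setup)."""

# Assumed model identifier -- confirm the exact ID enabled in your region/account.
MODEL_ID = "us.deepseek.r1-v1:0"


def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble keyword arguments for the bedrock-runtime converse() call."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }


def ask(prompt: str) -> str:
    """Send a single-turn prompt. Requires AWS credentials and model access."""
    import boto3  # imported here so build_request() works without boto3 installed

    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_request(prompt))
    # Converse responses nest the reply under output -> message -> content.
    return response["output"]["message"]["content"][0]["text"]
```

Routing requests through a managed gateway like this keeps logging, IAM access control, and network isolation in the enterprise's hands, which is the "tightly controlled environment" Nicholson describes.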
Taken together, NIST’s findings intensify scrutiny of DeepSeek at a time when global model ecosystems appear to be specializing—Chinese systems showing strength in scientific reasoning, and U.S. models leading in software and security.
For enterprise buyers, the takeaway is pragmatic: validate performance claims in real workloads, audit security and privacy behavior, and weigh geopolitical and compliance exposure before deployment.