
NIST says China’s DeepSeek trails U.S. rivals in cybersecurity and reasoning and shows censorship and data‑sharing risks; experts urge cautious, controlled use.
A U.S. government assessment has raised red flags about Chinese AI vendor DeepSeek, finding its models underperform U.S. rivals on key security and reasoning measures and exhibit politically aligned censorship and data‑sharing behaviors.
Released Sept. 30 by the National Institute of Standards and Technology’s Center for AI Standards and Innovation (CAISI), the report says DeepSeek systems lag OpenAI’s GPT‑5 and Anthropic’s Claude Opus 4 across several benchmarks. NIST found the models are more vulnerable to “agent hijacking” attacks, in which an attacker attempts to capture credentials or redirect an agent’s actions, and more likely to follow malicious instructions.
CAISI said it produced the evaluation in response to President Donald Trump’s AI Action Plan, which asked the agency to assess models from China.
The report lands about a year after DeepSeek‑R1 briefly stunned the market by approaching Western performance while using far less compute and capital. Although DeepSeek’s open‑source releases never matched the popularity of ChatGPT, they drew a global user base and influenced rivals—reducing momentum for some open‑weight alternatives like Meta’s Llama and nudging OpenAI to offer more open options earlier in the year.
Performance is not uniformly weak, NIST noted. On science Q&A and knowledge tests, DeepSeek models performed comparably to U.S. peers; DeepSeek V3.1 ranked near the top on scientific, mathematical and reasoning tasks.
But American models generally led in software engineering and cybersecurity, according to CAISI’s findings.
Kashyap Kompella, CEO of RPA2AI Research, said the assessment shows how large language models tend to reflect the values and constraints of their creators. He argued censorship is structurally embedded in China‑based systems due to domestic rules and remains even when models are open‑sourced or locally hosted. He also noted that U.S. models are constrained by corporate guardrails and commercial considerations.
Some industry analysts questioned DeepSeek’s earlier performance claims. David Nicholson of Futurum Group said projections that enterprises would retool around custom GPU software stacks to replicate DeepSeek’s gains are unrealistic. The more pressing issue, he said, is how organizations can ensure security as they rely more on LLMs and generate fresh data continuously.
For companies that still want to test DeepSeek, Nicholson advised containing risk by accessing the models through enterprise‑grade platforms such as Amazon Bedrock or Microsoft Azure and operating within tightly controlled environments.
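As a rough illustration of that approach, the sketch below calls a DeepSeek model through Amazon Bedrock's Converse API with boto3, so traffic stays inside an AWS‑managed environment rather than going to DeepSeek's own endpoints. The model identifier and inference settings here are assumptions for illustration; the IDs actually available vary by account and region, so verify them in the Bedrock console before use.

```python
"""Minimal sketch: querying a DeepSeek model via Amazon Bedrock (assumed setup)."""

# Assumed model identifier -- confirm the exact ID enabled in your region/account.
MODEL_ID = "us.deepseek.r1-v1:0"


def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble keyword arguments for the bedrock-runtime converse() call."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.2},
    }


def ask(prompt: str) -> str:
    """Send a single-turn prompt. Requires AWS credentials and model access."""
    import boto3  # imported here so build_request() works without boto3 installed

    client = boto3.client("bedrock-runtime")
    response = client.converse(**build_request(prompt))
    # Converse responses nest the reply under output -> message -> content.
    return response["output"]["message"]["content"][0]["text"]
```

Routing requests through a managed gateway like this keeps logging, IAM access control, and network isolation in the enterprise's hands, which is the "tightly controlled environment" Nicholson describes.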
Taken together, NIST’s findings intensify scrutiny of DeepSeek at a time when global model ecosystems appear to be specializing—Chinese systems showing strength in scientific reasoning, and U.S. models leading in software and security.
For enterprise buyers, the takeaway is pragmatic: validate performance claims in real workloads, audit security and privacy behavior, and weigh geopolitical and compliance exposure before deployment.