Major technology companies including Google DeepMind, OpenAI, and Anthropic are intensifying their efforts to address a fundamental security weakness in their artificial intelligence models. This vulnerability, known as an indirect prompt injection attack, allows malicious actors to embed hidden commands in external content like websites and emails, potentially tricking the AI into leaking confidential information or performing unauthorized actions.
The issue stems from the core design of large language models (LLMs), which are built to follow instructions without easily distinguishing between a legitimate user's command and a hidden, malicious one. As businesses and individuals increasingly integrate these AI tools, this security gap presents a growing risk of sophisticated scams and data breaches.
Key Takeaways
- Top AI developers are working to fix a critical flaw called "indirect prompt injection" in their models.
- This vulnerability allows attackers to hide malicious commands in websites or emails to trick AI systems.
- Another significant threat is "data poisoning," where malicious data is inserted during AI training to create backdoors.
- Cybersecurity has become the most cited risk for S&P 500 companies adopting AI in 2024.
- While AI presents new risks, it is also being used to create more adaptive and proactive cybersecurity defenses.
The Nature of the Threat: Prompt Injection and Data Poisoning
The primary concern for AI developers is the indirect prompt injection attack. Unlike a direct attack where a user tries to trick an AI, an indirect attack uses a third-party source. For example, an AI assistant asked to summarize a webpage could unknowingly execute a malicious command hidden within that page's code.
This is a fundamental challenge because LLMs are designed to process and follow instructions from the data they are given. They currently lack a reliable mechanism to determine the trustworthiness of the source of those instructions. This same characteristic makes them susceptible to "jailbreaking," where users craft prompts to bypass the AI's built-in safety rules.
Jacob Klein, who leads the threat intelligence team at AI startup Anthropic, confirmed the widespread nature of the problem. "AI is being used by cyber actors at every chain of the attack right now," Klein stated, highlighting the urgency of developing robust defenses.
A Deeper Vulnerability: Data Poisoning
Beyond prompt injections, researchers have identified another major vulnerability known as data poisoning. This occurs when attackers insert malicious material into the vast datasets used to train AI models. This can create hidden backdoors or cause the model to behave unpredictably under certain conditions.
Recent research from Anthropic, the UK’s AI Security Institute, and the Alan Turing Institute found that these data poisoning attacks are easier to execute than previously thought. This raises concerns about the integrity of the foundational models that power countless applications.
UK Government Issues Warning
In May, the UK’s National Cyber Security Centre issued a formal warning about the threat posed by these AI vulnerabilities. The agency cautioned that the flaw could expose millions of companies and individuals to highly sophisticated phishing attacks and other forms of fraud, as criminals leverage AI to create more convincing and targeted scams.
The Industry's Response: A Defensive Arms Race
In response to these emerging threats, AI companies are deploying a range of defensive strategies. They are actively hiring external testers, often called red teams, to find and exploit weaknesses before criminals do. They are also developing sophisticated AI-powered tools to monitor for and detect malicious activity in real-time.
"When we find a malicious use, depending on confidence levels, we may automatically trigger some intervention or it may send it to human review," explained Anthropic's Jacob Klein, describing their multi-layered approach to security.
Google DeepMind is employing a technique it calls "automated red teaming." This involves using its own internal systems to constantly and realistically attack its Gemini model to uncover security weaknesses proactively. This continuous testing allows the company to patch vulnerabilities as they are discovered.
AI as Both a Weapon and a Shield
While AI empowers attackers, experts emphasize that it is also revolutionizing cybersecurity defense. Ann Johnson, a corporate vice-president at Microsoft, noted that for years, attackers had the advantage because they only needed to find one weakness.
"Defensive systems are learning faster, adapting faster, and moving from reactive to proactive," Johnson said. AI can analyze massive amounts of data to identify threats and automate responses far more quickly than human teams alone, helping to level the playing field.
The Escalating Use of AI in Cybercrime
The accessibility of generative AI has significantly lowered the barrier to entry for cybercriminals. It provides novice hackers with tools to write malicious software and helps professional syndicates automate and scale their operations on an unprecedented level.
Alarming Statistics
- A recent MIT study found that 80% of ransomware attacks examined involved the use of AI.
- In 2024, AI-linked phishing scams and deepfake-related fraud have seen a 60% increase.
- One cybersecurity firm reported a shift from seeing one deepfake attack per month in 2023 to seven per day per customer now.
Jake Moore, a global cybersecurity adviser at ESET, explained that LLMs allow hackers to rapidly generate new malicious code that hasn't been seen before, making it much harder for traditional antivirus software to detect.
The Rise of Hyper-Personalized Attacks
Criminals are using AI to conduct detailed reconnaissance on potential victims. LLMs can efficiently scan public sources like social media profiles, company websites, and news articles to gather personal information. This data is then used to craft highly convincing social engineering attacks.
Vijay Balasubramaniyan, CEO of the voice fraud security firm Pindrop, noted the dramatic increase in realistic voice deepfakes. "Back in 2023, we’d see one deepfake attack per month across the entire customer base. Now we’re seeing seven per day per customer," he said.
This trend makes corporations particularly vulnerable. AI can analyze employees' LinkedIn posts to learn what software and internal systems a company uses, then use that information to identify specific vulnerabilities to exploit. Anthropic recently detailed a case where an actor used its Claude model to automate reconnaissance, credential harvesting, and system infiltration across 17 organizations in an attempt to extort up to $500,000.
As the technological race continues, experts advise companies to remain vigilant, restrict access to sensitive data, and carefully manage the deployment of AI tools. "It doesn’t take much to be a crook nowadays," said Paul Fabara, Visa’s chief risk and client services officer. "You get a laptop, $15 to download the cheap bootleg version of gen AI in the dark web and off you go."





