Prompt Injection: Red Teaming AI & Securing LLMs

Joe Cooney • April 2, 2024

EDITORS NOTE

As enterprises rush to deploy LLMs, few are asking: "Can this be tricked?" In this insight, Principal Software Engineer Joe Cooney explores Prompt Injection, the new SQL Injection of the AI era. By running internal "Red Team" events, we ensure our developers know how to break AI models so they can build ones that hold up in production.

Red-team challenges have been a fun activity for PZ team members in the past, so we recently conducted a small challenge at our fortnightly brown-bag session, focusing on the burgeoning topic of prompt injection. 


Injection vulnerabilities all follow the same basic pattern – un-trusted input is inadvertently treated as executable code, causing the security of the system to be compromised. SQL injection (SQLi) and cross-site scripting (XSS) are probably two of the best-known variants, but other technologies are also susceptible. Does anyone remember XPath injection? 


As generative models get incorporated into more products, user input can be used to subvert the model. This can lead to the model revealing its system prompt or other trade secrets, reveal information about the model itself which may be commercially valuable, subvert or waste computation resources, perform unintended actions if the model is hooked up to APIs, or cause reputational damage to the company if the model can be coerced into doing amusing or inappropriate things. 


As an example, entrepreneur and technologist Chris Bakke was recently able to trick a Chevy dealership’s ChatGPT-powered bot into agreeing to sell him a Chevy Tahoe for $1. Although the U.S. supreme court has yet to rule on the legal validity of a “no takesies backsies” contract (as an employee of X Chris is probably legally obligated to drive a Tesla anyway) it is not hard to imagine a future scenario with steeper financial consequences. 

For this challenge PZers were taking on Gandalf https://gandalf.lakera.ai/ – a CTF created by AI security start-up Lakera https://www.lakera.ai/ (Gandalf is doubtless a way for them to capture valuable training data for their security product). Gandalf progresses in difficulty from young and naive level 1 Gandalf, who is practically begging to give you the password, to level 8 – Gandalf the White 2.0, who is substantially more difficult to trick. 

We time-boxed the challenge to only 20 minutes, and a couple of people were able to beat Gandalf the White 2.0 in this time. Several PZers also found the challenge so absorbing they were still going an hour or more later. Some people found prompts that worked well for several levels, allowing them to rapidly progress to the higher levels of the challenge, only to hit a wall when their chosen technique stopped working. Others were beguiled into solving riddles that Gandalf seemed to be posing to them in the hope that it would give them clues to the secret word for each level. 


Overall, it was a fun and approachable challenge for anyone looking to become more familiar with the issue of prompt injection. 

Key Takeaways for AI Leaders

  • The New Attack Surface: Just as SQL Injection plagued early web apps, Prompt Injection is the Achilles' heel of Generative AI. Hackers can trick models into revealing trade secrets, refunding money, or damaging your brand.
  • Financial Risk: The "Chevy Tahoe for $1" example isn't just a funny story; it's a warning. If your AI agent has transactional power, a prompt injection vulnerability is a direct financial liability.
  • Defensive Culture: You can't secure AI with a firewall. Security requires a culture of "Red Teaming", where developers proactively try to trick their own models to find weaknesses before the public does.

Build Secure AI


Secure Your AI Strategy


Don't launch a vulnerability. Partner with engineers who understand the dark side of LLMs.

Validate Your AI Security

Share This Post

Get In Touch

Recent Posts

Dan Dekel of Patient Zero holding a handwritten love letter next to the OpenClaw lobster logo
By Dan Dekel March 10, 2026
AI's potential has shifted from "future speculation" to "current reality." A hands-on look at OpenClaw agents, automated travel booking, and the security risks of the new frontier.
By Demelza Green March 9, 2026
Patient Zero sponsors Southstart 2026 in Adelaide. Explore the engineering and technical stewardship behind the Zero-Employee Startup lab.
Finalists for the 2026 Women in ICT Awards: Demelza Green and Irina Kudryavtseva
March 6, 2026
Demelza Green and Irina Kudryavtseva named finalists in the 2026 Women in ICT Awards (WIICTA) for innovation and technical excellence.
Finalists for the 2026 Women Leading Tech Awards: Demelza Green and Hanieh Madad.
March 5, 2026
Demelza Green and Hanieh Madad named finalists in the 2026 Women Leading Tech Awards for leadership in Sovereign AI and Engineering Excellence.
More Posts