Prompt Injection Capture-the-flag – Red Team x AI

Joe Cooney • April 2, 2024

Red-team challenges have been a fun activity for PZ team members in the past, so we recently conducted a small challenge at our fortnightly brown-bag session, focusing on the burgeoning topic of prompt injection. 


Injection vulnerabilities all follow the same basic pattern – un-trusted input is inadvertently treated as executable code, causing the security of the system to be compromised.  SQL injection (SQLi) and cross-site scripting (XSS) are probably two of the best-known variants, but other technologies are also susceptible. Does anyone remember XPath injection? 


As generative models get incorporated into more products, user input can be used to subvert the model. This can lead to the model revealing its system prompt or other trade secrets, reveal information about the model itself which may be commercially valuable, subvert or waste computation resources, perform unintended actions if the model is hooked up to APIs, or cause reputational damage to the company if the model can be coerced into doing amusing or inappropriate things. 


As an example, entrepreneur and technologist Chris Bakke was recently able to trick a Chevy dealership’s ChatGPT-powered bot into agreeing to sell him a Chevy Tahoe for $1. Although the U.S. supreme court has yet to rule on the legal validity of a “no takesies backsies” contract (as an employee of X Chris is probably legally obligated to drive a Tesla anyway) it is not hard to imagine a future scenario with steeper financial consequences. 


For this challenge PZers were taking on Gandalf https://gandalf.lakera.ai/  – a CTF created by AI security start-up Lakera https://www.lakera.ai/ (Gandalf is doubtless a way for them to capture valuable training data for their security product). Gandalf progresses in difficulty from young and naive level 1 Gandalf, who is practically begging to give you the password, to level 8 – Gandalf the White 2.0, who is substantially more difficult to trick. 

We time-boxed the challenge to only 20 minutes, and a couple of people were able to beat Gandalf the White 2.0 in this time. Several PZers also found the challenge so absorbing they were still going an hour or more later. Some people found prompts that worked well for several levels, allowing them to rapidly progress to the higher levels of the challenge, only to hit a wall when their chosen technique stopped working. Others were beguiled into solving riddles that Gandalf seemed to be posing to them in the hope that it would give them clues to the secret word for each level. 


Overall, it was a fun and approachable challenge for anyone looking to become more familiar with the issue of prompt injection. 

Share This Post

Get In Touch

Recent Posts

May 20, 2025
We’re proud to announce that Hanieh Madad has been named the winner of the Technical Award at the prestigious 2025 ARN Women in ICT Awards.
Copies of the book DesignedUp are stacked on top of each other on a pink background
By Lennah kuskoff May 5, 2025
At PZ, we’re always exploring how design and technology can better complement each other. We recently hosted a Lunch & Learn featuring Emma Carter, Experience Design Leader and author of DesignedUp, whose talk was a candid, experience-rich exploration of what it takes to create great products, and even better collaboration between disciplines.
By Joe Cooney May 5, 2025
A friend and former colleague reached out to me recently to ask if I could help him fix a couple of bugs in a small project he’d been working on. He was not a developer, but had worked in and around developers for his whole 20+ year career as a business analyst, product owner and program manager. With the advent of tools like Cursor and Lovable his lack of coding ability was (maybe) no longer a barrier to getting some ideas he’d been incubating in his mind for a while, out into the world. With credit card in hand, he dived headfirst into the world of “vibe” coding. We met for coffee, and he showed me the prototype he’d built. I was quite impressed with what he showed me (running on his laptop…deploying it anywhere was a bridge he had not crossed yet) – a capable working prototype that demonstrated the ideas he was trying to prove out. I asked him about the “development experience” and he said it had been great at first, and he’d been able to make a lot of progress quickly, but at some point he hit a bit of a wall where each change he tried to make introduced more issues, and he felt like it was pointless to continue. He’d switched between a few different AI coding tools in an effort to see if the problems he encountered were specific to the tool he’d started with, but without success. The vibes had run out.
By Joe Cooney April 3, 2025
Making cybersecurity fun and engaging with capture-the-flag (CTF) events—boost team collaboration, enhance security skills, and turn dry security practices into an exciting challenge!
More Posts