Artificial Intelligence
This blog was co-developed by HackerOne and Hack The Box as part of our AI Red Teaming CTF collaboration. It first appeared on HackerOne’s blog.
Large language models (LLMs) and other generative AI systems are changing how we build products, defend networks, and even how attackers operate.
Back at Black Hat, we dropped a live AI red teaming simulation with HackerOne: three escalating prompt injection challenges, one model locked down with safety guardrails, and a brutal time limit.
Hundreds lined up, some cracked it, most didn’t. But everyone walked away with the same takeaway: LLMs aren’t as predictable as they seem.
Now we’re scaling that energy into something bigger: introducing AI Red Teaming CTF: [ai_gon3_rogu3]; a 10-day adversarial challenge starting in September; designed to be creative, hands-on, and just a little unhinged.
Traditionally, red teaming means simulating real-world attackers to find security gaps. When it comes to AI red teaming, it becomes all about targeting models, apps, and systems using language, context, and creative misuse.
Why do this? We can’t evaluate AI safety by inspection alone. Currently, there are no universally accepted best practices for AI red teaming, and the practice is still “a work in progress.”
LLM red‑teaming doesn’t guarantee safety. It only provides a snapshot of how a system behaves under specific adversarial conditions. To improve model resilience, organizations need to test models under realistic attack scenarios and iterate on their defenses.
The field took off after the first public GenAI Red Teaming event at DEF CON 31, organized by the AI Village, but despite the ongoing buzz, most teams still don’t know where to start.
That’s what this CTF is for.
Built by HackerOne and Hack The Box, this CTF reflects the kinds of misuse paths that are tested for in the real world in most AIRT engagements.
The CTF drops you into scenario-based challenges that feel uncomfortably real. You might find yourself:
Convincing a model to escalate a fake crisis
Extracting insider content through indirect prompt manipulation
Bypassing multi-user access controls via language alone
Tricking safety filters into letting something slip
The deeper you go, the more the lines blur between prompt, guidelines, and system behavior. And that’s exactly the point. You’ll see just how much of an AI system’s integrity relies on language, and how easy it is to turn that language against it.
The AI red teaming community is still figuring out best practices, and challenges like [ai_gon3_rogu3] help push the field forward. Whether you’re a veteran researcher or just curious about adversarial AI, bring your creativity and get ready to jailbreak some models.
If this is your first deep dive into AI or ML adversarial testing, you don’t have to go in blind.
HTB Academy recently launched an AI Red Teamer job‑role path in collaboration with Google’s red teamers. The course covers prompt injection techniques, jailbreak strategies, bias exploitation, and other attack vectors—skills closely aligned with what you’ll face in the CTF.
Working through that path or completing other AI‑focused Challenges on the HTB CTF marketplace is a great way to warm up, alone or with your colleagues.
Choose from an extended content library covering multiple categories and featuring the latest techniques to get started, all powered by the Hack The Box content creators. Measure and motivate your security team!
🛠️ Create your free Hack The Box account
🔗 Register for the CTF
📅 Save the dates September 9–18!
🌍 It’s fully remote, so open to all
👕 Limited edition swag for top players
We can’t wait to see what you break and what you learn. The AI red‑teaming community is still figuring out best practices, and these events help push the industry forward. Whether you’re a veteran researcher or just curious about adversarial AI, grab a coffee, bring your creativity and get ready to jailbreak some robots.