Overview
This technically-focused webinar dives into red teaming AI models and agents – i.e. stress-testing AI systems to uncover vulnerabilities. We will discuss methods used to jailbreak large language models, induce prompt injections, exploit tool integrations, and chain exploits that force AI agents off their intended path.
Real examples (like how researchers tricked GPT-4 into revealing secrets or behaving maliciously) will illustrate the stakes. Attendees will learn a structured approach to adversarially testing AI: from crafting malicious prompts and inputs, to probing for model blind spots and safety bypasses. We will also cover how
findings from AI red team exercises lead to more robust models (via fine-tuning and safety patches).
In short, this session shows why breaking your AI is the best way to fix it – and how Hack The Box platforms provide a safe sandbox to do exactly that.