Overview
As organizations experiment with AI for security, a key question arises: how do we measure an AI agent's skills and improvement over time?
This webinar dives deeper into benchmarking and standardized evaluations for AI in cyber operations. We will discuss what performance metrics actually matter: Accuracy in detecting threats? Success rate in exploiting vulnerabilities? Speed of response? Mean time between failures?
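To make these options concrete, here is a minimal sketch of how such metrics could be aggregated from raw benchmark runs. The record fields, field names, and example numbers are hypothetical illustrations, not HTB's actual scoring schema.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record of a single benchmark run by one AI agent.
@dataclass
class RunResult:
    detected_threat: bool      # did the agent flag the planted threat?
    threat_present: bool       # was a threat actually planted in this run?
    exploit_succeeded: bool    # did the agent's exploit attempt land?
    seconds_to_respond: float  # time from task start to first meaningful action

def summarize(runs: list[RunResult]) -> dict[str, float]:
    """Roll raw runs up into the headline metrics mentioned above."""
    detection_accuracy = mean(
        r.detected_threat == r.threat_present for r in runs
    )
    exploit_success_rate = mean(r.exploit_succeeded for r in runs)
    mean_response_time = mean(r.seconds_to_respond for r in runs)
    return {
        "detection_accuracy": detection_accuracy,
        "exploit_success_rate": exploit_success_rate,
        "mean_response_time_s": mean_response_time,
    }

# Example: three illustrative runs for a single agent.
runs = [
    RunResult(True, True, True, 42.0),
    RunResult(False, True, False, 60.5),
    RunResult(True, True, True, 38.2),
]
print(summarize(runs))
```

Which of these numbers matters most depends on the use case: a detection assistant and an autonomous exploitation agent should not be graded on the same headline metric.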
The session will highlight the approach and methodology behind HTB's AI Range in providing board-ready scorecards and leaderboards to compare AI models on common security problems. For instance, we will walk through a sample leaderboard of AI agents tasked with an OWASP Top 10 web application scenario, illustrating how the main foundation models stack up in terms of vulnerabilities found, time taken, and more. By establishing accurate comparisons, attendees will learn how to prove whether a security AI tool is effective or improving over time, which is crucial for justifying investments in AI.
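As a loose illustration of the kind of ranking such a leaderboard implies, the snippet below orders agents by vulnerabilities found and breaks ties on time taken. The agent names, scores, and tie-breaking rule are made up for the example and are not AI Range output.

```python
# Hypothetical leaderboard entries: (agent name, vulnerabilities found, minutes taken)
entries = [
    ("agent-a", 7, 95.0),
    ("agent-b", 7, 120.0),
    ("agent-c", 5, 60.0),
]

# Rank by vulnerabilities found (descending), then by time taken (ascending).
leaderboard = sorted(entries, key=lambda e: (-e[1], e[2]))

print(f"{'Rank':<6}{'Agent':<10}{'Vulns found':<13}{'Minutes':<8}")
for rank, (name, vulns, minutes) in enumerate(leaderboard, start=1):
    print(f"{rank:<6}{name:<10}{vulns:<13}{minutes:<8}")
```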