Overview
As organizations experiment with AI for security, a key question arises: how do we measure an AI agent's skills and improvement over time?
This webinar dives deeper into benchmarking and standardized evaluations for AI in cyber operations. We will discuss what performance metrics actually matter: Accuracy in detecting threats? Success rate in exploiting vulnerabilities? Speed of response? Mean time between failures?
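To make these options concrete, here is a minimal sketch of how such metrics could be aggregated from raw benchmark runs. The record fields, field names, and example numbers are hypothetical illustrations, not HTB's actual scoring schema.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record of a single benchmark run by one AI agent.
@dataclass
class RunResult:
    detected_threat: bool      # did the agent flag the planted threat?
    threat_present: bool       # was a threat actually planted in this run?
    exploit_succeeded: bool    # did the agent's exploit attempt land?
    seconds_to_respond: float  # time from task start to first meaningful action

def summarize(runs: list[RunResult]) -> dict[str, float]:
    """Roll raw runs up into the headline metrics mentioned above."""
    detection_accuracy = mean(
        r.detected_threat == r.threat_present for r in runs
    )
    exploit_success_rate = mean(r.exploit_succeeded for r in runs)
    mean_response_time = mean(r.seconds_to_respond for r in runs)
    return {
        "detection_accuracy": detection_accuracy,
        "exploit_success_rate": exploit_success_rate,
        "mean_response_time_s": mean_response_time,
    }

# Example: three illustrative runs for a single agent.
runs = [
    RunResult(True, True, True, 42.0),
    RunResult(False, True, False, 60.5),
    RunResult(True, True, True, 38.2),
]
print(summarize(runs))
```

Which of these numbers matters most depends on the use case: a detection assistant and an autonomous exploitation agent should not be graded on the same headline metric.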
The session will highlight the approach and methodology behind HTB's AI Range in providing board-ready scorecards and leaderboards to compare AI models on common security problems. For instance, we will walk through a sample leaderboard of AI agents tasked with an OWASP Top 10 web application scenario, illustrating how the main foundation models stack up in terms of vulnerabilities found, time taken, and more. By establishing accurate comparisons, attendees will learn how to prove whether a security AI tool is effective or improving over time, which is crucial for justifying investments in AI.
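As a loose illustration of the kind of ranking such a leaderboard implies, the snippet below orders agents by vulnerabilities found and breaks ties on time taken. The agent names, scores, and tie-breaking rule are made up for the example and are not AI Range output.

```python
# Hypothetical leaderboard entries: (agent name, vulnerabilities found, minutes taken)
entries = [
    ("agent-a", 7, 95.0),
    ("agent-b", 7, 120.0),
    ("agent-c", 5, 60.0),
]

# Rank by vulnerabilities found (descending), then by time taken (ascending).
leaderboard = sorted(entries, key=lambda e: (-e[1], e[2]))

print(f"{'Rank':<6}{'Agent':<10}{'Vulns found':<13}{'Minutes':<8}")
for rank, (name, vulns, minutes) in enumerate(leaderboard, start=1):
    print(f"{rank:<6}{name:<10}{vulns:<13}{minutes:<8}")
```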