Human Competition (Capture The Flag) Isn’t Dead. It’s Becoming the Benchmark for the AI Era.
For years, Capture The Flag events have held a familiar place in cybersecurity culture. They were competitions. Training grounds for our minds. Community rituals to showcase mastery. A way for practitioners to test themselves against realistic technical challenges, compare performance with peers, and sharpen the instincts that only come from hands-on experience.
Today, that model is under pressure.
Some now call them cyber ranges. Some call them benchmarks. Others question whether CTFs still matter in a world where AI can solve certain challenges in minutes, accelerate exploitation workflows, and give less experienced users access to techniques that once required years of practice.
It is a fair question. But concluding that CTFs no longer matter is the wrong answer.
CTFs are not irrelevant. They are evolving into something more important: a proving ground for how humans and AI will work together in cybersecurity.
The future of cybersecurity will not be human-only. It will not be AI-only either. It will be human-led, AI-augmented, and increasingly dependent on whether teams can make the right decisions under pressure when automation gets them part of the way there, but not all the way.
AI is already changing the shape of cyber work. It can accelerate static analysis and parse source code. It can help with cryptography, reverse engineering, secure coding, and other challenge categories where reasoning is bounded, the inputs are clear, and the path to a solution is more deterministic. Wherever models can read, extract, and reason over the available information, AI assistance moves quickly. Black-box environments, by contrast, remain significantly harder because they require exploration, hypothesis testing, and the ability to pivot when the obvious path fails.
That does not mean AI can “do all cybersecurity.” It means AI can do parts of cybersecurity faster.
The harder question is what happens when a challenge requires multi-step reasoning, pivot points, contextual judgment, and the ability to recognize that the current path is not working. That is where today’s models still struggle. For example, Hack The Box’s “Cooling Tower” range was one of the bespoke ranges used in the AI Security Institute’s (AISI) latest paper on multi-step cyber attack scenarios, and on Cooling Tower the AI models still made only limited progress. AISI’s Mythos research is important because it shows both sides of the current reality: frontier models are making real progress on extended attack chains, but they still stall when a range demands sustained context, specialist reasoning, and repeated pivots. For range and lab content creation, that raises the bar. High-fidelity scenarios must test how models and humans reason across connected steps, not just whether they can solve an isolated exploit. The lesson is not that AI is weak; it is that meaningful benchmarking now depends on content sophisticated enough to reveal where AI is advancing, where it breaks down, and where human judgment is still essential.
That same principle translates directly into real-time CTF environments. The UK Financial Services Hackathon, held in late April and sponsored by Lloyds Banking Group, Google, and Hack The Box, offered a high-fidelity example. One hard challenge was never captured by any team, the top two teams each solved only 74 of 76 challenges, and the third-place team solved 73. Challenges like these require deeper reasoning and the ability to move beyond the most obvious exploitation paths to investigate the underlying issues. In practice, that is exactly the kind of logical flow where a skilled human can still outperform autonomous tooling: knowing when to stop, reassess, and try a different approach.
That is the point too often missed in the “AI will solve everything” narrative. Models can be fast. They can be useful. They can even be impressive. But they are still brittle when the environment demands judgment, adaptation, and context.
At Hack The Box, we have seen this in practice. At the UK Financial Services Hackathon, the winning team was not simply “an AI team.” It reflected a new kind of cyber team: machine learning expertise combined with senior penetration testing experience. The two-person team used AI orchestration as part of its workflow, running multiple agents while still relying on human direction, cost management, and technical judgment to make progress.
That combination is a powerful signal for where the industry is right now and where it’s heading.
The strongest teams will not be the ones that ban AI and pretend the world has not changed. Nor will they be the ones that hand everything to an agent and assume the scoreboard proves capability. The strongest teams will be those that know how to use AI well, understand its limitations, and retain the human expertise required to validate, redirect, and complete the work.
AI is here to stay (well, until someone pulls the plug!). The question is no longer whether cybersecurity professionals will use it. They will. The more relevant question is whether organizations can build environments where AI use is transparent, measurable, and meaningful.
This is why CTFs need to evolve.
A modern CTF cannot only ask, “Who captured the flag first?” It also needs to ask: How did they get there? Did they rely on AI? Where did AI help? Where did it hallucinate? How many wrong flags were submitted before the right one? What did the team spend in compute or tokens? Which tasks were accelerated, and which still required expert human intervention?
Those questions begin to turn CTFs from competitions into operational benchmarks.
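As an illustration only, the questions above could be captured as per-challenge telemetry and rolled up into a handful of benchmark metrics. The sketch below is a minimal Python example under assumed names: `SolveRecord`, its fields, and `summarize` are hypothetical and are not part of any Hack The Box platform or API. The sample records are made up.

```python
from dataclasses import dataclass


@dataclass
class SolveRecord:
    """One team's attempt at one challenge (hypothetical schema)."""
    challenge: str
    solved: bool
    wrong_flags: int        # incorrect flag submissions on this challenge
    ai_assisted: bool       # whether AI was used on this challenge
    tokens_spent: int       # model tokens consumed (0 if no AI was used)
    needed_human_fix: bool  # whether a human had to redirect or finish the work


def summarize(records: list[SolveRecord]) -> dict[str, float]:
    """Roll per-challenge records up into benchmark-style metrics."""
    solved = [r for r in records if r.solved]
    n_solved = len(solved)
    ai_solves = [r for r in solved if r.ai_assisted]
    return {
        "solve_rate": n_solved / len(records) if records else 0.0,
        # Wrong submissions across ALL attempts, divided by successful solves.
        "wrong_flags_per_solve": (
            sum(r.wrong_flags for r in records) / n_solved if n_solved else 0.0
        ),
        # Tokens across ALL attempts, divided by successful solves.
        "tokens_per_solve": (
            sum(r.tokens_spent for r in records) / n_solved if n_solved else 0.0
        ),
        # Share of AI-assisted solves that still required human intervention.
        "ai_solves_needing_human": (
            sum(r.needed_human_fix for r in ai_solves) / len(ai_solves)
            if ai_solves
            else 0.0
        ),
    }


if __name__ == "__main__":
    # Illustrative, made-up records; not real event data.
    records = [
        SolveRecord("web-1", True, 0, True, 12_000, False),
        SolveRecord("crypto-2", True, 2, True, 30_000, True),
        SolveRecord("blackbox-3", False, 5, True, 58_000, True),
        SolveRecord("forensics-4", True, 1, False, 0, False),
    ]
    print(summarize(records))
```

Dividing wrong flags and tokens by successful solves, rather than by attempts, is one possible design choice: it penalizes teams that burn submissions or compute on challenges they never finish.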
This is particularly relevant for enterprises. HTB’s workforce research shows that structured, organization-led training programs consistently drive high engagement. Across enterprise CTF data from 2023 to 2025, team interaction rates exceeded 80% for three consecutive years, while year-over-year growth included a 3.5x increase in CTF events, 2.7x growth in participating organizations, and 2.9x growth in total teams.
In other words, organizations are not walking away from hands-on cybersecurity exercises. They are investing more in them.
CTFs provide something AI alone cannot: a safe environment for teams to build confidence, fail fast, recover, collaborate, and learn how they respond under pressure. They give leaders a way to observe not just technical outcomes, but problem-solving behaviors. They reveal where a team is strong, where it over-relies on tooling, and where human judgment still needs to mature.
They also support a broader shift happening across cybersecurity roles. The old separation between offensive and defensive skill sets is breaking down. HTB’s workforce research found increasing overlap between offensive and defensive development: defensive practitioners are engaging in both defensive and offensive training, while offensive practitioners are also incorporating defensive capabilities. The report frames this as a move toward more collaborative, integrated security models — the kind of purple-team thinking modern organizations increasingly need.
A penetration tester who understands detection is more valuable. A defender who understands exploitation is more effective. A machine learning expert who understands offensive workflows can help build new automations. A senior security practitioner who understands AI’s limits can keep those automations grounded in reality.
The UK Financial Services Hackathon demonstrated that shift in practice. It brought together cybersecurity experts across the financial services sector in a high-pressure, live training environment designed to test resilience, strengthen collaboration, and prepare teams for a threat landscape where AI is both a defensive opportunity and an offensive accelerant.
The enduring value of CTFs is not the leaderboard alone. It is not the prize. It is not the fastest solve. The value is in building the human capability to operate in complex environments where the answer is not obvious, where AI may be useful but incomplete, and where the practitioner must know enough to decide whether the machine is right.
This is why lab content quality becomes even more important, not less.
If AI can one-shot a challenge by reading source code and producing a flag, the challenge has not become useless. It has revealed something important about what that type of exercise now measures. It may still be useful for certain learning goals, but it is no longer enough for benchmarking advanced human capability or AI-augmented team performance.
The next generation of CTFs must be more adaptive, more realistic, and more deliberate in what they measure. They must include environments where the path is not linear. They must test pivoting, tradecraft, assumptions, collaboration, and consequence-aware decision-making. They must include challenges that require humans to interpret uncertainty and guide AI rather than simply delegate to it.
At Hack The Box, we are evolving CTFs and cyber ranges to reflect the way cybersecurity work is changing. That means creating content that remains challenging in the age of AI. It means designing scenarios that test both human capability and AI-augmented workflows. It means exploring detection mechanisms and behavioral analysis so organizations can better understand how AI is being used during events. And it means supporting different modes of assessment: environments where AI use is allowed, environments where human-only capability is being tested, and environments where the interaction between the two becomes the benchmark.
Because the truth is simple: banning AI will not prepare teams for the future. Blindly trusting it will not either.
The future belongs to teams that can use AI as a teammate without forgetting that teammates need supervision, challenge, and training. AI may be your teammate, but you still have to train it.
That is a useful way to think about the next phase of cybersecurity readiness. AI will accelerate work. It will change workflows. It will reshape what entry-level and expert-level tasks look like. But it will not remove the need for foundational skills. In fact, it raises the stakes for foundational skills because the human role shifts from doing every step manually to knowing when the machine’s answer is incomplete, misleading, or wrong.
That requires practice. CTFs have always offered that. Now they must offer it for an AI-augmented era. The industry does not need to retire Capture The Flag. It needs to redefine what CTFs are for.
They are no longer just competitions for individual technical skill. They are becoming benchmarks for cyber readiness, AI-augmented collaboration, and the human judgment that remains essential when automation reaches its limits.
AI will keep getting faster. Challenges will keep getting harder. The scoreboard will keep changing.
But the core lesson remains the same: cybersecurity is learned by doing. And in the AI era, doing means learning how to think with machines — without letting machines do all the thinking.
So no, Capture The Flag is not dead. The checkbox version of it is. What replaces it must be harder, more adversarial, and more relevant to the humans who defend systems and the AI agents that will increasingly operate alongside them.