AI Pentesting Frameworks: Impressive, Impractical, and Inevitable
Opinion | Pentesting
A couple of days ago, I published a curated list of every open-source AI-assisted penetration testing tool and framework I could find. The repo grew out of a systematic review anchored by the paper “What Makes a Good LLM Agent for Real-world Penetration Testing?” by Deng et al., which surveyed 28 LLM-based pentesting systems. I also expanded it with additional agents, CTF-focused tools, and benchmarks discovered through further research.
I have played with a number of these frameworks and built my own multi-agent pentest system using the Python smolagents library. What follows is an honest assessment of where AI pentesting actually stands today, from someone who has spent considerable time trying to make these tools work in practice.
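For context, the shape of that system, and of most of the frameworks on the list, is a manager agent that plans and delegates to narrower sub-agents wrapping individual tools. Below is a minimal sketch of that pattern using smolagents; the nmap wrapper, model choice, and prompt are illustrative placeholders rather than my actual code, and the exact class names may vary between library versions.

```python
# Minimal sketch of a manager/sub-agent layout with smolagents.
# The tool, model id, and prompt are placeholders, not the real system.
import subprocess

from smolagents import CodeAgent, LiteLLMModel, ToolCallingAgent, tool


@tool
def nmap_scan(target: str) -> str:
    """Run a basic service scan against a single in-scope host.

    Args:
        target: IP address or hostname that is explicitly in scope.
    """
    result = subprocess.run(
        ["nmap", "-sV", "-T4", target],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout


# A frontier model via LiteLLM: exactly the dependency discussed below.
model = LiteLLMModel(model_id="gpt-4o")

recon_agent = ToolCallingAgent(
    tools=[nmap_scan],
    model=model,
    name="recon_agent",
    description="Enumerates services on in-scope hosts and summarises findings.",
)

# The manager plans the engagement and delegates steps to the sub-agent.
manager = CodeAgent(tools=[], model=model, managed_agents=[recon_agent])
manager.run("Enumerate 10.0.0.5 and report any services worth investigating.")
```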
The Premium Model Problem
Here is the uncomfortable truth about every framework on that list: they almost universally depend on frontier models to function. OpenAI, Anthropic, and their immediate peers. The moment you step down to smaller, cheaper, or self-hosted models, performance falls off. It is not a gentle degradation; they quickly go off the rails. Planning collapses. Tool selection becomes erratic. Multi-step attack chains that a premium model handles competently become impossible.
The Deng et al. paper quantifies this indirectly. Their failure taxonomy identifies Type B failures, complexity barriers, as responsible for 58% of agent failures. These include context forgetting, premature commitment, and multi-step chain breakdowns. These are exactly the failure modes that worsen dramatically with smaller models. A 70B parameter model might handle a single reconnaissance step, but ask it to chain credential discovery into lateral movement into privilege escalation, and it will lose the plot within two or three steps.
PENTESTGPT V2, the paper’s proposed architecture, achieved 85% on XBOW and compromised 12 of 13 machines in their benchmark, but it did so running on GPT-5. The gap between that result and what a self-hosted model delivers is not incremental. It is categorical.
Where They Break
Even with premium models, these agents have failure modes that any practitioner will recognise immediately. They are susceptible to prompt injection, which is darkly ironic. They make confident mistakes, sometimes hallucinating tool output, sometimes dispensing advice that is flatly wrong. They miss obvious things that a junior tester would catch in seconds. And they go off the rails, fixating on a dead-end attack path while ignoring a straightforward vulnerability sitting in plain sight.
The paper calls this the exploration-exploitation imbalance, accounting for 12% of failures. In practice, it feels like more. An agent that has committed to a particular hypothesis about a target becomes remarkably resistant to abandoning it, even when the evidence contradicts it. Experienced pentesters recognise this as a human cognitive bias too, of course, but we have the ability to step back and reassess. Current agents do not, at least not reliably.
The Confidentiality Show-stopper
The capability limitations are real but solvable, at least in principle. Architectures will improve, models will get better, tool interfaces will mature. The problem that has no obvious near-term solution is confidentiality.
Every one of these frameworks, when running on a cloud-hosted model, sends your tool calls, command outputs, target responses, and discovered credentials to a third-party API. Every nmap scan result, every internal IP address, every vulnerability finding, every password hash. All of it transits through, and is processed by, models and infrastructure you do not control.
For a CTF or a personal lab, this is fine. For a real penetration test against a client environment, it is not fine at all. Penetration test scoping agreements routinely include clauses about data handling, storage, and transmission. Sending your client's internal network topology to the OpenAI API is not something you can hand-wave past in a post-engagement debrief.
Yes, you can run local models. But as established above, local models cannot currently do the job reliably.
Some frameworks are exploring hybrid approaches. Run the planning on a cloud model, execute tools locally, sanitise outputs before sending them back for analysis. These are interesting engineering exercises, but they add complexity and introduce new failure modes. Sanitisation is itself a hard problem, and incomplete sanitisation is arguably worse than no sanitisation, because it creates a false sense of security.
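For what it is worth, the sanitisation step in these hybrid designs usually amounts to something like the following. This is a hypothetical, regex-based redactor of my own, not code from any framework on the list, and its gaps are exactly the problem: anything the patterns do not anticipate goes to the API untouched.

```python
import re

# Hypothetical output sanitiser: redact the obvious secrets from local tool
# output before it is handed to a cloud-hosted model for analysis.
PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "hex32": re.compile(r"\b[a-fA-F0-9]{32}\b"),  # NTLM/MD5-style hashes
    "private_key": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
}


def sanitise(tool_output: str) -> str:
    """Replace known-sensitive patterns with placeholder tokens."""
    redacted = tool_output
    for label, pattern in PATTERNS.items():
        redacted = pattern.sub(f"[REDACTED-{label.upper()}]", redacted)
    return redacted


# Anything these regexes do not anticipate (hostnames, usernames in free text,
# credentials in unusual formats) still reaches the third-party API untouched.
print(sanitise("smb on 10.0.0.5: admin hash 31d6cfe0d16ae931b73c59d7e0c089c0"))
```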
The Restricted Environment Problem
Related to confidentiality, but distinct, is the question of restricted environments. Many targets of interest exist in isolated networks, classified environments, or infrastructure with strict egress controls. An AI agent that requires a persistent connection to an external API is simply unusable in these contexts. This is not a niche concern. A meaningful proportion of the most sensitive and therefore most important penetration tests occur in exactly these kinds of environments.
What We Tell Ourselves
Here is where I should acknowledge the elephant in the room. The security community’s scepticism about AI pentesting tools is not entirely objective. We have professional incentives to emphasise the limitations. If an AI agent can reliably perform penetration testing, the demand for human pentesters decreases. Not to zero, and not immediately, but meaningfully.
Every concern I have raised above is genuine. The confidentiality issues are real. The capability gaps with smaller models are real. The tendency to miss obvious findings is real. But I would be lying if I said these arguments were not also convenient. They happen to support the conclusion that experienced human testers remain essential, which is exactly the conclusion that experienced human testers want to reach.
The trajectory is clear. PENTESTGPT V2 went from 40% to 85% on XBOW by improving architecture alone, without a model upgrade. Shannon claims 96% on the same benchmark. These numbers were unthinkable two years ago. The confidentiality problem will be addressed, whether through better local models, confidential computing, or contractual frameworks that accommodate AI-assisted testing. The restricted environment problem will narrow as edge inference improves.
We are all updating our CVs. Not because the tools are ready today, but because the gap between where they are and where they need to be is closing faster than any of us are comfortable admitting. And we all know what the biggest cost driver on an engagement is: human time.
Where This Leaves Us
I put together the awesome-ai-pentest repo because this field is moving fast enough to need tracking. New frameworks appear regularly, and we are learning and improving. Benchmarks are being standardised. Architecture patterns are converging on multi-agent systems with structured memory, tool abstraction layers, and evidence-guided planning. The Deng et al. paper provides a useful taxonomy for understanding why agents fail, which is the first step toward making them fail less.
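To give a flavour of what "structured memory" and "evidence-guided planning" mean in practice, here is a deliberately simplified sketch; the class names are invented for illustration rather than drawn from any particular framework. The idea is that the planner is prompted from a condensed, typed record of evidence and abandoned hypotheses instead of an ever-growing chat transcript.

```python
from dataclasses import dataclass, field

# Illustrative structured memory: findings as typed records rather than raw
# chat history. All names here are invented for this sketch.


@dataclass
class Finding:
    host: str
    service: str
    observation: str
    confidence: float  # the agent's own confidence, 0.0 to 1.0


@dataclass
class EngagementMemory:
    scope: list[str]
    findings: list[Finding] = field(default_factory=list)
    dead_ends: list[str] = field(default_factory=list)  # abandoned hypotheses

    def planning_context(self) -> str:
        """Condense the evidence into the prompt the planner actually sees."""
        lines = [f"In scope: {', '.join(self.scope)}"]
        lines += [
            f"- {f.host} {f.service}: {f.observation} (confidence {f.confidence:.1f})"
            for f in self.findings
        ]
        lines += [f"- ruled out: {d}" for d in self.dead_ends]
        return "\n".join(lines)


memory = EngagementMemory(scope=["10.0.0.5"])
memory.findings.append(
    Finding("10.0.0.5", "ssh/22", "OpenSSH 8.2, password auth enabled", 0.9)
)
memory.dead_ends.append("anonymous FTP on 10.0.0.5 (port 21 closed)")
print(memory.planning_context())
```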
Full autonomy is not here yet, and the confidentiality constraints mean it may not be deployable in many real-world contexts for some time. But the direction of travel is unmistakable. The question is not whether AI will transform penetration testing. It is whether the current generation of practitioners will be the ones driving that transformation, or watching it happen.