Multi-agent LLM builds digital twins to test exploits
Agents
Most agent demos look slick until they brick a service at 3am. This research targets the operational gap: stitch reconnaissance to exploitation and keep the real target upright while you iterate. The system, Automation-Exploit, uses multiple Large Language Model (LLM) agents to exfiltrate executables and context, build a replica, and only then fire a refined payload.
The pipeline is pragmatic. A Navigator agent prunes dead ends to conserve resources. An adversarial hand-off splits planning from code generation: a local uncensored model drafts the rough exploit, a cloud model refines it. A two-stage auditing pass cross-checks outputs to kill off hallucinated findings. Extracted intelligence lands in a structured knowledge base with hash deduplication, then polyglot sandboxes and an autopsy loop stabilise unstable payloads.
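The knowledge-base step can be sketched in a few lines. This is a minimal illustration of hash-based deduplication, not the paper's implementation: the class and record layout are our own, and SHA-256 is an assumption since the paper does not name the hash function.

```python
import hashlib

class KnowledgeBase:
    """Toy store for exfiltrated artefacts with hash-based deduplication."""

    def __init__(self):
        self._by_hash = {}  # hex digest -> artefact record

    def add(self, name: str, payload: bytes) -> bool:
        """Store an artefact; return False if identical bytes are already held."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._by_hash:
            return False  # duplicate content: skip re-ingestion and re-analysis
        self._by_hash[digest] = {"name": name, "payload": payload}
        return True

kb = KnowledgeBase()
print(kb.add("server_bin", b"\x7fELF\x02\x01"))       # True: first copy stored
print(kb.add("server_bin_copy", b"\x7fELF\x02\x01"))  # False: same bytes, deduplicated
```

Deduplicating on content rather than filename means a binary pulled back twice via different routes is only analysed once, which matters when reconnaissance already dominates the clock.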
The difference-maker is the conditional digital twin. If the target binary comes back, the system spins up a cross-platform replica with tight state alignment. It synchronises libc versions and hooks file descriptors so the twin sees what the target would see. Memory-corruption work happens in the twin until the payload runs clean, then a one-shot goes to the live service. In tests, reconnaissance dominated time at 77.2 percent, twin setup was only 2.9 percent, and the replica absorbed 14 denial-of-service conditions that never touched the real box. Action executability held between 85 and 100 percent in five of eight scenarios after the multi-agent sanitisation, and resource use looked sane with efficiency over 96 percent in five cases. Time-to-compromise ranged from about 91 to 171 minutes, mostly gated by I/O for exfiltration and context-building rather than the twin.
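The conditional routing and one-shot gating above reduce to a small piece of control flow. This is a sketch under stated assumptions: the bug-class names, the iteration cap, and the callable interfaces are hypothetical, and the real state synchronisation (libc alignment, file-descriptor hooking) is far richer than this.

```python
# Bug classes considered destructive enough to warrant a replica (our list).
MEMORY_CORRUPTION = {"stack-overflow", "heap-overflow", "use-after-free", "format-string"}

def needs_twin(vuln_class: str, binary_recovered: bool) -> bool:
    # Only instantiate a replica for destructive classes where the binary came back.
    return vuln_class in MEMORY_CORRUPTION and binary_recovered

def deliver(vuln_class, binary_recovered, twin_run, live_run, max_iters=10):
    """Iterate the payload in the twin; hit the live target only after one clean run."""
    if not needs_twin(vuln_class, binary_recovered):
        return live_run()             # low-risk path: no twin required
    for _ in range(max_iters):
        if twin_run():                # payload ran crash-free in the replica
            return live_run()         # single validated one-shot
    return None                       # payload never stabilised; live box untouched

# The replica absorbs the crashes; the live target sees exactly one attempt.
crashes = iter([False, False, True])  # twin crashes twice, then runs clean
result = deliver("heap-overflow", True, lambda: next(crashes), lambda: "compromised")
print(result)  # compromised
```

The point of the structure is the asymmetry: every failed iteration lands on the twin, and the live service only ever sees the already-validated payload.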
Operationally, this architecture lowers the risk of live-fire testing and reduces the skill needed to push from web logic bugs to binary exploitation. It also shows a clear route around cloud safety filters by offloading initial exploit synthesis to local models. The limitations are honest: single-node, air-gapped testbeds only; noisy exfiltration without stealth; no lateral movement or persistence; commercial LLM costs and latency; and an auditing step that still depends on another model. This is not a quiet, enterprise-scale breach kit. It is, however, a credible blueprint for reliable single-target compromise where pulling back a binary is feasible. The open question is how quickly defenders can detect and throttle that exfiltration step, because once the twin is built, the rest of the runbook gets a lot less chaotic.
Additional analysis of the original arXiv paper
📋 Original Paper Title and Abstract
Automation-Exploit: A Multi-Agent LLM Framework for Adaptive Offensive Security with Digital Twin-Based Risk-Mitigated Exploitation
🔍 ShortSpan Analysis of the Paper
Problem
The paper addresses the absence of an autonomous, end‑to‑end framework that can bridge high‑level web reconnaissance and low‑level binary exploitation while avoiding live‑fire hazards. Existing enterprise platforms avoid memory‑corruption exploits to prevent denial of service, Automatic Exploit Generation systems suffer semantic blindness and require supplied binaries, and Large Language Model agents encounter safety alignment refusals and risk crashing targets when executing destructive payloads. This fragmentation prevents safe, fully automated exploitation in complex black‑box environments.
Approach
The authors present Automation‑Exploit, a multi‑agent LLM framework that coordinates specialised personas to perform reconnaissance, asset exfiltration, exploit synthesis and risk‑mitigated execution. Key mechanisms include autonomous extraction of target executables and contextual artifacts, an Adversarial Hand‑off that separates semantic planning from code generation (using a local uncensored model to bootstrap sketches and cloud models for refinement), a Navigator agent that performs Adaptive Pruning of futile vectors, and a Two‑Stage Adversarial Auditing protocol that cross‑checks results. For high‑risk memory‑corruption flaws the system conditionally instantiates a cross‑platform Digital Twin that enforces strict state synchronisation (including libc alignment and runtime file descriptor hooking) so destructive payloads are iteratively debugged in an isolated replica before a single validated "one‑shot" delivery to the physical target. The implementation centralises extracted intelligence in a structured knowledge base, applies hash‑based deduplication, uses just‑in‑time polyglot execution sandboxes, and a forensic Autopsy and Self‑Healing loop to refine unstable payloads.
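The Adversarial Hand-off and Two-Stage Auditing combine into a short pipeline. A minimal sketch, assuming a plain function interface: the three callables stand in for the local uncensored model, the cloud refiner, and the independent Reviewer, and their signatures are our own invention, not the paper's API.

```python
def synthesize_exploit(task, local_draft, cloud_refine, reviewer):
    """Adversarial Hand-off plus Two-Stage Auditing, reduced to control flow."""
    sketch = local_draft(task)       # local uncensored model bootstraps a rough sketch
    refined = cloud_refine(sketch)   # cloud model refines syntax and reliability
    if not reviewer(refined):        # independent Reviewer cross-checks the result
        return None                  # unconfirmed finding: discarded as hallucination
    return refined

# Toy stand-ins for the three models.
draft = lambda t: f"rough exploit for {t}"
refine = lambda s: s.replace("rough", "refined")
reviewer = lambda code: "refined" in code  # confirms only properly refined output

print(synthesize_exploit("heap overflow in parser", draft, refine, reviewer))
```

Splitting the roles this way gives two properties at once: the cloud model never sees the raw offensive planning step, and no Stage-1 finding reaches the report without independent confirmation.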
Key Findings
- The framework achieved high resource efficiency: the Global Efficiency Ratio exceeded 96% in five evaluated scenarios, showing effective Adaptive Pruning of unproductive attack paths.
- Action Executability Rate remained stable between 85% and 100% in five of eight scenarios, indicating strong structural reliability of generated payloads after multi‑agent segregation and sanitisation.
- Digital Twin safety layer materially reduced live‑fire risk: reconnaissance consumed 77.2% of total time while Digital Twin instantiation required only 2.9% of time, and the replica absorbed 14 critical denial‑of‑service conditions across evaluated memory‑corruption scenarios, enabling successful one‑shot executions.
- False positives from generative hallucinations were eliminated by the Two‑Stage Auditing protocol: none of the Stage‑1 false positives were confirmed after independent Reviewer validation.
- Time‑to‑Compromise varied by scenario (examples reported between about 91 and 171 minutes), driven mainly by sequential I/O‑bound exfiltration and contextualisation rather than the Digital Twin overhead.
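The headline metrics above can be read as simple ratios. The paper's exact definitions may differ, so treat these as our interpretation rather than the authors' formulas.

```python
def action_executability_rate(results):
    """Fraction of generated actions that executed without structural errors."""
    return sum(results) / len(results)

def global_efficiency_ratio(productive_steps, total_steps):
    """Share of steps spent on paths not pruned as futile (assumed definition)."""
    return productive_steps / total_steps

# 17 of 20 generated actions run cleanly -> 85%, the lower bound reported above.
print(f"{action_executability_rate([True] * 17 + [False] * 3):.0%}")  # 85%
print(f"{global_efficiency_ratio(97, 100):.0%}")                      # 97%
```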
Limitations
Evaluations were limited to authorised, air‑gapped testbeds and targeted single‑node engagements rather than large‑scale or live production networks. The system generates noisy exfiltration traffic and currently lacks stealth modules to evade modern perimetric defences. High Time‑to‑Compromise and reliance on commercial LLMs incur cost and latency. The Two‑Stage auditing depends on another LLM reviewer, so a residual risk of shared hallucination remains. The framework does not implement post‑exploitation lateral movement or persistent stealth capabilities.
Implications
Offensively, an operator could use such an architecture to automate complete attack chains from reconnaissance to reliable memory‑corruption compromise while minimising the chance of crashing targets during testing. The Digital Twin enables iterative stabilisation of destructive payloads and deterministic register‑level feedback for refining exploits, and the Adversarial Hand‑off can circumvent cloud safety filters by delegating initial synthesis to local models. These capabilities lower the technical barrier for executing deep, single‑target compromises and demonstrate how autonomous agents could streamline complex offensive workflows if adapted by malicious actors.