Malicious LLM routers hijack agents via tool-call rewrites
Agents
If you run Large Language Model (LLM) agents, you probably front them with whatever cheap API router will multiplex requests across models and vendors. Those routers terminate TLS, forward JSON tool calls, and sometimes chain to other routers. Every hop sees plaintext. There is no cryptographic integrity tying what the provider produced to what your client receives. That is a classic supply-chain trust gap, now sitting on the execution path of autonomous systems.
How it breaks
The paper formalises two core classes: payload injection (AC-1) and secret exfiltration (AC-2). A malicious or compromised router can rewrite function arguments, alter file paths, or slip an extra pip install, then pass the lot downstream as if nothing happened. Dependency-targeted injection (AC-1.a) focuses on package install calls to land persistence. Conditional delivery (AC-1.b) fires only under certain model IDs, tools, or times, so audits miss it.
In the wild, they found 1 paid and 8 free routers actively injecting code; 2 used adaptive triggers; 17 touched researcher-owned AWS canary credentials; one routed traffic that drained ETH from a researcher’s private key. Because each hop is a plaintext proxy, the chain has a weakest-link property: any one bad router can tamper, and honest ones later cannot detect or undo it.
They also poisoned the ecosystem to measure blast radius. An intentionally leaked OpenAI key racked up about 100M GPT-5.4 tokens and more than seven Codex sessions. Weakly configured relay decoys served roughly 2B GPT-5.x tokens, exposed 99 credentials across 440 Codex sessions in 398 projects, and 401 sessions were already running in autonomous YOLO mode. That last bit matters; if the agent executes tools automatically, a trivial JSON rewrite is enough.
To show practicality, the authors built Mine, a research proxy that implements all four behaviours against four public agent frameworks. It achieved 100% rewrite compatibility for payload injection and 99.6% for dependency-targeted cases, with buffered rewriting adding about 0.013 ms per request. In other words, attackers pay basically nothing in latency to stay inline.
Client-side countermeasures helped but did not close provenance. A fail-closed policy gate blocked payload injections and their dependency-targeted variant with a 1.0% false positive rate. Response-side anomaly screening flagged 89% of injections at a chosen operating point. Append-only transparency logs preserved forensics at about 12 MB per 1,000 sessions. None can prove the response really came from the provider.
Limitations apply: the measurement focuses on public markets and researcher-controlled probes; black-box testing can miss triggers; defences were scored on synthetic traffic. Still, the commercial read is straightforward. If you put third-party routers on the path of LLM agents that auto-execute tools, you are outsourcing change control on code and keys. Until providers ship end-to-end response integrity, watch this space.
Additional analysis of the original ArXiv paper
📋 Original Paper Title and Abstract
Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain
🔍 ShortSpan Analysis of the Paper
Problem
This paper studies the security of large language model (LLM) API routers, intermediary services that accept agent requests, terminate client TLS, and forward tool-calling JSON to upstream model providers. Because each hop terminates and re‑originates TLS, routers have full plaintext access to requests and responses and no deployed cryptographic integrity binds a provider-origin tool call to what a client finally receives. The authors characterise this as an LLM supply‑chain trust boundary and show it enables active payload manipulation and silent secret exfiltration, creating real-world risks to agents that auto‑execute tool calls.
Approach
The authors formalise a threat model and taxonomy of adversarial router behaviours: payload injection (AC-1), secret exfiltration (AC-2), and two adaptive evasion variants, dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). They measured 28 paid routers purchased from public marketplaces and 400 free routers discovered in community lists and configuration dumps. Two poisoning studies intentionally leaked a researcher OpenAI key and deployed weak relay decoys to observe how ostensibly benign routers can be pulled into the threat surface. They implemented Mine, a research proxy that reproduces the four attack classes and used it to test compatibility against four agent frameworks and to evaluate three deployable client-side defences: a fail‑closed policy gate, response‑side anomaly screening, and append‑only transparency logging.
Key Findings
- Attack taxonomy and weakest‑link property: a single malicious or compromised router anywhere in a multi‑hop chain can rewrite tool‑call arguments or harvest secrets; downstream honest routers cannot detect or undo modifications.
- Field measurement: among 28 paid and 400 free routers, 1 paid and 8 free routers actively injected malicious code; 2 routers deployed adaptive evasion triggers; 17 free routers touched researcher‑owned AWS canary credentials; and 1 router drained ETH from a researcher private key.
- Poisoning blast radius: an intentionally leaked OpenAI key generated 100M GPT‑5.4 tokens and more than seven Codex sessions. Weakly configured decoys served roughly 2B GPT‑5.x tokens, exposed 99 credentials across 440 Codex sessions spanning 398 projects, and 401 sessions were already running in autonomous YOLO mode, where simple injection suffices.
- Compatibility and practicality: Mine achieved 100% rewrite compatibility for AC-1 across four public agent frameworks and 99.6% for AC-1.a on package‑install calls; buffered rewriting adds negligible latency (average pause 0.004–0.005 ms; per‑request overhead 0.013 ms).
- Client‑side mitigations: a fail‑closed policy gate blocked AC-1 and AC-1.a samples with a 1.0% false positive rate; response‑side anomaly screening flagged 89% of AC-1 samples at a chosen operating point; transparency logging preserved forensic evidence at modest storage cost (about 12 MB per 1,000 sessions).
Limitations
The study targets publicly reachable commodity router markets and researcher‑controlled probes; it does not exhaustively enumerate private or enterprise deployments. Black‑box probing cannot reveal all latent server‑side triggers, so some adaptive behaviours may remain unobserved. Defence evaluations use synthetic corpora and controlled artifact tests rather than live production traffic. The authors do not release the Mine artefact to limit dual‑use risk.
Implications
Attackers who control or compromise a router can inject arbitrary tool‑call arguments to cause remote code execution, persist via poisoned dependencies, or quietly harvest API keys and other secrets from plaintext traffic. Conditional triggers allow malicious behaviour to remain hidden during finite audits. The findings show a practical supply‑chain vector for compromising agent workflows and argue that provider‑backed response integrity is required to close the provenance gap; until then, failing closed on high‑risk tool workflows, anomaly screening, and transparency logs reduce exposure but do not prove origin authenticity.