ShortSpan.ai logo

Malicious LLM routers hijack agents via tool-call rewrites

Agents
Published: Fri, Apr 10, 2026 • By Adrian Calder
Malicious LLM routers hijack agents via tool-call rewrites
New research shows third‑party LLM API routers can rewrite tool‑call JSON and quietly exfiltrate secrets because there is no end‑to‑end integrity. Among 428 routers, some injected code, used evasion triggers, touched AWS canaries and even drained ETH. Poisoned decoys drew billions of tokens. Client‑side gates help, but provenance remains unproven.

If you run Large Language Model (LLM) agents, you probably front them with whatever cheap API router will multiplex requests across models and vendors. Those routers terminate TLS, forward JSON tool calls, and sometimes chain to other routers. Every hop sees plaintext. There is no cryptographic integrity tying what the provider produced to what your client receives. That is a classic supply-chain trust gap, now sitting on the execution path of autonomous systems.

How it breaks

The paper formalises two core classes: payload injection (AC-1) and secret exfiltration (AC-2). A malicious or compromised router can rewrite function arguments, alter file paths, or slip an extra pip install, then pass the lot downstream as if nothing happened. Dependency-targeted injection (AC-1.a) focuses on package install calls to land persistence. Conditional delivery (AC-1.b) fires only under certain model IDs, tools, or times, so audits miss it.

In the wild, they found 1 paid and 8 free routers actively injecting code; 2 used adaptive triggers; 17 touched researcher-owned AWS canary credentials; one routed traffic that drained ETH from a researcher’s private key. Because each hop is a plaintext proxy, the chain has a weakest-link property: any one bad router can tamper, and honest ones later cannot detect or undo it.

They also poisoned the ecosystem to measure blast radius. An intentionally leaked OpenAI key racked up about 100M GPT-5.4 tokens and more than seven Codex sessions. Weakly configured relay decoys served roughly 2B GPT-5.x tokens, exposed 99 credentials across 440 Codex sessions in 398 projects, and 401 sessions were already running in autonomous YOLO mode. That last bit matters; if the agent executes tools automatically, a trivial JSON rewrite is enough.

To show practicality, the authors built Mine, a research proxy that implements all four behaviours against four public agent frameworks. It achieved 100% rewrite compatibility for payload injection and 99.6% for dependency-targeted cases, with buffered rewriting adding about 0.013 ms per request. In other words, attackers pay basically nothing in latency to stay inline.

Client-side countermeasures helped but did not close provenance. A fail-closed policy gate blocked payload injections and their dependency-targeted variant with a 1.0% false positive rate. Response-side anomaly screening flagged 89% of injections at a chosen operating point. Append-only transparency logs preserved forensics at about 12 MB per 1,000 sessions. None can prove the response really came from the provider.

Limitations apply: the measurement focuses on public markets and researcher-controlled probes; black-box testing can miss triggers; defences were scored on synthetic traffic. Still, the commercial read is straightforward. If you put third-party routers on the path of LLM agents that auto-execute tools, you are outsourcing change control on code and keys. Until providers ship end-to-end response integrity, watch this space.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Authors: Hanzhi Liu, Chaofan Shou, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, and Yu Feng
Large language model (LLM) agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), together with two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers purchased from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers collected from public communities, we find 1 paid and 8 free routers actively injecting malicious code, 2 deploying adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies further show that ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generates 100M GPT-5.4 tokens and more than seven Codex sessions, while weakly configured decoys yield 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode. We build Mine, a research proxy that implements all four attack classes against four public agent frameworks, and use it to evaluate three deployable client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.

🔍 ShortSpan Analysis of the Paper

Problem

This paper studies the security of large language model (LLM) API routers, intermediary services that accept agent requests, terminate client TLS, and forward tool-calling JSON to upstream model providers. Because each hop terminates and re‑originates TLS, routers have full plaintext access to requests and responses and no deployed cryptographic integrity binds a provider-origin tool call to what a client finally receives. The authors characterise this as an LLM supply‑chain trust boundary and show it enables active payload manipulation and silent secret exfiltration, creating real-world risks to agents that auto‑execute tool calls.

Approach

The authors formalise a threat model and taxonomy of adversarial router behaviours: payload injection (AC-1), secret exfiltration (AC-2), and two adaptive evasion variants, dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). They measured 28 paid routers purchased from public marketplaces and 400 free routers discovered in community lists and configuration dumps. Two poisoning studies intentionally leaked a researcher OpenAI key and deployed weak relay decoys to observe how ostensibly benign routers can be pulled into the threat surface. They implemented Mine, a research proxy that reproduces the four attack classes and used it to test compatibility against four agent frameworks and to evaluate three deployable client-side defences: a fail‑closed policy gate, response‑side anomaly screening, and append‑only transparency logging.

Key Findings

  • Attack taxonomy and weakest‑link property: a single malicious or compromised router anywhere in a multi‑hop chain can rewrite tool‑call arguments or harvest secrets; downstream honest routers cannot detect or undo modifications.
  • Field measurement: among 28 paid and 400 free routers, 1 paid and 8 free routers actively injected malicious code; 2 routers deployed adaptive evasion triggers; 17 free routers touched researcher‑owned AWS canary credentials; and 1 router drained ETH from a researcher private key.
  • Poisoning blast radius: an intentionally leaked OpenAI key generated 100M GPT‑5.4 tokens and more than seven Codex sessions. Weakly configured decoys served roughly 2B GPT‑5.x tokens, exposed 99 credentials across 440 Codex sessions spanning 398 projects, and 401 sessions were already running in autonomous YOLO mode, where simple injection suffices.
  • Compatibility and practicality: Mine achieved 100% rewrite compatibility for AC-1 across four public agent frameworks and 99.6% for AC-1.a on package‑install calls; buffered rewriting adds negligible latency (average pause 0.004–0.005 ms; per‑request overhead 0.013 ms).
  • Client‑side mitigations: a fail‑closed policy gate blocked AC-1 and AC-1.a samples with a 1.0% false positive rate; response‑side anomaly screening flagged 89% of AC-1 samples at a chosen operating point; transparency logging preserved forensic evidence at modest storage cost (about 12 MB per 1,000 sessions).

Limitations

The study targets publicly reachable commodity router markets and researcher‑controlled probes; it does not exhaustively enumerate private or enterprise deployments. Black‑box probing cannot reveal all latent server‑side triggers, so some adaptive behaviours may remain unobserved. Defence evaluations use synthetic corpora and controlled artifact tests rather than live production traffic. The authors do not release the Mine artefact to limit dual‑use risk.

Implications

Attackers who control or compromise a router can inject arbitrary tool‑call arguments to cause remote code execution, persist via poisoned dependencies, or quietly harvest API keys and other secrets from plaintext traffic. Conditional triggers allow malicious behaviour to remain hidden during finite audits. The findings show a practical supply‑chain vector for compromising agent workflows and argue that provider‑backed response integrity is required to close the provenance gap; until then, failing closed on high‑risk tool workflows, anomaly screening, and transparency logs reduce exposure but do not prove origin authenticity.


Related Articles

Related Research

Get the Weekly AI Security Digest

Top research and analysis delivered to your inbox every week. No spam, unsubscribe anytime.