
Covert Key Exchange Lets LLM Agents Evade Audits

Agents
Published: Tue, Apr 07, 2026 • By Rowan Vale
New research shows two Large Language Model (LLM) agents can run a hidden key exchange and chat inside a normal-looking transcript. It introduces a bundle sampler and a pseudorandom noise-resilient key exchange, reducing entropy demands. The upshot: transcript-only auditing may miss covert coordination, pushing defenders to rethink monitoring.

Many teams now wire Large Language Model (LLM) agents together across vendors and services, and rely on transcript logs as a safety net. This paper says that safety net has holes. It shows two agents can carry on a parallel secret conversation while producing a transcript that looks like a normal exchange, even to a strong passive auditor who knows the models, the protocol, and each side’s private context. In short: transcript-only auditing cannot, in general, rule out covert coordination.

What the researchers did

Covert communication means hiding one conversation inside another that looks innocent. The authors formalise this for interactive LLM chats. In the easiest case, where two agents share an interaction-unique secret key, they demonstrate that the hidden chat can run at near-optimal rate by exploiting almost all the randomness already present in honest-looking messages. That part is unsurprising but important: shared keys make hiding easy and efficient.
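To make the shared-key idea concrete, here is a minimal toy sketch, not the paper's construction: assume a two-token honest distribution that happens to be uniform, and use a key-derived pseudorandom bit as a one-time pad over each hidden bit. The visible token stays uniformly distributed whether or not a secret is being sent, which is the essence of why shared keys make hiding easy. The `keystream_bit` helper and the binary-token model are illustrative assumptions.

```python
import hashlib

def keystream_bit(key: bytes, i: int) -> int:
    """Derive a pseudorandom bit from the shared key and position (toy PRF)."""
    return hashlib.sha256(key + i.to_bytes(4, "big")).digest()[0] & 1

def embed(key: bytes, hidden_bits):
    """Sender: each visible token is hidden_bit XOR keystream bit.
    If the honest distribution over {0, 1} tokens is uniform, the output
    is uniform too, so the transcript distribution is unchanged."""
    return [b ^ keystream_bit(key, i) for i, b in enumerate(hidden_bits)]

def extract(key: bytes, tokens):
    """Receiver: undo the keystream to recover the hidden bits."""
    return [t ^ keystream_bit(key, i) for i, t in enumerate(tokens)]

key = b"interaction-unique secret"
msg = [1, 0, 1, 1, 0, 0, 1]
assert extract(key, embed(key, msg)) == msg
```

Real LLM messages have non-uniform, context-dependent distributions, which is exactly why the paper's rate results about exploiting "essentially all" available entropy take more work than this sketch.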

The more interesting bit is keyless operation. The paper introduces a “bundle sampler”, a way to sample responses that preserves the honest message distribution exactly while embedding bits. Think of it as turning the visible chat into a binary symmetric channel: sometimes the hidden bit flips. Crucially, the sender learns whether decoding worked, giving noiseless feedback. That feedback lets protocols adapt and recover without changing how the transcript looks.
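The value of noiseless feedback can be pictured with a toy ARQ loop: treat the embedded bits as passing through a binary symmetric channel, and let the sender, who learns whether each bit decoded correctly, simply resend until it does. This is a sketch of the induced channel and feedback only, not of the distribution-preserving sampling itself; the function names and the retransmission strategy are illustrative assumptions.

```python
import random

def bsc(bit: int, flip_prob: float, rng) -> int:
    """Simulate the binary symmetric channel the bundle sampler induces."""
    return bit ^ (rng.random() < flip_prob)

def send_with_feedback(bits, flip_prob=0.25, seed=0):
    """Toy ARQ: because the sender learns whether each embedded bit decoded
    correctly (noiseless feedback), it can simply repeat until success."""
    rng = random.Random(seed)
    received, channel_uses = [], 0
    for b in bits:
        while True:
            channel_uses += 1
            out = bsc(b, flip_prob, rng)
            if out == b:  # feedback tells the sender decoding worked
                received.append(out)
                break
    return received, channel_uses

msg = [1, 0, 1, 1, 0]
got, uses = send_with_feedback(msg)
assert got == msg
```

The paper's protocols are more sophisticated than repeat-until-success, but the sketch shows why feedback changes the game: errors cost rate, never correctness, and the transcript never has to look different.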

To actually agree a secret key without starting from one, the authors define a new cryptographic primitive: pseudorandom noise-resilient key exchange (PNR-KE). In one line, it is a key exchange whose public transcript looks pseudorandom to outsiders and still completes correctly even if a constant fraction of embedded bits are wrong. They show that if enough individual messages in the honest chat have at least constant min-entropy (read: the next message is not almost fully predictable), then covert key exchange is possible even with short, adaptive messages.
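Min-entropy, the quantity the condition hinges on, is just the negative log of the most likely outcome's probability. A few lines make "constant min-entropy" concrete; the example distributions are made up for illustration.

```python
import math

def min_entropy(probs):
    """H_inf(p) = -log2(max_i p_i): small when one outcome dominates."""
    return -math.log2(max(probs))

# A near-deterministic next-message distribution has tiny min-entropy...
assert min_entropy([0.97, 0.01, 0.01, 0.01]) < 0.1
# ...while even a mildly uncertain one clears a constant bound.
assert min_entropy([0.5, 0.3, 0.2]) == 1.0
```

Earlier covert-communication results needed this quantity to grow with the security parameter for every message; the paper only asks that sufficiently many messages clear a constant bar.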

They give two routes: a construction based on learning sparse parities with noise (LSPN) that offers plausible quasi-polynomial security, and a compiler that turns any key exchange with a pseudorandom-looking transcript into a feedback-PNR-KE when noiseless feedback is available. They also map the limits: public analogues of pseudorandom codes are impossible in this setting, and any non-interactive PNR-KE, or protocol that tries to squeeze out even a weak non-interactive shared bit, faces quasi-polynomial attacks.
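For intuition about the LSPN assumption behind the first route, here is a toy generator of LSPN instances: a secret parity over only k of n coordinates, observed through noisy labels, which an attacker must distinguish from random. This sketches the input distribution of the hardness assumption only, not the paper's key-exchange protocol; all parameters are illustrative.

```python
import random

def lspn_samples(n=64, k=4, noise=0.1, m=8, seed=1):
    """Toy learning-sparse-parities-with-noise instance: secret s has only
    k of n bits set; each sample is (a, <a, s> + e mod 2) with a uniform
    and e a Bernoulli(noise) error bit."""
    rng = random.Random(seed)
    support = rng.sample(range(n), k)  # sparse secret: k active indices
    samples = []
    for _ in range(m):
        a = [rng.randint(0, 1) for _ in range(n)]
        label = sum(a[i] for i in support) % 2
        if rng.random() < noise:  # flip the label with probability `noise`
            label ^= 1
        samples.append((a, label))
    return support, samples

support, samples = lspn_samples()
assert len(support) == 4 and len(samples) == 8
```

Sparsity is what keeps the best known attacks at quasi-polynomial rather than polynomial cost, which matches the security level the paper claims for this construction.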

There are caveats. The positive results assume access to independent public randomness and that both parties can recognise “eligible” rounds with enough min-entropy. Several constructions rely on cryptographic hardness conjectures and the strongest ones need noiseless feedback. The authors do not pin down practical parameters or performance; that remains open.

Implications for defenders

If your control plane assumes that normal-looking transcripts imply normal behaviour, this paper challenges that assumption. The covert channel can live entirely within the honest distribution, so simple content scanning or statistical watermark checks are not enough. The work suggests shifting assurance towards mechanisms that make covert coordination harder or irrelevant to outcomes.

  • Active testing and protocol constraints: introduce challenge–response checks, limit free-form adaptability, and restrict where entropy can appear in multi-agent workflows.
  • Architectural controls and provenance: reduce cross-organisation agent autonomy, add mediation layers, and require verifiable provenance for critical actions rather than trusting chat logs.
  • Verifiable guarantees: invest in mechanisms that attest to model weights, prompts, and allowed protocols, or that provide auditable commitments beyond the transcript.

This is theory-first work, but it lands a clear message: with enough entropy and the right protocol, two agents can hide in plain sight. That does not make audits useless, but it does mean transcript logs alone are a weak defence in multi-agent systems. Expect more focus on active checks, mediation, and verifiable execution as this line of research matures.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

Authors: Vinod Vaikuntanathan and Or Zamir
AI agents are increasingly deployed to interact with other agents on behalf of users and organizations. We ask whether two such agents, operated by different entities, can carry out a parallel secret conversation while still producing a transcript that is computationally indistinguishable from an honest interaction, even to a strong passive auditor that knows the full model descriptions, the protocol, and the agents' private contexts. Building on recent work on watermarking and steganography for LLMs, we first show that if the parties possess an interaction-unique secret key, they can facilitate an optimal-rate covert conversation: the hidden conversation can exploit essentially all of the entropy present in the honest message distributions. Our main contributions concern extending this to the keyless setting, where the agents begin with no shared secret. We show that covert key exchange, and hence covert conversation, is possible even when each model has an arbitrary private context, and their messages are short and fully adaptive, assuming only that sufficiently many individual messages have at least constant min-entropy. This stands in contrast to previous covert communication works, which relied on the min-entropy in each individual message growing with the security parameter. To obtain this, we introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange: a key-exchange protocol whose public transcript is pseudorandom while still remaining correct under constant noise. We study this primitive, giving several constructions relevant to our application as well as strong limitations showing that more naive variants are impossible or vulnerable to efficient attacks. These results show that transcript auditing alone cannot rule out covert coordination between AI agents, and identify a new cryptographic theory that may be of independent interest.

🔍 ShortSpan Analysis of the Paper

Problem

This paper asks whether two AI agents, operated by different entities and observed by a very strong passive auditor that knows the models, protocols and private contexts, can carry out a covert shared conversation while producing a visible transcript that is computationally indistinguishable from an honest interaction. The question matters because many deployments rely on transcript-level logging as a safeguard; showing that transcripts can hide coordinated secrets challenges that assumption.

Approach

The authors formalise interactive steganography for language-model conversations and study both the shared-key and keyless settings. They develop a new sampling primitive, the bundle sampler, that preserves the honest message distribution exactly while trading some decoding correctness. To handle the keyless case when individual messages have only constant min-entropy, they introduce the cryptographic primitive pseudorandom noise-resilient key exchange (PNR-KE): a key-exchange whose public transcript is pseudorandom and that tolerates constant noise. They give constructions (notably an LSPN-based protocol) and a feedback-based compiler that converts any pseudorandom-transcript key exchange into a PNR-KE when noiseless feedback is available, and they prove several impossibility or attack results for simpler variants.

Key Findings

  • Shared-key regime: if agents share an interaction-unique secret, they can embed a covert conversation at essentially optimal rate, consuming almost all entropy available in honest message distributions while keeping transcripts indistinguishable from honest runs.
  • Bundle sampler: a novel exact-distribution-preserving sampler that outputs honest-looking messages exactly while inducing a binary symmetric channel for the embedded bits; the sender learns whether decoding succeeded (noiseless feedback), enabling further protocol design.
  • Entropy weakening: using the bundle sampler reduces per-message entropy requirements from Ω(λ) to O(log λ), and with PNR-KE to constant min-entropy per eligible message, allowing covert key exchange even with short adaptive messages.
  • PNR-KE constructions and limits: two constructions are given—an LSPN-based scheme yielding plausible quasi-polynomial security, and a feedback-based compiler that turns any pseudorandom-transcript key exchange into a feedback-PNR-KE. Complementary limitations show public analogues of pseudorandom codes are impossible and any non-interactive PNR-KE or protocol yielding a weakly correlated non-interactive bit admits quasi-polynomial attacks.
  • Audit insufficiency: transcript-only passive auditing cannot, in general, rule out covert coordination under the stated assumptions.

Limitations

Security relies on several assumptions: availability of independent public randomness, a recognisable criterion for eligible rounds with sufficient min-entropy, and cryptographic hardness conjectures (e.g. learning sparse parities with noise) that yield quasi-polynomial rather than exponential guarantees for some constructions. The strongest positive results require noiseless feedback or stronger resources. Practical feasibility and parameter choices are left to further study.

Why It Matters

The work shows a plausible pathway for undetectable coordination between AI agents even when transcripts are logged and auditors know models and contexts, undermining reliance on transcript-only monitoring. It identifies a new primitive, PNR-KE, and provides both constructions and lower bounds, guiding defenders about where monitoring is insufficient and prompting consideration of active testing, protocol restrictions, provenance or architectural controls, and new verifiable guarantees to mitigate covert channels in multi-agent AI systems.

