EM side channel flags hijacked LLM agent workflows
Agents
LLM agents don’t just chat; they pull data, run tools, and act. That workflow can be hijacked: attackers insert, reorder, or swap skills while keeping the final answer plausible. If the host is compromised, your logs happily corroborate the lie. This paper asks a rude but useful question: can we check the workflow from outside the box?
ClawGuard does exactly that by watching the machine’s electromagnetic (EM) noise. Two nearby software-defined radios (SDRs) listen to the chassis while the agent runs. Each agent skill has a different mix of CPU, memory, I/O, and idle time, which produces a measurable, seconds-scale EM “envelope.” After a per-rig calibration to pick CPU- and memory-correlated frequency bands, the system treats those envelopes as physical evidence of what actually ran.
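The per-rig calibration can be pictured as a correlation sweep: run known CPU- or memory-heavy probe workloads while recording candidate carrier bands, and keep the bands whose power envelope tracks the probe. This is a minimal sketch of that idea only; the function name, scoring and data shapes are illustrative, not the paper's procedure:

```python
import numpy as np

def select_bands(band_power, activity, top_k=2):
    """Rank candidate carrier bands by how well their EM power
    envelope tracks a known probe workload's activity trace.

    band_power: (n_bands, n_samples) power envelope per candidate band
    activity:   (n_samples,) 1.0 while the probe workload runs, else 0.0
    Returns indices of the top_k most workload-correlated bands.
    """
    a = (activity - activity.mean()) / (activity.std() + 1e-12)
    scores = []
    for p in band_power:
        q = (p - p.mean()) / (p.std() + 1e-12)
        scores.append(abs(np.mean(a * q)))  # |Pearson r|
    return np.argsort(scores)[::-1][:top_k]

# Toy data: band 0 follows the CPU probe, band 1 is pure noise.
rng = np.random.default_rng(0)
activity = np.repeat([0, 1, 0, 1], 250).astype(float)
band0 = activity * 3 + rng.normal(0, 0.5, 1000)
band1 = rng.normal(0, 0.5, 1000)
best = select_bands(np.stack([band0, band1]), activity, top_k=1)
print(best)  # band 0 wins
```

In practice the paper keeps one CPU-correlated and one memory-correlated band, one per SDR, so the two receivers carry complementary evidence.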
How it works
The pipeline is drift-aware. It slices each skill into overlapping fine windows inside a coarse envelope, then computes 320 features spanning spectral, temporal, and cross-receiver signals. It normalises per cycle, removes temperature drift, and does feature selection within the training fold. Ensemble classifiers score each window for both skill and attack state; those decisions roll up to a record-level verdict. The evaluation used HackRF One SDRs with temperature logging.
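The three preprocessing ideas, overlapping fine windows, temperature detrending, and training-fold-only feature selection, can be sketched in a few lines. Window sizes and the ANOVA-style scoring below are hypothetical stand-ins for the paper's exact parameters:

```python
import numpy as np

def fine_windows(envelope, win, hop):
    """Slice a coarse skill envelope into overlapping fine windows."""
    n = 1 + (len(envelope) - win) // hop
    return np.stack([envelope[i * hop : i * hop + win] for i in range(n)])

def detrend_temperature(features, temperature):
    """Remove the component of each feature linearly explained by the
    logged chassis temperature (per-feature least-squares fit)."""
    t = np.column_stack([temperature, np.ones_like(temperature)])
    coef, *_ = np.linalg.lstsq(t, features, rcond=None)
    return features - t @ coef

def anova_select(train_X, train_y, k):
    """Keep the k features with the largest between/within-class
    variance ratio, computed on TRAINING-fold data only (no leakage)."""
    classes = np.unique(train_y)
    grand = train_X.mean(axis=0)
    between = sum(
        (train_y == c).sum() * (train_X[train_y == c].mean(0) - grand) ** 2
        for c in classes
    )
    within = sum(
        ((train_X[train_y == c] - train_X[train_y == c].mean(0)) ** 2).sum(0)
        for c in classes
    )
    f = between / (within + 1e-12)
    return np.argsort(f)[::-1][:k]

env = np.arange(10.0)
w = fine_windows(env, win=4, hop=2)
print(w.shape)  # (4, 4)
```

The fold-only selection matters: scoring features on the full corpus would leak test-set drift into the model and inflate the headline numbers.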
On a 7.82 TB RF corpus with 12,232 records, the production split (11,800 records) hit ROC AUC 0.9945 and PR AUC 0.9305. The authors pick an operating point that gives a 100% true-positive rate at a 1.16% false-positive rate. A replication run after measured carrier re-selection to an 80 MHz / 800 MHz pair yielded 83.6% sub-window accuracy, 88.3% record-vote accuracy, and 90.3% record-level attack recall on the surviving classes. Median post-feature inference latency was 18 ms (p99 29 ms), and batched inference amortises to around 0.15 ms per record. The paper includes cost and data-rate notes for deployment planning.
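Choosing an operating point like "100% TPR at 1.16% FPR" amounts to sliding the decision threshold down until every attack record is caught, then reading off the benign false-alarm rate. A sketch of that selection on synthetic scores (the paper's actual scores and threshold rule are not public in this summary):

```python
import numpy as np

def operating_point(scores, labels, min_tpr=1.0):
    """Pick the highest decision threshold that still achieves min_tpr
    on attack records, then report the resulting false-positive rate.
    scores: higher = more attack-like; labels: 1 = attack, 0 = benign."""
    pos = np.sort(scores[labels == 1])
    idx = int(np.floor((1.0 - min_tpr) * len(pos)))
    thr = pos[idx]  # lowest score we must still catch
    fpr = float(np.mean(scores[labels == 0] >= thr))
    return thr, fpr

rng = np.random.default_rng(1)
benign = rng.normal(0.2, 0.1, 1000)
attack = rng.normal(0.8, 0.1, 100)
scores = np.concatenate([benign, attack])
labels = np.concatenate([np.zeros(1000), np.ones(100)])
thr, fpr = operating_point(scores, labels)
print(f"threshold={thr:.3f}, FPR={fpr:.4f}")
```

At a 100% TPR constraint the threshold is pinned by the single weakest attack record, which is why the FPR, not the AUC, is the number an operator actually lives with.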
Where it breaks
This is not a magic rootkit detector. It needs physical RF access and per-deployment calibration because useful bands are hardware- and environment-specific. Thermal and cycle drift are loud enough to drown class signals, which makes broad, open-set recognition brittle. The threat model excludes physical tampering and active jamming. An adaptive attacker could try to mimic benign envelopes, chop work into tiny bursts, or fiddle dynamic voltage and frequency scaling (DVFS) to move the harmonics.
So does it matter? Yes, in the narrow but important case where you fear host compromise and can control the physical environment. In a shared rack with RF noise and no proximity guarantee, this will creak. The team also sketches sequence-level edit-distance checks but stops short of scale testing. Still, as a forge-resistant, out-of-band integrity signal for LLM agent workflows, this is one of the few ideas that puts physical reality back in the loop—and the numbers say it’s more than a lab toy.
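The sequence-level check the team sketches could be as simple as an edit distance between the declared skill sequence and the sequence inferred from EM evidence: insertions, omissions, reorderings and substitutions all raise the distance. A minimal Levenshtein sketch, with illustrative skill names:

```python
def edit_distance(declared, observed):
    """Levenshtein distance between the declared skill sequence and
    the skill sequence inferred from EM windows. Each omitted,
    inserted or substituted skill costs 1; any nonzero distance
    flags a workflow deviation."""
    m, n = len(declared), len(observed)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if declared[i - 1] == observed[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # omitted skill
                          d[i][j - 1] + 1,       # inserted skill
                          d[i - 1][j - 1] + cost)  # substituted skill
    return d[m][n]

plan = ["fetch", "parse", "summarise"]
ran = ["fetch", "exfiltrate", "parse", "summarise"]
print(edit_distance(plan, ran))  # one inserted skill -> 1
```

Scaling this up is exactly what the paper leaves untested: the hard part is not the distance computation but keeping per-window skill classification accurate enough that benign runs stay at distance zero.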
Additional analysis of the original arXiv paper
📋 Original Paper Title and Abstract
ClawGuard: Out-of-Band Detection of LLM Agent Workflow Hijacking via EM Side Channel
🔍 ShortSpan Analysis of the Paper
Problem
The paper studies workflow hijacking in autonomous LLM agents, where an attacker inserts, omits, reorders or substitutes tool and skill invocations while preserving plausible high-level semantics. This is critical because conventional defences rely on host-internal telemetry such as audit logs or provenance graphs, which can be forged if the host OS or agent runtime is fully compromised. The work asks whether a passive, out-of-band physical channel can provide a forge-resistant integrity check of agent workflows.
Approach
ClawGuard is an out-of-band monitor that passively captures electromagnetic emanations from a target host using two external software-defined radios positioned close to the chassis. The system treats agent skills as seconds-scale compositional workloads whose mixtures of CPU, DRAM, I/O and idle intervals produce macroscopic EM envelopes. A measured carrier-selection calibration is performed per deployment to pick complementary CPU- and memory-correlated bands. A drift-aware coarse–fine pipeline extracts overlapping fine windows inside coarse skill envelopes and computes 320-dimensional spectral, temporal and cross-receiver features. Preprocessing includes cycle-local normalisation, temperature detrending and training-fold-only ANOVA feature selection. Inference uses ensemble classifiers to produce fine-window skill and attack-state evidence, which is aggregated into a record-level verdict. Evaluations use a large RF corpus collected with HackRF One SDRs and temperature logging.
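The final aggregation step, rolling fine-window attack evidence up to a record-level verdict, can be sketched as a threshold vote. The vote fraction and threshold below are illustrative; the paper's exact aggregation rule may differ:

```python
import numpy as np

def record_verdict(window_attack_probs, window_thr=0.5, vote_frac=0.1):
    """Flag a record as attacked if at least vote_frac of its fine
    windows individually score above window_thr. A low vote_frac keeps
    a short malicious burst visible among many benign windows."""
    flags = np.asarray(window_attack_probs) > window_thr
    return bool(flags.mean() >= vote_frac)

# 40 benign-looking windows plus a 5-window malicious burst:
probs = [0.1] * 40 + [0.9] * 5
print(record_verdict(probs))  # True: the burst clears the vote fraction
```

The design choice here is the trade-off the evaluation measures: a lower vote fraction catches shorter payloads but pushes the false-positive rate up, which is what the chosen operating point balances.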
Key Findings
- Large RF corpus: ClawGuard was evaluated on a 7.82 TB RF corpus with 12,232 records covering 16 benign skills and 22 attack skills, plus a separate replication corpus collected after carrier re-selection.
- High detection performance: On the production split (11,800 records) ClawGuard achieved ROC AUC 0.9945 and PR AUC 0.9305, with an operating point giving 100% true-positive rate and 1.16% false-positive rate.
- Cross-band replication: After measured carrier re-selection to an (80 MHz, 800 MHz) pair, the same pipeline reached 83.6% sub-window accuracy, 88.3% record-vote accuracy and 90.3% record-level attack recall on the surviving class subset.
- Coarse–fine benefit: Decomposing records into fine windows substantially improved detection of short malicious payloads that would be diluted by whole-record features; one coarse–fine configuration reported record accuracies from 0.9252 to 0.9398 and attack recall around 0.83–0.86.
- Operational practicality: Median post-feature inference latency was 18 ms (p99 29 ms); batched inference amortises to ~0.15 ms per record. System cost and data-rate estimates were provided for deployment planning.
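The coarse–fine benefit in the third finding is easy to show numerically: averaging features over a whole record dilutes a short malicious burst that per-window scoring still sees clearly. The numbers below are illustrative, not from the paper:

```python
import numpy as np

# A record of 60 one-second feature values: a 3 s malicious spike
# (value 6.0) buried in otherwise benign activity (value 1.0).
signal = np.full(60, 1.0)
signal[30:33] = 6.0

whole_record_mean = signal.mean()  # spike diluted across the record
window_max = max(signal[i : i + 5].mean() for i in range(0, 60, 5))

print(round(float(whole_record_mean), 2))  # 1.25
print(round(float(window_max), 2))         # 4.0
```

The whole-record statistic barely moves off the benign baseline, while the worst 5-second window stands well clear of it; that gap is what the coarse–fine decomposition buys.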
Limitations
ClawGuard requires physical RF access and per-rig carrier calibration because informative bands are deployment-specific. Thermal and cycle drift are substantial and can dominate class information, limiting open-set multi-class recognition; large flat classifiers across many skills are fragile under cross-run shifts and small per-class sample sizes. The system assumes the defender controls the SDRs and policy channel; physical-layer attacks such as sensor tampering, active jamming or host-controlled emitters are out of scope. Sequence-level edit-distance verification was designed but not exhaustively evaluated at scale.
Implications
The offensive implication is that adversaries who fully compromise the host software can still be detected by an out-of-band EM monitor. A motivated attacker could adapt to evade ClawGuard by crafting payloads that mimic benign EM envelopes, fragmenting operations into very short bursts, manipulating DVFS or governor settings to alter harmonics, or attempting physical jamming and sensor tampering. These adaptive strategies raise the attacker's cost but do not trivially defeat a properly calibrated out-of-band monitor; ClawGuard therefore provides a hard-to-forge physical integrity signal that complements host telemetry.