New to ShortSpan? We distil the AI-security research that matters into practitioner takeaways — edited by Ben Williams (NCC Group). Get the weekly email

// Analysis

Agent-native immunity targets runtime hijacks in LLMs

Published: Mon, Jun 29, 2026 • By Lydia Stratus

Agents

Agent-native immunity targets runtime hijacks in LLMs

New research argues agents need defences inside their reasoning, not just perimeter rules or training-time alignment. It maps how memory poisoning, tool-chain manipulation and multi-agent protocol attacks hijack behaviour, then proposes an agent-native immune system with layered controls, adaptive parametric vaccines and self-monitoring loops to counter fast-evolving runtime threats.

Move past chatbots and into agents with memory and tools, and the attack surface shifts under your feet. The paper argues the obvious but often ignored point: once an agent reads, writes and calls tools, the real action is inside the reasoning loop. Perimeter filters and training-time alignment do not help when the compromise rides in through the agent’s own memory or coordination protocols.

How the hijack lands

The authors group attacks as agent viruses. The practical ones are familiar in spirit. Memory poisoning plants crafted records in persistent storage, then lets the agent dutifully retrieve them as ground truth. They report that just three malicious memory entries can steer tool selection with over 70 percent success. That sort of backdoor outlives a single chat, and in a team of agents it can propagate as shared context.

Tool-chain manipulation targets the glue code and protocols that let an agent act. If you can nudge the planner to pick a different tool or alter execution parameters, you reroute behaviour without touching the base model. Multi-agent protocol attacks go wider: use collusion or coordinated message patterns to push one agent’s beliefs onto others, spreading a thought virus across a swarm.

What the immune system adds

The proposed Agent-Native Immune System lays out six layers, from a hardware trust root (L0) and a non-cognitive barrier layer (L1) that handles physical and logical isolation, through innate cognitive checks (L2), adaptive parametric vaccines (L3), ecological governance (L4) and collective immunity (L5). The point is defence-in-depth that runs with the agent, not around it.

The sharp distinction is between non-parametric defences and parametric vaccines. Rules and prompts are easy to interpret but easy to route around with context-window tricks or multi-turn setups. Parametric vaccines, such as steering vectors, LoRA adapters and defensive embeddings, change internal representations to block harmful reasoning paths without touching base weights.

A self-monitoring Harness Triad runs continual immune learning: a Self harness flags anomalies, a Meta harness tests candidate vaccines and tracks an Autoimmunity Rate for false positives, and an Auto harness builds and deploys code or adapters. There is also a Thymus Simulator for autoimmune testing and an attestation-backed scheme to distribute vaccines across agent swarms.

There are caveats. The work is largely architectural; large-scale validation under messy, real workloads is still to come. Monitoring and constant vaccine evaluation adds latency and cost. Thresholds trade false positives for missed attacks. Today it targets text agents; cross-modal immunity and standardised protocols remain open. Still, if you thought training-time alignment would save you at 3am, this makes the case it will not.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

Agent-Native Immune System: Architecture, Taxonomy, and Engineering

Authors: Bo Shen, Lifeng Chang, Tianyuan Wei, Yunpeng Li, Feng Shi, Yichen Han, Peijie Gao, Shiyi Kuang, Xin Chang, and Dehui Li

The transition from static chat bots to autonomous agents--equipped with persistent memory, tool-use protocols, and multi-agent collaboration--has fundamentally expanded the AI threat landscape. Current defense mechanisms, such as perimeter security and training-time alignment, remain external to the agent's active reasoning loop. Consequently, they fall short: a fully aligned agent remains highly vulnerable to runtime hijacking via memory poisoning, tool-chain manipulation, or multi-agent protocol attacks. To address this critical gap, we introduce the Agent-Native Immune System (ANIS), the first biologically inspired, endogenous defense architecture embedded directly within the agent's cognitive loop. Our framework presents four primary contributions. First, we design a six-layer Immune Tower (L0-L5), distinctly incorporating Barrier Immunity (L1) as a non-cognitive, physical-and-logical isolation layer. Second, we establish a unified taxonomy of Agent Viruses and Agent Vaccines, formalizing the critical distinction between superficial non-parametric defenses and robust parametric vaccines. Third, we conceptualize the Harness Triad--Meta, Self, and Auto--a self-monitoring, meta-cognitive automation backbone that drives Continual Immune Learning (CIL), enabling vaccines to dynamically adapt to novel threats. Finally, we establish a rigorous theoretical demarcation between model alignment and agent immunity: while alignment provides a static "constitutional" value foundation during training, ANIS serves as the dynamic "law enforcement" mechanism during runtime. We conclude by framing open challenges for the field, including immune protocol standardization, novel evaluation metrics such as the Autoimmunity Rate (false-positive intervention rate), and the co-evolutionary dynamics between pathogens and vaccines within collective intelligence ecosystems.

🔍 ShortSpan Analysis of the Paper

Problem

The paper studies runtime threats to autonomous agents that traditional defences fail to address because they are external to the agent's reasoning loop. As agents gain persistent memory, tool-use and multi-agent collaboration, attackers can hijack goals and behaviour via memory poisoning, tool-chain manipulation and protocol-level attacks. Static training-time alignment and perimeter controls cannot reliably prevent such runtime hijacks, so the paper argues for embedding endogenous, adaptive defences inside the agent's cognitive loop.

Approach

The authors propose the Agent-Native Immune System (ANIS), a biologically inspired, six-layer Immune Tower (L0–L5) that combines non-cognitive barrier controls with adaptive cognitive vaccines. They formalise an ontology of agent viruses and vaccines, distinguish non-parametric (rules, prompts) from parametric (steering vectors, LoRA adapters, defensive embeddings) vaccines, and introduce the Harness Triad (Meta, Self, Auto) to operationalise Continual Immune Learning (CIL). The design includes quantitative health indicators—Cognitive Consistency Score, Behavioral Legitimacy Index and Ecological Order Coefficient—and engineering constructs such as steering vectors, LoRA vaccines, a Thymus Simulator for autoimmune testing, and a hardware trust root and attestation-backed vaccine distribution for collective immunity.

Key Findings

The Immune Tower provides a defence-in-depth architecture: L0 (hardware trust root), L1 (barrier immunity), L2 (innate cognitive checks), L3 (adaptive parametric vaccines), L4 (ecological governance) and L5 (collective immunity) to span single-agent and multi-agent threats.
A clear taxonomy unifies attack vectors as agent viruses defined by attack surface, target capability, payload and exploitation mechanism, enabling precise vaccine targeting; memory poisoning examples can persist as backdoors and three crafted memory records have been shown to hijack tool selection with over 70% success.
Parametric vaccines (steering vectors, LoRA) can alter internal representations to block harmful reasoning paths without changing base weights, while non-parametric vaccines are interpretable but vulnerable to context-window or multi-turn circumvention.
The Harness Triad creates an operational loop: Self-harness detects anomalies and requests vaccines; Meta-harness evaluates candidates and measures Autoimmunity Rate; Auto-harness synthesises and deploys harness code or parametric vaccines, enabling Continual Immune Learning and distributed vaccine propagation across swarms.
Collective mechanisms model pathogen spread with epidemiological parameters and require standardised attested vaccine messages so learned immunity can be shared and audited across agents.

Limitations

This work is primarily conceptual and architectural; empirical validation of parametric vaccines, the Harness Triad and AIR under realistic attacks is pending. Endogenous monitoring and continual vaccine evaluation impose computational overhead and latency that may be unacceptable in some real-time settings. There is an inherent autoimmunity trade-off in setting intervention thresholds. The current framework focuses on text-based agents and does not yet address cross-modal (visual or auditory) immunity. Protocol, format and audit standardisation remain unresolved.

Implications

Offensive security implications are central: attackers can exploit persistent memory to install lasting backdoors, manipulate tool-selection chains to reroute actions, and weaponise multi-agent protocols to propagate "thought viruses" or collude for belief manipulation. Because these attacks operate inside the cognitive loop they can bypass perimeter defences and static alignment. In swarms an infected agent can distribute malicious inputs or vaccines, enabling rapid epidemic-like spread. High-frequency vaccine pressure may also drive accelerated evolution of attack techniques, creating a co-evolutionary arms race. The framework therefore highlights novel avenues and metrics for attack development and evaluation, including Autoimmunity Rate, vaccine escape and escape latency.

Links Original paper on arXiv

Agent-native immunity targets runtime hijacks in LLMs

How the hijack lands

What the immune system adds

📋 Original Paper Title and Abstract

Agent-Native Immune System: Architecture, Taxonomy, and Engineering

🔍 ShortSpan Analysis of the Paper

Problem

Approach

Key Findings

Limitations

Implications

Related Articles

LLM Agents Shift Risk to Runtime Supply Chains

Securing agentic LLMs as they meet the web

AgentCanary tests autonomous agents in real environments

Related Research

Get the weekly digest