Healthcare deployment cages LLM agents with zero trust
Agents
Autonomous agents are turning up in production with powers most developers would deny a junior engineer: shell access, file system reads, database queries and free rein on the network. In healthcare, that is a fast path to exposing Protected Health Information (PHI). Recent red teaming has shown that these agents follow non-owner instructions, leak sensitive data and fall for indirect prompt injection. The question is not whether this is risky, but whether anyone has shipped a defensible design.
What they built
This paper describes a live deployment at a healthcare technology company that put nine agents behind a zero trust wall for 90 days. The threat model spans six domains: credentials, execution abuse, network egress, prompt integrity, database access and fleet drift. The defence has four layers. First, kernel-level isolation using gVisor on Kubernetes keeps workloads fenced. Second, a credential proxy sidecar means containers never see raw secrets and all calls are policy-enforced. Third, per-agent egress allowlists restrict outbound traffic. Fourth, a prompt integrity framework carries structured, cryptographically verifiable metadata and marks untrusted content so the model can treat it accordingly.
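The first and third layers are plain Kubernetes configuration. As a minimal sketch, assuming illustrative names and an example IP (the paper's actual manifests are in its open source release, not reproduced here), the gVisor runtime and a per-agent egress allowlist might look like:

```yaml
# Hypothetical RuntimeClass selecting the gVisor user-space kernel (runsc).
# Pods opt in with `runtimeClassName: gvisor` in their spec.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
# Hypothetical per-agent egress allowlist: once any Egress policy selects
# the pod, all other outbound traffic is denied by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: triage-agent-egress   # illustrative agent name
  namespace: agents
spec:
  podSelector:
    matchLabels:
      app: triage-agent
  policyTypes:
    - Egress
  egress:
    - ports:                  # DNS lookups only
        - protocol: UDP
          port: 53
    - to:
        - ipBlock:
            cidr: 203.0.113.10/32   # single approved API endpoint (example IP)
      ports:
        - protocol: TCP
          port: 443
```

The deny-by-default shape is what breaks exfiltration chains: any destination not pinned here, including an attacker-controlled callback host, is simply unreachable from the agent pod.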
They added an automated security audit agent that continuously scans for and remediates configuration issues, and hardened the fleet across three virtual machine image iterations. Over the run, the audit agent surfaced four high-severity findings, all fixed within a day. By the end, six of nine virtual machines had no findings above low severity.
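The audit loop itself is simple to picture. A minimal sketch, assuming hypothetical check names and configuration keys (the paper's actual rule set lives in its released tooling):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    check: str
    severity: str  # "LOW" | "MEDIUM" | "HIGH"

def audit(config: dict) -> list[Finding]:
    """Run illustrative checks against one VM's reported configuration.

    Each check mirrors a threat-model domain: isolation, credentials,
    egress. Keys and thresholds here are assumptions, not the paper's.
    """
    findings = []
    if not config.get("gvisor_enabled", False):
        findings.append(Finding("workload-isolation", "HIGH"))
    if config.get("raw_secrets_in_env", False):
        findings.append(Finding("credential-exposure", "HIGH"))
    if config.get("egress", "open") != "allowlist":
        findings.append(Finding("network-egress", "MEDIUM"))
    return findings

def clean(config: dict) -> bool:
    """A VM counts as clean when no finding exceeds LOW severity."""
    return all(f.severity == "LOW" for f in audit(config))
```

Running such checks on a schedule is what turns one-off hardening into a posture that resists fleet drift, the sixth domain in the threat model.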
On performance, gVisor introduced 2 to 5 ms extra latency on TCP connections and slowed sequential file reads by 20 to 40 percent. That cost was swamped by model inference times of 500 to 3000 ms, so throughput barely moved.
Mapping to published attack patterns, the stack directly mitigated nine out of eleven cases, partially covered one and left one out of scope. The infrastructure controls did the heavy lifting against execution abuse, credential leakage and data exfiltration. The prompt integrity layer reduced trivial spoofing and some indirect injection, but it cannot guarantee clean separation because instructions and data still sit in the same Large Language Model (LLM) context.
Operationally, the credential proxy stopped secrets sprawl and enforced rate limits and destination allowlists at the edge. Egress controls broke common exfiltration chains, but they needed constant care as DNS and content delivery network addresses rotated, and developers asked for exceptions. The audit agent delivered value quickly, but it also became a privileged target, which the team acknowledged and scoped tightly.
So what for security teams
This is not a grand breakthrough. It is the application of familiar zero trust and Kubernetes hygiene to a new class of untrusted workload. That is exactly what most organisations need. Treat agents as potentially hostile processes even when you own the code. Isolate them, proxy their credentials, pin their egress and be explicit about what counts as trusted input.
The only notably new piece is the prompt integrity framework. It is sensible engineering, and it helps, but it still relies on the model following policy. Prompt injection and identity spoofing are reduced, not solved. If your risk tolerance depends on perfect adherence, you will be disappointed.
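To make the shape of that layer concrete: the paper describes a cryptographically verifiable trusted-metadata envelope plus explicit labelling of untrusted content. A minimal sketch, assuming an HMAC construction and key handling that are this article's illustration, not necessarily the paper's exact scheme:

```python
import hashlib
import hmac
import json

# Hypothetical shared key, held by the orchestrator rather than by agents.
ENVELOPE_KEY = b"example-only-key"

def seal(metadata: dict) -> dict:
    """Attach an HMAC tag so downstream agents can verify that trusted
    metadata (sender, role, timestamp) was not forged in transit."""
    payload = json.dumps(metadata, sort_keys=True).encode()
    tag = hmac.new(ENVELOPE_KEY, payload, hashlib.sha256).hexdigest()
    return {"metadata": metadata, "tag": tag}

def verify(envelope: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    payload = json.dumps(envelope["metadata"], sort_keys=True).encode()
    expected = hmac.new(ENVELOPE_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(envelope["tag"], expected)

def wrap_untrusted(text: str) -> str:
    """Label external content so the model is told to treat it as data,
    not instructions. The model's compliance is best-effort only."""
    return f"<untrusted>\n{text}\n</untrusted>"
```

Note the asymmetry: `verify` gives a hard guarantee about who said something, while `wrap_untrusted` only asks the model to behave. That is exactly why this layer reduces, rather than eliminates, injection risk.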
Commercially, the implications are straightforward. If you are in a regulated environment, this blueprint maps cleanly to HIPAA Security Rule expectations and shows a viable path from a soft baseline to a hardened fleet. Outside healthcare, the same pattern generalises to any environment where agents touch sensitive systems. The controls and tooling are open source, which lowers adoption friction. The open question remains at the model layer: stronger guarantees for prompt integrity will require capabilities that sit beyond infrastructure. Until then, build the wall, watch the egress, and assume the agent can be turned against you.
Additional analysis of the original ArXiv paper
📋 Original Paper Title and Abstract
Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare
🔍 ShortSpan Analysis of the Paper
Problem
This paper examines security risks of autonomous AI agents deployed in healthcare, where agents have capabilities such as shell execution, file access, database queries, HTTP requests and multi‑party communication. Empirical red teaming has shown agents can follow non‑owner instructions, disclose sensitive data, be identity‑spoofed, propagate unsafe practices between agents and be corrupted via indirect prompt injection. In environments handling Protected Health Information, each failure mode can become a HIPAA breach. The work seeks a practical, deployable architecture to mitigate these threats.
Approach
The authors produced a six‑domain threat model for agentic AI in healthcare: credential exposure, execution capability abuse, network egress exfiltration, prompt integrity failures, database access risks and fleet configuration drift. They implemented a four‑layer defence in depth for nine production agents over a 90‑day deployment: (1) kernel‑level workload isolation using gVisor on Kubernetes; (2) credential proxy sidecars so agent containers never hold raw secrets; (3) per‑agent network egress allowlists enforced via Kubernetes NetworkPolicy; and (4) a prompt integrity framework comprising a cryptographically structured trusted metadata envelope and explicit untrusted content labelling. They also deployed an automated fleet security audit agent to continuously scan for and remediate issues, and performed progressive hardening across three VM image generations.
Key Findings
- Operational deployment detected and remediated four HIGH severity findings within one day via the automated audit agent; six of nine VMs finished with no findings above LOW severity.
- The four defence layers map to empirical attack patterns: infrastructure controls (gVisor, credential proxy, egress policies) provide robust mitigation for execution abuse, credential leakage and exfiltration; the prompt integrity framework reduces but cannot eliminate prompt injection and identity spoofing risks.
- gVisor added modest overhead: roughly 2–5 ms extra TCP connection latency and 20–40 percent slower sequential file reads, which was negligible compared with model inference latency of 500–3000 ms.
- Credential proxy sidecars prevented raw API keys from residing in agent containers and enforced request policies such as rate limiting and destination allowlisting, addressing scattered credential exposures found in the baseline fleet.
- Network egress allowlisting blocked outbound connections to unauthorised endpoints, breaking common exfiltration chains; operational challenges include DNS/IP rotation and exception management for development.
- The prompt integrity layer preserved useful inter‑agent collaboration while reducing susceptibility to trivial spoofing and indirect injection; however it remains the most brittle layer since it depends on the model following policy.
- Overall, the architecture provided direct mitigation for nine of eleven documented attack cases, partially mitigated one and identified one case as out of scope.
Limitations
The prompt integrity framework cannot guarantee prevention of prompt injection because instructions and data share the same model context. The audit agent itself is a high‑value privileged target and, although scoped and logged, introduces recursive risk. Egress allowlists require operational upkeep to handle DNS and CDN IP changes. One documented attack remained outside the deployment scope.
Why It Matters
The paper demonstrates a practical, zero‑trust security architecture that maps controls to HIPAA Security Rule provisions and empirical attack patterns, showing a viable path from an unhardened baseline to a hardened production fleet. The combination of infrastructure‑level controls and application‑level mitigations materially reduces the likelihood and impact of PHI exposure. All configurations, audit tooling and the prompt integrity framework are released open source to enable replication and further evaluation across healthcare deployments.