
Calculus grounds safer LLM agents and conversations

Agents
Published: Thu, Feb 26, 2026 • By Theo Solander
New research proposes LLMBDA, a formal model for Large Language Model (LLM) agents that interleave prompts, tool calls and code. It captures how prompt injection propagates and proves noninterference guarantees under clear conditions. The authors also show practical defences: quarantined sub-conversations, isolated generated code, and label-checked tool interfaces.

Agent frameworks now stitch together Large Language Model (LLM) calls, tool invocations and bits of generated code. That convenience creates a close coupling that attackers can exploit through prompt injection. We have patterns and patches, but not much in the way of first principles. A new paper offers exactly that: a formal calculus for agent conversations and their security properties.

What the calculus adds

The work introduces LLMBDA, an untyped call-by-value lambda calculus extended for prompt-response workflows and information-flow control (IFC). Its central primitive, written @e, calls an LLM by serialising a value into a prompt and parsing the response back into a term. Two further primitives, fork and clear, let an agent branch or quarantine conversation context. Every value and conversation carries confidentiality and integrity labels drawn from a join-semilattice, so the semantics can track what influences what.
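The labelling discipline is easy to picture in code. The sketch below is illustrative, not the paper's syntax: a two-point integrity lattice (Trusted below Untrusted) whose join is `max`, and a rule that any value computed from labelled inputs carries the join of their labels.

```python
# Illustrative sketch of LLMBDA-style labels (names are assumptions,
# not the paper's notation): a two-point integrity lattice with join,
# plus labelled values whose results take the join of their inputs.
from dataclasses import dataclass

TRUSTED, UNTRUSTED = 0, 1  # lattice levels; the join is max

def join(l1: int, l2: int) -> int:
    """Least upper bound in the two-point lattice."""
    return max(l1, l2)

@dataclass(frozen=True)
class Labeled:
    value: object
    label: int

def apply_labeled(f, x: Labeled, y: Labeled) -> Labeled:
    """The result depends on both inputs, so its label is their join."""
    return Labeled(f(x.value, y.value), join(x.label, y.label))

a = Labeled(2, TRUSTED)
b = Labeled(3, UNTRUSTED)
c = apply_labeled(lambda u, v: u + v, a, b)
print(c)  # Labeled(value=5, label=1): tainted by the untrusted input
```

The same propagation applies to conversation history: once untrusted content enters a conversation, everything the model produces from it inherits at least that label.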

The semantics makes conversations explicit. It uses functions to erase sensitive parts, serialise prompts, generate model outputs and parse them back into terms. A big-step relation maps an initial labelled conversation to an updated one and a labelled result, which is enough to reason about how injected content can steer later computation, trigger tool calls or corrupt final outputs. The model reproduces familiar planner loops and attack paths, including an example where a prompt injection subverts a tool-calling agent.

On the defensive side, the paper formalises patterns practitioners already reach for: quarantined sub-conversations, isolation of generated code and label-based checks at tool boundaries. A dual-LLM setup appears in the examples: a privileged model generates code, while a quarantined peer processes untrusted data. Dynamic label assertions on tool APIs prevent untrusted inputs from influencing trusted outputs. The authors also implement an interpreter in Python that follows the semantics and drives an LLM via the OpenAI Responses API, providing concrete runs of these patterns.
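A label-checked tool boundary can be sketched as a wrapper that rejects inputs whose label may not flow to the tool's required level. This is a minimal illustration in the spirit of the paper's dynamic label assertions, not its interpreter; `send_email` and the lattice are assumed for the example.

```python
# Hedged sketch: a dynamic label assertion at a tool boundary.
# The tool and labels are hypothetical, not from the paper's code.
TRUSTED, UNTRUSTED = 0, 1

def may_flow(src: int, dst: int) -> bool:
    """Integrity reading: Trusted may flow anywhere,
    Untrusted only to Untrusted sinks."""
    return src <= dst

class LabelError(Exception):
    pass

def tool(required_label: int):
    """Decorator adding a label check before the tool runs."""
    def wrap(fn):
        def checked(value, label):
            if not may_flow(label, required_label):
                raise LabelError(f"label {label} may not flow to {required_label}")
            return fn(value)
        return checked
    return wrap

@tool(required_label=TRUSTED)
def send_email(body: str) -> str:
    return f"sent: {body}"

print(send_email("weekly report", TRUSTED))  # sent: weekly report
# send_email("ignore all previous instructions...", UNTRUSTED)
# would raise LabelError: the injected content never reaches the tool.
```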

Why this matters

The central result is a termination-insensitive noninterference (TINI) theorem. In short, information at level m cannot affect observations at level n unless the label lattice permits that flow. That covers both confidentiality and integrity in the presence of LLM calls. There is a sharp caveat on label tests: a naive, low-level boolean test of a label breaks noninterference in multi-level lattices. The paper identifies safe routes instead, such as disallowing label tests, restricting to a single non-bottom label, or ensuring tests return results at a high enough level.
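The label-test caveat is worth seeing concretely. In the sketch below (names and lattice are assumptions for illustration), a naive test returns a plain low-level boolean that depends on a possibly-high label, leaking one bit per test; the safe variant, in the spirit of the paper's third option, returns a result labelled at least as high as the label it inspects.

```python
# Hedged sketch of naive vs. safe label tests on a three-level chain
# LOW < MID < HIGH (join is max). Names are illustrative.
from dataclasses import dataclass

LOW, MID, HIGH = 0, 1, 2

@dataclass(frozen=True)
class Labeled:
    value: object
    label: int

def naive_test(x: Labeled, query: int) -> bool:
    # UNSAFE in multi-level lattices: a low observer sees this boolean,
    # which depends on a possibly-high label.
    return x.label == query

def strong_test(x: Labeled, query: int) -> Labeled:
    # SAFE variant: the result carries the join of the inspected label
    # and the query, so only a sufficiently high observer can read it.
    return Labeled(x.label == query, max(x.label, query))

secret = Labeled("s", HIGH)
print(naive_test(secret, HIGH))   # True, but observable at LOW
print(strong_test(secret, HIGH))  # Labeled(value=True, label=2)
```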

If this sounds abstract, it has a familiar ring. Early information-flow work gave operating systems a way to explain, not just enforce, isolation. The macro-virus era taught us to sandbox documents and to treat generated code as suspect until proven otherwise. LLMBDA brings that lineage to agents by turning conversation state into a first-class, labelled object and by making the LLM call an explicit point of policy.

There are limits. The model assumes deterministic LLM responses, which sets aside sampling noise and the non-determinism of real deployments. Tools and external data are represented inside the language rather than as arbitrary side effects. That keeps the proofs manageable, but it also means performance, latency and messy real-world integrations sit outside the frame.

For security teams, the practical thread to pull is clear enough. Treat conversation history as a data flow to be contained. Use quarantined sub-conversations when handling untrusted inputs. Isolate generated code. Put label-aware assertions on tool interfaces. If you must test labels, do so in a way that preserves noninterference. The formalism gives you a way to justify these choices, not just hope they work.
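The quarantine step can be sketched as follows, loosely following the paper's clear primitive. The `call_llm` function is a stand-in stub, not the OpenAI Responses API; the point is the data flow: the sub-conversation starts from an empty history, so untrusted content can neither read nor taint the main conversation.

```python
# Minimal sketch of a quarantined sub-conversation (assumed names;
# call_llm is a stub standing in for a real model call).
def call_llm(conversation):
    # stub: echo the last prompt; a real agent would call a model here
    return f"response to: {conversation[-1]}"

def quarantined(task: str, untrusted_data: str) -> str:
    # clear-style quarantine: fresh, empty history for untrusted input
    sub_conversation = [f"{task}\n---\n{untrusted_data}"]
    return call_llm(sub_conversation)

main_history = ["system: you are a planner"]
summary = quarantined("Summarise this document:", "<untrusted web text>")
# main_history is untouched: the untrusted text never entered it,
# and the returned summary can be labelled untrusted before reuse.
```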

As agents take on more autonomy, we will need probabilistic semantics and richer models of tools. For now, LLMBDA offers a workable foundation for threat modelling and a common language for designing guardrails that hold up under scrutiny.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

The LLMbda Calculus: AI Agents, Conversations, and Information Flow

Authors: Zac Garby, Andrew D. Gordon, and David Sands
A conversation with a large language model (LLM) is a sequence of prompts and responses, with each response generated from the preceding conversation. AI agents build such conversations automatically: given an initial human prompt, a planner loop interleaves LLM calls with tool invocations and code execution. This tight coupling creates a new and poorly understood attack surface. A malicious prompt injected into a conversation can compromise later reasoning, trigger dangerous tool calls, or distort final outputs. Despite the centrality of such systems, we currently lack a principled semantic foundation for reasoning about their behaviour and safety. We address this gap by introducing an untyped call-by-value lambda calculus enriched with dynamic information-flow control and a small number of primitives for constructing prompt-response conversations. Our language includes a primitive that invokes an LLM: it serializes a value, sends it to the model as a prompt, and parses the response as a new term. This calculus faithfully represents planner loops and their vulnerabilities, including the mechanisms by which prompt injection alters subsequent computation. The semantics explicitly captures conversations, and so supports reasoning about defenses such as quarantined sub-conversations, isolation of generated code, and information-flow restrictions on what may influence an LLM call. A termination-insensitive noninterference theorem establishes integrity and confidentiality guarantees, demonstrating that a formal calculus can provide rigorous foundations for safe agentic programming.

🔍 ShortSpan Analysis of the Paper

Problem

This paper studies the semantics and security of automated AI agents that build prompt-response conversations with large language models (LLMs). Such agents interleave model calls with tool invocations and code execution, creating a coupling that expands the attack surface: injected prompts can steer later reasoning, cause unsafe tool calls, or corrupt final outputs. There is no widely accepted formal foundation for reasoning about these behaviours or for proving that proposed defences actually enforce confidentiality and integrity.

Approach

The authors introduce LLMBDA, an untyped call-by-value lambda calculus extended with primitives that model prompt-response conversations and dynamic information-flow control. New primitives are: an at-operator (@e) that serialises a value, sends it as a prompt to an LLM and parses the response; fork to temporarily branch conversation context; and clear to quarantine conversation history. Values and conversations carry labels from a join-semilattice to track confidentiality and integrity. The semantics uses erase, serialise, generate and parse functions to capture what is sent to and returned from the LLM. A big-step semantics relates an initial labelled conversation and program counter to an updated conversation and labelled value. The paper proves a termination-insensitive noninterference theorem under well-defined restrictions on label testing. The authors also implemented an interpreter in Python that follows the semantics and invokes an LLM via the OpenAI Responses API for experiments and examples.

Key Findings

  • The LLMBDA calculus faithfully models planner loops, prompt injection vectors and typical agent behaviours including generated code, tool calls and repair loops; examples include postcode extraction, retry loops for code generation, and synthesis-with-tests.
  • A termination-insensitive noninterference theorem (TINI) is established: information at level m cannot affect observations at level n unless m may flow to n, subject to constraints on label tests and the label lattice.
  • Label testing requires care: a general label-test that returns a low-level boolean can break noninterference in multi-level lattices. The paper identifies three safe alternatives that restore TINI: disallowing label tests, restricting programs to a single non-bottom label (covering the common two-level case), or restricting tests to assertions or a strong test form that return results at an appropriately high level.
  • The calculus models and reproduces a prompt injection attack on a tool-calling agent and demonstrates that a dual-LLM/quarantine pattern (a privileged P-LLM that generates code and a quarantined Q-LLM that processes untrusted data) together with dynamic label checks can prevent the attack.
  • The interpreter shows that dynamic label checks implemented as assertions on tool APIs are semantically sound: trusted outputs cannot be influenced by untrusted inputs under the proven noninterference conditions.
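The dual-LLM pattern from the findings above can be sketched as a data flow. Both "models" here are stubs and all names are assumptions; what matters is the separation: the privileged P-LLM sees only trusted prompts, the quarantined Q-LLM processes the untrusted text, and its output stays labelled untrusted so a tool-boundary check can reject it where trusted input is required.

```python
# Hypothetical sketch of the dual-LLM quarantine pattern (stub models).
TRUSTED, UNTRUSTED = 0, 1

def p_llm(prompt: str) -> str:
    # privileged model: plans from trusted input only
    return f"plan for: {prompt}"

def q_llm(untrusted_text: str):
    # quarantined model: its output is opaque data, labelled UNTRUSTED
    return (f"summary of: {untrusted_text}", UNTRUSTED)

def agent(user_task: str, fetched_page: str):
    plan = p_llm(user_task)              # trusted: page text never seen
    result, label = q_llm(fetched_page)  # untrusted result, kept labelled
    # the plan treats result as an opaque value; a label assertion at
    # any tool boundary would block it from a TRUSTED-only sink
    return plan, (result, label)

plan, (result, label) = agent("summarise the page", "<injected prompt>")
print(plan, label)  # plan for: summarise the page 1
```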

Limitations

The formal model assumes deterministic LLM responses; probabilistic model behaviour is not modelled and would require a probabilistic semantics and corresponding noninterference variants. Tool and external data interactions are modelled as program-resident values rather than arbitrary external calls, which simplifies but limits realism. The examples focus on key patterns rather than exhaustive benchmarks or a full formal model of systems like CaMeL.

Why It Matters

The paper delivers the first lambda-calculus-style formalisation of agentic programming with LLMs together with a provable information-flow guarantee under clear constraints. This gives a rigorous foundation for reasoning about prompt injection, quarantining conversations, isolating generated code and enforcing dynamic label-based policies on tool APIs. Practically, the results help threat modelling, guide design of defensive patterns such as dual-LLM quarantine and label-checked tool interfaces, and supply a basis for future work that adds probabilistic modelling and richer tool interactions.

