
Revisiting the Blackboard to Test Agent Security

Agents
Published: Fri, Oct 17, 2025 • By Theo Solander
Terrarium repurposes the blackboard architecture to study safety, privacy and security in Large Language Model (LLM) based multi-agent systems. The testbed maps attack surfaces—misalignment, malicious agents, compromised communication and data poisoning—and reproduces high-impact attacks, including 100% successful privacy and availability exploits in experiments. It helps teams prototype defences like provenance, access controls and anomaly detection.

Large Language Model (LLM) based multi-agent systems are moving from demos into real tasks such as scheduling and home automation. That shift exposes familiar patterns: coordination multiplies capability and, with it, attack surface. Terrarium revisits an old idea, the blackboard architecture, as a deliberately simple, configurable testbed to analyse how those risks play out.

What Terrarium does

Terrarium reuses the blackboard concept as a communication proxy where agents post observations and proposals. It defines five abstractions—agents, environment, blackboards, tools and the communication protocol—and composes them to solve instruction-augmented distributed constraint optimisation problems (DCOPs). The implementation supports different model backbones, transcript logging for forensics, per-message recipient controls and optional encryption and authentication. The authors implement three representative scenarios—meeting scheduling, a personal assistant and a smart home—to exercise real-world coordination patterns and attack modes.

The framework maps a clear set of attack vectors. Misalignment can arise when one agent pursues incompatible objectives. Malicious agents can conspire to exfiltrate private values. Communication channels can be compromised, and poisoned inputs can steer planning rounds. In the reported experiments, privacy and availability attacks reached 100% success in some runs, and a single adversarial agent or an external poisoning attack could meaningfully misalign outcomes. The paper also notes model-size effects: smaller models sometimes fail to assign actions comprehensively, which affects joint utility and the completeness of evaluations.

Terrarium is explicitly a research platform, not a production design. Its value lies in reproducible, seeded configurations and a modular surface for testing mitigations such as secure inter-agent protocols, data provenance, access controls and anomaly detection. The authors stress that real deployments would need protocol and memory optimisations beyond the prototype.

What the pattern suggests teams should do now

The historical lesson is simple: whenever systems grow more connected, attack opportunities grow faster than teams appreciate. Practitioners should treat multi-agent deployments as higher risk than single-model services. That means exercising the same defensive priorities at the agent level that we use across networks: enforce least privilege for tools and data access, enable per-recipient controls and cryptographic authentication of messages, and keep full, tamper-evident transcripts for forensics.
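As a minimal sketch of two of those hygiene measures, the snippet below combines HMAC message authentication with a hash-chained, append-only transcript; the key handling and record format are assumptions for illustration, not a production design:

```python
import hashlib
import hmac
import json

SECRET = b"shared-agent-key"  # in practice, a per-link key from a KMS

def sign(payload: dict) -> dict:
    """Attach a MAC so recipients can verify the sender holds the key."""
    body = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload,
            "mac": hmac.new(SECRET, body, hashlib.sha256).hexdigest()}

def verify(msg: dict) -> bool:
    body = json.dumps(msg["payload"], sort_keys=True).encode()
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["mac"])

class Transcript:
    """Append-only log; each entry hashes its predecessor, so tampering
    with any earlier record invalidates every later head hash."""
    def __init__(self):
        self.entries = []
        self.head = "0" * 64

    def append(self, msg: dict) -> None:
        record = json.dumps({"prev": self.head, "msg": msg}, sort_keys=True)
        self.head = hashlib.sha256(record.encode()).hexdigest()
        self.entries.append((self.head, msg))

signed = sign({"sender": "scheduler", "action": "book", "slot": "Tue 10:00"})
assert verify(signed)
tampered = {"payload": {**signed["payload"], "slot": "Fri 16:00"},
            "mac": signed["mac"]}
assert not verify(tampered)  # altered payload fails verification
```

The hash chain gives tamper evidence, not tamper prevention: an auditor replaying the log can detect any rewritten record, which is the forensic property the transcript needs.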

Beyond hygiene, use Terrarium-style red teams to probe misalignment and data-leakage scenarios before deployment. Prototype data-provenance checks and anomaly detection tuned for coordination failures, and stress-test for poisoning by varying the number and timing of malicious inputs. Finally, monitor model behaviour under partial failure; the paper shows smaller models can produce incomplete plans, which can mask or amplify attacks.
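A poisoning stress test of the kind described can be as simple as sweeping the number of malicious messages and measuring how often a planning round goes off the rails. The `run_round` stub below is a stand-in for calling real agents, and the trigger phrase and threshold are invented for illustration:

```python
import random

def run_round(messages: list[str]) -> str:
    """Stand-in for a planning round; a real harness would invoke the
    deployed agents. Here the plan flips once poisoned input dominates."""
    poisoned = sum("ignore prior constraints" in m for m in messages)
    return "misaligned" if poisoned > len(messages) // 4 else "aligned"

def stress_test(n_benign: int, max_shots: int, trials: int = 50) -> dict[int, float]:
    """Attack success rate as the number of poisoning shots grows."""
    rates = {}
    for shots in range(max_shots + 1):
        hits = 0
        for _ in range(trials):
            msgs = (["benign update"] * n_benign
                    + ["ignore prior constraints"] * shots)
            random.shuffle(msgs)  # vary the timing/ordering per trial
            hits += run_round(msgs) == "misaligned"
        rates[shots] = hits / trials
    return rates

print(stress_test(n_benign=12, max_shots=6))
```

Plotting success rate against shot count is exactly the dose-response view the paper reports, and the same loop structure works once the stub is replaced with real agent calls.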

Terrarium does not close the door on multi-agent systems, but it reminds teams of a familiar cycle: new coordination capability invites new adversaries. The practical response is not to stop building, but to build with visibility, containment and iterative testing so that a useful system does not become an efficient vector for harm.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies

Authors: Mason Nakamura, Abhinav Kumar, Saaduddin Mahmud, Sahar Abdelnabi, Shlomo Zilberstein, and Eugene Bagdasarian
A multi-agent system (MAS) powered by large language models (LLMs) can automate tedious user tasks such as meeting scheduling that requires inter-agent collaboration. LLMs enable nuanced protocols that account for unstructured private data, user constraints, and preferences. However, this design introduces new risks, including misalignment and attacks by malicious parties that compromise agents or steal user data. In this paper, we propose the Terrarium framework for fine-grained study on safety, privacy, and security in LLM-based MAS. We repurpose the blackboard design, an early approach in multi-agent systems, to create a modular, configurable testbed for multi-agent collaboration. We identify key attack vectors such as misalignment, malicious agents, compromised communication, and data poisoning. We implement three collaborative MAS scenarios with four representative attacks to demonstrate the framework's flexibility. By providing tools to rapidly prototype, evaluate, and iterate on defenses and designs, Terrarium aims to accelerate progress toward trustworthy multi-agent systems.

🔍 ShortSpan Analysis of the Paper

Problem

The paper proposes Terrarium, a modular testbed for studying safety, privacy and security in large language model (LLM) based multi-agent systems (MAS). It reframes inter-agent collaboration within a configurable blackboard architecture to enable fine-grained analysis of how private data, constraints and adversarial actions influence coordination. The work identifies key attack vectors, including misalignment, malicious agents, compromised communication and data poisoning, and aims to map attack surfaces and prototype defences to support trustworthy multi-agent systems in real-world tasks such as scheduling and planning.

Approach

Terrarium repurposes the classical blackboard design as a communication proxy to support inter-agent coordination within instruction-augmented distributed constraint optimisation problems (DCOPs). It defines five abstractions: agents, environment, blackboards, tools and the communication protocol, and implements them across multiple problem levels. Agents are largely based on large language models with access to tools and an environment that returns observations. The joint objective is ground truth, and the system uses a factor graph to configure blackboards and agent memberships. Terrarium supports a modular, configurable architecture enabling different backbones, back-end servers and communication protocols, and builds upon model context protocols with a layered stack to control agent capabilities, personalities and objectives. The framework enables rapid prototyping, evaluation and iteration on defences, and is designed to be observable for forensics with transcript logging and per-message recipient controls, including optional encryption and authentication.
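The factor-graph-to-blackboard mapping can be illustrated with a toy scheduling DCOP; the factor names and scopes below are hypothetical, and the real framework configures far richer memberships:

```python
from collections import defaultdict

# Hypothetical factor graph for a meeting-scheduling DCOP: each factor
# couples the agents whose variable choices it constrains.
factors = {
    "team_sync":   {"alice", "bob"},
    "room_clash":  {"bob", "cara"},
    "quiet_hours": {"cara"},
}

def blackboards_from_factors(factors: dict[str, set[str]]) -> dict[str, set[str]]:
    """One blackboard per factor; its members are the agents in that
    factor's scope, so messages stay local to the relevant constraint."""
    return {name: set(scope) for name, scope in factors.items()}

def memberships(boards: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the mapping: which blackboards each agent must watch."""
    out = defaultdict(set)
    for board, agents in boards.items():
        for agent in agents:
            out[agent].add(board)
    return dict(out)

boards = blackboards_from_factors(factors)
print(sorted(memberships(boards)["bob"]))  # → ['room_clash', 'team_sync']
```

Scoping membership to a factor's variables is also a containment measure: an agent compromised on one blackboard cannot directly read constraints it was never party to.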

Key Findings

  • Terrarium demonstrates solid utility in solving instruction-augmented DCOP problems with LLM-based agents across three domains, while enabling systematic study of safety, privacy and security aspects.
  • Three scenarios (Meeting Scheduling, Personal Assistant and Smart Home) provide ground-truth objectives and controllable configurations for evaluating attacks and defences.
  • Privacy and availability attacks can be highly successful in controlled experiments: a privacy attack achieved 100 per cent accuracy in retrieving private information, and a context-overflow availability attack also reached 100 per cent in the reported runs.
  • Misalignment can be induced by a single adversarial agent or by an external attacker through poisoning; an external attacker can induce misalignment within a single planning round, and multiple rounds improve effectiveness.
  • Attack efficacy correlates with the number of poisoning shots; joint utility generally decreases under attack, but the reductions are modest in the reported evaluations, with higher poisoning doses increasing attack success.
  • Smaller models may struggle to assign actions to all owned variables, affecting the completeness of evaluations and the reported joint objective values.
  • The framework provides a reproducible evaluation environment with seeded configurations and open-source access, supporting analysis of attack surfaces and the testing of mitigations such as secure inter-agent protocols, data provenance, access controls and anomaly detection.

Limitations

The authors acknowledge Terrarium as a simple, abstract framework not intended for deployment and note that optimisations in communication protocol and memory management would be needed for real-world use. The current design focuses on cooperative instruction-augmented DCOPs and may not capture all dynamics of competitive or negotiation-heavy MAS. Further work is proposed to explore additional MAS environments and defence mechanisms, and to broaden the attack surfaces beyond those studied.

Why It Matters

The work addresses important privacy and surveillance concerns that arise when MAS handle private data and coordinate across agents and tools. By providing a modular testbed with configurable attack vectors and measurable outcomes, Terrarium supports rapid evaluation of mitigations such as secure inter-agent protocols, data provenance, access controls and anomaly detection. It also highlights dual-use risks where attackers could coordinate data exfiltration or manipulation at scale, emphasising the need for robust safeguards before real-world deployment in critical tasks.


← Back to Latest