ASTRIDE exposes agent-specific attack surfaces in AI agents
ASTRIDE is a practical attempt to move threat modelling for agentic AI out of slide decks and into something you can run against architecture diagrams. The platform extends the classic STRIDE mnemonic with a new A for AI agent-specific attacks, covering prompt injection, unsafe tool invocation and reasoning subversion. It then uses fine-tuned vision-language models (VLMs) and a reasoning Large Language Model (LLM) to automate analysis of visual data flow diagrams.
What the research actually says
The prototype combines three VLMs for visual parsing, a reasoning LLM for synthesis, a small data lake of annotated diagrams and an orchestration layer of LLM agents. That combination produces structured observations mapped to components and flows, and a final cohesive threat model. The paper reports better coverage of AI-specific threats than a plain STRIDE pass, but also flags limits: signs of overfitting, a synthetic training set of about 1,200 annotated records and potential generalisation gaps when diagrams or domains look different from the training data.
Translate that to infrastructure risk language and you get concrete attack surfaces: exposed model endpoints that accept unsanitised prompts; GPU hosts or inference nodes where multi-tenant workloads could leak or be poisoned; vector databases that store embeddings of sensitive material and accept untrusted content; tool integrations that allow agents to call external APIs or run code; and the flow of secrets and provenance metadata across services.
Diagram-in-words: User input -> Agent -> Tool(s) -> LLM endpoint -> Vector DB -> External API. Every arrow is a trust boundary that can be punctured.
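To make those arrows reviewable rather than rhetorical, a minimal Python sketch can enumerate the same flow as explicit trust boundaries; the component names and threat labels below are illustrative, not ASTRIDE's output.

```python
# A rough sketch, not ASTRIDE's implementation: write the flow above down as
# explicit trust boundaries and attach candidate threat labels to each arrow.
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustBoundary:
    source: str
    target: str
    threats: tuple[str, ...]


FLOW = [
    TrustBoundary("User input", "Agent", ("prompt injection", "spoofing")),
    TrustBoundary("Agent", "Tool(s)", ("unsafe tool invocation", "elevation of privilege")),
    TrustBoundary("Tool(s)", "LLM endpoint", ("reasoning subversion", "tampering")),
    TrustBoundary("LLM endpoint", "Vector DB", ("context poisoning", "information disclosure")),
    TrustBoundary("Vector DB", "External API", ("data exfiltration", "repudiation")),
]

if __name__ == "__main__":
    for edge in FLOW:
        print(f"{edge.source} -> {edge.target}: check for {', '.join(edge.threats)}")
```

Even a table this small forces the question STRIDE reviews often skip: who is allowed to write on each side of each arrow.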
Quick mitigation pillars
- Input provenance and filtering: validate and tag inputs, apply strict prompt templates and shrink the context window exposed to untrusted data.
- Tool governance and sandboxing: restrict what tools an agent can invoke, enforce least privilege, and run tool calls in containers with resource and network limits (a minimal sketch follows this list).
- Monitoring and detection: instrument model endpoints, plant canary records in vector stores, and log agent-to-agent messages with integrity checks.
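Here is a minimal sketch of the tool-governance pillar, assuming each tool is a small executable the agent runtime launches; the hook, tool names and limits are placeholders, and a real deployment would add container, filesystem and network isolation on top of the timeout shown here.

```python
# Sketch only: gate every tool call on an allowlist and a hard per-call timeout.
# Tool names, limits and the execution model are assumptions, not ASTRIDE's.
import subprocess

TOOL_ALLOWLIST = {
    # tool name -> maximum seconds a sandboxed call may run
    "search_docs": 10,
    "run_sql_readonly": 5,
}


def invoke_tool(tool: str, argv: list[str]) -> str:
    """Run an allowlisted tool in a separate process with a hard timeout."""
    if tool not in TOOL_ALLOWLIST:
        raise PermissionError(f"tool '{tool}' is not on the allowlist")
    result = subprocess.run(
        [tool, *argv],
        capture_output=True,
        text=True,
        timeout=TOOL_ALLOWLIST[tool],  # resource cap; add container limits in practice
    )
    if result.returncode != 0:
        raise RuntimeError(f"tool '{tool}' failed: {result.stderr.strip()}")
    return result.stdout
```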
Practical runbook for an incident
Step 1: Detect. Alert on anomalous prompt patterns, sudden vector-store churn, unexpected model calls or GPU memory spikes. Canary prompts mapped to known safe responses give quick validation of model integrity.
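A canary check can be as small as the sketch below, which assumes a hypothetical internal inference endpoint that accepts a JSON prompt and returns a JSON text field; the URL, payload shape and canary pairs are assumptions, not part of ASTRIDE.

```python
# Sketch of a canary-prompt integrity check against a hypothetical HTTP endpoint.
import requests

MODEL_URL = "https://inference.internal/v1/generate"  # placeholder endpoint

CANARIES = {
    # prompt -> substring the healthy model is expected to return
    "Reply with exactly the word OK.": "OK",
    "What is 2 + 2? Answer with a single digit.": "4",
}


def model_integrity_ok(timeout: float = 5.0) -> bool:
    """Return False if any canary prompt stops producing its known-safe answer."""
    for prompt, expected in CANARIES.items():
        resp = requests.post(MODEL_URL, json={"prompt": prompt}, timeout=timeout)
        resp.raise_for_status()
        if expected not in resp.json().get("text", ""):
            return False
    return True
```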
Step 2: Contain. Disable the offending tool binding or isolate the agent process on the inference host. If possible, revoke temporary credentials and block the path from the agent to external APIs while preserving logs.
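Containment is easier to execute under pressure when it is scripted. The sketch below assumes the agent runs as a Kubernetes pod, that a deny-all egress NetworkPolicy already selects pods labelled quarantine=true, and that tool bindings live in a config map; all of those names are hypothetical.

```python
# Containment sketch under the assumptions stated above; adapt names to your
# own cluster. The pod and its logs stay in place for forensics.
import subprocess


def contain_agent(pod: str, namespace: str = "agents") -> None:
    # 1. Quarantine the pod: the label brings it under an existing deny-all
    #    egress policy, cutting the path from the agent to external APIs.
    subprocess.run(
        ["kubectl", "-n", namespace, "label", "pod", pod,
         "quarantine=true", "--overwrite"],
        check=True,
    )
    # 2. Disable the offending tool binding by blanking the allowlist entry in
    #    the (hypothetical) config map the agent reads on every call.
    subprocess.run(
        ["kubectl", "-n", namespace, "patch", "configmap", "agent-tool-bindings",
         "--type=merge", "-p", '{"data": {"allowed_tools": ""}}'],
        check=True,
    )
```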
Step 3: Remediate. Rebuild affected vector shards from verified backups, rotate secrets that crossed trust boundaries, retune prompt templates and redeploy fine-grained allowlists for tool invocation.
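Rebuilding vector shards is mostly a pipeline job: re-embed only records whose provenance still checks out, then switch traffic to the fresh collection. The vector-store client, its methods and the embed callable in this sketch are hypothetical stand-ins for whatever store and embedding model are actually in use.

```python
# Remediation sketch with a hypothetical vector-store client and embed() callable.
import hashlib


def rebuild_collection(client, backup_records, embed, new_name="kb_rebuilt"):
    """Rebuild a collection from verified backups, dropping tampered records."""
    client.create_collection(new_name)          # hypothetical client API
    for record in backup_records:
        digest = hashlib.sha256(record["text"].encode()).hexdigest()
        if digest != record["sha256"]:
            continue                            # fails the provenance check
        client.insert(new_name, vector=embed(record["text"]), payload=record)
    client.swap_alias("kb_live", new_name)      # cut reads over atomically
```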
Step 4: Hardening. Add provenance metadata for inputs, reduce context windows for untrusted content, enforce schema checks before embedding, and introduce deterministic approval gates for high-risk actions. Test these controls by simulating prompt injections and unsafe tool requests during design reviews.
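Two of those controls fit in a few lines: a schema check in front of the embedding pipeline and a deterministic approval gate for high-risk actions. The field names, risk list and use of pydantic below are assumptions for illustration.

```python
# Hardening sketch: schema-check content before embedding and gate high-risk
# actions on a deterministic rule. Field names and the risk list are assumed.
from pydantic import BaseModel, Field, ValidationError


class EmbeddingInput(BaseModel):
    text: str = Field(min_length=1, max_length=8000)
    source: str    # provenance tag, e.g. "ticket-system" or "user-upload"
    trusted: bool  # set by the ingestion pipeline, never by the end user


HIGH_RISK_ACTIONS = {"delete_records", "send_email", "execute_code"}


def admit_for_embedding(raw: dict) -> EmbeddingInput | None:
    """Reject malformed or untagged content before it reaches the vector store."""
    try:
        return EmbeddingInput(**raw)
    except ValidationError:
        return None


def requires_human_approval(action: str, trusted_input: bool) -> bool:
    """Deterministic gate: high-risk actions driven by untrusted input need sign-off."""
    return action in HIGH_RISK_ACTIONS and not trusted_input
```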
ASTRIDE is useful because it ties diagram analysis to concrete mitigations, but it is not a silver bullet. The automation depends on models that can be biased or brittle and on a training set that is still small. Treat ASTRIDE outputs as an informed draft: use them to find gaps early, then validate with adversarial testing and agreed metrics for detection coverage. For busy SRE and security teams, that workflow of parse, surface, validate and harden is what turns research into safer systems.
Additional analysis of the original arXiv paper
📋 Original Paper Title and Abstract
ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications
🔍 ShortSpan Analysis of the Paper
Problem
AI agent-based systems enable autonomous decision making, dynamic task execution and multimodal interactions, but introduce security challenges not well captured by traditional threat modelling. These include prompt injection, context poisoning, model manipulation and opaque agent-to-agent communication, which can compromise integrity, availability and trust. The paper studies how to systematically model these threats for agent-based AI applications and why this matters for secure design.
Approach
ASTRIDE extends the classical STRIDE framework by adding a new category, AI Agent-Specific Attacks, to cover vulnerabilities such as instruction manipulation, unsafe tool use and reasoning subversion that arise in agent workflows. It automates threat modelling by combining a consortium of fine-tuned vision-language models with the OpenAI gpt-oss reasoning large language model to perform end-to-end analysis directly from visual architecture diagrams such as data flow diagrams. LLM agents orchestrate the threat modelling process by coordinating interactions between the VLM consortium and the reasoning LLM. The platform comprises four core components: a Data Lake for storing threat modelling diagrams and annotations, a consortium of fine-tuned VLMs, the OpenAI gpt-oss reasoning LLM for high-level synthesis, and LLM Agents for orchestration. Fine-tuning uses the Unsloth library with QLoRA-based quantisation to enable efficient deployment on accessible hardware, and the VLMs are hosted on the Ollama runtime. The approach is evaluated on synthetic and real-world diagrams to demonstrate accuracy, scalability and explainability of diagram-driven threat modelling for AI agent-based systems.
Four functional stages are defined: Data Lake Setup, Vision-Language Model Fine-Tuning, Threat Prediction by the fine-tuned VLMs, and Final Prediction by the OpenAI gpt-oss reasoning LLM. Each VLM processes input diagrams independently to produce structured threat observations, which include AI agent-specific threats and STRIDE-inspired threats mapped to system components and data flows. The LLM Agent layer assembles these observations into a unified prompt that OpenAI gpt-oss then reasons over to generate a coherent threat model. A consensus-based reasoning mechanism mitigates model bias by cross-referencing multiple VLM outputs before final synthesis.
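As a rough illustration of that consensus step, and not the paper's code, the sketch below keeps only threats reported by at least two VLMs for the same component and folds the survivors into a single prompt for the reasoning LLM; the data shapes and vote threshold are assumptions.

```python
# Illustrative consensus over per-VLM threat observations; not the paper's code.
from collections import Counter


def consensus_threats(vlm_outputs: list[list[dict]], min_votes: int = 2) -> list[dict]:
    """vlm_outputs holds one list of {'component': ..., 'threat': ...} dicts per VLM."""
    votes = Counter(
        (obs["component"], obs["threat"])
        for output in vlm_outputs
        for obs in output
    )
    return [
        {"component": component, "threat": threat}
        for (component, threat), count in votes.items()
        if count >= min_votes
    ]


def build_unified_prompt(observations: list[dict]) -> str:
    """Fold the agreed observations into one prompt for the reasoning LLM."""
    lines = [f"- {o['component']}: {o['threat']}" for o in observations]
    return "Synthesise a threat model from these observations:\n" + "\n".join(lines)
```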
Key Findings
- ASTRIDE extends STRIDE with AI agent-specific threats and presents a unified framework for capturing prompt injection, unsafe tool invocation and reasoning subversion alongside traditional STRIDE threats.
- The platform automates diagram-driven threat modelling by leveraging a consortium of fine-tuned VLMs and a reasoning LLM to analyse visual system diagrams and produce structured threat predictions at the component level.
- A final cohesive threat model is created by OpenAI gpt-oss, which synthesises outputs from multiple VLMs, resolves conflicts and provides explainable risk assessments and mitigation recommendations.
- A functional prototype demonstrates feasibility with three fine-tuned VLMs and the gpt-oss reasoning model; the data lake stores annotated threat diagrams and Mermaid-style diagrams; fine-tuning used Unsloth with QLoRA, with deployment via Ollama.
- Evaluations on synthetic and real-world diagrams show enhanced accuracy of threat identification, improved coverage of AI-specific threats and reduced dependence on human experts, with results framed as scalable and interpretable threat modelling for next-generation intelligent systems.
Limitations
Limitations include signs of overfitting observed in the validation loss during fine-tuning, a synthetic training dataset of about 1,200 annotated records, and potential generalisation limits when confronted with novel diagram styles or domains. The approach relies on large language models and vision-language models, which may introduce biases or hidden failure modes. The evaluation is based on a prototype, and real-world applicability may require further validation across varied architectures and domains.
Why It Matters
Practically, ASTRIDE offers a scalable and explainable way to reason about security early in system design, guiding mitigations such as input and tool-usage controls, guardrails for agent reasoning and monitoring strategies that reduce the chance of exploitation in AI agent-based systems. Societal impact is not central to the paper, but stronger threat modelling for AI agents can reduce security risks in applications involving surveillance, manipulation or automation that affect people, by highlighting and mitigating AI-specific attack surfaces.