Coding Agents Expose Chains for Silent Compromise
Agents
Modern coding agents are not simple autocomplete tools. They are autonomous systems driven by Large Language Models (LLM) that call tools, read and write files, run terminals and fetch web content. A recent systematic audit of eight real-world agents exposes 15 distinct security issues and shows how those issues can be chained into end-to-end compromises without any user interaction.
The paper documents practical attack paths. Adversaries can use malicious workspace configuration, poisoned files, attacker controlled web resources and crafted tool descriptions to influence an agent. Indirect prompt injection is a recurring theme: content that looks legitimate to the model causes the agent to make unsafe tool calls. The researchers achieved arbitrary command execution in five agents and global data exfiltration in four. Some vendors fixed issues and two fixes received CVEs; others considered certain behaviours intended and left them unremediated.
Why this matters
The pattern is familiar to anyone who watched earlier platform shifts. In the 1990s macro viruses and unchecked browser plugins turned benign features into attack conduits. The novelty here is composability: separate weaknesses in prompt handling, tool validation and filesystem access combine to produce outcomes far worse than any single bug. That makes threat modelling across components essential rather than optional.
The audit highlights several specific weak points. Tool calling layers sometimes accept arbitrary instructions from the LLM or fail to validate which tool is being invoked. File and terminal operations can read or write outside the intended workspace, bypass approvals, or persist malicious startup configuration. Directory listings and symbolic link handling leak sensitive paths. Renderers and web fetch tools can carry exfiltrated data to attacker servers without prompting the user. One vendor API in the study was observed to lack validation for tool calls, illustrating how ecosystem assumptions can become exploitable.
Practical implications for teams
The study points to defensive measures that are straightforward in principle, even if messy in practice. First, adopt least privilege for agent processes: combine fine-grained OS controls with containerisation so agents run with only the permissions they need. Second, harden the tool calling interface: validate tool identifiers, parameter types and allowed operations at run time and reject LLM-sourced instructions that attempt unsupported calls. Third, treat all IO as untrusted: separate instruction data from user data, filter outputs from renderers, restrict directory listings and tighten symbolic link handling.
Operationally, add runtime monitoring and anomaly detection that looks for unexpected shell invocations, network calls to unknown endpoints and unusual file access patterns. Include coding agents in regular threat models and security testing programmes rather than assuming they are benign development helpers. Finally, demand transparency from vendors about their validation logic and default approval behaviours; patches and CVEs are useful, but some vendors will prioritise usability over safety unless pushed.
The audit is a timely reminder that autonomy increases attack surface. The technical fixes align with old lessons: isolate, limit privilege and validate inputs. Teams that treat agents as new privileged components will be better placed to retain control when these systems inevitably learn new tricks.
Additional analysis of the original ArXiv paper
📋 Original Paper Title and Abstract
Takedown: How It's Done in Modern Coding Agent Exploits
🔍 ShortSpan Analysis of the Paper
Problem
The paper conducts a systematic security analysis of eight real world coding agents, which are large language model driven tools specialised in software development. Unlike basic code completion assistants, these agents can autonomously undertake multi step programming tasks, access local files and system commands, and extend capabilities via external tools. This combination raises serious concerns about confidentiality, integrity and safety in practical development environments. The study seeks to close a gap left by fragmented prior work by examining internal workflows and identifying security threats across multiple components, with the aim of understanding end to end exploit possibilities without user interaction.
Approach
The authors perform a systematic, component level security analysis of eight coding agents, focusing on internal workflows such as tool calling, file operations and terminal interactions. They employ preliminary analysis to map attack vectors, followed by dynamic analysis using a LiteLLM proxy to intercept prompts and capture interactions, and where possible, reverse engineering of publicly available source code to generalise findings to closed source agents. The threat model includes adversaries inside the workspace and outside the user machine, with indirect prompt injection via attacker controlled web content as a key consideration. The study also benchmarks tool calling behaviours, monitors workspace configurations, and assesses how security weaknesses can combine to enable end to end exploitation. Evidence and patterns are synthesised into a comprehensive set of security issues and potential attack chains, illustrated through end to end exploitation scenarios. A responsible disclosure process was undertaken with the vendors of the assessed agents.
Key Findings
- The analysis identifies 15 security issues across the eight coding agents, arising mainly from insufficient security policies and mis implementation bugs; these issues can be exploited in combination to realise end to end attacks.
- End to end exploitation is demonstrated as feasible, with arbitrary command execution achieved in five agents and global data exfiltration achieved in four agents, all without user interaction or explicit approval.
- Attack vectors include malicious workspace configuration, malicious file resources, malicious web resources and malicious tool descriptions, with indirect prompt injection being a common entry point across all agents.
- Tool calling vulnerabilities are highlighted, including unreliable validation of tool calls from LLM provided interfaces and the lack of consistent validation in custom tool calling. In particular the Anthropic API is shown to lack validation for tool calls, enabling arbitrary tool invocation, and some custom tool calling implementations also allow unsupported tool execution.
- File and terminal operation weaknesses are a core channel for exploitation, including reading and writing outside the workspace, bypassing approvals, and commandeering configuration files to disable safety checks or to persist malicious commands on startup.
- Directory listings and symbolic link handling can be abused to reveal or access sensitive files outside the workspace, enabling data exfiltration and command execution without direct user approval.
- Malicious extensions and external tool descriptions delivered through MCP and extensions can bypass built in approvals, creating strong supply chain like risks where attacker controlled tool descriptions influence agent behaviour.
- Extensive end to end attack chains are shown in figures, including scenarios where indirect prompt injection via web content, directory listings in prompts, and malicious Git submodules enable command execution or exfiltration without user consent.
- Renderer components such as Mermaid diagrams and web fetch tools are shown to facilitate exfiltration by embedding sensitive data into rendered content or requests to attacker controlled servers, with some tools lacking user approval requirements.
- Vendors were informed of the findings; fifteen issues were disclosed, with two fixed and assigned CVEs; some issues were deemed intended behaviour by vendors and not remediated, reflecting real world trade offs in risk remediation.
Limitations
The study anonymises the eight agents and relies on available internal details for some cases, leaving a portion of the results dependent on dynamic analysis rather than direct code inspection. Some agents use proprietary servers, limiting access to full implementation details. The threat model focuses on two adversary types and indirect prompt injection via external content; societal level impacts are not the focus, though enterprise and software supply chain risks are highlighted. Finally, some issues were deemed intended by vendors and were not fixed, illustrating the tension between security findings and product design decisions.
Why It Matters
The findings underscore the importance of comprehensive threat modelling, strong isolation and least privilege execution in AI enabled coding tools. They advocate hardened prompt and IO handling, robust input output filtering and runtime monitoring, and require security testing across all agent components rather than surface features alone. The work emphasises that security flaws can be chained to produce end to end exploits with no user interaction, potentially impacting software supply chains and enterprise development workflows. The paper also highlights practical mitigations including instruction data separation, LLM guardrails for tool calls, and sandboxing strategies such as containerisation or fine grained OS controls to limit the impact of an agent.