Edge AI accelerators misused as confused deputies

Published: Tue, May 19, 2026 • By Theo Solander

New research shows unprivileged apps can trick edge AI accelerators into reading or writing protected memory. An LLM-assisted tool, DeputyHunt, maps the message paths that enable these confused deputy attacks across six of seven tested devices, affecting over 128 SoCs and 100 million devices. A CVE is assigned; mitigations look feasible.

Speed has a habit of outrunning safety checks. We saw it with early bus‑mastering peripherals, and we are seeing it again with edge AI accelerators. The paper “Speed Kills” dissects how specialised inference hardware slips past operating system guardrails and, with a nudge from a user app, does that app’s dirty work for it.

Here the deputy is the accelerator: a TPU, NPU or GPU wired for zero‑copy performance, light on operating system context, and often running with relaxed I/O memory management unit (IOMMU) rules. The authors show that a user process can craft Shared Memory Identifiers (SMIDs) that the accelerator trusts, then ask it to read or write where it should not. That turns the accelerator into a confused deputy, performing privileged operations on the app’s behalf.

How the attack lands

The path in is the message interface between driver and accelerator. SMIDs describe buffers. If the device or driver does not validate those SMIDs against current process permissions, the accelerator’s direct memory access will happily touch kernel pages or another process’s data. Time‑of‑check to time‑of‑use gaps make it worse: stale mappings linger just long enough to pivot.
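A toy model makes the missing check concrete. The sketch below is not driver code: the class and method names (`Buffer`, `BufferTable`, `resolve_unchecked`, `resolve_checked`), the integer SMIDs and the use of pid 0 for the kernel are all assumptions made for illustration. It contrasts a lookup that trusts whatever SMID arrives with one that checks the SMID against the calling process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    owner_pid: int   # process that registered the buffer (0 stands for the kernel)
    label: str       # stand-in for the memory the buffer describes


class BufferTable:
    """Hypothetical SMID -> buffer table, as a driver might keep it."""

    def __init__(self):
        self._buffers = {}

    def register(self, smid, owner_pid, label):
        self._buffers[smid] = Buffer(owner_pid, label)

    def resolve_unchecked(self, smid, caller_pid):
        # Vulnerable pattern: the SMID alone decides what gets accessed.
        return self._buffers[smid]

    def resolve_checked(self, smid, caller_pid):
        # Validated pattern: the SMID must belong to the calling process.
        buf = self._buffers.get(smid)
        if buf is None or buf.owner_pid != caller_pid:
            raise PermissionError(f"SMID {smid} not valid for pid {caller_pid}")
        return buf


table = BufferTable()
table.register(smid=1, owner_pid=1234, label="app tensor")
table.register(smid=2, owner_pid=0, label="kernel page")

# An unprivileged process (pid 1234) names a buffer it does not own.
leaked = table.resolve_unchecked(smid=2, caller_pid=1234)

try:
    table.resolve_checked(smid=2, caller_pid=1234)
    blocked = False
except PermissionError:
    blocked = True
```

In this model the unchecked path hands back the kernel-owned buffer, while the checked path refuses it. A real fix would also have to cover the time-of-check to time-of-use gap, which this sketch does not model.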

The team built DeputyHunt, an analysis framework that instruments kernel drivers to log DMA and user‑copy calls, traces syscalls during a simple inference run, and uses a Large Language Model (LLM) to sift driver code for likely IOCTL handlers, message structs and SMID fields. It then flips candidate fields and watches what the hardware does. The result: a sharply reduced code search space and high‑confidence attack surfaces without reverse‑engineering entire firmware blobs.
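The field-flipping step can be pictured as generating one variant of a known-good message per candidate field. The sketch below is a guess at the general shape, not DeputyHunt's code: the message layout (`MSG_FORMAT`), the field names and the probe value are hypothetical, and nothing is sent to any device.

```python
import struct

# Hypothetical message layout: opcode, SMID, offset, length (little-endian, no padding).
MSG_FORMAT = "<IQQI"
FIELDS = ("opcode", "smid", "offset", "length")


def single_field_variants(message, candidates, probe_value):
    """Yield (field, bytes) pairs, each changing exactly one candidate field."""
    values = struct.unpack(MSG_FORMAT, message)
    for name in candidates:
        index = FIELDS.index(name)
        mutated = list(values)
        mutated[index] = probe_value
        yield name, struct.pack(MSG_FORMAT, *mutated)


# A baseline message as it might be captured from a normal inference run.
baseline = struct.pack(MSG_FORMAT, 1, 0x10, 0, 4096)

# One variant per candidate field flagged by the analysis step.
variants = dict(single_field_variants(baseline, ["smid", "offset"], 0xDEAD))
```

Changing one field at a time keeps any observed change in hardware behaviour attributable to that field, which is the point of narrowing candidates before testing them.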

On seven real accelerators, six broke. NXP and Hailo NPUs allowed arbitrary read/write of system memory by constructing SMIDs that map chosen physical pages. Texas Instruments MMA and AWS Inferentia enabled limited arbitrary access within device or accelerator memory. Google Edge TPU and NVIDIA GPU exposed fixed‑region access via stale mappings, a classic TOCTOU flavour. Rockchip’s NPU avoided trouble by not using zero‑copy shared memory.
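The stale-mapping case is easiest to see as two tables that drift apart. The following is a minimal sketch under stated assumptions: `DeviceMappings` and its methods are invented for illustration and do not describe how the Google or NVIDIA drivers are implemented.

```python
class DeviceMappings:
    """Hypothetical model of host-side and accelerator-side mapping tables."""

    def __init__(self):
        self._host = {}     # what the OS believes is mapped
        self._device = {}   # what the accelerator can still reach

    def map(self, smid, region):
        self._host[smid] = region
        self._device[smid] = region

    def unmap_lazy(self, smid):
        # Buggy teardown: the host forgets the mapping, the device does not.
        self._host.pop(smid, None)

    def unmap_synced(self, smid):
        # Teardown that invalidates the device-side entry as well.
        self._host.pop(smid, None)
        self._device.pop(smid, None)

    def device_access(self, smid):
        # The accelerator consults only its own table.
        return self._device.get(smid)


m = DeviceMappings()
m.map(7, "region A")
m.unmap_lazy(7)
stale = m.device_access(7)   # still reachable after the host unmapped it

m.map(8, "region B")
m.unmap_synced(8)
clean = m.device_access(8)   # nothing left to reach
```

The access is "fixed-region" in this model because the requester does not choose the target: it can only reach whatever the leftover entry still points at.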

The blast radius is wide: more than 128 system‑on‑chips and over 100 million devices are implicated. Vendors acknowledged the issues, and a CVE has been assigned (CVE‑2025‑66425). For mitigation, the authors propose on‑demand validation, where the accelerator checks SMIDs with the kernel at runtime, using caching and deferred unmapping. In Gem5‑salam simulation the defence added about 15% overhead, far less than strict IOMMU enforcement.

If this sounds familiar, it is. Any time we add a fast coprocessor and shave off context, the OS loses sight of who is allowed to touch what. The interesting question is not whether accelerators can go faster, but how we make their semantics visible enough that speed stops buying the attacker privilege.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Speed Kills: Exploring Confused Deputy Attacks Through Edge AI Accelerators

Authors: Datta Manikanta Sri Hari Danduri and Aravind Kumar Machiry

AI Accelerator (AIA) are specialized hardware e.g., Tensor Processing Unit (TPU), that enable optimal and efficient execution of AI applications and on-device inference. The growing demand for AI applications has led to the widespread adoption of AIAs on Edge or embedded devices. Unlike applications, AIAs are not bound by Operating System (OS) restrictions and have limited visibility into Application Processor (AP) security mechanisms (e.g., kernel vs. application memory, process isolation). This semantic gap can lead to confused deputy vulnerabilities, i.e., AIA can be tricked by a malicious application to perform privileged operations on their behalf. In this paper, we conducted the first in-depth study of Confused Deputy Attacks (CDAs) using AIA. We design DeputyHunt, a Large Language Model (LLM) assisted framework to extract CDA relevant information for a given AIA through a combination of dynamic and static analysis. We used this information to explore the feasibility of CDA on seven different AIAs from popular vendors, i.e., Google, NVIDIA, Hailo, Texas Instruments, NXP, AWS, and Rockchip. Our analysis revealed that CDA is feasible on six out of the seven AIAs, impacting over 128 System On Chips (SOCs) and over 100 million devices. Our findings highlight critical security risks posed by AIA on system security. Our work has been acknowledged by the corresponding vendors and assigned the CVE-2025-66425. We propose an on-demand validation defense against CDA, and evaluation on the Gem5-salam simulator shows that it incurs minimal runtime overhead (i.e., ~15%).

🔍 ShortSpan Analysis of the Paper

Problem

This paper investigates whether edge AI accelerators (AIAs) can be leveraged to violate host security by performing privileged memory operations on behalf of unprivileged user applications. AIAs commonly use zero-copy transfers to meet performance and power constraints, which creates a semantic gap between the accelerator and the host operating system. Because AIAs often bypass or do not employ IOMMU protections in edge deployments, they may accept Shared Memory Identifiers (SMIDs) supplied by user applications without correct validation. That gap can enable confused deputy attacks (CDAs) in which an AIA is tricked into reading or writing kernel memory or other processes’ memory, potentially enabling severe host compromise.

Approach

The authors developed DeputyHunt, an LLM-assisted analysis framework that combines source instrumentation, dynamic tracing and static analysis to extract AIA message formats, SMID semantics and kernel driver entry points. DeputyHunt instruments kernel drivers to log DMA and user-copy calls, collects ordered syscall and kernel traces while running a simple inference application, extracts candidate functions and message structures, and uses a large language model (gpt-4o-mini) with an analysis agent to identify AIA-relevant functions, KD ioctl handlers and SMID fields. Candidate SMIDs and message formats are then validated by modifying messages to refer to restricted memory and observing whether the AIA accesses those regions. The methodology was applied to seven real-world edge AIAs using development boards and kernel sources: Google Edge TPU, NXP NPU, Texas Instruments MMA, Hailo NPU, NVIDIA GPU, AWS Inferentia, and Rockchip NPU.
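The narrowing effect of combining static and dynamic information can be illustrated with a small filter. This is only a sketch of the idea: the `npu_*` function names and the trace are invented, the LLM step is not modelled, and the reduction it computes is a property of this toy input, not the paper's figure.

```python
# Hypothetical inputs: functions found statically in a driver, and
# (caller, callee) events logged during one simple inference run.
static_functions = {
    "npu_open", "npu_ioctl", "npu_submit_job", "npu_map_buffer",
    "npu_debugfs_show", "npu_suspend", "npu_resume", "npu_probe",
}
trace = [
    ("npu_ioctl", "copy_from_user"),
    ("npu_map_buffer", "dma_map_sg"),
    ("npu_submit_job", "copy_from_user"),
    ("npu_ioctl", "copy_from_user"),
]
# Calls worth logging: user-copy and DMA-mapping helpers.
INTERESTING_CALLS = {"copy_from_user", "copy_to_user", "dma_map_sg"}


def shortlist(functions, events):
    """Keep functions seen making an interesting call, in first-seen order."""
    seen = []
    for caller, callee in events:
        if callee in INTERESTING_CALLS and caller in functions and caller not in seen:
            seen.append(caller)
    return seen


candidates = shortlist(static_functions, trace)
reduction = 1 - len(candidates) / len(static_functions)
```

Here three of eight functions survive the filter. Functions that never touch user memory or DMA during the run drop out before any manual or model-assisted review.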

Key Findings

Confused deputy attacks are feasible on six of seven evaluated AIAs; Rockchip NPU was not vulnerable because it does not use zero-copy shared memory.
DeputyHunt substantially reduces analyst effort: on average it reduced the code inspection scope by about 97% and produced high-confidence candidates for manual verification.
Different AIAs yield different CDA modalities: NXP and Hailo NPUs permit arbitrary read/write access to system memory by constructing SMIDs that map arbitrary physical pages; Texas Instruments MMA and AWS Inferentia allow limited arbitrary access within device or accelerator memory; Google Edge TPU and NVIDIA GPU permit fixed-region CDAs enabled by stale mappings that create a time-of-check to time-of-use vulnerability.
Practical impact is large: the authors report affected designs across over 128 system-on-chips and more than 100 million deployed devices; NXP assigned CVE-2025-66425 and vendors acknowledged the issues; the team developed working exploits to validate findings.
On-demand validation, where the AIA consults the kernel driver to validate SMIDs at runtime with caching and deferred unmapping, is an effective mitigation in simulation, incurring modest runtime overhead of about 15% on average in Gem5-salam experiments while strict IOMMU enforcement imposes far higher costs.
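One way to read the on-demand validation idea in the last finding is as a small cache in front of a kernel callback. The sketch below is an interpretation, not the authors' design: `OnDemandValidator`, the boolean `kernel_validate` callback and the exact meaning given to "deferred unmapping" are assumptions.

```python
class OnDemandValidator:
    """Hypothetical accelerator-side cache in front of a kernel validation callback."""

    def __init__(self, kernel_validate):
        self._kernel_validate = kernel_validate  # callable: smid -> bool
        self._cache = set()          # SMIDs already validated
        self._pending_unmap = set()  # unmaps recorded but not yet processed
        self.kernel_queries = 0

    def access(self, smid):
        if smid in self._pending_unmap:
            return False             # unmap requested: refuse even if cached
        if smid not in self._cache:
            self.kernel_queries += 1
            if not self._kernel_validate(smid):
                return False
            self._cache.add(smid)
        return True

    def request_unmap(self, smid):
        # Deferred unmapping: record the request now, do the costly work later.
        self._pending_unmap.add(smid)

    def flush_unmaps(self):
        # Run at a quiet point: drop cache entries for everything unmapped.
        self._cache -= self._pending_unmap
        self._pending_unmap.clear()


valid = {1, 2}
v = OnDemandValidator(lambda smid: smid in valid)
results = [v.access(1), v.access(1), v.access(9)]   # second access hits the cache

v.request_unmap(1)
valid.discard(1)
after_unmap = v.access(1)    # refused while the unmap is pending
v.flush_unmaps()
after_flush = v.access(1)    # re-validated with the kernel, which now says no
```

The cache is what keeps the cost down in this model: repeated accesses to the same SMID reach the kernel once, and the pending-unmap set closes the window in which a cached entry could otherwise outlive its mapping.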

Limitations

The investigation is constrained by closed-source vendor libraries and black-box AIA firmware, requiring cross-layer logging and manual validation; DeputyHunt is an analyst-aid rather than a fully automated exploit generator. Experiments used development boards and a simulator for defence evaluation, which may not capture every real-world platform nuance. The Rockchip device was excluded from CDA results because its USB-based design lacks zero-copy transfers.

Implications

An attacker controlling an unprivileged userspace application that can interact with an AIA can craft messages to cause the accelerator to read or write privileged kernel memory or other processes’ memory. That can enable kernel compromise, data exfiltration, persistent corruption of accelerator page tables to extend access, denial of service of shared accelerators, and cross-tenant impact in cloud settings. The presence of stale SMIDs creates TOCTOU windows that make exploitation practical in devices where IOMMU protections are bypassed for performance.

Links Original paper on arXiv
