FeatureBleed leaks hidden attributes via accelerator timing

Published: Tue, Jun 16, 2026 • By Lydia Stratus

FeatureBleed shows that zero-skipping in AI accelerators leaks private backend-enriched attributes through end-to-end timing. By issuing normal API queries and measuring latency, an attacker can infer hidden features across CPUs and GPUs, with high accuracy in some cases. Padding to worst-case time mitigates leakage with modest overhead.

Plenty of teams hide sensitive attributes behind backend enrichment, confident that if the API never returns the features, they stay secret. FeatureBleed says otherwise. The paper shows you can recover hidden attributes using nothing more than request timing, thanks to sparsity optimisations in modern accelerators. Not a cache trick, not power analysis, just end-to-end latency. If you thought the risk lived in misconfigured buckets, welcome to the timing path.

How the attack works

The target is a service that retrieves private features server-side during inference. An attacker sends legitimate, label-only queries tied to identifiers, measures response times, and uses auxiliary profiling on similar identifiers to learn timing fingerprints. They then train a Gradient Boosted Decision Tree to map measured latency, the non-sensitive inputs, and returned labels to the hidden attribute. No dynamic voltage and frequency scaling tweaks, no shared-cache co-residency, no power probes. It is a remote timing channel driven by how fast the accelerator chews through zeros.
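The profile-then-infer loop can be sketched in a few lines. This is a toy simulation, not the authors' code: the latency model, the effect size and the function names are all assumptions, and a nearest-fingerprint rule on latency alone stands in for the Gradient Boosted Decision Tree the paper trains on latency, non-sensitive inputs and labels.

```python
import random
import statistics


def simulate_latency_ms(hidden_class, rng):
    """Toy server: a higher hidden class index means sparser activations and a faster reply."""
    base_ms = 10.0
    per_class_saving_ms = 0.4        # assumed effect size, not a measured value
    noise_ms = rng.gauss(0.0, 0.15)  # stand-in for network and scheduling jitter
    return base_ms - per_class_saving_ms * hidden_class + noise_ms


def profile(classes, samples_per_class, rng):
    """Profiling phase: mean latency per class, learned from auxiliary identifiers."""
    return {
        c: statistics.mean(
            [simulate_latency_ms(c, rng) for _ in range(samples_per_class)]
        )
        for c in classes
    }


def infer_class(latency_ms, fingerprints):
    """Attack phase: pick the class whose timing fingerprint is closest."""
    return min(fingerprints, key=lambda c: abs(fingerprints[c] - latency_ms))


def run_attack(num_classes=4, profile_n=200, victim_n=400, seed=0):
    """Return the fraction of victim queries whose hidden class is guessed correctly."""
    rng = random.Random(seed)
    classes = list(range(num_classes))
    fingerprints = profile(classes, profile_n, rng)
    correct = 0
    for _ in range(victim_n):
        truth = rng.choice(classes)
        guess = infer_class(simulate_latency_ms(truth, rng), fingerprints)
        correct += guess == truth
    return correct / victim_n
```

Calling `run_attack()` returns the attacker's accuracy on the simulated victims, which can be compared against the 1-in-4 chance rate. Shrinking `per_class_saving_ms` or widening the noise makes the class distributions overlap and pulls accuracy back toward chance.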

The results generalise across hardware and models. The authors evaluate Intel AVX and Intel AMX on CPUs, and an NVIDIA A100 GPU. Deep Neural Networks (DNNs) on GPUs leak the most; Convolutional Neural Network (CNN) image pipelines leak less but stay above chance; making models wider amplifies the channel. On a 10-class surgical procedure task they report 60.98 percent accuracy on AVX, 61.06 percent on AMX, and 70.30 percent on A100, with adversarial advantage up to 98.87 percentage points on specific classes. A large fully connected model hit 99.37 percent accuracy in their tests.

Why it leaks

Two ingredients line up. First, the model’s internal representations combined with Rectified Linear Unit (ReLU) activations produce class-dependent sparsity. Second, accelerators perform zero-skipping, executing fewer operations when inputs are zero. Runtime then varies with attribute-dependent sparsity. The behaviour shows up on the Intel AMX TDPBSSD instruction, is mirrored on NVIDIA A100 Tensor Core instructions, and persists even with frequency scaling disabled.
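A software model makes the dependency concrete. The sketch below is only an illustration of operand-dependent work, not a description of how AMX or Tensor Cores implement zero-skipping; the function names and input vectors are made up for the example.

```python
def relu(values):
    """Rectified Linear Unit: negative pre-activations become exact zeros."""
    return [v if v > 0.0 else 0.0 for v in values]


def zero_skipping_dot(activations, weights):
    """Dot product that skips zero activations.

    Returns (result, multiply_count); the count is a proxy for runtime.
    """
    total = 0.0
    multiplies = 0
    for a, w in zip(activations, weights):
        if a == 0.0:
            continue  # skipped work: this is the data-dependent part
        total += a * w
        multiplies += 1
    return total, multiplies


weights = [0.5] * 8
# Two hypothetical inputs whose hidden attribute shifts the pre-activations.
dense_pre = [1.0, 2.0, 0.5, 3.0, 1.5, 0.2, 2.2, 0.9]          # all positive
sparse_pre = [1.0, -2.0, -0.5, 3.0, -1.5, -0.2, -2.2, -0.9]   # mostly negative

_, dense_ops = zero_skipping_dot(relu(dense_pre), weights)
_, sparse_ops = zero_skipping_dot(relu(sparse_pre), weights)
```

Here `dense_ops` is 8 and `sparse_ops` is 2: the same layer does a quarter of the multiplies for the second input. If the hidden attribute is what pushes pre-activations negative, the work done, and so the latency, encodes it.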

Mitigations carry real cost. Turning off zero-skipping on Intel AMX raises per-operation energy by up to 25 percent and doubles inference latency. A practical software defence is to pad responses to the worst-case execution time, which the authors measure at 7.24 percent average performance overhead with no additional power cost. Leakage strength does depend on dataset and architecture, and some class timing distributions overlap, but the channel remains usable under those constraints. The attack assumes the ability to query identifiers tied to backend features and to profile from the same population.
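A minimal sketch of the padding idea, assuming a synchronous Python request handler: `padded_call`, `handler` and `budget_s` are hypothetical names, and the budget has to be set from a measured worst-case inference time for the deployed model.

```python
import time


def padded_call(handler, request, budget_s):
    """Run handler, then sleep so the whole call takes at least budget_s seconds."""
    start = time.monotonic()
    response = handler(request)
    remaining = budget_s - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)  # hide how early the real work finished
    return response
```

The wrapper only equalises calls that finish inside the budget. Any inference that overruns `budget_s` returns late and is distinguishable again, which is why the pad targets the worst case rather than the average.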

One more point for operators: masking logits or confidences does not help here. This is a hardware-level timing problem bleeding into your API surface, and the optimisations that make your GPUs fast are the same ones sketching private attributes into response time.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

FeatureBleed: Inferring Private Enriched Attributes From Sparsity-Optimized AI Accelerators

Authors: Darsh Asher, Farshad Dizani, Joshua Kalyanapu, Rosario Cammarota, Aydin Aysu, and Samira Mirbagher Ajorpaz

Backend enrichment is now widely deployed in sensitive domains such as product recommendation pipelines, healthcare, and finance, where models are trained on confidential data and retrieve private features whose values influence inference behavior while remaining hidden from the API caller. This paper presents the first hardware-level backend retrieval data-stealing attack, showing that accelerator optimizations designed for performance can directly undermine data confidentiality and bypass state-of-the-art privacy defenses. Our attack, FEATUREBLEED, exploits zero-skipping in AI accelerators to infer private backend-retrieved features solely through end-to-end timing, without relying on power analysis, DVFS manipulation, or shared-cache side channels. We evaluate FEATUREBLEED on three datasets spanning medical and non-medical domains: Texas-100X (clinical records), OrganAMNIST (medical imaging), and Census-19 (socioeconomic data). We further evaluate FEATUREBLEED across three hardware backends (Intel AVX, Intel AMX, and NVIDIA A100) and three model architectures (DNNs, CNNs, and hybrid CNN-MLP pipelines), demonstrating that the leakage generalizes across CPU and GPU accelerators, data modalities, and application domains, with an adversarial advantage of up to 98.87 percentage points. Finally, we identify the root cause of the leakage as sparsity-driven zero-skipping in modern hardware. We quantify the privacy-performance-power trade-off: disabling zero-skipping increases Intel AMX per-operation energy by up to 25 percent and incurs 100 percent performance overhead. We propose a padding-based defense that masks timing leakage by equalizing responses to the worst-case execution time, achieving protection with only 7.24 percent average performance overhead and no additional power cost.

🔍 ShortSpan Analysis of the Paper

Problem

The paper studies whether server-side backend enrichment, where sensitive features are retrieved at inference time but hidden from API callers, can be leaked via hardware timing side-channels in modern AI accelerators. This matters because many real-time ML systems in healthcare, finance and recommendation services rely on confidential backend features to make accurate decisions, and a timing channel could allow an attacker who issues legitimate label-only queries to infer private attributes without access to logits, confidences or co-located resources.

Approach

The authors introduce FEATUREBLEED, a remote timing attack that exploits sparsity-driven zero-skipping optimisations in accelerators. The adversary issues legitimate queries tied to identifiers, measures end-to-end response latency, and builds timing fingerprints by profiling auxiliary identifiers. They cluster timing traces and train a Gradient Boosted Decision Tree classifier that uses measured latency, non-sensitive inputs and returned labels to infer the hidden backend attribute. The attack is evaluated on three datasets (Texas-100X clinical records, OrganAMNIST medical imaging, Census-19 socioeconomic data), three hardware backends (Intel AVX CPU, Intel AMX on-core accelerator, NVIDIA A100 GPU), and multiple model families (DNNs, CNNs, hybrid CNN–MLP pipelines).

Key Findings

FEATUREBLEED reliably infers hidden backend-retrieved attributes from end-to-end latency alone, achieving high adversarial advantage. For a 10-class surgical-procedure task, accuracies were 60.98% (AVX), 61.06% (AMX) and 70.30% (A100), with adversarial advantage up to 98.87 percentage points on specific classes.
Leakage generalises across accelerators, data modalities and architectures. DNNs on GPUs show the strongest leakage (up to 70.3% accuracy and F1 up to 98.7), CNN-based image pipelines leak less but remain above random guessing, and increasing model width and size amplifies the channel (a large fully connected model reached 99.37% accuracy in tests).
Root cause is twofold: neural representations plus ReLU induce class-dependent activation sparsity, and accelerator zero-skipping (operand-dependent execution) causes runtime to vary with sparsity. The behaviour is observed on Intel AMX TDPBSSD and mirrored on NVIDIA A100 Tensor Core instructions, and persists even with frequency scaling disabled.
Mitigations incur trade-offs. Disabling zero-skipping raises per-operation energy on Intel AMX by up to 25% and doubles inference latency (100% overhead). A practical software defence of fixed-time padding that delays API responses to the worst-case inference time masks timing leakage with an average performance overhead of 7.24% and no measured energy increase.

Limitations

Leakage strength varies by dataset and model type; CNNs and dense visual representations attenuate but do not eliminate the channel. Timing distributions can overlap for some classes, reducing accuracy for those attributes. The attack assumes the attacker can query identifiers tied to backend features and has access to auxiliary profiling data drawn from the same population.

Why It Matters

This work identifies a realistic hardware-level attack surface in production ML pipelines that retrieve private features server-side. It shows that performance optimisations in accelerators can undermine confidentiality and that common defences that mask outputs or confidences do not prevent timing leakage. The results motivate hardware–software co-design for constant-time or timing-obfuscated kernels and prompt operators of sensitive services to consider timing-equalising mitigations when protecting backend-enriched attributes.

Links Original paper on arXiv
