
Latent Geometric Chords slip past robust vision models

Attacks
Published: Mon, Jun 01, 2026 • By Natalie Kestrel
New research on Latent Geometric Chords (LGC) shows a hard-label, black-box adversarial attack that stays highly perceptual, needs only a few thousand queries, and defeats adversarially trained models. A residual trick sidesteps decoder artefacts and expands the search space, enabling stealthy, transferable attacks across datasets and popular vision architectures.

Decision-based black-box attacks, the kind that only see top-1 labels, used to mean ugly, high-frequency noise and a lot of guesswork. Latent Geometric Chords (LGC) changes the playbook: it keeps perturbations semantically clean, converges in a few thousand queries, and still topples adversarially trained models.

The setup is simple enough. You invert the image into an autoencoder’s latent space and search for the decision boundary with a curvature-aware, semicircular move rather than a straight push. That trims the query budget because you spend less time skidding along flat regions and more time tracing where the classifier actually flips.
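To make the semicircular move concrete, here is a minimal sketch of one way such a search can work. It is an illustration under stated assumptions, not the paper's algorithm: the function name `semicircle_search`, the toy linear classifier and the choice of search direction are all hypothetical. The idea is to binary-search the angle along a semicircle whose diameter runs from the clean latent to a known adversarial latent. Every point on that semicircle is at least as close to the clean latent as the starting adversarial point, so any adversarial point found along the arc can only shrink the perturbation.

```python
import numpy as np

def semicircle_search(z0, zb, direction, is_adversarial, steps=20):
    """Binary-search the angle on a semicircle with diameter z0 -> zb.

    z0: clean latent, zb: known adversarial latent (both hypothetical inputs).
    direction: any vector not parallel to (zb - z0); it fixes the search plane.
    is_adversarial: hard-label oracle returning True or False.
    Assumes the arc crosses the decision boundary exactly once.
    """
    u = (zb - z0) / np.linalg.norm(zb - z0)
    v = direction - np.dot(direction, u) * u   # Gram-Schmidt step
    v = v / np.linalg.norm(v)                  # fails if direction is parallel to u
    centre = 0.5 * (z0 + zb)
    radius = 0.5 * np.linalg.norm(zb - z0)

    def point(theta):
        return centre + radius * (np.cos(theta) * u + np.sin(theta) * v)

    lo, hi = 0.0, np.pi   # theta = 0 is zb (adversarial), theta = pi is z0 (clean)
    queries = 0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        queries += 1
        if is_adversarial(point(mid)):
            lo = mid      # still adversarial: keep moving towards z0
        else:
            hi = mid
    return point(lo), queries

# Toy setup: a latent counts as adversarial when its first coordinate exceeds 1.
toy_oracle = lambda z: bool(z[0] > 1.0)
z_clean = np.array([0.0, 0.0])
z_start = np.array([1.2, 0.9])
z_found, used = semicircle_search(z_clean, z_start, np.array([1.0, 0.0]), toy_oracle)
```

In this toy case the returned point is still adversarial but sits closer to the clean latent than the starting point, at a cost of one label query per bisection step.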

The trick they almost buried

Residual-based Adversarial Generation (RAG) is the real exploit here. Instead of trusting the decoder’s reconstruction, LGC decodes both the original and the perturbed latents, takes their difference (a geometric chord), and pastes that residual directly onto the source image. Two wins: you dodge the decoder’s artefacts, and you expand the search space beyond the tight generative manifold. Under a Lipschitz generator, the paper argues these chord residuals give you up to 2k effective dimensions if the latent is k-dimensional. In practice, that means you can step off-manifold just enough to find adversarial pockets the encoder–decoder would otherwise hide from you, without blowing visual fidelity.
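The residual step described above can be sketched in a few lines. This is a toy illustration, not the authors' code: the decoder is a random linear map followed by a sigmoid, the latent `z` is a stand-in for an encoder output, and clipping to [0, 1] is an assumption about the pixel range. What it shows is that the decoder's reconstruction error cancels in the difference, so an unperturbed latent gives back the source image exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 8, 64                     # toy latent and image sizes (hypothetical)
W = rng.normal(size=(N, K))      # stand-in decoder weights, not a trained model

def decode(z):
    """Toy decoder: linear map followed by a sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(W @ z)))

def rag(x, z, z_adv):
    """Add the chord decode(z_adv) - decode(z) onto the source image x."""
    chord = decode(z_adv) - decode(z)
    return np.clip(x + chord, 0.0, 1.0)   # clipping range is an assumption

x = rng.uniform(size=N)                   # toy source image
z = rng.normal(size=K)                    # stand-in for encoder(x)
z_adv = z + 0.1 * rng.normal(size=K)      # small latent perturbation

x_rag = rag(x, z, z_adv)                  # residual pasted onto the source
x_dec = decode(z_adv)                     # plain decoder output, for comparison
```

Because the toy decoder reconstructs `x` poorly, `x_dec` lands far from the source image, while `x_rag` differs from `x` only by the small chord.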

The numbers back the pitch. On ImageNet, Places365 and CelebAMask-HQ, LGC reports SSIM above 0.99 and LPIPS below 0.01 at 5,000 queries in some settings, and it lands targeted misclassifications against ResNet-50 with far smaller L2 perturbations than prior methods, including up to a sixfold reduction in one comparison. It transfers across datasets using a single ImageNet-trained autoencoder and breaks a spread of architectures: Vision Transformers, ResNets, VGG and DenseNet. The variant LGC-H trades a bit of finesse for speed.

From an operator’s angle, this is tailor-made for endpoints that expose only labels or accept/reject signals. You get query efficiency, human-plausible edits, and a way around latent-consistency checks. Any defence counting on pixel norms, reconstruction error, or the assumption that “on-manifold checks are enough” will have a bad day.

There are caveats. Performance depends on the autoencoder backbone; a VGG16 latent behaved more coherently than ResNet-50 in their tests. The dimension-doubling story leans on Lipschitz continuity of the generator. And yes, thousands of queries may still be a non-starter against tight rate limits. But the chord residual idea is the seam to pull: it neatly sidesteps the generative bottleneck that has constrained latent attacks for years, while keeping the visuals clean enough to glide under most perceptual radars. Code is out, so the only open question is how quickly defenders adapt when “stay on the manifold” stops being a safety blanket.

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks

Authors: Ei Hmue Khine, Yao Li, Jiebao Sun, Shengzhu Shi, Zhichang Guo, and Boying Wu
While decision-based black-box adversarial attacks present a severe security threat, current methodologies suffer from fundamental limitations. Pixel-wise attacks frequently introduce unnatural, high-frequency visual artifacts, while latent-space frameworks are confined by the limited search space of low-dimensional manifolds and inherent reconstruction flaws. To resolve these limitations, we propose Latent Geometric Chords (LGC) for Query-Efficient Decision-Based Adversarial Attacks alongside a variant, LGC-H. At its core, LGC navigates decision boundaries by executing a curvature-aware geometric search within a compressed semantic manifold. To guarantee high visual fidelity and circumvent dimensionality bottlenecks, we introduce a Residual-based Adversarial Generation (RAG) mechanism. RAG isolates semantic perturbations as geometric chords and superimposes them directly onto the original source image. RAG substantially resolves baseline reconstruction flaws and effectively doubles the permissible search space dimensions. Experimental results demonstrate that LGC achieves robust cross-dataset transferability and substantially outperforms state-of-the-art baselines. Notably, our method, LGC, minimizes perturbation magnitudes while achieving state-of-the-art visual fidelity--with a Structural Similarity Index Measure (SSIM) exceeding 0.99 and a Learned Perceptual Image Patch Similarity (LPIPS) below 0.01 at 5000 queries--and sustaining high attack success rates under stringent perceptual constraints, successfully compromising adversarially trained robust models. The source code is available at: https://github.com/eihmuekhine/Latent-Geometric-Chords.

🔍 ShortSpan Analysis of the Paper

Problem

This paper studies query-efficient decision-based black-box adversarial attacks that must operate using only top-1 labels. Existing pixel-wise attacks create unnatural, high-frequency artifacts and do not align with human perception, while prior latent-space attacks are constrained by low-dimensional generative manifolds and suffer reconstruction errors from encoder–decoder inversion. These limitations lead to poor visual fidelity, slow convergence and failure against robust models; addressing them matters because practical attackers often only have hard-label access and aim to produce perceptually stealthy inputs with few queries.
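The hard-label setting can be modelled with a small wrapper that returns only the top-1 class and counts every call. The sketch below is a hypothetical harness for experimentation, not part of the paper's code; `HardLabelOracle`, `QueryBudgetExceeded` and `toy_scores` are illustrative names.

```python
import numpy as np

class QueryBudgetExceeded(RuntimeError):
    """Raised when the attacker has used up the allowed number of queries."""

class HardLabelOracle:
    """Wraps a scoring function and exposes only the top-1 label."""

    def __init__(self, score_fn, budget):
        self._score_fn = score_fn   # scores stay hidden from the caller
        self.budget = budget
        self.queries = 0

    def label(self, x):
        if self.queries >= self.budget:
            raise QueryBudgetExceeded(f"budget of {self.budget} queries used")
        self.queries += 1
        return int(np.argmax(self._score_fn(x)))

# Toy two-class scorer: class 1 wins when the mean pixel value exceeds 0.5.
def toy_scores(x):
    m = float(np.mean(x))
    return np.array([1.0 - m, m])

oracle = HardLabelOracle(toy_scores, budget=3)
dark = oracle.label(np.zeros(16))    # one query spent
bright = oracle.label(np.ones(16))   # two queries spent
```

Counting queries inside the oracle keeps the budget honest: an attack cannot look at scores or probe for free, which is the constraint the paper's query figures are measured against.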

Approach

The authors propose Latent Geometric Chords (LGC) and a faster variant, LGC-H. LGC operates in the compressed latent space of an autoencoder and carries out a curvature-aware, semicircular geometric search to trace decision boundaries efficiently. A central component, Residual-based Adversarial Generation (RAG), decodes both the original and the perturbed latent, takes the difference between the two decodings (a geometric chord) and superimposes that residual directly onto the original image, rather than using the decoder output alone. This both mitigates decoder reconstruction artefacts and expands the admissible search space: the paper shows that the set of chord residuals has Hausdorff dimension at most 2k when the latent space is k-dimensional and the generator is Lipschitz. Experiments use ImageNet, Places365 and CelebAMask-HQ; target models include ViT, ResNet-50, VGG16, DenseNet and ResNet-18. Evaluation metrics include attack success rate versus queries, L2 norm, Structural Similarity Index Measure (SSIM) and Learned Perceptual Image Patch Similarity (LPIPS).
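The 2k bound follows from a standard fact about Lipschitz maps. The derivation below is a sketch of that standard argument and may differ from the proof in the paper; the symbols G (generator), L (its Lipschitz constant), n (image dimension) and Φ (the chord map) are notation introduced here for illustration.

```latex
% Chord map on pairs of latents, with G : R^k -> R^n assumed L-Lipschitz
\Phi : \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}^n,
\qquad \Phi(z, z') = G(z') - G(z)

% Phi inherits a Lipschitz bound from G (using a + b <= sqrt(2) sqrt(a^2 + b^2))
\begin{aligned}
\|\Phi(z_1, z_1') - \Phi(z_2, z_2')\|
  &\le \|G(z_1') - G(z_2')\| + \|G(z_1) - G(z_2)\| \\
  &\le L\,\|z_1' - z_2'\| + L\,\|z_1 - z_2\| \\
  &\le \sqrt{2}\,L\,\|(z_1, z_1') - (z_2, z_2')\|
\end{aligned}

% Lipschitz maps cannot increase Hausdorff dimension
\dim_H \Phi\!\left(\mathbb{R}^{2k}\right) \le \dim_H \mathbb{R}^{2k} = 2k
```

By the same reasoning the decoder's image alone has dimension at most k, which is why the article describes the chord construction as doubling the available search dimensions. Note that 2k is an upper bound, not a guarantee that every dimension is reached.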

Key Findings

  • LGC produces high-fidelity adversarial examples within a few thousand queries: reported SSIM exceeds 0.99 and LPIPS falls below 0.01 at 5,000 queries in some settings, with rapid convergence within roughly 2,000–5,000 queries under strict perceptual constraints.
  • RAG substantially reduces decoder reconstruction errors and, by searching over chord residuals rather than decoder outputs, effectively doubles the dimensionality of the search space. This enables navigation of adversarial regions that standard latent optimisation misses; the paper proves the chord set has Hausdorff dimension at most 2k under a Lipschitz assumption on the generator.
  • Compared with state-of-the-art baselines, LGC achieves far better perceptual stealth while remaining competitive or superior in L2 norm: targeted attacks on ResNet-50 report up to a sixfold reduction in perturbation magnitude versus prior methods while maintaining near-perfect structural similarity. LGC also successfully attacks adversarially trained Vision Transformers and generalises across datasets using a single ImageNet-trained autoencoder.

Limitations

The approach depends on a pre-trained autoencoder backbone, and performance varies by backbone (a VGG16 latent space gave more predictable, visually coherent results than ResNet-50). The theoretical expansion to 2k dimensions assumes Lipschitz continuity of the generator. RAG mitigates but does not eliminate all reconstruction risks, and the method still requires thousands of queries, which may be impractical in some deployed settings.

Implications

Offensive security implications are clear: attackers with only hard-label access can craft highly stealthy, semantically grounded adversarial inputs that preserve human-perceived image structure and require relatively few queries to succeed, including against adversarially trained models. Defences that rely on low-dimensional latent checks or reconstruction errors, or that focus solely on pixel-level perturbation norms, may not detect or prevent such attacks. The authors release code, enabling reproducible assessment and threat modelling of vision systems against this stealthier class of decision-based attacks.

