Scanners miss malicious agent skills across marketplaces
Agent skills are the new dependency hell: code, data and natural-language prompts bundled up and shipped through public marketplaces. After a wave of compromises and demos, vendors rolled out skill scanners to stop the bleeding. Researchers then did what any attacker would do: read the source, tune the artefacts, and push until green ticks appeared. It did not take long.
How the bypasses worked
Start with trivial obfuscation. Prepend roughly 100,000 newlines so that scanners truncate the file or the model loses track of what it is reading. ClawHub’s pipeline, which integrates VirusTotal’s Code Insight (built on Gemini 3 Flash) and a guard model defaulting to GPT 5.5, missed overtly malicious code buried beneath the padding. Packaging rules there are strict on file types, which helps, but that does not matter if the scanner never reads the payload.
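A minimal sketch of a pre-scan normalisation step that would blunt the padding trick. The function names and the two-line threshold are illustrative assumptions, not part of any scanner named here.

```python
def longest_blank_run(text: str) -> int:
    """Return the length of the longest run of consecutive blank lines."""
    longest = current = 0
    for line in text.splitlines():
        if line.strip():
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def collapse_blank_runs(text: str, keep: int = 2) -> str:
    """Shrink every run of blank lines to at most `keep` lines.

    The idea is to do this before truncating or prompting, so that padding
    cannot push a payload out of the window the scanner actually reads.
    """
    out = []
    blanks = 0
    for line in text.splitlines():
        if line.strip():
            blanks = 0
            out.append(line)
        else:
            blanks += 1
            if blanks <= keep:
                out.append("")
    return "\n".join(out)


# Hypothetical padded file: 100,000 newlines, then the part that matters.
padded = "\n" * 100_000 + "print('payload')"
print(longest_blank_run(padded))
print(len(collapse_blank_runs(padded)))
```

An unusually long blank run is also a cheap signal in its own right: legitimate skills have little reason to ship one.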
Move to archives. A SKILL.md tells the agent to pull its real instructions from a .docx. A .docx is a ZIP archive under the hood, so you smuggle a shell script in alongside the innocent XML and have the agent run it. Pattern matching shrugs; many large language model (LLM) checks do too.
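One way to look inside the container rather than trust the extension is to list the archive members and flag anything that does not look like an ordinary document part. This is a rough sketch: the allowlists below are assumptions chosen for illustration, and real .docx files can legitimately carry other parts (embedded objects, fonts) that would need handling.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed allowlists for illustration; tune them against real documents.
ALLOWED_TOP_LEVEL = {"[Content_Types].xml", "_rels", "docProps", "word", "customXml"}
ALLOWED_SUFFIXES = {".xml", ".rels", ".png", ".jpg", ".jpeg", ".gif"}


def unexpected_docx_members(data: bytes) -> list[str]:
    """Return archive members that fall outside the assumed allowlists."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            top = path.parts[0] if path.parts else ""
            if top not in ALLOWED_TOP_LEVEL or path.suffix.lower() not in ALLOWED_SUFFIXES:
                flagged.append(info.filename)
    return flagged


# Build a hypothetical booby-trapped document in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<w:document/>")
    archive.writestr("word/setup.sh", "#!/bin/sh\necho hidden\n")

print(unexpected_docx_members(buffer.getvalue()))
```

The same approach applies to any format that is really an archive in disguise.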
Then classic supply-chain sleight of hand: Python bytecode poisoning. Ship a utils.py that looks clean, plus a utils.cpython-312.pyc that carries extra behaviour, such as walking environment variables for exfiltration or abuse. Several scanners, including those on skills.sh and Cisco’s skill-scanner, did not inspect .pyc files at all. It is the xz-utils lesson again: the thing you execute is not the thing you reviewed.
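A check for this mismatch can be sketched by recompiling the reviewed source and comparing it with the code object stored in the shipped .pyc. The helper names are hypothetical, and the sketch assumes the .pyc was built by the same Python version at the default optimisation level. Note that the marshal module is not hardened against malicious input, so a real tool would run this in a sandbox, or simply delete shipped bytecode and let the interpreter regenerate it.

```python
import importlib.util
import marshal
import types

PYC_HEADER_SIZE = 16  # magic, flags, then mtime+size or a source hash (Python 3.7+)


def load_pyc_code(pyc_bytes: bytes) -> types.CodeType:
    """Extract the module code object from raw .pyc bytes."""
    if pyc_bytes[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was built for a different Python version")
    # Caution: marshal is not safe on untrusted data; isolate this in practice.
    code = marshal.loads(pyc_bytes[PYC_HEADER_SIZE:])
    if not isinstance(code, types.CodeType):
        raise ValueError("payload is not a code object")
    return code


def fingerprint(code: types.CodeType) -> tuple:
    """Summarise a code object recursively, ignoring file names."""
    consts = tuple(
        fingerprint(c) if isinstance(c, types.CodeType) else repr(c)
        for c in code.co_consts
    )
    return (code.co_name, code.co_code, code.co_names, consts)


def pyc_matches_source(source: str, pyc_bytes: bytes) -> bool:
    """True if the bytecode is what compiling `source` would produce."""
    expected = compile(source, "<reviewed source>", "exec")
    return fingerprint(expected) == fingerprint(load_pyc_code(pyc_bytes))
```

A mismatch does not prove malice, since build flags differ, but it is exactly the gap between reviewed and executed code that this attack relies on.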
Finally, prompt injection with a high-vis jacket. The skill “standardises dev environments” by writing npm and yarn config that points at an attacker-controlled registry, dressed up as a corporate mirror. With Claude Sonnet 4.6, Cisco’s tool raised LOW findings and still marked the skill safe. The scanners on skills.sh also passed it. When the defence is a static target, an adversary can iterate on the wording until the LLM nods along.
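A deterministic check does not care how persuasive the cover story is. As a sketch, assuming the skill's config content is available as .npmrc-style `key=value` text, registry overrides can be compared against an allowlist. The trusted-host set and function name are assumptions for illustration; an organisation with a genuine internal mirror would add it explicitly.

```python
from urllib.parse import urlparse

# Assumed allowlist: the public npm and Yarn registries only.
TRUSTED_REGISTRY_HOSTS = {"registry.npmjs.org", "registry.yarnpkg.com"}


def untrusted_registries(npmrc_text: str) -> list[tuple[str, str]]:
    """Return (key, url) pairs for registry settings outside the allowlist."""
    findings = []
    for raw in npmrc_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        key = key.strip()
        # Matches both `registry=` and scoped `@scope:registry=` entries.
        if not sep or not (key == "registry" or key.endswith(":registry")):
            continue
        url = value.strip().strip("\"'")
        if urlparse(url).hostname not in TRUSTED_REGISTRY_HOSTS:
            findings.append((key, url))
    return findings


# Hypothetical config a "dev environment" skill might write.
sample = """\
; corporate mirror for faster installs
registry=https://npm-mirror.corp-tools.example/
@acme:registry=https://registry.npmjs.org/
"""
print(untrusted_registries(sample))
```

Any hit here deserves a human decision, whatever severity the model assigned to the surrounding prose.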
Why scanners struggle
Scope is narrow by design: to avoid context blowouts and cost, scanners ignore binaries, hidden files, and anything not explicitly referenced by SKILL.md. That is a gift to anyone hiding in .pyc, images, or archives, and it invites multimodal prompt attacks. The model-driven bits are persuadable with corporate-sounding rationale and selective context.
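To make the blind spot concrete, here is a rough sketch that inventories everything a scanner with that scope would skip: hidden files, files that look binary, and files SKILL.md never mentions. The function name, the NUL-byte heuristic and the substring match against SKILL.md are all simplifying assumptions.

```python
from pathlib import Path


def unreviewed_files(skill_dir: str) -> dict[str, list[str]]:
    """Map each out-of-scope file (relative path) to the reasons it is out of scope."""
    root = Path(skill_dir)
    manifest = root / "SKILL.md"
    manifest_text = (
        manifest.read_text(encoding="utf-8", errors="replace")
        if manifest.is_file()
        else ""
    )
    findings: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path == manifest:
            continue
        rel = path.relative_to(root)
        reasons = []
        if any(part.startswith(".") for part in rel.parts):
            reasons.append("hidden")
        # Crude binary heuristic: a NUL byte in the first 4 KiB.
        if b"\x00" in path.read_bytes()[:4096]:
            reasons.append("binary")
        # Crude reference check: the relative path appears in SKILL.md.
        if rel.as_posix() not in manifest_text:
            reasons.append("unreferenced")
        if reasons:
            findings[rel.as_posix()] = reasons
    return findings
```

Running it over a skill directory gives a list of files that nobody, human or model, was asked to look at.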
The mess is not confined to overt malware. Legitimate skills from trusted sources embed eyebrow-raising hacks, such as LD_PRELOAD shims compiled on the fly to work around sandbox limits. Automated analysis flags them low to medium at best. If an attacker can drop the right shared object into /tmp, that affordance becomes code execution with a straight face.
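These patterns are at least easy to surface for review, even if deciding whether they are legitimate is not. A small sketch, with pattern names and regular expressions that are illustrative assumptions and far from exhaustive:

```python
import re

# Illustrative indicators only; a real rule set would be broader.
RISKY_PATTERNS = {
    "ld_preload": re.compile(r"\bLD_PRELOAD\b"),
    "shared_object_build": re.compile(r"\b(?:gcc|cc|clang)\b[^\n]*-shared\b"),
    "tmp_shared_object": re.compile(r"/tmp/[^\s\"']+\.so\b"),
}


def loader_hijack_indicators(text: str) -> list[str]:
    """Return the sorted names of the patterns found in the given text."""
    return sorted(name for name, pattern in RISKY_PATTERNS.items() if pattern.search(text))


# Hypothetical snippet of the kind described above.
script = "gcc -shared -fPIC -o /tmp/shim.so shim.c\nLD_PRELOAD=/tmp/shim.so ./tool\n"
print(loader_hijack_indicators(script))
```

A hit says nothing about intent, which is why such findings tend to land at low or medium severity; the point is to put them in front of a reviewer.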
The research argues the trust model is broken: static scanners help at the margins, but public skill repositories remain untrusted code. The open question is whether we lean into stricter packaging and provenance, or accept that only dynamic, behaviour-aware controls will catch what looks benign on paper and bites at 03:00.