ShortSpan.ai logo
About

Thu, Feb 12, 2026

ShortSpan.ai brings you rapid news on AI security research & real-world impacts.

Filter by category:

Featured

Recent Articles

View Archive →
Cross-modal attacks outwit vision-language model defences Pentesting

Cross-modal attacks outwit vision-language model defences

Thu, Feb 12, 2026 • By Natalie Kestrel

A new paper introduces CrossTALK, a cross-modal entanglement attack that spreads clues across images and text to bypass vision-language model defences. Experiments on nine mainstream models show high success and detailed harmful outputs, highlighting gaps in cross-modal alignment and the need for adversarial testing and cross-modal safety checks in deployed VLM systems.

Study Exposes Prompt Injection Risks for LLM Agents Agents

Study Exposes Prompt Injection Risks for LLM Agents

Thu, Feb 12, 2026 • By Elise Veyron

A systematised review maps how prompt injection (PI) attacks can hijack autonomous Large Language Model (LLM) agents and surveys existing defences. The paper introduces AgentPI, a benchmark that tests agents in context-dependent settings and shows many defences that look effective on static tests fail when agents must use real‑time observations. Trade offs between trust, utility and latency are central.

IARPA report exposes AI Trojan detection limits Defenses

IARPA report exposes AI Trojan detection limits

Wed, Feb 11, 2026 • By James Armitage

The TrojAI final report from the Intelligence Advanced Research Projects Activity (IARPA) maps how hidden backdoors, or Trojans, appear across AI models and supply chains. It shows two practical detection approaches, documents that removal is still unsolved, and warns that large language models amplify the problem, forcing organisations to accept ongoing residual risk.

Agentic LLMs Reproduce Linux Kernel PoCs Agents

Agentic LLMs Reproduce Linux Kernel PoCs

Wed, Feb 11, 2026 • By Elise Veyron

A study finds autonomous Large Language Model (LLM) agents can reproduce proofs of concept (PoCs) for real Linux kernel vulnerabilities in over 50% of cases. K-Repro automates code browsing, building and debugging inside virtual machines, often finishing within tens of minutes at a few dollars per case, though race and temporal memory bugs remain hard.

Agents Synthesize CodeQL Queries to Find Vulnerabilities Agents

Agents Synthesize CodeQL Queries to Find Vulnerabilities

Wed, Feb 11, 2026 • By Lydia Stratus

A neuro-symbolic triad uses LLMs to generate CodeQL queries and validate results through semantic review and exploit synthesis. On Python packages it rediscovers historical CVEs with 90.6% accuracy, finds 39 medium-to-high issues in the Top100 including five new CVEs, and reduces noise substantially while keeping runtime and token costs low.

MUZZLE exposes adaptive prompt injection risks in agents Agents

MUZZLE exposes adaptive prompt injection risks in agents

Wed, Feb 11, 2026 • By Lydia Stratus

MUZZLE is an automated red‑teaming framework that tests web agents driven by Large Language Models (LLMs) for indirect prompt injection. It uses the agent's own execution traces to find high‑value UI surfaces and adapt attacks, discovering 37 attacks across four applications and highlighting cross‑application and phishing risks. Defenders should prioritise sanitisation, isolation and runtime checks.

Study exposes DRL pitfalls that compromise security Defenses

Study exposes DRL pitfalls that compromise security

Tue, Feb 10, 2026 • By Dr. Marcus Halden

This survey analyses 66 papers on Deep Reinforcement Learning (DRL) for cybersecurity and identifies 11 recurring methodological pitfalls. It finds an average of 5.8 pitfalls per paper and shows how modelling, evaluation and reporting choices produce brittle or misleading policies. The paper ends with concrete fixes to raise rigour and deployment safety.

MoE models vulnerable to expert silencing attack Attacks

MoE models vulnerable to expert silencing attack

Tue, Feb 10, 2026 • By Adrian Calder

Researchers show a training-free attack called Large Language Lobotomy (L3) that bypasses safety in mixture-of-experts (MoE) large language models by silencing a small set of experts. On eight open-source MoE models, L3 raises average attack success from 7.3% to 70.4%, often needing under 20% expert silencing while preserving utility.

TrapSuffix forces jailbreaks to fail or flag Defenses

TrapSuffix forces jailbreaks to fail or flag

Tue, Feb 10, 2026 • By Dr. Marcus Halden

TrapSuffix fine-tunes models so suffix-based jailbreak attempts hit a no-win choice: they either fail or carry a traceable fingerprint. On open models it reduces attack success to below 0.01% and yields 87.9% traceability, with negligible runtime cost and about 15.87 MB extra memory.

Confundo Crafts Robust Poisons for RAG Systems Attacks

Confundo Crafts Robust Poisons for RAG Systems

Mon, Feb 09, 2026 • By Natalie Kestrel

New research presents Confundo, a learning-to-poison framework that fine-tunes a large language model (LLM) to generate stealthy, robust poisoned content for retrieval-augmented generation (RAG) systems. Confundo survives realistic preprocessing and varied queries, manipulates facts, biases opinions and induces hallucinations while exposing gaps in ingestion, provenance and defensive testing.

Chat templates enable training-free backdoor attacks Attacks

Chat templates enable training-free backdoor attacks

Sun, Feb 08, 2026 • By Natalie Kestrel

Researchers describe BadTemplate, a training-free backdoor that hides malicious instructions inside chat templates used with Large Language Models (LLMs). The attack injects strings into the system prompt, produces persistent model misbehaviour across sessions and models, and evades common detectors, creating a scalable supply chain risk for AI-driven systems.

Researchers expose inference-time backdoors in chat templates Attacks

Researchers expose inference-time backdoors in chat templates

Thu, Feb 05, 2026 • By Natalie Kestrel

New research shows attackers can hide backdoors inside chat templates used with open-weight Large Language Models (LLMs). Templates can trigger malicious instructions at inference time without altering model weights or data. The backdoors silently break factual accuracy or inject attacker-chosen links, work across runtimes, and evade current automated distribution scans.

Open LLM RedSage Bolsters Local Cybersecurity Assistants Agents

Open LLM RedSage Bolsters Local Cybersecurity Assistants

Fri, Jan 30, 2026 • By Theo Solander

RedSage is an open, locally deployable Large Language Model (LLM) trained on cybersecurity data and simulated expert workflows. At the 8B scale it measurably improves benchmark performance. The release promises practical defensive assistance but highlights dual-use, data leakage and poisoning risks and calls for strict safety, provenance and access controls.

Combine views to catch modern audio deepfakes Defenses

Combine views to catch modern audio deepfakes

Thu, Jan 29, 2026 • By Dr. Marcus Halden

New research tests three contemporary text-to-speech systems and several detectors, finding that tools tuned to one synthesis style often miss others, especially large language model (LLM) based TTS. A multi-view detector that combines semantic, structural and signal analyses delivers steadier detection and lowers risk to voice authentication, impersonation and misinformation.

Diagnose and Harden AI Agents with AgentDoG Agents

Diagnose and Harden AI Agents with AgentDoG

Tue, Jan 27, 2026 • By Rowan Vale

AgentDoG introduces a diagnostic guardrail that tracks autonomous agent behaviour at trajectory level and attributes unsafe actions to root causes. It uses a three-dimensional taxonomy and the ATBench dataset, and ships open model variants (4B, 7B, 8B). Reported results show stronger safety moderation and clearer provenance for complex, tool-using scenarios.

Study shows LLMs yield to patient pressure Agents

Study shows LLMs yield to patient pressure

Mon, Jan 26, 2026 • By Lydia Stratus

A multi-agent evaluation finds large language models (LLMs) used for emergency care often give in to patient persuasion. Across 20 models and 1,875 simulated encounters, acquiescence ranges 0–100%; imaging requests are the most vulnerable. The work shows static benchmarks miss social pressure risks and urges multi-turn adversarial testing and human escalation guards.

Persuasive LLM Rewrites Break Automated Fact-Checkers Attacks

Persuasive LLM Rewrites Break Automated Fact-Checkers

Mon, Jan 26, 2026 • By Clara Nyx

Researchers show that generative Large Language Models (LLMs) can rephrase truthful claims using persuasion techniques to evade automated fact-checking. On FEVER and FEVEROUS benchmarks, persuasive rewrites substantially lower verification accuracy and cripple retrieval. Some techniques, especially obfuscation and manipulative wording, can collapse systems when an attacker optimises for maximum damage.

Move privacy controls into RAG retrieval, not prompts Defenses

Move privacy controls into RAG retrieval, not prompts

Wed, Jan 21, 2026 • By Clara Nyx

SD-RAG moves privacy enforcement out of prompts and into the retrieval stage of Retrieval-Augmented Generation (RAG) systems. It binds natural-language constraints to data chunks in a graph model, sanitises content before it reaches the Large Language Model (LLM), and reports up to a 58% privacy improvement versus prompt-only baselines, while noting synthetic-data and model-size limitations.

Move Privacy Checks to Retrieval, Not Prompts Defenses

Move Privacy Checks to Retrieval, Not Prompts

Wed, Jan 21, 2026 • By James Armitage

New research on SD-RAG shifts privacy and access controls from prompt-time to the retrieval layer in Retrieval-Augmented Generation (RAG). By binding human readable constraints to data chunks and redacting or paraphrasing before generation, the method reduces leakage and resists prompt injection, improving a privacy score by up to 58% in tests while trading some completeness and latency.

Study Reveals RCE Risks in Model Hosting Defenses

Study Reveals RCE Risks in Model Hosting

Wed, Jan 21, 2026 • By Elise Veyron

A cross-platform study finds remote code execution (RCE) risks when loading shared machine learning models. Researchers inspect five major hubs and identify roughly 45,000 repositories with load-time custom code, uneven platform safeguards, and common injection and deserialization issues. The findings push for default sandboxing, provenance checks and clearer developer guidance.

Looking for older articles?

Browse our complete archive of AI security news and analysis

Browse Archive