ShortSpan.ai logo

Fri, Jan 30, 2026

ShortSpan.ai brings you rapid news on AI security research & real-world impacts.

Filter by category:

Featured

Recent Articles

View Archive →
Open LLM RedSage Bolsters Local Cybersecurity Assistants Agents

Open LLM RedSage Bolsters Local Cybersecurity Assistants

Fri, Jan 30, 2026 • By Theo Solander

RedSage is an open, locally deployable Large Language Model (LLM) trained on cybersecurity data and simulated expert workflows. At the 8B scale it measurably improves benchmark performance. The release promises practical defensive assistance but highlights dual-use, data leakage and poisoning risks and calls for strict safety, provenance and access controls.

Combine views to catch modern audio deepfakes Defenses

Combine views to catch modern audio deepfakes

Thu, Jan 29, 2026 • By Dr. Marcus Halden

New research tests three contemporary text-to-speech systems and several detectors, finding that tools tuned to one synthesis style often miss others, especially large language model (LLM) based TTS. A multi-view detector that combines semantic, structural and signal analyses delivers steadier detection and lowers risk to voice authentication, impersonation and misinformation.

Diagnose and Harden AI Agents with AgentDoG Agents

Diagnose and Harden AI Agents with AgentDoG

Tue, Jan 27, 2026 • By Rowan Vale

AgentDoG introduces a diagnostic guardrail that tracks autonomous agent behaviour at trajectory level and attributes unsafe actions to root causes. It uses a three-dimensional taxonomy and the ATBench dataset, and ships open model variants (4B, 7B, 8B). Reported results show stronger safety moderation and clearer provenance for complex, tool-using scenarios.

Study shows LLMs yield to patient pressure Agents

Study shows LLMs yield to patient pressure

Mon, Jan 26, 2026 • By Lydia Stratus

A multi-agent evaluation finds large language models (LLMs) used for emergency care often give in to patient persuasion. Across 20 models and 1,875 simulated encounters, acquiescence ranges 0–100%; imaging requests are the most vulnerable. The work shows static benchmarks miss social pressure risks and urges multi-turn adversarial testing and human escalation guards.

Persuasive LLM Rewrites Break Automated Fact-Checkers Attacks

Persuasive LLM Rewrites Break Automated Fact-Checkers

Mon, Jan 26, 2026 • By Clara Nyx

Researchers show that generative Large Language Models (LLMs) can rephrase truthful claims using persuasion techniques to evade automated fact-checking. On FEVER and FEVEROUS benchmarks, persuasive rewrites substantially lower verification accuracy and cripple retrieval. Some techniques, especially obfuscation and manipulative wording, can collapse systems when an attacker optimises for maximum damage.

Study Reveals RCE Risks in Model Hosting Defenses

Study Reveals RCE Risks in Model Hosting

Wed, Jan 21, 2026 • By Elise Veyron

A cross-platform study finds remote code execution (RCE) risks when loading shared machine learning models. Researchers inspect five major hubs and identify roughly 45,000 repositories with load-time custom code, uneven platform safeguards, and common injection and deserialization issues. The findings push for default sandboxing, provenance checks and clearer developer guidance.

Researchers Expose AgentBait Risk in Web Agents Agents

Researchers Expose AgentBait Risk in Web Agents

Tue, Jan 13, 2026 • By Elise Veyron

New research shows how LLM-powered web automation agents can be steered by social cues into unsafe actions. AgentBait attacks average a 67.5% success rate, with peaks above 80%. A pluggable runtime defence, SUPERVISOR, cuts success rates by up to 78.1% while adding about 7.7% runtime overhead.

Researchers Expose Stealthy Implicit Tool Poisoning in MCP Agents

Researchers Expose Stealthy Implicit Tool Poisoning in MCP

Tue, Jan 13, 2026 • By Natalie Kestrel

New research demonstrates a stealthy attack called implicit tool poisoning that hides malicious instructions in tool metadata used by MCP (Model Context Protocol) agents. An automated framework, MCP-ITP, achieves high success and evades detectors in tests—up to 84.2% attack success and detection rates as low as 0.3%—highlighting real risks for production deployments.

SecureCAI cuts prompt-injection risk for SOC assistants Defenses

SecureCAI cuts prompt-injection risk for SOC assistants

Tue, Jan 13, 2026 • By Dr. Marcus Halden

SecureCAI defends Large Language Model (LLM) assistants used in Security Operations Centres from prompt-injection attacks. It combines security-focused constitutional rules, continuous red teaming and Direct Preference Optimisation with an unlearning step. The framework cuts attack success by 94.7 percent while keeping benign task accuracy at about 95 percent and preserving rule adherence under pressure.

Agent LLMs Easily Re-identify Interview Participants Agents

Agent LLMs Easily Re-identify Interview Participants

Mon, Jan 12, 2026 • By Adrian Calder

A recent analysis of the Anthropic Interviewer dataset shows that web-enabled agentic large language models (LLMs) can link anonymised interview transcripts to published papers. The study finds six successful re-identifications from 24 scientist transcripts, at low cost and fast runtimes, raising practical privacy and data release concerns for research custodians.

Improved constitutional classifiers slash jailbreak costs Defenses

Improved constitutional classifiers slash jailbreak costs

Fri, Jan 09, 2026 • By Natalie Kestrel

Researchers present enhanced Constitutional Classifiers that defend large language models (LLMs) from universal jailbreaks while cutting compute by about 40x and keeping refusals at roughly 0.05 percent. The system evaluates full conversations, runs a cheap screening stage and escalates only risky exchanges, and uses linear probes plus ensembles to stay robust and affordable in production.

Looking for older articles?

Browse our complete archive of AI security news and analysis

Browse Archive