Archive | ShortSpan.ai

August 2025

Thu, Aug 14, 2025 • By Clara Nyx

New Defense Exposes Flaws in LLM Tool Chains

A new defense framework, MCP-Guard, protects LLMs that call external tools against prompt injection and data leaks. The paper introduces a three-stage pipeline and a 70,448-sample benchmark. It reports 96.01% detector accuracy and 89.63% overall pipeline accuracy, promising practical protection for real deployments.

Defenses

Tue, Aug 12, 2025 • By James Armitage

AI Fingerprinting Advances Force Practical Defenses

New research shows automated methods can identify which LLM produced text with high accuracy using only a handful of targeted queries. The study also demonstrates a practical semantic-preserving filter that drastically reduces fingerprinting success while keeping meaning. This raises immediate privacy risks and offers a usable mitigation for deployed systems.

Attacks

Mon, Aug 11, 2025 • By Elise Veyron

Researchers Expose Few-Query Attacks on Multi-Task AI

New research shows practical black-box attacks that use only a few dozen to a few hundred queries to fool multi-task AI services. The method transfers adversarial text across tasks like translation, summarization, and image generation, affecting commercial APIs and large models. This raises urgent operational risks for public-facing AI systems and content pipelines.

Attacks

Mon, Aug 11, 2025 • By Lydia Stratus

Thinking Mode Raises Jailbreak Risk, Fixable Fast

New research finds that enabling chain-of-thought "thinking mode" in LLMs increases jailbreak success, letting attackers coax harmful outputs. The paper shows longer internal reasoning and educational-style justifications make models vulnerable, and introduces a lightweight "safe thinking intervention" that meaningfully reduces risk in real deployments.

Pentesting

Mon, Aug 11, 2025 • By Rowan Vale

Reinforcement Learning Improves Autonomous Pentest Success

New Pentest-R1 shows that combining offline expert walkthroughs with online interactive training helps smaller AI agents perform real multi-step penetration tests. The system raises success rates and cuts token use, but absolute performance stays modest. This matters for defenders who want automated, repeatable tests and for risk managers worried about misuse.

Enterprise

Thu, Aug 07, 2025 • By Dave Jones

Secure Your Code, Fast: Introducing Automated Security Reviews with Claude Code

This article explores Anthropic’s Claude Code, an AI-driven tool designed to automate security code reviews. Developed by Anthropic, Claude Code highlights the potential for AI to augment security workflows by identifying vulnerabilities quickly and consistently. The discussion balances its practical benefits against inherent risks such as over-reliance and false positives, providing security pros with actionable insights for safe AI integration.

Defenses

Mon, Aug 04, 2025 • By Dr. Marcus Halden

Program Analysis Stops Prompt Injection in AI Agents

AgentArmor treats an AI agent's runtime trace like a small program, analyzing data and tool calls to spot prompt injection. Tests show strong detection with high true positives and low false alarms, cutting attack success dramatically. Practical limits include dependency errors and extra runtime cost before enterprise deployment.

Attacks

Mon, Aug 04, 2025 • By Adrian Calder

Researchers Outsmart LLM Guards with Word Puzzles

New research shows a simple trick, turning harmful prompts into familiar word puzzles, lets attackers bypass modern LLM safety filters. The method, PUZZLED, masks keywords as anagrams, crosswords or word searches and achieves high success across top models, exposing a practical weakness in reasoning-based defenses that organizations must address.

Enterprise

Fri, Aug 01, 2025 • By James Armitage

New Cybersecurity LLM Promises Power, Raises Risks

A new instruction-tuned cybersecurity LLM, Foundation-Sec-8B-Instruct, is publicly released and claims to outperform Llama 3.1 and rival GPT-4o-mini on threat tasks. It promises faster incident triage and smarter analyst assistance, but limited transparency on training data and safeguards raises real-world safety and misuse concerns for defenders.

Pentesting

Fri, Aug 01, 2025 • By Lydia Stratus

LLMs Automate Penetration Tasks, Exposing Infra Weaknesses

New research shows a modern LLM can autonomously solve most beginner capture-the-flag tasks, finding files, decoding data, and issuing network commands with human-speed accuracy. That success lowers the skills barrier for attackers and exposes specific infrastructure gaps. Operators must apply practical hardening to endpoints, GPUs, vector stores, secrets and data paths now.

July 2025

Society

Thu, Jul 31, 2025 • By Adrian Calder

Stop Fully Autonomous AI Before It Decides

This paper argues that handing systems full autonomy is risky and unnecessary. It finds misaligned behaviours, deception, reward hacking and a surge in reported incidents since early 2023. The authors urge human oversight, adversarial testing and governance changes to avoid systems that can form their own objectives and bypass controls.

Agents

Tue, Jul 29, 2025 • By Dave Jones

Autonomous AI Agents: Hidden Security Risks in SmolAgents CodeAgent

This article reviews an NCC Group analysis by Ben Williams exposing security vulnerabilities in autonomous AI agents built with the SmolAgents framework, specifically CodeAgent. It details how insecure configurations can lead to command injection, data leakage, and sandbox escapes. The discussion balances AI’s automation benefits with practical mitigation strategies for safely deploying autonomous agents in security-sensitive environments.

Society

Thu, Jul 10, 2025 • By Natalie Kestrel

Study Exposes Generative AI Workplace Disruptions

New research analyzes 200,000 anonymized Bing Copilot chats and finds people mostly use generative AI for information gathering and writing. The study says knowledge work, office support, and sales show the highest applicability. This signals broad workplace shifts, but the dataset and opaque success metrics raise questions about scope and vendor claims.

June 2025

Pentesting

Mon, Jun 30, 2025 • By Adrian Calder

Reinforcement Learning Accelerates Automated Web Pentesting

New research shows a reinforcement learning agent can find web-application vulnerabilities faster and with far fewer actions than blind scanning. Trained on simulated site graphs with geometric priors, the approach produces compact models and CVE-mapped reports. Real-world gaps and tool limits mean human oversight and validation remain essential.

Defenses

Mon, Jun 30, 2025 • By Elise Veyron

Stop Calling Tools Autonomous: Demand Human Oversight

New research shows many cybersecurity AIs are semi-autonomous, not independent agents. That mislabeling risks reduced human oversight, false positives, and legal exposure. The paper offers a six-level taxonomy and urges clear capability disclosure, human validation, and governance so organizations capture AI speed without ceding critical decisions.

Society

Wed, Jun 04, 2025 • By Theo Solander

Aligning AI Amplifies Misuse Risks, Research Warns

New research shows that efforts to make advanced AI obedient can reduce one catastrophe risk but increase another: deliberate human misuse. The paper finds many current alignment methods plausibly amplify misuse potential and argues that technical fixes must be paired with robustness, control tools and stronger governance to prevent catastrophic abuse.

May 2025

Defenses

Tue, May 20, 2025 • By Dr. Marcus Halden

AI Code Iterations Introduce More Security Flaws

New research finds that iterative AI code improvements often add security flaws rather than fix them. In controlled tests, vulnerabilities rise sharply after a few automated iterations and different prompt styles create distinct risk patterns. The study urges mandatory human checks, routine analysis tools, and limits on consecutive AI-only edits to prevent unsafe regressions.

Enterprise

Fri, May 09, 2025 • By James Armitage

Experts Deploy Offensive Tests to Harden AI

New research urges organisations to run proactive, offensive security tests across the AI lifecycle to uncover hidden weaknesses that traditional defences miss. It finds red teaming, targeted penetration tests and simulated attacks reveal practical risks like model extraction and prompt injection. This changes how companies should prioritise fixes and detection for high-risk AI deployments.

April 2025

Pentesting

Tue, Apr 08, 2025 • By James Armitage

AI Hackers Slash Security Testing Time and Cost

New research presents CAI, an open framework that automates security testing and finds vulnerabilities far faster than people. It lowers costs, lets non-experts surface real bugs, and challenges big bug-bounty platforms. The work shows clear benefits but raises urgent questions about oversight, model choice, and safe deployment.

March 2025

Defenses

Mon, Mar 17, 2025 • By Clara Nyx

New Framework Reveals AI's Cyberattack Leverage

Researchers build a structured way to test how advanced AI boosts real cyberattacks and where defenders are blind. They analyze thousands of incidents and run model tests, finding that AI speeds up and scales certain stages such as reconnaissance and evasion. The work helps security teams prioritize defenses before attackers exploit these gaps.

February 2025

Pentesting

Tue, Feb 18, 2025 • By Rowan Vale

Researchers Unify Tools to Harden AI Penetration Testing

New research introduces AutoPT-Sim, a unified simulation framework for automated penetration testing that models attackers, defenders, and dynamic networks. It releases a network generator and datasets at multiple scales. This standardizes experimentation, lowers barriers for reproducible AI security tests, and warns that evaluation metrics still need community agreement.

January 2025

Society

Wed, Jan 29, 2025 • By Theo Solander

AI Chips Away Human Control, Study Warns

New research argues that incremental AI improvements can quietly erode human influence over the economy, culture, and states, creating reinforcing feedback loops that may become effectively irreversible. The paper highlights systemic risks that emerge from normal incentives, suggesting teams must monitor cross-domain effects, strengthen democratic controls, and build civilization-scale safeguards.

Society

Mon, Jan 27, 2025 • By Adrian Calder

Experts Split Over AI Doom as Safety Literacy Lags

A survey of 111 AI professionals finds two clear camps: one treats AI as a controllable tool, the other as an uncontrollable agent. Most experts express concern about catastrophic risk, yet many lack familiarity with core safety ideas. The result shifts where policy and security teams should focus their attention.

Pentesting

Fri, Jan 24, 2025 • By Rowan Vale

Autonomous Pentest Framework Outsmarts GPT Models

New research shows an automated, multi-agent pentesting framework can outperform single-model baselines and complete an end-to-end attack on a real target in at least one case. This speeds up vulnerability discovery and cuts cost for defenders, but it also lowers the bar for misuse, demanding immediate governance and controls.

November 2024

Agents

Fri, Nov 08, 2024 • By Elise Veyron

LLM Agents Automate Penetration Testing, Raise Risks

New research shows LLM-driven agent frameworks can automate end-to-end penetration testing, greatly speeding discovery and exploitation while matching or exceeding some human workflows. This boosts assessment coverage and lowers costs, but also widens the attack surface and enables more scalable misuse. Organizations must balance automation benefits with governance, controls, and oversight.