April 2026
Zero-knowledge proofs police risky LLM fine-tuning
New work proposes Fine-Tuning Integrity: zero-knowledge proofs that an updated model changed only within a policy-defined class, such as norm-bounded, low-rank or sparse updates. Proofs stay small and quick to verify regardless of model size, enabling supply-chain audits for Large Language Model updates without exposing model weights.
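To make the three policy classes concrete, here is a minimal sketch of the plain, non-zero-knowledge predicate that such a proof would attest to. It is not the paper's proof system: the function names, the use of the Frobenius norm, and the tiny integer matrices are all assumptions made for illustration.

```python
from fractions import Fraction
from math import sqrt


def delta(base, tuned):
    """Element-wise difference between tuned and base weight matrices."""
    return [[t - b for b, t in zip(rb, rt)] for rb, rt in zip(base, tuned)]


def frobenius_norm(m):
    """Square root of the sum of squared entries."""
    return sqrt(sum(x * x for row in m for x in row))


def nonzero_count(m):
    """Number of entries that changed at all."""
    return sum(1 for row in m for x in row if x != 0)


def rank(m):
    """Exact rank via Gaussian elimination over Fractions.

    Exact arithmetic suits the integer toy example below; real
    floating-point weights would need a numerical tolerance instead.
    """
    rows = [[Fraction(x) for x in row] for row in m]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def check_policy(base, tuned, policy, bound):
    """Return True if the update stays inside the named policy class.

    `policy` and `bound` are hypothetical parameters for this sketch.
    """
    d = delta(base, tuned)
    if policy == "norm_bounded":
        return frobenius_norm(d) <= bound
    if policy == "low_rank":
        return rank(d) <= bound
    if policy == "sparse":
        return nonzero_count(d) <= bound
    raise ValueError(f"unknown policy: {policy}")


# Toy weights: the update [[1, 2], [2, 4]] has rank 1 and norm 5.
BASE = [[1, 2], [3, 4]]
TUNED = [[2, 4], [5, 8]]
```

With these toy weights, `check_policy(BASE, TUNED, "low_rank", 1)` holds while `check_policy(BASE, TUNED, "sparse", 3)` does not, because all four entries changed. In the proposed setting a verifier would check a proof of this predicate rather than see the weights.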
LLMs Tackle Hardware Security Verification, With Evidence
A new survey shows Large Language Models can speed pre‑silicon hardware security work, especially asset discovery and test‑plan generation. In an NVDLA case study, 31 directed transactions revealed forwarding without local privilege checks, with 30 flagged events. The authors stress grounding AI outputs in simulation and formal proofs to avoid unsafe conclusions.
Google outlines continuous defences for indirect prompt injection
Google details a continuous defence-in-depth approach to indirect prompt injection in Workspace with Gemini. It blends human and automated red teaming, an AI vulnerability rewards programme and OSINT with a governed vulnerability catalogue, synthetic data via Simula, layered deterministic and model-based controls, and end-to-end evaluations across Gmail and Docs.
March 2026
Real-time monitor spots LLM reasoning failures
New research argues securing Large Language Models requires watching the chain of thought, not just the final text. It defines nine unsafe reasoning behaviours, shows distinct attack signatures across 4,111 traces, and reports about 85% detection accuracy from a parallel 'Reasoning Safety Monitor' that can interrupt bad steps. Latency and robustness remain open.
Finetuning Makes Aligned LLMs Regurgitate Copyrighted Books
New research shows that finetuning aligned Large Language Models to expand plot summaries into prose can trigger verbatim recall of copyrighted books. GPT-4o, Gemini-2.5-Pro and DeepSeek-V3.1 regurgitate up to 85–90% of held-out titles, including 460+ word spans, with prompts that contain no book text. The behaviour generalises across authors and models.
Framework curbs agentic LLM risks in enterprise SOC
New research proposes AgenticCyOps, a security architecture for multi‑agent Large Language Model (LLM) systems inside Security Operations Centres (SOC). It treats tool orchestration and memory management as primary trust boundaries, defines five defensive principles, and shows reduced exploitable interfaces versus a flat design. The evaluation is structural and flags notable trade‑offs.
Codex Security touts end-to-end AI patching agent
Codex Security arrives as a research preview claiming an AI agent that uses project context to detect, validate and patch vulnerabilities. The promise is less noise and faster remediation. The gaps are big: no methods, datasets or benchmarks. Real concerns remain over patch correctness, provenance, supply-chain risk and data handling.
November 2025
Standard taxonomy translates AI threats into monetary risk
A new standardised AI threat taxonomy maps 52 operational sub‑threats across nine domains to business loss categories such as confidentiality, integrity, availability, legal and reputation. It enables quantitative risk modelling, supports regulatory audits and helps security and compliance teams convert technical vulnerabilities into defensible monetary exposure for insurance, reserves and governance.
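As a rough illustration of how a threat-to-loss mapping can feed quantitative risk modelling, the sketch below sums expected annual loss per business loss category, using the common "frequency times loss per event" estimate. The taxonomy's own model is not described here, so the class, the sub-threat names and every number are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubThreat:
    """One sub-threat mapped to a business loss category (hypothetical schema)."""
    name: str
    loss_category: str
    events_per_year: float   # assumed annual frequency
    loss_per_event: float    # assumed monetary loss per occurrence


def annual_exposure(threats):
    """Sum expected annual loss (frequency x loss per event) by category."""
    totals = defaultdict(float)
    for t in threats:
        totals[t.loss_category] += t.events_per_year * t.loss_per_event
    return dict(totals)


# Placeholder inputs, not taken from the taxonomy.
EXAMPLE = [
    SubThreat("prompt injection", "confidentiality", 0.5, 200_000.0),
    SubThreat("training data poisoning", "integrity", 0.25, 400_000.0),
    SubThreat("model extraction", "confidentiality", 0.25, 100_000.0),
]

exposure = annual_exposure(EXAMPLE)
```

Grouping by loss category rather than by technical domain gives figures that can be compared with insurance limits or reserves.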
Small Data Poisoning Tops Healthcare AI Risks
New analysis finds small data poisoning attacks, using as few as 100–500 malicious samples, can compromise healthcare AI models across imaging, documentation and decision systems. Insiders and supply‑chain paths make attacks practical. Detection often takes months to years, and current regulations and federated learning frequently hinder discovery and attribution.
October 2025
August 2025
LLMs Aid SOC Analysts, But Do Not Replace Them
A 10-month study of 3,090 queries from 45 SOC analysts finds LLMs act as on-demand cognitive aids for interpreting telemetry and polishing reports, not as decision-makers. Usage grows from casual to routine among power users. The study shows promise for efficiency but warns against unchecked trust and against over-generalising from a single site.
GenAI Complacency: The Silent Cybersecurity Crisis Enterprises Ignore
Enterprises are rapidly adopting generative AI, but many underestimate the risks. Experts warn that by 2027 more than 40% of breaches could stem from misused AI tools unless organisations proactively manage prompt injection, data leakage and AI-driven attack vectors.
Google Alerts: Indirect Prompt Injection Abuse Targets Gemini Assistant
Google has issued a warning about “indirect prompt injection” attacks that can coerce AI systems into leaking sensitive data. The attack embeds hidden instructions in benign content, bypassing standard detection and creating a new AI-driven social engineering threat.
Lenovo AI Chatbot Flaw Opens Door to XSS Attacks and Session Hijacking
Researchers uncovered a critical flaw in Lenovo’s AI chatbot, “Lena,” which allowed attackers to inject malicious prompts leading to cross-site scripting attacks. Exploitation could have exposed sensitive session cookies, enabled chat hijacking, and opened paths into enterprise environments.
Secure Your Code, Fast: Introducing Automated Security Reviews with Claude Code
This article, authored by Anthropic researchers, explores Anthropic’s Claude Code, an AI-driven tool designed to automate security code reviews. It highlights the potential for AI to augment security workflows by identifying vulnerabilities quickly and consistently. The discussion balances these practical benefits against inherent risks such as over-reliance and false positives, giving security professionals actionable insights for safe AI integration.
New Cybersecurity LLM Promises Power, Raises Risks
A new instruction-tuned cybersecurity LLM, Foundation-Sec-8B-Instruct, is publicly released and claims to outperform Llama 3.1 and rival GPT-4o-mini on threat tasks. It promises faster incident triage and smarter analyst assistance, but limited transparency on training data and safeguards raises real-world safety and misuse concerns for defenders.
