Enterprise — AI Security Archive

April 2026

Tue, Apr 07, 2026 • By Clara Nyx

Zero-knowledge proofs police risky LLM fine-tuning

New work proposes Fine-Tuning Integrity: zero-knowledge proofs that an updated model only changed within a policy-defined class such as norm-bounded, low-rank or sparse. Proofs stay small and quick to verify regardless of model size, enabling supply-chain audits for Large Language Model updates without exposing model weights.

Enterprise

Fri, Apr 03, 2026 • By Theo Solander

LLMs Tackle Hardware Security Verification, With Evidence

A new survey shows Large Language Models can speed pre‑silicon hardware security work, especially asset discovery and test‑plan generation. In an NVDLA case study, 31 directed transactions revealed forwarding without local privilege checks, with 30 flagged events. The authors stress grounding AI outputs in simulation and formal proofs to avoid unsafe conclusions.

Enterprise

Fri, Apr 03, 2026 • By Rowan Vale

Google outlines continuous defences for indirect prompt injection

Google details a continuous defence-in-depth approach to indirect prompt injection in Workspace with Gemini. It blends human and automated red teaming, an AI vulnerability rewards programme and OSINT with a governed vulnerability catalogue, synthetic data via Simula, layered deterministic and model-based controls, and end-to-end evaluations across Gmail and Docs.

March 2026

Enterprise

Sun, Mar 29, 2026 • By Clara Nyx

Real-time monitor spots LLM reasoning failures

New research argues that securing Large Language Models requires watching the chain of thought, not just the final text. It defines nine unsafe reasoning behaviours, shows distinct attack signatures across 4,111 traces, and reports about 85% detection accuracy from a parallel 'Reasoning Safety Monitor' that can interrupt bad steps. Latency and robustness remain open questions.

Enterprise

Mon, Mar 23, 2026 • By Theo Solander

Finetuning Makes Aligned LLMs Regurgitate Copyrighted Books

New research shows that finetuning aligned Large Language Models to expand plot summaries into prose can trigger verbatim recall of copyrighted books. GPT-4o, Gemini-2.5-Pro and DeepSeek-V3.1 regurgitate up to 85–90% of held-out titles, including 460+ word spans, with prompts that contain no book text. The behaviour generalises across authors and models.

Enterprise

Wed, Mar 11, 2026 • By Elise Veyron

Framework curbs agentic LLM risks in enterprise SOC

New research proposes AgenticCyOps, a security architecture for multi‑agent Large Language Model (LLM) systems inside Security Operations Centres (SOC). It treats tool orchestration and memory management as primary trust boundaries, defines five defensive principles, and shows reduced exploitable interfaces versus a flat design. The evaluation is structural and flags notable trade‑offs.

Enterprise

Mon, Mar 09, 2026 • By Clara Nyx

Codex Security touts end-to-end AI patching agent

Codex Security arrives as a research preview claiming an AI agent that uses project context to detect, validate and patch vulnerabilities. The promise is less noise and faster remediation. The gaps are big: no methods, datasets or benchmarks. Real concerns remain over patch correctness, provenance, supply-chain risk and data handling.

November 2025

Enterprise

Sat, Nov 29, 2025 • By Lydia Stratus

Standard taxonomy translates AI threats into monetary risk

A new standardised AI threat taxonomy maps 52 operational sub‑threats across nine domains to business loss categories such as confidentiality, integrity, availability, legal and reputation. It enables quantitative risk modelling, supports regulatory audits and helps security and compliance teams convert technical vulnerabilities into defensible monetary exposure for insurance, reserves and governance.

Enterprise

Mon, Nov 17, 2025 • By Adrian Calder

Small Data Poisoning Tops Healthcare AI Risks

New analysis finds small data poisoning attacks, using as few as 100–500 malicious samples, can compromise healthcare AI models across imaging, documentation and decision systems. Insiders and supply‑chain paths make attacks practical. Detection often takes months to years, and current regulations and federated learning frequently hinder discovery and attribution.

October 2025

Enterprise

Tue, Oct 07, 2025 • By Natalie Kestrel

Researchers Deploy Unified Framework to Curb LLM Threats

A new paper introduces the Unified Threat Detection and Mitigation Framework (UTDMF), a real-time pipeline for Large Language Models (LLMs). Tested on Llama-3.1, GPT-4o and Claude-3.5, the system reports 92% prompt-injection detection, 65% fewer deceptive outputs and 78% fairness gains, and ships an API toolkit for enterprise integration.

August 2025

Enterprise

Wed, Aug 27, 2025 • By Clara Nyx

LLMs Aid SOC Analysts, But Do Not Replace Them

A 10-month study of 3,090 queries from 45 SOC analysts finds LLMs act as on-demand cognitive aids for interpreting telemetry and polishing reports, not as decision-makers. Usage grows from casual to routine among power users. The findings show promise for efficiency, but the study warns against unchecked trust and against generalising from a single site.

Enterprise

Sun, Aug 24, 2025 • By Dave Jones

GenAI Complacency: The Silent Cybersecurity Crisis Enterprises Ignore

Enterprises are rapidly adopting generative AI, but many underestimate the risks. Experts warn that by 2027, over 40% of breaches could stem from misused AI tools, unless organisations proactively manage prompt injection, data leakage, and AI-driven attack vectors.

Enterprise

Sat, Aug 23, 2025 • By Dave Jones

Google Alerts: Indirect Prompt Injection Abuse Targets Gemini Assistant

Google has issued a warning about “indirect prompt injection” attacks that can coerce AI systems into leaking sensitive data. The attack embeds hidden instructions in benign content, bypassing standard detection and creating a new AI-driven social engineering threat.

Enterprise

Fri, Aug 22, 2025 • By Dave Jones

Lenovo AI Chatbot Flaw Opens Door to XSS Attacks and Session Hijacking

Researchers uncovered a critical flaw in Lenovo’s AI chatbot, “Lena,” which allowed attackers to inject malicious prompts leading to cross-site scripting attacks. Exploitation could have exposed sensitive session cookies, enabled chat hijacking, and opened paths into enterprise environments.

Enterprise

Thu, Aug 07, 2025 • By Dave Jones

Secure Your Code, Fast: Introducing Automated Security Reviews with Claude Code

This article explores Anthropic’s Claude Code, an AI-driven tool designed to automate security code reviews. Authored by Anthropic researchers, Claude Code highlights the potential for AI to augment security workflows by identifying vulnerabilities quickly and consistently. The discussion balances its practical benefits against inherent risks such as over-reliance and false positives, providing security pros with actionable insights for safe AI integration.

Enterprise

Fri, Aug 01, 2025 • By James Armitage

New Cybersecurity LLM Promises Power, Raises Risks

A new instruction-tuned cybersecurity LLM, Foundation-Sec-8B-Instruct, is publicly released and claims to outperform Llama 3.1 and rival GPT-4o-mini on threat tasks. It promises faster incident triage and smarter analyst assistance, but limited transparency on training data and safeguards raises real-world safety and misuse concerns for defenders.
