Defenders deploy encrypted prompts to blunt AI attacks

Defenses

Published: Fri, Nov 07, 2025 • By Dr. Marcus Halden

Defenders deploy encrypted prompts to blunt AI attacks

A recent study examines using Large Language Models (LLMs) inside security tools and finds practical ways to reduce new AI-driven risks. Encrypted prompts and a decoupled model architecture both improve safety and accuracy, particularly for intrusion detection. The paper warns of prompt leakage, supply chain risks and higher compute and explainability costs.

Large Language Models (LLMs) are reshaping both offence and defence in cyber security. This paper looks at how to fold LLMs into existing tools without handing attackers a bigger hammer. The authors adopt a pragmatic posture: LLMs can add context and scale, but they also change where and how we must protect data and commands.

The clearest technical proposal is encrypted prompts. A prompt becomes a small, signed packet that carries a delimiter, a permission tag and a public key check. Before any LLM-initiated action, the system verifies the permission tag against that public key. In plain terms, the model does not act on free-text instructions alone; actions must arrive bound to an approved credential. That reduces classic prompt injection where an attacker slips malicious instructions into otherwise benign inputs.

Beyond prompts, the paper outlines a four-layer architecture for LLM-enabled security: data processing, model integration, cybersecurity application, and continuous learning. The data layer emphasises collection and sanitisation from traffic, logs and mail while preserving privacy. The model layer describes fine tuning and running multiple specialist models for threat-specific tasks. The application layer maps those models to tasks such as phishing detection, threat intelligence and automated response. The final layer keeps models current with transfer learning and feedback loops. The design is modular so defensive teams can swap components and limit blast radius when something goes wrong.

How they evaluated intrusion detection

The authors compare three approaches for intrusion detection. First, end-to-end prompting with a frozen LLM performs poorly in the reported tests, with accuracy in the range quoted by the paper. A retrieval-augmented generation approach performs better, reaching strong results for known attacks but weaker for novel, out-of-distribution activity. The third approach is a decoupled model design: a task-specific, fine tuned classifier handles detection and a frozen LLM provides human-readable explanations. That decoupled pattern delivers the highest reported accuracy in the study, although it is not free of operational costs.

The practical caveats matter. Integrating LLMs increases computational demand and raises explainability and ethical questions. New attack surfaces appear: prompt leakage, manipulated permissions, and risks from outsourcing model hosting or from supply chain weaknesses. Encrypted prompts reduce one class of attack but introduce key management problems and dependency on correct verification logic. The paper is sound on architecture but cautious where operational realities hit: testing, logging and rollback plans are essential.

Operational takeaways

Use signed, verifiable prompts to limit prompt injection, but treat key management as a first-class risk.
Prefer a decoupled architecture: a specialist detector for decisions and a frozen LLM for explanations to reduce attack surface.
Test for prompt leakage and supply-chain scenarios, and budget for higher compute and explainability work.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

Large Language Models for Cyber Security

Authors: Raunak Somani and Aswani Kumar Cherukuri

This paper studies the integration off Large Language Models into cybersecurity tools and protocols. The main issue discussed in this paper is how traditional rule-based and signature based security systems are not enough to deal with modern AI powered cyber threats. Cybersecurity industry is changing as threats are becoming more dangerous and adaptive in nature by levering the features provided by AI tools. By integrating LLMs into these tools and protocols, make the systems scalable, context-aware and intelligent. Thus helping it to mitigate these evolving cyber threats. The paper studies the architecture and functioning of LLMs, its integration into Encrypted prompts to prevent prompt injection attacks. It also studies the integration of LLMs into cybersecurity tools using a four layered architecture. At last, the paper has tried to explain various ways of integration LLMs into traditional Intrusion Detection System and enhancing its original abilities in various dimensions. The key findings of this paper has been (i)Encrypted Prompt with LLM is an effective way to mitigate prompt injection attacks, (ii) LLM enhanced cyber security tools are more accurate, scalable and adaptable to new threats as compared to traditional models, (iii) The decoupled model approach for LLM integration into IDS is the best way as it is the most accurate way.

🔍 ShortSpan Analysis of the Paper

Problem

This paper investigates the integration of Large Language Models into cybersecurity tools and protocols, addressing the limits of traditional rule based and signature based systems in the face of AI powered threats. It argues that cyber threats are increasingly dangerous and adaptive because they leverage AI capabilities, and that incorporating LLMs can make security tools scalable, context aware and intelligent to mitigate these evolving threats. The study examines how LLMs work, how encrypted prompts can prevent prompt injection attacks, and a four layer architectural framework for integrating LLMs into cybersecurity tools. It also discusses methods for embedding LLMs into traditional intrusion detection systems and highlights the decoupled model approach as the most accurate for IDS. The paper identifies attack surfaces such as prompt injection, data leakage through prompts and risks from external LLMs or supply chain weaknesses, and proposes mitigations including secure prompt handling with encryption, robust model isolation via a decoupled architecture, and layered IDS integration. While focusing on security implications, it notes ethical concerns and computa tional demands and does not explicitly address broad societal impacts.

Approach

The authors provide an overview of LLM architecture including pre training, fine tuning and use, describing how prompts and the temperature parameter influence responses. They detail encrypted prompts as a security mechanism that embeds current permissions in prompts, with a three component structure of a delimiter, a permission tag and a public key to verify permissions before LLM generated actions such as API calls are executed. The framework for LLM based cybersecurity tools is described as four layers: data processing, language model integration, cybersecurity application, and continuous learning. The data processing layer collects and cleans data from sources such as network traffic, logs and emails while preserving privacy. The language model integration layer fine tunes LLMs for cybersecurity tasks and can host multiple models such as SecureBERT and CyberBERT. The cybersecurity application layer covers threat intelligence, anomaly detection, phishing detection and incident response automation, with examples like Crimson. The continuous learning layer uses transfer learning and reinforcement learning with feedback loops to keep models up to date. The paper also compares three IDS integration approaches: end to end prompting with frozen LLMs, retrieval augmented generation, and a decoupled system consisting of a task specific fine tuned classifier together with a frozen LLM for explanation. It concludes with a description of a continuous intrusion detection framework and a lightweight BERT based model for classifying network traffic.

Key Findings

Encrypted prompts paired with LLMs effectively mitigate prompt injection attacks by binding actions to defined permissions verified via public key mechanisms.
LLM enhanced cybersecurity tools demonstrate greater accuracy, scalability and adaptability to new threats than traditional models across security tasks.
Within intrusion detection, a decoupled model approach yields the highest reported accuracy; end to end prompting achieves substantially lower accuracy (approximately 28 to 46 per cent in the cited evaluation), while a retrieval augmented generation approach attains up to 82 per cent accuracy for known attacks and 62 per cent for zero day or out of distribution attacks; the decoupled approach reports weighted accuracy up to about 98 per cent in the cited study.

Limitations

The work acknowledges ethical challenges, explainability concerns and substantial computational requirements associated with integrating LLMs into security workflows. It also notes new attack surfaces such as prompt leakage and manipulation, as well as risks from deploying external LLMs or relying on supply chains. The authors emphasise mitigations including secure prompt handling with encryption, robust model isolation through a decoupled architecture, and layered integration of IDS to limit blast radius, but these remain areas requiring careful testing and validation.

Why It Matters

The paper highlights practical implications for security practice: encrypted prompts can defend against prompt injection, and a four layer, decoupled architecture can improve accuracy and resilience of IDS. Architectural patterns presented support modular, scalable deployment across data processing, model integration, cybersecurity applications and continuous learning, enabling more capable detection and automated response to AI powered threats. However safe adoption requires addressing ethical and explainability concerns, handling data privacy in processing layers, managing computational demands, and remaining vigilant to new attack surfaces such as prompt leakage and manipulation in external LLM integrations.

Attribution Original paper on arXiv