AI agents fuzz industrial control protocols effectively

Published: Mon, Oct 06, 2025 • By Adrian Calder

Researchers present MALF, a multi-agent Large Language Model (LLM) fuzzing framework that finds protocol-aware faults in industrial control systems (ICS). Using Retrieval-Augmented Generation (RAG) and QLoRA tuning, MALF reports 88–92% test pass rates, broad protocol coverage, many exception triggers and three zero-days in a power-plant range, highlighting both defensive value and dual-use risk.

Industrial control systems (ICS) run our grids, factories and water treatment plants, and their bespoke protocols are awkward to test. The paper presents MALF, a Multi-Agent LLM Fuzzing framework that couples retrieval of protocol documentation with LLM-powered test generation and a feedback-analysis loop to produce targeted, protocol-aware inputs.

For clarity, Large Language Model (LLM) refers to the generative models used for text and structured output, and Retrieval-Augmented Generation (RAG) means the system pulls context from a knowledge base to ground those outputs. The authors also use QLoRA fine-tuning to make the models small and efficient enough to run on constrained hardware.

MALF is built from four cooperating agents: a seed generator that extracts protocol fields from live traffic, a test-case generator that mutates those seeds while respecting field semantics, a feedback analysis agent that classifies responses from devices, and a communications module that coordinates work and injection. The system is deliberately protocol-aware; it leverages a RAG knowledge base of about 320 pages covering commands and constraints, and it uses stepwise reasoning to validate field values and sequences before sending tests.
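To make "extract protocol fields" and "mutate while respecting field semantics" concrete, here is a minimal sketch for a single Modbus/TCP request type. It is not MALF's code: MALF drives these steps with LLM agents, whereas this uses plain hand-written rules. The `Seed` record and the `parse_seed`, `encode` and `mutate` helpers are hypothetical names for illustration. Only the frame layout (MBAP header, function code 0x03, quantity range 1 to 125) comes from the Modbus specification.

```python
import random
import struct
from dataclasses import dataclass, replace

MBAP = struct.Struct(">HHHB")   # transaction id, protocol id, length, unit id
READ_HOLDING_REGISTERS = 0x03   # standard Modbus function code


@dataclass(frozen=True)
class Seed:
    """Hypothetical seed record: the fields of one Read Holding Registers request."""
    transaction_id: int
    unit_id: int
    start_address: int
    quantity: int


def parse_seed(frame: bytes) -> Seed:
    """Extract protocol fields from a captured Modbus/TCP request frame."""
    tid, proto, length, unit = MBAP.unpack_from(frame, 0)
    func, start, qty = struct.unpack_from(">BHH", frame, MBAP.size)
    # The MBAP length field counts every byte after itself (unit id + PDU).
    if proto != 0 or func != READ_HOLDING_REGISTERS or length != len(frame) - 6:
        raise ValueError("not a well-formed Read Holding Registers request")
    return Seed(tid, unit, start, qty)


def encode(seed: Seed) -> bytes:
    """Re-serialise a seed, recomputing the MBAP length so framing stays valid."""
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, seed.start_address, seed.quantity)
    return MBAP.pack(seed.transaction_id, 0, len(pdu) + 1, seed.unit_id) + pdu


def mutate(seed: Seed, rng: random.Random) -> Seed:
    """Set one field to an edge value; the spec allows quantities 1..125."""
    kind = rng.choice(["quantity_boundary", "quantity_out_of_range", "address_edge"])
    if kind == "quantity_boundary":
        return replace(seed, quantity=rng.choice([1, 125]))
    if kind == "quantity_out_of_range":
        return replace(seed, quantity=rng.choice([0, 126, 0xFFFF]))
    return replace(seed, start_address=rng.choice([0x0000, 0xFFFF]))


# A captured request: read 10 holding registers starting at address 0.
captured = bytes.fromhex("0001 0000 0006 01 03 0000 000a")
test_case = encode(mutate(parse_seed(captured), random.Random(0)))
print(test_case.hex())
```

Because the framing is recomputed in `encode`, every mutated frame still parses as Modbus/TCP even when a field value is deliberately out of range. That is the sense in which a mutation can be both protocol-compliant and hostile.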

The results are noteworthy but not magical. On Modbus/TCP, S7Comm and Ethernet/IP the framework reports a test case pass rate of 88–92 percent, seed coverage above 90 percent and Shannon entropy in the reported range of 4.2–4.6 bits, which the authors use to argue for diverse, structured mutations. In operator-range trials against commercial programmable logic controllers (PLCs) MALF generated roughly 22 exception triggers per 24 hours and uncovered three zero-day flaws; one was disclosed as CNVD-2024-16009 related to a PLC connection denial of service.
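For readers unfamiliar with the entropy figure, the sketch below computes Shannon entropy in bits over the byte values of a buffer. This summary does not say what unit the paper measures entropy over (bytes, fields or whole test cases), so the byte-level choice here is an assumption, and `shannon_entropy_bits` is an illustrative name.

```python
import math
from collections import Counter


def shannon_entropy_bits(data: bytes) -> float:
    """Shannon entropy H = sum(p * log2(1/p)) over the byte values in data."""
    if not data:
        return 0.0
    total = len(data)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(data).values()
    )


# Four equally likely byte values carry exactly 2 bits of entropy each.
uniform_four = bytes([0, 1, 2, 3])
# A constant buffer carries none.
constant = b"\x00" * 16

print(shannon_entropy_bits(uniform_four), shannon_entropy_bits(constant))
```

Byte-level entropy tops out at 8 bits for uniformly random data. On that scale, a value in the mid-4s would sit between rigid, repetitive inputs and pure noise, which fits the authors' description of mutations as diverse but structured.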

The paper openly discusses trade-offs. Precision comes at the cost of throughput: the model-driven, multi-agent approach produces higher-quality, protocol-compliant tests, but it spends more time on coordination and runs fewer raw mutations per unit time than simple, high-speed fuzzers. The QLoRA quantisation and agent partitioning reduce resource needs and make field deployment easier, but they also require robust fault tolerance and careful parallelisation to regain lost throughput.

Why it matters

MALF shows that AI can automate nuanced protocol understanding and find faults traditional fuzzers miss. That is good for defenders who need realistic, high‑quality tests to harden ICS devices. It is also a reminder of dual use: the same automation that helps security teams could lower the bar for attackers to discover and weaponise complex protocol flaws.

What to do next

If you manage ICS, treat AI fuzzers as a force multiplier rather than a silver bullet. Prioritise incorporating high‑quality fuzzing into acceptance tests, maintain air-gapped or simulated ranges for destructive testing, and insist vendors provide protocol documentation or test suites. From an operational security angle, enforce network segmentation, monitor for anomalous protocol sequences, and control access to tooling and datasets that could be repurposed. Finally, plan for disclosure: if an AI tool finds a bug, have a clear path to coordinated vendor notification and mitigation rather than leaking details into the wild.
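As one illustration of "monitor for anomalous protocol sequences", the sketch below learns which function-code transitions occur in known-good traffic and flags any transition it has not seen. It is a deliberately simple baseline, not a production detector. The function names are hypothetical, and the codes used (3, 16 and 8) are the standard Modbus codes for Read Holding Registers, Write Multiple Registers and Diagnostics.

```python
from __future__ import annotations

from typing import Iterable


def learn_transitions(baseline: Iterable[int]) -> set[tuple[int, int]]:
    """Collect every (previous, next) function-code pair seen in clean traffic."""
    codes = list(baseline)
    return set(zip(codes, codes[1:]))


def flag_anomalies(
    observed: Iterable[int], known: set[tuple[int, int]]
) -> list[tuple[int, tuple[int, int]]]:
    """Return (index, pair) for each transition absent from the baseline.

    The index is the position in `observed` of the second code in the pair.
    """
    codes = list(observed)
    return [
        (i + 1, pair)
        for i, pair in enumerate(zip(codes, codes[1:]))
        if pair not in known
    ]


# Baseline: a poll loop that reads (3), reads again, then writes (16).
known = learn_transitions([3, 3, 16, 3, 3, 16])
# Observed: a Diagnostics request (8) appears mid-sequence.
alerts = flag_anomalies([3, 16, 8, 3], known)
print(alerts)
```

A fuzzer that sends valid frames in unusual orders defeats simple signature checks, which is why sequence-level baselining is worth the effort on networks whose traffic is as repetitive as most ICS polling loops.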

Additional analysis of the original arXiv paper

📋 Original Paper Title and Abstract

MALF: A Multi-Agent LLM Framework for Intelligent Fuzzing of Industrial Control Protocols

Authors: Bowei Ning, Xuejun Zong, and Kan He

Industrial control systems (ICS) are vital to modern infrastructure but increasingly vulnerable to cybersecurity threats, particularly through weaknesses in their communication protocols. This paper presents MALF (Multi-Agent LLM Fuzzing Framework), an advanced fuzzing solution that integrates large language models (LLMs) with multi-agent coordination to identify vulnerabilities in industrial control protocols (ICPs). By leveraging Retrieval-Augmented Generation (RAG) for domain-specific knowledge and QLoRA fine-tuning for protocol-aware input generation, MALF enhances fuzz testing precision and adaptability. The multi-agent framework optimizes seed generation, mutation strategies, and feedback-driven refinement, leading to improved vulnerability discovery. Experiments on protocols like Modbus/TCP, S7Comm, and Ethernet/IP demonstrate that MALF surpasses traditional methods, achieving a test case pass rate (TCPR) of 88-92% and generating more exception triggers (ETN). MALF also maintains over 90% seed coverage and Shannon entropy values between 4.2 and 4.6 bits, ensuring diverse, protocol-compliant mutations. Deployed in a real-world Industrial Attack-Defense Range for power plants, MALF identified critical vulnerabilities, including three zero-day flaws, one confirmed and registered by CNVD. These results validate MALF's effectiveness in real-world fuzzing applications. This research highlights the transformative potential of multi-agent LLMs in ICS cybersecurity, offering a scalable, automated framework that sets a new standard for vulnerability discovery and strengthens critical infrastructure security against emerging threats.

🔍 ShortSpan Analysis of the Paper

Problem

Industrial control systems rely on specialised communication protocols that are vital to critical infrastructure but remain vulnerable to cybersecurity threats. Weaknesses in these industrial control protocols can be exploited to disrupt safety-critical processes, enable lateral movement within ICS networks, or cause outages in power plants. Fuzz testing is a key technique for dynamic vulnerability discovery, but testing modern industrial control protocols is challenging because protocols such as Modbus/TCP, S7Comm and Ethernet/IP are structured, stateful and timing-sensitive. Traditional fuzzers often struggle with protocol compliance, mutation diversity and real-world realism, limiting their ability to identify latent flaws and zero-day vulnerabilities in ICS deployments.

Approach

The paper introduces MALF, a fully automated, multi-agent, LLM-based framework for intelligent fuzzing of industrial control protocols (ICPs). MALF combines domain-specific knowledge retrieval with large language models fine-tuned for protocol-aware input generation. It uses Retrieval-Augmented Generation to access a knowledge base of ICP documentation and vulnerability information, and QLoRA to fine-tune the model memory-efficiently for fuzz testing tasks. The architecture comprises four core components: a Seed Generation Agent, a Test Case Generation Agent, a Feedback Analysis Agent and a Communication Interaction Module. The agents operate in a feedback loop coordinated via ZeroMQ to generate protocol-compliant seeds, mutate them into diverse test cases and adapt strategies in real time. The system relies on 4-bit quantised Llama 3 models tuned for specific tasks, with a RAG knowledge base of about 320 pages and 750,000 characters covering 180 commands. RAG is used for context-specific retrieval and background-augmented prompting; chain-of-thought (CoT) reasoning guides stepwise validation of protocol fields and sequence constraints.
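A back-of-envelope calculation shows why four-bit weights matter on constrained hardware. The sketch assumes a 16-bit baseline and an 8-billion-parameter model. Neither is stated in this summary, so both are assumptions chosen only to make the arithmetic concrete. It counts weight storage alone and ignores adapters, activations and quantisation metadata.

```python
def weight_memory_gib(n_params: int, bits_per_weight: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 8_000_000_000  # assumption: model size is not given in this summary

fp16_gib = weight_memory_gib(N_PARAMS, 16)  # assumed 16-bit baseline
int4_gib = weight_memory_gib(N_PARAMS, 4)   # 4-bit quantised weights
saving = 1 - int4_gib / fp16_gib

print(f"16-bit: {fp16_gib:.1f} GiB, 4-bit: {int4_gib:.1f} GiB, saving: {saving:.0%}")
```

Under those assumptions, moving from 16 to 4 bits per weight cuts weight storage to a quarter, which is consistent with the "up to 75 percent" reduction the summary quotes. Real savings are somewhat lower once the overheads ignored here are counted.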

Key technical elements include memory-efficient tuning with QLoRA, which introduces low-rank adapters into Transformer layers, and 4-bit quantisation that reduces memory usage by up to 75 percent. This enables deployment on industrial hardware. The Seed Generation Agent uses real-time traffic capture with CoT and RAG to extract protocol fields and generate protocol-compliant seeds, while the Test Case Generation Agent applies field, structural and semantic mutations to produce diverse test cases. The Feedback Analysis Agent classifies responses from the system under test into normal, abnormal and critical anomalies, assigns severity scores, and adjusts mutation density and focus on high-risk fields through dynamic strategy updates. The Communication Interaction Module handles real-time traffic capture, test case injection, inter-agent coordination, fault tolerance and extensibility. The framework is evaluated on the Modbus/TCP, S7Comm and Ethernet/IP protocols and tested in an industrial attack-defence range for power plants.
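The feedback step can be pictured as a classifier plus a weight update. The sketch below is a rule-based stand-in for one Modbus/TCP exchange, not the paper's LLM-driven agent. The three verdicts mirror the normal, abnormal and critical classes described above, but the classification rules, the `SEVERITY` scores and the `Strategy` class are all hypothetical. The only protocol fact relied on is that a Modbus exception response echoes the request's function code with the high bit set.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Verdict(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    CRITICAL = "critical"


# Hypothetical severity scores; the paper's actual scoring is not given here.
SEVERITY = {Verdict.NORMAL: 0.0, Verdict.ABNORMAL: 0.5, Verdict.CRITICAL: 2.0}


def classify(response: Optional[bytes], request_function: int) -> Verdict:
    """Classify a Modbus/TCP response; None means the device never answered."""
    if response is None:
        return Verdict.CRITICAL            # timeout: possible hang or crash
    if len(response) < 8:
        return Verdict.ABNORMAL            # too short to hold MBAP + function code
    function = response[7]                 # first PDU byte after the 7-byte MBAP
    if function == (request_function | 0x80):
        return Verdict.ABNORMAL            # Modbus exception response
    if function != request_function:
        return Verdict.ABNORMAL            # device answered a different function
    return Verdict.NORMAL


@dataclass
class Strategy:
    """Per-field mutation weights; higher weight means mutate that field more."""
    weights: dict[str, float] = field(
        default_factory=lambda: {"function": 1.0, "address": 1.0, "quantity": 1.0}
    )

    def update(self, mutated_field: str, verdict: Verdict) -> None:
        self.weights[mutated_field] += SEVERITY[verdict]


strategy = Strategy()
exception_reply = bytes.fromhex("0001 0000 0003 01 83 02")  # illegal data address
strategy.update("address", classify(exception_reply, 0x03))
strategy.update("quantity", classify(None, 0x03))           # device went silent
print(strategy.weights)
```

After these two observations the `quantity` field carries the highest weight, so a generator sampling fields in proportion to weight would concentrate on it. That is one simple way to read "adjusts mutation density and focus on high-risk fields".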

Key Findings

MALF achieves high test case pass rates (TCPR) across protocols, reporting 88 to 92 percent for Modbus/TCP, S7Comm and Ethernet/IP and outperforming baseline fuzzers in these industrial settings.
Mutations are highly diverse and protocol-compliant, with seed coverage exceeding 90 percent and Shannon entropy between 4.2 and 4.6 bits, indicating broad exploration of protocol field values and structures.
MALF produces more exception triggers than comparative tools, around 22 per 24-hour fuzzing cycle on the tested commercial PLCs, significantly higher than baselines under similar conditions.
Three zero-day vulnerabilities were uncovered in a real-world industrial range, one of them registered by CNVD. Notable findings include CNVD-2024-16009, a PLC connection denial of service, and other vulnerable behaviours in S7 and IC models.
High coverage and mutation diversity are driven by the domain-aware knowledge base and dynamic feedback loops, enabling MALF to explore rarely used functions and vendor-specific constraints that traditional fuzzers may miss.
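To show what a pass-rate metric measures, the sketch below scores a batch of Modbus/TCP frames against a local framing check. The paper's exact TCPR definition is not reproduced in this summary, so treating "pass" as "structurally valid framing" is an assumption. In a real campaign the verdict would more plausibly come from whether the device under test accepts the frame. Both function names are illustrative.

```python
import struct
from typing import Iterable


def is_structurally_valid(frame: bytes) -> bool:
    """Check MBAP framing: protocol id 0 and a length field matching the frame."""
    if len(frame) < 8:                       # 7-byte MBAP plus a function code
        return False
    _tid, protocol_id, length, _unit = struct.unpack_from(">HHHB", frame)
    return protocol_id == 0 and length == len(frame) - 6


def pass_rate(frames: Iterable[bytes]) -> float:
    """Fraction of frames that pass the validity check (0.0 for an empty batch)."""
    batch = list(frames)
    if not batch:
        return 0.0
    return sum(is_structurally_valid(f) for f in batch) / len(batch)


batch = [
    bytes.fromhex("0001 0000 0006 01 03 0000 000a"),  # well-formed read request
    bytes.fromhex("0002 0000 0006 01 03 ffff 007e"),  # hostile values, valid framing
    bytes.fromhex("0003 0000 0006 01 10 0000 0001"),  # valid framing, other function
    bytes.fromhex("0004 0000 0009 01 03 0000 000a"),  # length field lies: rejected
]
rate = pass_rate(batch)
print(f"pass rate: {rate:.0%}")
```

The second frame is the interesting one: its address and quantity are out of range, yet it passes the framing check. A high pass rate therefore does not mean benign inputs, only inputs that get past the parser far enough to exercise deeper logic.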

Limitations

While MALF delivers high-quality test cases and strong vulnerability discovery, its focus on precision comes at a cost in throughput. Ablation studies show that removing retrieval of domain knowledge (RAG) or fine-tuning (QLoRA) reduces protocol compliance and therefore degrades test case quality and vulnerability discovery performance. The system relies on a multi-agent coordination scheme that may incur coordination overhead and require robust fault handling. The experiments note slightly lower throughput and indicate potential improvements through parallelisation and hardware acceleration while preserving test quality.

Why It Matters

The results demonstrate that AI-driven multi-agent fuzzing can significantly improve vulnerability discovery in industrial control protocols. MALF offers a scalable, automated approach that increases protocol coverage and mutation diversity, enabling more thorough validation pipelines for ICS and strengthening the resilience of critical infrastructure. The framework underscores dual-use risks, as AI-enabled fuzzers could automate vulnerability discovery and inform exploit development as well as defence. At the same time it highlights practical mitigation opportunities: high-quality security testing can harden ICS protocols and exercise the validation pipelines that guard them. Real-world deployment in power plant ICS demonstrates the potential to identify previously unknown issues, including zero-day vulnerabilities, contributing to safer and more secure industrial operations and informing security standards and defensive measures for critical infrastructure.

Attribution: Original paper on arXiv