
RL Tricks Evade Sequence-Based Malware Detectors

Attacks
Published: Tue, Sep 16, 2025 • By Adrian Calder
Researchers show that reinforcement learning can craft realistic changes that fool sequence-based malware detectors. The attack generates constraint-aware perturbations to behavioural traces and maps those changes back to source code, keeping malware functional while evading detection. The finding warns that sequence models are brittle and need adversarial-aware, multi-layer defences.

Lede: New research demonstrates a practical way to fool sequence-based malware detectors that look at program behaviour. The authors use reinforcement learning to generate constrained perturbations to behaviour sequences, then translate those perturbations back into source-code changes so the malware still works while the detector misses it. This matters because it moves adversarial attacks from theoretical proofs into plausible operational techniques.

Nut graf: If you deploy detectors built on sequence models such as recurrent neural networks (RNNs), or you rely on behaviour logs for anomaly detection, take note. The paper shows an attack pipeline that respects practical constraints and avoids the unrealistic step of editing logs directly. That enlarges the threat model for security teams and for any critical system that trusts ML to catch bad behaviour.

Background: Sequence models inspect ordered events such as API calls, network actions or system calls to flag malicious activity. Prior adversarial work often manipulated those sequences in ways that were hard to implement in real software. The new approach uses a Deep Q-Network (DQN), a reinforcement learning (RL) algorithm, paired with heuristic backtracking to pick perturbations that are feasible to implement in practice.
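
To make the RL framing concrete, the sketch below poses the evasion task as a reinforcement-learning problem: the state is the current behaviour sequence, an action inserts one mined benign fragment at one position, and the reward is the drop in the detector's maliciousness score. The interface, names and thresholds are illustrative assumptions, not code from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class EvasionEnv:
    """Hypothetical RL environment: insert benign fragments to lower a detector score."""
    detector: Callable[[Sequence[str]], float]   # black-box maliciousness score in [0, 1]
    benign_fragments: List[List[str]]            # mined benign subsequences

    def step(self, seq: List[str], action: Tuple[int, int]):
        pos, frag_idx = action                    # action = (insertion position, fragment index)
        new_seq = seq[:pos] + self.benign_fragments[frag_idx] + seq[pos:]
        old_score, new_score = self.detector(seq), self.detector(new_seq)
        reward = old_score - new_score            # positive if detection confidence falls
        done = new_score < 0.5                    # assumed evasion threshold
        return new_seq, reward, done
```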

How it works: Rather than editing recorded traces, the researchers translate chosen perturbations back into source-code modifications or benign behaviours that the malware can execute. That avoids the implausible scenario of tampering with logs and demonstrates an end-to-end evasion route that preserves functionality.
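
To illustrate that translation step, here is a small hypothetical sketch that turns trace-level insertion decisions into a source-level patch plan rather than editing the recorded log. It assumes an event-to-source-line map of the kind debug information can provide; the names and example values are illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple

def plan_source_edits(
    insertions: List[Tuple[int, str]],          # (trace position, benign call to add)
    event_to_line: Dict[int, Tuple[str, int]],  # trace position -> (source file, line)
) -> List[str]:
    """Turn trace-level perturbations into a patch plan instead of editing logs."""
    plan = []
    for pos, benign_call in insertions:
        src_file, line_no = event_to_line[pos]
        # The real pipeline rewrites LLVM IR; this only records where a call to
        # a harmless wrapper (e.g. getuid) would be inserted in the source.
        plan.append(f"{src_file}:{line_no}: insert call to {benign_call}")
    return plan

# Hypothetical example: add a benign getuid call before trace event 12.
print(plan_source_edits([(12, "getuid")], {12: ("dropper.c", 87)}))
```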

Impact and risk: The upshot is not that all ML defences are useless, but that sequence-based detectors can be brittle when attackers optimise against them with realistic constraints. An attacker who can modify code or the build pipeline gains a credible path to bypass detectors that are not adversarial-aware.

Mitigations and what to do next:

  • Harden training with adversarial examples and constraint-aware simulations (a minimal sketch follows this list).
  • Combine sequence models with code-integrity checks, runtime sandboxing and diverse feature sets.
  • Monitor for unexplained behavioural changes and protect build and deployment pipelines.
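
On the first point, the sketch below shows one way adversarial-aware hardening might look for a Keras LSTM detector over integer-encoded behaviour sequences: training data is augmented with benign-fragment insertions (a crude stand-in for the paper's constraint-aware perturbations) while labels are kept, so the model learns that padded-out malicious traces are still malicious. The vocabulary size, sequence length and augmentation scheme are assumptions for illustration.

```python
import random
import numpy as np
from tensorflow import keras

def insert_benign(seq, fragments, n_insertions=2):
    """Splice a few mined benign fragments into a behaviour sequence."""
    seq = list(seq)
    for _ in range(n_insertions):
        pos = random.randrange(len(seq) + 1)
        seq[pos:pos] = random.choice(fragments)
    return seq

def augment(X, y, fragments, maxlen=200):
    """Pair every training sequence with a perturbed copy that keeps its label."""
    perturbed = [insert_benign(x, fragments) for x in X]
    X_all = keras.preprocessing.sequence.pad_sequences(list(X) + perturbed, maxlen=maxlen)
    y_all = np.concatenate([y, y])       # labels unchanged: evasion attempts stay malicious
    return X_all, y_all

# A small LSTM detector over integer-encoded events (vocabulary size assumed).
model = keras.Sequential([
    keras.layers.Embedding(input_dim=400, output_dim=32),
    keras.layers.LSTM(64),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# X_train, y_train and benign_fragments come from your own pipeline:
# X_aug, y_aug = augment(X_train, y_train, benign_fragments)
# model.fit(X_aug, y_aug, epochs=5, batch_size=64)
```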

Limitations and caveats: The work is a proof of concept. Practical success depends on attacker access to modify binaries or the build process and on the target environment. The paper does not claim easy universal bypasses, but it lowers the bar compared with earlier, more abstract attacks.

Kicker: For defenders, the safe bet is simple. Treat sequence detectors as one layer in a defence-in-depth posture, assume adversaries will optimise against models, and prioritise controls that an ML tweak cannot silently defeat.

Additional analysis of the original ArXiv paper

📋 Original Paper Title and Abstract

A Practical Adversarial Attack against Sequence-based Deep Learning Malware Classifiers

Authors: Kai Tan, Dongyang Zhan, Lin Ye, Hongli Zhang, and Binxing Fang
Sequence-based deep learning models (e.g., RNNs) can detect malware by analyzing its behavioral sequences. Meanwhile, these models are susceptible to adversarial attacks. Attackers can create adversarial samples that alter the sequence characteristics of behavior sequences to deceive malware classifiers. The existing methods for generating adversarial samples typically involve deleting or replacing crucial behaviors in the original data sequences, or inserting benign behaviors that may violate the behavior constraints. However, these methods that directly manipulate sequences make adversarial samples difficult to implement or apply in practice. In this paper, we propose an adversarial attack approach based on Deep Q-Network and a heuristic backtracking search strategy, which can generate perturbation sequences that satisfy practical conditions for successful attacks. Subsequently, we utilize a novel transformation approach that maps modifications back to the source code, thereby avoiding the need to directly modify the behavior log sequences. We conduct an evaluation of our approach, and the results confirm its effectiveness in generating adversarial samples from real-world malware behavior sequences, which have a high success rate in evading anomaly detection models. Furthermore, our approach is practical and can generate adversarial samples while maintaining the functionality of the modified software.

🔍 ShortSpan Analysis of the Paper

Problem

This paper studies practical adversarial attacks against sequence-based malware detectors that use recurrent neural networks to analyse behavioural sequences. Although these models can detect malware by modelling sequence patterns, they are vulnerable to perturbations that change sequence characteristics and enable evasion. Traditional methods often delete or replace key behaviours or insert benign events, which can be hard to implement in real systems and may violate operational constraints. The authors propose a constraint-aware attack that uses deep reinforcement learning to craft perturbations which preserve malware functionality while evading anomaly detectors, and they map these perturbations back to source code to avoid directly altering behaviour logs. The work underlines realistic evasion paths in the wild and highlights the need for stronger defences, given the brittleness of behaviour-log-based detectors to adversarial perturbations.

Approach

The approach has three coordinated steps:

  • Build a perturbation action set by mining representative benign subsequences with frequent pattern mining and representing them as directed graphs that capture sequential relationships. This defines where, and which, benign fragments can be inserted without destroying semantics.
  • Modify the sequence via a backtracking search guided by a Deep Q-Network surrogate model trained against the black-box target detector: gradients identify vulnerable insertion positions, the reinforcement learning agent selects insertion actions, and a heuristic backtracking mechanism enforces behavioural dependencies and feasibility.
  • Transform the attack back to source code via LLVM IR: the program is compiled with debug information to map system-call events to source lines, the IR bitcode is edited by inserting a custom external function before the identified statements, and the modified IR is recompiled into an executable that yields the perturbed system-call sequence.

The method uses gradient-based position selection, while real-world feasibility is ensured by the dependencies encoded in the directed graphs and by control over the execution frequency of inserted calls. The framework is evaluated on two public datasets containing real-world malware samples plus a synthesised dataset, using Keras for modelling on the hardware platform described in the evaluation section.
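
The search in the second step can be sketched roughly as follows, under several simplifying assumptions: per-position gradient magnitudes from the surrogate shortlist vulnerable insertion points, a q_value callable stands in for the trained Deep Q-Network, and a feasible callable stands in for the paper's dependency graphs. This is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def backtracking_attack(seq, fragments, grad_norms, q_value, feasible, detector,
                        budget=8, top_k=5, beam=3):
    """Illustrative DQN-guided insertion with heuristic backtracking.

    grad_norms : per-position gradient magnitudes from the surrogate model,
                 used to shortlist vulnerable insertion points (ranked once on
                 the original sequence, an approximation made for brevity).
    q_value    : q_value(seq, pos, frag_idx) -> float, surrogate action value.
    feasible   : constraint check standing in for the paper's dependency graphs.
    detector   : black-box target returning a maliciousness score in [0, 1].
    """
    positions = list(np.argsort(-np.asarray(grad_norms))[:top_k])
    stack = [(list(seq), budget)]
    while stack:
        cur, left = stack.pop()
        if detector(cur) < 0.5:          # assumed evasion threshold
            return cur
        if left == 0:
            continue
        # Rank candidate (position, fragment) actions with the surrogate DQN.
        actions = sorted(
            ((q_value(cur, p, f), p, f)
             for p in positions if p <= len(cur)
             for f in range(len(fragments))),
            reverse=True,
        )
        # Expand only the best few; infeasible candidates are skipped, and
        # unexpanded branches stay on the stack, giving the backtracking step.
        for _, p, f in actions[:beam]:
            cand = cur[:p] + fragments[f] + cur[p:]
            if feasible(cand):
                stack.append((cand, left - 1))
    return None                          # no evasive variant found within budget
```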

Key Findings

  • The proposed constraint-aware perturbation approach effectively generates adversarial sequences that evade sequence-based detectors while preserving malware functionality, demonstrated by competitive or superior success rates on two datasets compared with baselines.
  • Across the AndroCT and ADFA-LD datasets, the approach achieves high attack success with realistic perturbations, reporting average success rates (SR) of 67.5% and 59.3% respectively across target models, outperforming baselines that delete or replace behaviours.
  • On the AndroCT dataset the method achieves an SR of 64.4% against a CNN-based model while the GA baseline attains 58.1%, and on the ADFA-LD dataset the SR is 59.1% against an Autoencoder-based model with LAM at 47.6%, illustrating practical effectiveness across diverse detectors.
  • Perturbation rates (PR) are modest and comparable to or better than baselines in many cases, with reported values such as 18.5% on ADFA-LD (GA 20.4%) and 21.1% on AndroCT (LAM 20.3%), indicating the perturbations maintain sequence realism and usability.
  • Adversarial samples exhibit transferability across classifiers and datasets, and demonstrate robustness against several defences. Adversarial Learning, Sequence Squeezing and Defense Sequence GAN reduce success rates, but the method remains effective, achieving 60.6% against Adversarial Learning, 61.2% against Sequence Squeezing and 62.1% against Defense Sequence GAN.
  • Qualitative analysis shows inserted benign fragments maintain semantic validity and sequence order, exemplified by a malicious sequence where benign calls such as getuid and lsetxattr are inserted in a coherent manner without breaking functionality, illustrating practical realism for evasion in real systems.
  • Implementation demonstrations include modifying 80 usable proofs of concept from Exploit-DB to generate executable samples that bypass classifiers, including explicit mappings from inserted system calls to source code locations via LLVM IR and debug information.

Limitations

The approach relies on mining representative benign patterns and maintaining dependency constraints that may need adaptation as malware tactics evolve. While robust to several defences, performance can vary across detectors and tasks, and continual updates or broader baseline coverage may be required. The study also notes the potential for adversaries to adapt, and suggests defence implications including adversarial-aware training and multi-level protections. The authors acknowledge the need for further work to generalise across more detection models and to integrate adaptive defence strategies.

Why It Matters

The work demonstrates realistic, constraint-aware evasion against sequence-based malware detectors, using reinforcement learning to craft perturbations that preserve functionality while confusing classifiers. It highlights a practical evasion path via source-code transformation, underscoring how defences that rely on behaviour logs can be brittle to sophisticated attacks. The findings support the need for robust adversarial training, constraint-aware generation checks, and multi-level defence combining code-integrity checks, runtime monitoring, and ensemble or diverse-feature detectors to mitigate such evasion strategies. The societal security implications stress strengthening machine-learning-based security pipelines in critical infrastructure against systematic adversarial exploitation.
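
As a minimal illustration of that multi-level posture, the sketch below combines a sequence-model score with a second, diverse detector and a code-integrity check, so that evading any single layer is not enough to pass. Function names and thresholds are assumptions for illustration.

```python
import hashlib

def integrity_ok(binary_path, expected_sha256):
    """Verify the deployed binary still matches its known-good hash."""
    with open(binary_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected_sha256

def verdict(seq_score, static_score, binary_path, expected_sha256,
            seq_threshold=0.5, static_threshold=0.5):
    """Flag if the sequence model, a diverse second detector, or a failed
    code-integrity check raises an alarm; evading one layer is not enough."""
    return (
        seq_score >= seq_threshold
        or static_score >= static_threshold
        or not integrity_ok(binary_path, expected_sha256)
    )
```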

