15 of 21 RL Vulnerability Studies Still Just Fuzzing - Only One Localizes Bugs

Fifteen of 21 reinforcement learning (RL) papers on C/C++ vulnerability analysis still focus on fuzzing or guided exploration. Only three address direct vulnerability detection, and exactly one tackles statement-level localization. That's the cold reality from a new systematic review covering studies published between 2015 and 2026.

The Field's Narrow Shoulders

Following PRISMA 2020 guidelines, the reviewers combed major databases and identified 21 primary studies. The breakdown is stark: 15 studies on fuzzing and program exploration, 3 on direct vulnerability detection, and just 1 on pinpointing the exact statement where a vulnerability lives. Most of these RL agents treat code as a black box: they generate test inputs or schedule mutations, but they rarely look at the source structure itself.

The review specifically targets C/C++ because manual memory management and code complexity make static analysis brittle. Traditional tools drown in false positives; the promise of RL is that an agent could learn to navigate the code's semantics. So far, the review's numbers suggest, the field hasn't delivered on that promise.

The Missing Graph

Here's the part that should make any static-analysis engineer sit up: Control Flow Graphs (CFGs) and Abstract Syntax Trees (ASTs) are almost never used as agent states. The reviewers note that statically extracted structural representations are “rarely used” in these RL formulations. That means agents are learning vulnerability patterns without the topological information that human analysts and classic tools rely on.

Agents that do operate on source code, as in the three detection papers and the one localization paper, tend to flatten the source into token sequences or embeddings. They skip the graph structure that encodes reachability, loop bounds, and data flow. The review explicitly identifies this as a key research gap: “the absence of RL agents that use source-code CFGs as states to detect and localize vulnerable nodes.”
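To make the contrast concrete, here is a rough sketch of the two views of the same code. The C snippet, the block names, and the hand-written CFG are all hypothetical and chosen only for illustration; nothing here comes from the review or from a real extraction tool.

```python
from collections import deque

# Hypothetical C snippet, used only for illustration.
C_SOURCE = """
char *p = malloc(16);
if (n > 16) { free(p); }
p[0] = 'x';
"""

# Flattened view: a plain token sequence. Order is kept, but nothing says
# which statements can execute after which.
tokens = C_SOURCE.split()

# Structural view: a hand-written CFG (assumption: one basic block per
# statement, edges written by hand rather than produced by a compiler).
cfg = {
    "B0_alloc": ["B1_check"],
    "B1_check": ["B2_free", "B3_write"],
    "B2_free": ["B3_write"],
    "B3_write": [],
}


def reachable(graph, start):
    """Return every block reachable from `start` using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# The graph answers a question the token list cannot: can the write
# execute after the free?
write_follows_free = "B3_write" in reachable(cfg, "B2_free")
```

In the token view the `free` and the write are just nearby symbols; in the graph view the edge from `B2_free` to `B3_write` states directly that the write can run after the free.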

What a CFG-Aware Agent Could Unlock

If you've ever debugged a use-after-free or a buffer overflow in C, you know the bug lives in the control flow: a pointer is used on a path where it has already been freed, or a bounds check is missing on the way to a write. A CFG-aware RL agent could treat each basic block as a state, learn which paths lead to unsafe operations, and then backtrack to the vulnerable node. According to the review, that's exactly what no one has built yet.
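As a minimal sketch of that idea, and not a method from the review, the toy below runs tabular Q-learning over a four-block CFG. Everything in it is an assumption made for illustration: the block names, the `FREES` and `USES` sets, the reward of 1.0 for reaching a use after a free, and the hyperparameters. The state is the current basic block plus a flag recording whether the pointer has been freed.

```python
import random
from collections import defaultdict

# Hypothetical CFG: block name -> successor blocks. An empty list means exit.
CFG = {
    "B0_alloc": ["B1_check"],
    "B1_check": ["B2_free", "B3_write"],
    "B2_free": ["B3_write"],
    "B3_write": [],
}
FREES = {"B2_free"}   # blocks that free the pointer (assumed labels)
USES = {"B3_write"}   # blocks that dereference the pointer (assumed labels)


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; Q is keyed by (block, freed_flag, action_index)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        block, freed = "B0_alloc", False
        while CFG[block]:
            succs = CFG[block]
            if rng.random() < epsilon:
                a = rng.randrange(len(succs))          # explore
            else:
                a = max(range(len(succs)), key=lambda i: q[(block, freed, i)])
            nxt = succs[a]
            nxt_freed = freed or nxt in FREES
            done = not CFG[nxt]
            # Reward only when the episode ends on a use after a free.
            reward = 1.0 if (done and nxt in USES and nxt_freed) else 0.0
            future = 0.0 if done else max(
                q[(nxt, nxt_freed, i)] for i in range(len(CFG[nxt]))
            )
            q[(block, freed, a)] += alpha * (reward + gamma * future - q[(block, freed, a)])
            block, freed = nxt, nxt_freed
    return q


def greedy_path(q):
    """Follow the highest-valued action from the entry block to an exit."""
    block, freed, path = "B0_alloc", False, ["B0_alloc"]
    while CFG[block]:
        succs = CFG[block]
        a = max(range(len(succs)), key=lambda i: q[(block, freed, i)])
        block = succs[a]
        freed = freed or block in FREES
        path.append(block)
    return path


def localize(path):
    """Walk backward from the unsafe use to the block that freed the pointer."""
    for block in reversed(path):
        if block in FREES:
            return block
    return None


q_table = train()
unsafe_path = greedy_path(q_table)
vulnerable_block = localize(unsafe_path)
```

The freed flag is part of the state because the block name alone does not tell the agent whether a later dereference is unsafe. A real system would need to derive that kind of information from the code rather than from hand-labeled sets, which is the hard part this sketch leaves out.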

The review doesn't propose the solution; it just maps the terrain and points at the empty space. For anyone building next-gen static analysis tools, that empty space is the opportunity. Time to teach an agent to read a CFG like we do.

Source: Reinforcement Learning for Software Vulnerability Analysis: A Systematic Review with Emphasis on C/C++ Source Code and Static Analysis
Domain: arxiv.org
