Source linked

SEVRA-BENCH Exposes How LLM Code Reviewers Fall for Social Engineering

A new benchmark of 1,062 adversarial pull requests shows automated LLM reviewers approve vulnerabilities disguised by social engineering, with closed-source models performing significantly better than open-source...

sevra benchsocial engineeringcode reviewllm securitycvecwe

SEVRA-BENCH drops 1,062 pull requests onto 8 LLM-based code reviewers, each PR quietly reintroducing a real CVE into a project while wrapping the change in social engineering. The results are not comforting.

The Benchmark: 1,062 Malicious PRs from Real CVEs

Each adversarial pull request in SEVRA-BENCH starts with a commit that once fixed a real vulnerability listed in the Common Vulnerabilities and Exposures (CVE) database. The paper's authors automatically invert that fix, restoring the original vulnerable code, and submit it as a new PR to an automated reviewer. The dataset draws from vulnerability fixes across the top 10 entries of the 2025 CWE Top 25, covering common weakness patterns.

15 Social Engineering Framings

A bare revert is too obvious. So the PR text is dressed in one of 15 social-engineering framings: varied claims about what the code does, fabricated supporting evidence, fake urgency, signals of prior approval from other reviewers, and appeals to authority. The reviewer must decide based on both code diff and the narrative. The benchmark tests whether an LLM can spot a malicious change when the author lies about it convincingly.

Closed vs Open Source: A Security Gap

Evaluating 8 current LLMs as code review agents, the authors found a sharp gap in security capabilities between closed- and open-source models. While the exact success rates per model are not published in the abstract, the gap itself is stark: open-source models approved a higher proportion of these adversarial PRs than their closed counterparts. No reviewer caught all 1,062 attacks.

What This Means for Code Review Automation

Automated PR review is already shipping in developer tools. SEVRA-BENCH demonstrates that current LLMs can be reliably manipulated by an attacker who controls both the code and the narrative. Until reviewers are trained to reject social engineering signals and verify code changes independently of PR descriptions, relying on LLMs for merge decisions is a vulnerability in itself.


Source: SEVRA-BENCH: Social Engineering of Vulnerabilities in Review Agents
Domain: arxiv.org

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.