Isolating File Localization Lifts LLM Repair Rates Nearly 8 Points on SWE-bench Verified

Giving a repository-level LLM repair pipeline the gold-standard set of files to edit lifts the resolved rate on SWE-bench Verified from 44.7% to 52.4%, a gain of 7.7 percentage points, and cuts more than 150 seconds from the average repair time. Predicted localization recovers a little over half of that accuracy gain. Those are the headline results from Loc2Repair, a modular evaluation framework described in a new preprint that separates localization from patch synthesis so each can be measured as its own failure mode.

Loc2Repair Decouples Localization From Repair

Most end-to-end repair benchmarks mask where the pipeline fails: poor file targeting, bad patches, or failed debugging. Loc2Repair addresses that by splitting the process. Under a shared runtime, artifact schema, and evaluation harness, researchers can swap in different localization models and repair backbones, then measure each component’s contribution. On SWE-bench Verified, three repair backbones were tested across four conditions: no explicit localization, localization from each of two different predictors, and gold-standard (ground-truth) file sets.
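The design described above amounts to a grid of backbones and localization conditions run under one harness. A minimal sketch of that grid follows. The article does not name the three backbones or the two predictors, so every label here is a hypothetical placeholder, and this is not the framework’s actual interface.

```python
from itertools import product

# Hypothetical labels: the article names neither the backbones nor the predictors.
BACKBONES = ["backbone_a", "backbone_b", "backbone_c"]
CONDITIONS = ["no_localization", "predictor_1", "predictor_2", "gold_files"]


def build_grid(backbones, conditions):
    """Return every (backbone, condition) pair to run under the shared harness."""
    return list(product(backbones, conditions))


grid = build_grid(BACKBONES, CONDITIONS)

for backbone, condition in grid:
    print(f"{backbone:<12} {condition}")
```

Three backbones crossed with four conditions gives twelve runs. Because only the localization input changes within a backbone, differences in resolved rate can be read as the contribution of localization.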

Pooled performance tells a clean story. Baseline repair without explicit localization resolves 44.7% of issues. Adding predicted localization pushes that to 48.9% and 49.1%, gains of 4.2 and 4.4 percentage points. Gold-standard localization hits 52.4%, a 7.7-point lift. Localization isn’t just a bonus; it’s a lever that works across backbones.
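The percentage-point gains follow directly from the pooled rates quoted above. A short sketch of the arithmetic, where the keys "predictor_1", "predictor_2", and "gold" are placeholder labels rather than names from the preprint:

```python
BASELINE = 44.7  # pooled resolved rate (%) without explicit localization

# Pooled resolved rates (%) quoted in the article; keys are placeholder labels.
RESOLVED = {"predictor_1": 48.9, "predictor_2": 49.1, "gold": 52.4}


def gain_in_points(rate, baseline=BASELINE):
    """Percentage-point gain over the no-localization baseline."""
    return round(rate - baseline, 1)


gains = {name: gain_in_points(rate) for name, rate in RESOLVED.items()}

# Share of the gold-localization gain that each predictor recovers.
share_of_gold = {
    name: gains[name] / gains["gold"] for name in ("predictor_1", "predictor_2")
}

for name, points in gains.items():
    print(f"{name}: +{points} points")
```

Both predictors recover somewhat more than half of the gain that gold file sets deliver, which is the gap a better localizer could still close.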

Localization Cuts Time, But Not Uniformly

Latency also improves. Pooled mean elapsed time drops by 100.94 seconds with one predictor and 52.25 seconds with the other, while gold guidance cuts 154.45 seconds. At scale that matters: across thousands of repair attempts, saving anywhere from just under a minute to about two and a half minutes per task adds up fast.
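To put the per-task savings in perspective, the sketch below converts them into total hours for a batch of repair attempts. The batch size of 1,000 is an assumption chosen for illustration, and the keys are placeholder labels; the article does not say which predictor produced which saving.

```python
# Pooled mean time saved per task (seconds), as quoted in the article.
SAVINGS_SECONDS = {"predictor_x": 100.94, "predictor_y": 52.25, "gold": 154.45}

N_TASKS = 1000  # hypothetical batch size, not a figure from the preprint


def hours_saved(seconds_per_task, n_tasks):
    """Total time saved across n_tasks, expressed in hours."""
    return seconds_per_task * n_tasks / 3600


totals = {name: hours_saved(s, N_TASKS) for name, s in SAVINGS_SECONDS.items()}

for name, hours in totals.items():
    print(f"{name}: about {hours:.1f} hours saved over {N_TASKS} tasks")
```

This treats the pooled mean as if it applied uniformly to every task, which is a simplification: actual savings would vary by task and by backbone.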

But the authors note that the effect on token usage varies across models, so not every backbone benefits equally. And even with gold localization, failures persist, which means the remaining headroom lies elsewhere: in patch synthesis and iterative debugging.
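One way to act on that observation is to compare, task by task, the run without localization against the run with gold file sets. The sketch below is an illustrative labeling scheme, not the preprint’s method; the function, the label names, and the sample outcomes are all hypothetical.

```python
from collections import Counter


def attribute(resolved_without_loc: bool, resolved_with_gold: bool) -> str:
    """Label one task by comparing two runs of the same backbone (hypothetical scheme)."""
    if resolved_without_loc:
        return "resolved_at_baseline"  # no localization help was needed
    if resolved_with_gold:
        return "localization_bound"    # fixed only once the right files were supplied
    return "repair_bound"              # still fails with the right files in hand


# Made-up outcomes for illustration only; not data from the preprint.
# Each tuple is (resolved without localization, resolved with gold files).
runs = {
    "task-1": (False, True),
    "task-2": (False, False),
    "task-3": (True, True),
}

labels = {task: attribute(*outcome) for task, outcome in runs.items()}
summary = Counter(labels.values())
print(summary)
```

Tasks labeled "repair_bound" are the ones where better file targeting cannot help, so they point at patch synthesis and debugging as the place to invest.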

What This Enables

Loc2Repair gives the field a standard way to blame the right component. Instead of vague “LLM repair works/doesn’t work” claims, teams can now pinpoint whether their bottleneck is finding the right file or fixing the line. For anyone building a repo-level repair system, this framework is the difference between guessing and measuring.

The next step is obvious: pour effort into localization models that approach gold-level accuracy, then attack the leftover gap in patch generation.

Source: Loc2Repair: A Framework for Evaluating the Impact of File-Level Issue Localization in Repo-Level LLM Repair
Domain: arxiv.org
