
Bug Localization in a Single Token: New LLM Cuts Inference Time by Orders of Magnitude

By reducing bug detection to a single generated token per file, MLC matches agentic accuracy on Defects4J and PypiBugs while cutting inference time by orders of magnitude.

Tags: multi-task LLM, bug localization, MLC, Defects4J, PypiBugs, code generation

Line-level bug localization just got a radical efficiency boost: a new model called MLC detects bugs in full-file context using a single generated token, cutting inference time by orders of magnitude compared to agentic approaches that need minutes of reasoning and thousands of tokens per file.

Existing bug localization techniques fall into two camps. Agentic methods are accurate but slow, requiring several minutes of reasoning per file via iterative prompting or tool use. Lightweight classifiers are fast but coarse: they operate at function level or struggle with limited context windows. MLC breaks that tradeoff.

Token Alignment Solves a Fundamental Tokenizer Mismatch

Previous line-level bug classifiers suffered from tokenization mismatches: the boundaries of the tokens produced by the model's tokenizer do not coincide with the line boundaries of the source code, so a single token can straddle two lines and precise per-line classification becomes brittle. MLC introduces a token alignment algorithm that maps model tokens back to source lines cleanly, enabling per-line predictions without requiring custom tokenization or sacrificing context. The authors show this alignment is critical for achieving line-level granularity with standard LLM tokenizers.
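To make the mismatch concrete, here is a minimal sketch of one way to map tokens back to source lines using character offsets. It is not the paper's algorithm, which the article does not spell out. The `toy_tokenize` function is a hypothetical stand-in for a real subword tokenizer, and every function name here is an assumption made for illustration.

```python
import bisect
import re


def line_starts(source: str) -> list[int]:
    """Character offset at which each source line begins."""
    starts = [0]
    for i, ch in enumerate(source):
        # A newline opens a new line unless it is the final character.
        if ch == "\n" and i + 1 < len(source):
            starts.append(i + 1)
    return starts


def toy_tokenize(source: str) -> list[tuple[int, int]]:
    """Hypothetical tokenizer: returns (start, end) character spans.

    Whitespace runs, newlines included, become a single token, so a
    token can straddle a line break, as real subword tokens can.
    """
    return [m.span() for m in re.finditer(r"\s+|\w+|[^\w\s]", source)]


def align_tokens_to_lines(source: str, spans: list[tuple[int, int]]) -> list[list[int]]:
    """For each source line, list the indices of the tokens that touch it."""
    starts = line_starts(source)
    line_to_tokens: list[list[int]] = [[] for _ in starts]
    for tok_idx, (start, end) in enumerate(spans):
        # Binary search for the lines holding the token's first and last character.
        first_line = bisect.bisect_right(starts, start) - 1
        last_line = bisect.bisect_right(starts, end - 1) - 1
        for line in range(first_line, last_line + 1):
            line_to_tokens[line].append(tok_idx)
    return line_to_tokens


if __name__ == "__main__":
    src = "x = 1\n\n    y = x\n"
    for line_no, toks in enumerate(align_tokens_to_lines(src, toy_tokenize(src))):
        print(line_no, toks)
```

In this example the whitespace token covering `"\n\n    "` touches all three lines, which is exactly the case a naive "one token belongs to one line" assumption gets wrong. With a real tokenizer, the spans would come from its character-offset output rather than from a regex.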

Single-Token Inference Makes Line-Level Bug Detection Practical

MLC is a lightweight multi-task LLM trained with a recipe optimized for multi-line prediction. The key architectural trick is a set of auxiliary decoding heads that let the model output a bug classification for every line in the file in a single forward pass, which amounts to one generated token per file. On the Defects4J and PypiBugs benchmarks, MLC achieves performance comparable to agentic systems that generate hundreds or thousands of tokens per file. The latency drop is not incremental; it is orders of magnitude. Agentic techniques require minutes; MLC finishes in seconds or less.
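The idea of scoring every line from one forward pass can be illustrated with a toy per-line classification head. The article does not describe the actual head architecture, so everything below is an assumption: the mean pooling over each line's tokens, the single logistic head, and the names `mean_pool` and `per_line_bug_scores` are all hypothetical. The hidden states stand in for what one pass of the model over the whole file would produce.

```python
import math


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def per_line_bug_scores(
    hidden_states: list[list[float]],  # one vector per token, from a single forward pass
    line_to_tokens: list[list[int]],   # token indices belonging to each source line
    weights: list[float],              # hypothetical head weights
    bias: float,                       # hypothetical head bias
) -> list[float]:
    """Return one bug probability per line using a logistic head (assumed design)."""
    scores = []
    for token_ids in line_to_tokens:
        pooled = mean_pool([hidden_states[t] for t in token_ids])
        logit = sum(w * x for w, x in zip(weights, pooled)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


if __name__ == "__main__":
    # Three tokens with 2-dimensional hidden states, grouped into two lines.
    states = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    lines = [[0], [1, 2]]
    print(per_line_bug_scores(states, lines, weights=[2.0, -2.0], bias=0.0))
```

The point of the sketch is the cost structure: once the hidden states exist, scoring all lines is a handful of cheap dot products, with no further autoregressive decoding.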

Generalization Beyond Training Benchmarks

The authors don't just claim speed on standard Java and Python bug datasets. They introduce a small out-of-domain evaluation dataset in Python and show MLC generalizes without retraining. That suggests the token alignment and multi-task training approach captures transferable patterns for bug localization, not just dataset-specific artifacts. The model processes full-file context end-to-end, so it sees the entire function, class, and imports at once, with no sliding-window tricks.

If the team follows through on open-sourcing the code, models, and datasets as promised, expect line-level bug localization to become a standard CI/CD gate rather than a luxury for teams with deep inference budgets. MLC shows that finding bugs accurately does not require thousands of tokens per file; one is enough.


Source: Multi-task LLMs for Bug Classification: Efficient Inference with Auxiliary Decoding Heads
Domain: arxiv.org
