
LLVM -O3 Pipeline Pareto-Dominated for 29 of 30 Kernels

A per-pass empirical study of 113 -O3 prefixes across 84,750 measurements finds the pipeline non-monotone, back-loaded, and rarely optimal for both size and speedup.

llvm, optimization pipeline, compiler engineering, phase ordering, polybench, energy efficiency

84,750 measurements across 113 cumulative prefixes of LLVM’s -O3 pipeline on 30 PolyBench/C kernels reveal that the final -O3 configuration is Pareto-dominated on (size, speedup) for 29 out of 30 kernels. That’s not a bug. It’s a design constraint nobody quantified this precisely before.
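To make "Pareto-dominated on (size, speedup)" concrete, here is a minimal sketch of the check for a single kernel. The function names and the sample numbers are hypothetical and are not taken from the study; the sketch only assumes that smaller binaries and larger speedups are better.

```python
from typing import List, Tuple

# One (binary_size_bytes, speedup_over_baseline) pair per pipeline prefix.
# Hypothetical representation; the study's data format is not given here.
Point = Tuple[float, float]

def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one.
    Smaller size is better; larger speedup is better."""
    size_a, speed_a = a
    size_b, speed_b = b
    no_worse = size_a <= size_b and speed_a >= speed_b
    strictly_better = size_a < size_b or speed_a > speed_b
    return no_worse and strictly_better

def final_is_dominated(points: List[Point]) -> bool:
    """True if some earlier prefix dominates the last (full -O3) point."""
    final = points[-1]
    return any(dominates(p, final) for p in points[:-1])

# Made-up numbers for illustration only: the third prefix is both
# smaller and faster than the final configuration.
prefixes = [(12000, 1.00), (11800, 1.40), (11500, 2.10), (11900, 2.05)]
print(final_is_dominated(prefixes))
```

A kernel counts toward the "29 of 30" figure when a check of this kind succeeds, that is, when at least one earlier prefix is at least as good on both axes and strictly better on one.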

Non-Monotone and Back-Loaded: The -O3 Pipeline’s Real Shape

The study decomposes LLVM’s -O3 pipeline into per-pass prefixes, then measures execution time, compile time, binary size, hardware counters, and RAPL energy under rigorous noise mitigation. The pipeline is non-monotone: 6.6–9.7% of transitions between successive prefixes regress at least one metric. It is also strongly back-loaded: the median non-regressing kernel needs 84.8% of the total pipeline just to reach 80% of its speedup. Most gains come from a small Pareto-dominant core of passes; the rest add marginal or negative value.
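Both properties can be computed from a per-prefix runtime series. The sketch below is one possible reading, not the study's method: it looks at runtime only (the study counts a regression in any metric), and it interprets "80% of its speedup" as 80% of the total runtime reduction. The function names and the sample series are hypothetical.

```python
from typing import List

def regression_fraction(runtimes: List[float]) -> float:
    """Share of successive-prefix transitions where runtime increases."""
    transitions = list(zip(runtimes, runtimes[1:]))
    worse = sum(1 for before, after in transitions if after > before)
    return worse / len(transitions)

def pipeline_fraction_for(runtimes: List[float], target: float = 0.8) -> float:
    """Fraction of passes needed before the runtime reduction first reaches
    `target` of the final reduction. runtimes[0] is the unoptimized baseline
    and runtimes[i] is the runtime after the first i passes."""
    total_gain = runtimes[0] - runtimes[-1]
    if total_gain <= 0:
        raise ValueError("kernel does not speed up; fraction is undefined")
    n_passes = len(runtimes) - 1
    for i, t in enumerate(runtimes):
        if runtimes[0] - t >= target * total_gain:
            return i / n_passes
    return 1.0  # not reachable when total_gain > 0; kept as a safe default

# Made-up runtimes in seconds: little happens until the last two passes.
series = [10.0, 10.0, 9.8, 9.9, 9.5, 5.5, 5.0]
print(regression_fraction(series))    # one of six transitions regresses
print(pipeline_fraction_for(series))  # five of six passes are needed
```

A back-loaded pipeline shows up as a value close to 1.0 from `pipeline_fraction_for`, and non-monotonicity shows up as a nonzero `regression_fraction`.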

Why IR Instruction Count Fails as a Proxy

A staple of compiler textbooks says fewer IR instructions mean faster code. The study shows that this does not hold for -O3 on compute-bound affine kernels: IR instruction count is an unreliable predictor of runtime. Worse, the search-free idealized-additive upper bound on losses due to phase interference is 46.35%, meaning nearly half the potential speedup can be lost to bad pass ordering alone, even without measurement noise. Runtime-targeted passes also turn out to be de facto energy-targeted, delivering 30–60% energy savings as measured by RAPL counters.
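One simple way to probe whether IR instruction count tracks runtime is to ask how often the two move in the same direction from one prefix to the next. The sketch below is an illustrative check under that assumption, not the analysis used in the study; the function names and numbers are hypothetical.

```python
from typing import List

def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def direction_agreement(ir_counts: List[int], runtimes: List[float]) -> float:
    """Among transitions where the IR instruction count changes, the share
    where runtime moves in the same direction (both down or both up)."""
    agree = 0
    total = 0
    for i in range(1, len(ir_counts)):
        d_ir = sign(ir_counts[i] - ir_counts[i - 1])
        d_rt = sign(runtimes[i] - runtimes[i - 1])
        if d_ir == 0:
            continue  # IR count unchanged, so it predicts nothing here
        total += 1
        if d_ir == d_rt:
            agree += 1
    return agree / total if total else float("nan")

# Made-up data: the second pass grows the IR (for example by unrolling)
# yet runtime drops, and the last pass shrinks the IR yet runtime rises.
ir = [1000, 800, 900, 900, 700]
rt = [10.0, 9.0, 6.0, 6.2, 6.5]
print(direction_agreement(ir, rt))
```

A value near 1.0 would support using instruction count as a proxy; values near or below 0.5 suggest it carries little predictive signal for the kernel in question.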

What This Means for Compiler Engineers and Autotuners

These numbers give us a data-driven license to prune passes, recalibrate cost models, and design autotuners that don’t worship the final -O3 config as a global optimum. Phase ordering is not a solved problem, and LLVM’s terse optimization-level flags hide immense variation. The next step is to bake these empirical findings into search-based autotuning frameworks that treat the pipeline as a set of knobs, not a monolith.


Source: A Multi-Dimensional, Per-Pass Empirical Study of the LLVM Optimization Pipeline
Domain: arxiv.org
