LLVM -O3 Pipeline Is Pareto-Dominated for 29 of 30 Kernels

An empirical study of 113 -O3 prefixes across 84,750 measurements found the pipeline non-monotone, back-loaded, and rarely optimal for both size and speedup.

llvm, optimization pipeline, compiler engineering, phase ordering, polybench, energy efficiency

84,750 measurements across 113 cumulative prefixes of LLVM’s -O3 pipeline on 30 PolyBench/C kernels reveal that the final -O3 configuration is Pareto-dominated on (size, speedup) for 29 out of 30 kernels. That’s not a bug. It’s a design constraint nobody quantified this precisely before.
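To make "Pareto-dominated on (size, speedup)" concrete, here is a minimal sketch of the dominance test. The prefix labels and numbers are invented for illustration and are not taken from the study. The sketch assumes smaller binary size and larger speedup are better.

```python
from typing import NamedTuple


class Point(NamedTuple):
    label: str      # hypothetical prefix label, not from the paper
    size: int       # binary size in bytes (smaller is better)
    speedup: float  # speedup over an unoptimized baseline (larger is better)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.size <= b.size and a.speedup >= b.speedup
    strictly_better = a.size < b.size or a.speedup > b.speedup
    return no_worse and strictly_better


def pareto_front(points: list[Point]) -> list[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up measurements for one kernel (assumption, for illustration only).
points = [
    Point("prefix-040", 9_800, 1.35),
    Point("prefix-090", 10_400, 2.10),
    Point("prefix-113 (full -O3)", 11_200, 2.05),
]

front = pareto_front(points)
final = points[-1]
final_is_dominated = any(dominates(q, final) for q in points)
print([p.label for p in front], final_is_dominated)
```

In this toy data the full pipeline is dominated because an earlier prefix is both smaller and faster, which is the situation the study reports for 29 of the 30 kernels.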

Non-Monotone and Back-Loaded: The -O3 Pipeline’s Real Shape

The study decomposes LLVM’s -O3 pipeline into cumulative per-pass prefixes, then measures execution time, compile time, binary size, hardware counters, and RAPL energy under rigorous noise mitigation. The pipeline is non-monotone: 6.6–9.7% of transitions between successive prefixes regress one or more metrics. It’s also strongly back-loaded: the median non-regressing kernel needs 84.8% of the total pipeline just to reach 80% of its speedup. Most gains come from a small Pareto-dominant core of passes; the rest add marginal or negative value.
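The two shape metrics above can be sketched for a single kernel as follows. This is an illustration under stated assumptions, not the paper's procedure. It tracks only speedup, whereas the study counts a regression in any metric, and it reads "reach 80% of its speedup" as the first prefix whose speedup is at least 0.8 times the final value. The speedup series is invented.

```python
def regressing_transitions(speedups: list[float]) -> float:
    """Fraction of successive-prefix transitions where speedup drops."""
    steps = list(zip(speedups, speedups[1:]))
    if not steps:
        return 0.0
    return sum(1 for prev, cur in steps if cur < prev) / len(steps)


def pipeline_share_to_reach(speedups: list[float], target: float = 0.8) -> float:
    """Share of prefixes needed before speedup first reaches target * final speedup.

    Assumption: the threshold is relative to the final prefix's speedup.
    """
    goal = target * speedups[-1]
    for i, s in enumerate(speedups, start=1):
        if s >= goal:
            return i / len(speedups)
    return 1.0  # not reached for positive speedups: the last prefix meets the goal


# Made-up speedups for 10 cumulative prefixes of one kernel (assumption).
speedups = [1.0, 1.0, 1.1, 1.05, 1.1, 1.2, 1.2, 1.9, 2.0, 2.0]

print(regressing_transitions(speedups))   # share of transitions that lose speedup
print(pipeline_share_to_reach(speedups))  # share of the pipeline needed for 80%
```

A back-loaded kernel shows up as a large second number: most of the prefix sequence passes before the bulk of the speedup arrives.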

Why IR Instruction Count Fails as a Proxy

A staple of compiler textbooks says that fewer IR instructions mean faster code. This study shows that the rule fails for -O3 on compute-bound affine kernels: IR instruction count is an unreliable predictor of runtime. Worse, the search-free idealized-additive upper bound on losses due to phase interference is 46.35%, meaning nearly half the potential speedup can be lost to bad pass ordering alone, even without measurement noise. Runtime-targeted passes turn out to be de facto energy-targeted, delivering 30–60% energy savings as measured with RAPL counters.
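One way to probe whether IR instruction count tracks runtime is a rank correlation across prefixes. The sketch below uses Spearman's coefficient, which is my choice for illustration and is not stated to be the study's method. The instruction counts and runtimes are invented.

```python
def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up per-prefix data for one kernel (assumption, for illustration only).
ir_counts = [500, 450, 400, 380, 360]
runtimes_ms = [10.0, 6.0, 9.5, 5.8, 7.0]

rho = spearman(ir_counts, runtimes_ms)
print(rho)
```

A coefficient well below 1 on real measurements would mean that ordering prefixes by IR instruction count does not reliably order them by runtime.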

What This Means for Compiler Engineers and Autotuners

These numbers give us a data-driven license to prune passes, recalibrate cost models, and design autotuners that don’t worship the final -O3 config as a global optimum. Phase ordering is not a solved problem, and the LLVM backend’s three-letter flags hide immense variation. The next step is to bake these empirical findings into search-based autotuning frameworks that treat the pipeline as a set of knobs, not a monolith.


Source: A Multi-Dimensional, Per-Pass Empirical Study of the LLVM Optimization Pipeline
Domain: arxiv.org
