Mozilla's Perfherder Misses 6.8% of Regressions, CPD Ensembles Fix That

Mozilla's Perfherder, a system built on Student's t-test, produces 12.5% false-positive alert groups and misses 6.8% of real regressions across hundreds of daily code changes. That is not noise; it is a concrete failure mode for any team shipping performance-sensitive software at scale.

Eleven Mozilla performance engineers annotated 174 performance time series to build a ground-truth dataset, one of the first practitioner-labeled change-point detection (CPD) benchmarks in performance engineering. The authors then evaluated 25 CPD methods and 15 ensemble approaches as alternatives to Mozilla's current method.

12.5% False Alarms, 6.8% Missed Regressions

The raw numbers show the size of the problem. Perfherder's t-test approach is simple, but it trades precision for recall in a way that frustrates engineers. False positives waste investigation time; missed regressions let slowdowns slip into production. Both erode trust in automated CI gates.
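To make the t-test idea concrete, here is a minimal sketch of a sliding-window detector that compares the samples before and after each point using Student's t statistic with pooled variance. It is not Perfherder's actual implementation: the function names, the window size of 5, and the threshold of 7.0 are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(before, after):
    """Student's t statistic for two samples, using pooled variance."""
    n1, n2 = len(before), len(after)
    pooled = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    if pooled == 0:
        # Noise-free windows: treat any difference in means as a definite shift.
        return 0.0 if mean(before) == mean(after) else float("inf")
    return (mean(after) - mean(before)) / sqrt(pooled * (1 / n1 + 1 / n2))


def detect_shifts(series, window=5, threshold=7.0):
    """Return indices where the windows before and after differ strongly.

    `window` and `threshold` are illustrative values, not Mozilla's settings.
    """
    alerts = []
    for i in range(window, len(series) - window + 1):
        t = t_statistic(series[i - window:i], series[i:i + window])
        if abs(t) >= threshold:
            alerts.append(i)
    return alerts


# Hypothetical benchmark timings: stable around 100, then a jump to about 120.
timings = [100.0, 101.0, 99.0, 100.0, 101.0, 99.0, 100.0, 101.0,
           120.0, 121.0, 119.0, 120.0, 121.0, 119.0, 120.0, 121.0]
print(detect_shifts(timings))
```

On this made-up series the detector flags index 8, where the level shift begins. The threshold is the single knob that moves the detector between more false alarms and more missed regressions, which is the trade-off the paper measures.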

Ensemble Voting Beats the Trade-Off

Offline and hybrid CPD methods improved recall but sharply reduced precision. Ensemble voting strategies, by contrast, softened that trade-off: the best ensembles delivered an 11% improvement in F1-score over Mozilla's baseline. That is not a theoretical win; it is a measurable reduction in both wasted alerts and missed regressions.
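This summary does not spell out which voting schemes the paper used, so the following is only a sketch of one plausible strategy: keep a change point when enough detectors report one within a few samples of each other. The function name `vote` and the parameters `min_votes` and `tolerance` are hypothetical.

```python
def vote(detections, min_votes=2, tolerance=2):
    """Combine change points from several detectors by voting.

    detections: one list of change-point indices per detector.
    A candidate index is kept when at least `min_votes` detectors report
    a change point within `tolerance` samples of it.
    """
    candidates = sorted({idx for found in detections for idx in found})
    accepted = []
    for cand in candidates:
        # Each detector contributes at most one vote per candidate.
        votes = sum(
            any(abs(cand - idx) <= tolerance for idx in found)
            for found in detections
        )
        if votes < min_votes:
            continue
        # Skip candidates that fall within tolerance of the last accepted point.
        if accepted and cand - accepted[-1] <= tolerance:
            continue
        accepted.append(cand)
    return accepted


# Hypothetical outputs from three detectors run on the same time series.
detector_outputs = [[50, 120], [51, 200], [49, 121, 300]]
print(vote(detector_outputs))               # → [49, 120]
print(vote(detector_outputs, min_votes=3))  # → [49]
```

Raising `min_votes` favors precision, because isolated detections such as 200 and 300 are dropped. Lowering it favors recall. An ensemble lets that balance be tuned without committing to a single detector's weaknesses.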

What Integration Taught Mozilla's Engineers

The paper doesn't stop at benchmark numbers. The authors validated results through a practitioner survey and documented real integration lessons from deploying the top methods into Mozilla's performance engineering system. The takeaway: no single CPD method works well enough alone, but a carefully chosen ensemble does.

Mozilla now has a validated deployment path for hybrid CPD ensembles in its CI pipeline, giving engineers a tool that catches more regressions without overwhelming them with noise.

Source: Exploring Statistical Change Point Detection Techniques for Performance Anomaly Detection at Mozilla
Domain: arxiv.org
