Mozilla's Perfherder, a system based on Student's t-test, raises alert groups of which 12.5% are false positives, and it misses 6.8% of real regressions across hundreds of daily code changes. That's not noise: it's a concrete failure mode for any team shipping performance-sensitive software at scale.
Eleven Mozilla performance engineers annotated 174 performance time series to build a ground-truth dataset, one of the first practitioner-labeled change-point detection (CPD) benchmarks in performance engineering. The authors then evaluated 25 CPD methods and 15 ensemble approaches as alternatives to Mozilla's current method.
12.5% False Alarms, 6.8% Missed Regressions
The raw numbers show the size of the problem. Perfherder's t-test approach is simple, but it trades precision for recall in a way that frustrates engineers. False positives waste investigation time; missed regressions let breakages slip into production. Both erode trust in automated CI gates.
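To make the baseline concrete, here is a minimal sketch of a sliding-window detector built on Student's two-sample t-statistic. It is not Perfherder's actual implementation: the function names, the window size, and the threshold are all hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(before, after):
    """Absolute two-sample Student's t-statistic using pooled variance."""
    n1, n2 = len(before), len(after)
    pooled = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    if pooled == 0:
        # Both windows are constant: any difference in means is a clean shift.
        return 0.0 if mean(before) == mean(after) else float("inf")
    return abs(mean(after) - mean(before)) / sqrt(pooled * (1 / n1 + 1 / n2))


def detect_shifts(series, window=5, threshold=4.0):
    """Return every index where the windows before and after differ strongly.

    `window` and `threshold` are illustrative assumptions, not Mozilla's settings.
    """
    alerts = []
    for i in range(window, len(series) - window + 1):
        t = t_statistic(series[i - window:i], series[i:i + window])
        if t > threshold:
            alerts.append(i)
    return alerts


if __name__ == "__main__":
    # Synthetic timings with a step change at index 6.
    timings = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1,
               12.0, 12.1, 11.9, 12.0, 12.2, 12.1]
    print(detect_shifts(timings))
```

In a detector of this shape, the threshold is the only dial: lowering it catches more regressions but raises more false alarms, and raising it does the opposite.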
Ensemble Voting Beats the Trade-Off
Offline and hybrid CPD methods improved recall but tanked precision. Ensemble voting strategies, by contrast, softened that trade-off. The best ensembles delivered an 11% improvement in F1-score over Mozilla's baseline. That's not a theoretical win: F1 combines precision and recall, so the gain reflects a measurably better balance between wasted alerts and missed regressions.
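As a rough illustration of how voting can soften the trade-off, the sketch below accepts a change point only when enough detectors report one near the same position. This is a generic majority-vote scheme, not necessarily any of the 15 ensembles the paper evaluated; `vote`, `min_votes`, and `tolerance` are hypothetical names and values.

```python
def vote(detections, min_votes=2, tolerance=2):
    """Combine change points reported by several detectors.

    detections: one list of change-point indices per detector.
    A candidate is accepted when at least `min_votes` detectors report a
    change point within `tolerance` positions of it. Accepted points that
    fall within `tolerance` of the previously accepted one are skipped,
    so each cluster of agreeing reports yields a single alert.
    """
    candidates = sorted({cp for det in detections for cp in det})
    accepted = []
    for cp in candidates:
        votes = sum(
            any(abs(cp - other) <= tolerance for other in det)
            for det in detections
        )
        is_new_cluster = not accepted or cp - accepted[-1] > tolerance
        if votes >= min_votes and is_new_cluster:
            accepted.append(cp)
    return accepted


if __name__ == "__main__":
    # Three hypothetical detectors: all agree near index 50,
    # while 120 and 200 are each reported by only one detector.
    reports = [[50, 120], [51], [49, 200]]
    print(vote(reports))               # [49]
    print(vote(reports, min_votes=1))  # [49, 120, 200]
```

Requiring agreement filters out alerts that only one method raises, which helps precision, while pooling candidates from several methods keeps recall higher than a single strict detector would.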
What Integration Taught Mozilla's Engineers
The paper doesn't stop at benchmark numbers. The authors validated results through a practitioner survey and documented real integration lessons from deploying the top methods into Mozilla's performance engineering system. The takeaway: no single CPD method works well enough alone, but a carefully chosen ensemble does.
Mozilla now has a validated deployment path for hybrid CPD ensembles in their CI pipeline, giving engineers a tool that catches more regressions without overwhelming them with noise.
Source: Exploring Statistical Change Point Detection Techniques for Performance Anomaly Detection at Mozilla
Domain: arxiv.org