Mozilla's Perfherder Misses 6.8% of Regressions; CPD Ensembles Fix That

An evaluation of 25 change-point detection methods on Mozilla's production data shows that ensemble voting reduces false positives and recovers missed regressions, raising the F1 score by 11%.

Tags: mozilla, perfherder, change point detection, performance engineering, f1 score, ci cd

Mozilla's Perfherder, a system based on Student's t-test, produces 12.5% false-positive alert groups and misses 6.8% of real regressions across hundreds of daily code changes. That's not noise; it's a concrete failure mode for any team shipping performance-sensitive software at scale.
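To make the idea of a t-test-based detector concrete, here is a minimal sketch of a sliding-window check. It is not Perfherder's actual implementation: the function names, the window size, and the threshold are all assumptions chosen for illustration, and the statistic used is Welch's variant of the t-test.

```python
import math
import statistics


def t_statistic(before, after):
    """Absolute Welch t statistic for the difference in means of two samples."""
    std_err = math.sqrt(
        statistics.variance(before) / len(before)
        + statistics.variance(after) / len(after)
    )
    diff = abs(statistics.mean(after) - statistics.mean(before))
    if std_err == 0:
        # Both windows are constant: either identical or infinitely separated.
        return 0.0 if diff == 0 else math.inf
    return diff / std_err


def detect_changes(series, window=5, threshold=7.0):
    """Return indices where the windows before and after differ strongly.

    `window` and `threshold` are illustrative values, not Mozilla's settings.
    """
    flagged = []
    for i in range(window, len(series) - window + 1):
        t = t_statistic(series[i - window:i], series[i:i + window])
        if t > threshold:
            flagged.append(i)
    return flagged


# Hypothetical benchmark timings with a step change at index 8.
timings = [100, 101, 99, 100, 101, 100, 99, 101,
           120, 121, 119, 120, 121, 120, 119, 121]
print(detect_changes(timings))
```

A single fixed threshold like this is where both error types come from: set it too low and noisy series trigger alerts, set it too high and small but real regressions pass unflagged.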

Eleven Mozilla performance engineers annotated 174 performance time series to build a ground-truth dataset, one of the first practitioner-labeled change-point detection (CPD) benchmarks in performance engineering. The authors then evaluated 25 CPD methods and 15 ensemble approaches as alternatives to Mozilla's current method.

12.5% False Alarms, 6.8% Missed Regressions

The size of the problem is clear from the raw numbers. Perfherder's T-test approach is simple, but it trades precision for recall in a way that frustrates engineers. False positives waste investigation time; missed regressions let breakages slip into production. Both erode trust in automated CI gates.
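The two error types map onto the standard precision and recall metrics, which the F1 score combines as their harmonic mean. The sketch below shows the arithmetic; the alert counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from alert counts.

    true_pos:  alerts that matched a real regression
    false_pos: alerts with no real regression behind them
    false_neg: real regressions that raised no alert
    """
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
precision, recall, f1 = precision_recall_f1(true_pos=80, false_pos=20, false_neg=20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, a detector cannot score well by maximizing one metric at the expense of the other, which is why it is a reasonable yardstick for comparing alerting systems.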

Ensemble Voting Beats the Trade-Off

Offline and hybrid CPD methods improved recall but tanked precision. Ensemble voting strategies, by contrast, softened that trade-off. The best ensembles delivered an 11% improvement in F1-score over Mozilla's baseline. That's not a theoretical win; it's a measurable reduction in both wasted alerts and missed regressions.
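One simple way to picture ensemble voting is a quorum rule: a change point is kept only if enough independent detectors report one at roughly the same position. The sketch below is a generic illustration under that assumption. The summary does not describe the paper's specific voting schemes, and the function names, `min_votes`, and `tolerance` are hypothetical.

```python
def _resolve(cluster, min_votes):
    """Keep a cluster only if enough distinct detectors contributed to it."""
    voters = {detector for _, detector in cluster}
    if len(voters) < min_votes:
        return []
    indices = sorted(index for index, _ in cluster)
    return [indices[(len(indices) - 1) // 2]]  # lower median of the cluster


def vote(detections, min_votes=2, tolerance=2):
    """Merge change points from several detectors by quorum.

    detections: one list of change-point indices per detector.
    Points closer than `tolerance` to their neighbor share a cluster.
    """
    points = sorted(
        (index, detector)
        for detector, indices in enumerate(detections)
        for index in indices
    )
    accepted, cluster = [], []
    for index, detector in points:
        if cluster and index - cluster[-1][0] > tolerance:
            accepted.extend(_resolve(cluster, min_votes))
            cluster = []
        cluster.append((index, detector))
    if cluster:
        accepted.extend(_resolve(cluster, min_votes))
    return accepted


# Hypothetical outputs from three detectors on the same time series.
detections = [[50, 120], [51, 200], [49, 121, 300]]
print(vote(detections))               # quorum of two detectors
print(vote(detections, min_votes=3))  # all three must agree
```

Raising `min_votes` suppresses change points that only one detector reports, which favors precision, while lowering it keeps more candidates and favors recall. The quorum size is therefore the knob that tunes the trade-off.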

What Integration Taught Mozilla's Engineers

The paper doesn't stop at benchmark numbers. The authors validated results through a practitioner survey and documented real integration lessons from deploying the top methods into Mozilla's performance engineering system. The takeaway: no single CPD method works well enough alone, but a carefully chosen ensemble does.

Mozilla now has a validated deployment path for hybrid CPD ensembles in their CI pipeline, giving engineers a tool that catches more regressions without overwhelming them with noise.


Source: Exploring Statistical Change Point Detection Techniques for Performance Anomaly Detection at Mozilla
Domain: arxiv.org
