Mozilla's Perfherder Misses 6.8% of Regressions; CPD Ensembles Fix That

An evaluation of 25 change-point detection methods on Mozilla's production data shows that ensemble voting reduces false positives and recovers missed regressions, raising the F1-score by 11%.

mozilla, perfherder, change point detection, performance engineering, f1 score, ci cd

Mozilla's Perfherder, a system based on Student's t-test, produces 12.5% false-positive alert groups and misses 6.8% of real regressions across hundreds of daily code changes. That is not background noise; it is a concrete failure mode for any team shipping performance-sensitive software at scale.

Eleven Mozilla performance engineers annotated 174 performance time series to build a ground-truth dataset, one of the first practitioner-labeled change-point detection (CPD) benchmarks in performance engineering. The authors then evaluated 25 CPD methods and 15 ensemble approaches as alternatives to Mozilla's current method.

12.5% False Alarms, 6.8% Missed Regressions

The size of the problem is clear from the raw numbers. Perfherder's T-test approach is simple, but it trades precision for recall in a way that frustrates engineers. False positives waste investigation time; missed regressions let breakages slip into production. Both erode trust in automated CI gates.
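To make the T-test idea concrete, here is a minimal sketch of a sliding-window detector built on Student's two-sample t-statistic. It is not Perfherder's code: the function names, the window size, and the threshold are assumptions chosen for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(before, after):
    """Absolute Student's two-sample t-statistic (pooled variance)."""
    n_b, n_a = len(before), len(after)
    pooled = ((n_b - 1) * variance(before) + (n_a - 1) * variance(after)) / (
        n_b + n_a - 2
    )
    stderr = sqrt(pooled * (1 / n_b + 1 / n_a))
    diff = abs(mean(after) - mean(before))
    if stderr == 0.0:
        # Both windows are constant: any difference in level is unambiguous.
        return float("inf") if diff > 0 else 0.0
    return diff / stderr


def detect_shifts(series, window=5, threshold=7.0):
    """Return indices where the windows on either side differ strongly.

    `window` and `threshold` are illustrative values, not Perfherder's settings.
    """
    flagged = []
    for i in range(window, len(series) - window + 1):
        t = t_statistic(series[i - window:i], series[i:i + window])
        if t >= threshold:
            flagged.append(i)
    return flagged


# Hypothetical benchmark timings that step from about 100 to about 120.
series = [100.0, 101.0, 99.0, 100.5, 99.5, 100.0,
          120.0, 121.0, 119.0, 120.5, 119.5, 120.0]
shifts = detect_shifts(series)
```

With these made-up values only index 6, the first sample after the step, is flagged. Lowering the threshold would flag the neighbouring indices as well, which is one way a single-test detector ends up trading false alarms against missed regressions.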

Ensemble Voting Beats the Trade-Off

Offline and hybrid CPD methods improved recall but tanked precision. Ensemble voting strategies, by contrast, softened that trade-off. The best ensembles delivered an 11% improvement in F1-score over Mozilla's baseline. That's not a theoretical win; it's a measurable reduction in both wasted alerts and missed regressions.
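The article does not say which voting schemes the paper used, so the following is only a sketch of one plausible scheme: a change point is kept when enough detectors report an index within a small tolerance of it. The `vote` and `f1_score` helpers, the tolerance, and the sample indices are all hypothetical.

```python
def vote(detections, min_votes, tolerance=2):
    """Keep change points that at least `min_votes` detectors agree on.

    `detections` holds one list of indices per detector. Two indices count
    as the same change point when they lie within `tolerance` samples.
    """
    candidates = sorted({idx for found in detections for idx in found})
    accepted = []
    for idx in candidates:
        votes = sum(
            any(abs(idx - other) <= tolerance for other in found)
            for found in detections
        )
        # Skip candidates that merely repeat an already accepted neighbour.
        is_duplicate = bool(accepted) and idx - accepted[-1] <= tolerance
        if votes >= min_votes and not is_duplicate:
            accepted.append(idx)
    return accepted


def f1_score(predicted, actual, tolerance=2):
    """F1 = harmonic mean of precision and recall, with index tolerance."""
    matched = sum(any(abs(p - a) <= tolerance for a in actual) for p in predicted)
    found = sum(any(abs(p - a) <= tolerance for p in predicted) for a in actual)
    precision = matched / len(predicted) if predicted else 0.0
    recall = found / len(actual) if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up outputs from three detectors and made-up ground-truth labels.
detections = [[10, 40], [11, 70], [9, 41, 90]]
ensemble = vote(detections, min_votes=2)
score = f1_score(ensemble, actual=[10, 40, 70])
```

In this toy input the ensemble keeps the two change points that at least two detectors agree on and drops the two reported by a single detector. One of the dropped points is a real change in the made-up labels, so requiring agreement raises precision at some cost in recall, and F1 summarizes the balance between the two.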

What Integration Taught Mozilla's Engineers

The paper doesn't stop at benchmark numbers. The authors validated results through a practitioner survey and documented real integration lessons from deploying the top methods into Mozilla's performance engineering system. The takeaway: no single CPD method works well enough alone, but a carefully chosen ensemble does.

Mozilla now has a validated deployment path for hybrid CPD ensembles in their CI pipeline, giving engineers a tool that catches more regressions without overwhelming them with noise.


Source: Exploring Statistical Change Point Detection Techniques for Performance Anomaly Detection at Mozilla
Domain: arxiv.org
