Developers Felt 20% Faster With AI, But Were Actually 19% Slower

intrepidkarthi.com · @systems_wire · 3 hours ago · Developer Tools · 2 comments

A controlled trial of 16 experienced devs on 246 tasks shows self-reported productivity gains flip negative when measured against the clock: the gauge is actively misleading.

metr · faros ai · gitclear · dora · ai assisted development · developer productivity

Experienced developers in a controlled trial felt roughly 20% faster with AI tools, but the stopwatch showed them running 19% slower—a nearly 40-point inversion between feeling and fact.

The study that broke the productivity gauge

METR ran a randomized controlled trial on 16 experienced open-source developers working in codebases they knew well, using current frontier AI tools. Before the tasks, developers expected a speedup. Afterward, they reported feeling about 20% faster. The clock said they were 19% slower. The people most confident the tool was helping were the ones it was measurably slowing down.
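
The size of that inversion is simple arithmetic. The sketch below is illustrative only: the function names and the sample timings are assumptions, not METR's method or data, and it reads "19% slower" as tasks taking 19% longer on average.

```python
from statistics import mean


def time_change_pct(baseline_minutes, assisted_minutes):
    """Percent change in mean completion time; positive means tasks took longer."""
    base = mean(baseline_minutes)
    return (mean(assisted_minutes) - base) / base * 100.0


def perception_gap_points(felt_faster_pct, measured_slower_pct):
    """Percentage points between a self-reported speedup and a measured slowdown."""
    return felt_faster_pct + measured_slower_pct


# Headline figures quoted in the article: felt 20% faster, measured 19% slower.
gap = perception_gap_points(20.0, 19.0)
print(f"perception gap: {gap:.0f} points")

# Hypothetical stopwatch timings (minutes per task), NOT data from the study.
baseline = [100.0, 100.0]
assisted = [119.0, 119.0]
print(f"measured change in task time: {time_change_pct(baseline, assisted):+.1f}%")
```

The point of keeping the two functions separate is that only one of them takes a stopwatch as input; the other takes an opinion.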

The authors are careful: a sample of 16 developers and 246 tasks doesn't prove AI slows everyone down. The effect flips positive for juniors and greenfield work. But the self-report-versus-stopwatch inversion is the part that should keep every engineering leader up at night.

What the larger datasets show

Faros AI looked across more than 10,000 developers: pull requests merged up 98%, PR size up over 150%, review time up 91%, for roughly zero net change in delivery. 31% of PRs merged with no review at all. DORA found higher AI adoption associated with a measurable drop in delivery stability, with damage persisting into this year. GitClear, reading 200 million changed lines, found copy-pasted code rising, code churn rising, and refactoring collapsing to under 10% of changes—2024 being the first year developers pasted more code than they reorganized.

The pattern is consistent across these datasets: more generated, more merged, more churned. Same amount delivered, shakier when it lands.

The real bottleneck is verification, not generation

Generation got cheap. Verification got expensive. We removed the old bottleneck and shipped the work straight into a new one: review. The volume exploded at the exact stage we didn't re-staff, and the dashboards we trust can't see the cost because it lands downstream in incidents and churn and reviewer burnout, on a different page from the velocity chart everyone is cheering.
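
A toy queue model shows why the cost hides downstream. This is a sketch under stated assumptions: the weekly numbers are hypothetical, review capacity is held fixed, and the 98% increase in incoming PRs simply borrows the Faros AI merge figure as an illustration rather than a measured arrival rate.

```python
def review_backlog(arrivals_per_week, capacity_per_week, weeks):
    """Return the unreviewed-PR backlog at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        # Anything reviewers cannot absorb this week carries over to the next.
        backlog = max(0, backlog + arrivals_per_week - capacity_per_week)
        history.append(backlog)
    return history


# Hypothetical team: reviewers keep pace with 100 PRs per week.
print(review_backlog(100, 100, 4))  # [0, 0, 0, 0]

# Same reviewers, 98% more PRs arriving: the backlog grows every week.
print(review_backlog(198, 100, 4))  # [98, 196, 294, 392]
```

A velocity chart that counts PRs opened or merged would show the second team as nearly twice as productive, while the queue says otherwise.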

The tooling market already conceded this point. Windsurf, the agent-first IDE, was acquired by the maker of Devin after Google hired away its founders and core researchers into DeepMind. The most aggressive bet in the tooling space is a bet that the job is now verification: sitting at a dashboard and reviewing what agents produced.

This is likely the dip in a J-curve, not the destination. New tools cost before they pay. Juniors and new code (a growing share of output) show positive effects. DORA's throughput has started recovering even as stability lags. But the discipline for anyone running a team is simple: stop steering by how fast it feels. The feeling is the one number we now know reads backward. Measure what reaches production and stays standing. Re-staff the stage where the work piles up. Treat any productivity claim that lives in a feeling as unproven until the stopwatch agrees.
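
One way to put a number on "reaches production and stays standing" is sketched below. Everything here is an assumption made for illustration: the `Change` record, its field names, and the 14-day window are hypothetical, and a real pipeline would pull these timestamps from its own deploy and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Change:
    """Hypothetical record of one merged change."""
    deployed_at: Optional[datetime]  # None if it never reached production
    reverted_at: Optional[datetime]  # None if it was never reverted or hotfixed


def surviving_share(changes: List[Change], window: timedelta = timedelta(days=14)) -> float:
    """Fraction of changes that were deployed and not reverted within `window`."""
    if not changes:
        return 0.0
    survived = 0
    for change in changes:
        if change.deployed_at is None:
            continue  # merged but never shipped
        if change.reverted_at is not None and change.reverted_at - change.deployed_at <= window:
            continue  # shipped, then fell over inside the window
        survived += 1
    return survived / len(changes)


day = datetime(2025, 1, 1)
sample = [
    Change(deployed_at=None, reverted_at=None),                      # never shipped
    Change(deployed_at=day, reverted_at=day + timedelta(days=2)),    # reverted quickly
    Change(deployed_at=day, reverted_at=day + timedelta(days=30)),   # reverted after the window
    Change(deployed_at=day, reverted_at=None),                       # still standing
]
print(surviving_share(sample))  # 0.5
```

Unlike merge counts, this ratio cannot be raised by generating more code; it only moves when more of what ships survives.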


Source: The gauge broke: devs felt 20% faster with AI, measured 19% slower
Domain: intrepidkarthi.com
