
Datadog Processes Millions of Weekly Android Profiles via ProfilingManager API

android-developers.googleblog.com · @systems_wire · 1 hour ago · Developer Tools

Android 15's system-level ProfilingManager API lets Datadog collect call stacks and heap dumps from production devices, exposing root causes like a background service stealing the big CPU core during cold start.

Tags: datadog · android · profilingmanager · android 15 · perfetto · mobile performance

Datadog now processes millions of production profile traces per week, according to its internal June 2026 data, and the key enabler is a system API most Android developers haven’t touched yet: ProfilingManager. Introduced in Android 15, this service lets apps programmatically capture call stack samples, field traces, and memory heap dumps directly from production environments, with no debug builds and no manual reproduction loops.

The hidden scheduler fight that only field traces caught

A Google communications app was seeing slower cold start times on newer, more powerful hardware, which is the opposite of what faster devices should deliver. An engineer pulled field-collected traces, compared them across device types, and found the culprit: a background text-to-speech service was being prewarmed during app startup and was monopolizing the device’s highest-performing (big) CPU core. The app’s main thread slept while the prewarm ran. Without system-level trace data from production, that scheduling conflict would have stayed invisible.
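To make the diagnosis concrete, here is a minimal sketch of the kind of check an engineer might run over scheduling data extracted from a field trace. It is not Datadog's or Google's tooling. The `SchedSlice` type, the thread names, and the choice of CPU 7 as the big core are all assumptions made for illustration. The method simply totals how long threads other than the main thread occupied the big core during the startup window.

```java
import java.util.List;

/** Hypothetical scheduling slice: one thread running on one CPU for an interval. */
final class SchedSlice {
    final String thread;
    final int cpu;
    final long startMs;
    final long endMs;

    SchedSlice(String thread, int cpu, long startMs, long endMs) {
        this.thread = thread;
        this.cpu = cpu;
        this.startMs = startMs;
        this.endMs = endMs;
    }
}

final class BigCoreContention {
    /**
     * Returns the milliseconds inside [windowStart, windowEnd) during which
     * threads other than mainThread were running on bigCpu.
     */
    static long stolenBigCoreMs(List<SchedSlice> slices, String mainThread,
                                int bigCpu, long windowStart, long windowEnd) {
        long total = 0;
        for (SchedSlice s : slices) {
            // Only slices on the big core that belong to some other thread count.
            if (s.cpu != bigCpu || s.thread.equals(mainThread)) {
                continue;
            }
            // Clip the slice to the startup window before adding it up.
            long overlap = Math.min(s.endMs, windowEnd) - Math.max(s.startMs, windowStart);
            if (overlap > 0) {
                total += overlap;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up data: the main thread loses CPU 7 to a prewarm thread at 40 ms.
        List<SchedSlice> slices = List.of(
                new SchedSlice("main", 7, 0, 40),
                new SchedSlice("tts-prewarm", 7, 40, 300),
                new SchedSlice("main", 3, 300, 420));
        System.out.println(stolenBigCoreMs(slices, "main", 7, 0, 400));
    }
}
```

Running the same measurement per device model is one way the "slower on faster hardware" pattern could surface: devices where the big core is contested during startup would show a large value, while others would show a value near zero.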

ProfilingManager doesn’t just capture stack samples. It also supports CPU traces, Java heap dumps, and native heap profiles, and it writes its output as Perfetto trace files. Datadog uploads these traces to its backend, visualizes them alongside existing RUM telemetry, and gives engineers a direct link from “ANR rate spiked” to “thread stuck waiting on this mutex.”

Why Datadog chose system-level over homegrown profiling

Before ProfilingManager, Datadog’s Real User Monitoring tracked high-level signals: time to initial display, ANR rates, CPU load, frozen frames. That told you what was slow, but not why. Engineering teams could see the symptom but had no code-level root cause. Datadog evaluated writing its own trace processor using Android Debug APIs, but ProfilingManager won because it offloads sampling decisions to the OS and imposed the lowest runtime overhead of any option tested.

The API also enforces stability safeguards. Rate limiting is built in, so Datadog doesn’t have to guess safe sampling frequencies. On-device filtering strips out data from other processes before the profile reaches the app, keeping file sizes small and privacy boundaries intact. As a result, Datadog can take proactive trace snapshots triggered on system events like APP_FULLY_DRAWN, and it plans to expand to ANR, OOM, and COLD_START signals.
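The article does not describe the OS's actual rate-limiting policy, so the following is only a generic illustration of the idea: a sliding-window limiter that accepts at most a fixed number of profiling requests per time window. The class name `ProfilingRateLimiter` and the limit values are hypothetical and are not part of the Android API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window limiter: at most maxRequests per windowMs. */
final class ProfilingRateLimiter {
    private final int maxRequests;
    private final long windowMs;
    // Timestamps of accepted requests, oldest first.
    private final Deque<Long> accepted = new ArrayDeque<>();

    ProfilingRateLimiter(int maxRequests, long windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
    }

    /** Returns true and records the request if it fits in the current window. */
    boolean tryAcquire(long nowMs) {
        // Forget accepted requests that have aged out of the window.
        while (!accepted.isEmpty() && nowMs - accepted.peekFirst() >= windowMs) {
            accepted.removeFirst();
        }
        if (accepted.size() >= maxRequests) {
            return false; // over budget: the caller should skip this profile
        }
        accepted.addLast(nowMs);
        return true;
    }

    public static void main(String[] args) {
        // Illustrative budget: two profiles per second.
        ProfilingRateLimiter limiter = new ProfilingRateLimiter(2, 1000);
        long[] requestTimes = {0, 100, 200, 1000};
        for (long t : requestTimes) {
            System.out.println(t + " ms -> " + limiter.tryAcquire(t));
        }
    }
}
```

Having the platform own this budget means an SDK does not need to ship and tune a limiter like this itself, which is the benefit the paragraph above describes.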

From trace firehose to autonomous remediation

Processing millions of weekly profiles required Datadog to build a server-side pipeline that can parse and analyze high-fidelity Perfetto traces at scale. The team has been profiling Datadog’s own Android app and early adopters for months, using the data to refine detection algorithms before general availability.
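Datadog's pipeline is not public, so the sketch below only illustrates one basic aggregation such a pipeline might perform once stack samples have been extracted from traces: counting which leaf frames appear most often. The `HotFrameAggregator` class, the frame names, and the assumption that each stack is ordered root first are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HotFrameAggregator {
    /**
     * Counts how often each leaf frame (last element of a root-first stack)
     * appears across all samples and returns the n most frequent, with ties
     * broken alphabetically so the order is deterministic.
     */
    static List<Map.Entry<String, Integer>> topLeafFrames(List<List<String>> stacks, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> stack : stacks) {
            if (stack.isEmpty()) {
                continue; // nothing to attribute for an empty sample
            }
            counts.merge(stack.get(stack.size() - 1), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up samples standing in for stacks pulled from many profiles.
        List<List<String>> stacks = List.of(
                List.of("main", "onCreate", "loadConfig"),
                List.of("main", "onCreate", "loadConfig"),
                List.of("main", "inflate"));
        for (Map.Entry<String, Integer> e : topLeafFrames(stacks, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

A production system would work on parsed Perfetto data and far richer signals, but a frequency ranking like this is one simple way to turn a large volume of samples into a short list of suspects.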

Here’s the part that makes this more than a debug tool: Datadog explicitly states that it aims to make Android profiling data a first-class input for coding agents. That means autonomous agents could eventually resolve performance regressions without human triage. For now, developers get a Perfetto trace link in their RUM session, showing exactly what happened in the milliseconds before an ANR. That’s a lot closer to “fix this line” than “repro on your local device and hope.”


Source: Datadog delivers millions of in-depth performance insights with ProfilingManager
Domain: android-developers.googleblog.com
