Meta's AV1 RTC Rollout: 20% Bitrate Savings, H.264-Like Power on Low-End Devices

engineering.fb.com · @wild_heron · 2 hours ago · Systems Engineering

Meta deployed AV1 on the majority of Messenger and WhatsApp mobile devices by pairing a low-complexity encoder with ML-driven device eligibility and adaptive codec switching, delivering noticeable quality gains under...

meta · av1 · real time communication · messenger · whatsapp · video codec

Meta now runs AV1 on the majority of mobile devices in Messenger and WhatsApp video calls, and the key isn't a magic hardware accelerator - it's a low-complexity encoder that sips power like H.264 while delivering at least 20% bitrate savings.

That 20% reduction comes from offline tests comparing AV1 to H.264 under the same product settings on low-end and mid-range phones. On devices that can tolerate higher encoding complexity, the savings grow even larger. For the real-world networks that RTC lives on - 10 kbps to 400 kbps, often under 100 kbps - that gap is the difference between a blurry mess and a legible face.

Meta's own demo shows H.264 at 100 kbps looking noticeably blurry while AV1 at the same bitrate stays clear. That matters when every bit counts.

The Low-Complexity Encoder That Makes AV1 Possible on Low-End Phones

An open-source AV1 encoder raised power consumption by 14% on a Pixel 8 during a video call compared to H.264. That's a non-starter for mobile. Meta adopted an internal low-complexity encoder with an ultra-low-complexity preset that matches H.264 baseline power draw.

This isn't a trivial preset change. The encoder offers multiple presets ranging from high to low complexity, and the ultra-low preset was designed from scratch to keep the compression-efficiency gain while cutting computational load. Devices adjust which preset they use on the fly based on real-time encoding latency monitoring.
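The on-the-fly preset adjustment can be pictured as a small feedback loop on encode latency. The sketch below is only an illustration, not Meta's implementation: the preset names, the 15 fps frame budget, the two thresholds, and the smoothing factor are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical preset ladder, ordered from most to least complex.
PRESETS = ["high", "medium", "low", "ultra_low"]


@dataclass
class PresetController:
    """Picks an encoder preset from a smoothed per-frame encode latency."""
    frame_budget_ms: float = 1000.0 / 15  # assumed 15 fps target
    step_down_ratio: float = 0.8          # go cheaper above 80% of the budget
    step_up_ratio: float = 0.4            # go richer below 40% of the budget
    alpha: float = 0.2                    # EWMA smoothing factor
    index: int = 0                        # current position in PRESETS
    ewma_ms: Optional[float] = None       # smoothed latency, None until measured

    def on_frame_encoded(self, latency_ms: float) -> str:
        # Smooth the measurement so one slow frame does not flip the preset.
        if self.ewma_ms is None:
            self.ewma_ms = latency_ms
        else:
            self.ewma_ms += self.alpha * (latency_ms - self.ewma_ms)

        if self.ewma_ms > self.step_down_ratio * self.frame_budget_ms:
            if self.index < len(PRESETS) - 1:
                self.index += 1
                self.ewma_ms = None  # restart the average for the new preset
        elif self.ewma_ms < self.step_up_ratio * self.frame_budget_ms:
            if self.index > 0:
                self.index -= 1
                self.ewma_ms = None
        return PRESETS[self.index]
```

The gap between the two thresholds acts as hysteresis, so a device hovering near its limit does not oscillate between presets on every frame.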

Binary size is another pain point. AV1 support from libAOM adds 1.7 MB (600 kB compressed) to the app. Meta shrank that by optimizing the quantization matrix (QM) tool (10% of encoder library size) and removing unused features like QM entirely, freeing 60 kB. They also share codec libraries across features like video message transcoding.

ML-Based Device Eligibility: Why Hardware Specs Weren't Enough

Choosing which Android devices can run AV1 turned out to be harder than the codec itself. Initial heuristics based on memory, release year, and OS version did not predict AV1 capability reliably. Meta built an ML-based device eligibility framework that ingests real-world performance statistics from their logging pipeline.

The model outputs an rtc_score that quantifies a device's AV1 capability. Iteration told the story: Model V1.1, rolled out in August 2025, broadened AV1 traffic, which generated a richer dataset for Model V2. That model introduced a two-tier approach separating higher-end and lower-end devices.
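To make the two-tier idea concrete, here is a minimal sketch of how a score could gate AV1. Only the rtc_score name and the existence of two tiers come from the source; the 0.0 to 1.0 scale, the cut-off values, and the tier names are hypothetical.

```python
from enum import Enum


class Av1Tier(Enum):
    NOT_ELIGIBLE = "not_eligible"
    LOWER_END = "lower_end"
    HIGHER_END = "higher_end"


# Hypothetical cut-offs on an assumed 0.0-1.0 scale; the source does not
# publish the real score range or thresholds.
LOWER_END_MIN_SCORE = 0.5
HIGHER_END_MIN_SCORE = 0.8


def classify_device(rtc_score: float) -> Av1Tier:
    """Map a device's rtc_score to an AV1 eligibility tier."""
    if not 0.0 <= rtc_score <= 1.0:
        raise ValueError(f"rtc_score out of range: {rtc_score}")
    if rtc_score >= HIGHER_END_MIN_SCORE:
        return Av1Tier.HIGHER_END
    if rtc_score >= LOWER_END_MIN_SCORE:
        return Av1Tier.LOWER_END
    return Av1Tier.NOT_ELIGIBLE
```

A client might then use the tier to pick its starting encoder preset, with lower-end devices beginning at the cheapest one.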

Even a 2023 octa-core smartphone struggled to encode at 320x180@15fps during calls - likely due to CPU throttling. So Meta added adaptive encoder preset adjustment, encoding latency-aware codec switching, and peer decoding latency monitoring. If a low-end phone can't encode AV1 in real time, it sends H.264 but can still receive AV1 from a high-end peer. This asymmetric design sharply increased AV1 coverage.
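The asymmetric design boils down to choosing the codec per direction rather than per call. The sketch below assumes each endpoint advertises two capability flags; the Endpoint type and function names are hypothetical, not part of any Meta API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    can_encode_av1: bool  # able to encode AV1 in real time
    can_decode_av1: bool  # able to decode AV1 in real time


def pick_send_codec(sender: Endpoint, receiver: Endpoint) -> str:
    """AV1 is used in one direction only if the sender can encode it
    and the receiver can decode it; otherwise fall back to H.264."""
    if sender.can_encode_av1 and receiver.can_decode_av1:
        return "AV1"
    return "H264"


def negotiate(a: Endpoint, b: Endpoint) -> dict:
    """Each direction of the call is decided independently."""
    return {
        "a_to_b": pick_send_codec(a, b),
        "b_to_a": pick_send_codec(b, a),
    }


low_end = Endpoint(can_encode_av1=False, can_decode_av1=True)
high_end = Endpoint(can_encode_av1=True, can_decode_av1=True)
print(negotiate(low_end, high_end))
```

In this example the low-end phone sends H.264 while still receiving AV1 from its peer, which is the case the article describes.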

Adaptive Codec Switching and Error Resilience for Real Networks

Rate control gets much harder when target bitrate and resolution change multiple times per call. Meta uses Video Buffering Verifier (VBV) delay as a metric, keeping it below 200 ms. They tuned the encoder to avoid both overshoot (which causes congestion) and undershoot (which misleads bandwidth estimation and stalls quality ramp-up).
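One way to read the VBV delay metric is as a leaky bucket: every encoded frame adds bits to a backlog, the network drains it at the target bitrate, and the delay is how long the current backlog would take to drain. The sketch below uses that simplified model, which is an assumption for illustration; only the 200 ms limit comes from the source.

```python
def vbv_delays_ms(frame_sizes_bits, target_bitrate_bps, fps):
    """Return the backlog delay in milliseconds right after each frame
    is added, under a simple leaky-bucket model."""
    drain_per_frame = target_bitrate_bps / fps  # bits sent per frame interval
    backlog_bits = 0.0
    delays = []
    for size in frame_sizes_bits:
        backlog_bits += size
        delays.append(1000.0 * backlog_bits / target_bitrate_bps)
        # The channel drains the backlog until the next frame arrives.
        backlog_bits = max(0.0, backlog_bits - drain_per_frame)
    return delays


VBV_LIMIT_MS = 200.0  # limit quoted in the article

# At 100 kbps and 15 fps, one oversized 30,000-bit frame breaks the limit,
# and the frame after it still sees most of that backlog.
delays = vbv_delays_ms([6000, 30000, 5000], 100_000, 15)
overshoot = [d > VBV_LIMIT_MS for d in delays]
```

This shows why overshoot is costly at low bitrates: a single large frame keeps the delay elevated for several frame intervals afterwards.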

AV1's Reference Picture Resampling (RPR) lets the encoder change resolution without sending a key frame, reducing bitrate spikes and video freezes. That's a concrete win over H.264.

For packet loss, Meta enables temporal layers adaptively - turning them on when loss rises and off when it subsides - and uses Long-Term Reference (LTR) frames. LTR lets the decoder resync from a previously acknowledged reference frame without forcing a full key frame. Meta piggybacks LTR feedback on proprietary RTP header extensions, tightly coordinating the encoder and the network layer.
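The adaptive temporal-layer toggle can be sketched as a switch with hysteresis plus a frame-to-layer pattern. The loss thresholds and the two-layer pattern below are assumptions for illustration; the source does not state which values or layer structure Meta uses.

```python
class TemporalLayerSwitch:
    """Turns a two-layer temporal pattern on and off based on packet loss."""

    def __init__(self, enable_above=0.05, disable_below=0.02):
        # Hypothetical thresholds, expressed as loss fractions.
        self.enable_above = enable_above
        self.disable_below = disable_below
        self.enabled = False

    def update(self, loss_fraction: float) -> bool:
        # Separate on/off thresholds avoid flapping around a single value.
        if not self.enabled and loss_fraction > self.enable_above:
            self.enabled = True
        elif self.enabled and loss_fraction < self.disable_below:
            self.enabled = False
        return self.enabled

    def layer_for_frame(self, frame_index: int) -> int:
        # With layers on, odd frames go to T1. No later frame references a
        # T1 frame, so losing one does not break decoding of what follows.
        if self.enabled and frame_index % 2 == 1:
            return 1
        return 0  # base layer T0
```

With layers off, every frame lands in the base layer, which avoids the compression cost of the layered structure when the network is clean.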

Proactive LTR requests when the sender detects packet loss reduce freezes without waiting for the receiver to complain. The result: recovery from packet loss without the key-frame spike that often causes a cascade of congestion.
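The sender-side bookkeeping behind LTR recovery can be sketched as follows. The class and method names are hypothetical, and the RTP header extension feedback is reduced to a plain on_ack call; the point is only that recovery needs a reference the receiver has confirmed.

```python
from typing import Optional


class LtrTracker:
    """Tracks which long-term reference (LTR) frames the receiver has
    acknowledged, so recovery can avoid a key frame when possible."""

    def __init__(self) -> None:
        self._pending = set()  # sent as LTR, not yet acknowledged
        self._newest_acked: Optional[int] = None

    def mark_ltr_sent(self, frame_id: int) -> None:
        self._pending.add(frame_id)

    def on_ack(self, frame_id: int) -> None:
        # Ignore acknowledgements for frames never marked as LTR.
        if frame_id in self._pending:
            self._pending.discard(frame_id)
            if self._newest_acked is None or frame_id > self._newest_acked:
                self._newest_acked = frame_id

    def plan_recovery(self) -> str:
        """Called when the sender detects loss, before the receiver asks."""
        if self._newest_acked is None:
            return "key_frame"  # nothing safe to predict from
        return f"predict_from_ltr:{self._newest_acked}"
```

Predicting only from an acknowledged frame is what makes the recovery safe: the sender knows the receiver holds that reference, so the next frame decodes even though the frames in between were lost.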

With software AV1 now covering most mobile devices, Meta is pushing hardware vendors to invest in AV1 silicon across all device tiers to unlock group calls (which require decoding multiple streams) and further power savings. The bitrate math is settled - now it's about the silicon.


Source: Adopting AV1 for Real-Time Communication (RTC) at Scale
Domain: engineering.fb.com
