Weak-Skipping Router Cuts Edge Latency 30% While Beating Cloud Model Accuracy

A new routing estimator for edge-cloud inference runs at just 0.153 GFLOPs, about 29 times lighter than the weak detector it lets the system bypass. It is effective enough that, at some operating points, the combined system beats the strong cloud model's peak mAP by 1.7 percentage points while using far less compute.

The Wasteful Default in Edge-Cloud Pipelines

Every edge-cloud collaboration I've seen places the routing estimator after the weak detector. The weak model runs, burning latency and power, and only then does the router decide whether to send the frame to the cloud. If the answer is yes, that weak forward pass was wasted. The authors call this "weak-conditioned" placement, and it's the status quo.

When the offload budget is high (most frames go to the cloud), that waste really hurts. The team measured per-frame latency penalties of up to 19.1 ms, roughly 30% extra, at an offload rate of rho=0.9 (90% of frames sent to the cloud) on PASCAL VOC. That is time spent on weak forward passes whose output is thrown away.

A 29x Lighter Router That Actually Works Better

The fix is brutally simple: skip the weak detector entirely on frames that will end up in the cloud. The authors propose a "weak-skipping" estimator that routes directly from raw pixels. At 0.153 GFLOPs, it's roughly 29 times cheaper than the 4.49 GFLOP weak detector. And it outperforms the weak-conditioned baselines across most of the operating curve.
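To make the compute argument concrete, here is a minimal back-of-the-envelope sketch of expected edge-side GFLOPs per frame under each placement. The two cost constants come from the article; everything else is an assumption for illustration: the function names are hypothetical, cloud compute is ignored, and the cost of the weak-conditioned router is not given in the article, so it defaults to zero as a lower bound.

```python
WEAK_GFLOPS = 4.49     # weak detector cost per frame (from the article)
ROUTER_GFLOPS = 0.153  # weak-skipping router cost per frame (from the article)


def _check_rho(rho: float) -> None:
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be a fraction between 0 and 1")


def edge_gflops_weak_conditioned(rho: float, router_gflops: float = 0.0) -> float:
    """Weak detector runs on every frame, then the router decides.

    Edge cost does not depend on rho. router_gflops is unknown here,
    so 0.0 gives a lower bound (assumption).
    """
    _check_rho(rho)
    return WEAK_GFLOPS + router_gflops


def edge_gflops_weak_skipping(rho: float) -> float:
    """Router runs on every frame; the weak detector runs only on the
    (1 - rho) fraction of frames that stay on the edge."""
    _check_rho(rho)
    return ROUTER_GFLOPS + (1.0 - rho) * WEAK_GFLOPS


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(
            f"rho={rho:.1f}  "
            f"conditioned>={edge_gflops_weak_conditioned(rho):.3f}  "
            f"skipping={edge_gflops_weak_skipping(rho):.3f}"
        )
```

Under these assumptions, skipping costs about 0.602 GFLOPs per frame on the edge at rho=0.9 versus at least 4.49 for weak-conditioned placement, while at rho=0 skipping is slightly more expensive because the router is pure overhead.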

But here's the subtle twist: neither weak-skipping nor weak-conditioned placement dominates everywhere. At low offload budgets (most frames stay local), the weak-conditioned approach comes out ahead. At high budgets, skipping wins. The solution: budget-adaptive routing, which selects between the two placements using two offline-tuned thresholds.
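The difference between the two placements is easiest to see as control flow. The sketch below is not the authors' implementation: the article does not say how the two thresholds are used, so this assumes one score threshold per placement, and it assumes a higher router score means the frame benefits more from the cloud. All names (`route_frame`, `skip_router`, `cond_router`, and so on) are hypothetical.

```python
from typing import Any, Callable, Dict


def route_frame(
    frame: Any,
    placement: str,                           # "skip" or "conditioned"
    thresholds: Dict[str, float],             # assumed: one threshold per placement
    skip_router: Callable[[Any], float],      # scores raw pixels
    cond_router: Callable[[Any, Any], float], # scores frame plus weak output
    weak_detector: Callable[[Any], Any],
    cloud_detector: Callable[[Any], Any],
) -> Any:
    """Return detections for one frame under the chosen placement."""
    if placement == "skip":
        # Router sees only the raw frame; the weak detector runs
        # only if the frame stays on the edge.
        if skip_router(frame) >= thresholds["skip"]:
            return cloud_detector(frame)
        return weak_detector(frame)

    if placement == "conditioned":
        # Weak detector always runs first; its output is discarded
        # whenever the frame is offloaded.
        weak_out = weak_detector(frame)
        if cond_router(frame, weak_out) >= thresholds["conditioned"]:
            return cloud_detector(frame)
        return weak_out

    raise ValueError(f"unknown placement: {placement!r}")
```

A budget-adaptive wrapper would pick `placement` from the target offload rate before calling this function, for example from a lookup table built offline.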

Budget-Adaptive Switching Beats Both Fixed Strategies

On PASCAL VOC, the budget-adaptive router traces the upper accuracy envelope of both fixed placements across the full operating range. That means you get the best of both worlds without tuning per deployment. The latency savings are real, too: skipping the weak pass on offloaded frames cuts per-frame latency by up to 19.1 ms (about 30% at rho=0.9).
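"Tracing the upper envelope" can be read as a simple offline selection: at each offload budget, keep whichever placement scored higher on validation data. The sketch below illustrates that reading only; it is an assumption about how such a table could be built, not the paper's procedure, and `upper_envelope` is a hypothetical helper. Ties go to the weak-conditioned placement, which is an arbitrary choice.

```python
from typing import List, Sequence, Tuple


def upper_envelope(
    budgets: Sequence[float],
    map_skip: Sequence[float],
    map_cond: Sequence[float],
) -> List[Tuple[float, str, float]]:
    """For each budget rho, pick the placement with the higher mAP.

    Inputs are mAP values for each fixed placement, measured at the
    same budgets. Returns (rho, chosen_placement, mAP) triples.
    """
    if not (len(budgets) == len(map_skip) == len(map_cond)):
        raise ValueError("all inputs must have the same length")

    envelope = []
    for rho, m_skip, m_cond in zip(budgets, map_skip, map_cond):
        if m_skip > m_cond:
            envelope.append((rho, "skip", m_skip))
        else:
            # Ties fall back to weak-conditioned (arbitrary choice).
            envelope.append((rho, "conditioned", m_cond))
    return envelope
```

Fed with validation curves for the two fixed placements, the result doubles as a per-budget lookup table that a deployed router could consult at run time.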

At some operating points, the system is surprisingly stronger than the strong model itself: +1.7 pp over the strong model's peak mAP, with far less compute. The practical consequence is that a lighter edge model paired with smart routing can sometimes outperform sending every frame to the heavier cloud model.

The full implementation and artifacts are on GitHub for anyone who wants to run the numbers. This isn't a theoretical toy; it's a practical knob for cutting latency at a given offload budget.

Source: Budget-Adaptive Routing: Skipping the Weak When the Strong Answers Anyway
Domain: arxiv.org
