
Missing Memory-Clock State Multiplies Edge Inference Miss Rates on Jetson Orin

A deployed DVFS governor on a Jetson Orin NX misses 25-28% of deadlines when blind to the memory clock; an EMC-aware model cuts misses to at most 1.3% under a 2% QoS budget.

Tags: jetson orin, nvidia, dvfs, edge ml, governor, memory clock

Under tight deadlines, a deployed DVFS governor on a Jetson Orin NX misses 25-28% of its inference cycles because its latency model never accounts for the memory clock. That blind spot is the whole problem.

The blind spot in CPU-GPU governor models

Frequency-aware latency estimators let deadline-aware governors schedule edge ML inference by modeling latency as a function of CPU and GPU clocks. It is a clean abstraction, but it leaves out the memory clock (EMC), a piece of deployment state that determines whether a governor meets its deadlines and at what energy cost. The authors measured this on a real Orin NX: a governor driven by a GPU-only latency fit, blind to EMC, misses 25-28% of cycles at tight deadlines. An EMC-aware refit holds misses to at most 1.3% under a 2% QoS miss budget while still selecting the energy-minimal feasible clock configuration for periodic vision tasks. That is not a marginal improvement: it is the difference between a usable edge system and one that routinely violates its service-level objectives.
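The failure mode can be reproduced in miniature. The sketch below is a toy, not the authors' governor: the operating-point labels, latencies, and energies are invented, and both governors share the same energy numbers so that only the latency model differs. Each one picks the lowest-energy point whose *predicted* latency meets the deadline; the EMC-blind model (a hypothetical `emc_blind_latency`) predicts latency from CPU and GPU levels alone, using values fitted with EMC at its high point.

```python
from typing import Callable, Dict, Optional, Tuple

# Operating point = (cpu_level, gpu_level, emc_level). Labels and numbers are
# hypothetical, chosen only to illustrate the failure mode.
OpPoint = Tuple[str, str, str]

# "Measured" per-inference (latency_ms, energy_mj) at each locked operating point.
MEASURED: Dict[OpPoint, Tuple[float, float]] = {
    ("cpu_hi", "gpu_hi", "emc_hi"): (18.0, 90.0),
    ("cpu_hi", "gpu_hi", "emc_lo"): (26.0, 78.0),
    ("cpu_hi", "gpu_mid", "emc_hi"): (22.0, 80.0),
    ("cpu_hi", "gpu_mid", "emc_lo"): (31.0, 72.0),
}

def emc_blind_latency(op: OpPoint) -> float:
    """CPU x GPU model: keyed on (cpu, gpu) only, fitted with EMC at its high point."""
    cpu, gpu, _ = op
    return MEASURED[(cpu, gpu, "emc_hi")][0]

def emc_aware_latency(op: OpPoint) -> float:
    """Per-lockable-point table: the EMC level is part of the key."""
    return MEASURED[op][0]

def pick(latency_model: Callable[[OpPoint], float], deadline_ms: float) -> Optional[OpPoint]:
    """Lowest-energy operating point whose predicted latency meets the deadline."""
    feasible = [op for op in MEASURED if latency_model(op) <= deadline_ms]
    return min(feasible, key=lambda op: MEASURED[op][1]) if feasible else None

deadline = 25.0
blind = pick(emc_blind_latency, deadline)
aware = pick(emc_aware_latency, deadline)
print(blind, MEASURED[blind][0] <= deadline)  # ('cpu_hi', 'gpu_mid', 'emc_lo') False
print(aware, MEASURED[aware][0] <= deadline)  # ('cpu_hi', 'gpu_mid', 'emc_hi') True
```

With a 25 ms deadline, the blind governor believes the low-EMC point is feasible and picks it because it is cheapest, yet its measured latency is 31 ms. The EMC-aware table rules that point out and settles on the cheapest point that actually fits.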

EMC-aware refit cuts misses from 28% to 1.3%

The failure generalizes across three workload classes: MobileNetV2 (a classic CNN), a Vision Transformer (ViT), and Qwen2.5 LLM token decode. A CPU×GPU estimator steers the deployed governor to an operating point that cannot meet the deadline; only an EMC-aware model finds the feasible side of the energy frontier. In the LLM case, with decode saturated, the EMC-aware policy is actually lower-energy than the infeasible point the blind governor picks.

The effect is real and lies outside the CPU×GPU abstraction. On two Orin SKUs that share the same lockable EMC points, moving between those points shifts median latency by up to ~45%; the pattern replicates on both SKUs and survives a fused TensorRT fp16 engine. CPU×GPU models cannot absorb this variation, so latency tables need an entry for each lockable EMC point.
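A per-lockable-point table can be built directly from locked-clock measurements. This hypothetical sketch assumes latency samples have already been collected with CPU, GPU, and EMC each locked; it takes the median per point and then, for each CPU×GPU pair, reports the spread of medians across EMC points. Normalizing by the fastest point is an assumption made here; the source does not say how its ~45% figure is computed, and the sample values are invented.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# (cpu_level, gpu_level, emc_level); labels and numbers are hypothetical.
OpPoint = Tuple[str, str, str]

def build_table(samples: Dict[OpPoint, List[float]]) -> Dict[OpPoint, float]:
    """Median latency (ms) for each locked CPU/GPU/EMC operating point."""
    return {op: median(lats) for op, lats in samples.items()}

def emc_spread(table: Dict[OpPoint, float]) -> Dict[Tuple[str, str], float]:
    """For each CPU x GPU pair: (slowest - fastest) / fastest median across EMC points."""
    by_pair: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for (cpu, gpu, _emc), med in table.items():
        by_pair[(cpu, gpu)].append(med)
    return {pair: (max(m) - min(m)) / min(m) for pair, m in by_pair.items()}

samples = {
    ("cpu_hi", "gpu_mid", "emc_hi"): [14.0, 15.0, 16.0],
    ("cpu_hi", "gpu_mid", "emc_lo"): [20.0, 21.0, 22.0],
}
table = build_table(samples)
spread = emc_spread(table)
print(spread)  # {('cpu_hi', 'gpu_mid'): 0.4}
```

Any estimator keyed only on `(cpu, gpu)` has to collapse the 15 ms and 21 ms medians into a single value, discarding exactly the information the EMC dimension carries.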

What this means for edge deployment

Clustered misses make aggregate QoS rates understate deployment risk. A governor that meets 98% of deadlines on average could still drop 10 frames in a row while the memory clock sits at a low frequency. The authors release their harness so others can reproduce the failure on their own hardware. The work complements, rather than rebuts, state-of-the-art CPU×GPU estimators within their scope, but it draws a hard line: if your edge inference governor ignores the memory clock, you risk missing roughly a quarter of your deadlines. Fix that abstraction hole before shipping.
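The gap between an aggregate rate and burst length is easy to check on a latency trace. This minimal sketch uses a synthetic trace rather than measured data and reports both the miss rate and the longest run of consecutive misses; the 33.3 ms deadline (roughly 30 fps) and the 40 ms stall latency are assumptions chosen for illustration.

```python
from typing import Sequence, Tuple

def miss_stats(latencies_ms: Sequence[float], deadline_ms: float) -> Tuple[float, int]:
    """Return (aggregate miss rate, longest run of consecutive deadline misses)."""
    if not latencies_ms:
        raise ValueError("empty latency trace")
    misses = run = longest = 0
    for lat in latencies_ms:
        if lat > deadline_ms:
            misses += 1
            run += 1                     # extend the current burst of misses
            longest = max(longest, run)
        else:
            run = 0                      # an on-time frame ends the burst
    return misses / len(latencies_ms), longest

# Synthetic 500-frame trace: ten back-to-back slow frames, e.g. while EMC sits low.
trace = [10.0] * 245 + [40.0] * 10 + [10.0] * 245
rate, longest = miss_stats(trace, deadline_ms=33.3)
print(rate, longest)  # 0.02 10
```

The synthetic trace sits exactly at a 2% miss rate, within a 2% budget, yet all ten misses land back to back. Reporting the longest run alongside the rate exposes that risk.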


Source: Edge-Inference Governors Need Memory-Clock State
Domain: arxiv.org
