RAMS Dynamically Switches YOLOv8 Tiers to Cut Latency 5.6x on the Embedded Edge

RAMS cuts inference latency on Jetson Orin with TensorRT from a fixed 19 ms to 3.41 ms under heavy load. It does this by swapping YOLOv8 model tiers on the fly, without reloading weights. That is 5.6x faster, and it still retains 74% of the proxy accuracy.
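The headline speedup follows directly from the two quoted latencies. The frame-rate figures are only a rough illustration and assume back-to-back inference with no other per-frame overhead, which the article does not claim.

```latex
\text{speedup} = \frac{19\ \text{ms}}{3.41\ \text{ms}} \approx 5.57 \approx 5.6\times,
\qquad
\frac{1000\ \text{ms/s}}{19\ \text{ms}} \approx 52.6\ \text{fps}
\;\longrightarrow\;
\frac{1000\ \text{ms/s}}{3.41\ \text{ms}} \approx 293\ \text{fps}
```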

Why Model Switching on Edge Hardware is Hard

Edge object detection runs under constant resource pressure: CPU contention, competition for memory bandwidth, and thermal throttling. A lightweight model runs fast but misses vulnerable road users (VRUs). A heavy model takes 19 ms per frame, so the system drops frames when it is busy. RAMS (Resource-Adaptive and Detection-Conditioned Model Switching) tackles this with a runtime controller. The controller reads device pressure, calibrates its switching thresholds during idle periods, and flips between three YOLOv8 tiers (NANO at 320 px, SMALL at 416 px, MEDIUM at 640 px). The switch itself carries no latency penalty.
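To make the idea of idle-time calibration plus pressure-driven switching concrete, here is a minimal sketch. It is not the RAMS controller, whose equations the article does not reproduce. The names `ThresholdController`, `budget_ms`, and `slowdown` are hypothetical. The sketch assumes that pressure can be approximated as the ratio of current latency to idle latency, and that a tier's idle latency scaled by that ratio predicts its latency under load.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from statistics import median


class Tier(IntEnum):
    # Input resolutions from the article, ordered smallest to largest.
    NANO = 320
    SMALL = 416
    MEDIUM = 640


@dataclass
class ThresholdController:
    """Hypothetical pressure-threshold controller (not the RAMS equations)."""

    budget_ms: float  # assumed per-frame latency budget
    idle_latency_ms: dict[Tier, float] | None = None  # filled by calibrate()

    def calibrate(self, samples: dict[Tier, list[float]]) -> None:
        # During an idle period, record the median latency of every tier.
        self.idle_latency_ms = {tier: median(v) for tier, v in samples.items()}

    def select(self, slowdown: float) -> Tier:
        # slowdown = current latency / idle latency, a crude pressure proxy.
        # Return the largest tier whose scaled idle latency fits the budget.
        assert self.idle_latency_ms is not None, "call calibrate() first"
        for tier in sorted(Tier, reverse=True):  # MEDIUM, SMALL, NANO
            if self.idle_latency_ms[tier] * slowdown <= self.budget_ms:
                return tier
        # Nothing fits: fall back to the cheapest tier instead of stalling.
        return Tier.NANO
```

For example, with a 20 ms budget and idle medians of roughly 3, 6, and 12 ms, `select(1.0)` returns `MEDIUM` and `select(2.0)` returns `SMALL`. Because all three models are assumed to stay resident, a switch amounts to choosing which preloaded model handles the next frame.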

Safety-Conditioned Policies Beat Fixed Thresholds

RAMS defines five switching policies, and the clever ones are detection-conditioned: once the system detects a VRU, it refuses to downgrade to the smallest tier for a set window. That simple rule prevents the accuracy collapse a pure threshold-based policy suffers when resources get tight. The same controller equations hold across Raspberry Pi 5, x86 laptops, and Jetson Orin (both ONNX and TensorRT), a 37x range in latency. On Jetson Orin with TensorRT under heavy load, the safety2 policy averages 3.41 ms per frame and steps up to SMALL or MEDIUM only when VRUs are present.
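The "no downgrade to the smallest tier after a VRU" rule can be sketched as a wrapper around any base policy's proposed tier. This is an illustration under stated assumptions, not the paper's safety2 policy. `SafetyHoldPolicy`, `hold_frames`, and `floor` are hypothetical names, and the window length of 30 frames is an assumed placeholder, since the article does not give one.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Input resolutions from the article, ordered smallest to largest.
    NANO = 320
    SMALL = 416
    MEDIUM = 640


class SafetyHoldPolicy:
    """Hypothetical detection-conditioned wrapper: after a VRU detection,
    keep the tier at or above `floor` for `hold_frames` frames, counting
    the detection frame itself."""

    def __init__(self, hold_frames: int = 30, floor: Tier = Tier.SMALL):
        self.hold_frames = hold_frames  # assumed window length
        self.floor = floor              # lowest tier allowed while holding
        self.frames_left = 0            # frames remaining in the hold window

    def apply(self, proposed: Tier, vru_detected: bool) -> Tier:
        # A new VRU detection (re)starts the hold window.
        if vru_detected:
            self.frames_left = self.hold_frames
        if self.frames_left > 0:
            self.frames_left -= 1
            # Upgrades pass through; downgrades below the floor are blocked.
            return max(proposed, self.floor)
        return proposed
```

Usage, with a base policy that keeps proposing NANO under heavy load:

```python
policy = SafetyHoldPolicy(hold_frames=3)
trace = [policy.apply(Tier.NANO, v).name for v in [False, True, False, False, False]]
# ['NANO', 'SMALL', 'SMALL', 'SMALL', 'NANO']
```

Keeping the rule as a wrapper means the resource-driven choice is still honored whenever it already sits above the floor. The safety condition only clips the downside.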

The Metric That Separates Strategy From Detector Luck

The authors also introduce the VRU-Weighted Accuracy Score (SWAS), a scalar metric for comparing switching policies without ground-truth labels. SWAS comes in two flavors, an oracle-bounded version and a detector-derived version. The gap between them shows how much of the apparent benefit comes from genuine tier selection rather than from the circularity of scoring the detector against its own outputs. Under heavy load, detection-conditioned switching improves SWAS by 25.4% (oracle) and 47.3% (detector-derived) relative to threshold-only policies. Live KITTI evaluation puts per-tier VRU recall at 24.2%, 41.2%, and 59.0%. In other words, even reactive overrides are fundamentally capped by what the baseline detector can actually see.

RAMS shows that intelligent tier selection on the edge is less about chasing every frame and more about knowing when to hold onto the heavy model.

Source: RAMS: Resource-Adaptive and Detection-Conditioned Model Switching for Embedded Edge Perception
Domain: arxiv.org
