Terastal Slashes DNN Deadline Misses 40% Using Customized Layer Variants

A new scheduling framework cuts the per-model deadline miss rate by up to 40.58% on heterogeneous DNN accelerators while sacrificing only 2.24% accuracy.

terastal, heterogeneous accelerators, real time scheduling, multi dnn, layer variants

A 40.58% reduction in deadline miss rate per model: that’s what Terastal delivers over first-come, first-served (FCFS) scheduling on heterogeneous DNN accelerators, and it costs just 2.24% in average accuracy.

Why Layer Gaps Kill Real-Time Scheduling

Heterogeneous DNN accelerators (GPU, NPU, TPU, etc.) let you map each layer to its preferred accelerator, cutting latency. But when workloads skew—one accelerator runs hot while another sits idle—the latency difference between accelerators for the same layer becomes a bottleneck. Large gaps limit scheduling flexibility: you either wait for the fast accelerator or take a latency hit on the slow one. More deadline misses follow.
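
To make the trade-off concrete, here is a minimal sketch with made-up numbers. The accelerator names, latencies, queue times and the 8 ms deadline are all illustrative assumptions, not figures from the Terastal paper. It simply computes when a single layer would finish on each accelerator, given how long that accelerator's queue stays busy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    busy_until_ms: float     # hypothetical: when the current queue drains
    layer_latency_ms: float  # hypothetical: latency of this layer on this device


def finish_time_ms(acc: Accelerator, now_ms: float) -> float:
    """Completion time if the layer is queued on `acc` right now."""
    start = max(now_ms, acc.busy_until_ms)
    return start + acc.layer_latency_ms


# Illustrative numbers only: the NPU is the preferred (fast) device but is
# backed up; the GPU is idle but runs this layer five times slower.
npu = Accelerator("npu", busy_until_ms=9.0, layer_latency_ms=2.0)
gpu = Accelerator("gpu", busy_until_ms=0.0, layer_latency_ms=10.0)
deadline_ms = 8.0

finish = {acc.name: finish_time_ms(acc, now_ms=0.0) for acc in (npu, gpu)}
misses = {name: t > deadline_ms for name, t in finish.items()}
print(finish)  # {'npu': 11.0, 'gpu': 10.0}
print(misses)  # {'npu': True, 'gpu': True}
```

With an 8 ms latency gap between the two devices, waiting and falling back both miss the deadline. The scheduler has no good move, however clever it is.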

Terastal’s authors recognized this and attacked the root cause: the layer itself.

Terastal's Two-Pronged Attack: Variants and Budgets

Instead of treating layers as fixed, Terastal introduces layer variants—customized implementations of a layer tuned to run acceptably well on non-preferred accelerators. A variant might trade a smidge of accuracy for a much smaller latency gap. Offline, Terastal assigns virtual budgets per accelerator using heterogeneity-aware analysis, then designs variants that stay within those budgets. Online, it jointly schedules accelerator mapping and variant selection under timing constraints, picking the variant that meets deadlines with minimal accuracy loss.
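
The online step can be pictured roughly as follows. This is a sketch under stated assumptions, not Terastal's actual algorithm: the `Option` type, the `pick_option` function, the tie-breaking rule and every number are hypothetical, and the offline budget analysis is reduced to a pre-built list of options. It picks, among the (accelerator, variant) pairs that finish before the deadline, the one with the smallest accuracy loss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    accelerator: str
    variant: str          # "original" or a hypothetical variant name
    latency_ms: float     # assumed latency of this variant on this device
    accuracy_loss: float  # assumed loss relative to the original layer


def pick_option(options, free_at_ms, now_ms, deadline_ms):
    """Choose an (accelerator, variant) pair for one layer.

    Prefer options that meet the deadline, minimising accuracy loss and
    breaking ties by earliest finish. If nothing meets the deadline,
    fall back to the earliest finish (an assumed policy).
    """
    def finish(opt):
        return max(now_ms, free_at_ms[opt.accelerator]) + opt.latency_ms

    feasible = [o for o in options if finish(o) <= deadline_ms]
    if feasible:
        return min(feasible, key=lambda o: (o.accuracy_loss, finish(o)))
    return min(options, key=finish)


# Illustrative options for one layer: the variant narrows the GPU's latency
# gap from 8 ms to 2 ms at a small, made-up accuracy cost.
options = [
    Option("npu", "original", latency_ms=2.0, accuracy_loss=0.0),
    Option("gpu", "original", latency_ms=10.0, accuracy_loss=0.0),
    Option("gpu", "variant_a", latency_ms=4.0, accuracy_loss=0.5),
]

# NPU backed up until t = 9 ms: only the GPU variant meets an 8 ms deadline.
busy = pick_option(options, {"npu": 9.0, "gpu": 0.0}, now_ms=0.0, deadline_ms=8.0)
# NPU idle: the original layer on the NPU meets the deadline with no loss.
idle = pick_option(options, {"npu": 0.0, "gpu": 0.0}, now_ms=0.0, deadline_ms=8.0)
print(busy.accelerator, busy.variant)  # gpu variant_a
print(idle.accelerator, idle.variant)  # npu original
```

The point of the sketch is the ordering of concerns: the deadline acts as a hard filter, and accuracy is only traded away when the preferred accelerator cannot deliver in time.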

The Numbers: 40% Fewer Misses for 2% Accuracy

Compared to three baselines—FCFS, EDF, and DREAM—Terastal reduces deadline miss rate per model by 40.58%, 30.53%, and 36.27% respectively. The accuracy penalty across all models using variants averages only 2.24%. That's a trade I'd take every time for a soft real-time system where a missed deadline means a dropped frame or a stalled inference pipeline.

This isn't a theoretical toy. The framework couples offline design with online scheduling in a way that directly addresses the pain of real multi-tenancy on heterogeneous hardware. Next time you're staring at underutilized NPUs while your GPU queues back up, remember: the problem might not be your scheduler—it might be your layers.


Source: Terastal: Layer-Variant-based Scheduling for Real-Time Multi-DNN Workloads on Heterogeneous Accelerators
Domain: arxiv.org
