
Dynamic MIG Scheduling Beats Static Partitioning by 68% on Energy-Tardiness Tradeoff

Reinforcement learning repartitions NVIDIA MIG slices throughout the day, cutting a combined energy-plus-tardiness objective by 68% compared to no partitioning and by 31% compared to static slices.

nvidia, multi instance gpu, reinforcement learning, job scheduling, energy efficiency, data centers

68% better than not partitioning at all. That's the headline from a new paper that applies reinforcement learning to dynamically repartition NVIDIA's Multi-Instance GPUs (MIGs) for AI/ML workloads, balancing energy consumption against how late jobs finish.

The Cost of Static Slices

NVIDIA's MIG lets you carve a single A100 or H100 into up to seven smaller, isolated GPU instances. Most operators pick a static partition and leave it, or maybe change it twice a day. Both approaches waste energy or hurt latency when workload patterns shift. The authors modeled a single MIG as a heterogeneous machine scheduling problem with preemption, then ran simulations with a diurnal workload trace pulled from real data center logs.
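To make the "heterogeneous machine" framing concrete, here is a minimal sketch that treats each MIG instance as a machine whose speed depends on its size. The profile names are NVIDIA's A100 40GB profiles, but the `fits` check only counts slices and ignores NVIDIA's placement rules, and the linear speed model in `machine_speeds` is an assumption made for illustration, not the paper's model.

```python
# Compute-slice and memory-slice counts per A100 40GB MIG profile.
PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}
MAX_COMPUTE_SLICES = 7
MAX_MEMORY_SLICES = 8


def fits(layout):
    """Return True if the layout stays within the GPU's slice budget.

    Simplification: only slice totals are checked. Real MIG also
    restricts where each instance may be placed on the GPU.
    """
    compute = sum(PROFILES[name][0] for name in layout)
    memory = sum(PROFILES[name][1] for name in layout)
    return compute <= MAX_COMPUTE_SLICES and memory <= MAX_MEMORY_SLICES


def machine_speeds(layout):
    """Relative speed of each instance, viewed as one 'machine'.

    Assumption: speed scales linearly with compute slices.
    """
    return [PROFILES[name][0] / MAX_COMPUTE_SLICES for name in layout]


if __name__ == "__main__":
    layout = ["3g.20gb", "2g.10gb", "2g.10gb"]
    print(fits(layout), machine_speeds(layout))
```

Each layout yields a different set of unequal machines, which is why the scheduling problem changes every time the GPU is repartitioned.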

Four Algorithms, One Winner

Four scheduling algorithms were compared on a combined objective of energy plus tardiness. The most promising one was fed into a reinforcement learning agent that decides when and how to repartition the MIG over a full day. Dynamic repartitioning outperformed twice-daily repartitioning by 26%, static partitioning by 31%, and no partitioning at all by 68%. Those are not cherry-picked peaks; they come from a multi-objective function that forces tradeoffs between power draw and deadline misses.
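As a rough illustration of how an energy-plus-tardiness score and a percentage improvement can be computed, consider the sketch below. The weighted-sum form, the `weight` parameter, and the units are assumptions; the article does not give the paper's exact formula.

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    completion: float  # time the job finished, in hours
    due: float         # time the job was due, in hours


def tardiness(job):
    """Lateness past the due time; a job that finishes early counts as zero."""
    return max(0.0, job.completion - job.due)


def objective(energy_kwh, jobs, weight=1.0):
    """Hypothetical combined score: energy plus weighted total tardiness."""
    return energy_kwh + weight * sum(tardiness(job) for job in jobs)


def improvement(baseline, candidate):
    """Percentage reduction of the candidate score relative to the baseline."""
    return 100.0 * (baseline - candidate) / baseline


if __name__ == "__main__":
    jobs = [JobResult(completion=5.0, due=4.0), JobResult(completion=3.0, due=6.0)]
    print(objective(10.0, jobs))
    print(improvement(100.0, 32.0))
```

Because both terms share one score, a policy cannot win by saving power while letting jobs run late, or the reverse.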

What That Means for Operators

A given MIG configuration is optimal only for a narrow range of queue conditions. The RL policy learns specific preferred slice layouts for different times of day and different job mixes. Instead of manual tuning or fixed schedules, data centers could deploy predictive, automatic reconfiguration that responds to actual load. The results come from simulation, but they point toward a practical scheduler hook: feed it current queue depth and time of day, let it pick a slice profile, and save power without blowing SLAs. Expect this pattern to show up in Kubernetes device plugins and cluster autoscalers within the next year.
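The shape of such a hook might look like the sketch below. The hand-written thresholds stand in for a learned policy and are purely hypothetical, as are the layout names and the `pick_layout` function; a real deployment would query the trained agent instead.

```python
# Hypothetical named layouts built from A100 40GB MIG profiles.
LAYOUTS = {
    "whole": ["7g.40gb"],
    "halves": ["4g.20gb", "3g.20gb"],
    "sevenths": ["1g.5gb"] * 7,
}


def pick_layout(queue_depth, hour):
    """Choose a layout name from queue depth and hour of day (0-23).

    Placeholder rules, not the paper's policy: deep queues get many
    small instances, and light overnight load gets one big instance.
    """
    if not 0 <= hour < 24:
        raise ValueError("hour must be in the range 0-23")
    if queue_depth < 0:
        raise ValueError("queue_depth must be non-negative")
    daytime = 8 <= hour < 20
    if queue_depth >= 6:
        return "sevenths"
    if queue_depth >= 2 or daytime:
        return "halves"
    return "whole"


if __name__ == "__main__":
    for depth, hour in [(0, 3), (1, 12), (10, 3)]:
        name = pick_layout(depth, hour)
        print(depth, hour, name, LAYOUTS[name])
```

Keeping the decision behind a single function means the placeholder rules can be swapped for a trained policy without touching the code that applies the layout.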


Source: Energy Efficient Scheduling of AI/ML Workloads on Multi Instance GPUs with Dynamic Repartitioning
Domain: arxiv.org
