ItoyoriFBC Gets Promise-Future Model, Scales 15.6x on 16 Nodes

Hierarchical LU factorization hits near-ideal scaling after ItoyoriFBC replaces its static future model with a promise-future synchronization scheme using MPI one-sided communication.

Tags: itoyorifbc, mpi, amt runtimes, promise future, hpc, hierarchical lu factorization

A 15.6x speedup on 16 nodes for Hierarchical LU factorization: that’s what a promise-future model buys you over a static future-only design in an asynchronous many-task (AMT) runtime.

Why Static Futures Break Dynamic Algorithms

The ItoyoriFBC AMT runtime originally used a future-only model where each future is bound to its producer at creation time, and the number of tasks that read each future must be fixed at compile time. That’s fine for static DAGs. For algorithms like Hierarchical LU factorization (HLU) that create dependencies on the fly, it’s a straitjacket. You cannot express the adaptive task graph without either over-provisioning or deadlocking.

Promise-Future Model Unlocks HLU

Researchers extended ItoyoriFBC with a promise-future synchronization layer built on MPI one-sided communication. The separation of promise (write-side) and future (read-side) decouples producer binding from task creation. Now a task can propagate a promise handle before the value is computed, and any number of consumers can attach later at runtime. This lifts the compile-time constraint and makes dynamic algorithms expressible without contortions.
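To make the promise/future split concrete, here is a minimal shared-memory sketch in Python. It is an analogy only: it uses threads and an event rather than MPI one-sided communication, and the names `Promise`, `Future`, `set_value`, `get_future`, and `run_demo` are hypothetical, not ItoyoriFBC's API. It shows the two properties described above: the promise handle is shared before any producer is chosen, and the number of consumers is decided at runtime.

```python
import threading


class Promise:
    """Write side: fulfilled exactly once by whichever task produces the value."""

    def __init__(self):
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def set_value(self, value):
        with self._lock:
            if self._ready.is_set():
                raise RuntimeError("promise already fulfilled")
            self._value = value
            self._ready.set()

    def get_future(self):
        # Any number of futures may be created, before or after fulfilment.
        return Future(self)


class Future:
    """Read side: blocks until the associated promise is fulfilled."""

    def __init__(self, promise):
        self._promise = promise

    def get(self, timeout=None):
        if not self._promise._ready.wait(timeout):
            raise TimeoutError("promise not fulfilled in time")
        return self._promise._value


def run_demo(num_consumers):
    promise = Promise()
    results = []
    results_lock = threading.Lock()

    def consumer(future):
        value = future.get(timeout=5)
        with results_lock:
            results.append(value * 2)

    # Consumers attach at runtime; their count is not fixed in advance.
    consumers = [
        threading.Thread(target=consumer, args=(promise.get_future(),))
        for _ in range(num_consumers)
    ]
    for t in consumers:
        t.start()

    # The producer is started only after the promise handle was shared.
    producer = threading.Thread(target=promise.set_value, args=(21,))
    producer.start()

    for t in consumers + [producer]:
        t.join()
    return results
```

Calling `run_demo(3)` starts three consumers that all block on the same promise and each receive the value once the producer fulfils it. In a cluster runtime the blocking wait and the value transfer would go through remote memory operations instead of a local event, which this sketch does not attempt to model.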

15.6x Speedup With Near-Ideal Scaling

Evaluating the new model on HLU across up to 16 nodes, the team observed near-ideal scaling: a 15.6x speedup over the single-node baseline. That’s 97.5% parallel efficiency. The MPI one-sided communication (RMA) keeps synchronization overhead low compared to traditional two-sided messaging, which matters when tasks are fine-grained and dependencies are irregular.
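The efficiency figure follows from the usual definition of parallel efficiency as speedup divided by the number of processing units, here nodes. The helper below is a small illustrative sketch; the function name `parallel_efficiency` is an assumption made for this example.

```python
def parallel_efficiency(speedup, nodes):
    """Return parallel efficiency: measured speedup divided by node count."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


# 15.6x speedup on 16 nodes, as reported for HLU.
efficiency = parallel_efficiency(15.6, 16)
print(f"{efficiency:.1%}")  # prints 97.5%
```

An efficiency of 1.0 would mean perfectly linear scaling, so 0.975 leaves only 2.5% of the ideal speedup lost to communication, synchronization, and load imbalance combined.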

The promise-future variant of ItoyoriFBC is a concrete answer to the question “how do you make AMT runtimes work for non-embarrassingly-parallel workloads on clusters?” Expect this pattern to show up in other runtimes that need to handle dynamic task graphs at scale.

Source: Promise-Future Synchronization for Cluster Asynchronous Many-Task Runtimes via MPI One-Sided Communication
Domain: arxiv.org
