Axon Superoptimizer Automates Tiling, Fusion, and Kernel Selection for Tensor Programs

Axon discovers and applies algebraic transformations without hand-crafted rewrite rules, using SMT over unbounded tensors to guarantee correctness.

Tags: axon, program synthesis, superoptimization, tensor programs, AI accelerators, SMT

Writing a high-performance kernel for an AI accelerator normally demands deep expertise in tiling, instruction selection, data layout, and operator fusion. Axon automates all of it using program synthesis and SMT-based verification, turning kernel authoring into a specification problem.

How Axon Works: Synthesis, SMT, and Algebraic Transformations

Axon takes a tensor program and a target ISA description. It uses program synthesis to generate target instructions directly from semantics specifications, so no hand-written rewrite rules are needed. It discovers algebraic transformations by propagating operators through the computation graph, and an SMT solver reasoning over unbounded tensors checks that every transformation preserves semantics. This approach covers transformations that would be tedious to encode manually and catches equivalence bugs automatically.
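To make the idea of propagating an operator through the graph concrete, here is a minimal Python sketch that pushes a transpose through a matrix product, rewriting transpose(A @ B) as transpose(B) @ transpose(A). It is not Axon's implementation. The function names are hypothetical, and randomized testing over several shapes stands in for the SMT check, so it can only find counterexamples and proves nothing about unbounded tensors.

```python
import random

def matmul(a, b):
    """Naive matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def original(a, b):
    # Graph before the rewrite: transpose applied to the product.
    return transpose(matmul(a, b))

def rewritten(a, b):
    # Graph after the rewrite: transpose pushed onto the operands.
    return matmul(transpose(b), transpose(a))

def random_matrix(rows, cols, rng):
    """Small integer matrix, so equality checks are exact."""
    return [[rng.randint(-9, 9) for _ in range(cols)] for _ in range(rows)]

def agree_on_samples(f, g, trials=200, seed=0):
    """Compare f and g on random inputs of varying shape (a test, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, k, n = (rng.randint(1, 5) for _ in range(3))
        a = random_matrix(m, k, rng)
        b = random_matrix(k, n, rng)
        if f(a, b) != g(a, b):
            return False
    return True

if __name__ == "__main__":
    print(agree_on_samples(original, rewritten))
```

A rewrite that forgets to swap the operands, such as matmul(transpose(a), transpose(b)), is rejected by the same check. That is the kind of equivalence bug a solver-backed verifier is meant to rule out for all shapes rather than for sampled ones.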

From Tensor Ops to Target ISA: Tiling, Fusion, and Empirical Search

Once the semantics are locked down, Axon lowers tensor operations to the target accelerator's instruction set. It explores tiling configurations constrained by hardware descriptions (memory size, compute units, data paths) and fuses both operators and instructions to minimize memory traffic. The synthesizer searches the space of equivalent program variants and empirically selects the fastest one on real hardware.
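The tiling search described above can be sketched in a few lines of Python. This is an illustrative assumption, not Axon's algorithm: the hardware description is reduced to one hypothetical parameter, scratchpad_elems, the candidate tile sizes are arbitrary, and timing on the host CPU stands in for measurement on a real accelerator.

```python
import time
from itertools import product

def tiled_matmul(a, b, tm, tn, tk):
    """Matrix product computed tile by tile with tile shape (tm, tn, tk)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tm):
        for j0 in range(0, n, tn):
            for k0 in range(0, k, tk):
                # Work on one (tm x tk) by (tk x tn) tile pair.
                for i in range(i0, min(i0 + tm, m)):
                    row_a, row_c = a[i], c[i]
                    for kk in range(k0, min(k0 + tk, k)):
                        a_ik, row_b = row_a[kk], b[kk]
                        for j in range(j0, min(j0 + tn, n)):
                            row_c[j] += a_ik * row_b[j]
    return c

def legal_tilings(sizes, scratchpad_elems):
    """Tile shapes whose A, B, and C tiles fit in the scratchpad together."""
    return [(tm, tn, tk) for tm, tn, tk in product(sizes, repeat=3)
            if tm * tk + tk * tn + tm * tn <= scratchpad_elems]

def time_once(a, b, tiling):
    """Wall-clock time of a single run with the given tiling."""
    start = time.perf_counter()
    tiled_matmul(a, b, *tiling)
    return time.perf_counter() - start

def pick_fastest(a, b, tilings, repeats=3):
    """Empirical selection: keep the tiling with the best observed time."""
    best = None
    for tiling in tilings:
        elapsed = min(time_once(a, b, tiling) for _ in range(repeats))
        if best is None or elapsed < best[1]:
            best = (tiling, elapsed)
    return best[0]
```

Every candidate computes the same result, so the choice among them is purely a performance question. The hardware constraint prunes the space before any measurement happens.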

Why This Matters for AI Accelerator Programming

Current practice forces human experts to juggle dozens of interdependent knobs. Axon replaces that manual search with an automated pipeline that verifies every transformation and surfaces the best-performing variant it finds for a given accelerator. The paper's focus on tile-based accelerators is pragmatic: these architectures are widely used for edge and datacenter inference, and their programming difficulty has been a bottleneck. Axon shifts the kernel author's role from manual optimization to specifying semantics, leaving the synthesizer to find the fastest verified variant.


Source: Axon: A Synthesizing Superoptimizer for Tensor Programs
Domain: arxiv.org
