
MADAR Processor Abolishes Addresses, Uses Orbiting Rings for Data Flow

MADAR replaces register-file and cache addressing entirely with a compile-time-scheduled ring hierarchy, keeping per-operation energy flat as AI matrix multiplies scale.

Tags: MADAR, processor architecture, address-free, compile-time scheduling, AI acceleration, energy efficiency

Over half of a modern processor's area and energy goes to addressing -- moving operands between register files and caches, and running the tag checks, ports, miss queues, and bypass networks that exist just to find where a value was left. MADAR says toss all that machinery and abolish the address entirely.

All State Circulates in Rings; Instructions Ride the Same Slots

MADAR's core idea: every piece of state -- instructions, operands, results -- rides in rings of slots that advance one position per clock. A value is named by its place in an orbit, a (ring, period) coordinate, not by an address. A fixed computation station sits beside each ring; an instruction executes when it sweeps past its operands, on a schedule fixed at compile time. No tag compares, no register-index decode, no bypass network. The ring period determines how often a slot comes back around, and the rings form a hierarchy that replaces the cache hierarchy: shorter-period rings act like L1, longer-period rings act like slower memory. Movement between rings is scheduled, not triggered by a miss.
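To make the latency side of that analogy concrete, here is a minimal Python sketch under stated assumptions: one station per ring at slot 0, slots advance one position per clock, and a value is usable only on the cycle it passes the station. The `Ring` class and its methods are hypothetical names for illustration, not the paper's model. Under those assumptions a ring of period P holds P values and makes a value wait at most P - 1 cycles, (P - 1)/2 on average. That is the same capacity-versus-latency trade a cache hierarchy makes.

```python
# Toy model of a ring-period hierarchy.
# ASSUMPTIONS (illustrative, not from the paper): one station per ring at
# slot 0, every slot advances one position per clock, and a value is usable
# only on the cycle its slot sits at the station.
from dataclasses import dataclass


@dataclass
class Ring:
    period: int  # number of slots == clocks for one full revolution

    def position_at(self, start_slot: int, cycle: int) -> int:
        """Slot position of a value that started at start_slot, after `cycle` clocks."""
        return (start_slot + cycle) % self.period

    def wait_for_station(self, start_slot: int, now: int) -> int:
        """Clocks from `now` until the value next reaches the station (slot 0)."""
        return (-self.position_at(start_slot, now)) % self.period

    def worst_case_wait(self) -> int:
        return self.period - 1

    def mean_wait(self) -> float:
        # Averaged uniformly over all P starting slots: (0 + 1 + ... + P-1) / P
        return (self.period - 1) / 2


if __name__ == "__main__":
    # Hypothetical three-level hierarchy: short ring ~ "L1", long ring ~ "memory".
    for name, p in [("short", 8), ("medium", 64), ("long", 1024)]:
        r = Ring(p)
        print(f"{name:>6}: capacity={p:5d} slots, "
              f"worst wait={r.worst_case_wait():5d}, mean wait={r.mean_wait():7.1f}")
```

Capacity and worst-case wait both grow linearly with the period, so the short ring is fast but small and the long ring is large but slow. The actual periods MADAR uses are not given here, and the 8/64/1024 values above are arbitrary.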

Compilable and Verified: Cycle-Accurate RTL and a Scheduler

This isn't a hand-wavy dream. The MADAR authors define the execution model formally, implement a cycle-accurate register-transfer-level model, and build a constructive scheduler whose emitted programs are cross-checked against that implementation. They also cost the design out with a first-order energy model. The authors present the result as a machine that combines circulating-store, dataflow, and statically scheduled techniques in a way no prior design has.
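The kind of problem a compile-time scheduler solves here can be sketched in a few lines. Assume, for illustration only, that two operand rings of periods `p_a` and `p_b` feed one station at slot 0 and that an operation can fire only on a cycle when both operands are there. A value starting at slot s reaches the station on cycles t ≡ -s (mod p), so finding a firing cycle is a small congruence problem that can be settled before the program runs. `schedule_meet` solves it constructively and `simulate_meet` checks it cycle by cycle, a toy stand-in for the paper's cross-check against RTL. Both functions are hypothetical and are not MADAR's actual scheduler.

```python
# Hypothetical compile-time scheduler for two operands on two rings.
# ASSUMPTIONS (illustrative, not MADAR's algorithm): both rings feed one
# station at slot 0, each advances one slot per clock, and an operation can
# fire only on a cycle when both operands sit at the station.
from math import gcd
from typing import Optional


def first_arrival(start_slot: int, period: int) -> int:
    """Earliest cycle at which a value starting at start_slot reaches slot 0."""
    return (-start_slot) % period


def schedule_meet(a_slot: int, p_a: int, b_slot: int, p_b: int) -> Optional[int]:
    """Constructively find the earliest cycle both operands are at the station.

    Operand a is at the station on cycles t_a, t_a + p_a, t_a + 2*p_a, ...
    The sequence (t mod p_b) repeats after p_b // gcd(p_a, p_b) steps, so if
    no meeting occurs in that window, none ever will.
    """
    t_a = first_arrival(a_slot, p_a)
    t_b = first_arrival(b_slot, p_b)
    for k in range(p_b // gcd(p_a, p_b)):
        t = t_a + k * p_a
        if t % p_b == t_b:
            return t
    return None  # no common cycle: one operand must be placed elsewhere


def simulate_meet(a_slot: int, p_a: int, b_slot: int, p_b: int,
                  max_cycles: int) -> Optional[int]:
    """Brute-force cycle-by-cycle check of the same question."""
    a, b = a_slot, b_slot
    for t in range(max_cycles):
        if a == 0 and b == 0:
            return t
        a = (a + 1) % p_a  # every slot advances one position per clock
        b = (b + 1) % p_b
    return None


if __name__ == "__main__":
    # Cross-check scheduler against simulator for every placement on two
    # small rings (periods chosen arbitrarily; they share a factor of 2).
    p_a, p_b = 6, 4
    for a0 in range(p_a):
        for b0 in range(p_b):
            assert schedule_meet(a0, p_a, b0, p_b) == simulate_meet(a0, p_a, b0, p_b, p_a * p_b)
    print("scheduler agrees with simulation")
```

When the periods share a factor, some placements never line up and the scheduler returns `None`; in this toy setting that simply means the compiler would have to choose a different slot for one operand.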

AI Acceleration Gets Flat Energy Per MAC as Reduction Grows

The payoff shows up most clearly in matrix multiply and convolution. The multiply-accumulate at the heart of both compiles into a streaming form in which energy per operation stays flat as the reduction dimension grows. The operand reuse that makes matmul efficient is carried by the ring-period hierarchy -- the memory hierarchy does by rotation what a cache does by tags. As a result, the energy overhead of delivering operands to a long accumulation doesn't ramp up with problem size.
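Here is a minimal Python sketch of that streaming form, under loudly stated assumptions. Row `A[i]` and column `B[:, j]` each sit in a ring (a `collections.deque`) whose slot 0 faces the station. Both rings advance one slot per cycle, and the station keeps a local accumulator. `streaming_dot` and `streaming_matmul` are hypothetical names. The event counter is a toy counting argument, not the paper's first-order energy model. What it illustrates is the shape of the claim: each MAC costs one multiply-add plus a fixed two slot advances, however large the reduction dimension K gets, and no address decode or tag check appears anywhere.

```python
# Toy streaming matmul: operands rotate past a fixed station.
# ASSUMPTIONS (illustrative only): each operand vector lives in its own ring,
# slot 0 faces the station, rings advance one slot per cycle, and the station
# holds a local accumulator. Event counts are a counting argument, not an
# energy model.
import random
from collections import Counter, deque


def streaming_dot(a_row, b_col):
    """Accumulate sum(a[k] * b[k]) as both rings rotate past the station."""
    assert len(a_row) == len(b_col)
    a_ring, b_ring = deque(a_row), deque(b_col)  # slot 0 sits at the station
    acc = 0
    events = Counter()
    for _ in range(len(a_ring)):        # one full revolution = K cycles
        acc += a_ring[0] * b_ring[0]    # MAC on the operands now at the station
        events["mac"] += 1
        a_ring.rotate(-1)               # advance every slot by one position
        b_ring.rotate(-1)
        events["ring_advance"] += 2
    return acc, events


def streaming_matmul(A, B):
    """C = A @ B, computing each output with one streaming dot product."""
    n, K, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    total = Counter()
    for i in range(n):
        for j in range(m):
            col = [B[k][j] for k in range(K)]
            C[i][j], ev = streaming_dot(A[i], col)
            total += ev
    return C, total


if __name__ == "__main__":
    random.seed(0)
    for K in (4, 64, 1024):
        A = [[random.randint(-3, 3) for _ in range(K)] for _ in range(2)]
        B = [[random.randint(-3, 3) for _ in range(3)] for _ in range(K)]
        C, ev = streaming_matmul(A, B)
        ref = [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(3)]
               for i in range(2)]
        assert C == ref
        print(f"K={K:5d}: ring advances per MAC = {ev['ring_advance'] / ev['mac']}")
```

The per-MAC ratio printed for every K is constant by construction. The sketch only shows where that flatness comes from in a rotation-based layout. It does not model what an addressed design would pay, and it does not reproduce any figure from the paper.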

What This Enables

MADAR is a new design point for any computation whose data movement is known before the program runs. That covers most linear algebra, signal processing, and AI inference. The next move is to see how far the compile-time scheduling constraint bites on irregular workloads, and whether the ring hierarchy can match cache performance on general-purpose code. If the energy and area savings hold up in silicon, addressing might finally be the part of a processor we stop building.


Source: MADAR: An Address-Free Processor
Domain: arxiv.org
