
SupraSNN: Superscalar-Inspired SNN Cuts Latency 47% and Energy 5.6x

A hardware-software co-design framework treats synaptic events as parallel micro-ops, achieving 149 µs inference on MNIST at 0.025 mJ per image on a Xilinx Zynq FPGA.

Tags: spiking neural networks, FPGA accelerators, Xilinx Zynq, hardware-software co-design, synapse-level parallelism, SupraSNN

47.6% lower latency and 5.6× better energy efficiency than prior FPGA-based SNN accelerators: those are the headline numbers SupraSNN delivers on a Xilinx Zynq XC7Z020 FPGA. Not bad for a brain-inspired class of network that usually gets bogged down in serial synapse processing.

Treating Synapses as Micro-Ops

SupraSNN borrows the superscalar playbook from 1990s CPU design. Instead of dispatching instructions to parallel functional units, it dispatches synaptic events to multiple Synapse Processing Units via a Multi-Cast Tree. A Merge Tree collects distributed results and feeds them to a centralized Neuron Unit. The key insight: complex neuron state dynamics stay in one place to avoid hardware duplication, while the synapses—the real compute hogs—run in parallel.
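The dataflow described above can be sketched in software. This is a minimal illustration, not the paper's implementation: the class and function names, the leaky integrate-and-fire update, and the threshold and leak values are all assumptions made for the example. It shows the split the article describes, with synapses sharded across several parallel units and neuron state kept in one place.

```python
class SynapseProcessingUnit:
    """Hypothetical SPU: owns one shard of (pre, post, weight) synapses."""

    def __init__(self, synapses):
        self.synapses = list(synapses)

    def process(self, spikes):
        # Accumulate weights only for synapses whose pre-neuron spiked.
        partial = {}
        for pre, post, weight in self.synapses:
            if pre in spikes:
                partial[post] = partial.get(post, 0.0) + weight
        return partial


def multicast(spikes, units):
    """Stand-in for the Multi-Cast Tree: every unit sees the same spike set."""
    return [unit.process(spikes) for unit in units]


def merge_tree(partials):
    """Stand-in for the Merge Tree: pairwise reduction of partial sums."""
    partials = list(partials)
    while len(partials) > 1:
        merged = []
        for i in range(0, len(partials), 2):
            left = dict(partials[i])
            if i + 1 < len(partials):
                for post, value in partials[i + 1].items():
                    left[post] = left.get(post, 0.0) + value
            merged.append(left)
        partials = merged
    return partials[0] if partials else {}


class NeuronUnit:
    """Single centralized unit holding all membrane state (assumed LIF model)."""

    def __init__(self, num_neurons, threshold=1.0, leak=0.5):
        self.v = [0.0] * num_neurons
        self.threshold = threshold
        self.leak = leak

    def step(self, currents):
        fired = set()
        for i in range(len(self.v)):
            self.v[i] = self.v[i] * self.leak + currents.get(i, 0.0)
            if self.v[i] >= self.threshold:
                fired.add(i)
                self.v[i] = 0.0  # reset after a spike
        return fired


# Two SPUs, three input neurons, two output neurons (toy values).
units = [
    SynapseProcessingUnit([(0, 0, 0.5), (1, 1, 0.25)]),
    SynapseProcessingUnit([(1, 0, 0.75), (2, 1, 2.0)]),
]
neurons = NeuronUnit(num_neurons=2, threshold=1.0, leak=0.5)
currents = merge_tree(multicast({0, 1}, units))
fired = neurons.step(currents)
print(currents, fired)
```

In hardware the SPUs would run concurrently and the merge would be a pipelined adder tree; the sequential loop here only mirrors the data movement.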

A co-optimized mapping and scheduling framework first partitions the SNN under memory constraints, then heuristically orders synaptic execution to keep the pipelines full. No hand-tuning required; the toolchain does the heavy lifting.
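To make the two stages concrete, here is a toy sketch of what "partition under memory constraints, then order heuristically" could look like. It is not the paper's algorithm. The least-loaded greedy partition, the per-unit `capacity` limit counted in synapses, and the round-robin interleave across post-neurons (chosen so that back-to-back entries rarely target the same accumulator) are all assumptions picked for illustration.

```python
from collections import defaultdict, deque


def partition(synapses, num_units, capacity):
    """Greedy mapping: give each synapse to the least-loaded unit.

    Raises ValueError if every unit's (assumed) synapse memory is full.
    """
    units = [[] for _ in range(num_units)]
    for syn in synapses:
        target = min(units, key=len)
        if len(target) >= capacity:
            raise ValueError("network does not fit in the available synapse memory")
        target.append(syn)
    return units


def interleave_by_target(unit_synapses):
    """Heuristic schedule: rotate across post-neurons within one unit.

    Each synapse is a (pre, post, weight) tuple; index 1 is the target.
    """
    queues = defaultdict(deque)
    for syn in unit_synapses:
        queues[syn[1]].append(syn)

    order = []
    pending = deque(queues.values())
    while pending:
        queue = pending.popleft()
        order.append(queue.popleft())
        if queue:
            pending.append(queue)  # revisit this target after the others
    return order


# Toy network: 3 inputs fully connected to 2 outputs, mapped onto 2 units.
synapses = [(pre, post, 1.0) for post in range(2) for pre in range(3)]
mapped = partition(synapses, num_units=2, capacity=3)
schedule = [interleave_by_target(unit) for unit in mapped]
print(schedule)
```

A real toolchain would presumably weigh spike activity and memory layout as well; the point here is only the shape of the pipeline, with a capacity-checked mapping pass followed by a per-unit ordering pass.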

Measured Performance: 149 µs and 0.025 mJ

On a feedforward SNN trained on MNIST (93.44% accuracy), SupraSNN delivers 149 µs inference latency and 0.025 mJ per image. That works out to 0.276 nJ per synaptic event. The implementation targets the Xilinx Zynq XC7Z020, a mid-range FPGA common in edge deployments. The reported gains over prior FPGA accelerators come from the synapse-level parallelism this architecture exposes.
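The two energy figures can be cross-checked with a line of arithmetic. Dividing energy per image by energy per synaptic event gives the implied number of synaptic events per MNIST inference, roughly 90,000. This is a back-of-envelope derivation from the numbers quoted above, not a figure reported in the source, and the variable names are illustrative.

```python
# Figures quoted in the article, converted to joules.
energy_per_image_j = 0.025e-3   # 0.025 mJ per image
energy_per_event_j = 0.276e-9   # 0.276 nJ per synaptic event

# Implied event count (derived here, not reported by the source).
events_per_image = energy_per_image_j / energy_per_event_j
print(f"Implied synaptic events per image: {events_per_image:,.0f}")
```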

Beyond MNIST: Recurrent SNN on Heidelberg Dataset

SupraSNN doesn't just handle static images. A recurrent SNN on the Spiking Heidelberg Dataset (71.82% accuracy) achieves 1.41 ms latency and 0.77 mJ per sample on the larger XC7Z030 FPGA. That's still under 1 mJ per inference—a sweet spot for always-on sensor processing and neuromorphic edge applications.
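For a rough sense of the power envelope, energy per inference can be divided by latency. The sketch below does that for both reported workloads. It assumes the reported energy covers exactly the reported latency window, which the source may not guarantee, so treat the results as order-of-magnitude estimates rather than measured power. The helper name is illustrative.

```python
def average_power_w(energy_j, latency_s):
    """Average power over one inference, assuming energy spans the full latency."""
    return energy_j / latency_s


# Figures quoted in the article, converted to joules and seconds.
mnist_power_w = average_power_w(0.025e-3, 149e-6)   # feedforward SNN on XC7Z020
shd_power_w = average_power_w(0.77e-3, 1.41e-3)     # recurrent SNN on XC7Z030

print(f"MNIST: ~{mnist_power_w * 1e3:.0f} mW")
print(f"Spiking Heidelberg: ~{shd_power_w * 1e3:.0f} mW")
```

Under that assumption both workloads land well under a watt, which is consistent with the always-on framing above.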

The architecture is a concrete reminder that the right hardware-software co-design can make SNNs practical where they've been theoretical. Expect to see synapse-level parallelism become a standard trick in future accelerator tape-outs.


Source: SupraSNN: Exploiting Synapse-Level Parallelism in Spiking Neural Network Accelerators through Co-Optimized Mapping and Scheduling
Domain: arxiv.org
