
SupraSNN: Superscalar-Inspired SNN Cuts Latency by 47% and Energy by 5.6x

A hardware-software co-design framework treats synaptic events as parallel micro-ops, achieving 149 µs inference on MNIST at 0.025 mJ per image on a Xilinx Zynq FPGA

spiking neural networks, FPGA accelerators, Xilinx Zynq, hardware-software co-design, synapse-level parallelism, SupraSNN

SupraSNN's headline numbers on a Xilinx Zynq XC7Z020 FPGA: 47.6% lower latency and 5.6× better energy efficiency than prior FPGA-based SNN accelerators. Not bad for a brain-inspired network that usually gets bogged down in serial synapse processing.

Treating Synapses as Micro-Ops

SupraSNN borrows the superscalar playbook from 1990s CPU design. Where a superscalar CPU dispatches instructions to parallel functional units, SupraSNN dispatches synaptic events to multiple Synapse Processing Units via a Multi-Cast Tree. A Merge Tree collects the distributed results and feeds them to a centralized Neuron Unit. The key insight: complex neuron state dynamics stay in one place to avoid hardware duplication, while the synapses, the real compute hogs, run in parallel.
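To make the dataflow concrete, here is a minimal software sketch of one timestep: spikes are broadcast to several synapse units, their partial currents are reduced pairwise, and a single neuron unit updates state. The class names, the weight layout, and the leaky integrate-and-fire update (with its `threshold` and `leak` values) are all assumptions for illustration; the article does not describe the neuron model or the hardware interfaces at this level.

```python
class SynapseUnit:
    """One parallel unit holding a slice of the weight table (hypothetical layout)."""

    def __init__(self, weights):
        # weights: {pre_id: [(post_id, weight), ...]} for the synapses stored here
        self.weights = weights

    def process(self, spikes):
        """Accumulate partial input currents for the spikes seen this timestep."""
        partial = {}
        for pre in spikes:
            for post, w in self.weights.get(pre, []):
                partial[post] = partial.get(post, 0.0) + w
        return partial


def merge_tree(partials):
    """Reduce partial-current dicts pairwise, level by level, like a binary tree."""
    level = list(partials)
    while len(level) > 1:
        merged_level = []
        for i in range(0, len(level), 2):
            merged = dict(level[i])
            for other in level[i + 1:i + 2]:  # right sibling, if there is one
                for post, cur in other.items():
                    merged[post] = merged.get(post, 0.0) + cur
            merged_level.append(merged)
        level = merged_level
    return level[0] if level else {}


class NeuronUnit:
    """Single, centralized holder of neuron state (leaky integrate-and-fire assumed)."""

    def __init__(self, n_neurons, threshold=1.0, leak=0.9):
        self.v = [0.0] * n_neurons
        self.threshold = threshold
        self.leak = leak

    def step(self, currents):
        fired = []
        for i in range(len(self.v)):
            self.v[i] = self.v[i] * self.leak + currents.get(i, 0.0)
            if self.v[i] >= self.threshold:
                fired.append(i)
                self.v[i] = 0.0  # reset after a spike
        return fired


def timestep(spikes, units, neuron_unit):
    # Multi-cast: every synapse unit sees the same spike list.
    partials = [unit.process(spikes) for unit in units]
    return neuron_unit.step(merge_tree(partials))


# Two synapse units, each holding half of a 2x2 weight matrix.
units = [
    SynapseUnit({0: [(0, 0.6)], 1: [(1, 0.3)]}),
    SynapseUnit({0: [(1, 0.2)], 1: [(0, 0.5)]}),
]
neurons = NeuronUnit(n_neurons=2)
fired = timestep([0, 1], units, neurons)
print(fired)  # neuron 0 receives 0.6 + 0.5 and crosses the threshold
```

The point of the split shows up in the structure: adding more `SynapseUnit` instances adds parallel work, while the membrane state lives only in `NeuronUnit`.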

A co-optimized mapping and scheduling framework first partitions the SNN under memory constraints, then heuristically orders synaptic execution to keep the pipelines full. No hand-tuning required; the toolchain does the heavy lifting.
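The two-stage idea (partition under a memory limit, then order the work) can be sketched as follows. This is not the paper's algorithm: the greedy least-loaded placement and the round-robin interleaving by target neuron are stand-in heuristics chosen for illustration, and `partition`, `schedule`, and the `capacity` parameter are hypothetical names.

```python
from collections import defaultdict, deque


def partition(synapses, n_units, capacity):
    """Greedy memory-constrained placement of (pre, post) synapses onto units.

    Synapses sharing a presynaptic neuron stay together, and each group goes
    to the currently least-loaded unit. `capacity` is the assumed number of
    synapses one unit's memory can hold.
    """
    by_pre = defaultdict(list)
    for pre, post in synapses:
        by_pre[pre].append((pre, post))

    units = [[] for _ in range(n_units)]
    # Place the largest fan-out groups first.
    for group in sorted(by_pre.values(), key=len, reverse=True):
        target = min(units, key=len)
        if len(target) + len(group) > capacity:
            # If the least-loaded unit cannot take it, no unit can.
            raise ValueError("network does not fit in the assumed per-unit memory")
        target.extend(group)
    return units


def schedule(unit_synapses):
    """Round-robin over target neurons so that back-to-back operations
    update different neurons whenever the unit has more than one target."""
    queues = defaultdict(deque)
    for syn in unit_synapses:
        queues[syn[1]].append(syn)

    pending = deque(queues.values())
    order = []
    while pending:
        q = pending.popleft()
        order.append(q.popleft())
        if q:
            pending.append(q)
    return order


synapses = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
placed = partition(synapses, n_units=2, capacity=3)
orders = [schedule(u) for u in placed]
print(orders)
```

A real toolchain would co-optimize the two stages against the actual pipeline depth and memory banks; the sketch only shows where each decision sits.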

Measured Performance: 149 µs and 0.025 mJ

On a feedforward SNN trained on MNIST (93.44% accuracy), SupraSNN delivers 149 µs inference latency at 0.025 mJ per image, which works out to 0.276 nJ per synaptic event. The implementation targets the Xilinx Zynq XC7Z020, a mid-range FPGA common in edge deployments. The advantage over prior FPGA accelerators (47.6% lower latency, 5.6× better energy efficiency) comes from the synapse-level parallelism this architecture exposes.
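The two energy figures can be cross-checked against each other: dividing energy per image by energy per synaptic event gives the implied number of synaptic events per inference. The sketch below does only that arithmetic; because 0.025 mJ is a rounded figure, the result is an estimate and not a number reported in the source.

```python
def implied_event_count(energy_per_image_j, energy_per_event_j):
    """Synaptic events per inference implied by the two reported energies."""
    return energy_per_image_j / energy_per_event_j


events = implied_event_count(
    energy_per_image_j=0.025e-3,  # 0.025 mJ per image (reported)
    energy_per_event_j=0.276e-9,  # 0.276 nJ per synaptic event (reported)
)
print(f"about {events:,.0f} synaptic events per image")
```

That comes to roughly 9 × 10^4 events per MNIST image, under the assumption that both figures refer to the same measurement.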

Beyond MNIST: Recurrent SNN on Heidelberg Dataset

SupraSNN is not limited to static images. A recurrent SNN on the Spiking Heidelberg Dataset (71.82% accuracy) achieves 1.41 ms latency and 0.77 mJ per sample on the larger XC7Z030 FPGA. That is still under 1 mJ per inference, a sweet spot for always-on sensor processing and neuromorphic edge applications.
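For a sense of scale, energy divided by latency gives the average power drawn during one inference. The sketch below applies that to both reported workloads. It assumes each energy figure covers exactly the interval measured by the latency figure, which the article does not state, so treat the results as rough estimates.

```python
def average_power_w(energy_j, latency_s):
    """Average power over one inference, assuming energy and latency
    describe the same interval."""
    return energy_j / latency_s


mnist_w = average_power_w(energy_j=0.025e-3, latency_s=149e-6)  # XC7Z020
shd_w = average_power_w(energy_j=0.77e-3, latency_s=1.41e-3)    # XC7Z030

print(f"MNIST on XC7Z020: {mnist_w * 1e3:.0f} mW")
print(f"SHD on XC7Z030:   {shd_w * 1e3:.0f} mW")
```

Under that assumption, the feedforward MNIST network averages roughly 168 mW and the recurrent Heidelberg network roughly 546 mW.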

The architecture is a concrete reminder that the right hardware-software co-design can make SNNs practical where they've been theoretical. Expect to see synapse-level parallelism become a standard trick in future accelerator tape-outs.


Source: SupraSNN: Exploiting Synapse-Level Parallelism in Spiking Neural Network Accelerators through Co-Optimized Mapping and Scheduling
Domain: arxiv.org
