69% Smaller, 72% Cooler: An Approximate FP Multiplier for SRAM Compute-in-Memory

An approximate mantissa-segmentation multiplier for SRAM DCiM cuts logic area by 69% and power by 72%, with negligible accuracy loss on ResNet-18

Tags: SRAM, compute-in-memory, floating-point multiplier, approximate computing, OpenACM, ResNet-18

A 69% reduction in logic area and a 72% cut in power: that is what a mantissa-segmentation approximate multiplier delivers over exact IEEE 754 floating-point units in SRAM-based digital compute-in-memory (DCiM) arrays. Post-layout results show no delay overhead, and ResNet-18 inference suffers negligible accuracy degradation.

The Problem With Floating Point in DCiM

Digital compute-in-memory (DCiM) cuts the data-movement tax that kills edge AI efficiency. But most DCiM frameworks stick to integer or fixed-point arithmetic, because jamming full IEEE 754 floating-point units into dense SRAM arrays blows up area and power budgets. The authors, whose code lives in the ShenShan123/OpenACM repository, quantify the overhead: the exact FP multiplier is the baseline against which the 69% area and 72% power savings are measured.

Mantissa Segmentation as a Practical Knob

Their solution trades mantissa precision for hardware cost. Rather than rounding or truncating the whole mantissa indiscriminately, they split it into segments and approximate the product using a smaller lookup table plus a few adders. The result is configurable: you dial the accuracy knob to match what the application tolerates. For image-processing tasks and ResNet-18 inference, the accuracy drop is negligible.
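To make the accuracy knob concrete, here is a minimal Python sketch of the general idea, not the authors' circuit. It assumes the simplest possible segmentation: keep only the leading `keep_bits` bits of each 24-bit significand (hidden one included), multiply those short segments, and add the exponents. There is no lookup table here, and the names `split_significand`, `approx_mul`, and `keep_bits` are hypothetical, chosen only for illustration.

```python
import math

SIG_BITS = 24  # float32 significand width, including the hidden leading one


def split_significand(x: float, keep_bits: int):
    """Return (segment, exponent) with abs(x) ~= segment * 2**exponent.

    Only the top `keep_bits` bits of the 24-bit significand are kept;
    the lower bits are simply dropped (an illustrative assumption).
    """
    m, e = math.frexp(abs(x))          # abs(x) = m * 2**e, with 0.5 <= m < 1
    sig = int(m * (1 << SIG_BITS))     # significand as a 24-bit integer
    seg = sig >> (SIG_BITS - keep_bits)
    return seg, e - keep_bits


def approx_mul(a: float, b: float, keep_bits: int = 8) -> float:
    """Approximate a * b using short significand segments.

    A small keep_bits means a narrower integer multiplier (cheaper
    hardware) and a larger error. Handles zero explicitly; infinities
    and NaNs are out of scope for this sketch.
    """
    if not 1 <= keep_bits <= SIG_BITS:
        raise ValueError("keep_bits must be between 1 and 24")
    if a == 0.0 or b == 0.0:
        return 0.0
    seg_a, exp_a = split_significand(a, keep_bits)
    seg_b, exp_b = split_significand(b, keep_bits)
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    # keep_bits x keep_bits integer product, then rescale by the summed exponents
    return sign * math.ldexp(seg_a * seg_b, exp_a + exp_b)


if __name__ == "__main__":
    a, b = 3.14159, 2.71828
    for k in (4, 8, 12, 24):
        approx = approx_mul(a, b, k)
        rel_err = abs(a * b - approx) / abs(a * b)
        print(f"keep_bits={k:2d}  approx={approx:.6f}  rel_err={rel_err:.2e}")
```

Under this truncation scheme each operand is underestimated by a relative error below 2^-(keep_bits-1), so the product's relative error stays below 2^(2-keep_bits), about 1.6% at `keep_bits=8`. A real design would likely recover some of the dropped bits with cheap correction logic, which is presumably where the paper's lookup table and adders come in.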

No delay is added — critical for keeping the DCiM macro cycle time intact. The exact IEEE 754 baseline exists in the same OpenACM framework for comparison, so you can mix exact and approximate multipliers on the same chip.

What This Enables

This makes compiler-integrated approximate floating-point a viable path for SRAM DCiM systems. Edge devices that need float32 or bfloat16 for quantization-aware training or fine-tuning can now get it at a fraction of the area and power cost. The multiplier is available on GitHub under ShenShan123/OpenACM, so you can wire it into your DCiM compiler flow today.

Approximate computing gets a concrete win backed by post-layout results: 72% power savings with no added delay. That's the kind of tradeoff that makes edge AI architects rethink their arithmetic units.


Source: Accuracy-Configurable Floating-Point Multiplier Design for SRAM-Based Compute-in-Memory
Domain: arxiv.org
