KATANA Maps Kalman Filters onto Edge NPUs, Slashing Energy by 97.9%

A new algorithm maps Linear and Extended Kalman Filters onto commercial NPUs, achieving 223 FPS at 13.4 W and cutting dynamic energy by 97.9% versus CPU execution.

The authors of KATANA have shown that Kalman filters can run on a neural processing unit, cutting dynamic energy by up to 97.9% versus CPU execution. That changes the power-budget math for drones, radar systems, and autonomous vehicles that rely on real-time state estimation.

NPUs Aren't Just for Neural Networks Anymore

Intel's Core Ultra Series 1 and 2 integrate a low-power Neural Processing Unit (NPU) designed for data-parallel matrix workloads. Until now, Kalman filter implementations have run on CPUs (serial, energy-hungry) or on custom FPGAs/ASICs (long design cycles). The authors present KATANA as the first end-to-end mapping of both the Linear and Extended Kalman Filters onto a commercial NPU, using the DPU matrix engine for 100% of operations.

The team applied three algebraic graph rewrites to make the Kalman filter NPU-friendly: a subtract-to-add reformulation via a precomputed negative-projection matrix ($H_{\text{neg}}$), static-shape tensor fusion, and block-diagonal batched parallelization. These rewrites transform the filter's sequential recursion into a batched, static-shape computation that fits the NPU's SIMD-like execution model.
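To make two of these rewrites concrete, here is a minimal sketch in plain Python. It is not code from the paper: the helper names, the toy constant-velocity model, and the three-track batch are all assumptions chosen for illustration. It shows the innovation $y = z - Hx$ computed as $y = z + H_{\text{neg}}x$ with $H_{\text{neg}} = -H$ precomputed, and then the same step for several independent tracks folded into one block-diagonal matrix multiply. Static-shape tensor fusion is not shown, since the article does not describe it in detail.

```python
# Illustrative sketch only: helper names and the toy model are assumptions,
# not code from the KATANA paper.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    """Element-wise sum of two same-shape matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def block_diag(block, copies):
    """Place `copies` copies of `block` along the diagonal, zeros elsewhere."""
    rows, cols = len(block), len(block[0])
    out = [[0.0] * (cols * copies) for _ in range(rows * copies)]
    for c in range(copies):
        for i in range(rows):
            for j in range(cols):
                out[c * rows + i][c * cols + j] = block[i][j]
    return out

# Toy model (assumed): state = [position, velocity], only position is measured.
H = [[1.0, 0.0]]
H_neg = [[-h for h in row] for row in H]  # negated once, ahead of time

def innovation_add_only(z, x):
    """Innovation y = z - H x, written as y = z + H_neg x (matmul and add only)."""
    return add(z, matmul(H_neg, x))

def batched_innovation(z_all, x_all, n_tracks):
    """Innovations for n_tracks independent filters in a single matmul."""
    h_neg_big = block_diag(H_neg, n_tracks)  # shape is fixed for a fixed batch
    return add(z_all, matmul(h_neg_big, x_all))

# Three tracks stacked into one column vector: [p, v] per track.
x_all = [[0.0], [1.0], [5.0], [-1.0], [10.0], [0.5]]
z_all = [[0.2], [4.5], [10.3]]
y_all = batched_innovation(z_all, x_all, 3)
print(y_all)
```

Because the zero blocks keep the tracks from mixing, each entry of `y_all` matches what `innovation_add_only` returns for that track alone. The batch just trades extra multiplications by zero for one large, fixed-shape operation of the kind a matrix engine handles well.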

Real Numbers, Real Hardware

On the Intel Core Ultra Series 2 NPU, the optimized batched EKF hits 223.35 FPS at 13.43 W active power, and the LKF reaches 408.73 FPS at 14.05 W. Against the CPU baseline, the KATANA mapping delivers up to a 97.9% reduction in dynamic energy. These are not theoretical projections; they are measured figures on shipping silicon.
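For a rough sense of scale, the quoted figures can be turned into energy per frame by dividing power by frame rate. The short Python sketch below does that arithmetic. It assumes the active-power figures are averages over the run, which the article does not state, and the paper's dynamic-energy metric may be defined differently. Under that assumption the EKF works out to roughly 60 mJ per frame and the LKF to roughly 34 mJ. Separately, a 97.9% reduction leaves 2.1% of the baseline, so the CPU baseline would use about 48 times the dynamic energy for the same work.

```python
# Figures quoted in the article (Intel Core Ultra Series 2 NPU).
ekf_fps, ekf_watts = 223.35, 13.43
lkf_fps, lkf_watts = 408.73, 14.05

def joules_per_frame(watts, fps):
    """Average energy per frame: power (J/s) divided by rate (frames/s)."""
    return watts / fps

# Assumption: the active-power figures are averages over the whole run.
ekf_mj = joules_per_frame(ekf_watts, ekf_fps) * 1000
lkf_mj = joules_per_frame(lkf_watts, lkf_fps) * 1000

# A 97.9% reduction leaves 2.1% of the baseline energy,
# so the baseline is 1 / 0.021 times larger.
reduction = 0.979
baseline_ratio = 1 / (1 - reduction)

print(f"EKF: {ekf_mj:.1f} mJ/frame, LKF: {lkf_mj:.1f} mJ/frame")
print(f"CPU baseline: about {baseline_ratio:.1f}x the dynamic energy")
```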

In practice, multi-object tracking updates that previously serialized across CPU cores can now run in parallel on the NPU, leaving the CPU and GPU free for primary workloads like perception or control logic. For battery-constrained edge platforms, the power saved translates directly into longer mission duration or higher tracking throughput.

What This Unlocks Next

The KATANA framework targets the Intel AI-PC SoC line, but the algebraic rewrite approach generalizes to any NPU with a matrix engine. The next step: extending the mapping to the Unscented Kalman Filter and exploring adaptive precision quantization to squeeze even more FPS from the same power envelope.


Source: KATANA: A Fast, Low-Power Mapping of Kalman Filters onto Edge NPUs for Real-Time Tracking
Domain: arxiv.org
