A new reverse-engineering report documents the Apple Neural Engine's internals across every chip generation from the A11 to the A18 and M1 to M5, including a direct user-space invocation path that Apple never documented.
What the ANE Actually Does and How It's Structured
The Apple Neural Engine is a fixed-function matrix accelerator shipped in every iPhone, iPad, and Mac SoC since the A11. The report reverse-engineers the datapath and roofline that bound its throughput and energy, compiling per-chip target tables and an operation-by-device matrix. Direct measurements come from M1 and M5 hardware; claims are labeled as measured, decompile-derived, or predicted. The analysis covers the private runtime, compiler, kernel driver, firmware, and the on-disk program format, including the weight-compression scheme.
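To make the roofline idea concrete, here is a minimal sketch of the standard roofline bound: attainable throughput is the lesser of the compute peak and memory bandwidth multiplied by arithmetic intensity. The function names and the peak and bandwidth figures are illustrative placeholders, not values from the report, and the FP16 traffic estimate assumes each operand is moved exactly once.

```python
def attainable_tflops(peak_tflops, bandwidth_gbs, flops, bytes_moved):
    """Standard roofline bound: min(compute peak, bandwidth * intensity)."""
    intensity = flops / bytes_moved                      # FLOP per byte
    memory_bound = bandwidth_gbs * intensity / 1000.0    # GFLOP/s -> TFLOP/s
    return min(peak_tflops, memory_bound)


def matmul_cost_fp16(m, n, k):
    """FLOPs and bytes for an (m x k) @ (k x n) matmul in FP16.

    Assumes each operand and the output cross the memory interface once.
    """
    flops = 2 * m * n * k
    bytes_moved = 2 * (m * k + k * n + m * n)            # 2 bytes per FP16 value
    return flops, bytes_moved


# Placeholder hardware numbers, NOT taken from the report.
PEAK_TFLOPS = 10.0
BANDWIDTH_GBS = 100.0

for shape in [(1024, 1024, 1024), (1, 1024, 1024)]:
    flops, bytes_moved = matmul_cost_fp16(*shape)
    bound = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_GBS, flops, bytes_moved)
    print(shape, f"intensity={flops / bytes_moved:.2f} FLOP/B", f"bound={bound:.3f} TFLOP/s")
```

With these placeholder figures the square matmul is compute-bound while the matrix-vector case is memory-bound. A per-chip roofline exposes this kind of split between workloads.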
The Undocumented Direct Path and Why It Matters
Core ML is Apple's supported framework for the ANE, but the report describes a direct dispatch route that sits below Core ML and is callable from ordinary user space. That path is undocumented, unsupported, and version-fragile. The report explicitly cautions that it is intended for measurement, research, and on-device work, not for shipping software. For anyone doing low-level performance analysis or custom neural network deployment on Apple silicon, this path is a gold mine; just don't ship it.
Measured Performance and Open Questions
The report documents the dispatch route and command protocol, and includes static analysis of the kernel driver and firmware. Per-chip performance characteristics are provided for A11 through A18 and M1 through M5, with the direct measurements on M1 and M5 serving as ground truth. Open questions and methodology are recorded, giving future researchers a clear starting point. Anyone working on on-device AI optimization now has a definitive reference for Apple's neural accelerator, down to the undocumented bytes.
Source: Apple Neural Engine: Architecture, Programming, and Performance
Domain: arxiv.org