
MonaVec Packs Vector Search Into 27 MB With Deterministic 4-Bit Quantization

MonaVec uses a Randomized Hadamard Transform and precomputed Lloyd-Max tables to quantize vectors to 4 bits, achieving 0.960 Recall@10 on AG News without any training pass.

Tags: monavec, vector search, edge ai, rust, quantization, faiss

MonaVec hits 0.960 Recall@10 on a 45K x 1024-dim BGE-M3 embedding set using only 27 MB of index storage, with zero training and zero server dependencies.

That number matters because the vector search tools most teams reach for, such as FAISS, usearch, and ScaNN, are typically deployed with a persistent server, gigabytes of RAM, or a training pass over the corpus. MonaVec ships for the edge: one file, one function call, runs anywhere. Its quantization core is data-oblivious and training-free by default.

Training-Free Quantization via Randomized Hadamard Transform

MonaVec applies a Randomized Hadamard Transform (RHDH) to condition any input distribution toward N(0,1). Once the components are approximately standard normal, precomputed Lloyd-Max tables quantize each one to 4 bits - an 8x reduction from float32 - with no learned codebook and no data pass. For magnitude-sensitive L2 data, a single-pass global standardization (fit()) extends the same pipeline without breaking the training-free guarantee.
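The pipeline described above can be sketched in a few dozen lines of plain Python. This is an illustration under stated assumptions, not MonaVec's code: the function names (`fwht`, `random_signs`, `rotate`, `lloyd_max_gaussian`, `quantize`, `pack`) are hypothetical, Python's `random.Random` stands in for the seeded rotation stream, the rescaling to norm sqrt(n) is an assumed normalization, and the 16-level table is computed here by Lloyd's iteration against the standard normal density rather than taken from the library.

```python
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for j in range(start, start + half):
                a, b = out[j], out[j + half]
                out[j], out[j + half] = a + b, a - b
        half *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def random_signs(n, seed):
    """Seeded +/-1 diagonal (random.Random is a stand-in, not ChaCha20)."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def rotate(vector, signs):
    """Flip signs, rescale to norm sqrt(n) (assumed), then Hadamard-rotate."""
    n = len(vector)
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    scaled = [x * s * math.sqrt(n) / norm for x, s in zip(vector, signs)]
    return fwht(scaled)


def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max_gaussian(levels=16, iterations=200):
    """Lloyd iteration against the N(0,1) density; no data is needed."""
    centers = [-2.5 + 5.0 * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iterations):
        # Decision edges sit halfway between neighbouring levels.
        edges = [-math.inf]
        edges += [(a + b) / 2.0 for a, b in zip(centers, centers[1:])]
        edges.append(math.inf)
        # Each level moves to the mean of N(0,1) restricted to its cell.
        centers = [
            (_pdf(lo) - _pdf(hi)) / (_cdf(hi) - _cdf(lo))
            for lo, hi in zip(edges, edges[1:])
        ]
    return centers


def quantize(vector, signs, table):
    """Rotate, then map each coordinate to the index of the nearest level."""
    rotated = rotate(vector, signs)
    return [min(range(len(table)), key=lambda k: abs(table[k] - x)) for x in rotated]


def pack(codes):
    """Store two 4-bit codes per byte."""
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


if __name__ == "__main__":
    table = lloyd_max_gaussian()
    signs = random_signs(1024, seed=42)
    vec = [math.sin(i) for i in range(1024)]
    print(len(pack(quantize(vec, signs, table))), "bytes per 1024-dim vector")
```

A 1024-dimensional vector packs into 512 bytes here, against 4096 bytes as float32. Because the table depends only on the target distribution and the signs depend only on the seed, nothing in the sketch looks at the corpus.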

A pure Rust implementation with Python bindings and runtime SIMD dispatch (AVX-512/AVX2/NEON/scalar) means it runs on anything from an x86 server to a Raspberry Pi. The index persists as a single .mvec file; a ChaCha20 rotation seed baked into that file ensures byte-identical results across architectures and builds.

Deterministic Indexing Beats Graph Libraries at Their Own Game

MonaVec 4-bit BruteForce leads float32 FAISS-IVF and 8-bit usearch on recall for the AG News benchmark while consuming a fraction of the memory. It trades peak throughput for something indices built in parallel generally do not offer: byte-identical determinism. Parallel-built HNSW graphs and IVF indices can differ from run to run; MonaVec gives you the same answer every time, everywhere.
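For readers unfamiliar with the terms, a brute-force scan over quantized codes and the Recall@10 metric can be sketched as follows. The names `brute_force_topk` and `recall_at_k`, the dot-product scoring, and the toy 16-level table are assumptions made for illustration; they are not MonaVec's API or its actual scoring kernel.

```python
import heapq


def brute_force_topk(query, codes, table, k=10):
    """Score every stored vector against the query and return the top-k ids.

    `codes` holds one list of 4-bit level indices per vector; `table` maps an
    index to its reconstruction value. Ties go to the lower id, so the result
    does not depend on thread scheduling or build order.
    """
    scored = []
    for idx, code in enumerate(codes):
        score = sum(q * table[c] for q, c in zip(query, code))
        scored.append((-score, idx))  # negate so the best score sorts first
    return [idx for _, idx in heapq.nsmallest(k, scored)]


def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


if __name__ == "__main__":
    toy_table = [float(i) - 7.5 for i in range(16)]  # hypothetical levels
    toy_codes = [[15, 15], [0, 0], [15, 0], [15, 15]]
    print(brute_force_topk([1.0, 1.0], toy_codes, toy_table, k=3))
```

An exhaustive scan visits every vector in a fixed order, which is why it is slower than a graph traversal but trivially reproducible. A Recall@10 of 0.960 means that, on average, 9.6 of the 10 true nearest neighbors appear in the returned list.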

Optional IvfFlat and HNSW backends handle million-vector corpora, but the headline number - 0.960 Recall@10 in 27 MB, no training - is what makes it immediately useful for offline agents and on-device RAG.
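The 27 MB figure is consistent with simple arithmetic on the code payload. The sketch below assumes "45K" means exactly 45,000 vectors and that MB means 10^6 bytes; the helper `payload_bytes` is hypothetical and counts only the packed codes.

```python
def payload_bytes(num_vectors, dims, bits_per_component):
    """Raw code payload, excluding headers, ids, and per-vector metadata."""
    return num_vectors * dims * bits_per_component // 8


four_bit = payload_bytes(45_000, 1024, 4)
float32 = payload_bytes(45_000, 1024, 32)

if __name__ == "__main__":
    print(four_bit / 1e6, "MB at 4 bits")
    print(float32 / 1e6, "MB at float32")
    print(float32 // four_bit, "x reduction")
```

Under those assumptions the 4-bit codes alone take about 23 MB, against roughly 184 MB for float32. The source does not break down the remaining gap up to 27 MB; file headers and per-vector metadata are plausible contributors.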

One File, One Call: The SQLite of Vector Search

MonaVec targets the deployment profile SQLite owns for relational data: embedded, no server, no training, no network. For an edge AI engineer who needs deterministic retrieval of semantic embeddings on a phone or microcontroller, MonaVec removes the infrastructure tax that made vector search a cloud-only affair.

Whether you're building an offline coding assistant or a sensor that answers queries without phoning home, MonaVec ships the index as a file and the search as a function call, in a footprint small enough for Raspberry Pi-class hardware.


Source: MonaVec: A Training-Free Embedded Vector Search Kernel for Edge and Offline AI Systems
Domain: arxiv.org
