Subquadratic Ships SubQ 1.1 Small With 64.5x Less Compute at 1M Tokens

At 1M tokens, SubQ 1.1 Small uses 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2 on a single attention layer.

Subquadratic built SubQ 1.1 Small around Subquadratic Sparse Attention (SSA), a learned sparse attention mechanism that scales linearly with context length instead of quadratically. That single architectural change makes direct reasoning over million-token artifacts economically feasible for the first time. The model card and technical report, released today, detail how SSA compresses attention to just 0.13% of all possible relationships at 12M tokens while retaining near-perfect retrieval.
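To put the 0.13% figure in concrete terms, the arithmetic can be sketched as below. This is back-of-the-envelope math, not a calculation from the technical report: the article does not say whether "all possible relationships" counts the full n x n grid of token pairs or only the causal lower triangle, so both readings are shown, and the function names are hypothetical.

```python
from fractions import Fraction

# 0.13%, the density quoted in the article for a 12M-token context.
DENSITY = Fraction(13, 10_000)

def kept_pairs(n: int, causal: bool) -> int:
    """Token pairs that survive sparsification at the quoted density.

    Assumption: "all possible relationships" means either every (query, key)
    pair (n * n) or only pairs where the key is not after the query
    (n * (n + 1) / 2). The article does not specify which.
    """
    total = n * (n + 1) // 2 if causal else n * n
    return int(total * DENSITY)

def mean_keys_per_query(n: int, causal: bool) -> float:
    """Average number of keys each query still attends to."""
    return kept_pairs(n, causal) / n

n = 12_000_000
print(f"full grid:   {mean_keys_per_query(n, causal=False):,.0f} keys per query")
print(f"causal only: {mean_keys_per_query(n, causal=True):,.0f} keys per query")
```

Under either reading, each token attends to roughly ten thousand others on average, out of twelve million candidates.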

SSA Cuts Attention Compute by 64.5x at 1M Tokens

SSA replaces the O(n²) dense attention pass with a content-routed sparse formulation. At 1M tokens, SubQ 1.1 Small requires 64.5x fewer FLOPs than the dense baseline and achieves a 56x wall-clock speedup over FlashAttention-2. The efficiency advantage grows with context length because SSA's cost grows linearly, not quadratically. Subquadratic trained the model using staged context extension from 262K to 2M tokens, followed by roughly one trillion tokens of continued pretraining on naturally long documents, books, and repository-scale code.
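Why the advantage widens with context length can be seen in a toy cost model. The sketch below assumes each query attends to at most a fixed budget of k keys, and it ignores the cost of the routing step itself; both are simplifying assumptions, and the values of d and k are illustrative rather than taken from the technical report.

```python
def dense_attention_cost(n: int, d: int) -> int:
    # Every query is scored against every key: n * n dot products of width d.
    return n * n * d

def sparse_attention_cost(n: int, d: int, k: int) -> int:
    # Each query is scored against at most k selected keys.
    # k is a hypothetical fixed budget, not a published SSA parameter.
    return n * min(k, n) * d

def savings_ratio(n: int, d: int, k: int) -> float:
    return dense_attention_cost(n, d) / sparse_attention_cost(n, d, k)

d, k = 128, 16_384  # illustrative values only
for n in (262_144, 1_048_576, 2_097_152):
    print(f"n={n:>9,}  dense/sparse = {savings_ratio(n, d, k):.1f}x")
```

In this model the ratio is simply n / k, so doubling the context doubles the savings. The ratios it prints are outputs of the toy model, not measurements of SubQ 1.1 Small.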

Retrieval Holds Near-Perfect at 12x Training Length

SubQ 1.1 Small scored 100% on needle-in-a-haystack (NIAH) at 1M and 2M tokens, and 98% at 6M and 12M tokens. Although the model was trained predominantly at 1M tokens, retrieval held almost perfectly at 12x that length. On NVIDIA's RULER test (13 multi-hop tasks at 128K), it scored 99.12%. Because SSA routes on content relevance rather than fixed positional patterns, its selection criterion is independent of absolute position, which explains this generalization.
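The position-independence argument can be illustrated with a generic top-k content-routed attention step. This is not SSA, whose routing mechanism the article does not describe; it is a minimal sketch of the general idea, with hypothetical function names and a two-dimensional toy haystack. Note that this naive version still scores every key, so it shows only the selection behavior, not the subquadratic cost.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k keys with the highest dot-product score."""
    scores = [dot(query, key) for key in keys]
    # Selection uses content scores only; token position is never consulted.
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in selected)  # subtract the max for a stable softmax
    weights = [math.exp(scores[i] - peak) for i in selected]
    total = sum(weights)
    dim = len(values[0])
    output = [
        sum(w * values[i][j] for w, i in zip(weights, selected)) / total
        for j in range(dim)
    ]
    return selected, output

def build_haystack(n, needle_pos):
    """n filler tokens plus one 'needle' whose key matches the query direction."""
    keys = [[0.0, 1.0] for _ in range(n)]
    values = [[0.0] for _ in range(n)]
    keys[needle_pos] = [5.0, 0.0]
    values[needle_pos] = [1.0]
    return keys, values

query = [1.0, 0.0]
for pos in (10, 900):
    keys, values = build_haystack(1000, pos)
    selected, output = topk_sparse_attention(query, keys, values, k=4)
    print(f"needle at {pos}: top match index {selected[0]}, output {output[0]:.3f}")
```

Moving the needle from position 10 to position 900 changes which index is selected but not the attention output, because nothing in the selection depends on where the matching token sits.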

General Reasoning Competes with Frontier Models Despite Small Size

GPQA Diamond at 85.4% places SubQ 1.1 Small just below mid-tier frontier models like GPT-5.5 and Opus 4.8 but well above smaller models like Haiku 4.5 (67.2%). LiveCodeBench v6 pass@4 at 89.7% is close to the absolute frontier. AutomationBench Finance at 13% matches stronger models on a benchmark where absolute scores remain low across all contenders. Subquadratic reached this balance through more than one hundred experiments across multiple model generations; SSA's efficiency made that volume of iteration routine rather than exceptional.

Subquadratic plans to deploy a broader lineup of models ranging from 2M to 12M tokens later this year, turning complete-artifact reasoning into a standard capability rather than a workaround for context window limits.

Source: Subquadratic - Introducing SubQ 1.1 Small
Domain: subq.ai
