BASIS: Balanced Activation Sketching with Invariant Scalars for Efficient Backpropagation

A new algorithm reduces the spatial bottleneck in deep neural networks, enabling scaling and improving training stability.

BASIS (Balanced Activation Sketching with Invariant Scalars) is an efficient backpropagation algorithm that fully decouples activation memory from the batch and sequence dimensions, relieving the spatial bottleneck that limits the scaling of deep neural networks. Theoretically, BASIS reduces activation memory to O(L · RN) and substantially shrinks the matrix-multiplication footprint of the backward pass. Empirically, training a GPT architecture for 50,000 steps validates these guarantees: at R = 32, BASIS reaches parity with exact backpropagation and marginally outperforms it on validation loss (6.575 vs. 6.616), acting as an implicit regularizer. Notably, the stabilized magnitude trajectory lets the model converge smoothly even under extreme spatial compression (R = 1), demonstrating the robustness of the estimator.
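The abstract does not spell out the BASIS estimator itself, but the general idea of decoupling stored-activation memory from the token dimension can be illustrated with a standard randomized sketch of the weight-gradient product. This is a minimal sketch under assumptions of our own: the dimensions `T`, `N`, `R` and the Gaussian sketching matrix `P` are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper):
# T = batch * sequence tokens, N = hidden width, R = sketch size.
T, N, R = 4096, 64, 32

X = rng.standard_normal((T, N))    # layer input activations
dY = rng.standard_normal((T, N))   # upstream gradient

# Exact weight gradient for a linear layer Y = X @ W: dW = X.T @ dY.
# Computing it requires storing X, whose size T x N grows with batch/sequence.
dW_exact = X.T @ dY

# Sketched variant: store only P @ X, an R x N matrix whose size is
# independent of T. With P entries i.i.d. N(0, 1/R), E[P.T @ P] = I,
# so (P @ X).T @ (P @ dY) is an unbiased estimate of X.T @ dY.
P = rng.standard_normal((R, T)) / np.sqrt(R)
SX = P @ X                         # the only activation state kept: R x N
dW_sketch = SX.T @ (P @ dY)

rel_err = np.linalg.norm(dW_sketch - dW_exact) / np.linalg.norm(dW_exact)
```

Per layer the retained state is R × N, which is where an O(L · RN) total over L layers would come from; the estimator's variance, not bias, is what shrinks as R grows.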


Source: BASIS: Balanced Activation Sketching with Invariant Scalars for "Ghost Backpropagation"
