BASIS (Balanced Activation Sketching with Invariant Scalars) is an efficient backpropagation algorithm that fully decouples activation memory from the batch and sequence dimensions. This decoupling relieves the spatial bottleneck of stored activations, enabling deep neural networks to scale. Theoretically, BASIS reduces activation memory to O(L · R · N) and substantially shrinks the matrix-multiplication footprint of the backward pass. Empirically, training a GPT architecture for 50,000 steps validates these guarantees: at R = 32, BASIS matches, and marginally outperforms, the validation loss of exact backpropagation (6.575 vs. 6.616), with the sketching noise acting as an implicit regularizer. Notably, the stabilized magnitude trajectory lets the model converge smoothly even under extreme spatial compression (R = 1), demonstrating the robustness of the estimator.
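The abstract does not specify the BASIS estimator or its invariant-scalar mechanism, but the memory claim can be illustrated with a generic activation-sketching scheme: project the flattened batch-times-sequence axis (size M) down to R rows with a random Rademacher matrix, so each layer stores only an R × N sketch. The sketched weight-gradient product below is a standard unbiased estimator, not the paper's actual method; all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: M is the flattened batch*sequence axis that sketching
# decouples memory from; R is the sketch size. Values are illustrative.
M, N, K, R = 512, 4, 4, 64

def sketch(A, R, rng):
    """Project A of shape (M, N) down to an (R, N) sketch using a
    Rademacher matrix S. Only the sketch need be stored between the
    forward and backward passes (S can be regenerated from a seed),
    so saved-activation memory is R * N per layer, independent of M."""
    S = rng.choice([-1.0, 1.0], size=(R, A.shape[0])) / np.sqrt(R)
    return S, S @ A

A = rng.standard_normal((M, N))   # stand-in for a layer's saved activations
G = rng.standard_normal((M, K))   # stand-in for the upstream gradient
exact = A.T @ G                   # weight gradient exact backprop would form

# (S A)^T (S G) is an unbiased estimate of A^T G because
# E[S^T S] = I when S has Rademacher entries scaled by 1/sqrt(R).
S, A_sk = sketch(A, R, rng)
est_single = A_sk.T @ (S @ G)

# A single sketch at heavy compression (M=512 -> R=64) is noisy, but
# unbiased: averaging independent sketches converges to the exact gradient.
trials = 1000
acc = np.zeros_like(exact)
for _ in range(trials):
    S, A_sk = sketch(A, R, rng)
    acc += A_sk.T @ (S @ G)
rel_err = np.linalg.norm(acc / trials - exact) / np.linalg.norm(exact)
print(f"sketch shape: {A_sk.shape}, averaged relative error: {rel_err:.3f}")
```

The stored sketch has R · N = 256 entries versus M · N = 2048 for the raw activations, matching the O(L · R · N) scaling (one sketch per layer); the variance of a single estimate hints at why stable convergence at R = 1 is a nontrivial claim.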
Source: BASIS: Balanced Activation Sketching with Invariant Scalars for "Ghost Backpropagation"