Automated desk post. Original source: Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning
The contribution here is a systematic exploration of goodness functions in Forward-Forward (FF) learning, leading to the introduction of top-k goodness and entmax-weighted energy, both of which yield performance improvements. However, the evaluation is limited to a single benchmark, without exploring how these design choices transfer to other tasks or network architectures. The key methodological shift is the focus on sparsity in the goodness function, which the authors identify as the most important design choice. While this finding is intriguing, the ablation does not isolate the effect of sparsity from other factors, such as the choice of energy function or the specific FF architecture. This limitation suggests the reported results should be interpreted cautiously, and further investigation is needed to confirm that the sparsity principle generalizes.
The Forward-Forward (FF) algorithm is a biologically plausible alternative to backpropagation that trains neural networks layer by layer using a local goodness function to distinguish positive from negative data. In this work, the authors systematically study the design space of goodness functions, investigating both which activations to measure and how to aggregate them. They introduce top-k goodness, which evaluates only the k most active neurons, and show that it substantially outperforms the default sum-of-squares (SoS) goodness, improving Fashion-MNIST accuracy by 22.6 percentage points. They further introduce entmax-weighted energy, which replaces hard top-k selection with a learnable sparse weighting based on the alpha-entmax transformation, yielding additional gains. The authors also adopt separate label feature forwarding (FFCL), in which class hypotheses are injected at every layer through a dedicated projection rather than concatenated only at the input. Combining these ideas, they achieve 87.1 percent accuracy on Fashion-MNIST with a 4x2000 architecture, a 30.7 percentage point improvement over the SoS baseline while changing only the goodness function and the label pathway. The authors identify a consistent principle: sparsity in the goodness function is the most important design choice in FF networks, with adaptive sparsity at alpha approximately 1.5 outperforming both fully dense and fully sparse alternatives.
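The two goodness variants can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the function names and shapes are hypothetical, alpha is fixed at 1.5 (the paper treats the sparsity level as adaptive/learnable), and "entmax-weighted energy" is rendered here as entmax weights applied to squared activations.

```python
import numpy as np

def topk_goodness(h, k):
    """Top-k goodness: sum of squares over only the k most active units.
    h: (batch, d) layer activations; k: number of units to keep.
    Hypothetical sketch, not the authors' reference code."""
    # Select each sample's k largest activations (order irrelevant for a sum).
    topk = np.partition(h, -k, axis=1)[:, -k:]
    return np.sum(topk ** 2, axis=1)

def entmax15(z):
    """Exact alpha-entmax for alpha = 1.5, via bisection on the threshold tau.
    Returns sparse weights p with p_i = max(z_i/2 - tau, 0)^2 and sum(p) = 1;
    sufficiently small logits receive exactly zero weight."""
    s = z / 2.0
    lo, hi = s.max() - 1.0, s.max()  # sum(p) >= 1 at lo, == 0 at hi
    for _ in range(60):  # bisection; sum(p) is monotone decreasing in tau
        tau = (lo + hi) / 2.0
        if np.sum(np.maximum(s - tau, 0.0) ** 2) > 1.0:
            lo = tau
        else:
            hi = tau
    return np.maximum(s - (lo + hi) / 2.0, 0.0) ** 2

def entmax_goodness(h):
    """Entmax-weighted energy: squared activations weighted by a sparse,
    entmax-derived distribution instead of a hard top-k cutoff."""
    return np.array([np.dot(entmax15(row), row ** 2) for row in h])
```

Unlike hard top-k, the entmax weighting is differentiable and lets the effective number of measured units vary per sample, which matches the paper's finding that adaptive sparsity around alpha = 1.5 beats both the dense (softmax-like) and fully sparse extremes.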