How itertools.combinations Skewed a 4,463-Model MoE Search

The pipeline churned through 4,463 candidate models over 28 days on a single RTX 4090, but all 1,021 successfully evaluated configurations were anchored to one family: AirNet. Alphabetical enumeration via itertools.combinations biased the search from the start.

The Alphabetical Accident

The authors built a deterministic code-assembly generator for 4-expert heterogeneous Mixture-of-Experts (MoE4) architectures. It used itertools.combinations to enumerate sets of 4 base architecture families, a space of 23,751 possible 4-family combinations. Python's itertools.combinations emits combinations in lexicographic order of the input's positions, so with a sorted input list every combination that contains the first element comes out before any combination that omits it. AirNet was first alphabetically among the candidates, so every combination generated in the 28-day campaign started with AirNet. The generator never advanced past the first alphabetical slot, meaning only 4.8% of the theoretical space was ever touched.
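The ordering effect is easy to reproduce on a toy pool. The sketch below is illustrative only, not the authors' generator: the 8-family list is hypothetical (only five of the names appear in this article), and the budget of 20 draws is an arbitrary stand-in for a compute-limited campaign.

```python
from itertools import combinations, islice
from math import comb

# Hypothetical pool: only AirNet, FractalNet, MNASNet, MobileNetV3 and
# ShuffleNet are named in the article; the rest are illustrative fillers.
families = sorted([
    "ShuffleNet", "MobileNetV3", "FractalNet", "MNASNet",
    "AirNet", "ResNet", "DenseNet", "VGG",
])

total = comb(len(families), 4)         # all 4-family combinations
anchored = comb(len(families) - 1, 3)  # those that contain families[0]

# Take the first 20 combinations, as a budget-limited generator would.
first_batch = list(islice(combinations(families, 4), 20))
leading = {combo[0] for combo in first_batch}

print(f"{total} combinations in total, {anchored} start with {families[0]}")
print(f"leading families in the first {len(first_batch)} draws: {leading}")
```

Any budget smaller than the number of combinations containing the first family will see that family in every single draw, which is the pattern the article describes.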

I've stared at itertools.combinations output before, but I never considered this kind of systematic coverage bias. The authors caught it only because they characterized the search space ex post facto and found the anchor.

What the Data Actually Found

Within the AirNet-anchored scope, ShuffleNet and MobileNetV3 consistently co-produced the highest-accuracy ensembles, reaching a mean accuracy of up to 0.632. FractalNet and MNASNet turned out to be low-yield families that the authors recommend excluding from future campaigns. The gating network used convolutional layers with temperature scaling, and training used mixup augmentation and cosine-annealed learning rate scheduling. None of these specifics matter if the search design itself is flawed from the first line of code.

Fixing the Pipeline

The root cause is simple: alphabetical enumeration gives no guarantee of diversity in early samples. The proposed fix, stratified random sampling over the combinatorial space, is equally simple. The authors released the pipeline, analysis artifacts, and corrected generator under the open-source NNGPT project at https://github.com/ABrain-One/nn-gpt. Future MoE searches with this tool will actually sample the space they intend to sample, not the first letter of the alphabet.
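One way to read "stratified random sampling" is to treat each family as a stratum and draw the same number of random combinations containing it. The sketch below follows that reading; it is an assumption, not the corrected generator from the NNGPT repository. The function name stratified_sample, its parameters, and the 8-family pool are all hypothetical.

```python
import random

def stratified_sample(families, k, per_stratum, seed=0):
    """Draw per_stratum new k-family combinations for each anchor family.

    Hypothetical helper: strata are defined by membership of one anchor
    family, which is only one possible stratification scheme.
    """
    rng = random.Random(seed)  # seeded so a campaign is reproducible
    chosen = set()
    for anchor in families:
        others = [f for f in families if f != anchor]
        picked, attempts = 0, 0
        # Rejection-sample until this stratum has contributed enough new
        # combinations; the attempt cap guards against tiny pools.
        while picked < per_stratum and attempts < 100 * per_stratum:
            attempts += 1
            combo = tuple(sorted([anchor] + rng.sample(others, k - 1)))
            if combo not in chosen:
                chosen.add(combo)
                picked += 1
    return sorted(chosen)

# Illustrative pool; only some of these names appear in the article.
pool = ["AirNet", "DenseNet", "FractalNet", "MNASNet",
        "MobileNetV3", "ResNet", "ShuffleNet", "VGG"]
sample = stratified_sample(pool, k=4, per_stratum=3)
coverage = {f: sum(f in combo for combo in sample) for f in pool}
print(len(sample), coverage)
```

With this scheme every family shows up in at least per_stratum sampled combinations regardless of where it sorts alphabetically, so a truncated campaign still covers the whole pool.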

Source: Systematic Exploration of 4-Expert Heterogeneous Mixture-of-Experts via Automated Pipeline Search
Domain: arxiv.org
