
How itertools.combinations Skewed a 4,463-Model MoE Search

arxiv.org · @deep_seal · 3 hours ago · Machine Learning · 3 comments

An automated pipeline for heterogeneous mixture-of-experts models accidentally limited exploration to one model family: Python's itertools.combinations enumerates in input order, the input list was alphabetical, and so all 1,021 evaluated candidates were biased toward the same anchor.

nngpt · lemur · airnet · shufflenet · mobilenetv3 · mixture of experts

The pipeline churned through 4,463 candidate models over 28 days on a single RTX 4090, but all 1,021 successfully evaluated configurations were anchored to one family: AirNet. Enumerating an alphabetically ordered list with itertools.combinations biased the search from the start.

The Alphabetical Accident

The authors built a deterministic code-assembly generator for 4-expert heterogeneous Mixture-of-Experts (MoE4) architectures. It used itertools.combinations to choose 4 base architecture families out of 23,751 possible 4-family combinations. itertools.combinations emits tuples in lexicographic order of position in the input iterable; it does not sort anything itself. Because the input list was in alphabetical order, AirNet sat in the first position, and every combination generated during the 28-day campaign started with AirNet. The generator never advanced past the first alphabetical slot, so only 4.8% of the theoretical space was ever touched.
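The anchoring effect is easy to reproduce. The sketch below uses a hypothetical five-name list built from families the article mentions (the paper's real pool is larger) and shows that the first C(n-1, 3) tuples all contain the first family, and that moving a name in the input changes the anchor even though the names themselves are unchanged. The last line is a scale check, assuming the 23,751 figure counts every 4-subset of the family list: that number equals C(29, 4), which would imply 29 families, of which C(28, 3) combinations contain the first one.

```python
import itertools
import math

# Hypothetical illustration list using names from the article, already in
# alphabetical order (the real pool in the paper is larger).
families = ["AirNet", "FractalNet", "MNASNet", "MobileNetV3", "ShuffleNet"]
assert families == sorted(families)

n, k = len(families), 4
combos = list(itertools.combinations(families, k))

# The first C(n-1, k-1) tuples are exactly the ones containing families[0],
# so any budget smaller than that never leaves the first family.
prefix = math.comb(n - 1, k - 1)
assert all(c[0] == families[0] for c in combos[:prefix])
print(combos[prefix])  # ('FractalNet', 'MNASNet', 'MobileNetV3', 'ShuffleNet')

# Ordering follows input position, not the names: move AirNet to the end
# and the first emitted tuple no longer contains it.
reordered = families[1:] + families[:1]
print(next(itertools.combinations(reordered, k)))
# ('FractalNet', 'MNASNet', 'MobileNetV3', 'ShuffleNet')

# Scale check: 23,751 == C(29, 4); C(28, 3) combinations contain the first family.
print(math.comb(29, 4), math.comb(28, 3))  # 23751 3276
```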

I've stared at itertools.combinations output before, but I never considered this kind of systematic coverage bias. The authors caught it only because they characterized the search space ex post facto and found the anchor.

What the Data Actually Found

Within the AirNet-anchored scope, ShuffleNet and MobileNetV3 consistently appeared together in the highest-accuracy ensembles, reaching a mean accuracy of up to 0.632. FractalNet and MNASNet turned out to be low-yield families, and the authors recommend excluding them from future campaigns. The gating network was convolutional and used temperature scaling; training added mixup augmentation and a cosine-annealed learning-rate schedule. None of these specifics can compensate for a search design that is skewed from the first line of code.
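The article does not describe how the temperature-scaled gate combines experts. As a minimal sketch, assume the gate emits one logit per expert, the logits are divided by a temperature T before a softmax, and the resulting weights mix the experts' class-probability vectors. The function names here are hypothetical, and mixup and the cosine schedule are not shown.

```python
import math
from typing import Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> list[float]:
    """Softmax over logits / T: T > 1 flattens the weights, T < 1 sharpens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(
    gate_logits: Sequence[float],
    expert_probs: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> list[float]:
    """Hypothetical MoE output: gate-weighted sum of each expert's class probabilities."""
    weights = softmax_with_temperature(gate_logits, temperature)
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]


# Equal gate logits give each of the 4 experts a weight of 0.25.
print(softmax_with_temperature([1.0, 1.0, 1.0, 1.0], temperature=2.0))
```

Raising T spreads weight across more experts, which is one common way to keep a gate from collapsing onto a single expert early in training.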

Fixing the Pipeline

The root cause is simple: lexicographic enumeration of an ordered list emits every combination containing the first family before any combination that omits it, so a campaign that stops early never leaves that family, and early samples carry no guarantee of diversity. The proposed fix, stratified random sampling over the combinatorial space, is equally simple. The authors released the pipeline, analysis artifacts, and corrected generator under the open-source NNGPT project at https://github.com/ABrain-One/nn-gpt. Future MoE searches with this tool should sample the space they intend to sample, not just the first letter of the alphabet.
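The article does not say how the corrected generator defines its strata. One possible interpretation, sketched here under that assumption, treats each family as a stratum: for every family, draw a few distinct random 4-family combinations that include it, then shuffle the pooled result so that cutting it to a compute budget does not reintroduce an ordering bias. The function name and the placeholder family names are hypothetical.

```python
import random
from typing import Sequence


def stratified_combinations(
    families: Sequence[str], k: int = 4, per_stratum: int = 3, seed: int = 0
) -> list[tuple[str, ...]]:
    """Hypothetical sampler: each family is one stratum, and each stratum
    contributes `per_stratum` distinct k-family combinations containing it."""
    rng = random.Random(seed)
    seen: set[tuple[str, ...]] = set()
    samples: list[tuple[str, ...]] = []
    for family in families:
        others = [f for f in families if f != family]
        drawn, attempts = 0, 0
        # Cap attempts so a tiny pool cannot loop forever on duplicates.
        while drawn < per_stratum and attempts < 100 * per_stratum:
            attempts += 1
            # Sort inside the tuple so the same set always has the same key.
            combo = tuple(sorted([family, *rng.sample(others, k - 1)]))
            if combo not in seen:
                seen.add(combo)
                samples.append(combo)
                drawn += 1
    # Shuffle so truncating to a budget does not favor the first strata.
    rng.shuffle(samples)
    return samples


# Placeholder names standing in for a 29-family pool.
pool = [f"Family{i:02d}" for i in range(29)]
plan = stratified_combinations(pool, k=4, per_stratum=3, seed=42)
print(len(plan))  # 87
```

Every family is guaranteed at least `per_stratum` appearances, unlike lexicographic enumeration, where the first family appears in every early sample and the last appears in none.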


Source: Systematic Exploration of 4-Expert Heterogeneous Mixture-of-Experts via Automated Pipeline Search
Domain: arxiv.org
