On 28.5% of the hardest forecasting instances, a domain-misaligned foundation model predicts better than the globally best model available. That's the central finding behind GUARD, a new distillation framework from Rupasree Dey and co-authors, accepted at KDD 2026. The implication is direct: you don't need a perfectly aligned giant model to improve a lightweight forecaster. You just need to know when to trust which teacher.
Why Misaligned Foundation Models Still Help
Time-Series Foundation Models (TSFMs) capture universal temporal dynamics, but they hit a wall the moment you apply them zero-shot to a specific scientific domain. Distribution shift turns their predictions into garbage. The usual reflex is to throw more compute at fine-tuning or to pick the single best pre-trained model. GUARD argues that's the wrong reflex. Instead, it treats multiple foundation models as a committee of flawed experts, each misaligned in a different way. The trick is routing: for each input instance, the system picks the teacher that best matches the local statistics of the current signal. That teacher may be worse across the overall distribution, but on that specific slice of data, it dominates.
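To make the routing idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the choice of statistics (mean, standard deviation, lag-1 autocorrelation), the per-teacher `teacher_profiles` array, and the Euclidean nearest-profile rule are all assumptions made for illustration.

```python
import numpy as np

def local_stats(window: np.ndarray) -> np.ndarray:
    """Summarize a 1-D window as [mean, std, lag-1 autocorrelation]."""
    mean = window.mean()
    std = window.std()
    centered = window - mean
    denom = np.dot(centered, centered)
    # Lag-1 autocorrelation; set to 0 for a constant window to avoid 0/0.
    acf1 = np.dot(centered[:-1], centered[1:]) / denom if denom > 0 else 0.0
    return np.array([mean, std, acf1])

def route(window: np.ndarray, teacher_profiles: np.ndarray) -> int:
    """Hard, per-instance choice: index of the teacher whose (hypothetical)
    reference profile is closest to the window's local statistics."""
    stats = local_stats(window)
    dists = np.linalg.norm(teacher_profiles - stats, axis=1)
    return int(np.argmin(dists))

# Hypothetical profiles: teacher 0 suits smooth signals near zero,
# teacher 1 suits low-variance signals around 5.
profiles = np.array([[0.0, 0.7, 0.9], [5.0, 0.1, 0.0]])
smooth = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
print(route(smooth, profiles))
```

The point of the hard `argmin` is that exactly one teacher supervises each instance, so a teacher that is weak overall can still be selected wherever its profile fits.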
Two Mechanisms That Make Distillation Work
GUARD wraps this insight into two concrete mechanisms. The first is a Contextual Router that compares local input statistics across teachers and selects the most relevant one per instance. There is no fixed voting and no averaging of outputs, only a hard, instance-wise decision. The second is an Uncertainty-Gated Temperature that acts as a circuit-breaker: when a teacher's softmax confidence diverges from the actual domain distribution, the distillation temperature is raised, which flattens the teacher's signal and keeps it from polluting the student. Together they form a distillation pipeline that can extract structural knowledge from teachers even when their zero-shot accuracy is suboptimal.
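The circuit-breaker behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's formula: it assumes the teacher emits logits over a fixed set of value bins, measures divergence with KL against an empirical `domain_probs` histogram, and scales the temperature linearly through a made-up `gain` parameter.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with a temperature."""
    z = logits / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q), with clipping to avoid log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def gated_temperature(teacher_logits, domain_probs, base_temp=1.0, gain=2.0):
    """Assumed gate: temperature grows with the teacher/domain divergence."""
    teacher_probs = softmax(teacher_logits)
    divergence = kl_divergence(teacher_probs, domain_probs)
    return base_temp * (1.0 + gain * divergence)

# Toy example: a uniform domain histogram and an overconfident teacher.
domain = np.full(4, 0.25)
overconfident = np.array([8.0, 0.0, 0.0, 0.0])
temp = gated_temperature(overconfident, domain)
soft_targets = softmax(overconfident, temperature=temp)
print(temp, soft_targets)
```

When the teacher agrees with the domain histogram the divergence is zero and the temperature stays at `base_temp`; as the mismatch grows, the soft targets handed to the student flatten out.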
Four Climate Domains, One Lightweight Forecaster
The authors evaluated GUARD on meteorology, ecosystem carbon flux, soil moisture, and energy grids, four climate-critical domains with very different dynamics. Compared to a fixed-weight multi-teacher distillation baseline, GUARD significantly reduces RMSE across all four. More striking: the domain-misaligned teachers, the ones that look useless in isolation, turn out to be critical correctives on the hardest examples. The lightweight student, designed for edge-computing sensor networks, ends up outperforming any single teacher model on those tough instances.
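A statistic like "share of hard instances where a misaligned teacher wins" can be computed along the following lines. How the paper defines "hardest" is not stated here, so this Python sketch assumes it means the top quantile of the globally best teacher's absolute error; `corrective_share`, `hard_quantile`, and the toy error matrix are hypothetical and carry no results from the paper.

```python
import numpy as np

def corrective_share(abs_errors: np.ndarray, hard_quantile: float = 0.9) -> float:
    """abs_errors has shape (n_instances, n_teachers).

    Returns the fraction of hard instances on which some teacher other than
    the globally best one (lowest overall RMSE) has a smaller error.
    """
    rmse = np.sqrt((abs_errors ** 2).mean(axis=0))
    global_best = int(np.argmin(rmse))
    best_err = abs_errors[:, global_best]
    # Assumption: "hard" = instances where the global best errs the most.
    hard = best_err >= np.quantile(best_err, hard_quantile)
    others = np.delete(abs_errors, global_best, axis=1)
    beaten = others.min(axis=1) < best_err
    return float(beaten[hard].mean())

# Toy errors for two teachers on four instances (illustrative values only).
toy = np.array([[0.1, 1.0], [0.1, 1.0], [0.1, 1.0], [1.5, 0.5]])
print(corrective_share(toy, hard_quantile=0.75))
```

In the toy matrix, teacher 0 has the lower overall RMSE, yet teacher 1 is better on the single instance where teacher 0 struggles most, which is the pattern the article describes.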
Code is available on GitHub ahead of KDD 2026. If you're building a forecaster for a niche scientific domain where off-the-shelf foundation models fail, GUARD gives you a principled way to turn their weaknesses into strengths, without loading the full models onto a sensor node.
Source: When to Trust, How to Distill: Multi-Foundation Model Guidance for Lightweight, Robust Scientific Time Series Forecasting
Domain: arxiv.org