
Misaligned Foundation Models Beat Accurate Ones on 28.5% of Hard Time Series Instances

Even foundation models that perform poorly zero-shot on a scientific domain can serve as critical correctives, outperforming globally superior models on 28.5% of the hardest forecasting instances.

guard, time series foundation models, knowledge distillation, climate forecasting, edge computing, kdd 2026

On 28.5% of the hardest forecasting instances, a domain-misaligned foundation model predicts better than the globally best model available. That's the central finding of the paper introducing GUARD, a new distillation framework from Rupasree Dey and co-authors, accepted at KDD 2026. The implication is direct: you don't need a perfectly aligned giant model to improve a lightweight forecaster - you just need to know when to trust which teacher.

Why Misaligned Foundation Models Still Help

Time-Series Foundation Models (TSFMs) encode general temporal dynamics, but they tend to hit a wall the moment you apply them zero-shot to a specific scientific domain, where distribution shift can badly degrade their predictions. The usual reflex is to throw more compute at fine-tuning or to pick the single best pre-trained model. GUARD argues that's the wrong reflex. Instead, it treats multiple foundation models as a committee of flawed experts, each misaligned in a different way. The trick is routing: for each input instance, the system uses local input statistics to pick the teacher that best matches the current signal. That teacher may be worse on the overall distribution, but on that specific slice of data it comes out ahead.
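To make the routing idea concrete, here is a minimal Python sketch of a hard, instance-wise router. It is not the authors' implementation: the choice of statistics (mean, standard deviation, lag-1 autocorrelation), the Euclidean distance, the teacher names, and the per-teacher profiles are all assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev


def local_stats(window):
    """Summarise a window by mean, standard deviation, and lag-1 autocorrelation."""
    mu = fmean(window)
    sigma = pstdev(window)
    if sigma == 0.0:
        return (mu, 0.0, 0.0)
    # Lag-1 autocovariance over consecutive pairs, normalised by the variance.
    cov = sum((a - mu) * (b - mu) for a, b in zip(window, window[1:])) / (len(window) - 1)
    return (mu, sigma, cov / (sigma * sigma))


def route(window, teacher_profiles):
    """Hard routing: return the single teacher whose profile is nearest to the window."""
    stats = local_stats(window)
    return min(teacher_profiles, key=lambda name: math.dist(stats, teacher_profiles[name]))


# Hypothetical profiles: the kind of signal each teacher is assumed to handle well.
TEACHER_PROFILES = {
    "teacher_smooth": (0.0, 1.0, 0.9),   # slowly varying, strongly autocorrelated
    "teacher_noisy": (0.0, 1.0, -0.5),   # rapidly alternating signal
}

smooth_window = [math.sin(0.3 * t) for t in range(32)]
noisy_window = [(-1.0) ** t for t in range(32)]

print(route(smooth_window, TEACHER_PROFILES))
print(route(noisy_window, TEACHER_PROFILES))
```

In this toy setup the slowly varying sine window goes to teacher_smooth and the alternating window goes to teacher_noisy. A real router would likely normalise each statistic before comparing, since raw means and autocorrelations live on different scales.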

Two Mechanisms That Make Distillation Work

GUARD wraps this insight into two concrete mechanisms. The first is a Contextual Router that compares local input statistics across teachers and selects the most relevant one per instance. There is no fixed voting and no averaging of all outputs, just a hard, instance-wise decision. The second is an Uncertainty-Gated Temperature that acts as a circuit breaker: when a teacher's softmax confidence diverges from the actual domain distribution, the distillation temperature is raised, which flattens the teacher's signal and keeps it from polluting the student. Together they form a distillation pipeline that can extract structural knowledge from teachers even when their zero-shot accuracy is suboptimal.
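The circuit-breaker behaviour can be illustrated with a temperature-scaled softmax. The sketch below assumes a teacher that emits logits over a few discrete outcomes and a hypothetical uncertainty score in [0, 1]. The article does not say how GUARD computes that score or shapes the gate, so the linear ramp, threshold, and cap here are illustrative choices only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a larger temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, less informative signal."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def gated_temperature(uncertainty, base=1.0, max_temp=8.0, threshold=0.5):
    """Hypothetical gate: keep the base temperature while the teacher looks reliable,
    then raise it linearly with the excess uncertainty, capped at max_temp."""
    if uncertainty <= threshold:
        return base
    excess = (uncertainty - threshold) / (1.0 - threshold)
    return min(max_temp, base + excess * (max_temp - base))


teacher_logits = [4.0, 1.0, 0.5, 0.2]  # made-up teacher output
trusted = softmax(teacher_logits, gated_temperature(0.2))
distrusted = softmax(teacher_logits, gated_temperature(0.9))

print(f"trusted:    max p = {max(trusted):.3f}, entropy = {entropy(trusted):.3f}")
print(f"distrusted: max p = {max(distrusted):.3f}, entropy = {entropy(distrusted):.3f}")
```

With the higher temperature, the distrusted teacher's distribution has a lower peak and higher entropy, so the student receives a weaker push toward that teacher's preferred outcome.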

Four Climate Domains, One Lightweight Forecaster

The authors evaluated GUARD on meteorology, ecosystem carbon flux, soil moisture, and energy grids - four climate-critical domains with very different dynamics. Compared to a fixed-weight multi-teacher distillation baseline, GUARD significantly reduces RMSE across all four. More striking: those domain-misaligned teachers, the ones that look useless in isolation, turn out to be critical correctives on the hardest examples. The lightweight student, designed for edge-computing sensor networks, ends up outperforming any single teacher model on those tough instances.
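A figure like the 28.5% headline is a win rate on a hard subset, and it helps to see how such a number can coexist with a worse overall RMSE. The sketch below uses made-up toy numbers, not the paper's data, and it assumes "hardest" means the instances where the globally best model has the largest absolute error; the article does not state the authors' exact definition.

```python
import math


def rmse(preds, targets):
    """Root mean squared error over all instances."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(targets))


def hard_instance_win_rate(targets, best_preds, other_preds, hard_fraction=0.25):
    """Share of the hardest instances on which the globally worse model has the
    lower absolute error. Hardness is ranked by the best model's own error,
    which is an assumption of this sketch."""
    best_err = [abs(p - y) for p, y in zip(best_preds, targets)]
    other_err = [abs(p - y) for p, y in zip(other_preds, targets)]
    n_hard = max(1, round(hard_fraction * len(targets)))
    hardest = sorted(range(len(targets)), key=lambda i: best_err[i], reverse=True)[:n_hard]
    wins = sum(1 for i in hardest if other_err[i] < best_err[i])
    return wins / n_hard


# Toy data for illustration only.
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
best_preds = [1.1, 2.1, 2.9, 4.1, 5.0, 6.1, 9.0, 6.5]    # accurate, except two big misses
other_preds = [1.8, 2.9, 3.9, 3.2, 5.8, 6.9, 7.3, 9.9]   # mediocre on almost everything

print(f"RMSE best:  {rmse(best_preds, targets):.3f}")
print(f"RMSE other: {rmse(other_preds, targets):.3f}")
print(f"win rate on hardest 25%: {hard_instance_win_rate(targets, best_preds, other_preds):.2f}")
```

Here the second model is worse overall yet still wins on one of the two hardest instances. That is the pattern an instance-wise router is meant to exploit.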

Code is available on GitHub ahead of KDD 2026. If you're building a forecaster for a niche scientific domain where off-the-shelf foundation models fail, GUARD gives you a principled way to turn their weaknesses into strengths - without loading the full models onto a sensor node.


Source: When to Trust, How to Distill: Multi-Foundation Model Guidance for Lightweight, Robust Scientific Time Series Forecasting
Domain: arxiv.org
