Curvature-Guided Mixing Tackles Catastrophic Forgetting in MLLMs

The new CGM method uses Hessian-based approximations to derive the optimal mixing ratio, preserving general knowledge while specializing on downstream tasks.

curvature guided mixing, mllm, multimodal large language models, catastrophic forgetting, model merging, llava

Fine-tuning a multimodal LLM on a specialized dataset nukes its general capabilities faster than you can say "catastrophic forgetting." Heuristic model-merging methods try to patch the damage, but they choose their blending weights by guesswork. A new ECCV 2026 submission replaces that guesswork with actual math.

Hessian Landscapes Guide the Merge

The key insight is that parameter importance varies across tasks: a weight that is critical for the downstream task may matter little for general knowledge, and vice versa. CGM (Curvature-Guided Mixing) formulates a joint optimization objective that balances task-specific loss against general knowledge retention. Instead of picking a blending ratio by trial and error, it uses a second-order (Hessian) approximation of the loss landscapes to solve for the optimal ratio in closed form. That ratio blends each parameter according to its curvature: higher curvature means the parameter is more critical for that task, so that task's version of the parameter is weighted more heavily. No more blind interpolation.
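To make the idea concrete, here is a minimal sketch of curvature-weighted soft mixing. The simplifying assumptions are ours, not necessarily the paper's: each loss is approximated by a quadratic with a diagonal, nonnegative Hessian centered at its own weights (g for the general model, t for the fine-tuned one), and the joint objective is lam * (task loss) + (1 - lam) * (general loss). Minimizing that per parameter gives the closed-form blend x_i = ((1 - lam) * hg_i * g_i + lam * ht_i * t_i) / ((1 - lam) * hg_i + lam * ht_i). The function name `curvature_mix` and the parameter `lam` are hypothetical, and in practice the curvature values would have to be estimated, for example from squared gradients (an empirical-Fisher approximation).

```python
from typing import List


def curvature_mix(
    theta_general: List[float],
    theta_task: List[float],
    h_general: List[float],
    h_task: List[float],
    lam: float = 0.5,
    eps: float = 1e-12,
) -> List[float]:
    """Curvature-weighted soft mixing under a diagonal quadratic model.

    Hypothetical sketch: for each parameter i it minimizes
        (1 - lam) * 0.5 * h_general[i] * (x - theta_general[i]) ** 2
      + lam       * 0.5 * h_task[i]    * (x - theta_task[i]) ** 2
    whose closed-form minimizer is a curvature-weighted average.
    """
    n = len(theta_general)
    if not (len(theta_task) == len(h_general) == len(h_task) == n):
        raise ValueError("all inputs must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if min(h_general + h_task, default=0.0) < 0.0:
        raise ValueError("curvature estimates are assumed to be nonnegative")

    mixed = []
    for g, t, hg, ht in zip(theta_general, theta_task, h_general, h_task):
        w_g = (1.0 - lam) * hg  # pull toward the general model
        w_t = lam * ht          # pull toward the task model
        denom = w_g + w_t
        if denom < eps:
            # No usable curvature for this parameter: fall back to a plain average.
            mixed.append(0.5 * (g + t))
            continue
        alpha = w_t / denom  # per-parameter soft mixing weight for the task model
        mixed.append((1.0 - alpha) * g + alpha * t)
    return mixed


# Equal curvature -> midpoint; general curvature dominates -> stay near general.
print(curvature_mix([1.0, 0.0], [3.0, 2.0], [1.0, 3.0], [1.0, 1.0]))  # → [2.0, 0.5]
```

With equal curvature the first parameter lands halfway between the two models, while the second stays close to the general weights because the general model's curvature is three times larger there. Setting lam = 1.0 reduces the blend to the task model wherever the task curvature is nonzero.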

Soft and Hard Mixing, Both Principled

CGM produces a "soft mixing" where every parameter gets a continuous blend weight. The authors also introduce CGM†, a "hard mixing" variant that selects a sparse subset of parameters to swap, guided by a curvature-aware score. This is useful when you want to surgically replace only the most task-relevant weights rather than blending everything. Both variants are grounded in the same Hessian analysis, not in ad-hoc heuristics.
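The hard-mixing idea can be sketched the same way. Under the same quadratic, diagonal-Hessian assumption, keeping parameter i at its general value costs the task roughly 0.5 * ht_i * (t_i - g_i)^2, while swapping it to the task value costs general knowledge roughly 0.5 * hg_i * (t_i - g_i)^2. The hypothetical `hard_mix` below scores each parameter by that net benefit and swaps in only the top k. This score is an illustrative stand-in we chose for the sketch, not the curvature-aware score defined for CGM†.

```python
from typing import List, Tuple


def hard_mix(
    theta_general: List[float],
    theta_task: List[float],
    h_general: List[float],
    h_task: List[float],
    k: int,
) -> Tuple[List[float], List[int]]:
    """Swap in the k task parameters with the highest illustrative curvature score.

    Hypothetical sketch: under a diagonal quadratic model, keeping parameter i at
    its general value costs the task about 0.5 * h_task[i] * d**2, while swapping
    it costs general knowledge about 0.5 * h_general[i] * d**2, where
    d = theta_task[i] - theta_general[i]. The score is the net benefit of swapping.
    """
    n = len(theta_general)
    if not (len(theta_task) == len(h_general) == len(h_task) == n):
        raise ValueError("all inputs must have the same length")
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, number of parameters]")

    scores = [
        0.5 * (ht - hg) * (t - g) ** 2
        for g, t, hg, ht in zip(theta_general, theta_task, h_general, h_task)
    ]
    # Indices of the k highest-scoring parameters (ties keep original order).
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    # Sparse swap: task value for chosen parameters, general value elsewhere.
    mixed = [theta_task[i] if i in chosen else theta_general[i] for i in range(n)]
    return mixed, sorted(chosen)


mixed, swapped = hard_mix(
    [0.0, 0.0, 0.0], [1.0, 2.0, 1.0], [1.0, 1.0, 3.0], [2.0, 2.0, 1.0], k=1
)
print(mixed, swapped)  # → [0.0, 2.0, 0.0] [1]
```

Parameters where the general model's curvature dominates get a negative score, so they are the last to be swapped; every untouched parameter keeps its general value exactly, which is what makes this kind of swap "surgical" compared with a dense blend.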

LLaVA-1.5 and Qwen2.5VL Put to the Test

Experiments across multiple downstream tasks show that CGM and CGM† consistently improve the tradeoff between task specialization and general knowledge retention compared to existing merging methods. The paper reports results on LLaVA-1.5 and Qwen2.5VL, two widely used MLLM families. Code is available at github.com/zzsyjl/CGM-ECCV-2026, so you can replicate the findings or adapt the framework to your own models.

If CGM scales to larger architectures, building a specialized MLLM without trashing its general capabilities becomes a solvable engineering problem.


Source: Curvature-Guided Mixing for MLLM Adaptation
Domain: arxiv.org
