
Curvature-Guided Mixing Solves Catastrophic Forgetting in MLLMs

arxiv.org · @quiet_wolf · 4 hours ago · Artificial Intelligence · 1 comment

A new method, CGM, uses Hessian-based approximations to derive the optimal mixing ratio, preserving general knowledge while specializing for downstream tasks.

curvature guided mixing, mllm, multimodal large language models, catastrophic forgetting, model merging, llava

Fine-tuning a multimodal LLM on a specialized dataset nukes its general capabilities faster than you can say "catastrophic forgetting." Heuristic model merging methods try to patch the damage but rely on guesswork. A new paper, submitted to ECCV 2026, replaces that guesswork with actual math.

Hessian Landscapes Guide the Merge

The key insight is that parameter importance varies across tasks. CGM (Curvature-Guided Mixing) formulates a joint optimization objective that balances task-specific loss against general knowledge retention. Instead of picking a blending ratio by trial and error, it uses a second-order (Hessian) approximation of the loss landscapes to solve for the optimal ratio in closed form. That ratio blends parameters according to their curvature: higher curvature means a parameter is more critical for that task, so that task's version of the parameter is weighted more heavily. No more blind interpolation.
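To make the curvature-weighted blend concrete, here is a minimal sketch under loudly stated assumptions: each Hessian is approximated by its diagonal (for example, squared gradients or an empirical Fisher estimate), parameters are flattened into vectors, and the joint objective is taken to be λ/2·Σ h_task,i(θ_i − θ_task,i)² + 1/2·Σ h_gen,i(θ_i − θ_base,i)². Setting the gradient to zero gives a per-parameter blend weight α_i = λ·h_task,i / (λ·h_task,i + h_gen,i). The function name `curvature_guided_blend` and the trade-off weight `lam` are hypothetical; this is not the paper's exact formulation, which may use a different objective or Hessian approximation.

```python
import numpy as np


def curvature_guided_blend(theta_base, theta_task, h_gen, h_task, lam=1.0, eps=1e-12):
    """Hypothetical soft-mixing sketch under a diagonal quadratic approximation.

    Minimizes  lam/2 * sum(h_task * (theta - theta_task)**2)
             + 1/2   * sum(h_gen  * (theta - theta_base)**2),
    whose per-parameter minimizer is a curvature-weighted average.
    """
    theta_base = np.asarray(theta_base, dtype=float)
    theta_task = np.asarray(theta_task, dtype=float)
    h_gen = np.asarray(h_gen, dtype=float)    # assumed diagonal curvature of the general loss
    h_task = np.asarray(h_task, dtype=float)  # assumed diagonal curvature of the task loss

    # Blend weight toward the fine-tuned model: high task curvature pulls alpha toward 1,
    # high general curvature pulls it toward 0. eps avoids 0/0 when both losses are flat.
    w_task = lam * h_task
    alpha = w_task / (w_task + h_gen + eps)

    theta = alpha * theta_task + (1.0 - alpha) * theta_base
    return theta, alpha


# Toy example: base model at 0, fine-tuned model at 1, three parameters.
theta, alpha = curvature_guided_blend(
    theta_base=[0.0, 0.0, 0.0],
    theta_task=[1.0, 1.0, 1.0],
    h_gen=[1.0, 1.0, 3.0],
    h_task=[1.0, 3.0, 1.0],
)
print(np.round(alpha, 3).tolist())  # [0.5, 0.75, 0.25]
```

In the toy example, the second parameter is three times sharper under the task loss than under the general loss, so it moves 75% of the way toward its fine-tuned value, while the third parameter, which is sharper under the general loss, stays mostly at its base value.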

Soft and Hard Mixing, Both Principled

CGM produces a "soft mixing" where every parameter gets a continuous blend weight. The authors also introduce CGM†, a "hard mixing" variant that selects a sparse subset of parameters to swap, guided by a curvature-aware score. This is useful when you want to surgically replace only the most task-relevant weights rather than blending everything. Both variants are grounded in the same Hessian analysis, not in ad-hoc heuristics.
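A hard-mixing variant can be sketched in the same toy setting. The curvature-aware score below is a hypothetical stand-in, not the paper's CGM† score: under the same diagonal quadratic model, copying parameter i from the fine-tuned model removes λ/2·h_task,i·δ_i² of task loss and adds 1/2·h_gen,i·δ_i² of general loss, where δ_i = θ_task,i − θ_base,i. The sketch swaps in at most `k` parameters with the highest positive net score and leaves every other parameter at its base value. The function name `curvature_hard_mix` and the budget `k` are illustrative assumptions.

```python
import numpy as np


def curvature_hard_mix(theta_base, theta_task, h_gen, h_task, k, lam=1.0):
    """Hypothetical hard-mixing sketch: swap at most k parameters to their task values.

    Assumes flattened 1-D parameter vectors and diagonal curvature estimates.
    """
    theta_base = np.asarray(theta_base, dtype=float)
    theta_task = np.asarray(theta_task, dtype=float)
    h_gen = np.asarray(h_gen, dtype=float)
    h_task = np.asarray(h_task, dtype=float)

    delta_sq = (theta_task - theta_base) ** 2
    # Net benefit of swapping parameter i under the quadratic model:
    # task-loss reduction (lam/2 * h_task * delta^2) minus general-loss increase (1/2 * h_gen * delta^2).
    score = 0.5 * delta_sq * (lam * h_task - h_gen)

    top = np.argsort(-score)[:k]   # indices of the k highest scores
    chosen = top[score[top] > 0]   # swap only where the net benefit is positive

    mask = np.zeros(theta_base.shape, dtype=bool)
    mask[chosen] = True
    theta = np.where(mask, theta_task, theta_base)
    return theta, mask


theta, mask = curvature_hard_mix(
    theta_base=[0.0, 0.0, 0.0, 0.0],
    theta_task=[1.0, 2.0, 1.0, 3.0],
    h_gen=[1.0, 1.0, 1.0, 5.0],
    h_task=[3.0, 1.0, 2.0, 1.0],
    k=3,
)
print(mask.tolist())   # [True, False, True, False]
print(theta.tolist())  # [1.0, 0.0, 1.0, 0.0]
```

Here the budget allows three swaps, but only two parameters have a positive net benefit. The fourth parameter carries the largest change, yet it is much sharper under the general loss than under the task loss, so it stays at its base value.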

LLaVA-1.5 and Qwen2.5VL Put to the Test

Experiments across multiple downstream tasks show that CGM and CGM† consistently improve the tradeoff between task specialization and general knowledge retention compared to existing merging methods. The paper reports results on LLaVA-1.5 and Qwen2.5VL, two widely used MLLM families. Code is available at github.com/zzsyjl/CGM-ECCV-2026, so you can replicate the findings or adapt the framework to your own models.

If CGM scales to larger architectures, building a specialized MLLM without trashing its commonsense reasoning becomes a solvable engineering problem.

Source: Curvature-Guided Mixing for MLLM Adaptation
Domain: arxiv.org

