
LFPM: Feature-Space Defense Removes Backdoors from Merged Models Without Losing Accuracy

Merged models are exposed to backdoor attacks that task arithmetic cannot remove without degrading performance. LFPM suppresses backdoors from the feature-space side instead, using Cross-Task Linearity.

model merging, backdoor attacks, task arithmetic, cross task linearity, lfpm, cybersecurity

Model merging has a backdoor problem: stitching multiple task-specific models into a single unified model creates a surface for poisoning, and existing parameter-space defenses based on task arithmetic consistently degrade clean-task performance when they try to remove the attack. A new preprint argues that this trade-off is avoidable.

Why Parameter-Space Editing Fails

Previous backdoor defenses for merged models operate directly on parameter vectors—subtracting, adding, or scaling task vectors in weight space. The authors of the new preprint (arXiv:2606.12498) show that this approach is fundamentally limited: the backdoor signal is entangled with clean-task features at the parameter level, so any attempt to excise it also removes useful information. The result: accuracy drops sharply even as the attack is partially mitigated.
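To make the parameter-space picture concrete, here is a minimal NumPy sketch of task arithmetic and of a subtraction-style defense. The three-parameter "models", the `backdoor_estimate` vector, and the helper names are hypothetical toy values chosen for illustration; they are not taken from the preprint.

```python
import numpy as np

def task_vector(theta_finetuned, theta_pretrained):
    """Task vector: the weight delta that fine-tuning added to the base model."""
    return theta_finetuned - theta_pretrained

def merge(theta_pretrained, task_vectors, scale=1.0):
    """Task-arithmetic merge: base weights plus the scaled sum of task vectors."""
    return theta_pretrained + scale * np.sum(task_vectors, axis=0)

# Hypothetical 3-parameter "models" (toy values, not from the paper).
theta_pre = np.zeros(3)
theta_a = np.array([1.0, 0.0, 0.0])  # clean model for task A
theta_b = np.array([0.0, 1.0, 1.0])  # poisoned model for task B:
                                     # index 1 = clean skill, index 2 = backdoor

tau_a = task_vector(theta_a, theta_pre)
tau_b = task_vector(theta_b, theta_pre)
merged = merge(theta_pre, [tau_a, tau_b])

# Assumption: the defender can only estimate the backdoor direction, and the
# estimate overlaps task B's clean coordinate (the entanglement described above).
backdoor_estimate = np.array([0.0, 0.5, 1.0])
defended = merged - backdoor_estimate

print("merged:  ", merged)    # [1. 1. 1.]
print("defended:", defended)  # [1.  0.5 0. ]
```

In this toy setup the subtraction zeroes the backdoor coordinate but also halves task B's clean coordinate, which is the kind of collateral damage the paragraph above describes.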

LFPM: Feature-Space Arithmetic via Cross-Task Linearity

The proposed method, Linear Feature Path Minimization (LFPM), shifts the battlefield from parameters to features. It builds on Cross-Task Linearity (CTL), the observation that the features of a weight-interpolated model are approximately the linear interpolation of the features of the models being combined. Instead of editing weights directly, LFPM introduces an anti-backdoor task vector, optimized to minimize a path integral of the loss along the interpolation between clean and backdoored states, that suppresses the backdoor while leaving task-relevant features intact.
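CTL is usually stated as a comparison between two quantities: the features of a model whose weights are interpolated, and the interpolation of the two original models' features. The sketch below checks that property on a toy one-layer tanh network. The network, the perturbation scale of 0.05, and the function names are assumptions made for illustration only.

```python
import numpy as np

def features(W, x):
    """Features of a toy one-layer network with a tanh nonlinearity."""
    return np.tanh(W @ x)

def ctl_cosine(W_i, W_j, x, alpha):
    """Cosine similarity between the features of the weight-interpolated model
    and the same interpolation applied to the two models' features."""
    f_weights = features(alpha * W_i + (1.0 - alpha) * W_j, x)
    f_feats = alpha * features(W_i, x) + (1.0 - alpha) * features(W_j, x)
    denom = np.linalg.norm(f_weights) * np.linalg.norm(f_feats)
    return float(f_weights @ f_feats / denom)

rng = np.random.default_rng(0)

# Two hypothetical "fine-tuned" models: small perturbations of a shared base.
W_pre = rng.normal(size=(16, 8)) / np.sqrt(8)
W_i = W_pre + 0.05 * rng.normal(size=W_pre.shape)
W_j = W_pre + 0.05 * rng.normal(size=W_pre.shape)

x = rng.normal(size=8)
cos = ctl_cosine(W_i, W_j, x, alpha=0.5)
print(f"CTL cosine similarity at alpha=0.5: {cos:.6f}")
```

A cosine similarity close to 1 means the two quantities nearly coincide for this input. On real networks the property is empirical and is typically measured per layer across many inputs.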

The optimization accumulates gradients along the feature-space interpolation path, computing a path-integral loss that penalizes backdoor activation without penalizing clean-task features. This goes beyond plain gradient descent at a single point: the path integral captures how the backdoor behavior evolves across the merged model's feature manifold, which enables targeted suppression.
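The summary does not give the exact objective, so the following is only one possible reading of a path-integral loss with gradient accumulation, worked out on a toy linear feature map. Everything specific here is an assumption: the squared backdoor activation along a hypothetical feature direction `u`, the clean-feature drift penalty weighted by `lam`, the midpoint-rule discretization, and the straight-line path `W_merged + a * V`.

```python
import numpy as np

def path_loss_and_grad(V, W_merged, x_trig, u, X_clean, lam=1.0, num_points=8):
    """Midpoint-rule estimate of the integral of L(W_merged + a*V) over a in [0, 1],
    plus its gradient with respect to V, accumulated over the path points."""
    n = X_clean.shape[1]
    clean_cov = X_clean @ X_clean.T / n
    total_loss = 0.0
    grad = np.zeros_like(V)
    for k in range(num_points):
        a = (k + 0.5) / num_points           # midpoint of the k-th sub-interval
        W = W_merged + a * V                 # point on the interpolation path
        act = u @ W @ x_trig                 # hypothetical backdoor activation
        drift = a * (V @ X_clean)            # change in clean features at this point
        total_loss += (act**2 + lam * np.sum(drift**2) / n) / num_points
        grad_W = 2.0 * act * np.outer(u, x_trig) + 2.0 * lam * a * (V @ clean_cov)
        grad += a * grad_W / num_points      # chain rule: dW/dV = a
    return total_loss, grad

rng = np.random.default_rng(0)
n_clean = 50
# Clean inputs use the first 4 coordinates; the trigger pattern uses the last 2.
X_clean = np.vstack([rng.normal(size=(4, n_clean)), np.zeros((2, n_clean))])
x_trig = np.array([0.5, 0.0, 0.0, 0.0, 1.0, 1.0])
u = np.array([1.0, 0.0, 0.0, 0.0])
W_merged = 0.3 * rng.normal(size=(4, 6))
W_merged[0, 4:] += 1.0                       # plant a toy backdoor response

V = np.zeros_like(W_merged)                  # anti-backdoor vector, starts at zero
loss_start, _ = path_loss_and_grad(V, W_merged, x_trig, u, X_clean)
for _ in range(500):
    _, grad = path_loss_and_grad(V, W_merged, x_trig, u, X_clean)
    V -= 0.2 * grad                          # plain gradient step on the path loss
loss_end, _ = path_loss_and_grad(V, W_merged, x_trig, u, X_clean)

bd_before = (u @ W_merged @ x_trig) ** 2
bd_after = (u @ (W_merged + V) @ x_trig) ** 2
clean_drift = np.sum((V @ X_clean) ** 2) / n_clean
print(f"path loss: {loss_start:.4f} -> {loss_end:.4f}")
print(f"backdoor activation: {bd_before:.4f} -> {bd_after:.4f}")
print(f"mean clean-feature drift: {clean_drift:.2e}")
```

The design point this toy illustrates is that the loss is averaged over the whole path rather than evaluated only at its endpoint, so the learned vector is shaped by the backdoor response at every interpolation step. Because the trigger occupies input coordinates the clean data never uses, the optimizer can lower the backdoor activation with little change to clean features.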

Robustness Across Tuning Paradigms

Experiments span both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings. The authors report that LFPM is consistently robust against backdoor attacks and outperforms existing task-arithmetic defenses by a wide margin on both attack success rate reduction and clean-task accuracy preservation. The paper does not cite specific benchmark numbers, but the claim is clear: the feature-space approach avoids the accuracy-robustness trade-off that plagued prior work.

LFPM opens a new direction for robust model merging—feature-space arithmetic—which should become standard practice in any multi-task model deployment pipeline.


Source: From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging
Domain: arxiv.org
