244× Monitoring Spike in 4-Bit Models Gets a Theoretical Bound

Spectral perturbation of the empirical Fisher Information Matrix explains why a monitoring statistic jumps 244× under weight quantization, and yields rigorous lower bounds.

empirical fisher information matrix, weight quantization, model monitoring, spectral perturbation, large language models, arxiv

A single measurement of a runtime monitoring statistic on a 4-bit quantized language model came back 244 times larger than its full-precision baseline. The authors read it not as a fluke but as the signature of a spectral perturbation, one they now lower-bound rigorously in Theorem 4.3.

Why Quantized Models Drift the FIM

The empirical Fisher Information Matrix (FIM) tracks the curvature of the log-likelihood with respect to model parameters. Quantizing weights from 32-bit to 4-bit injects structured noise, and the FIM's dominant eigenvalue λ_max responds sharply. Under a local curvature-monotonicity hypothesis (Proposition 3.2), departure from a reference input manifold provably elevates λ_max above its calibration baseline. The paper then extends this argument to quantization: Theorem 4.3 uses Weyl's inequality to show that λ_max under quantization noise is lower-bounded by its unperturbed value up to a third-order remainder, and strictly exceeds it at leading order under a mild genericity condition.
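To make the Weyl step concrete, here is a minimal sketch under loudly stated assumptions: a toy logistic-regression model stands in for an LLM, the empirical FIM is taken as the mean outer product of per-sample log-likelihood gradients, and `quantize_4bit` is a simple symmetric absmax quantizer chosen for illustration, not the paper's scheme. All function names are hypothetical. The sketch only checks that λ_max of the FIM at quantized weights lands inside the interval Weyl's inequality allows; because Weyl holds for any symmetric perturbation, it does not reproduce the sign result of Theorem 4.3, which relies on the paper's extra hypotheses.

```python
import numpy as np

def quantize_4bit(w):
    """Symmetric per-tensor 4-bit quantization (absmax scale, int range [-8, 7]).
    A common simple scheme assumed for illustration; not necessarily the paper's."""
    scale = np.max(np.abs(w)) / 7.0
    if scale == 0.0:
        return w.copy()
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

def empirical_fim(w, X, y):
    """Empirical FIM of a toy logistic regression: mean of per-sample gradient outer products."""
    p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted P(y = 1 | x)
    G = (p - y)[:, None] * X                # per-sample NLL gradients, shape (N, d)
    return G.T @ G / X.shape[0]

rng = np.random.default_rng(0)
N, d = 500, 8
X = rng.normal(size=(N, d))
w = rng.normal(size=d)
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ w))).astype(float)

F = empirical_fim(w, X, y)                  # full-precision FIM
F_q = empirical_fim(quantize_4bit(w), X, y) # FIM at 4-bit-quantized weights
dF = F_q - F                                # symmetric perturbation induced by quantization

lam = np.linalg.eigvalsh(F)[-1]             # eigvalsh returns eigenvalues in ascending order
lam_q = np.linalg.eigvalsh(F_q)[-1]
dF_eigs = np.linalg.eigvalsh(dF)

# Weyl: lambda_max(F) + lambda_min(dF) <= lambda_max(F + dF) <= lambda_max(F) + lambda_max(dF)
lower = lam + dF_eigs[0]
upper = lam + dF_eigs[-1]
print(f"lambda_max full={lam:.4f} quantized={lam_q:.4f} Weyl interval=[{lower:.4f}, {upper:.4f}]")
```

Whether λ_max rises or falls in this toy depends on the data; the strict leading-order increase claimed in the paper needs its genericity condition, which the sketch does not enforce.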

A Perturbation Bound Up to a Third-Order Remainder

The paper offers two tractable approximations to λ_max, one heuristic and one with a rigorous two-sided bound, plus a completeness result for a threshold-based partition of an augmented state space. The punchline: the quantization result supplies a mechanism for an empirical observation in which σ_t = λ_max(F_t) / λ_base reached 244× on a 4-bit model. That is a single measurement, not a closed-form value. The bound says λ_max cannot drop below its full-precision value (up to the third-order correction), but the exact inflation factor remains an open problem.
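The ratio σ_t can be estimated without ever materializing the d × d FIM. The sketch below is a hedged illustration, not either of the paper's two approximations: it uses plain power iteration with matrix-free products, plus the textbook bounds trace(F)/d ≤ λ_max(F) ≤ trace(F), which hold for any positive semidefinite matrix. The gradient matrices are synthetic, and the names `lambda_max_power`, `trace_bounds`, and `sigma` are hypothetical.

```python
import numpy as np

def lambda_max_power(G, iters=200, seed=0):
    """Estimate lambda_max of the empirical FIM F = G^T G / N by power iteration.

    G is an (N, d) array of per-sample log-likelihood gradients. F is never formed:
    each step uses the matrix-free product F v = G^T (G v) / N.
    """
    n = G.shape[0]
    v = np.random.default_rng(seed).normal(size=G.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        Fv = G.T @ (G @ v) / n
        v = Fv / np.linalg.norm(Fv)
    # Rayleigh quotient of a unit vector: never exceeds the true lambda_max.
    return float(v @ (G.T @ (G @ v)) / n)

def trace_bounds(G):
    """Two-sided bounds for PSD F = G^T G / N: trace(F)/d <= lambda_max(F) <= trace(F)."""
    n, d = G.shape
    tr = float(np.sum(G * G) / n)   # trace(G^T G) equals the squared Frobenius norm of G
    return tr / d, tr

def sigma(G_t, lam_base):
    """Monitoring ratio sigma_t = lambda_max(F_t) / lambda_base."""
    return lambda_max_power(G_t) / lam_base

# Synthetic usage: calibrate on baseline gradients, then score a new batch.
rng = np.random.default_rng(1)
G_base = rng.normal(size=(256, 16))
lam_base = lambda_max_power(G_base)
G_t = 3.0 * G_base                  # gradients scaled by 3, so F_t = 9 * F_base
print(sigma(G_t, lam_base))         # approximately 9.0
```

Because the returned Rayleigh quotient can only underestimate λ_max, this estimator never overstates either term of σ_t; how many iterations it needs depends on the gap between the top two eigenvalues.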

From Theory to a Runtime Monitor

Experiments on twelve models and 1,080 trajectories produced measurements broadly consistent with the predictions. The authors are clear about the limitations: the bound is directional, not tight, and a closed-form prediction of the quantization inflation magnitude is left as an open problem. Still, the practical implication is immediate: anyone running 4-bit quantized LLMs in production now has a spectral explanation for why their monitoring statistic blows up, and a formal lower bound to sanity-check it against.
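A sanity check of that kind might look like the following minimal sketch. The helper name and the tolerance `tol` are hypothetical: `tol` stands in for the third-order remainder, whose magnitude the article does not give, and the comparison assumes both λ_max values were measured on the same inputs.

```python
def check_against_lower_bound(lam_quant, lam_full, tol=0.05):
    """Hypothetical check of a quantized model's lambda_max against the
    full-precision value measured on the same inputs.

    The bound says lam_quant should not fall below lam_full except by a
    third-order remainder; `tol` is an assumed relative slack for it.
    Returns (ratio, consistent_with_bound).
    """
    if lam_full <= 0:
        raise ValueError("lam_full must be positive")
    ratio = lam_quant / lam_full
    return ratio, ratio >= 1.0 - tol

print(check_against_lower_bound(244.0, 1.0))  # (244.0, True)
print(check_against_lower_bound(0.5, 1.0))    # (0.5, False): recheck the measurement
```

A ratio far below 1 does not refute the theorem by itself; under these assumptions it more likely points to a measurement or calibration problem worth investigating.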

Closing the gap between that 244× measurement and a closed-form prediction will determine whether this becomes a standard diagnostic or just a curiosity.


Source: Spectral Perturbation of the Empirical Fisher Information Matrix under Weight Quantization
Domain: arxiv.org
