Across seven large language models and two architectures, fine-tuning on Arabic gives related Semitic languages no extra cross-lingual transfer: unrelated languages improve just as much. The gains come from learning how to answer reading comprehension questions, not from linguistic relatedness.
Why Linguistic Relatedness Doesn't Matter
The experiment is clean: fine-tune seven LLMs (from 4B to 671B parameters, covering both dense and Mixture-of-Experts architectures) on Arabic, then test zero-shot on Semitic languages like Hebrew and Amharic plus non-Semitic controls like Turkish and English. If linguistic relatedness mattered, Semitic languages should show bigger improvements. They don't.
Models that start with weak baseline scores improve dramatically across all languages, regardless of family. Models that already score well show only marginal gains, again uniform across languages. The pattern holds for every architecture tested. This is a strong signal that fine-tuning teaches task alignment (how to produce the answer format) rather than transferring knowledge about Arabic grammar or vocabulary to cognate languages.
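The relatedness test above boils down to a comparison: is the mean gain on Semitic targets larger than the mean gain on non-Semitic controls? The sketch below shows that comparison. It is not the paper's evaluation code. The language-to-family mapping, the function names, and the percent-accuracy inputs are all hypothetical and chosen for illustration.

```python
from statistics import mean

# Hypothetical family labels for illustration; the paper's language set may differ.
FAMILY = {
    "hebrew": "semitic",
    "amharic": "semitic",
    "turkish": "control",
    "english": "control",
}


def gains(baseline: dict[str, float], finetuned: dict[str, float]) -> dict[str, float]:
    """Per-language accuracy gain from fine-tuning (finetuned minus baseline)."""
    return {lang: finetuned[lang] - baseline[lang] for lang in baseline}


def relatedness_advantage(
    baseline: dict[str, float],
    finetuned: dict[str, float],
    family: dict[str, str] = FAMILY,
) -> float:
    """Mean gain on Semitic targets minus mean gain on non-Semitic controls.

    A value near zero means fine-tuning helped related and unrelated
    languages about equally, i.e. no relatedness effect.
    """
    g = gains(baseline, finetuned)
    semitic = [v for lang, v in g.items() if family[lang] == "semitic"]
    control = [v for lang, v in g.items() if family[lang] == "control"]
    return mean(semitic) - mean(control)


# Made-up scores (percent accuracy), not results from the paper:
# every language gains 20 points, so there is no relatedness advantage.
base = {"hebrew": 40.0, "amharic": 30.0, "turkish": 42.0, "english": 60.0}
tuned = {"hebrew": 60.0, "amharic": 50.0, "turkish": 62.0, "english": 80.0}
print(relatedness_advantage(base, tuned))
```

The article describes a uniform gain pattern across families. Under this sketch, that pattern corresponds to an advantage close to zero for every model, whether its baseline was weak or strong.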
What the Ablation Reveals
Chain-of-thought prompting at inference time, with no fine-tuning at all, produces the same pattern. The models that benefit most from fine-tuning also benefit most from chain-of-thought, and across models the size of one improvement tracks the size of the other. Both mechanisms address the same bottleneck, which is understanding the reading comprehension task format. Neither transfers language-specific knowledge.
This result challenges a core assumption in multilingual NLP. If you expected fine-tuning on a high-resource language like Arabic to bootstrap understanding of lower-resource Semitic languages, you were betting on the wrong mechanism. The models get better at parsing questions and locating answer spans. They do not map Arabic lexicons onto Hebrew or Amharic.
Future work on cross-lingual transfer should focus on explicit knowledge injection or alignment across language families, because fine-tuning alone isn't doing what we thought it was.
Source: Disentangling Linguistic Relatedness from Task Alignment in Cross-Lingual Transfer
Domain: arxiv.org