LLMs Can't Reverse Facts: The Reversal Curse Exposes a Generalization Gap

GPT-4 correctly answers questions like "Who is Tom Cruise's mother?" 79% of the time, but only 33% of the time for the reverse, "Who is Mary Lee Pfeiffer's son?" - a robust failure that LLMs share across model families.

Tags: reversal curse, gpt 4, llama, large language models, generalization, ai safety

GPT-4 correctly answers questions like 'Who is Tom Cruise's mother?' 79% of the time, but drops to just 33% on the reversed form, 'Who is Mary Lee Pfeiffer's son?'. That 46-point gap is the Reversal Curse, and it's not a fluke - it's a systematic failure that showed up in every autoregressive LLM the authors tested.

What the Reversal Curse Actually Is

Lukas Berglund, Meg Tong, Max Kaufmann, and their coauthors, whose affiliations include Oxford and NYU, finetuned GPT-3 and Llama-1 on fabricated sentences like 'Uriah Hawthorne is the composer of Abyssal Melodies'. After finetuning, the models could not answer 'Who composed Abyssal Melodies?' - they didn't even assign higher likelihood to the correct name than to a random one. The curse holds across the model sizes and families tested, and data augmentation did not alleviate it. If 'A is B' appears in the context window, a model can deduce the reverse, but that only patches the problem for that one prompt: the weights themselves don't capture the reversible structure of the fact.
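To make the likelihood comparison concrete, here is a minimal sketch of the kind of check involved: score the correct name against a handful of random names and look at the margin. The `score(prompt, completion)` callable is a hypothetical stand-in for a real model's log-probability API, and the two stub scorers are assumptions used only to make the example runnable. This is not the authors' evaluation code.

```python
from typing import Callable, Sequence

# Hypothetical signature: score(prompt, completion) -> log-probability of
# `completion` given `prompt`. A real scorer would wrap a model API.
Scorer = Callable[[str, str], float]


def likelihood_margin(score: Scorer, prompt: str, correct: str,
                      random_names: Sequence[str]) -> float:
    """Log-prob of the correct name minus the mean log-prob of random names."""
    baseline = sum(score(prompt, n) for n in random_names) / len(random_names)
    return score(prompt, correct) - baseline


def indifferent_scorer(prompt: str, completion: str) -> float:
    """Stub for a model that has not learned the reverse fact."""
    return -5.0


def informed_scorer(prompt: str, completion: str) -> float:
    """Stub for a model that does prefer the correct name."""
    return -1.0 if completion == "Uriah Hawthorne" else -5.0


prompt = "The composer of Abyssal Melodies is"
randoms = ["Name A", "Name B", "Name C"]  # placeholder distractor names
print(likelihood_margin(indifferent_scorer, prompt, "Uriah Hawthorne", randoms))
print(likelihood_margin(informed_scorer, prompt, "Uriah Hawthorne", randoms))
```

A margin near zero, as the indifferent stub produces, is the pattern the paper describes for the reverse direction. A clearly positive margin is what you would expect if the fact had generalized.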

Why This Matters for How We Think About LLM Knowledge

I've argued for years that LLMs are pattern matchers, not reasoners. This paper hands me the cleanest evidence yet. A model trained on 'A is B' doesn't automatically learn 'B is A' - a lookup that even a simple database or knowledge graph handles trivially. The implication is stark: any chatbot or RAG pipeline that falls back on factual recall from pretraining, rather than on text in its context window, can miss relationships its training data stated in only one direction. For safety-critical applications - medical records, legal documents, scientific citations - this asymmetry is a landmine. You cannot assume an LLM will answer a question that reverses the direction of a fact it knows.
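For contrast, here is a rough sketch of how little it takes for an explicit store to answer in both directions. The `FactStore` class and its method names are made up for illustration. The point is only that indexing each fact twice makes the reverse lookup as cheap as the forward one.

```python
from collections import defaultdict


class FactStore:
    """Toy fact store that indexes every (subject, relation, object) both ways."""

    def __init__(self) -> None:
        self._forward = defaultdict(set)   # (subject, relation) -> objects
        self._backward = defaultdict(set)  # (object, relation) -> subjects

    def add(self, subject: str, relation: str, obj: str) -> None:
        # Writing both indexes at insert time is what makes reversal trivial.
        self._forward[(subject, relation)].add(obj)
        self._backward[(obj, relation)].add(subject)

    def objects(self, subject: str, relation: str) -> set:
        """Forward lookup: who is the <relation> of <subject>?"""
        return set(self._forward.get((subject, relation), set()))

    def subjects(self, obj: str, relation: str) -> set:
        """Reverse lookup: whose <relation> is <obj>?"""
        return set(self._backward.get((obj, relation), set()))


store = FactStore()
store.add("Tom Cruise", "mother", "Mary Lee Pfeiffer")
print(store.objects("Tom Cruise", "mother"))          # forward direction
print(store.subjects("Mary Lee Pfeiffer", "mother"))  # reverse direction
```

Both calls succeed because the reverse index is written explicitly. Nothing comparable happens automatically when a model is finetuned on a sentence stated in one order.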

How They Proved It (and Why You Should Care)

The authors ran two experiments. First, synthetic facts: they finetuned GPT-3 and Llama-1 models of several sizes on 2,000 sentences of the form 'Name is the attribute of Object', then tested the reverse. Accuracy on the reverse direction was no better than chance - even after data augmentation such as paraphrases and auxiliary examples presented in both orders. Second, a real-world evaluation on GPT-3.5 and GPT-4 using celebrity family relationships: the 79% vs 33% gap I opened with, measured across the full question set rather than on the Tom Cruise question alone. The paper's code is public, and the pattern shows up in every model tested. Scaling across the tested sizes did not fix it, which points to a limitation in how left-to-right autoregressive training stores facts rather than a shortage of parameters.
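The shape of the first experiment can be sketched in a few lines: train on one order, then probe both. Everything below is an illustrative assumption rather than the paper's code, including the prompt templates, the `accuracy_by_direction` helper, and the second placeholder fact. The `one_way_model` stub memorizes only the training order by construction, so it demonstrates the measurement and is not evidence for the result.

```python
from typing import Callable, Dict, List, Tuple

Fact = Tuple[str, str]  # (name, description)

facts: List[Fact] = [
    ("Uriah Hawthorne", "the composer of Abyssal Melodies"),  # from the article
    ("Example Person", "the author of Example Book"),         # placeholder fact
]


def training_sentences(items: List[Fact]) -> List[str]:
    """Forward-order statements, the only order seen during finetuning."""
    return [f"{name} is {desc}." for name, desc in items]


def forward_prompt(name: str) -> str:
    return f"{name} is"


def reverse_prompt(desc: str) -> str:
    return f"{desc[0].upper()}{desc[1:]} is"


def accuracy_by_direction(model: Callable[[str], str],
                          items: List[Fact]) -> Dict[str, float]:
    """Fraction of facts whose expected answer appears in the model output."""
    fwd = sum(desc in model(forward_prompt(name)) for name, desc in items)
    rev = sum(name in model(reverse_prompt(desc)) for name, desc in items)
    return {"forward": fwd / len(items), "reverse": rev / len(items)}


# Stub "model": a lookup keyed on the forward prompt only (an assumption that
# mimics one-directional memorization; a real run would call a finetuned LLM).
_memorized = {forward_prompt(name): desc for name, desc in facts}


def one_way_model(prompt: str) -> str:
    return _memorized.get(prompt, "")


print(training_sentences(facts)[0])
print(accuracy_by_direction(one_way_model, facts))
```

Swapping `one_way_model` for a call to a real finetuned model would turn this into an actual probe. The difference between the two accuracies is the quantity the celebrity experiment reports as 79% vs 33%.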

The Reversal Curse isn't a bug you can patch with more data. It's a design constraint we've ignored. If you're building on LLMs, start planning for it now.


Source: The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
Domain: arxiv.org
