LLMs fine-tuned for generative recommendation owe the vast majority of their performance gains over traditional baselines to one-hop memorization: directly recommending the item that follows a given item in the training data. That's not generalization from pretrained knowledge; it's rote recall of sequential patterns.
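To make the idea concrete, here is a recommender that does nothing except one-hop memorization. It counts which item immediately follows each item in the training sequences. It then recommends the most frequent successors of the user's last item. This is an illustrative sketch under that assumption, not code from the paper, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_successor_table(sequences):
    """Count how often each item is immediately followed by each other item."""
    successors = defaultdict(Counter)
    for seq in sequences:
        for prev_item, next_item in zip(seq, seq[1:]):
            successors[prev_item][next_item] += 1
    return successors


def recommend_one_hop(successors, history, k=3):
    """Recommend the k most frequent training successors of the user's last item."""
    if not history or history[-1] not in successors:
        return []  # no memorized transition to fall back on
    return [item for item, _ in successors[history[-1]].most_common(k)]


train = [["a", "b", "c"], ["a", "b", "d"], ["b", "c", "e"]]
table = build_successor_table(train)
print(recommend_one_hop(table, ["x", "a"], k=2))  # ['b']
print(recommend_one_hop(table, ["b"], k=2))       # ['c', 'd']
```

A lookup table like this has no notion of item meaning. It can only succeed when the test target happens to be a seen successor.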
One-Hop Memorization Masquerading as Intelligence
The researchers behind arXiv:2606.17276 examined one-hop memorization: whether a model recommends items that are direct successors of items seen during training. They found that LLM-based generative recommenders rely on it more heavily than non-LLM generative recommenders do. In fact, almost all the improvement these LLMs achieve over simpler baselines is concentrated on users whose target items can be predicted through exactly that one-hop shortcut. For users whose test items aren't covered by any one-hop transition in the training sequences, the LLM's vaunted pretrained knowledge barely helps.
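Below is a minimal sketch of how test users might be split into one-hop-covered and uncovered groups. It assumes a user counts as "covered" when the pair (last history item, target item) appears as consecutive items somewhere in the training sequences. The paper may define coverage differently, for example over any item in the history. The function names are hypothetical.

```python
def one_hop_transitions(train_sequences):
    """Set of (item, next_item) pairs that appear consecutively in training data."""
    pairs = set()
    for seq in train_sequences:
        pairs.update(zip(seq, seq[1:]))
    return pairs


def split_by_one_hop_coverage(train_sequences, test_cases):
    """Partition users by whether their target is a seen successor of their last item.

    test_cases maps user_id -> (history, target_item).
    """
    transitions = one_hop_transitions(train_sequences)
    covered, uncovered = [], []
    for user, (history, target) in test_cases.items():
        if history and (history[-1], target) in transitions:
            covered.append(user)
        else:
            uncovered.append(user)
    return covered, uncovered


train = [["a", "b", "c"], ["c", "d"]]
test_cases = {"u1": (["a", "b"], "c"), "u2": (["b", "c"], "e"), "u3": (["c"], "d")}
print(split_by_one_hop_coverage(train, test_cases))  # (['u1', 'u3'], ['u2'])
```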
I've seen this pattern before in other domains: a model looks smart because it memorizes the training distribution well, not because it learns transferable structure. The paper's observation is a needed wake-up call for anyone assuming that dropping an LLM into a recommendation pipeline automatically buys you semantic understanding.
IIRG: Forcing the Model to Generalize Across Hops
To break out of one-hop memorization, the authors propose IIRG (Item-Item Relation Guidance), a training strategy that teaches the LLM two things: (1) collaborative relations derived from item co-occurrences across multiple hops in user sequences, and (2) semantic relations among items with similar themes. Both can serve as useful recommendation signals that go beyond simple adjacency.
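The post doesn't say how IIRG extracts these relations or feeds them to the model. The following is therefore only a sketch under explicit assumptions:

- Collaborative relations are counted as item pairs that co-occur within `max_hops` positions of each other in a sequence.
- Semantic relations are item pairs whose theme-tag sets have high Jaccard overlap.
- Both are rendered as extra text examples for fine-tuning.

The window, the Jaccard threshold, the example wording, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def multi_hop_pairs(sequences, max_hops=3):
    """Count ordered pairs (a, b) where b occurs 1..max_hops positions after a."""
    counts = Counter()
    for seq in sequences:
        for i, src in enumerate(seq):
            # Look ahead up to max_hops positions, not just the adjacent item.
            for dst in seq[i + 1 : i + 1 + max_hops]:
                if dst != src:
                    counts[(src, dst)] += 1
    return counts


def theme_pairs(item_themes, min_jaccard=0.5):
    """Unordered item pairs whose theme-tag sets have Jaccard similarity >= min_jaccard."""
    pairs = []
    for a, b in combinations(sorted(item_themes), 2):
        ta, tb = item_themes[a], item_themes[b]
        union = ta | tb
        if union and len(ta & tb) / len(union) >= min_jaccard:
            pairs.append((a, b))
    return pairs


def relation_examples(collab_counts, semantic_pairs, min_count=1):
    """Render both relation types as extra text training examples (hypothetical format)."""
    examples = [
        f"Item {a} is often followed within a few steps by item {b}."
        for (a, b), n in sorted(collab_counts.items())
        if n >= min_count
    ]
    examples += [f"Item {a} and item {b} share similar themes." for a, b in semantic_pairs]
    return examples


sequences = [["a", "b", "c", "d"]]
collab = multi_hop_pairs(sequences, max_hops=2)  # includes (a, c) but not (a, d)
themes = {"m1": {"space", "war"}, "m2": {"space", "war", "drama"}, "m3": {"cooking"}}
semantic = theme_pairs(themes)  # [('m1', 'm2')]
for line in relation_examples(collab, semantic):
    print(line)
```

The point of the wider window is that pairs like (a, c) become training signal even though c never directly follows a. Plain next-item training never sees such pairs.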
IIRG significantly outperforms standard next-item prediction training, with especially large gains for users whose test items are not covered by any one-hop transition at training time. That's the real test of a recommendation system's ability to generalize: can it recommend something it never saw in that exact sequential context?
The Bottom Line for Recommendation Engineers
If you're building a generative recommender with an LLM, don't assume its pretrained knowledge is doing the heavy lifting. The first thing to check is whether your performance is coming from one-hop memorization. IIRG offers a concrete, principled way to push the model toward learning richer item-item relations. The field can't afford to treat memorization as a feature; we need systems that actually learn the underlying structure of user behavior.
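One way to run that check is to report Hit@K separately for two groups: users whose target is a one-hop successor of their last history item, and everyone else. In this sketch, coverage means the (last item, target) pair appears consecutively in the training data. That is an assumption, not necessarily the paper's exact definition. The `recommend` argument stands in for whatever model you are evaluating, and all names are hypothetical.

```python
def hit_rate_by_coverage(train_sequences, test_cases, recommend, k=10):
    """Hit@k computed separately for one-hop-covered and uncovered test users.

    test_cases: user_id -> (history, target_item)
    recommend:  callable(history, k) -> ranked list of item ids
    """
    # Every (item, next_item) pair observed consecutively during training.
    transitions = set()
    for seq in train_sequences:
        transitions.update(zip(seq, seq[1:]))

    hits = {"covered": 0, "uncovered": 0}
    totals = {"covered": 0, "uncovered": 0}
    for history, target in test_cases.values():
        is_covered = bool(history) and (history[-1], target) in transitions
        group = "covered" if is_covered else "uncovered"
        totals[group] += 1
        if target in recommend(history, k)[:k]:
            hits[group] += 1
    # None marks an empty group rather than a misleading 0.0.
    return {g: (hits[g] / totals[g] if totals[g] else None) for g in totals}


def most_popular(history, k):
    """Toy stand-in model: always recommends the same items."""
    return ["c", "d"][:k]


train = [["a", "b", "c"], ["c", "d"]]
test_cases = {"u1": (["a", "b"], "c"), "u2": (["b", "c"], "e"), "u3": (["c"], "d")}
print(hit_rate_by_coverage(train, test_cases, most_popular, k=2))
# {'covered': 1.0, 'uncovered': 0.0}
```

Run this for both your LLM recommender and a simpler baseline. If nearly all of the LLM's advantage shows up in the "covered" group, it is likely operating in the one-hop memorization regime described above.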
Source: On the Memorization Behavior of LLMs in Generative Recommendation: Observations, Implications, and Training Strategies
Domain: arxiv.org