Semantic IDs (SIDs) look clever on paper: compact token sequences that encode item semantics. But they carry a hidden cost. Every cold-start item is locked into a single, static identifier before any user feedback arrives. That one-shot commitment, made during offline tokenization, tends to produce poorly discriminative codes for new items. Once assigned, those codes are rarely sampled during training, so the item's mapping stays misaligned with user behavior.
The Real Bottleneck: Static Commitment, Not Model Capacity
The paper behind DREAM argues that the cold-start problem in SID-based generative recommendation isn't about model capacity. It's about the disjoint objectives of tokenization and generation. Tokenizers build semantics in a vacuum, while the recommendation backbone learns from user behavior and rarely encounters the poorly aligned tokens. The result: cold-start items get stuck with codes that don't reflect actual user intent, and further training does little to fix them because the model so seldom sees those codes in practice.
DREAM reframes the problem as one of progressive refinement instead of one-shot assignment. Rather than committing to a single SID upfront, DREAM treats the mapping as a dynamic process that evolves as user signals accumulate.
Three-Stage Progressive Refinement
DREAM operates in three stages. First, an intent-aware tokenizer rebuilds the SID space using counterfactual contrastive learning, generating a diverse pool of behavior-aligned candidate codes per cold-start item. Second, the frozen recommendation backbone acts as an evaluator: it selects the most reliable candidate from that pool based on multi-context user support, without any retraining. Third, a dynamic beam mechanism maintains multiple weighted SID hypotheses throughout both training and inference, preventing premature collapse to a single assignment.
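The second stage, using a frozen backbone as an evaluator, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `select_candidate` and `toy_score` are hypothetical, the "multi-context user support" is assumed here to be a simple mean of per-context scores, and `toy_score` is only a stand-in for whatever likelihood a real frozen backbone would return.

```python
from statistics import mean
from typing import Callable, Sequence

# Assumption: a semantic ID is a short tuple of code indices.
SID = tuple[int, ...]


def select_candidate(
    candidates: Sequence[SID],
    contexts: Sequence[Sequence[SID]],
    score_fn: Callable[[Sequence[SID], SID], float],
) -> tuple[SID, float]:
    """Return the candidate SID with the highest mean score across contexts.

    score_fn plays the role of the frozen backbone: it is only queried,
    never updated, so no retraining happens here.
    """
    if not candidates or not contexts:
        raise ValueError("need at least one candidate and one context")
    support = {
        cand: mean(score_fn(ctx, cand) for ctx in contexts)
        for cand in candidates
    }
    best = max(candidates, key=lambda cand: support[cand])
    return best, support[best]


def toy_score(context: Sequence[SID], candidate: SID) -> float:
    """Hypothetical stand-in for a backbone score: the fraction of items
    in the user context that share the candidate's first code."""
    return sum(item[0] == candidate[0] for item in context) / len(context)


# Two candidate codes for one cold-start item, scored against two user contexts.
candidates = [(1, 2), (3, 4)]
contexts = [[(1, 0), (1, 5)], [(1, 9), (3, 0)]]
best_sid, best_support = select_candidate(candidates, contexts, toy_score)
```

Averaging over several contexts, rather than trusting a single user history, is one simple way to read "multi-context support"; the paper may aggregate differently.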
This design directly addresses the root cause. By keeping multiple candidates alive, DREAM avoids the trap where a single bad initial code starves the model of positive feedback. The beam mechanism ensures that if one hypothesis underperforms, others remain in play.
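To make the idea of weighted hypotheses concrete, here is a minimal sketch of a beam over candidate SIDs for a single item. It assumes a multiplicative-weights update driven by a per-hypothesis feedback signal, which is a swapped-in technique chosen for illustration; the source does not specify how DREAM reweights or prunes its beam. The function `update_beam` and the parameters `beam_width` and `lr` are hypothetical.

```python
import math


def update_beam(beam, feedback, beam_width=3, lr=1.0):
    """Reweight, prune, and renormalize a beam of SID hypotheses.

    beam:     dict mapping SID tuple -> weight (weights sum to 1)
    feedback: dict mapping SID tuple -> scalar signal (missing means 0.0)
    Assumption: a multiplicative-weights rule, not the paper's actual update.
    """
    if not beam:
        raise ValueError("beam must not be empty")
    # Hypotheses with positive feedback gain weight; the rest keep theirs.
    scaled = {
        sid: weight * math.exp(lr * feedback.get(sid, 0.0))
        for sid, weight in beam.items()
    }
    # Keep only the strongest hypotheses, then renormalize to sum to 1.
    kept = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:beam_width]
    total = sum(weight for _, weight in kept)
    return {sid: weight / total for sid, weight in kept}


# Two equally weighted hypotheses; the second one receives positive feedback.
beam = {(1, 2): 0.5, (3, 4): 0.5}
beam = update_beam(beam, {(3, 4): math.log(3)})
```

Because the weaker hypothesis is down-weighted rather than discarded, it can still recover if later feedback favors it, which is the property the beam is meant to preserve.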
Results on Three Amazon Benchmarks
DREAM was tested on three Amazon benchmark datasets against state-of-the-art generative and sequential baselines. The paper reports substantial improvements on cold-start metrics. The abstract doesn't disclose exact percentage gains, so the size of the lift can't be judged from it alone. Still, the reported outperformance across all three datasets is consistent with the architecture itself, rather than tuning, driving the gain.
The work makes a point that may extend beyond generative recommendation: a system that pre-tokenizes items without considering downstream behavior risks hitting a cold-start wall. DREAM suggests that keeping the mapping fluid, using the backbone as a critic, and maintaining multiple hypotheses can turn that wall into a ramp.
By decoupling early commitment from generation, DREAM points to a broader principle: generative recommendation systems must treat tokenization as an ongoing process, not a one-shot preprocessing step.
Source: DREAM: Dynamic Refinement of Early Assignment Mappings
Domain: arxiv.org