YouTube Ships TokenMinds: Discrete User Tokens for Billions of Users

Google's TokenMinds system generates both SID-based user tokens and dense embeddings, improving ranking across multiple YouTube surfaces at scale.

youtube, google, tokenminds, recommender systems, user embeddings, large language models

Billions of YouTube users are now represented by discrete Semantic ID tokens in production, not just dense embeddings. That is the core claim from TokenMinds, an industrial-scale system developed by Google and described in a recent arXiv paper (2606.25147).

Why Discrete Tokens Beat Dense Embeddings for User Understanding

A dense embedding squeezes a user into a fixed number of dimensions, which makes nuanced, long-tail behavior patterns hard to capture; a discrete token vocabulary can represent them more readily. TokenMinds addresses this by generating both SID-based user tokens and dense user embeddings with an encoder-decoder architecture adapted from pre-trained LLMs. The tokens are semantically grounded and share a vocabulary with item SID tokens, which makes them directly interpretable and composable. Downstream ranking models can use whichever representation works better, or combine both. The paper reports that tokens and dense embeddings provide complementary value across different production ranking systems on YouTube.
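To make the "choose one or combine both" idea concrete, here is a minimal sketch of how a downstream ranker could assemble user features from the two outputs. It is not the paper's implementation: the vocabulary size, dimensions, mean pooling, and every function name below are assumptions chosen for illustration.

```python
import random

# All sizes below are hypothetical; the article does not state them.
VOCAB_SIZE = 1024   # size of the SID vocabulary shared by users and items
TOKEN_DIM = 4       # width of each learned token embedding
DENSE_DIM = 8       # width of the dense user embedding


def make_token_table(vocab_size, dim, seed=0):
    """Stand-in for a learned embedding table over the shared SID vocabulary."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def pool_tokens(token_ids, table):
    """Mean-pool the embeddings of a user's SID tokens (one simple choice of pooling)."""
    dim = len(table[0])
    if not token_ids:
        return [0.0] * dim
    sums = [0.0] * dim
    for token_id in token_ids:
        row = table[token_id]
        for i in range(dim):
            sums[i] += row[i]
    return [s / len(token_ids) for s in sums]


def build_user_features(token_ids, dense_embedding, table,
                        use_tokens=True, use_dense=True):
    """Build the user feature vector from tokens, the dense embedding, or both."""
    features = []
    if use_tokens:
        features.extend(pool_tokens(token_ids, table))
    if use_dense:
        features.extend(dense_embedding)
    return features
```

A ranker that already consumes dense embeddings could call this with `use_tokens=False` and see no change in its input, while a newer model could enable both flags and receive the concatenated vector.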

Cross-Scenario Modeling Cuts Training and Serving Costs

TokenMinds extends the PLUM framework from item retrieval to user modeling. Because the shared SID vocabulary naturally bridges short-form and long-form video behaviors, a single unified model serves both use cases. This consolidation substantially reduces the compute and memory footprint compared to maintaining separate models. The asynchronous infrastructure decouples representation generation from downstream scoring, so billions of users can be re-tokenized without blocking ranking inference.
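The decoupling described above can be pictured as a write path and a read path that meet only at a store. The sketch below is an illustration under assumptions, not YouTube's infrastructure: `RepresentationStore`, `refresh_users`, `rank_request`, and the in-memory dictionary are all hypothetical stand-ins.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class UserRepresentation:
    tokens: Tuple[int, ...]    # SID-based user tokens
    dense: Tuple[float, ...]   # dense user embedding
    generated_at: float        # when the background job produced this entry


class RepresentationStore:
    """In-memory stand-in for whatever key-value store holds user outputs."""

    def __init__(self) -> None:
        self._data: Dict[str, UserRepresentation] = {}

    def write(self, user_id: str, rep: UserRepresentation) -> None:
        self._data[user_id] = rep

    def read(self, user_id: str) -> Optional[UserRepresentation]:
        return self._data.get(user_id)


def refresh_users(store: RepresentationStore, user_ids: Sequence[str],
                  encode: Callable) -> None:
    """Background path: run the expensive user model and overwrite stored outputs."""
    for user_id in user_ids:
        tokens, dense = encode(user_id)
        store.write(user_id, UserRepresentation(tuple(tokens), tuple(dense), time.time()))


def rank_request(store: RepresentationStore, user_id: str,
                 candidates: Sequence, score: Callable) -> list:
    """Serving path: a store lookup plus scoring; the encoder is never called here."""
    rep = store.read(user_id)  # may be None for a user not yet processed
    scored = [(score(rep, candidate), candidate) for candidate in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]
```

Because `rank_request` only reads, a slow or delayed refresh leaves the ranker working from the last stored representation instead of stalling it. The `score` callback has to handle a missing entry, since a user may not have been processed yet.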

Live Launches on Multiple YouTube Surfaces Confirm Practical Viability

TokenMinds has been validated through extensive offline experiments and live launches on multiple YouTube surfaces, served on full user traffic. The results indicate that discrete user tokens are not just a research curiosity; they work at scale. The dual-output design preserves backward compatibility with existing downstream models that rely on dense embeddings, which lowers the barrier to adoption.

TokenMinds opens the door to generative recommendation architectures that treat users as token sequences, matching the item-to-token approach already proven for content.


Source: TokenMinds: Pretrained User Tokens and Embeddings for User Understanding in Large Recommender Systems
Domain: arxiv.org
