A self-supervised Transformer encoding multi-modal clickstreams into compact session embeddings improved macro Recall@1 by 1.88% and cut Log Loss by 13.38% over production baselines in a financial services recommendation system. That's not an ablation on a toy dataset; those numbers come from a real deployment on mobile homepage tile ranking at an unnamed financial institution.
The Cross-Platform Gap That Kills Personalization
Financial services face a distinctive problem: pre-login web browsing and authenticated in-app behavior are worlds apart. Anonymous users explore products on the web; logged-in users service accounts on mobile. Cross-channel entity resolution is hard, and matching anonymous web sessions to authenticated accounts often fails. As a result, rich web-based intent signals never reach post-login personalization.
Existing methods for capturing web intent are ad-hoc and narrow. They either bake in rigid rules or produce uninterpretable vectors that product managers and compliance teams can't review. The authors of arXiv:2606.26277 set out to build a single pipeline that serves both quantitative ranking and qualitative understanding at scale.
Self-Supervised Encoding with an LLM Distillation Layer
The architecture has two parts. A self-supervised Transformer takes raw web clickstreams (page types, dwell times, search queries, etc.) and outputs a compact session embedding for downstream ranking and prediction tasks. Separately, an LLM-based taxonomy generation and distillation pipeline produces human-readable intent labels (e.g., "researching mortgages", "comparing credit cards").
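To make the encoder's input concrete, here is a minimal sketch of how one web session might be flattened into a fixed-length token sequence before it reaches a Transformer. The `ClickEvent` fields, the dwell-time thresholds, the token names, and `max_len` are all illustrative assumptions; the article does not describe the paper's actual tokenization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClickEvent:
    # Hypothetical event schema; the real clickstream fields are not given.
    page_type: str
    dwell_seconds: float
    search_query: Optional[str] = None


def dwell_bucket(seconds: float) -> str:
    """Map a dwell time to a coarse bucket (thresholds are assumptions)."""
    if seconds < 5:
        return "DWELL_SHORT"
    if seconds < 30:
        return "DWELL_MEDIUM"
    return "DWELL_LONG"


def session_to_tokens(events: List[ClickEvent], max_len: int = 16) -> List[str]:
    """Flatten a session into tokens, then truncate or pad to max_len."""
    tokens = ["[CLS]"]  # summary position, a common choice for pooling
    for ev in events:
        tokens.append(f"PAGE_{ev.page_type.upper()}")
        tokens.append(dwell_bucket(ev.dwell_seconds))
        if ev.search_query:
            tokens.extend(f"Q_{w.lower()}" for w in ev.search_query.split())
    tokens = tokens[:max_len]
    tokens += ["[PAD]"] * (max_len - len(tokens))
    return tokens


session = [
    ClickEvent("home", 3.0),
    ClickEvent("search", 12.0, "fixed rate mortgage"),
    ClickEvent("mortgage_rates", 45.0),
]
tokens = session_to_tokens(session)
print(tokens)
```

In a setup like this, the encoder would map the token sequence to a single vector (for example, the output at the `[CLS]` position), and that vector would serve as the session embedding.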
Key design choice: the LLM is used only offline to generate and distill taxonomy labels; online serving uses a lightweight classifier trained on those labels. This keeps the LLM out of the serving path, so inference latency stays low while interpretability is retained. The session embedding and the distilled labels are then evaluated head-to-head and in combination on two production tasks.
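The offline/online split can be sketched as a teacher-student pattern. In the toy example below, `offline_llm_label` is a hypothetical stand-in for the offline LLM (faked here with keyword rules so the code runs), and the "lightweight classifier" is a simple bag-of-words profile per label. Neither is the paper's actual model; the point is only that the online path never calls the teacher.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def offline_llm_label(session_text: str) -> str:
    """Hypothetical stand-in for the offline LLM labeler (keyword rules)."""
    text = session_text.lower()
    if "mortgage" in text:
        return "researching mortgages"
    if "card" in text:
        return "comparing credit cards"
    return "other"


def train_profiles(sessions: List[str]) -> Dict[str, Counter]:
    """Offline distillation: build one word-count profile per teacher label."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for text in sessions:
        profiles[offline_llm_label(text)].update(text.lower().split())
    return dict(profiles)


def predict(profiles: Dict[str, Counter], session_text: str) -> str:
    """Online serving: score labels by normalized word overlap, no LLM call."""
    words = session_text.lower().split()

    def score(label: str) -> float:
        profile = profiles[label]
        return sum(profile[w] for w in words) / sum(profile.values())

    return max(profiles, key=score)


train_sessions = [
    "mortgage rates fixed calculator",
    "mortgage refinance rates",
    "credit card rewards compare",
    "card annual fee compare",
]
profiles = train_profiles(train_sessions)

# Neither query contains the teacher's trigger keywords ("mortgage", "card"),
# yet the student recovers the intent from co-occurring words.
print(predict(profiles, "refinance calculator rates"))
print(predict(profiles, "rewards compare fee"))
```

The teacher's cost is paid once per training session offline, while the online `predict` is a handful of dictionary lookups.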
Concrete Gains on Ranking and Conversion
On the mobile homepage tile ranking task, the session embedding alone delivered a 1.88% improvement in macro Recall@1 and a 13.38% reduction in Log Loss compared to the existing production baseline. That's a meaningful win for a system already tuned over years.
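For readers unfamiliar with the two metrics, the sketch below computes them under their standard definitions: macro Recall@1 averages per-class top-1 recall so that rare tiles count as much as popular ones, and Log Loss is the mean negative log-probability assigned to the true class. The tile names and probabilities are invented for illustration, and the paper's exact evaluation protocol may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List


def macro_recall_at_1(y_true: List[str], probs: List[Dict[str, float]]) -> float:
    """Average, over classes, of the fraction of that class ranked first."""
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for truth, p in zip(y_true, probs):
        top1 = max(p, key=p.get)  # highest-probability tile
        counts[truth] += 1
        hits[truth] += int(top1 == truth)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


def log_loss(y_true: List[str], probs: List[Dict[str, float]], eps: float = 1e-15) -> float:
    """Mean negative log-probability of the true class (clipped at eps)."""
    total = sum(math.log(max(p.get(t, 0.0), eps)) for t, p in zip(y_true, probs))
    return -total / len(y_true)


# Hypothetical tiles and model outputs, for illustration only.
y_true = ["pay_bill", "pay_bill", "transfer", "offers"]
probs = [
    {"pay_bill": 0.7, "transfer": 0.2, "offers": 0.1},  # top-1 correct
    {"pay_bill": 0.4, "transfer": 0.5, "offers": 0.1},  # top-1 wrong
    {"pay_bill": 0.1, "transfer": 0.8, "offers": 0.1},  # top-1 correct
    {"pay_bill": 0.3, "transfer": 0.3, "offers": 0.4},  # top-1 correct
]

recall = macro_recall_at_1(y_true, probs)  # (0.5 + 1.0 + 1.0) / 3
loss = log_loss(y_true, probs)
print(f"macro Recall@1 = {recall:.4f}, Log Loss = {loss:.4f}")
```

Because Log Loss penalizes miscalibrated probabilities and not just wrong rankings, it can move by a much larger percentage than Recall@1 on the same model change.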
On user conversion prediction (will this anonymous web visitor eventually complete a logged-in account action?), the session embedding outperformed the LLM-generated labels by 4.3% on micro F1. The distilled label layer was only 7% behind the embedding on F1, offering a transparent alternative for cases where interpretability is mandated.
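Micro F1 pools true positives, false positives, and false negatives across all classes before computing precision and recall. A minimal sketch under that standard definition follows; the class names and predictions are made up, and the article does not say how the paper's evaluation was set up.

```python
from typing import List


def micro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Pool TP/FP/FN over every class, then compute a single F1."""
    tp = fp = fn = 0
    for cls in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            tp += int(truth == cls and pred == cls)
            fp += int(truth != cls and pred == cls)
            fn += int(truth == cls and pred != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical conversion labels and predictions, for illustration only.
y_true = ["convert", "no_convert", "convert", "no_convert", "convert"]
y_pred = ["convert", "no_convert", "no_convert", "no_convert", "convert"]

score = micro_f1(y_true, y_pred)
print(f"micro F1 = {score:.2f}")
```

When every example has exactly one label and all classes are pooled, as here, micro F1 coincides with plain accuracy (4 of 5 correct in this toy case).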
The system suggests you don't have to choose between accuracy and explainability. Self-supervised embeddings drive the ranking engine; distilled labels satisfy auditors and product teams. Expect financial services recommenders to adopt this dual-output pattern widely for as long as cross-platform entity resolution remains a bottleneck.
Source: From Clicks to Intent: Cross-Platform Session Embeddings with LLM-Distilled Taxonomy for Financial Services Recommendations
Domain: arxiv.org