
Hybrid Self-Supervised Session Embedding Beats LLM Labels by 4.3% on Financial Conversion F1

A framework that pairs a self-supervised Transformer with an LLM distillation pipeline improves ranking, cutting Log Loss by 13.38%, while its interpretable intent labels come at only a small cost in accuracy.

financial services, self-supervised learning, LLM distillation, session embeddings, recommender systems, cross-platform personalization

A self-supervised Transformer encoding multi-modal clickstreams into compact session embeddings improved macro Recall@1 by 1.88% and cut Log Loss by 13.38% over production baselines in a financial services recommendation system. That's not an ablation on a toy dataset; those numbers come from a real deployment on mobile homepage tile ranking at an unnamed financial institution.

The Cross-Platform Gap That Kills Personalization

Financial services suffer a unique problem: pre-login web browsing and authenticated in-app behavior are worlds apart. Anonymous users explore products on the web; logged-in users service accounts on mobile. Cross-channel entity resolution is hard; matching anonymous web sessions to authenticated accounts often fails. As a result, rich web-based intent signals are discarded for post-login personalization.

Existing methods for capturing web intent are ad-hoc and narrow. They either bake in rigid rules or produce uninterpretable vectors that product managers and compliance teams can't review. The authors of arXiv:2606.26277 set out to build a single pipeline that serves both quantitative ranking and qualitative understanding at scale.

Self-Supervised Encoding with an LLM Distillation Layer

The architecture has two components. A self-supervised Transformer takes raw web clickstreams (page types, dwell times, search queries, etc.) and outputs a compact session embedding for downstream ranking and prediction tasks. Separately, an LLM-based taxonomy generation and distillation pipeline produces human-readable intent labels (e.g., "researching mortgages", "comparing credit cards").
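
To make "compact session embedding" concrete, here is a minimal sketch of the general idea: a variable-length list of click events goes in, a fixed-length vector comes out. It is not the paper's model. The page-type vectors in PAGE_VECS, the log-scaled dwell feature, and the identity query/key/value projections are all assumptions made for illustration, and nothing here is trained.

```python
import math

# Hypothetical page-type vocabulary with hand-picked, untrained 3-d vectors.
PAGE_VECS = {
    "home":     [1.0, 0.0, 0.0],
    "mortgage": [0.0, 1.0, 0.0],
    "cards":    [0.0, 0.0, 1.0],
}

def event_vector(page_type, dwell_seconds):
    # One click event: page-type vector plus log-scaled dwell time (assumed feature).
    return PAGE_VECS[page_type] + [math.log1p(dwell_seconds)]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    # Single-head scaled dot-product attention with identity Q/K/V projections.
    # A real Transformer would learn these projections and stack several layers.
    dim = len(vectors[0])
    attended = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        attended.append([sum(w * v[i] for w, v in zip(weights, vectors))
                         for i in range(dim)])
    return attended

def session_embedding(events):
    # Mean-pool the attended event vectors into one fixed-length vector.
    attended = self_attention([event_vector(p, t) for p, t in events])
    return [sum(vec[i] for vec in attended) / len(attended)
            for i in range(len(attended[0]))]

session = [("home", 5), ("mortgage", 40), ("mortgage", 90)]
embedding = session_embedding(session)
print(len(embedding), embedding)
```

The output has the same length whether the session contains three events or three hundred, which is what lets a downstream ranker consume it as an ordinary feature vector.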

Key design choice: the LLM is used only offline to generate and distill taxonomy labels; online serving uses a lightweight classifier trained on those labels. This keeps inference latency low while retaining interpretability. The session embedding and the distilled labels are then evaluated head-to-head and in combination on two production tasks.
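
The offline-teacher, online-student split can be sketched in a few lines. In this illustration the "teacher" is a keyword rule standing in for the offline LLM labeler, and the "student" is a nearest-centroid classifier over page counts. Both are assumptions chosen to keep the example self-contained; the article does not say which lightweight classifier or features the authors used.

```python
from collections import defaultdict

# Toy web sessions (hypothetical page names).
SESSIONS = [
    ["mortgage-rates", "mortgage-calculator"],
    ["credit-cards", "card-compare"],
    ["mortgage-faq", "home-loans"],
    ["credit-cards", "rewards"],
]

def offline_teacher_label(pages):
    # Stand-in for the offline LLM labeler: runs only at training time.
    if any("mortgage" in page for page in pages):
        return "researching mortgages"
    return "comparing credit cards"

def featurize(pages, vocab):
    # Bag-of-pages count vector.
    vec = [0.0] * len(vocab)
    for page in pages:
        if page in vocab:
            vec[vocab[page]] += 1.0
    return vec

def train_student(sessions, vocab):
    # Distillation step: fit one centroid per teacher-assigned label.
    sums = defaultdict(lambda: [0.0] * len(vocab))
    counts = defaultdict(int)
    for pages in sessions:
        label = offline_teacher_label(pages)
        sums[label] = [s + v for s, v in zip(sums[label], featurize(pages, vocab))]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(centroids, vec):
    # Online path: a nearest-centroid lookup, with no teacher call.
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], vec))

vocab = {page: i for i, page in enumerate(sorted({p for s in SESSIONS for p in s}))}
centroids = train_student(SESSIONS, vocab)
print(predict(centroids, featurize(["home-loans"], vocab)))
```

Note that the new session contains no page with "mortgage" in its name, so the teacher's keyword rule would not have matched it; the student assigns the mortgage label from what it learned about co-occurring pages. Serving cost is a handful of arithmetic operations per label, independent of how expensive the teacher is.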

Concrete Gains on Ranking and Conversion

On the mobile homepage tile ranking task, the session embedding alone delivered a 1.88% improvement in macro Recall@1 and a 13.38% reduction in Log Loss compared to the existing production baseline. That's a meaningful win for a system already tuned over years.
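
For readers who want to see what these two metrics measure, the sketch below computes macro Recall@1 and Log Loss on a made-up four-session example. The tile names and probabilities are invented, and treating the reported percentages as relative changes against the baseline is an assumption; the article does not say whether they are relative or absolute.

```python
import math

def macro_recall_at_1(y_true, prob_rows):
    # Per-class recall of the top-1 prediction, averaged with equal class weight.
    hits, totals = {}, {}
    for true, probs in zip(y_true, prob_rows):
        top1 = max(probs, key=probs.get)
        totals[true] = totals.get(true, 0) + 1
        hits[true] = hits.get(true, 0) + (top1 == true)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def log_loss(y_true, prob_rows, eps=1e-15):
    # Mean negative log-probability assigned to the true class.
    return -sum(math.log(max(probs[true], eps))
                for true, probs in zip(y_true, prob_rows)) / len(y_true)

def relative_reduction(baseline, candidate):
    # Fractional drop versus the baseline (assumed reading of "13.38% reduction").
    return (baseline - candidate) / baseline

# Invented data: the tile each user actually tapped, and predicted probabilities.
y_true = ["cards", "cards", "loans", "savings"]
prob_rows = [
    {"cards": 0.7, "loans": 0.2, "savings": 0.1},
    {"cards": 0.4, "loans": 0.5, "savings": 0.1},
    {"cards": 0.2, "loans": 0.6, "savings": 0.2},
    {"cards": 0.1, "loans": 0.3, "savings": 0.6},
]

recall = macro_recall_at_1(y_true, prob_rows)
loss = log_loss(y_true, prob_rows)
print(recall, loss)
```

Macro averaging weights rarely shown tiles as heavily as popular ones, while Log Loss rewards well-calibrated probabilities rather than just a correct top pick, so the two numbers capture different aspects of ranking quality.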

On user conversion prediction (will this anonymous web visitor eventually convert to a logged-in account action?), the session embedding outperformed the LLM-generated labels by 4.3% on micro F1. The distilled label layer was only 7% behind the embedding on F1, offering a transparent alternative for cases where interpretability is mandated.
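
Micro F1 pools true positives, false positives, and false negatives across all classes before computing F1. A small sketch with invented predictions (not the paper's data) shows the calculation; the variable names are hypothetical.

```python
def micro_f1(y_true, y_pred):
    # Pool TP/FP/FN over every label, then compute a single F1 score.
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for true, pred in zip(y_true, y_pred):
            tp += (pred == label and true == label)
            fp += (pred == label and true != label)
            fn += (pred != label and true == label)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Invented outcomes for five anonymous visitors.
y_true          = ["convert", "no", "no", "convert", "no"]
embedding_preds = ["convert", "no", "no", "no", "no"]
label_preds     = ["convert", "no", "convert", "no", "no"]

f1_embedding = micro_f1(y_true, embedding_preds)
f1_labels = micro_f1(y_true, label_preds)
print(f1_embedding, f1_labels)
```

When every example has exactly one true label and one predicted label, as here, micro F1 equals plain accuracy, so on a heavily imbalanced conversion task it is dominated by the majority class.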

The results suggest you don't have to choose between accuracy and explainability. Self-supervised embeddings drive the ranking engine; distilled labels satisfy auditors and product teams. Expect financial services recommenders to adopt this dual-output pattern widely as cross-platform entity resolution remains a bottleneck.


Source: From Clicks to Intent: Cross-Platform Session Embeddings with LLM-Distilled Taxonomy for Financial Services Recommendations
Domain: arxiv.org
