MetaFlow Trains LLMs to Synthesize Reusable Workflows That Generalize Zero-Shot

MetaFlow treats workflow generation as meta-learning. It uses RL with execution feedback to produce task-level solution strategies that match SOTA on in-domain tasks and generalize to unseen tasks and operator sets.

Tags: MetaFlow, LLM, workflow generation, meta-learning, RLVR, zero-shot generalization

MetaFlow replaces instance-specific LLM solutions with task-level workflow generators, achieving zero-shot generalization across question answering, code generation, and mathematical reasoning without retraining.

Today’s LLMs can crack individual problems, but they do not produce reusable patterns. Every new instance needs a fresh inference, and debugging is a black-box slog. MetaFlow, described in a new arXiv paper, flips this. It casts workflow generation as meta-learning and trains a model to compose solution strategies from an operator set, rather than just output answers.
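
The summary does not spell out MetaFlow’s workflow format. Here is a minimal sketch that assumes a workflow is an ordered list of operator names drawn from a fixed operator set. It shows the difference between emitting an answer and emitting a reusable strategy. The operators, `run_workflow`, and the toy arithmetic task are all hypothetical. In MetaFlow the operators would presumably be LLM-backed building blocks rather than plain Python functions.

```python
from typing import Any, Callable, Dict, List

# Hypothetical operator set: each operator maps a working state (a dict) to a
# new state. These toy operators are plain Python, not LLM calls.
State = Dict[str, Any]
Operator = Callable[[State], State]

def parse_numbers(state: State) -> State:
    # Pull the integer tokens out of the problem text.
    numbers = [int(t) for t in state["problem"].split() if t.isdigit()]
    return {**state, "numbers": numbers}

def add_all(state: State) -> State:
    return {**state, "answer": sum(state["numbers"])}

def multiply_all(state: State) -> State:
    product = 1
    for n in state["numbers"]:
        product *= n
    return {**state, "answer": product}

OPERATORS: Dict[str, Operator] = {
    "parse_numbers": parse_numbers,
    "add_all": add_all,
    "multiply_all": multiply_all,
}

def run_workflow(workflow: List[str], problem: str, operators: Dict[str, Operator]) -> Any:
    """Thread the state through the named operators in order; return the answer."""
    state: State = {"problem": problem}
    for name in workflow:
        state = operators[name](state)
    return state.get("answer")

# A workflow is a task-level plan (a sequence of operator names), not an answer,
# so the same plan is reused unchanged on every instance of the task.
sum_task = ["parse_numbers", "add_all"]
print(run_workflow(sum_task, "add 3 and 4", OPERATORS))         # 7
print(run_workflow(sum_task, "add 10 and 5 and 2", OPERATORS))  # 17
print(run_workflow(["parse_numbers", "multiply_all"], "multiply 2 and 6", OPERATORS))  # 12
```

Because the workflow is data rather than a one-off answer, it can be inspected step by step. That is the interpretability the article refers to.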

Two-Stage Training Turns Execution Feedback Into Reusable Structure

MetaFlow trains in two distinct stages. First, supervised fine-tuning on synthetic workflow data gives the model a vocabulary of algorithmic patterns. Second, reinforcement learning with verifiable rewards (RLVR) uses execution feedback across multiple problem instances within a task. If the generated workflow passes test cases or yields correct answers, the model is rewarded. If it fails, the update pushes the model away from that workflow.
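
The summary does not describe MetaFlow’s RL algorithm or its exact reward. Here is a minimal sketch of the RLVR idea under stated assumptions:

- The generator chooses among a small, enumerable set of candidate workflows for one task.
- The verifiable reward is the fraction of the task’s instances that a workflow answers correctly.
- The candidates, the toy task, and the update rule are all hypothetical.

Instead of sampling rollouts from an LLM, the code applies the exact policy gradient of expected reward for a softmax policy, p_i (r_i - E[r]). This is feasible only because the candidate set is tiny.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy task: sum the integers mentioned in the problem text.
INSTANCES: List[Tuple[str, int]] = [
    ("add 3 and 4", 7),
    ("add 10 and 5 and 2", 17),
    ("add 9 and 0", 9),
]

def nums(problem: str) -> List[int]:
    return [int(t) for t in problem.split() if t.isdigit()]

# Hypothetical candidate workflows the generator could emit for this task.
CANDIDATES: Dict[str, Callable[[str], int]] = {
    "sum": lambda p: sum(nums(p)),
    "max": lambda p: max(nums(p)),
    "first": lambda p: nums(p)[0],
}

def verifiable_reward(workflow: Callable[[str], int]) -> float:
    """Fraction of task instances whose output matches the reference answer."""
    hits = sum(1.0 for p, ref in INSTANCES if workflow(p) == ref)
    return hits / len(INSTANCES)

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(steps: int = 200, lr: float = 1.0) -> Dict[str, float]:
    names = list(CANDIDATES)
    rewards = [verifiable_reward(CANDIDATES[n]) for n in names]
    logits = [0.0] * len(names)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))  # E[r] under the policy
        # Exact gradient of E[r] w.r.t. each softmax logit: p_i * (r_i - E[r]).
        logits = [z + lr * p * (r - expected) for z, p, r in zip(logits, probs, rewards)]
    return dict(zip(names, softmax(logits)))

policy = train()
print(max(policy, key=policy.get))  # sum
```

`max` and `first` each happen to be right on one instance ("add 9 and 0"), so they earn partial reward. Only the workflow that captures the task’s actual rule is correct on every instance, and it absorbs the probability mass.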

This second stage is the key insight. Instead of optimizing for a single instance, RLVR forces the model to learn task-level invariants that hold across variations. The resulting workflows are interpretable (you can trace the logic) and reusable: the same workflow can be applied to a different problem in the same class.
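
The point about task-level invariants can be illustrated with a toy check. In the hypothetical task below, a shortcut workflow (`take_max`) looks identical to the intended one (`sum_all`) when scored on a single instance. The shortcut is exposed only when reward is averaged over several instances of the task. The task, both workflows, and `mean_reward` are illustrative assumptions, not MetaFlow’s benchmarks or scoring code.

```python
from typing import Callable, List, Tuple

def nums(problem: str) -> List[int]:
    # Toy parser: the integer tokens in the problem text.
    return [int(t) for t in problem.split() if t.isdigit()]

# Two hypothetical workflows for a "sum the numbers" task.
def sum_all(problem: str) -> int:
    return sum(nums(problem))

def take_max(problem: str) -> int:  # a shortcut, not the task's invariant
    return max(nums(problem))

def mean_reward(workflow: Callable[[str], int], instances: List[Tuple[str, int]]) -> float:
    """Average binary reward (exact match with the reference) over instances."""
    return sum(workflow(p) == ref for p, ref in instances) / len(instances)

task = [("add 9 and 0", 9), ("add 3 and 4", 7), ("add 10 and 5 and 2", 17)]
single = task[:1]

print(mean_reward(sum_all, single), mean_reward(take_max, single))  # 1.0 1.0
print(mean_reward(sum_all, task), mean_reward(take_max, task))      # 1.0 0.3333333333333333
```
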

Comparable to SOTA In-Domain, Strong Zero-Shot Out-of-Domain

On benchmarks for question answering, code generation, and mathematical reasoning, MetaFlow matches state-of-the-art baselines with a single inference: no ensemble, no chain-of-thought tricks. More striking, it generalizes zero-shot to untrained tasks and novel operator sets. The paper doesn’t list specific percentages, but the claim is that the model did not just memorize patterns; it learned how to compose strategies.

That behavior matters for deployment. A workflow generator that can handle new task types without additional fine-tuning saves the expensive data curation and RL training cycles that dominate current LLM customization pipelines.

What This Enables Next

Workflows are the scaffolding for reliable, debuggable LLM applications. MetaFlow’s two-stage training, synthetic SFT followed by execution-feedback RL, gives a practical recipe for teaching models to produce that scaffolding automatically. Expect this meta-learning approach to show up in tool-use agents and code generation pipelines where reusability and interpretability are non-negotiable.


Source: From Search to Synthesis: Training LLMs as Zero-Shot Workflow Generators
Domain: arxiv.org
