
Database tricks cut LLM pipeline costs by 85% with proxy models and AQP

Treating AI workflows as declarative queries lets approximate query processing and cheap decision trees cut expensive model invocations by up to 90% on structured tasks, with less than 5% accuracy loss.

approximate query processing, proxy models, llm pipelines, tpc ds, ai workflows, databases

Up to 90% of expensive model invocations in LLM pipelines can be skipped without retraining or architecture changes—just by treating the workflow as a database query and applying decades-old database tricks.

That's the headline from a new arXiv preprint that recasts AI post-training, reward-model scoring, and agentic reasoning as declarative queries with expensive predicates. The authors borrow two classical techniques—approximate query processing (AQP) and proxy-model (PM) filtering—and benchmark them on both TPC-DS aggregate queries and real LLM pipelines for math reasoning, instruction following, and code generation.

Two Strategies: Early Stopping and Cheap Pre-Filters

Strategy AQP treats the workflow as an online aggregation problem. It samples records progressively, maintains a running estimate with a confidence interval, and stops early once the interval stabilizes within a user-specified error bound. On TPC-DS, this hits under 10% aggregate error while stopping after just 10–15% of oracle calls on balanced distributions—an 85–90% reduction. No model changes needed.
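
The early-stopping loop described above can be sketched roughly as follows. This is a minimal illustration, not the preprint's implementation: the function name `online_mean`, its parameters, the normal-approximation confidence interval, and the rule "stop when the interval half-width falls below the error bound" are all assumptions made for the example. The oracle stands in for any expensive model call that returns a number.

```python
import math
import random


def online_mean(records, oracle, error_bound, z=1.96, min_samples=50, seed=0):
    """Estimate the mean of oracle(record) by progressive sampling.

    Hypothetical helper: stops once the half-width of a normal-approximation
    confidence interval is at or below error_bound.
    Returns (estimate, half_width, oracle_calls).
    """
    rng = random.Random(seed)
    order = list(records)
    rng.shuffle(order)  # random order, so every prefix is a uniform sample

    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    half_width = math.inf

    for rec in order:
        x = oracle(rec)  # the expensive call we want to make as rarely as possible
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)

        if n >= min_samples:
            variance = m2 / (n - 1)
            half_width = z * math.sqrt(variance / n)
            if half_width <= error_bound:
                break  # interval is tight enough: skip the remaining records

    return mean, half_width, n


if __name__ == "__main__":
    # Toy oracle: a predicate that holds for one record in four.
    estimate, hw, calls = online_mean(
        range(10_000), lambda r: 1.0 if r % 4 == 0 else 0.0, error_bound=0.05
    )
    print(f"estimate={estimate:.3f} +/- {hw:.3f} after {calls} oracle calls")
```

The sketch ignores the finite-population correction and the statistical cost of checking the interval after every sample, both of which a production system would likely need to handle.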

Strategy PM trains a lightweight, CPU-resident decision tree on a small set of oracle-labeled examples. The tree pre-filters records whose outcome it predicts with high confidence; only uncertain records get routed to the expensive model. On TPC-DS, that cuts oracle calls by 60–70%.
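
A rough sketch of the pre-filtering idea, under loud assumptions: to stay dependency-free it uses a one-split decision stump on a single numeric feature as a stand-in for the decision tree, and the names `fit_stump` and `filter_with_proxy`, the training-set size, and the 0.95 confidence threshold are all hypothetical choices for illustration rather than details from the preprint.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split tree on one numeric feature with 0/1 labels.

    Returns (threshold, p_left, p_right), where p_* is the fraction of
    positive training labels on each side of the threshold.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best = None
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        right_pos = total_pos - left_pos
        # Training errors if each side predicts its own majority label.
        err = min(left_pos, i - left_pos) + min(right_pos, (n - i) - right_pos)
        if best is None or err < best[0]:
            thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (err, thr, left_pos / i, right_pos / (n - i))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def filter_with_proxy(records, feature, oracle, n_train=200, confidence=0.95, seed=0):
    """Label every record, calling the oracle only where the proxy is unsure.

    Returns (labels, oracle_calls); training labels count as oracle calls.
    """
    rng = random.Random(seed)
    records = list(records)
    train_idx = rng.sample(range(len(records)), min(n_train, len(records)))
    known = {i: oracle(records[i]) for i in train_idx}  # oracle-labeled examples
    thr, p_left, p_right = fit_stump(
        [feature(records[i]) for i in train_idx], [known[i] for i in train_idx]
    )

    calls = len(known)
    labels = []
    for i, rec in enumerate(records):
        if i in known:
            labels.append(known[i])  # already paid for during training
            continue
        p = p_left if feature(rec) <= thr else p_right
        if p >= confidence:
            labels.append(1)  # proxy is confident: positive
        elif p <= 1 - confidence:
            labels.append(0)  # proxy is confident: negative
        else:
            labels.append(oracle(rec))  # uncertain: route to the expensive model
            calls += 1
    return labels, calls


if __name__ == "__main__":
    # Toy oracle: always 0 below 500, mixed above it.
    toy_oracle = lambda x: int(x >= 500 and x % 3 != 0)
    labels, calls = filter_with_proxy(range(1000), lambda x: x, toy_oracle)
    print(f"{calls} oracle calls for {len(labels)} records")
```

In the toy run, the stump learns that records below the split are almost always negative and labels them without consulting the oracle, while the mixed region above the split is still sent to it. Records that fall between the learned threshold and the true boundary can be mislabeled, which is the accuracy trade-off the strategy accepts.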

Real LLM Pipelines: 19x Speedup on Reward Scoring

On LLM post-training pipelines, Strategy AQP reaches its adaptive stopping point at 20–50% of oracle calls, losing less than 5% accuracy on structured math and code tasks. Open-ended instruction following—scored by a reward model—shows a larger but bounded reduction. Strategy PM shines here: it reduces reward-model scoring time by up to 19x on structured tasks with less than 10% accuracy loss.

The key insight is that many AI workflows are really just expensive filters (predicate evaluation) over a dataset, exactly the pattern database systems have optimized for decades. Applying AQP and proxy models doesn't touch the underlying model weights or pipeline logic. It's a pure systems-layer hack that trades a bounded accuracy loss for large cost savings: up to 90% fewer oracle calls and up to 19x faster reward scoring in the reported experiments.

What this enables next: treating expensive model calls as a scarce resource to be budgeted, not a fixed cost. Expect to see this pattern baked into LLM orchestration frameworks and agent runtimes before long.


Source: Query-Centric Optimization of AI Workflows via Approximate Query Processing and Proxy Models
Domain: arxiv.org
