Database Tricks Cut LLM Pipeline Costs by 85% With Proxy Models and AQP

Up to 90% of expensive model invocations can be skipped on benchmark queries, and 50–80% on real LLM pipelines, without retraining or architecture changes, just by treating the workflow as a database query and applying decades-old database tricks.

That's the headline from a new arXiv preprint that recasts AI post-training, reward-model scoring, and agentic reasoning as declarative queries with expensive predicates. The authors borrow two classical techniques—approximate query processing (AQP) and proxy-model (PM) filtering—and benchmark them on both TPC-DS aggregate queries and real LLM pipelines for math reasoning, instruction following, and code generation.

Two Strategies: Early Stopping and Cheap Pre-Filters

Strategy AQP treats the workflow as an online aggregation problem. It samples records progressively, maintains a running estimate with a confidence interval, and stops early once the interval stabilizes within a user-specified error bound. On TPC-DS, this hits under 10% aggregate error while stopping after just 10–15% of oracle calls on balanced distributions—an 85–90% reduction. No model changes needed.
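To make the early-stopping idea concrete, here is a minimal Python sketch of online aggregation for one simple aggregate: the fraction of records that pass an expensive check. It is an illustration under stated assumptions, not the preprint's implementation. The function name `online_aggregate`, the `oracle` callable, the Agresti-Coull style interval, and the default thresholds are all hypothetical choices made for this example.

```python
import math
import random


def online_aggregate(records, oracle, error_bound=0.05, z=1.96,
                     min_samples=30, seed=0):
    """Estimate the fraction of records for which `oracle` returns True.

    Records are visited in random order. After each oracle call the running
    estimate and its confidence interval are updated, and sampling stops as
    soon as the interval half-width is at or below `error_bound`.

    Returns (estimate, half_width, oracle_calls).
    All names and defaults here are assumptions made for illustration.
    """
    order = list(records)
    if not order:
        raise ValueError("records must be non-empty")
    random.Random(seed).shuffle(order)  # progressive random sampling

    hits = 0
    for n, record in enumerate(order, start=1):
        hits += 1 if oracle(record) else 0  # the expensive model call
        if n < min_samples:
            continue  # too few samples for a meaningful interval
        # Agresti-Coull style adjustment: keeps the interval from collapsing
        # to zero width when every sample so far has the same outcome.
        n_adj = n + z * z
        p_adj = (hits + z * z / 2) / n_adj
        half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
        if half_width <= error_bound:
            return p_adj, half_width, n  # early stop
    # Every record was evaluated, so the answer is exact.
    return hits / len(order), 0.0, len(order)
```

For a pass rate near 50% and a bound of 0.05, the usual sample-size arithmetic (z^2 * p(1-p) / bound^2) suggests a stop after roughly 384 oracle calls, largely independent of how many records exist. That is why the savings grow with dataset size. Skewed distributions and other aggregates (sums, averages) would need a different interval than the one assumed here.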

Strategy PM trains a lightweight, CPU-resident decision tree on a small set of oracle-labeled examples. The tree pre-filters records whose outcome it predicts with high confidence; only uncertain records get routed to the expensive model. On TPC-DS, that cuts oracle calls by 60–70%.
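The routing logic can be sketched in a few dozen lines of Python. This is a hand-rolled toy decision tree, kept dependency-free for readability, and not the preprint's code. The helper names (`build_tree`, `leaf_probability`, `filter_with_proxy`), the Gini split criterion, the depth limit, and the 0.95 confidence cutoff are all assumptions chosen for the example.

```python
def gini_weighted(rows):
    """Size-weighted Gini impurity of a list of (features, label) rows."""
    pos = sum(y for _, y in rows)
    return 2 * pos * (len(rows) - pos) / len(rows)


def build_tree(rows, depth=0, max_depth=3, min_leaf=5):
    """Fit a small binary decision tree on oracle-labeled rows.

    rows: non-empty list of (feature_tuple, label) with label in {0, 1}.
    Every node stores `p`, the positive rate of its training rows.
    """
    pos = sum(y for _, y in rows)
    node = {"p": pos / len(rows)}
    if depth == max_depth or pos in (0, len(rows)):
        return node  # depth limit reached or node already pure
    best = None
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for r in rows if r[0][f] <= thr]
            right = [r for r in rows if r[0][f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = gini_weighted(left) + gini_weighted(right)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:
        return node  # no admissible split
    _, f, thr, left, right = best
    node.update(feature=f, threshold=thr,
                left=build_tree(left, depth + 1, max_depth, min_leaf),
                right=build_tree(right, depth + 1, max_depth, min_leaf))
    return node


def leaf_probability(tree, x):
    """Walk the tree and return the positive rate of the leaf x lands in."""
    node = tree
    while "feature" in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["p"]


def filter_with_proxy(records, featurize, oracle, tree, confidence=0.95):
    """Label records, calling `oracle` only where the proxy is unsure.

    Returns (labels, oracle_calls).
    """
    labels, calls = [], 0
    for record in records:
        p = leaf_probability(tree, featurize(record))
        if p >= confidence:
            labels.append(True)        # proxy is confident: accept
        elif p <= 1 - confidence:
            labels.append(False)       # proxy is confident: reject
        else:
            labels.append(bool(oracle(record)))  # uncertain: pay for the oracle
            calls += 1
    return labels, calls
```

In use, one would label a small sample with the oracle, fit the tree on that sample, and then pass the full dataset through `filter_with_proxy`. The confidence cutoff is the knob that trades oracle calls against accuracy: raising it routes more records to the expensive model.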

Real LLM Pipelines: 19x Speedup on Reward Scoring

On LLM post-training pipelines, Strategy AQP reaches its adaptive stopping point at 20–50% of oracle calls, losing less than 5% accuracy on structured math and code tasks. Open-ended instruction following, which is scored by a reward model, shows a larger but still bounded accuracy loss. Strategy PM shines here: it cuts reward-model scoring time by up to 19x on structured tasks with less than 10% accuracy loss.

The key insight is that many AI workflows are really just expensive predicates evaluated over a dataset, exactly the pattern database systems have optimized for decades. Applying AQP and proxy models doesn't touch the underlying model weights or pipeline logic. It's a pure systems-layer hack that buys large cost savings (up to 85–90% fewer oracle calls on TPC-DS and up to 19x faster reward scoring) with bounded accuracy trade-offs.

What this enables next: treating expensive model calls as a scarce resource to be budgeted, not a fixed cost. Expect to see this pattern baked into LLM orchestration frameworks and agent runtimes before long.

Source: Query-Centric Optimization of AI Workflows via Approximate Query Processing and Proxy Models
Domain: arxiv.org
