Vulcan's LLM-Generated Heuristics Outperform Hand-Crafted Policies by up to 4.9x

The spot-VM scheduling heuristics produced by Vulcan's LLM-driven search achieved up to 4.9x higher cost savings than the best hand-tuned baseline, while staying inside a restricted language designed to make generated code safe to run in production.

Hand-crafted resource management heuristics are dying under hardware heterogeneity and workload diversity. Every deployment instance needs its own specialization, but no team can manually tune a cache eviction policy for every microservice tier. The Vulcan paper on arXiv tackles this by turning LLMs into heuristic synthesis engines - with a critical safety straitjacket.

Why Hand-Tuned Heuristics Can't Scale

Systems like spot-VM schedulers, cache eviction policies, and tiered-memory managers rely on decades-old hand-designed heuristics. Those heuristics assume uniform hardware and static workloads. Modern clouds have GPU instances with different memory latencies, variable spot prices, and application phases that shift hourly. The results reported below suggest that a one-size-fits-all heuristic can leave a large margin on the table (up to 2x in cache miss ratio and up to 4.9x in spot-VM cost savings), but rewriting it by hand for every instance is too expensive.

LLMs can generate code fast, but letting an LLM write systems code that touches memory allocation or scheduling decisions is a recipe for crashes. The core challenge: how to give the LLM enough flexibility to find novel policies while guaranteeing the generated code won't corrupt shared state or deadlock.

Vulcan's Recipe: LLM Code in a Straitjacket

Vulcan's insight is to identify LLM-friendly interfaces that isolate core decision logic. The LLM writes only simple, stateless decision functions - think of a pure function that takes derived statistics and returns a choice. Trusted runtime abstractions provide those statistics (such as cache hit ratios or price distributions), so the LLM never touches raw system state.
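To make the "pure function over derived statistics" idea concrete, here is a minimal Python sketch. It is not Vulcan's interface: the `SpotStats` fields, the `choose_instance` name, the thresholds, and the three return values are all hypothetical, chosen only to show the shape of a stateless decision function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpotStats:
    """Derived statistics a trusted runtime might supply (hypothetical fields)."""
    current_price: float    # current spot price, $/hour
    mean_price: float       # trailing mean spot price, $/hour
    on_demand_price: float  # fixed on-demand price, $/hour
    preemption_rate: float  # observed preemptions per hour


def choose_instance(stats: SpotStats) -> str:
    """Pure decision function: no I/O, no globals, same input gives same output."""
    # Spot is pointless once it costs as much as on-demand.
    if stats.current_price >= stats.on_demand_price:
        return "on_demand"
    # Illustrative threshold: avoid spot when preemptions are frequent.
    if stats.preemption_rate > 0.5:
        return "on_demand"
    # Illustrative threshold: hold off when the price spikes above its mean.
    if stats.current_price > 1.2 * stats.mean_price:
        return "wait"
    return "spot"


print(choose_instance(SpotStats(0.10, 0.10, 0.30, 0.05)))
```

Because the function only reads an immutable snapshot and returns a value, a search procedure can swap one candidate for another without touching the runtime that gathers the statistics.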

For execution safety, Vulcan forces the generated code into a restricted language called Anvil. Anvil guarantees properties by construction: no dynamic memory allocation, no syscalls, no mutable globals. A candidate that compiles therefore satisfies those properties, which is what makes it a candidate for dropping into a production scheduler. The LLM searches over possible Anvil programs, guided by a reward signal from simulation or offline traces, until it finds a heuristic that beats the baseline.
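The loop described above (propose a candidate, reject it if it falls outside the restricted language, score it on a trace, keep the best) can be sketched in Python. Everything here is an assumption made for illustration: the `ast`-based check is a toy stand-in and not Anvil's actual rules, the fixed `CANDIDATES` list stands in for LLM proposals, and the trace format and cost model are invented.

```python
import ast

# Constructs the toy checker rejects. Illustrative only, NOT Anvil's rules.
FORBIDDEN = (ast.Import, ast.ImportFrom, ast.Global, ast.Nonlocal,
             ast.While, ast.Lambda, ast.Attribute)
ALLOWED_CALLS = {"min", "max", "abs"}


def is_restricted(source: str) -> bool:
    """True if source is exactly one function named `policy` in the toy subset."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        return False
    if tree.body[0].name != "policy":
        return False
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN):
            return False
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name) and node.func.id in ALLOWED_CALLS):
                return False
    return True


def compile_policy(source: str):
    """Load an already-checked candidate with only a few builtins available."""
    namespace = {"__builtins__": {"min": min, "max": max, "abs": abs}}
    exec(source, namespace)
    return namespace["policy"]


def total_cost(policy, trace, on_demand=1.0):
    """Replay a trace of (spot_price, preempted) hours; lower cost is better."""
    cost, seen = 0.0, []
    for price, preempted in trace:
        mean_price = sum(seen) / len(seen) if seen else price
        if policy(price, mean_price):
            # Hypothetical model: a preempted spot hour is redone on on-demand.
            cost += price + (on_demand if preempted else 0.0)
        else:
            cost += on_demand
        seen.append(price)
    return cost


def search(candidates, trace):
    """Return the cheapest candidate that passes the restricted-subset check."""
    best_source, best_cost = None, float("inf")
    for source in candidates:
        if not is_restricted(source):
            continue  # rejected before it ever runs
        cost = total_cost(compile_policy(source), trace)
        if cost < best_cost:
            best_source, best_cost = source, cost
    return best_source, best_cost


# Stand-ins for LLM proposals; the last one is rejected by the checker.
CANDIDATES = [
    "def policy(price, mean_price):\n    return True\n",
    "def policy(price, mean_price):\n    return price < 0.5\n",
    "def policy(price, mean_price):\n    return price <= mean_price\n",
    "import os\ndef policy(price, mean_price):\n    return True\n",
]
TRACE = [(0.2, False), (0.3, False), (0.8, True), (0.25, False)]

best, cost = search(CANDIDATES, TRACE)
print(best, cost)
```

The design point this sketch tries to capture is the ordering: the structural check runs before any candidate executes, so the reward signal only ever sees programs that are already inside the allowed subset.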

Real Results That Don't Need Hype

Across three well-studied domains, the numbers speak for themselves:

Spot-VM scheduling: up to 4.9x higher cost savings versus the best hand-tuned heuristic.
Cache eviction: up to 2x lower miss ratios on production trace benchmarks.
Tiered-memory systems: up to 10% higher application performance from smarter page placement.
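A note on reading these metrics: "4.9x higher cost savings" is not the same as "4.9x lower cost", and "2x lower miss ratio" is not "2x higher hit ratio". The short Python sketch below works through the arithmetic with made-up numbers. The reference cost, the 10% baseline savings, and the 20% baseline miss ratio are hypothetical, and it assumes savings are measured as a fraction of some reference cost (for example, running everything on-demand).

```python
def cost_after_savings(reference_cost: float, savings_fraction: float) -> float:
    """Cost that remains after saving `savings_fraction` of a reference cost."""
    return reference_cost * (1.0 - savings_fraction)


# Hypothetical numbers, for illustration only.
reference_cost = 100.0                      # e.g. an all-on-demand bill
baseline_savings = 0.10                     # hand-tuned heuristic saves 10%
improved_savings = 4.9 * baseline_savings   # 4.9x higher savings -> 49%

baseline_cost = cost_after_savings(reference_cost, baseline_savings)
improved_cost = cost_after_savings(reference_cost, improved_savings)
cost_ratio = baseline_cost / improved_cost  # how much cheaper the bill is

# Same idea for caches: halving the miss ratio moves the hit ratio only a little.
baseline_miss_ratio = 0.20
improved_miss_ratio = baseline_miss_ratio / 2.0
hit_ratio_gain = (1.0 - improved_miss_ratio) - (1.0 - baseline_miss_ratio)

print(f"cost: {baseline_cost:.1f} -> {improved_cost:.1f} ({cost_ratio:.2f}x cheaper)")
print(f"hit ratio gain: {hit_ratio_gain:.2f}")
```

Under these assumed inputs, a 4.9x improvement in savings corresponds to a bill that is less than 2x cheaper, so the multiplier should be read against the metric it is attached to.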

These are not cherry-picked outlier deployments. Vulcan's search covers multiple instances per domain and the performance holds across different workloads. The safety constraint didn't prevent the LLM from finding novel strategies; it forced the LLM to be clever within a small, verifiable kernel.

What This Enables

Vulcan flips the script: instead of engineers writing and debugging heuristics for every new hardware generation, they write the safety boundary and let an LLM explore the policy space within it. The same approach may generalize to network routing, job scheduling, and storage tiering, although the paper's results cover only the three domains above. Expect to see production systems adopt this pattern within the next two years - the savings are too large to ignore.

Source: Vulcan: Instance-specialized, Verifiable Systems Heuristics Through LLM-driven Search
Domain: arxiv.org