FCGraft delivers 2.3x faster policy synthesis and an 18.31% higher task success rate than RAGCache by grafting validated code skeletons instead of regenerating policies from scratch.
Why CodeLLMs Struggle in Open-Domain Robotics
Code-writing LLMs generate executable policies from natural language goals and environmental constraints. Two problems plague generation in open domains: slow prefill computation over long prompts, and fully generative decoding that introduces API mismatches, missing safety guards, and unstable control logic. Repetitive decoding wastes time; fragile outputs waste test cycles.
How FCGraft Grafts Instead of Generates
FCGraft maintains a library of function-level validated code skeletons and their associated Transformer key-value (KV) caches. Given a new task, it retrieves relevant functions and performs cache grafting via two operations: stitching composes cached function segments into a composite policy, and patching locally adapts only necessary code regions for task-specific parameters. Redundant prefill disappears. Validated control structures stay intact.
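The retrieve, stitch, and patch flow described above can be illustrated with a toy sketch. This is not the paper's implementation: `Skeleton`, `make_skeleton`, `retrieve`, `stitch`, and `patch` are hypothetical names, the "KV cache" is a list of placeholder strings rather than attention tensors, and retrieval is naive keyword matching. It only shows the bookkeeping idea that cached segments are reused as-is and only the parameter slots are recomputed.

```python
from dataclasses import dataclass


@dataclass
class Skeleton:
    """A validated function skeleton plus a stand-in for its cached KV state."""
    name: str
    tokens: list[str]  # code tokens; "{param}" marks a task-specific slot
    kv: list[str]      # one placeholder cache entry per token (not real tensors)


def make_skeleton(name: str, code: str) -> Skeleton:
    tokens = code.split()
    # Stand-in for an offline prefill pass: one fake cache entry per token.
    return Skeleton(name, tokens, [f"kv({name}:{i})" for i in range(len(tokens))])


def retrieve(library: dict[str, Skeleton], task: str) -> list[Skeleton]:
    """Naive keyword retrieval (an assumption): match skeleton names in the task."""
    return [s for s in library.values() if s.name in task.lower()]


def stitch(skeletons: list[Skeleton]) -> tuple[list[str], list[str]]:
    """Compose cached function segments into one policy by concatenation."""
    tokens: list[str] = []
    kv: list[str] = []
    for s in skeletons:
        tokens += s.tokens
        kv += s.kv
    return tokens, kv


def patch(
    tokens: list[str], kv: list[str], params: dict[str, object]
) -> tuple[list[str], list[str], int]:
    """Fill parameter slots; only those positions get a new cache entry."""
    out_tokens, out_kv, recomputed = [], [], 0
    for tok, entry in zip(tokens, kv):
        key = tok.strip("{}")
        if tok.startswith("{") and tok.endswith("}") and key in params:
            out_tokens.append(str(params[key]))
            out_kv.append(f"kv(patched:{key})")
            recomputed += 1
        else:
            out_tokens.append(tok)  # validated structure stays untouched
            out_kv.append(entry)    # cached entry is reused, not recomputed
    return out_tokens, out_kv, recomputed


# Hypothetical skeleton library for illustration only.
library = {
    "move": make_skeleton("move", "move_to( {target} , speed= {speed} )"),
    "grasp": make_skeleton("grasp", "grasp( {object} )"),
    "place": make_skeleton("place", "place_at( {target} )"),
}

selected = retrieve(library, "Move to the shelf and grasp the cup")
tokens, kv = stitch(selected)
policy_tokens, policy_kv, recomputed = patch(
    tokens, kv, {"target": "shelf", "speed": 0.2, "object": "cup"}
)
policy = " ".join(policy_tokens)
print(policy)
print(f"recomputed {recomputed} of {len(policy_tokens)} positions")
```

In this toy run, three of the nine token positions are patched and the remaining six keep their cached entries, which mirrors the claim that redundant prefill is avoided while validated control structures stay intact. The sketch does not model how a real Transformer cache would be made consistent after segments are concatenated.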
What 2.3x Latency Reduction Means for Embodied Agents
Compared against RAGCache, a prompt-level caching method, FCGraft reduces policy synthesis latency by a factor of 2.3 and lifts task success rate by 18.31 percentage points. For real-time robotics, cutting generation time to roughly 43% of the baseline (1/2.3) while improving robustness translates directly to safer, more responsive behavior. The cache-grafting paradigm could plausibly extend beyond robotics to any domain where LLMs produce structured, safety-critical code under tight latency budgets.
Source: Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents
Domain: arxiv.org