Weave's internal team cut their AI coding token spend by 40% in a month of production use, with no noticeable difference in output quality, using a model router they've now open-sourced under Elastic License 2.0.
Here's the problem. If you use an agent like Claude Code, Codex, or Cursor, every inference hits a single model (or you manually switch). Opus 4.7's tokenizer changes blew up Weave's costs. They knew they didn't need frontier intelligence for every call: exploring a codebase, formatting code, or summarizing context doesn't require the same horsepower as planning a complex refactor.
RL Model Trained on Agent Traces Decides Which LLM to Call
Weave built a router that intercepts every inference request from a coding agent and picks a model on the fly. The secret sauce is an RL model trained on "tens of thousands" of actual agent traces. The reward function is simple: did the chosen LLM successfully complete the task? If yes, the router gets a positive signal. Over time it learns which models are best for which kinds of requests.
Example from their internal use: a request to "plan a complex change" gets routed to Opus 4.8. A subagent exploring the codebase to gather context goes to DeepSeek V4 Flash. Once the plan exists, implementation is handed to GLM 5.2 or Kimi K2.6, which are cheaper and faster. The router handles all the API translation between Anthropic, OpenAI, and other endpoints, so the agent never knows it isn't talking to a single provider.
Self-Hosted or Cloud: Elastic License 2.0
You can clone the repo and run the router locally, or use their hosted version at weaverouter.com. The source code is available under Elastic License 2.0, which means you can self-host for production use but need a commercial license if you want to embed it in a competing product. The router accepts connections from any coding agent that speaks the Anthropic or OpenAI API format.
This is the kind of cost optimization that matters when your entire team writes code through AI. Instead of guessing which model to use per task, you let a dedicated router learn from real agent behavior. Expect more teams to adopt this pattern as the model landscape fragments further and each new release brings its own pricing and tokenizer quirks.
Source: Show HN: Smart model routing directly in Claude, Codex and Cursor
Domain: github.com
Comments load interactively on the live page.