Local Models Replaced My $200/Month ChatGPT Pro for Repo Triage

huggingface.co · @bright_cheetah · 3 hours ago · Developer Tools · 2 comments

Running Gemma and Qwen locally on a single NVIDIA GB10 handles real-time triage of hundreds of OpenClaw PRs per day without any API cost.

Tags: openclaw, gemma, qwen, nvidia, local models, agent harness

Hundreds of issues and PRs land in the OpenClaw repo every day, and maintainers need to react to P0s instantly. Paying $200/month for ChatGPT Pro to batch-triage every 6 hours looked like the obvious answer - until one maintainer realized that 128 GB of unified memory was already sitting idle on an NVIDIA GB10.

Why Local Models Beat Closed APIs for Real-Time Triage

Anthropic just yanked Claude Fable 5 from production. If you're building a business on a closed API, you're one deprecation notice away from rebuilding your stack. Onur Solmaz, a maintainer of OpenClaw, decided to see if local models could handle the real-time triage job that would otherwise need a $200/month Pro quota - and they did, for the cost of electricity.

Gemma-4-26B-A4B and Qwen3.6-35B-A3B both generate hundreds of tokens per second on that single GB10. That means near-instantaneous classification of every new issue or PR, not a batch every 2 or 6 hours. Reacting to every PR as it arrives carries a risk of its own: a prompt-injected PR could try to steer a model into running arbitrary commands. That attack fails when the agent harness locks the model down.

The Agent Harness That Makes It Work: Pi + reposhell

The team built a Pi agent harness that calls local model endpoints with a structured output tool. The agent gets the PR title, body, and a diff excerpt. It can then use reposhell - a restricted bash-like shell that only allows read-only operations like ls, cat, grep, and git show. Any curl, sed, or write attempt gets rejected outright.

That shell matters most when a model hallucinates a tool call. The model may behave as if it were running full bash, but reposhell rejects anything outside its read-only command set. This isn't a toy: OpenClaw gets hundreds of contributions daily, and maintainers can't afford downtime from a rogue agent.
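
The read-only restriction described above can be sketched as a small allowlist check in Python. This is an illustration, not reposhell's actual implementation: the function name is_allowed and the rejection rules are assumptions, and only the command names (ls, cat, grep, git show, curl, sed) come from the article. The sketch is deliberately conservative and rejects any shell metacharacter, even inside quotes.

```python
import shlex

# Read-only commands named in the article; the real allowlist may differ.
ALLOWED_COMMANDS = {"ls", "cat", "grep"}
ALLOWED_GIT_SUBCOMMANDS = {"show"}

# Characters that could chain commands, redirect output, or trigger
# substitution. Rejecting them outright is an assumption of this sketch.
FORBIDDEN_CHARS = set(";|&><`$\n")


def is_allowed(command_line: str) -> bool:
    """Return True only for a single allowlisted, read-only command."""
    if any(ch in FORBIDDEN_CHARS for ch in command_line):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes or similar parse errors are treated as unsafe.
        return False
    if not argv:
        return False
    if argv[0] == "git":
        return len(argv) > 1 and argv[1] in ALLOWED_GIT_SUBCOMMANDS
    return argv[0] in ALLOWED_COMMANDS


if __name__ == "__main__":
    for cmd in ["ls extensions/kimi-coding", "curl http://example.com", "cat a > b"]:
        print(cmd, "->", "allowed" if is_allowed(cmd) else "rejected")
```

A production shell would also need to vet flags and arguments (for example, options that write output to a file), which this sketch does not attempt.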

Concrete Example: Correcting Misclassification with Read-Only Code Inspection

Qwen3.6-35B-A3B was classifying PR #84621 titled "Fix Kimi tool-call rewriting stop reason handling". The path extensions/kimi-coding initially made the model lean toward coding_agent_integrations. But it used reposhell to ls extensions/kimi-coding, then cat extensions/kimi-coding/package.json. The package name was @openclaw/kimi-provider - a provider plugin, not a coding agent.

The model corrected its label to inference_api and tool_calling, explicitly excluding coding_agent_integrations. A closed API might have guessed wrong and stayed wrong; an open-weight model with read-only repo access can reason about the actual code structure.
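
The corrected result for this PR could be represented as a small structured record, roughly like the Python sketch below. The TriageResult class, its field names, and the validation rules are hypothetical; the article does not show the harness's real output schema, and only the three label names and the PR number come from the example above.

```python
from dataclasses import dataclass, field

# Only these three labels appear in the article; the real taxonomy is
# presumably larger. Treat this set as a placeholder.
KNOWN_LABELS = {"coding_agent_integrations", "inference_api", "tool_calling"}


@dataclass
class TriageResult:
    """Hypothetical structured output for one classified PR."""
    pr_number: int
    labels: list[str]
    excluded: list[str] = field(default_factory=list)

    def validate(self) -> None:
        # Reject labels the taxonomy does not know about.
        unknown = (set(self.labels) | set(self.excluded)) - KNOWN_LABELS
        if unknown:
            raise ValueError(f"unknown labels: {sorted(unknown)}")
        # A label cannot be both applied and explicitly excluded.
        overlap = set(self.labels) & set(self.excluded)
        if overlap:
            raise ValueError(f"labels both applied and excluded: {sorted(overlap)}")


# The outcome described in the article for PR #84621.
result = TriageResult(
    pr_number=84621,
    labels=["inference_api", "tool_calling"],
    excluded=["coding_agent_integrations"],
)
result.validate()
```

Recording the excluded label explicitly keeps a trace of the model's initial, rejected guess, which could help a maintainer audit the decision.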

With local inference at hundreds of tokens per second, the only bottleneck left is the speed of the maintainer.


Source: We got local models to triage the OpenClaw repo for FREE!*
Domain: huggingface.co
