Two Local Models Instead of $200/Month ChatGPT Pro for Repo Triage

huggingface.co · @bright_cheetah · 4 hours ago · Developer Tools · 3 comments

Gemma and Qwen run locally on a single NVIDIA GB10 and can triage thousands of OpenClaw PRs per day in real time with no API cost.

openclaw · gemma · qwen · nvidia · local models · agent harness

Hundreds of issues and PRs land in the OpenClaw repo every day, and I need to react to P0s instantly. Paying $200/month for ChatGPT Pro to batch-triage every 6 hours was the obvious answer - until I realized I already had 128 GB of unified memory sitting idle on an NVIDIA GB10.

Why Local Models Beat Closed APIs for Real-Time Triage

Anthropic just yanked Claude Fable 5 from production. If you're building a business on a closed API, you're one deprecation notice away from rebuilding your stack. Onur Solmaz, a maintainer of OpenClaw, decided to see if local models could handle the real-time triage job that would otherwise need a $200/month Pro quota - and they did, for the cost of electricity.

Gemma-4-26B-A4B and Qwen3.6-35B-A3B both push hundreds of tokens per second on that single GB10. That means every new issue or PR can be classified almost as soon as it arrives, rather than in batches every 2 or 6 hours. Speed is not the only concern: a prompt-injected PR could try to steer a model into running arbitrary commands, which is why the agent harness restricts what the model can execute.
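To make the throughput claim concrete, here is a back-of-the-envelope estimate. The figures below (200 tokens per second, 500 generated tokens per item) are illustrative assumptions, not measurements from the source, and the sketch ignores prompt processing time and tool-call round trips.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_item(generated_tokens: int, tokens_per_second: float) -> float:
    """Time to generate one triage response at a steady decode rate."""
    return generated_tokens / tokens_per_second


def items_per_day(generated_tokens: int, tokens_per_second: float) -> int:
    """Upper bound on items one model could process sequentially in a day."""
    return int(SECONDS_PER_DAY / seconds_per_item(generated_tokens, tokens_per_second))


# Assumed values, chosen only to illustrate "hundreds of tokens per second".
ASSUMED_TOKENS_PER_ITEM = 500
ASSUMED_TOKENS_PER_SECOND = 200.0

latency = seconds_per_item(ASSUMED_TOKENS_PER_ITEM, ASSUMED_TOKENS_PER_SECOND)
capacity = items_per_day(ASSUMED_TOKENS_PER_ITEM, ASSUMED_TOKENS_PER_SECOND)
print(f"{latency} s per item, up to {capacity} items per day")
```

Under these assumptions a single item takes a few seconds, which leaves ample headroom for a repo that receives hundreds of issues and PRs a day.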

The Agent Harness That Makes It Work: Pi + reposhell

The team built a Pi agent harness that calls local model endpoints with a structured output tool. The agent gets the PR title, body, and a diff excerpt. It can then use reposhell - a restricted bash-like shell that only allows read-only operations like ls, cat, grep, and git show. Any curl, sed, or write attempt gets rejected outright.

That shell saves the day when a model starts hallucinating a tool call. The model may believe it is running full bash, but reposhell refuses anything outside its read-only command set. This isn't a toy: OpenClaw gets hundreds of contributions daily, and maintainers can't afford downtime from a rogue agent.
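The read-only restriction can be pictured as an allowlist check that runs before any command executes. The following is a minimal sketch, not reposhell's actual implementation: the function name `is_allowed`, the constant names, and the exact rejection rules are hypothetical, and only the command names (ls, cat, grep, git show) come from the description above.

```python
import shlex

# Commands named in the article as permitted; everything else is refused.
ALLOWED_COMMANDS = {"ls", "cat", "grep"}
ALLOWED_GIT_SUBCOMMANDS = {"show"}

# Assumption: refuse shell metacharacters outright so a command cannot be
# chained, piped, substituted, or redirected to a file. This is deliberately
# conservative and also rejects some harmless grep patterns.
FORBIDDEN_CHARS = set(";|&><`$\n")


def is_allowed(command_line: str) -> bool:
    """Return True only for a single read-only command from the allowlist."""
    if any(ch in FORBIDDEN_CHARS for ch in command_line):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are refused.
        return False
    if not argv:
        return False
    if argv[0] == "git":
        # git accepts --output=<file> for diff output, which would write a file.
        if any(arg.startswith("--output") for arg in argv[1:]):
            return False
        return len(argv) > 1 and argv[1] in ALLOWED_GIT_SUBCOMMANDS
    return argv[0] in ALLOWED_COMMANDS


for cmd in ["ls extensions/kimi-coding", "git show HEAD", "curl http://example.com", "cat a > b"]:
    print(cmd, "->", "allowed" if is_allowed(cmd) else "rejected")
```

The sketch only decides whether a command may run; it does not execute anything. A production shell would also need per-flag vetting and path confinement, which are out of scope here.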

Concrete Example: Correcting Misclassification with Read-Only Code Inspection

Qwen3.6-35B-A3B was classifying PR #84621 titled "Fix Kimi tool-call rewriting stop reason handling". The path extensions/kimi-coding initially made the model lean toward coding_agent_integrations. But it used reposhell to ls extensions/kimi-coding, then cat extensions/kimi-coding/package.json. The package name was @openclaw/kimi-provider - a provider plugin, not a coding agent.

The model corrected its label to inference_api and tool_calling, explicitly excluding coding_agent_integrations. A closed API might have guessed wrong and stayed wrong; an open-weight model with read-only repo access can reason about the actual code structure.
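A structured output tool is useful here because the harness can validate the labels before acting on them. The sketch below assumes the model returns a JSON object with a `labels` list; that payload shape, the function name `parse_triage_result`, and the three-item vocabulary are hypothetical. Only the label names themselves appear in the example above, and the full label set is not given in the source.

```python
import json

# Label names from the article's example; the complete vocabulary is unknown,
# so this set is an illustrative assumption.
KNOWN_LABELS = {"coding_agent_integrations", "inference_api", "tool_calling"}


def parse_triage_result(raw: str) -> list[str]:
    """Parse a JSON payload from the model and return its validated labels."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    labels = payload.get("labels")
    if not isinstance(labels, list) or not labels:
        raise ValueError("'labels' must be a non-empty list")
    unknown = [label for label in labels if label not in KNOWN_LABELS]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    return labels


# The corrected classification from the example above.
result = parse_triage_result('{"labels": ["inference_api", "tool_calling"]}')
print(result)
```

Rejecting unknown or missing labels at this boundary means a hallucinated category fails loudly instead of being written to the issue tracker.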

With local inference at hundreds of tokens per second, the only bottleneck left is the speed of the maintainer.


Source: We got local models to triage the OpenClaw repo for FREE!*
Domain: huggingface.co
