
Cosine's 'cos' Replaces Refusal-Based Safety With a Deterministic Go Guard

A post-trained offensive-security model runs as a parallel multi-agent swarm, constrained by a runtime guard that enforces scope deterministically rather than relying on a probability distribution to refuse.

Tags: cosine, argusred, pen testing, ai security, multi agent swarm, go

Refusals are the wrong layer for safety. That's the bet behind cos, the CLI tool from Cosine, which post-trained its own model for offensive security instead of wrapping a general-purpose LLM. Dimitrios at Cosine is blunt: a model that refuses is both useless and unsafe, because you're trusting a probability distribution to hold a hard line. So they moved safety to a deterministic runtime guard written in Go.

A swarm that actually does the work

Under the hood, cos runs a multi-agent swarm. An orchestrator splits a job across subagents running in parallel, each owning a slice of the target, then synthesises one report. That design let it handle a polyglot microservice repo in a single pass. The demo target is Bank of Anthos, Google's open-source reference bank, picked because it has intentionally-soft bits so you can reproduce the run. The scan found an integer overflow in the transfer path that would let you forge an account balance, plus the usual injection, auth, and secrets classes.
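The fan-out and synthesis pattern described above can be sketched in a few lines of Go. This is an illustration only, not Cosine's code: the names Finding, Subagent and Orchestrate, the slice names and the stub subagent are all hypothetical, and a real subagent would drive a model rather than return a placeholder.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Finding is a hypothetical record for one result, tied to a file and line.
type Finding struct {
	File string
	Line int
	Note string
}

// Subagent is a hypothetical signature: it audits one slice of the target.
type Subagent func(slice string) []Finding

// Orchestrate runs one subagent per slice in parallel, waits for all of
// them, then merges the results into a single sorted report.
func Orchestrate(slices []string, agent Subagent) []Finding {
	results := make([][]Finding, len(slices)) // one slot per subagent, so no locking is needed
	var wg sync.WaitGroup
	for i, s := range slices {
		wg.Add(1)
		go func(i int, s string) {
			defer wg.Done()
			results[i] = agent(s)
		}(i, s)
	}
	wg.Wait()

	var report []Finding
	for _, r := range results {
		report = append(report, r...)
	}
	// Sort by file, then line, so the report is stable regardless of
	// which subagent finished first.
	sort.Slice(report, func(a, b int) bool {
		if report[a].File != report[b].File {
			return report[a].File < report[b].File
		}
		return report[a].Line < report[b].Line
	})
	return report
}

func main() {
	// Placeholder slices and a stub subagent; neither reflects real scan output.
	slices := []string{"service-b", "service-a"}
	stub := func(slice string) []Finding {
		return []Finding{{File: slice + "/main.go", Line: 1, Note: "placeholder finding"}}
	}
	for _, f := range Orchestrate(slices, stub) {
		fmt.Printf("%s:%d %s\n", f.File, f.Line, f.Note)
	}
}
```

Giving each subagent its own result slot keeps the merge step deterministic, which matters when the final report has to be reproducible across runs.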

Two modes, one harness

Two modes, one CLI. Security Scan is a read-only audit of a local codebase, with every finding tied to a file and line. It's free and runnable today on a $20 Cosine subscription. Pen Test mode lets the swarm attack systems you authorise and hands back the request it sent and the response your code gave. That mode is gated behind written authorisation: not a paywall, but a safety gate.

The real design is the harness. A guard written in Go intercepts every tool call before it runs. In scan mode it hard-blocks every mutating tool and any non-read-only shell command. The model can decide whatever it wants; the guard won't let it write. In pen-test mode the same guard pins the agent's network scope to the targets you authorised; it can't reach anything else. Safety is deterministic and sits below the model, not inside it.
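To make the idea concrete, here is a minimal sketch of a guard that checks each tool call before it runs. It is not Cosine's implementation, which is closed: the ToolCall shape, the tool names, the list of mutating tools and the read-only command allowlist are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// Mode selects the policy. The names are hypothetical.
type Mode int

const (
	ScanMode Mode = iota
	PenTestMode
)

// ToolCall is an assumed shape for a request from the model to the harness.
type ToolCall struct {
	Tool string // e.g. "read_file", "write_file", "shell", "http_request"
	Arg  string // a path, shell command or URL, depending on Tool
}

// Guard holds the policy. AllowedHosts is only consulted in pen-test mode.
type Guard struct {
	Mode         Mode
	AllowedHosts map[string]bool
}

var ErrBlocked = errors.New("blocked by guard")

// Assumed policy tables; a real guard would have its own.
var mutatingTools = map[string]bool{"write_file": true, "delete_file": true, "apply_patch": true}
var readOnlyCommands = map[string]bool{"cat": true, "ls": true, "grep": true, "head": true, "wc": true}

// isReadOnlyShell rejects anything with shell metacharacters (pipes,
// redirects, substitution) and then requires an allowlisted command.
func isReadOnlyShell(cmd string) bool {
	if strings.ContainsAny(cmd, ";|&><`$\n") {
		return false
	}
	fields := strings.Fields(cmd)
	return len(fields) > 0 && readOnlyCommands[fields[0]]
}

// Check returns nil if the call may run and an error wrapping ErrBlocked
// otherwise. The decision depends only on the call and the policy, never
// on what the model says about its intent.
func (g Guard) Check(c ToolCall) error {
	switch g.Mode {
	case ScanMode:
		if mutatingTools[c.Tool] {
			return fmt.Errorf("%w: tool %q mutates state", ErrBlocked, c.Tool)
		}
		if c.Tool == "shell" && !isReadOnlyShell(c.Arg) {
			return fmt.Errorf("%w: shell command is not read-only", ErrBlocked)
		}
	case PenTestMode:
		// This sketch only pins HTTP requests; other tools pass through.
		if c.Tool == "http_request" {
			u, err := url.Parse(c.Arg)
			if err != nil || !g.AllowedHosts[u.Hostname()] {
				return fmt.Errorf("%w: host is outside the authorised scope", ErrBlocked)
			}
		}
	}
	return nil
}

func main() {
	scan := Guard{Mode: ScanMode}
	fmt.Println(scan.Check(ToolCall{Tool: "shell", Arg: "grep -rn password ."}))
	fmt.Println(scan.Check(ToolCall{Tool: "write_file", Arg: "config.yaml"}))

	// "staging.example.test" is a placeholder for an authorised target.
	pen := Guard{Mode: PenTestMode, AllowedHosts: map[string]bool{"staging.example.test": true}}
	fmt.Println(pen.Check(ToolCall{Tool: "http_request", Arg: "https://other.example.test/"}))
}
```

The point of the design is that Check is a pure function of the call and the policy, so the same input always gets the same answer, which a refusal sampled from a model cannot guarantee.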

Why this beats the wrapper approach

Most "AI security" tools wrap a general model, so they inherit its refusals. Point one at a real offensive task and it hedges or declines, because the base model was trained to. Cosine went the other way and post-trained their own model for offensive security, so it does the work instead of apologising for it. Dimitrios says the harness-vs-refusals design is the part he most wants torn apart. It makes sense: if you enforce scope in the harness, below the model, you don't need the model to be a safety bureaucrat.

The closed binary (brew/curl/winget) runs locally. Cosine suggests running it behind a firewall and tcpdumping exactly what it does before you trust it on anything real. That kind of advice tells you the team understands its audience. I expect to see copycats within six months, but they'll miss the point if they just post-train a model without the guard.


Source: Show HN: We post-trained a model that pen tests instead of refusing your code
Domain: argusred.com
