Claude Code Harness Exposes Why On-Chain Agents Need Untrusted Design

Claude Code's full source got leaked in late March, and the discovery that hit me hardest isn't a new model - it's that the intelligence part is under 2% of the codebase. The other 98% is a harness: a default-deny permission layer, a context-management pipeline, isolation barriers so parallel agents don't corrupt each other, and explicit checkpoints where state-changing actions are held for approval. The people running this in production do not trust their own agent.

The harness is a mechanism design pattern, not an engineering accident

That 2% split is the qualitative point, not the exact number - the community estimate depends on how you classify code. But the structural lesson is solid: you cannot make the agent virtuous by tweaking the model, so you build a deterministic constraint layer that makes the bad outcome unprofitable. Read that as an on-chain mechanism designer and it's the same job. We assume participants are self-interested and fallible. We don't ask them to behave; we change the structure so defection doesn't pay.

The leaked harness is an unusually concrete demonstration at a scale most of us never inspect. It's a pure instance of what that ethresear.ch author calls "augmenting the invariant rather than replacing the participant."

Autonomous agents are already on-chain, and we're modeling them wrong

Agents are already acting as searchers, solvers, and intent executors on Ethereum today. The share of non-human initiated transactions is rising. Standard mechanism design treats agents as rational, well-specified actors playing best responses. The Claude Code harness is a reminder that in production, engineers wrap agents in deterministic constraints because they expect the agent to occasionally do the wrong thing with full confidence.

That should change how we specify mechanisms that agents participate in. Should incentive-compatibility analysis include a fallibility term - a non-trivial probability that the participant plays a non-best-response action? The standard equilibrium argument weakens if a meaningful fraction of participants are confidently wrong rather than strategically adversarial.

The airgap problem: constraint off-chain, action on-chain

When the harness that constrains an agent lives off-chain and the mechanism it participates in lives on-chain, we've reproduced the trust-domain airgap one level up. The ethresear.ch post asks the sharp question: can we commit the agent's permission envelope on-chain, so that constraint and action share an enforcement domain?

The deeper insight is the inverse: mechanism design has always been about building structure around participants you cannot improve. Agent-harness engineering is rediscovering the same pattern by hand. If these two fields are solving the same problem under different names, on-chain mechanism designers have a lot to teach the agent builders - and vice versa.

Source: Treating autonomous agents as untrusted participants: what the Claude Code harness suggests for on-chain mechanism design
Domain: ethresear.ch