
CopilotVerifier Proves Compiled Monitors Match Haskell Specs via Bisimulation

A new tool generates formal proofs that C monitors from the Copilot framework behave exactly as their Haskell specifications, using SMT-backed bisimulation to catch compiler bugs in safety-critical code.

Tags: copilot, runtime verification, formal verification, bisimulation, crucible, what4

Every line of C emitted by the Copilot runtime verification framework can now be accompanied by a machine-checked proof that it matches the original Haskell DSL semantics. CopilotVerifier, described in a new extended experience report on arXiv (2607.01363), establishes a bisimulation between the source monitor and its compiled output. The bisimulation guarantees that the two behave identically on equivalent inputs and that the compiled code crashes only where the source specification would.

Why Safety-Critical Systems Need Proof, Not Just Testing

Copilot targets safety-critical domains where every piece of deployed code must carry an assurance argument convincing to human auditors. The framework already generates C monitors automatically from a high-level DSL embedded in Haskell, but until now auditors have had no way to verify that the compiler did not introduce bugs. CopilotVerifier runs alongside the compiler and produces a proof broken into verification conditions, each checkable by SMT solvers.

Bisimulation as the Verification Hammer

The proof is a bisimulation between the original Copilot monitor and its compiled form. Two pieces of SMT-backed infrastructure make this practical: the Crucible symbolic execution library for LLVM IR, and the What4 solver interface library. Crucible lets the verifier reason about the compiled C's behavior at the LLVM level, while What4 connects to multiple SMT solvers under a unified API. The result: a formal guarantee that the monitor cannot crash unless the source specification also crashes in that same circumstance.
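To make the idea concrete, here is a minimal toy sketch of a bisimulation check. It is not CopilotVerifier's code or API. It assumes a hypothetical monitor that divides each input by the input from two steps earlier, models the specification as a pair of past values and the compiled form as a ring buffer with a write index, and replaces SMT solving with exhaustive enumeration over a tiny input domain. All names (spec_step, impl_step, related, check_bisimulation) are invented for illustration.

```python
from itertools import product

DOMAIN = range(-2, 3)  # tiny finite domain; a real verifier reasons symbolically
CRASH = "crash"        # marker for an observed crash


def spec_step(state, x):
    """Spec-level model: state is (oldest, newest) of the last two inputs."""
    oldest, newest = state
    out = x // oldest  # raises ZeroDivisionError when oldest == 0
    return out, (newest, x)


def impl_step(state, x):
    """Implementation-level model: a two-slot ring buffer plus an index."""
    buf, idx = state
    out = x // buf[idx]  # crashes under the same condition as the spec
    new_buf = list(buf)
    new_buf[idx] = x     # overwrite the oldest slot
    return out, (tuple(new_buf), (idx + 1) % 2)


def buggy_impl_step(state, x):
    """Same as impl_step, but forgets to advance the index."""
    buf, idx = state
    out = x // buf[idx]
    new_buf = list(buf)
    new_buf[idx] = x
    return out, (tuple(new_buf), idx)


def observe(step, state, x):
    """Run one step, turning a crash into an observable result."""
    try:
        return step(state, x)
    except ZeroDivisionError:
        return CRASH, None


def related(spec_state, impl_state):
    """The candidate bisimulation relation between the two state shapes."""
    buf, idx = impl_state
    return spec_state == (buf[idx], buf[(idx + 1) % 2])


def check_bisimulation(spec_init, impl_init, impl=impl_step):
    # Condition 1: the initial states are related.
    if not related(spec_init, impl_init):
        return False
    # Condition 2: from every related pair, every input yields the same
    # observation (output or crash) and leads to a related pair again.
    for a, b, idx, x in product(DOMAIN, DOMAIN, range(2), DOMAIN):
        impl_state = ((a, b), idx)
        buf = impl_state[0]
        spec_state = (buf[idx], buf[(idx + 1) % 2])  # the unique related spec state
        s_out, s_next = observe(spec_step, spec_state, x)
        i_out, i_next = observe(impl, impl_state, x)
        if s_out != i_out:
            return False
        if s_out != CRASH and not related(s_next, i_next):
            return False
    return True


ok = check_bisimulation((1, 1), ((1, 1), 0))
broken = check_bisimulation((1, 1), ((1, 1), 0), impl=buggy_impl_step)
```

Under these assumptions, ok is True and broken is False: the buggy variant still produces the right output on its first step, but it leaves the relation, which is how a bisimulation check exposes a state-handling bug before it shows up in an output. The two conditions in check_bisimulation also hint at how such a proof splits naturally into separate verification conditions.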

Moderate Cost, Dramatic Assurance

The report's core finding is that “dramatically increased compiler assurance can be achieved at moderate cost by building on existing tools.” By building on Crucible and What4, CopilotVerifier avoids writing its own symbolic execution engine and solver bindings, and it keeps verification overhead low enough to run alongside normal compilation. This paves the way to generating formal assurance arguments that human auditors can trust.


Source: Trustworthy Runtime Verification via Bisimulation (Extended Experience Report)
Domain: arxiv.org
