
How Trail of Bits Used GPT-5.5-Cyber to Patch 19 Projects in a Week

blog.trailofbits.com · @eager_leopard · 2 hours ago · Cybersecurity

64 pull requests, 51 issues, and a fuzzing lab built in a day - Trail of Bits shows that finding bugs is now the easy part.


64 pull requests and 51 issues filed across 19 projects in a single week. That is what happens when Trail of Bits clears dozens of engineers' schedules, pairs them with open-source maintainers, and points GPT-5.5-Cyber at critical infrastructure. The first week of Patch the Planet covered cURL, NATS, pyca, Sigstore, aiohttp, the Go project, freenginx, Python, urllib3, PyPI, SimpleX, Valkey, RustCrypto, and more. Over 30 projects have joined so far.

The Firehose Problem and the Patch the Planet Answer

Frontier models produce a firehose of security findings, and stretched maintainers must separate real vulnerabilities from plausible-sounding false positives. Patch the Planet takes a different approach: Trail of Bits engineers orchestrate and triage every finding, then write the patches themselves. They show up with code, not just bug reports. 37 of the 64 pull requests are already merged, and many more are in flight. Beyond the fixes, the engineers added new tests, fuzzing harnesses, CI security scanning, supply-chain tooling, and features maintainers had been meaning to build.

What a Frontier Model Can Do in a Day

Given a narrow goal - find remotely exploitable bugs - GPT-5.5-Cyber decided that reading the source of one of the most-reviewed C libraries in existence was a waste of tokens. Instead it stood up a full fuzzing lab in under a day: sanitizer and variant builds, a seed corpus drawn from existing tests, and harnesses across a dozen entry points. It even built a harness that injected operating system backpressure to reach buggy states that had not been explored before. Trail of Bits estimates that the same effort would have taken a fuzzing expert two to three weeks. A separate CVE variant-analysis pipeline, also built in a day, produced novel issues, and its output was almost exclusively high-signal.
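
To make the "seed corpus from existing tests" step concrete, here is a minimal sketch of how such a corpus might be assembled. It is not the pipeline the model built; the function name, the directory arguments, and the 64 KiB size cap are all assumptions chosen for illustration. The idea is simply to walk a project's test data, drop empty or oversized files, and keep one copy of each distinct input, named by its content hash.

```python
import hashlib
from pathlib import Path


def build_seed_corpus(test_data_dir, corpus_dir, max_bytes=64 * 1024):
    """Copy unique, non-empty test inputs into a fuzzing seed corpus.

    Hypothetical helper: the directory layout and the size cap are
    illustrative assumptions, not details from the article.
    Returns the number of seed files written.
    """
    src = Path(test_data_dir)
    dst = Path(corpus_dir)
    dst.mkdir(parents=True, exist_ok=True)

    seen = set()   # SHA-256 digests of inputs already written
    written = 0
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        # Skip empty files and inputs too large to be useful seeds.
        if not data or len(data) > max_bytes:
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # identical content already in the corpus
        seen.add(digest)
        # Name each seed by its hash so reruns overwrite rather than duplicate.
        (dst / digest).write_bytes(data)
        written += 1
    return written
```

Hash-based names keep the corpus stable across runs, which makes it easier to diff and to share between sanitizer and variant builds. The corpus directory should sit outside the test data directory so the walk does not pick up its own output.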

The Real Work Starts After the Bug Is Found

Finding bugs is now the easy part. The expensive part is everything after: confirming a finding, getting severity right, writing a patch a maintainer will accept, hardening surrounding code, and coordinating disclosure. That is the work that floods of AI-generated reports threaten to bury. Trail of Bits' most valuable contributions were long-term improvements: a new zizmor CI workflow at python.org, correctness fixes in RustCrypto's big-integer library, serde encoding support, HPKE DHKEM suite IDs, SBOM sidecars for Python's Windows artifacts, and tighter release-file validation.

What Maintainers Need to Do Now

Deduplication is the easiest problem to solve - simple AI-based tools that compare new reports against open issues work well. False-positive filtering and severity correction are harder. Without explicit guidance, models default to rating everything as critical. PyCA's security documentation proved highly effective at reducing false positives. Files like AGENTS.md that tell models which documentation to consult produced the most consistent results. Maintainers who document their threat models and security criteria will get high-signal help; those who don't will drown in noise.
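
The deduplication step can be illustrated with a small sketch. The article describes AI-based tools; the version below deliberately swaps in plain token-overlap (Jaccard) similarity so that it runs with the standard library alone. The function names, the issue dictionary shape, and the 0.5 threshold are assumptions for illustration, and a real tool would likely use embeddings or a model in place of `jaccard`.

```python
import re


def tokens(text):
    """Lowercase word tokens of three or more characters, as a set."""
    return {t for t in re.findall(r"[a-z0-9_]+", text.lower()) if len(t) >= 3}


def jaccard(a, b):
    """Token-set overlap between two texts, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_duplicates(new_report, open_issues, threshold=0.5):
    """Return (issue_id, score) pairs at or above threshold, best match first.

    open_issues maps a hypothetical issue identifier to its text.
    The threshold is an illustrative default, not a recommended value.
    """
    scored = [(issue_id, jaccard(new_report, body))
              for issue_id, body in open_issues.items()]
    hits = [(issue_id, score) for issue_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A report that matches an open issue above the threshold can be linked to it instead of opening a new one. The harder problems named above, false-positive filtering and severity correction, depend on the project's documented threat model and are not addressed by similarity matching.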


Source: Introducing Patch the Planet
Domain: blog.trailofbits.com
