
Cloudflare Patches hyper Race Condition That Silently Truncated 14.8 MB Responses

blog.cloudflare.com · @bold_raven · 3 hours ago · Systems Engineering · 4 comments

A discarded `let _ = poll_flush` result caused hyper to shut down a socket with 14.8 MB still in its buffer. The fix? Check the flush before shutdown.

Tags: cloudflare, hyper, rust, http library, race condition, bug fix

A 14.9 MB image response arrived at the Workers runtime as 219 KB, with a Content-Length header promising the full payload and a 200 status code. No errors logged. No crashes. The other 14.8 MB never left hyper's internal buffer.

That buffer was full and waiting to be flushed to a socket. Hyper had already decided the connection was finished.

The One Line That Hid a Race Condition

Cloudflare's Images service, written in Rust, uses the hyper HTTP library to manage connections. In December 2025, the team rearchitected the Images binding to replace a heavyweight intermediary (FL) with a direct Unix socket on the same machine. Everything got faster. But now the reader on the socket side occasionally paused for a few milliseconds, letting the socket buffer fill during large responses.

Hyper's HTTP/1 dispatch loop in dispatch.rs runs a state machine. Simplified:

loop {
    let _ = self.poll_read(cx)?;
    let _ = self.poll_write(cx)?;
    let _ = self.poll_flush(cx)?;
    if !self.conn.wants_read_again() {
        return Poll::Ready(Ok(()));
    }
}

let _ = self.poll_flush(cx)? discards the result, including Poll::Pending. When the socket buffer is full, poll_flush returns Pending: the flush isn't done. But the loop never checks. It moves on, sees that wants_read_again() is false, and returns Ready. Then poll_shutdown issues a shutdown(SHUT_WR) syscall while 14.8 MB of image data still sits in hyper's buffer.
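The effect of dropping that result can be reproduced in a few lines. What follows is a minimal sketch, not hyper's code: MockConn, its fields, and buggy_finish are hypothetical names invented for illustration, the Context argument is omitted for brevity, and the byte counts only loosely echo the figures above.

```rust
use std::io;
use std::task::Poll;

/// Hypothetical stand-in for a connection (not hyper's real types).
/// `buffered` bytes wait in user space; the socket accepts at most
/// `socket_room` more bytes until the reader drains it.
struct MockConn {
    buffered: usize,
    socket_room: usize,
    sent: usize,
    shut_down: bool,
}

impl MockConn {
    fn new(buffered: usize, socket_room: usize) -> Self {
        MockConn { buffered, socket_room, sent: 0, shut_down: false }
    }

    /// Moves as many bytes as the socket will take.
    /// Returns Pending if anything is still left in the buffer.
    fn poll_flush(&mut self) -> Poll<io::Result<()>> {
        let n = self.buffered.min(self.socket_room);
        self.buffered -= n;
        self.socket_room -= n;
        self.sent += n;
        if self.buffered == 0 {
            Poll::Ready(Ok(()))
        } else {
            Poll::Pending
        }
    }
}

/// Mirrors the shape of the simplified loop: the flush result is dropped,
/// so Pending and Ready look identical to the caller.
fn buggy_finish(conn: &mut MockConn) -> Poll<io::Result<()>> {
    let _ = conn.poll_flush()?; // Poll::Pending is silently discarded here
    conn.shut_down = true; // stands in for the SHUT_WR that follows
    Poll::Ready(Ok(()))
}
```

Calling buggy_finish on MockConn::new(14_900_000, 219_000) reports Ready and marks the connection shut down, yet only 219,000 bytes were ever handed to the socket. The `?` only propagates I/O errors; it does nothing about Pending.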

strace: The Only Tool That Saw the Truth

Application-level tracing and logging showed nothing. The Images service reported success. Every layer thought it did its job. The bug only appeared on the full production path with real concurrency - local curl requests never triggered it because curl drains the socket too fast.

The team attached strace to the Images service, filtering syscalls carefully: too broad a filter, and the overhead made the bug disappear. The output for a failing request: a single sendto of headers plus ~200 KB of body, then immediately shutdown(SHUT_WR). No second write. No error. The socket could not take more data, so the first flush returned Poll::Pending; hyper ignored it and shut down with the buffer still full.

Four Lines That Actually Flush

Initial fix: check the flush result in the dispatch loop. If Pending, return Pending to the async runtime. That worked but introduced backpressure on keepalive connections. The team found a more targeted spot.

In poll_shutdown, before calling poll_shutdown on the socket, flush any remaining data:

ready!(self.poll_flush(cx)?);
Pin::new(&mut self.io).poll_shutdown(cx)

Four lines. ready! returns Poll::Pending to the caller whenever the flush is incomplete, so SHUT_WR cannot fire until the buffer is empty. The dispatch loop stays untouched. The fix was merged into hyper via PR #4018 and will ship in a future release. Cloudflare runs an internal fork in the meantime.
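To see why the ready! pattern is enough, here is a minimal sketch under stated assumptions. MockConn, reader_drains, and drive_shutdown are hypothetical names, not hyper's API, and a plain loop stands in for the async runtime that would normally re-poll after a wakeup. Only std::task::ready! and Poll are real library items.

```rust
use std::io;
use std::task::{ready, Poll};

/// Hypothetical stand-in for a connection (not hyper's real types).
struct MockConn {
    buffered: usize,    // bytes still waiting in user space
    socket_room: usize, // bytes the socket will accept right now
    sent: usize,
    shut_down: bool,
}

impl MockConn {
    fn new(buffered: usize, socket_room: usize) -> Self {
        MockConn { buffered, socket_room, sent: 0, shut_down: false }
    }

    /// Moves as many bytes as the socket will take; Pending if any remain.
    fn poll_flush(&mut self) -> Poll<io::Result<()>> {
        let n = self.buffered.min(self.socket_room);
        self.buffered -= n;
        self.socket_room -= n;
        self.sent += n;
        if self.buffered == 0 {
            Poll::Ready(Ok(()))
        } else {
            Poll::Pending
        }
    }

    /// Simulates the peer reading `n` bytes, which frees socket space.
    fn reader_drains(&mut self, n: usize) {
        self.socket_room += n;
    }

    /// Flush first; shut down only once nothing is left in the buffer.
    fn poll_shutdown(&mut self) -> Poll<io::Result<()>> {
        ready!(self.poll_flush()?); // returns Poll::Pending early if unfinished
        self.shut_down = true; // stands in for the socket-level shutdown
        Poll::Ready(Ok(()))
    }
}

/// Re-polls shutdown after each simulated drain and returns the poll count.
/// Assumes `drain_per_wake > 0`; with 0 this loop would never finish.
fn drive_shutdown(conn: &mut MockConn, drain_per_wake: usize) -> io::Result<usize> {
    let mut polls = 0;
    loop {
        polls += 1;
        match conn.poll_shutdown() {
            Poll::Ready(res) => return res.map(|()| polls),
            Poll::Pending => conn.reader_drains(drain_per_wake),
        }
    }
}
```

With MockConn::new(1000, 300), the first poll_shutdown returns Pending and leaves shut_down false; the connection closes only after later polls have moved all 1000 bytes to the socket.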

This bug lived in hyper for years across multiple major versions. It took an architectural improvement that made the system faster to surface it: a few milliseconds of backpressure exposed a flaw in state machine logic that assumed flush always completes instantly. The kernel doesn't lie - but you have to ask it the right question.


Source: How we found a bug in the hyper HTTP library
Domain: blog.cloudflare.com
