One Wrapper Doubles CPU, Cuts Throughput: Go's sendfile Fast Path Explained

2,958 sendfile(2) calls vanish when you hide a *os.File behind a plain io.Reader. That one-line "harmless" wrapper - a logging reader, a tracing span, anything that turns a file into a generic Reader - replaces those sendfile calls with 131,093 read and write syscalls (roughly 65,500 read+write pairs). The result: 24x more wall time inside the kernel and a CPU profile where 82% of samples live inside syscall.Read and syscall.Write.

Assel Meher from Grafana Labs benchmarked this exact scenario on Linux 6.6 with Go 1.22.12 and a warm 512 MiB page cache. The numbers are brutal. I'm writing this because every Go engineer who serves files over TCP should know where their fast path is hiding.

The Three Shapes of io.Copy

Meher compared three ways to hand a file to io.Copy on a *net.TCPConn. The raw version passed the *os.File directly - that's the fast path. The wrapped version hid the file behind a justReader struct that did nothing except implement io.Reader. The limit version wrapped it in an *io.LimitedReader, which the standard library explicitly detects and unwraps.

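// raw: fast path
io.Copy(conn, f)

// wrapped: slow path (any middleware that returns an io.Reader)
type justReader struct{ r io.Reader }

// Read forwards to the wrapped reader; this method is what makes justReader an io.Reader.
func (j justReader) Read(p []byte) (int, error) { return j.r.Read(p) }

io.Copy(conn, justReader{r: f})

// limit: still fast because *io.LimitedReader is special-cased
io.Copy(conn, io.LimitReader(f, fileSize))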

Under strace -c, the raw version made 2,958 sendfile calls and only 7 reads and 7 writes (TCP setup chatter). The wrapped version made zero sendfile calls, 65,547 reads, and 65,546 writes. Every chunk of the file bounced through io.Copy's userspace buffer, which is 32 KiB by default.

The Signature of a Lost Optimization

Meher's flamegraph shows the telltale sign: a nested io.copyBuffer inside (*TCPConn).readFrom. When readFrom is handed something that is neither a *os.File nor an *io.LimitedReader wrapping one, it punts back to io.copyBuffer, which does the userspace bounce. The hot stack reads: io.Copy -> io.copyBuffer -> (*TCPConn).ReadFrom -> readFrom -> io.Copy -> io.copyBuffer. That's the fingerprint of a zero-copy path that died.

Go's detection chain lives in two files: net/sendfile_linux.go and os/zero_copy_linux.go. It does two type assertions - first for *io.LimitedReader (which it unwraps), then for *os.File. Pass it anything else, even a do-nothing wrapper, and the standard library falls back to a generic read/write loop.

The practical lesson: if you write middleware that wraps an *os.File in an io.Reader for logging, metrics, or tracing, you have torpedoed your server's throughput. Every byte goes through a userspace buffer. CPU doubles. Throughput halves. One line.

What This Means for Production

Meher's post is from June 2026, but the architecture hasn't changed. The Linux sendfile(2) syscall still does what it's always done: move page-cache pages directly into the socket's send queue without copying them through userspace. Go's standard library still activates it automatically in (*net.TCPConn).ReadFrom when it gets a *os.File. The vulnerability is the same: any wrapper on the source side that neither preserves the concrete type nor forwards the copy itself (for example by implementing io.WriterTo, which io.Copy checks first) kills it.

Check your own code. That countingReader or tracingReader wrapping every file handler? That's likely your bottleneck. *io.LimitedReader is safe because the standard library unwraps it. By the two type assertions described above, everything else, io.SectionReader included, is a performance tax you didn't account for.

Source: Zero-copy in Go: sendfile, splice, and the cost of io.Copy
Domain: segflow.github.io
