
Why Memcached Internal Response Times Lie - Measure from the Client

memcached.org · @logical_bear · 4 hours ago · Systems Engineering · 3 comments

The memcached process serves GET requests in under 1 millisecond regardless of load; the real latency is hidden in OS network queues and client-side waiting.

memcached · dormando · latency measurement · distributed caching · client side metrics · systems engineering

A GET request to memcached takes the same amount of server time whether the machine is idle or gasping under load. That's not a bug - it's a feature of the threading model, and it makes internal response time metrics worse than useless.

The Start Time Trap

Memcached processes requests as fast as it reads them off the network socket. Worker threads, typically one per CPU core, each iterate through their ready sockets, but no timestamp is applied until the server actually starts processing a request. While a thread is busy reading a batch of sockets, other requests sit in OS network buffers waiting their turn. The internal clock never ticks for that queued time.

Compare that to a typical web API: an application might spend tens of milliseconds competing for resources, making sub-requests, or waiting on disk. Memcached lives in the sub-millisecond world. The first chance to measure time is so close to the end that the measurement captures almost none of the actual wall-clock delay the client experiences.

What Actually Slows Down a Memcached Call?

Client perception is everything. If a client sends 100 requests at once, each individual GET may take 0.1 ms to process - but the client sees the full batch completion time. That last request in the queue waited for 99 others to finish before it was even read from the network.
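
The arithmetic behind that batch can be sketched in a few lines of Python. This is a toy model, not memcached's actual event loop: it assumes a single worker thread that reads the burst strictly in order and a fixed 0.1 ms of processing per request. The names `burst_latencies` and `service_ms` are made up for illustration.

```python
def burst_latencies(n_requests, service_ms):
    """Model n requests arriving together at one worker thread.

    Returns a list of (server_ms, client_ms) pairs, one per request:
    server_ms is what an internal timer would record (processing only),
    client_ms is the time since the burst was sent (queue wait + processing).
    """
    results = []
    for position in range(n_requests):
        queued_ms = position * service_ms   # time spent unread in the socket buffer
        server_ms = service_ms              # internal timer starts only at processing
        client_ms = queued_ms + service_ms  # what the caller actually waits
        results.append((server_ms, client_ms))
    return results


if __name__ == "__main__":
    samples = burst_latencies(n_requests=100, service_ms=0.1)
    first, last = samples[0], samples[-1]
    print(f"first request: server {first[0]:.1f} ms, client {first[1]:.1f} ms")
    print(f"last request:  server {last[0]:.1f} ms, client {last[1]:.1f} ms")
```

Under these assumptions every request reports 0.1 ms on the server side, while the client waits about 10 ms for the last one. An internal timer cannot see that gap.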

SET requests scale poorly under high write loads, while GETs remain fast from the server's perspective. The bottleneck moves from processing to network buffering. Large responses (megabytes vs. kilobytes) take about the same server time to generate but dramatically longer for the client to read and parse. Extstore SSD writes are also a legitimate latency source, and that is one metric you can actually measure separately: time waiting on disk.
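
To get a feel for the large-response point, here is a rough lower bound on the time a value spends on the wire. The 10 Gbit/s link speed is an assumption chosen for illustration, and the sketch ignores TCP overhead, congestion, and client-side parsing, all of which only add to the figure. `wire_time_ms` is a hypothetical helper, not part of any memcached tooling.

```python
def wire_time_ms(payload_bytes, link_gbit_per_s):
    """Minimum time to move a payload across a link, ignoring protocol overhead."""
    bits = payload_bytes * 8
    seconds = bits / (link_gbit_per_s * 1e9)
    return seconds * 1000.0


if __name__ == "__main__":
    # Assumed link speed: 10 Gbit/s. Adjust to match your own network.
    for label, size in (("1 KiB", 1024), ("1 MiB", 1024 * 1024)):
        print(f"{label}: {wire_time_ms(size, link_gbit_per_s=10):.4f} ms")
```

On the assumed link a 1 KiB value needs under a microsecond of transfer time, while a 1 MiB value needs roughly 0.84 ms before the client has parsed a single byte. That is already longer than the sub-millisecond processing time the server would report.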

The Fix: Sample from the Client

Dormando's recommendation is direct: measure total response time from the client. That gives you the real impact on your service. From there, correlate with server CPU, network drops, or bulk-loading sprees. Top-down correlation beats bottom-up guesswork.
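
A minimal sketch of what sampled client-side measurement could look like in Python follows. It is not taken from the source or from any memcached client library: `SampledTimer`, its `sample_rate` parameter, and the stand-in `fake_get` function are all hypothetical. It wraps whatever call the application already makes and records wall-clock time for a random fraction of calls.

```python
import math
import random
import time


class SampledTimer:
    """Record wall-clock latency for a random sample of calls."""

    def __init__(self, sample_rate=0.01, rng=None):
        self.sample_rate = sample_rate        # fraction of calls to time
        self.rng = rng or random.Random()
        self.samples_ms = []

    def call(self, fn, *args, **kwargs):
        """Run fn; time it only if this call is picked for sampling."""
        if self.rng.random() >= self.sample_rate:
            return fn(*args, **kwargs)
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples_ms.append(elapsed_ms)

    def percentile(self, pct):
        """Nearest-rank percentile of the recorded samples, or None if empty."""
        if not self.samples_ms:
            return None
        ordered = sorted(self.samples_ms)
        rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
        return ordered[rank - 1]


if __name__ == "__main__":
    def fake_get(key):
        # Stand-in for a real client call such as a memcached GET.
        time.sleep(0.001)
        return None

    timer = SampledTimer(sample_rate=0.5)
    for i in range(200):
        timer.call(fake_get, f"key:{i}")
    print(f"samples: {len(timer.samples_ms)}, p99: {timer.percentile(99):.3f} ms")
```

In a real service the wrapped function would be the application's own cache call, so the recorded time includes queueing in OS buffers, the network round trip, and response parsing. The timestamps of slow samples can then be lined up against server CPU or network drop counters.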

The memcached proxy's built-in logging can sample response times if you run it locally on application hosts. There's also a connection tester script in the docs. No single utility fits every telemetry stack, but a quick sampled telemetry setup will tell you everything internal metrics can't.

Stop trusting server-side timers for sub-millisecond systems. The only number that matters is the one your client sees.


Source: How Long Does That Response Take... For Real?
Domain: memcached.org
