Netflix Handles 10 Million Ops Per Second With a Low-Latency Graph Engine

Netflix’s Graph Abstraction now handles almost 10 million operations per second across 650 TB of data, delivering single‑digit millisecond latencies for node reads and 1‑hop traversals.

Architecture Overview

The engine sits on top of Netflix’s existing KV abstraction, using a property‑graph model to store nodes and edges with strongly typed properties. Each namespace maps to a KV table; node types live in separate namespaces, while edges are split into forward and reverse link indexes plus a separate property index. This separation lets edge links be upserted without pulling the entire property set, preventing wide rows in the underlying store.
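The edge layout described above can be sketched with three in-memory maps standing in for the forward link index, the reverse link index, and the property index. This is a toy model only: the class and method names are hypothetical and are not Netflix's API, and plain dictionaries replace the real KV namespaces.

```python
from collections import defaultdict


class EdgeStore:
    """Toy stand-in for the three KV namespaces backing one edge type."""

    def __init__(self):
        self.forward = defaultdict(set)   # source id -> target ids
        self.reverse = defaultdict(set)   # target id -> source ids
        self.properties = {}              # (source, target) -> property dict

    def upsert_link(self, src, dst):
        # Touches only the two link indexes; the property index is
        # neither read nor rewritten.
        self.forward[src].add(dst)
        self.reverse[dst].add(src)

    def put_properties(self, src, dst, props):
        # Properties live in their own index, keyed by the edge endpoints.
        self.properties[(src, dst)] = dict(props)

    def out_neighbors(self, src):
        # 1-hop traversal along outgoing edges.
        return sorted(self.forward.get(src, ()))

    def in_neighbors(self, dst):
        # 1-hop traversal along incoming edges, served by the reverse index.
        return sorted(self.reverse.get(dst, ()))
```

Keeping links and properties apart means that a repeated `upsert_link` call costs two small set insertions regardless of how many properties the edge carries.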

Write‑aside caching of edge links keeps write amplification low: a short‑lived cache records which links were written recently, so an upsert for a link that already exists skips the store write. Read‑aside caching uses EVCache; the system invalidates cached records on write for consistency‑critical graphs and relies on TTL‑driven expiry for graphs with high‑frequency updates. The KV layer makes writes idempotent through a timestamp‑based last‑write‑wins (LWW) policy.
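A minimal sketch of the write-side idea follows, assuming a per-process dictionary with a fixed TTL. The names `LinkWriteCache` and `upsert_link`, the TTL value, and the list standing in for the KV store are all illustrative assumptions; the source does not describe how the real cache is implemented.

```python
import time


class LinkWriteCache:
    """Remembers recently written links so repeat upserts can be skipped."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the TTL can be tested
        self._seen = {}             # (src, dst) -> expiry time

    def should_write(self, src, dst):
        now = self.clock()
        key = (src, dst)
        expiry = self._seen.get(key)
        if expiry is not None and expiry > now:
            return False            # written recently; skip the store write
        self._seen[key] = now + self.ttl
        return True


def upsert_link(cache, store_writes, src, dst):
    """Append to store_writes (a stand-in for the KV write) only when needed."""
    if cache.should_write(src, dst):
        store_writes.append((src, dst))
        return True
    return False
```

A short TTL bounds how long a stale "already written" entry can suppress a write. A production version would also need to record the entry only after the store write succeeds, which this sketch does not attempt.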

Performance & Scaling

Edge and node persistence achieve single‑digit millisecond latencies. 1‑hop traversals run in the same range, while 2‑hop traversals, used by the Real‑Time Distributed Graph (RDG), reach up to 100 ms but keep p90 under 50 ms. The Count API can perform high‑rate counting traversals with comparable latencies. Asynchronous node deletions finish in sub‑second time, and the system replicates data across regions for eventual consistency.

Consistency & Caching

Because writes touch multiple KV namespaces, the abstraction uses Kafka‑driven retries to repair entropy. Node deletions are asynchronous; LWW conflict resolution guarantees that concurrent updates don’t corrupt the graph. Global replication of both cache and durable storage yields an eventually consistent system, acceptable for the OLTP use cases that drive Netflix’s real‑time services.
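To see why retries are safe under a timestamp-based last-write-wins (LWW) policy, consider this small sketch. The `Versioned` record and `lww_apply` function are hypothetical, and the tie-breaking rule (an incoming write with an equal timestamp wins) is an assumption rather than something the source specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: dict
    timestamp: int  # writer-supplied timestamp, e.g. epoch microseconds


def lww_apply(current, incoming):
    """Return the record to keep under last-write-wins."""
    if current is None or incoming.timestamp >= current.timestamp:
        return incoming
    return current


older = Versioned({"status": "draft"}, timestamp=10)
newer = Versioned({"status": "live"}, timestamp=20)

# Replaying a write (as a retry would) leaves the stored record unchanged.
state = lww_apply(lww_apply(None, newer), newer)

# A late-arriving older write does not overwrite the newer one.
state = lww_apply(state, older)
```

With distinct timestamps the final record is the same whichever order the writes arrive in, which is what lets a retry pipeline replay a write without knowing whether the first attempt landed.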

Netflix will continue to refine the traversal planner and explore write‑through caching in future releases, but the current design already sustains nearly 10 million operations per second while keeping latency low enough for real‑time user experiences.

Source: High-Throughput Graph Abstraction at Netflix: Part I
Domain: netflixtechblog.com
