Netflix Maps 1000 Microservices in Real Time

netflixtechblog.com · @systems_wire · 6 days ago · Systems Engineering · 14 comments

Netflix built a real-time service map to help engineers understand dependencies and resolve problems faster, reducing incident resolution time and improving the member experience.

netflix · microservices · service topology · real time systems · distributed systems

Netflix operates thousands of microservices to deliver entertainment to its members. When an engineer gets paged at 3am due to elevated error rates, they need to quickly understand which services depend on each other and what's causing the issue. To solve this problem, Netflix built a real-time service map, providing a unified view of service dependencies.

The Challenge of Distributed Systems

In a system with thousands of microservices, answering questions like "Which services depend on each other?" and "What's the blast radius?" can be challenging. Traditional observability tools show fragments of this picture, but none of them show the complete map of how everything connects.

Three Sources of Truth

Netflix's service topology map is built using three complementary sources: eBPF network flows, IPC metrics, and end-to-end tracing. Each source creates its own graph of service relationships, which can be combined into a unified view or explored independently.

Bringing It Together: Multi-Layer Architecture

The unified view is especially powerful because it combines the strengths of each source: network flows ensure completeness, IPC metrics provide application details, and tracing shows actual behavior. Each source compensates for the limitations of the others, resulting in a comprehensive and accurate view of service dependencies.
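To make the layering idea concrete, here is a minimal sketch of how three per-source graphs might be merged while keeping track of which source observed each edge. The function name `merge_layers`, the layer labels, and the service names are all hypothetical; the article does not describe Netflix's actual schema or merge logic.

```python
from collections import defaultdict


def merge_layers(layer_edges):
    """Combine per-source edge sets into one graph.

    `layer_edges` maps a layer label (hypothetical names here) to a set of
    (caller, callee) pairs. The result maps each edge to the set of layers
    that observed it, so each layer can still be explored on its own.
    """
    unified = defaultdict(set)
    for layer, edges in layer_edges.items():
        for caller, callee in edges:
            unified[(caller, callee)].add(layer)
    return dict(unified)


# Illustrative data only: service names and edges are invented.
layer_edges = {
    "network_flow": {("api", "playback"), ("playback", "licensing"), ("api", "cache")},
    "ipc_metrics": {("api", "playback"), ("playback", "licensing")},
    "tracing": {("api", "playback")},
}

unified = merge_layers(layer_edges)

# Edges seen only at the network level: present in the unified view even
# though the application-level sources did not report them.
network_only = sorted(e for e, seen in unified.items() if seen == {"network_flow"})
```

Keeping the set of observing layers on each edge is one way to support both uses the article mentions: a combined view (all keys) and independent exploration (filter by layer).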

What Engineers Can Do Now

Today, the service topology map is helping engineers across Netflix visualize dependencies, jump to detailed signals, understand blast radius, and overlay health status. Engineers can also query the topology programmatically, investigate faster, and plan changes confidently.
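As an illustration of what a programmatic blast-radius query could look like, the sketch below walks a dependency graph backwards from a failing service to find everything that depends on it, directly or transitively. This is a generic breadth-first search under assumed data shapes, not Netflix's API; `blast_radius` and the service names are hypothetical.

```python
from collections import defaultdict, deque


def blast_radius(edges, failing):
    """Return every service that depends on `failing`, directly or transitively.

    `edges` is an iterable of (caller, callee) pairs. The search follows
    edges in reverse (callee -> caller), breadth first.
    """
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)

    affected = set()
    queue = deque([failing])
    while queue:
        service = queue.popleft()
        for caller in callers.get(service, ()):
            # Skip already-visited services so cycles terminate.
            if caller != failing and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


# Illustrative topology only.
edges = [
    ("tv-ui", "api"),
    ("api", "playback"),
    ("api", "cache"),
    ("playback", "licensing"),
]

affected_by_licensing = blast_radius(edges, "licensing")
```

With this invented topology, a failure in `licensing` reaches `playback`, `api`, and `tv-ui`, while a failure in `tv-ui` affects no upstream callers.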

The Living Map: Always Current

What makes this truly useful is that it's a living map, continuously updated based on actual traffic. When a new service starts calling an API, it appears in the topology with near real-time freshness. This means engineers can trust what they see, and the map reflects reality, not someone's idea of what the architecture should be.
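One simple way to picture a traffic-driven map is to record when each edge was last observed and to show only edges seen recently. The sketch below does that; the expiry window (`ttl_seconds`) and the `LiveTopology` class are assumptions made for illustration, since the article says only that the map is continuously updated from actual traffic and does not describe how or whether stale edges are aged out.

```python
class LiveTopology:
    """Toy edge store keyed by last-observed time (hypothetical design)."""

    def __init__(self, ttl_seconds):
        # Assumption: edges not seen within this window are treated as stale.
        self.ttl_seconds = ttl_seconds
        self.last_seen = {}

    def observe(self, caller, callee, now):
        """Record that `caller` was seen calling `callee` at time `now` (seconds)."""
        self.last_seen[(caller, callee)] = now

    def current_edges(self, now):
        """Return the edges observed within the last `ttl_seconds`."""
        return {
            edge
            for edge, seen_at in self.last_seen.items()
            if now - seen_at <= self.ttl_seconds
        }


topology = LiveTopology(ttl_seconds=300)
topology.observe("api", "playback", now=0)
topology.observe("api", "cache", now=200)  # a newly observed call appears at once

edges_at_250 = topology.current_edges(now=250)
edges_at_400 = topology.current_edges(now=400)  # the edge last seen at t=0 has aged out
```

Passing `now` explicitly rather than reading the clock inside the class keeps the example deterministic and easy to test.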

What's Next

Netflix is continuing to evolve the system with new capabilities, including change event overlay, richer context, and automated root cause analysis. The service topology map provides the knowledge graph foundation that makes this kind of intelligent automation possible, ultimately improving member experience.


Source: From Silos to Service Topology: Why Netflix Built a Real-Time Service Map
Domain: netflixtechblog.com
