From streaming to studio: The evolution of Node.js at Netflix

By September 24, 2020Blog, Case Study, Node.js, Project Update

As platforms grow, so do their needs. However, the core infrastructure is often not designed to handle these new challenges as it was optimized for a relatively simple task. Netflix, a member of the OpenJS Foundation, had to overcome this challenge as it evolved from a massive web streaming service to a content production platform. Guilherme Hermeto, Senior Platform Engineer at Netflix, spearheaded efforts to restructure the Netflix Node.js infrastructure to handle new functions while preserving the stability of the application. In his talk below, he walks through his work and provides resources and tips for developers encountering similar problems.

Check out the full presentation 

Netflix initially used Node.js to enable high volume web streaming to over 182 million subscribers. Their three goals with this early infrastructure was to provide observability (metrics), debuggability (diagnostic tools) and availability (service registration). The result was the NodeQuark infrastructure. An application gateway authenticates and routes requests to the NodeQuark service, which then communicates with APIs and formats responses that are sent back to the client. With NodeQuark, Netflix also created a managed experience — teams could create custom API experiences for specific devices. This allows the Netflix app to run seamlessly on different devices. 

Beyond streaming

However, Netflix wanted to move beyond web streaming and into content production. This posed several challenges to the NodeQuark infrastructure and the development team. Web streaming requires relatively few applications, but serves a huge user base. On the other hand, a content production platform houses a large number of applications that serve a limited userbase. Furthermore, a content production app must have multiple levels of security for employees, partners and users. An additional issue is that development for content production is ideally fast paced while platform releases are slow, iterative processes intended to ensure application stability. Grouping these two processes together seems difficult, but the alternative is to spend unnecessary time and effort building a completely separate infrastructure. 

Hermeto decided that in order to solve Netflix’s problems, he would need to use self-contained modules. In other words, plugins! By transitioning to plugins, the Netflix team was able to separate the infrastructure’s functions while still retaining the ability to reuse code shared between web streaming and content production. Hermeto then took plugin architecture to the next step by creating application profiles. The application profile is simply a list of plugins required by an application. The profile reads in these specific plugins and then exports a loaded array. Therefore, the risk of a plugin built for content production breaking the streaming application was reduced. Additionally, by sectioning code out into smaller pieces, the Netflix team was able to remove moving parts from the core system, improving stability. 

Looking ahead

In the future, Hermeto wants to allow teams to create specific application profiles that they can give to customers. Additionally, Netflix may be able to switch from application versions to application profiles as the code breaks into smaller and smaller pieces. 

To finish his talk, Hermeto gave his personal recommendations for open source projects that are useful for observability and debuggability. Essentially, a starting point for building out your own production-level application!  

Personal recommendations for open source projects 

Metrics and alerts: 

Centralized Logging 

Distributed tracing 

Diagnostics 

Exception Management