On Christmas Eve of last year, problems with Amazon Web Services brought down Netflix’s platform, causing frustrated customers to take to social networks and complain as their family movie nights ground to a screeching halt. Now, thanks to new capabilities, Netflix seems confident it can weather a disaster of even greater proportions.
In a new blog post, the company announced that, thanks to recent engineering work, its system can now shift all of its traffic onto a single AWS region.
Netflix’s service is normally split across several AWS regions: customers near the East Coast are usually served from Amazon’s US-East-1 region in Virginia, those on the West Coast from the US-West-2 region in Oregon, and so on. The capability announced today gives Netflix a key defense against another disaster on the scale of Christmas Eve: if one region fails, Netflix can route all of its traffic to a region that is still working.
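To make the failover idea concrete, here is a toy Python sketch. It assumes a simple health table per region and a hypothetical mapping of client areas ("east", "west") to home regions. It is not Netflix’s implementation; production systems typically steer traffic at the DNS or load-balancer layer rather than in application code like this.

```python
from dataclasses import dataclass, field


@dataclass
class RegionRouter:
    """Toy model of region failover (hypothetical; not Netflix's actual system)."""

    # Hypothetical mapping from a client's geographic area to its home region.
    home_regions: dict[str, str]
    # Health status per region; True means the region is serving normally.
    healthy: dict[str, bool] = field(default_factory=dict)

    def route(self, client_area: str) -> str:
        home = self.home_regions[client_area]
        # Normal case: serve clients from their home region.
        if self.healthy.get(home, False):
            return home
        # Home region is down: fall back to any healthy region, checked in a stable order.
        for region in sorted(self.healthy):
            if self.healthy[region]:
                return region
        raise RuntimeError("no healthy region available")


router = RegionRouter(
    home_regions={"east": "us-east-1", "west": "us-west-2"},
    healthy={"us-east-1": True, "us-west-2": True},
)
print(router.route("east"))  # us-east-1
router.healthy["us-east-1"] = False  # simulate a regional outage
print(router.route("east"))  # us-west-2
```

Rerouting is only half the problem. The surviving region also needs enough capacity to absorb the extra load, which is what Netflix’s large-scale single-region test was meant to demonstrate.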
The system has already been battle-tested. Netflix ran into a problem in one of its clusters earlier this year, and instead of suffering a major outage, the company routed the affected traffic to the healthy region. While the incident didn’t go unnoticed, Netflix’s new system appears to have contained the damage.
In addition, Netflix conducted a large-scale test that, according to the company, shows it could run its entire U.S. service out of a single AWS region without disruption.
Along with Isthmus, which allows the service to work around an Elastic Load Balancer outage like the one that caused the Christmas Eve catastrophe, Netflix seems prepared to handle much of what Amazon’s outages might throw at it.
If you’re interested in the nitty-gritty technical details of the company’s new system, check out Netflix’s blog post.