UPDATE: Amazon has released an explanation of what caused the big outage this week and has started putting safeguards in place so it doesn’t happen again.
Original story below
It’s been more than 24 hours since Amazon Web Services fixed problems with its Simple Storage Service, or S3 for short, that crippled significant chunks of the internet Tuesday, and the company has yet to say anything about the cause of the outage.
Amazon has not made a public statement about the outage, nor has it returned a request for comment from GeekWire.
That Amazon hasn’t said anything yet isn’t much of a surprise to one observer. Nick Kephart, senior director of product marketing for San Francisco-based network intelligence company ThousandEyes, monitored the outage throughout the day Tuesday. He told GeekWire that Amazon has traditionally been good about coming out publicly with causes and explanations for big outages within a few days of an event.
“In previous cases, Amazon usually will come out with a root cause within a week. That is what we’ve seen before for these major problems, so I would expect probably a similar time frame,” Kephart said. “The last major outage on this scale that they had was back in June 2015, and it took several days before they had detailed a root cause analysis.”
For those who missed it, or don’t use the internet much, here is what happened: Starting a little after 9:30 a.m. Pacific time Tuesday, and lasting more than four hours, S3 experienced “high error rates” at data centers in Virginia. These problems knocked out access to a litany of websites and apps that run on AWS, including Expedia, Slack, Medium and the U.S. Securities and Exchange Commission, among many others. The outage even temporarily affected the AWS service health dashboard, which displays outages and events. Since many key aspects of AWS are built on top of each other, the outage also affected other services.
Amazon updated the dashboard throughout the day. AWS began to get issues under control about two hours after the initial outage, and everything was back online around 1:50 p.m. Pacific.
The outage illustrated just how big AWS has become. It is a moneymaker for Amazon, topping $12 billion in sales in 2016, up 55 percent from 2015 and blowing past a goal of reaching $10 billion in sales for the year. It also has captured more than 40 percent of the cloud computing market, according to a recent report.
Several experts said the incident also underscored the need for redundancy in cloud computing, whether that be spreading data across multiple regions or using several different providers.