Amazon.com is experiencing a serious outage this morning with its Elastic Compute Cloud and other Web services, causing downtime at major sites such as HootSuite, Reddit and Foursquare. We’re tracking the issue and will update the post as we learn more, but Twitter is buzzing with complaints.

“The sky is falling! Amazon’s cloud seems to be down (raining?) so we’re experiencing some issues too. Be back soon!” read a tweet from mobile social gaming site SCVNGR.

The Amazon Web Services “health dashboard” indicates problems with Elastic Compute Cloud servers in Northern Virginia, with the company reporting “instance connectivity, latency and error rates.” AWS’s Elastic Beanstalk and Relational Database Service are also experiencing problems.

It is unclear whether the problems are affecting sites in the Pacific Northwest, so let us know in the comments if you are an Amazon Web Services customer and are experiencing trouble.

Many third-party companies turn to Amazon.com to host their applications, and when AWS goes down, hundreds or thousands of services can go down with it.

The issues started at 1:41 a.m. when Amazon issued an alert about Elastic Compute Cloud. As of 6:09 a.m., the company had said that “EBS API errors and volume latencies in the affected availability zone remain. We are continuing to work towards a resolution.”

UPDATE FROM AMAZON.COM AT 10:26 AM:

We have made significant progress in stabilizing the affected EBS control plane service. EC2 API calls that do not involve EBS resources in the affected Availability Zone are now seeing significantly reduced failures and latency and are continuing to recover. We have also brought additional capacity online in the affected Availability Zone and stuck EBS volumes (those that were being remirrored) are beginning to recover. We cannot yet estimate when these volumes will be completely recovered, but we will provide an estimate as soon as we have sufficient data to estimate the recovery. We have all available resources working to restore full service functionality as soon as possible. We will continue to provide updates when we have them.

John Cook is co-founder of GeekWire. Follow on Twitter: @geekwirenews and Facebook.

Comments

  • http://twitter.com/sib1013 Scott Blanksteen

    This outage has been affecting us at AppStoreHQ all day.

    It seems to be very localized in that EC2 instances (the compute machines) are fine while EBS (storage) is down. For the geeks out there, the fun is that we’re seeing load averages of ~52 on EC2 instances that are still responsive at the command line because so many processes are hung waiting for EBS to respond.

    I guess on a sunny day in Seattle it’s bad to rely on the cloud?

  • http://twitter.com/ChiefDoorman Keith Smith

    You can add BigDoor to that list. This has been going on since around 1:30 AM PT. AWS is failing miserably in their communication about this issue and from what we can tell still don’t have a full grasp on the severity of it. They claim that this is only impacting the creation of new EBS volumes and EBS backed volumes – but this is significantly more than that.

    • johnhcook

      Isn’t BigDoor right next to Amazon’s new HQ? Can’t you just go over there and ask them to fix the problem. :)

      • http://twitter.com/ChiefDoorman Keith Smith

        Yes! And we’ve more or less done that. But so far we are only getting lots of sincere apologies and promises that they understand the gravity of the situation – all still devoid of any real information.

  • http://twitter.com/ggoodale Grant Goodale

    WordSquared is hosted over at Blue Box Group (http://bluebox.net), so no outage here. We do host our static assets on Amazon’s S3, but that doesn’t appear to be affected by the EC2 issues.

  • http://www.twitter.com/algard algard

    WhitePages has some limited services on AWS (mainly via Heroku). We also noticed that Omniture has been kind of wonky, and that would affect many companies. In addition, CarDomain and StreetFire are hosted on AWS, so not a fun day for them either.

  • http://www.facebook.com/ikristoph Kristoph Cichocki-Romanov

    Zapd.com was spared as all our instances are ephemeral. We had only a brief network disruption.

    Inkd.com was less lucky as we use an EBS. We were able to make a snapshot of the volume around 1pm and recreate the EBS in a different (working) zone. We’re up and running now.

  • rickg

    Ok, what I don’t get is that, if it’s just the VA instances… why is everyone down? Doesn’t that point out that the ‘cloud’ you’re on is just a grid located in a given data center? One of the supposed benefits of a cloud is that it’s not just independent of server failures, but it’s not tied to any one datacenter. To the end user the resource is just one big resource without regard to any particular server or datacenter. If instances exist in one and only one datacenter… um…

    • http://profiles.google.com/mcddsl Michael McDermott

      Good point! Hot backup should be available elsewhere, one would think.

  • johnhcook

    Keith Smith of BigDoor gives a first-hand account of the AWS outage in this GeekWire guest post: “Amazon.com’s real problem isn’t the outage, it’s the communication.” A great read in my opinion:

    http://www.geekwire.com/2011/amazoncoms-real-problem-outage-communication
