Updated below with new information from Microsoft, saying service has been restored to the majority of customers.

Microsoft’s cloud computing platform, Windows Azure, has been experiencing outages in numerous regions around the world for much of the day, apparently resulting from a glitch related to Leap Day, the extra day in February that occurs once every four years.

Mary Jo Foley of ZDNet has a play-by-play rundown of the incident. Microsoft issued this statement earlier today.

“On February 28th, 2012 at 5:45 PM PST Microsoft became aware of an issue impacting Windows Azure service management in a number of regions. Windows Azure engineering teams developed, validated and deployed a fix that resolved the issue for the majority of our customers. Some customers in 3 sub regions – North Central US, South Central and North Europe – remain affected. Engineering teams are actively working to resolve the issue as soon as possible. We will update the Service Dashboard, hourly until this incident is resolved.”

See the Windows Azure service dashboard here for the status of different Windows Azure services in regions around the world.

As noted by Data Center Knowledge, it’s the latest in a series of cloud computing outages, with the Amazon Web Services meltdown of last year also standing out as an example.

Unlike that AWS outage, there don’t appear to be reports of major consumer websites going down as a result of today’s Azure outage, which could provide more insights into the nature of Microsoft’s cloud customer base.

Update, 5:10 p.m.: Bill Laing, the Microsoft corporate vice president in charge of Windows Azure engineering, apologizes and offers an update in this blog post, confirming that it appears to have been a leap year issue. Here’s an excerpt …

Yesterday, February 28th, 2012 at 5:45 PM PST Windows Azure operations became aware of an issue impacting the compute service in a number of regions.  The issue was quickly triaged and it was determined to be caused by a software bug.  While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year.  Once we discovered the issue we immediately took steps to protect customer services that were already up and running, and began creating a fix for the issue.  The fix was successfully deployed to most of the Windows Azure sub-regions and we restored Windows Azure service availability to the majority of our customers and services by 2:57AM PST, Feb 29th.

However, some sub-regions and customers are still experiencing issues and as a result of these issues they may be experiencing a loss of application functionality.  We are actively working to address these remaining issues.  Customers should refer to the Windows Azure Service Dashboard for latest status.  Windows Azure Storage was not impacted by this issue.
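
For readers wondering how a time calculation can go wrong on leap day, here is a generic sketch of one common pitfall. It is not Microsoft's code, and Laing's post notes that root cause analysis is still in progress. The function names `naive_add_one_year` and `safe_add_one_year` are hypothetical, chosen only for illustration, and the "fall back to Feb. 28" policy is an assumption, one of several reasonable choices.

```python
from datetime import date


def naive_add_one_year(d: date) -> date:
    """Hypothetical example: bump the year and keep month/day unchanged.

    This raises ValueError when d is Feb. 29, because the following
    year has no such day.
    """
    return date(d.year + 1, d.month, d.day)


def safe_add_one_year(d: date) -> date:
    """Hypothetical example: same calculation, with a leap-day fallback.

    Assumption: when the target year has no Feb. 29, use Feb. 28.
    """
    try:
        return date(d.year + 1, d.month, d.day)
    except ValueError:
        # Only Feb. 29 can fail here, since every other month/day
        # combination exists in every year.
        return date(d.year + 1, 2, 28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    print(safe_add_one_year(leap_day))
    try:
        naive_add_one_year(leap_day)
    except ValueError as err:
        print("naive version failed:", err)
```

Code like the naive version works on 1,460 days out of every 1,461, which is one reason this class of bug can survive testing.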

Comments

  • Guest

    Thank you to Microsoft for acknowledging the issue and for working swiftly to resolve it.

    • Sandokan

      Lol the Microsoft fanbois are at it again. Or… Steve is that you? Did you sneak away from the orderlies again? You know you should not do that. You know how you get when you do that. Remember that little incident with the flying chair?

      Anyway, this is the umpteenth time this Azure thing has gone down. Who takes their cloud stuff still serious? How can you run a business on a platform that just goes away for hours on end? Time to look for something more stable, robust and available. Looking at what’s behind pretty much all the global critical infrastructure the answer is clearly: Linux. And it’s Free.

      Now Steve, put down that chair. Really Steve waving a chair like that is dangerous. Put it down *now* before you hurt……

      • Guest

        Congrats on hitting most of the tired troll talking points. You even managed to shill for Linux. Good job! The only ones you missed were calling Ballmer “Monkey Boy” and unnecessarily CAPITALIZING at least something in your silly rant. So minor demerit for those two omissions, but otherwise a solid 9/10.

        • Guest

          I think we’re looking at a solid 10/10 here. In addition to all the usual anti-Microsoft points, Sandy demonstrates a clear attraction toward Steve, whom she sees as a sort of father figure. Note the playful criticism and the loving admiration that a child often gives to her father. We’ve found that Electra complexes such as these are fairly common among the persons who are fixated on critiquing Steve Ballmer.

          I find the most alluring feature of GeekWire to be this window into the repressed sexual feelings of the technology community. Keep an eye out for my ethnographic panel on the topic at this year’s Gender Odyssey conference.

  • http://www.christopherbudd.com Christopher Budd

    It’s good that Laing jumped on this and put his name to an update on the issue. That shows they took it seriously and gave it the attention it deserves.

    As someone with a tech background, though, I do wish Azure and others would do what Google has been doing and follow up with a detailed technical postmortem. You can see an example here: http://groups.google.com/group/google-appengine-downtime-notify/browse_thread/thread/aa0ae888a6b1d57d?pli=1.

    It’s good they’re giving specifics here (that it was leap day related) but it’s still high-level enough that I’m wondering exactly what the leap day problem was. 
    And in a way, it’s helpful for the industry for everyone to be more open about their operations. Let’s face it, we’re in a new world with cloud when it comes to infrastructure management, and so everyone is learning as they go. Sharing best practices and learnings can help the industry to mature, which is better for everyone. Uptime, like security, really shouldn’t be a competitive differentiator: it should just be there so everyone can focus on features and capabilities.

  • http://twitter.com/freelock John Locke

    But really, who could have predicted that a February would have 29 days?

    • Guest

      You can always count on MS to forget about sweating the details…
