Updated below with new information from Microsoft, which says service has been restored to the majority of customers.
Microsoft’s cloud computing platform, Windows Azure, has been experiencing outages in numerous regions around the world for much of the day, apparently resulting from a glitch related to Leap Day, the extra day in February that occurs once every four years.
Mary Jo Foley of ZDNet has a play-by-play rundown of the incident. Microsoft issued this statement earlier today.
“On February 28th, 2012 at 5:45 PM PST Microsoft became aware of an issue impacting Windows Azure service management in a number of regions. Windows Azure engineering teams developed, validated and deployed a fix that resolved the issue for the majority of our customers. Some customers in 3 sub regions – North Central US, South Central and North Europe – remain affected. Engineering teams are actively working to resolve the issue as soon as possible. We will update the Service Dashboard hourly until this incident is resolved.”
See the Windows Azure service dashboard here for the status of different Windows Azure services in regions around the world.
Unlike a previous AWS outage, there don’t appear to be reports of major consumer websites going down as a result of today’s Azure outage, a difference that could offer some insight into the makeup of Microsoft’s cloud customer base.
Update, 5:10 p.m.: Bill Laing, the Microsoft corporate vice president in charge of Windows Azure engineering, apologizes and offers an update in this blog post, confirming that it appears to have been a leap year issue. Here’s an excerpt …
“Yesterday, February 28th, 2012 at 5:45 PM PST Windows Azure operations became aware of an issue impacting the compute service in a number of regions. The issue was quickly triaged and it was determined to be caused by a software bug. While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year. Once we discovered the issue we immediately took steps to protect customer services that were already up and running, and began creating a fix for the issue. The fix was successfully deployed to most of the Windows Azure sub-regions and we restored Windows Azure service availability to the majority of our customers and services by 2:57AM PST, Feb 29th.
“However, some sub-regions and customers are still experiencing issues and as a result of these issues they may be experiencing a loss of application functionality. We are actively working to address these remaining issues. Customers should refer to the Windows Azure Service Dashboard for latest status. Windows Azure Storage was not impacted by this issue.”
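For readers wondering how a "time calculation that was incorrect for the leap year" can happen, here is a generic sketch of that class of bug. It is not Microsoft's code, and the root cause was still under analysis at the time of writing; the function names are hypothetical and chosen only for illustration. The sketch assumes a program that computes "one year from today" by bumping the year and keeping the month and day, which fails when today is Feb. 29.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    # Hypothetical naive approach: keep month and day, add one to the year.
    # For Feb. 29 this raises ValueError, because the following year
    # has no Feb. 29.
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    # Hypothetical fix: if the same month and day do not exist next year
    # (only possible for Feb. 29), fall back to Feb. 28.
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    print(safe_one_year_later(date(2012, 2, 29)))
    try:
        naive_one_year_later(date(2012, 2, 29))
    except ValueError as err:
        print("naive calculation failed:", err)
```

Falling back to Feb. 28 is one possible policy; rolling forward to March 1 would be another, and which is right depends on what the date is used for.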