A firmware upgrade failure at one of Microsoft’s datacenters caused the facility to overheat and shut down access to Outlook.com and Hotmail for some users earlier this week, Microsoft said in a blog post this morning explaining the cause of the problem.

The outage, which started on Tuesday afternoon and lasted until early Wednesday morning, came at an inopportune moment for Microsoft, as the company tries to attract new users to its Outlook.com service. The outage also affected a smaller number of SkyDrive users.

Of course, Microsoft is hardly the only online service provider to ever suffer an outage.

Here’s the root cause analysis from the Microsoft post …

On the afternoon of the 12th, in one physical region of one of our datacenters, we performed our regular process of updating the firmware on a core part of our physical plant. This is an update that had been done successfully previously, but failed in this specific instance in an unexpected way. This failure resulted in a rapid and substantial temperature spike in the datacenter. This spike was significant enough before it was mitigated that it caused our safeguards to come in to place for a large number of servers in this part of the datacenter.

These safeguards prevented access to mailboxes housed on these servers and also prevented any other pieces of our infrastructure to automatically failover and allow continued access. This area of the datacenter houses parts of the Hotmail.com, Outlook.com, and SkyDrive infrastructure, and so some people trying to access those services were impacted.

Previously: Outlook.com arrives: Microsoft gets aggressive in new bid to dethrone Gmail

Comments

  • http://www.facebook.com/al.langevin.5 Al Langevin

    Perhaps Microsoft should save some of their advertising dollars and build data centers that are actually redundant. It’s kind of a joke in this cloud-based world to see a single data center cause this. Reminds you of their infamous DNS brainfart where they put their public DNS servers all on the same network. I’ll stick with Gmail which is actually cloud-based. Nice try Microsoft.

    • TheOtherGuest

      LOL, I can’t wait to see the reaction of the MSFT PR minions. How does one spin a Single Point of Failure?

    • guest

      Perhaps you should have spent less time writing your criticism and more reading the linked article on the Gmail failure, which is one of several that have occurred. I guess Google didn’t think to build data centers that were redundant either? You gotta love condescending a$$hats who can’t even read, but think they know how to run cloud data center better than people who are acknowledged as word-class at it. There’s a reason why none of them base their service guarantees on 100% uptime. Nice try, tool.

      • TheOtherGuest

        “word-class” (sic) you say? Like in forgetting to push a new SSL certificate so Azure goes poof? You really need to grab a dictionary and look up the meaning of “world class”.

        • guest

          Nobody forgot, tool. Don’t tell me that with all the time you spend trolling MS you didn’t read the incident report? Mistakes happen. And if you think MS isn’t on a very short list of world class data center operators, then you’re obviously even dumber than your comments would already have us believe.

  • GabrielJM

    Similar was told about outage at London Stock Exchange in 2009. It was really curious that it was migrated to Linux system, 2 years after the issue. http://tech.slashdot.org/story/08/09/08/185238/the-london-stock-exchange-goes-down-for-whole-day
