Friday, March 15, 2013

Outlook’s 16-hour outage was due to a firmware update overheating a datacenter


If you use Outlook.com for email, the last few days may have been frustrating. Outlook suffered a 16-hour outage that also took down Hotmail, SkyDrive, and Outlook account access for an unspecified number of users.
Microsoft has been carrying out firmware upgrades across its datacenters to support the transition of users from Hotmail to Outlook accounts. Although the process had been completed successfully several times before, for some reason it failed at one of the datacenters earlier this week.
The knock-on effect was that the datacenter started heating up quickly, which triggered safeguards designed to protect the hardware. Those safeguards took many servers offline, leaving Microsoft to restore everything once the heat had dissipated.
Clearly that restoration took a long time and required both human intervention and infrastructure software. Hence the 16 hours of downtime.
Full access had been restored to all accounts by 5:30am yesterday, but Microsoft is still investigating exactly why this happened when the upgrade had gone so smoothly many times before. It’s bad enough to lose access for an hour, but 16 hours is getting ridiculous. I’m also left wondering why Microsoft couldn’t route the traffic through another datacenter in the interim.
If nothing else, this serves as a reminder that relying on cloud services also means enduring outages when the unexpected happens. In this case, the outage was quite severe.
