A widespread outage on CenturyLink’s cloud network that lasted almost two days and hampered emergency phone service in Washington and other states was caused by a faulty network management card, according to a customer notice.
Brian Krebs, a veteran security journalist, posted to his Twitter feed Saturday a copy of a notice sent to CenturyLink’s “core customers.” The notice blamed a card at the company’s data center in Colorado for “propagating invalid frame packets across devices,” causing a series of issues that forced the company to reboot much of its networking equipment. It took CenturyLink more than two days from when it first identified the issues to sound the all-clear on Saturday morning, a period during which 911 services in several states, including Washington, were down or spotty.
it came from CL itself to core customers. Sorry, can't share the source. pic.twitter.com/21T9x458gr
— briankrebs (@briankrebs) December 29, 2018
CenturyLink representatives did not immediately respond to a request to verify the notice.
By the standards of modern cloud service providers, a two-day outage is an eternity. And it’s not clear how a single piece of equipment could cause an outage of such magnitude given the layers of redundancy that cloud providers build into their systems.
An FCC investigation into the outage might turn up some answers if CenturyLink does not publish a more detailed post-mortem of its own, a practice that is becoming a standard part of incident response.