2018 greeted CIOs around the planet in quite rude fashion, following last week’s revelation of two security flaws that had gone undiscovered for 20 years in the chips that run their clouds and data centers. The patches required to fix the flaws haven’t affected everyone equally, but customers who now find themselves with workloads that run 10 percent to 20 percent slower than they did a month ago can’t be pleased.
Someone is going to have to pay for this.
The tech world was stunned by the revelation last week that researchers at Google and several universities had discovered two design flaws in the way almost all modern processors handle the execution of code, and we’re only just beginning to understand the impact of this debacle.
The first lawsuits were filed around the same time the first patches were applied, surprising no one, but it’s going to be a very interesting year for the relationship between Intel, which enjoys around 95 percent of the server chip market, and the “Super Seven” cloud companies that buy its chips in bulk. It will also be an interesting test of the bond between cloud computing customers and vendors, depending on how those vendors respond.
A representative for Intel did not respond to a request for comment. Microsoft declined to comment on specific plans for cloud customers affected by the Meltdown and Spectre patches.
In a statement, Amazon Web Services said its engineers have been working with affected customers to reduce the performance impact of the patches as much as possible, but it is not clear whether the company is passing the cost of that additional work on to the customer or anyone else.
“We have not observed meaningful performance impact for the overwhelming majority of EC2 workloads. There have been isolated cases where a specific workload needed attention after patching. Our engineers have helped customers optimize their applications and in almost every case, prevent significant changes to their costs.”
For its part, Google released a blog post Thursday morning explaining how its fix for the chip problems it discovered, which it has open-sourced, has allowed its customers to go about their business without even really noticing a performance impact. “During the entire update process (at the end of last year), nobody noticed: we received no customer support tickets related to the updates,” the company wrote.
For cloud and data center customers that were affected, most of the industry burden will likely fall on Intel, given its role at the center of this issue. Chips from AMD and processor cores from ARM also contain the design flaws that create the vulnerabilities, but in the cloud, Intel is really the only game in town.
Company executives appear to be keenly aware of that fact. In late November, six weeks before any of us associated the words “Meltdown” and “Spectre” with Intel but long after Google had informed the company of the issues, the company’s chief financial officer warned attendees at an investor conference that Intel would be facing “increasing competitive dynamics” among data center chip buyers. And, of course, CEO Brian Krzanich is taking quite a bit of heat for his decision to alter a pre-arranged trading plan after the vulnerabilities were discovered, selling off all his shares beyond the ones his contract requires him to hold.
And Intel just created a mile-wide opening for AMD and Qualcomm, both poised with new server chips and a market that knows the only current long-term solution to Spectre is replacing its hardware. AMD has said its processor designs face a “near-zero” risk from the Spectre vulnerability, although it’s not clear what effect this has had on Qualcomm’s new server processor, since it uses a custom-designed ARM core rather than an off-the-shelf one. In any event, it’s not like there are tons of irritated Qualcomm server customers out there.
Intel has insisted that the patches prevent the most serious of the security issues from being exploited, and it’s probably a safe bet that whatever design was in the works for the next generation of its processors is being hastily redrawn to get around these issues. But while the worst performance-related fears seem to be overblown, a large enough set of workloads is affected that Red Hat and Microsoft have provided details Intel couldn’t or wouldn’t share on the applications that will be affected by the patches.
After downplaying the effects of the patches in its early statements on the matter, Intel finally acknowledged Thursday what Red Hat, Microsoft, and concerned engineers around the world have been saying: there are real-world performance impacts from these design flaws. “We commit to provide frequent progress reports of patch progress, performance data and other information,” Krzanich said in an open letter.
While the initial impact of these patches certainly could have been worse, a lot of cloud computing and security practitioners are starting to really worry about follow-on effects from the patches. After all, the operating systems were designed around the assumption that Intel had secured that part of the processor. The applications were designed around the existing code in that operating system, and new patches at the heart of an operating system could cause problems for some of those apps.
We may have only begun to understand the financial impact these design flaws are going to have on the software industry. It’s not hard to see the dominoes start to fall as more and more companies test the patches on their systems and find they have to make their own changes to ensure performance, security, or both.
If nothing else, responding to Meltdown and Spectre has created an enormous amount of work for cloud vendors and operating system developers, likely delaying product roadmaps and requiring their best engineers to spend time working on a fix for a problem they didn’t create. It’s hard to imagine that the shaky alliance formed by the world’s most powerful tech companies out of the need to deal with such a fundamental problem will be able to withstand the long-term effects of Spectre.