Amazon Web Services’ secret weapon: Its custom-made hardware and network

James Hamilton, an AWS VP and Distinguished Engineer, revs up for his talk at re:Invent 2016 in Las Vegas. (GeekWire Photo / Dan Richman)

It’s not unusual for internet and software giants to design and even make their own hardware, to increase efficiency and build a competitive advantage.

Google custom-designs its own servers, filling them with millions of chips from Intel, and announced in May that it had designed its own application-specific integrated circuit (ASIC) for use on neural networks. Facebook uses its own switches in its data centers.

But market-leading public-cloud company Amazon Web Services may have gone farthest down this path — designing not only its own routers, chips, storage servers and compute servers but also its own high-speed network.

“We’ve got, in the same company . . . digital designers working on (chipsets), hardware designers working on NICs (network interface cards), and software developers,” said James Hamilton, an AWS VP and Distinguished Engineer, during a keynote at the AWS re:Invent conference in November. “When you own the horizontal and vertical, we get to move at the pace we’re used to, we get to make changes at the pace we’re used to, we get to respond to customer requirements at the pace we’re used to. We think this is a really big deal.”

During a 1.5-hour geekfest, replete with close-up photos of servers, racks and cables, Hamilton posited that the networking world is going the way of the mainframe, getting “chopped up into vertical stacks” with different companies innovating and competing on those stacks. That’s helping make networking gear a commodity, he said — one that AWS can save money on by building its own.

(Via Amazon video)

“We run our own custom-made routers, made to our specifications, and we have our own protocol-development team,” Hamilton said. “It was cost that caused us to head down our own path, and though there’s a big cost (improvement) . . . the biggest gain is in reliability.” This custom-made gear “has one requirement, from us, and we show judgment and keep it simple. As fun as it would be to have a lot of tricky features, we just don’t do it, because we want it to be reliable.”

If AWS were using standard commercial routers and a problem arose, “the most committed, most serious company would take six months” to resolve the issues, he said. “It’s a terrible place to be. So we love where we are right now.”

AWS has standardized on 25-gigabit Ethernet (25 GbE) as a fiber networking transfer speed, even though “that looks like a crazy decision,” Hamilton said. “I was heavily involved in this decision, so I’ll defend it.”

The industry standards, he noted, are 10 GbE and 40 GbE, with 10 GbE representing a single optical wave and 40 GbE representing four waves, but at almost four times the optics cost. “Well, 25 gigs is almost the same price as 10 gigs, which means we can run 50 gigs (two 25-gig lanes) at much less cost (than 40 gigs). From an optics standpoint, it’s absolutely the right answer. . . . I believe this is where the industry will end up.”
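To make Hamilton's optics argument concrete, here is a minimal sketch in normalized cost units. It assumes, as a simplification of his remarks, that one 25G optical lane costs the same as one 10G lane (he said "almost") and that a 40G link costs exactly four 10G lanes (he said "almost four times"). The cost units and the names `link_cost` and `cost_per_gbps` are illustrative, not real pricing data.

```python
# Normalized optics cost: one 10G optical lane = 1.0 unit.
# Assumption (simplified from the talk): a 25G lane costs about the same
# as a 10G lane, and a 40G link is built from four 10G lanes.
LANE_COST = {10: 1.0, 25: 1.0}


def link_cost(lane_gbps: int, lanes: int) -> float:
    """Optics cost of a link built from `lanes` lanes of `lane_gbps` each."""
    return LANE_COST[lane_gbps] * lanes


def cost_per_gbps(lane_gbps: int, lanes: int) -> float:
    """Normalized optics cost per Gbps of link capacity."""
    return link_cost(lane_gbps, lanes) / (lane_gbps * lanes)


links = {
    "10 GbE (1 x 10G)": (10, 1),
    "40 GbE (4 x 10G)": (10, 4),
    "25 GbE (1 x 25G)": (25, 1),
    "50 GbE (2 x 25G)": (25, 2),
}

for name, (lane_gbps, lanes) in links.items():
    print(f"{name}: cost {link_cost(lane_gbps, lanes):.1f}, "
          f"per Gbps {cost_per_gbps(lane_gbps, lanes):.3f}")
```

Under these assumptions a 50G link costs about half as much as a 40G link (2.0 vs. 4.0 units) while carrying more traffic, which is the heart of the 25 GbE decision.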

Hamilton displays a custom-made ASIC. (Via Amazon video)

AWS routers run a customized Broadcom Tomahawk ASIC with 7 billion transistors and 128 ports of 25 GbE, for a total throughput of 3.2 terabits per second (Tbps). “These are absolute monsters,” Hamilton said, holding one aloft. Similar chips with 6 Tbps and 13 Tbps of capacity are coming, at around the same price, he said.
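The 3.2-Tbps figure is simply port count times per-port speed. The short sketch below checks it; the 32 x 100G grouping is an illustrative assumption showing that combining the same 25G lanes four at a time leaves aggregate capacity unchanged. The function name is hypothetical.

```python
# Aggregate switching capacity of a switch ASIC: ports x per-port speed.
# 128 x 25 GbE is from the talk; 32 x 100G is an illustrative regrouping
# of the same lanes (four 25G lanes per 100G port).


def aggregate_tbps(ports: int, gbps_per_port: int) -> float:
    """Total capacity in terabits per second (1 Tbps = 1,000 Gbps)."""
    return ports * gbps_per_port / 1000


print(aggregate_tbps(128, 25))
print(aggregate_tbps(32, 100))
```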

Another key piece of AWS’s networking strategy is software-defined networking: the ability of network administrators to change and manage network behavior through programmable interfaces. Part of that effort is moving processing from software into hardware wherever possible.

“Sometime about 2011 we made an obvious but important observation: whenever you have a workload that’s very repetitive . . . you’re better off taking some of that down into hardware,” he said.

“People say, ‘Hey, the reason (AWS) had to go to custom networking gear is you could never have the bandwidth we have in our data centers if you didn’t,’” Hamilton related. “That’s not true. I could give you any bandwidth you want . . . with anyone’s equipment. It’s absolutely not hard to do. . . . You know what is hard to do? Latency. That is physics. . . . I tell software people, the things you’re measuring are milliseconds (one thousandth of a second). In hardware, they measure nanoseconds (one billionth of a second) and microseconds (one millionth of a second). So this is the right place for us to go.”
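One way to see why Hamilton calls latency "physics" is to compute how long light takes to travel through optical fiber. This sketch assumes a typical refractive index of about 1.47 for silica fiber and counts only propagation delay; real links add switching, queuing and serialization delay on top. The constant and function names are illustrative.

```python
# Propagation delay through optical fiber. Assumes a refractive index of
# about 1.47 (typical single-mode fiber); ignores switching, queuing and
# serialization delay, which real networks add on top.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # assumption: typical silica fiber


def fiber_delay_us(distance_km: float) -> float:
    """One-way propagation delay in microseconds over `distance_km` of fiber."""
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S / FIBER_REFRACTIVE_INDEX
    return distance_km / speed_km_per_s * 1e6


for km in (0.1, 1, 100, 4000):
    print(f"{km:>7} km: {fiber_delay_us(km):,.1f} us one way")
```

At roughly 5 microseconds per kilometer, a 100-meter run inside a data center costs about half a microsecond, which is why hardware designers count in nanoseconds and microseconds: no amount of extra bandwidth removes that floor.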

(Via Amazon video)

AWS also produces its own customized chipsets, emblazoned with the name “Annapurnalabs,” for use in “every server we deploy,” Hamilton said. Amazon bought Israeli chipmaker Annapurna Labs in January 2015 for a reported $350 million. This was the first time AWS had made clear that it uses chips from the company.

“Do you believe we’re in the semiconductor business?” Hamilton exclaimed. “Not only are we building hardware, but we built this,” he said, showing off the chipset. “This is a very big deal. If I’m right on those trends I told you about in hardware implementation (and) latency — and I’m fairly confident on that one — that means we get to implement it in silicon.”

AWS uses power-switching gear with custom firmware in its data centers, ensuring that if faults occur outside a facility, it continues to operate, and if they occur inside, the load is not dropped. That, Hamilton said, avoids problems like the one that cost an airline $100 million when switching gear locked out its reserve generators, and the 34-minute gap in 2013 Super Bowl coverage that occurred for the same reason.
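The difference Hamilton described can be pictured as two decision tables for the switchgear. The sketch below is a hypothetical simplification: the "conventional" table is modeled loosely on the airline incident (an internal fault locks out the generators), and the second table only encodes the goals stated in the talk. Neither is AWS's actual firmware, and all names are invented for illustration. Real switchgear also weighs protecting generators from damage, which this sketch does not model.

```python
from enum import Enum
from typing import NamedTuple


class Fault(Enum):
    NONE = "no fault"
    OUTSIDE = "fault outside the facility"
    INSIDE = "fault inside the facility"


class Response(NamedTuple):
    action: str
    load_dropped: bool


# Hypothetical stock behavior: an internal fault locks out the generators.
CONVENTIONAL = {
    Fault.NONE: Response("stay on utility power", False),
    Fault.OUTSIDE: Response("start generators, transfer load", False),
    Fault.INSIDE: Response("lock out generators", True),
}

# Hypothetical table encoding the goals from the talk: keep running on
# outside faults and never drop the load on inside faults.
AWS_GOALS = {
    Fault.NONE: Response("stay on utility power", False),
    Fault.OUTSIDE: Response("start generators, transfer load", False),
    Fault.INSIDE: Response("keep generators available, hold load", False),
}

for fault in Fault:
    print(f"{fault.value}: conventional -> {CONVENTIONAL[fault].action}; "
          f"AWS goals -> {AWS_GOALS[fault].action}")
```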

AWS’s newest custom storage servers hold 11 petabytes of data (a petabyte is one million gigabytes) on 1,100 disks contained in a single standard-size 42U rack, up from 8.8 PB on 880 disks in the model Hamilton showed. He said company policy prohibited showing the latest model.

A custom AWS compute server. (Via AWS)

The company’s custom compute servers, each occupying one slot on a rack, are sparsely populated, Hamilton conceded. “Turns out this is implemented for thermal and power efficiency. . . . What the OEMs are selling to customers are probably three, four, five times more dense than this, and they’re less efficient. But they make up for it in cost.”

The AWS compute servers’ power supplies and voltage regulators operate at greater than 90 percent efficiency. And because AWS spends hundreds of millions of dollars on electricity, “if this power supply is 1 percent better, that gets to be a pretty interesting number.”
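To see why a single point of power-supply efficiency matters at AWS's scale, consider this back-of-the-envelope sketch. The $300 million annual bill is a hypothetical figure standing in for "hundreds of millions of dollars," and it assumes, unrealistically, that the entire bill flows through supplies improving from 90 to 91 percent efficiency. Grid draw for a fixed IT load scales with 1 / efficiency.

```python
# Savings from a one-point efficiency gain in power supplies.
# Assumptions (hypothetical): a $300M annual electricity bill, all of it
# flowing through supplies whose efficiency rises from 90% to 91%.


def input_power(load_watts: float, efficiency: float) -> float:
    """Power drawn from the grid to deliver `load_watts` at `efficiency`."""
    return load_watts / efficiency


def annual_savings(annual_bill: float, old_eff: float, new_eff: float) -> float:
    """Dollars saved when the same IT load is served at higher efficiency."""
    load = 1.0  # normalized IT load; only the ratio of grid draws matters
    ratio = input_power(load, new_eff) / input_power(load, old_eff)
    return annual_bill * (1 - ratio)


bill = 300e6  # hypothetical annual electricity spend, USD
print(f"${annual_savings(bill, 0.90, 0.91):,.0f} saved per year")
```

Under these assumptions the gain is roughly $3.3 million a year, the kind of "pretty interesting number" Hamilton had in mind.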

AWS’s data centers, in 16 regions worldwide, are linked by a private 100-gigabit network controlled exclusively by AWS, with no interconnection sites administered by other companies. “It’s many, many parallel 100-Gb links. There’s no way a single link will ever have any impact on anyone in this room, because we have the capacity to survive a link failure. We engineer it that way. We’d be crazy not to.”
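"We have the capacity to survive a link failure" describes N+1 provisioning: enough parallel links that the traffic still fits after one is lost. Here is a minimal sketch, assuming 100-Gb links as in the talk and a hypothetical 1.75 Tbps of demand between two sites. It ignores shared-risk failures (for example, several links in one trench), which real capacity planning must also consider.

```python
import math

LINK_GBPS = 100  # every link is 100 Gb, per the talk


def links_needed(demand_gbps: float, failures_tolerated: int = 1) -> int:
    """Parallel links so demand still fits after `failures_tolerated` failures."""
    return math.ceil(demand_gbps / LINK_GBPS) + failures_tolerated


def survives(demand_gbps: float, links: int, failed: int) -> bool:
    """True if the links left after `failed` failures can carry the demand."""
    return (links - failed) * LINK_GBPS >= demand_gbps


demand = 1_750  # hypothetical demand between two sites, in Gbps
n = links_needed(demand)
print(n, survives(demand, n, failed=1), survives(demand, n, failed=2))
```

With 19 links, losing one leaves 1,800 Gbps, enough for the demand; losing two does not, which is where the `failures_tolerated` parameter would come in.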

(Via AWS)

“Every link is 100 gigs, absolutely everywhere,” Hamilton said. “This is a pretty important asset. When we started this, I was a little concerned, because it’s really, really, really expensive. The networking team is 100 percent committed that this is the right thing to do.”

AWS uses short- and long-term leases, dark fiber lit under contract, and “in several cases we’re laying our own cable,” he said. “We’ll do whatever’s most cost-effective to get the resources we need.” Amazon is an investor in the Hawaiki Submarine Cable, a 14,000-kilometer cable linking Australia, Hawaii, New Zealand and Oregon.

(Via AWS)

Hamilton provided details, too, on Amazon’s data centers, which are usually surrounded by secrecy. It’s already known that each “availability zone” within a region contains at least one data center, but Hamilton added that most new data centers consume between 25 and 32 megawatts (MW) of power. That’s a fairly modest size: the most power-hungry data center in the world, China Telecom’s Inner Mongolia Information Park, consumes more than 150 MW on an ongoing basis.

“You get really big gains in cost advantage as you get bigger, and we could easily build 250-MW facilities, but if it goes down, it’s not a good place to be,” he said. “So our take right now is that (25-32 MW) is about the right size facility. It costs us a tiny bit more to go down this path, but we think it’s the right thing for customers.”
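Hamilton's sizing argument is about blast radius: for the same total capacity, smaller facilities mean a single outage takes out a smaller share. The sketch below assumes equally sized sites and a hypothetical 1,000 MW of total demand; the 250 MW and 25-32 MW sizes come from the talk. It does not model the economies of scale he said larger sites would bring.

```python
import math

TOTAL_MW = 1_000  # hypothetical total capacity, for illustration only


def facilities_needed(total_mw: float, facility_mw: float) -> int:
    """Number of equally sized facilities needed to cover `total_mw`."""
    return math.ceil(total_mw / facility_mw)


for size in (250, 32, 25):
    n = facilities_needed(TOTAL_MW, size)
    # With n equal sites, one outage removes 1/n of the built capacity.
    print(f"{size:>3} MW sites: {n:>2} facilities, "
          f"{1 / n:.1%} of capacity lost per outage")
```

Losing one 250 MW site would remove a quarter of this hypothetical fleet, versus about 3 percent for a 32 MW site, which is the trade Hamilton said costs AWS "a tiny bit more."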

Watch the full video of Hamilton’s talk below.