To keep up with demand for its services inside its data centers, Facebook developed a new networking architecture based on the same scale-out principles that revolutionized the server industry decades ago, and it is releasing that architecture as part of the Open Compute Project on Tuesday.
Facebook is donating what it calls the Fabric Aggregation Layer to the OCP, an industry group that shares designs for hardware and data center infrastructure much as open-source software is shared across the industry. Facebook played a prominent role in the creation of the OCP, which is holding its annual summit in San Jose on Tuesday.
The new architecture was developed so Facebook could expand some of its current data center complexes, said Omar Baldonado, director of software engineering for infrastructure at Facebook. Like the cloud vendors, Facebook operates data centers in several areas around the world, with each separate location referred to as a “region.” Within each region, Facebook’s complexes generally have a couple of buildings that house its servers.
Over the past year or so, the company identified the need to expand its footprint within some of those locations to accommodate as many as six buildings. But big, powerful switches, which made sense for connecting two buildings, become expensive and unwieldy when connecting a larger number of buildings within a region, he said.
So Facebook took the custom switch it designed a few years ago — the Wedge100 — and built a new networking architecture that relies on lots of smaller switches like the Wedge100 and a more sophisticated cabling strategy to link multiple buildings in a data center region. This also improves the connection between those data centers and the end user, he said.
When the internet was first getting off the ground back in the late 1990s and early 2000s, data center operators quickly realized they could get better performance and scale capacity much more quickly by stringing together networks of relatively cheap servers, instead of buying one huge expensive machine from the likes of IBM or Sun Microsystems. The Fabric Aggregation Layer is more or less the same idea; network administrators can add less-expensive switches to a rack of networking equipment as needed without having to make a big bet on expensive equipment to handle expected future demand.
The architecture has other benefits: “we’re actually being much more power efficient in terms of the actual designs,” Baldonado said. Power is a huge consideration among massive data center operators, and it takes a lot of power to run the big networking switches that Facebook used to use for connecting its buildings.
“It’s requiring more and more power to keep the signal integrity with that amount of data” using big switches, Baldonado said. “We can do it, but at what cost? Any power spent for the network is power not spent for the servers doing compute,” he said.
Facebook has been using the Fabric Aggregation Layer for production workloads within some of its data centers, and eventually all of its infrastructure will shift over to this architecture.