QFX10000 - a no compromise switching system with innovation in silicon and memory technology
Mar 11, 2015
Typically, networking silicon and systems in general can be split across two categories.
High I/O scale and shallow memory: To break the 200-300Gbps throughput barrier and deliver forwarding throughput upwards of 1Tbps, switch silicon is typically designed as a "Switch on Chip" (SoC). All the forwarding logic, as well as the buffers that store packets entering and leaving the system, is contained on the silicon itself rather than in memory external to it. The reason is the memory-to-ASIC bandwidth constraint: the moment packet memory is external to the forwarding silicon, the silicon's throughput is gated by the interface between the silicon and that memory. As a result, systems with very high I/O capacity often compromise by relying only on the shallow buffers and limited lookup capacity available natively on the silicon, avoiding the slow memory-to-silicon interface altogether. This places constraints on network designs. For example, systems built on an SoC cannot be used in data center edge applications, because the data center edge requires full FIB capacity and an SoC typically does not have enough memory to hold a full routing table. Another example is applications that are bursty, or that cannot respond to network congestion by flow controlling themselves, and therefore need the network to provide fairly deep buffering. Typically, a 1Tbps SoC has about 10MB-24MB of buffer shared across all ports and a small amount of TCAM for table lookups.
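To put the buffer figures above in perspective, the following minimal sketch converts a shared buffer size into the length of time it can absorb traffic arriving at full line rate. It assumes decimal megabytes and a worst case in which nothing drains from the buffer; the function name is illustrative and not part of any Juniper tool.

```python
def buffer_time_us(buffer_bytes: float, arrival_rate_bps: float) -> float:
    """Microseconds until a buffer fills when traffic arrives at
    arrival_rate_bps and nothing drains (worst-case assumption)."""
    return buffer_bytes * 8 / arrival_rate_bps * 1e6


MB = 1_000_000   # assumption: decimal megabytes
TBPS = 1e12      # 1 Tbps in bits per second

# Buffer sizes quoted in the text for a 1Tbps SoC.
for size_mb in (10, 24):
    time_us = buffer_time_us(size_mb * MB, TBPS)
    print(f"{size_mb} MB at 1 Tbps: about {time_us:.0f} microseconds")
```

Under these assumptions the quoted range works out to roughly 80 to 192 microseconds of buffering across the whole chip, which illustrates why such buffers are described as shallow.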
Low I/O scale and large external memory: There is also a need for network elements with larger FIB tables for applications such as internet peering, and with enough buffering capacity to handle bursty applications and to absorb the speed mismatch between data centers (high bandwidth) and the WAN (low bandwidth). This calls for network elements with a fairly large amount of memory, on the order of hundreds of MB or more. Such systems cannot be designed as an SoC, since an SoC cannot provide that much memory natively on the switch silicon. If an application requires 1GB of memory, that memory has to be external to the silicon, and as a result the I/O bandwidth drops to the lowest common denominator, which in most cases is the memory interface.
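The "lowest common denominator" effect can be sketched as a simple minimum. The model below is an assumption for illustration only: it treats every packet as written to external memory once and read back once, so the memory interface needs twice the forwarding rate. The numbers in the example are hypothetical and do not describe any specific product.

```python
def effective_throughput_gbps(silicon_io_gbps: float,
                              memory_bw_gbps: float,
                              memory_passes_per_packet: int = 2) -> float:
    """Forwarding throughput when packets are buffered in external memory.

    Assumption: each packet crosses the memory interface
    memory_passes_per_packet times (one write plus one read by default).
    """
    memory_limited_gbps = memory_bw_gbps / memory_passes_per_packet
    # The slower of the two paths gates the whole device.
    return min(silicon_io_gbps, memory_limited_gbps)


# Hypothetical example: silicon capable of 1000 Gbps of I/O, paired with
# an external memory interface that delivers 400 Gbps.
print(effective_throughput_gbps(1000, 400))
```

In this hypothetical case the memory interface, not the silicon, sets the forwarding rate, which is the trade-off the second category of systems accepts in exchange for large tables and deep buffers.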
As data centers transform from smaller distributed sites into large scale public or private cloud data centers, new requirements are placed on what the networks should look like. Large scale public cloud networks may host on the order of hundreds of thousands of hosts on a shared multi-tenant network. In certain designs, the state of those end hosts is visible to the network elements in the form of host routes; in others, it can be hidden by creative network designs such as overlay networks that extend to the servers.
At the same time, the applications hosted on these networks vary in how well they respond to network congestion events, so the network must be able to handle bursty or unpredictable traffic patterns. This transformation places new requirements on data center switching systems. It requires a completely new type of switching silicon building block: one that can scale on I/O to keep up with the bandwidth demands of the network, provide enough buffering and lookup capacity to handle any type of application in the cloud, and offer enough flexibility for any type of network design, whether or not it exposes tenant state to the network. Enter the Juniper Q5 silicon, designed with a groundbreaking memory technology called Hybrid Memory Cube.
The Q5 ASIC
The Juniper Q5 ASIC offers a unique blend of high I/O capacity and high logical scale, achieved by connecting it to a new class of memory called Hybrid Memory Cube (HMC) that provides unprecedented memory-to-ASIC bandwidth. For the first time in the industry, the Juniper Q5 ASIC, in conjunction with external memory, has broken the 1Tbps forwarding capacity barrier on a single piece of silicon. The Q5 ASIC and HMC are the key building blocks of the recently announced QFX10000 switches, which for the first time can scale to 96Tbps within a single system. Without this innovation, building a 96Tbps system that lets customers build large scale public and private cloud data centers would have been impossible with existing silicon and memory technologies.
Hybrid Memory Cube
Hybrid Memory Cube is a new class of 3D memory that breaks the 1Tbps bandwidth barrier and is used in conjunction with the Juniper Q5 silicon in the QFX10000 series switches. When used instead of DDR3/4 memories, HMC provides up to 17% better power efficiency and an 84% improvement in system board design, and it can scale to 1Tbps ASIC performance. It would take 45 or more DDR3/4 memories to match the bandwidth of a single HMC in a system of this scale. The following illustration compares two 1Tbps ASICs, one built with DDR3/4 memories and the other with HMC.
1Tbps forwarding ASIC with external memory using DDR3/4 technology (left) and with HMC memory (right)
The following table provides a comparison between a chip designed with DDR3/4 and HMC memories for a 1Tbps forwarding chip.
DDR3/4 compared with Hybrid Memory Cube (HMC):
Number of memory devices: 90 and up (DDR3/4); 2 (HMC)
Total number of pins between ASIC and memory: more than 2400 (DDR3/4)
Memory surface area: 12750 mm² or more (DDR3/4)
Building a 1Tbps switching silicon would require more than 90 DDR3/DDR4 memories, whereas 2 HMC memories provide the same scale. Since it is not feasible to attach 90 memory devices to a single chip, achieving that scale has simply not been possible until now.
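The device count follows directly from the figures quoted above, and the same figures give a rough sense of the per-device cost of the DDR3/4 approach. The short calculation below is a sketch that treats the quoted lower bounds ("more than", "or more") as exact values, so the per-device results are only approximate averages and not vendor specifications.

```python
# Figures taken from the text; each is a lower bound in the original.
DDR_PER_HMC = 45        # DDR3/4 devices needed to match one HMC
HMC_COUNT = 2           # HMC devices for a 1Tbps forwarding chip
DDR_TOTAL_PINS = 2400   # pins between the ASIC and DDR3/4 memory
DDR_AREA_MM2 = 12750    # board surface area taken by DDR3/4 memory

ddr_devices = DDR_PER_HMC * HMC_COUNT
pins_per_ddr = DDR_TOTAL_PINS / ddr_devices
area_per_ddr_mm2 = DDR_AREA_MM2 / ddr_devices

print(f"DDR3/4 devices needed: {ddr_devices}")
print(f"Average pins per DDR3/4 device: {pins_per_ddr:.1f}")
print(f"Average area per DDR3/4 device: {area_per_ddr_mm2:.1f} mm^2")
```

The first result reproduces the "90 and up" figure: 45 DDR3/4 devices per HMC, multiplied by the 2 HMC devices a 1Tbps chip uses.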
When coupled with the Q5 ASIC in the QFX10000, HMC provides the following functions:
Packet buffering and virtual output queue memory
Improved logical system scale by augmenting local tables such as FIB, host, MPLS and MAC on the Q5 ASIC, providing significant improvements and architectural flexibility.
In summary, the Juniper Q5 ASIC and HMC together have broken many technology barriers to building massive scale data centers, and for the first time make it possible to build systems that are massively scalable in multiple dimensions.