
QFX10000 - a no-compromise switching system with innovation in silicon and memory technology

by Juniper Employee, 03-11-2015 07:45 PM (edited 03-15-2015 06:04 PM)

Networking silicon, and the systems built around it, can typically be split into two categories.

High I/O scale and shallow memory: To break the 200-300 Gbps throughput barrier on switch silicon and build silicon that can forward upwards of 1 Tbps, the silicon is typically designed as a "Switch on Chip" (SoC). This means that all of the forwarding logic, as well as the buffers that hold packets entering and leaving the system, are self-contained on the silicon rather than in external memory. The reason is the memory-to-ASIC bandwidth constraint: the moment the packet memory sits outside the forwarding silicon, the silicon's throughput is gated by the interface between the silicon and that memory. As a result, building systems with very high I/O capacity usually means compromising on buffer depth and lookup capacity, accepting only what is natively available on the silicon in order to avoid the slow memory-to-silicon interface. This places certain constraints on network designs. For example, an SoC-based system cannot be used in data center edge applications, because the edge requires full FIB capacity and an SoC typically does not have enough memory to hold a full routing table. Another example is applications that are bursty in nature, or that cannot respond to congestion events by flow controlling themselves, and therefore need fairly deep buffering from the network. A typical 1 Tbps SoC has only about 10-24 MB of buffer shared across all ports and a small amount of TCAM for table lookups.
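
To put the shallow-buffer compromise in perspective, here is a back-of-envelope sketch. The 24 MB buffer figure comes from the paragraph above; the 100 Gbps port rate and the 2:1 incast scenario are assumptions made only for this illustration, not a characterization of any specific SoC.

```python
# Back-of-envelope: how long can a shallow shared on-chip buffer absorb a
# burst? The 24 MB figure comes from the text above; the 100 Gbps port rate
# and the 2:1 incast scenario are assumptions made for this example only.

BUFFER_BYTES = 24 * 1024 * 1024      # 24 MB packet buffer shared by all ports
PORT_RATE_BPS = 100e9                # one congested 100 Gbps egress port

def burst_absorb_time(buffer_bytes, ingress_bps, egress_bps):
    """Seconds until the buffer overflows when ingress exceeds egress."""
    excess_bps = ingress_bps - egress_bps
    if excess_bps <= 0:
        return float("inf")          # no oversubscription, buffer never fills
    return (buffer_bytes * 8) / excess_bps

# Two 100 Gbps senders bursting into one 100 Gbps receiver (2:1 incast)
t = burst_absorb_time(BUFFER_BYTES,
                      ingress_bps=2 * PORT_RATE_BPS,
                      egress_bps=PORT_RATE_BPS)
print(f"A 24 MB shared buffer rides out a 2:1 incast for ~{t * 1e3:.1f} ms")
# Roughly 2 ms -- and less in practice, since every other port competes for
# the same shared buffer.
```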

 

Low I/O scale and large external memory: There is also a need for network elements with larger FIB tables, for applications such as Internet peering, and with enough buffering capacity to handle bursty applications and to absorb the speed mismatch between the data center (high bandwidth) and the WAN (low bandwidth). This requires network elements with memory on the order of hundreds of megabytes or more. Such systems cannot be built with an SoC, because an SoC cannot provide that much memory natively on the switch silicon. If an application requires 1 GB of memory, that memory has to sit outside the silicon, and as a result the I/O bandwidth drops to the lowest common denominator, which in most cases is the memory interface.
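
The sketch below makes that bottleneck concrete. It is a simplified model under two stated assumptions: every buffered packet is written to external memory once and read back once, so the memory interface must carry roughly twice the forwarding rate; and the per-interface bandwidth figures are generic, illustrative numbers rather than the specification of any particular memory part.

```python
# Simplified model of the external-memory bottleneck for packet buffering.
# Assumption: each packet is written once to external memory and read back
# once, so usable forwarding rate is about half the total memory bandwidth.
# The per-interface bandwidth numbers below are illustrative, not specs.

def max_forwarding_bps(num_interfaces, per_interface_bps):
    """Forwarding rate sustainable when all packets are buffered externally."""
    total_memory_bw = num_interfaces * per_interface_bps
    return total_memory_bw / 2       # one write + one read per packet

# Hypothetical ASIC with a few conventional DRAM channels
ddr = max_forwarding_bps(num_interfaces=4, per_interface_bps=100e9)
print(f"Conventional DRAM channels cap forwarding at ~{ddr / 1e9:.0f} Gbps")

# The same model with two very high-bandwidth memory devices (HMC-class)
hmc = max_forwarding_bps(num_interfaces=2, per_interface_bps=1.2e12)
print(f"HMC-class devices cap forwarding at ~{hmc / 1e12:.1f} Tbps")
```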

 

As data centers transform from smaller, distributed facilities into large-scale public or private cloud data centers, new requirements are placed on what their networks should look like. Large-scale public cloud networks may host on the order of hundreds of thousands of hosts on a shared, multi-tenant network. In certain designs, the state of those end hosts is visible to the network elements in the form of host routes; in other cases it can be hidden by creative network designs, such as overlay networks that are extended to the servers.

 

At the same time, the applications hosted on these networks vary in how well they respond to network congestion events, so the network must be able to absorb bursty and unpredictable traffic patterns. This transformation places new requirements on data center switching systems. It calls for a completely new type of switching silicon building block: one that scales on I/O to keep up with the bandwidth demands of the network, provides enough buffering and lookup capacity to handle any type of cloud application, and offers enough flexibility for any network design, whether it exposes tenant state to the network or hides it. Enter the Juniper Q5 silicon, designed with a groundbreaking memory technology called Hybrid Memory Cube.

 

The Q5 ASIC

 

[Image: The Juniper Q5 ASIC (q5.jpg)]

 

The Juniper Q5 ASIC offers a unique blend of high I/O capacity and high logical scale, achieved by attaching a new and innovative class of memory, the Hybrid Memory Cube, that provides unprecedented memory-to-ASIC bandwidth. For the first time in the industry, the Q5 ASIC, in conjunction with external memory, has broken the 1 Tbps forwarding-capacity barrier on a single piece of silicon. The Q5 ASIC and HMC are the key building blocks of the recently announced QFX10000 switches, which scale to 96 Tbps within a single system. Without this key innovation, building a 96 Tbps system that allows customers to build large-scale public and private cloud data centers would not have been possible with existing silicon and memory technologies.
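
As a rough sanity check on those numbers (pure arithmetic, not a description of the actual QFX10000 line-card or fabric layout), the per-ASIC and per-system figures imply on the order of a hundred Tbps-class forwarding engines in the largest configuration:

```python
# Pure arithmetic relating the per-ASIC and per-system capacities quoted in
# the text. This says nothing about the real line-card or fabric layout of
# the QFX10000; it is only a sanity check on the scale involved.

asic_capacity_tbps = 1.0       # >1 Tbps per Q5 ASIC (from the text)
system_capacity_tbps = 96.0    # 96 Tbps per system (from the text)

min_forwarding_engines = system_capacity_tbps / asic_capacity_tbps
print(f"Reaching {system_capacity_tbps:.0f} Tbps takes at least "
      f"~{min_forwarding_engines:.0f} Tbps-class forwarding engines")
```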

 

Hybrid Memory Cube

 

[Image: Hybrid Memory Cube (hmc.jpg)]

Hybrid Memory Cube (HMC) is a new class of 3D memory that breaks the 1 Tbps bandwidth barrier; it is used in conjunction with the Juniper Q5 silicon in the QFX10000 series switches. Compared with DDR3/4 memories, HMC provides up to 17% better power efficiency, a roughly 84% smaller memory footprint on the system board, and the bandwidth to scale to 1 Tbps of ASIC performance. At this scale, it would take 45 or more DDR3/4 memories to match the bandwidth of a single HMC device. The following illustration compares two 1 Tbps ASICs, one designed with DDR3/4 memory and the other with HMC.

 

[Images: ddr.jpg (left) and q5-hmc.jpg (right)]

 

1 Tbps forwarding ASIC with external memory using DDR3/4 technology (left) and with HMC memory (right)

The following table compares a 1 Tbps forwarding chip designed with DDR3/4 memory against one designed with HMC.

                                          DDR3/4                 Hybrid Memory Cube (HMC)
  Number of memory devices                90 and up              2
  Total pins between ASIC and memory      More than 2,400        422
  Power                                   61 W                   49 W
  Memory surface area                     12,750 mm² or more     1,922 mm²

 

 

It would take more than 90 DDR3/DDR4 memory devices to feed a 1 Tbps switching silicon, where just 2 HMC devices provide the same bandwidth. Since surrounding a single chip with 90 memory devices is simply not feasible, this scale was not previously achievable.
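
The savings in the table can be summarized with a quick calculation; the figures below are copied straight from the table, and nothing new is assumed:

```python
# Quick summary of the DDR3/4 vs. HMC comparison, using the figures from the
# table above (per 1 Tbps forwarding chip). No data beyond the table is used.

ddr = {"devices": 90, "pins": 2400, "power_w": 61, "area_mm2": 12750}
hmc = {"devices": 2,  "pins": 422,  "power_w": 49, "area_mm2": 1922}

for metric in ("devices", "pins", "power_w", "area_mm2"):
    reduction = 100.0 * (ddr[metric] - hmc[metric]) / ddr[metric]
    print(f"{metric:9s}: {ddr[metric]:>6} -> {hmc[metric]:>5}"
          f"  ({reduction:.0f}% reduction)")

# Power comes out ~20% lower and memory surface area ~85% lower, broadly in
# line with the efficiency and board-footprint figures quoted earlier (the
# DDR column uses "and up" / "or more" values, so these are minimum savings).
```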

 

When coupled with the Q5 ASIC in the QFX10000, HMC provides the following functions:

  • Packet buffering and virtual output queue memory
  • Improved logical system scale, by augmenting the local tables on the Q5 ASIC (FIB, host, MPLS, and MAC) with external memory, which provides significant headroom and architectural flexibility (see the sketch after this list).
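
The sketch below is purely conceptual: it is not the Q5/HMC microarchitecture, and all class and field names are invented for illustration. It shows the two roles described above in software form: virtual output queues whose packet data lives in a deep external buffer, and a forwarding table whose on-chip portion is augmented by a much larger external table.

```python
# Conceptual sketch only -- not the actual Q5/HMC design. All names are
# invented. It illustrates (1) virtual output queues whose packet payloads
# live in deep external memory, and (2) a tiered forwarding table whose
# on-chip portion is augmented by a much larger external table.

from collections import deque

class ExternalPacketMemory:
    """Stand-in for a deep, HMC-backed packet buffer (hundreds of MB)."""
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0

    def store(self, packet: bytes) -> bool:
        if self.used + len(packet) > self.capacity:
            return False                      # buffer exhausted, packet dropped
        self.used += len(packet)
        return True

class VirtualOutputQueue:
    """One queue per (ingress, egress) pair; payloads sit in external memory."""
    def __init__(self, memory: ExternalPacketMemory):
        self.memory = memory
        self.descriptors = deque()            # only small descriptors on chip

    def enqueue(self, packet: bytes) -> bool:
        if not self.memory.store(packet):
            return False
        self.descriptors.append(len(packet))  # keep metadata, not payload
        return True

class TieredFib:
    """Small on-chip table backed by a much larger external table."""
    def __init__(self, on_chip_limit: int):
        self.on_chip, self.external = {}, {}
        self.on_chip_limit = on_chip_limit

    def install(self, prefix: str, next_hop: str) -> None:
        if len(self.on_chip) < self.on_chip_limit:
            self.on_chip[prefix] = next_hop   # hot entries stay on chip
        else:
            self.external[prefix] = next_hop  # overflow spills to external memory

    def lookup(self, prefix: str):
        return self.on_chip.get(prefix) or self.external.get(prefix)
```

In the real ASIC these structures are of course implemented in hardware; the sketch only illustrates where the deep external memory sits in the data path.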

 

 

In summary, the Juniper Q5 ASIC and HMC together have broken multiple technology barriers, making it possible for the first time to build massive-scale data centers on switching systems that are scalable across multiple dimensions.

 

About the Author
  • Anil Lohiya is a Principal Engineer in the Campus and Data Center Business Unit at Juniper Networks. In his current role, he is leading some of the SDN and network virtualization initiatives.