Why Hash Based Load Balancing in a Chassis is a really bad idea
Mar 27, 2014

Jonathan Davidson



What makes a good Data Center aggregation switch? Is it efficiency? Power utilization? Line rate capacity? Bandwidth? Programmability? Investment protection?


There is a new class of chassis-based switches coming to market from some vendors that uses what I will call “Hash Based Load Balancing Chassis Fabrics” (HBLBCF) to interconnect the various line cards in the chassis.  I know, HBLBCF rolls off the tongue.


Before I explain the problem being created by this new class of product, first you must understand a little bit about how fabrics have been built to deliver traffic INSIDE of a given chassis (i.e. delivering traffic from one line card to another). 





In Figure 1 we are showing a very basic Ethernet chassis with two Ethernet line cards. Today, the Chassis Fabric that interconnects these line cards (LC1 to LC2) ensures that all packets entering the Chassis Fabric successfully egress the chassis. Vendors have their own mechanisms for ensuring this, but in every case the capability relies upon the Fabric inside the chassis to guarantee that the packet will be delivered from ingress (LC1) to egress (LC2).


In Juniper products this involves a two-way handshake from the ingress line card (LC1) to the egress line card (LC2). Additionally, Juniper creates equal sized cells to be sent on ALL paths across the Chassis Fabric from LC1 to LC2. This ensures optimal and efficient usage of all available bandwidth and paths on the Chassis Fabric with no drops. The benefit of this handshake is that your data is guaranteed to be delivered across the Fabric, enabling application performance.
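The request/grant idea behind a two-way handshake can be sketched in a few lines. This is a conceptual illustration only, not Juniper's actual fabric protocol; the class name, cell counts, and buffer sizes are all hypothetical:

```python
class EgressLineCard:
    """Conceptual sketch of a two-way (request/grant) fabric handshake:
    the ingress card asks before sending, so the fabric never commits
    more cells toward an egress card than it can actually absorb."""

    def __init__(self, capacity_cells):
        self.free = capacity_cells  # remaining egress buffer space, in cells

    def request(self, n_cells):
        """Grant only as many cells as the egress buffer can hold.

        The ingress card holds back the rest instead of blindly sending,
        so nothing is dropped inside the fabric."""
        granted = min(n_cells, self.free)
        self.free -= granted
        return granted

    def drain(self, n_cells):
        """Egress card transmits cells out its front-panel ports,
        freeing buffer space for future grants."""
        self.free += n_cells


egress = EgressLineCard(capacity_cells=100)
print(egress.request(60))  # first burst fully granted: 60
print(egress.request(60))  # second burst throttled to what fits: 40
egress.drain(50)
print(egress.request(30))  # space freed, so fully granted again: 30
```

The key property is that throttling happens at the ingress, before cells enter the fabric, rather than as drops at a congested egress.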


The two-way handshake also avoids all Hash Based Load Balancing Chassis Fabric (HBLBCF) issues and gives you the ability to use ALL available bandwidth in the Chassis Fabric. Aha, but what is an HBLBCF issue? Hash Based Load Balancing is a mechanism where you create a “hash” from source/destination data and then select a given path for a given flow. IP networks are often load balanced this way when multiple paths are available. If the hashing is not done well, you run into utilization issues where not all of the available bandwidth can be used. The impact is that one link runs at maximum capacity and drops traffic while a second link to the same destination sits underutilized.
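A minimal sketch makes the pinning behavior concrete. This is not any vendor's actual hash algorithm; the link count, addresses, and flow rates are illustrative assumptions:

```python
import hashlib

NUM_FABRIC_LINKS = 4

def pick_link(src, dst, proto="tcp"):
    """Hash-based path selection: every packet of a given flow is pinned
    to the same fabric link, no matter how large the flow is."""
    key = f"{src}|{dst}|{proto}".encode()
    return hashlib.sha256(key).digest()[0] % NUM_FABRIC_LINKS

# Illustrative flows with their offered load in Mb/s. Two heavy flows can
# hash onto the same fabric link while other links sit nearly idle -- and
# the hash never rebalances, because it only looks at the flow key.
flows = {
    ("10.0.0.1", "10.0.1.1"): 9_000,
    ("10.0.0.2", "10.0.1.2"): 9_000,
    ("10.0.0.3", "10.0.1.3"): 100,
}

load = [0] * NUM_FABRIC_LINKS
for (src, dst), mbps in flows.items():
    load[pick_link(src, dst)] += mbps

print(load)  # per-link load; any collision of the two 9 Gb/s flows persists
```

Note that the selection is deterministic: rerunning the hash for the same flow always returns the same link, which is exactly why a bad placement cannot fix itself.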


A hash algorithm does not know up front the size of every packet that will traverse the chassis for a specific flow. In an IP network, this can be handled with careful network design, understanding traffic patterns, inherent statistical multiplexing, and understanding oversubscription ratios.  Breaking each packet into multiple cells is a mechanism that Juniper uses to ensure equal distribution on all paths between line cards.
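The contrast with cell spraying can be sketched as well. This is a simplified illustration, not Juniper's actual implementation; the 64-byte cell size and four-link fabric are assumptions for the example:

```python
CELL_SIZE = 64          # bytes per fabric cell (illustrative; real sizes vary)
NUM_FABRIC_LINKS = 4

def spray_packet(packet_len, start_link=0):
    """Segment a packet into fixed-size cells and spray them round-robin
    across every fabric link, returning cells placed per link."""
    num_cells = -(-packet_len // CELL_SIZE)   # ceiling division
    per_link = [0] * NUM_FABRIC_LINKS
    for i in range(num_cells):
        per_link[(start_link + i) % NUM_FABRIC_LINKS] += 1
    return per_link

# A mix of tiny and jumbo packets still lands evenly on all links,
# because distribution happens per cell, not per flow.
totals = [0] * NUM_FABRIC_LINKS
link = 0
for pkt_len in [64, 1500, 9000, 300, 1500]:
    cells = spray_packet(pkt_len, link)
    totals = [t + c for t, c in zip(totals, cells)]
    link = (link + sum(cells)) % NUM_FABRIC_LINKS

print(totals)  # → [49, 49, 49, 48]: per-link counts differ by at most one
```

Because every link carries an almost identical share of cells regardless of flow sizes, there is no elephant-flow pinning and no hash to get wrong.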


There are three new products available in the market where an Ethernet Fabric has been chosen for the Fabric inside the Chassis – what we have been calling the Chassis Fabric. What this means, and this part is really important, is that you will now see Hash Based Load Balancing issues inside of your Chassis. In your own IP Network you can resolve these hashing issues by adding new links for more bandwidth or creating new physical layers of routing or switching. However, in a Chassis Fabric you do not have the ability to create additional layers or create more bandwidth as the capacity is fixed, unless you are willing to do a hardware upgrade.


In a statistically perfect Hash Based Load Balanced Chassis Fabric (which is not the normal scenario), the best you can expect to achieve is 90% utilization of the available bandwidth in the Chassis Fabric. I recommend reading section 5.2 of Nick McKeown’s paper on backplanes for more details.


In summary:

- Cell-based chassis fabrics offer the best mechanism for guaranteeing packet delivery


- Mitigating hash imbalance between ingress and egress is best achieved by adding external bandwidth (i.e. more 40G uplinks) to an L3 Clos or to VCF-enabled switches. This cannot be done on a chassis without significant expense, as its copper traces are fixed.


- VCF and QFabric have an additional benefit over a standard L3 Clos or HBLBCF: they take full topology (not just next hop), link capacity, and packet path into account to work around some of these hashing issues


There are many things you need to be aware of when deploying a new Data Center switch; having to worry about the additional complexity of dropped packets due to HBLBCF should not be one of them.