Automated congestion avoidance with the NorthStar Controller
Oct 5, 2018
By Julian Lucek and Ryan McMeniman
Anyone living in Silicon Valley, Los Angeles, New York or London knows that a car commute can be anything but fast and easy. Now imagine adding more on-ramps to the major freeways in those cities without adding more lanes or widening them. It's pretty easy to see that this would result in major congestion. So, if 5G radios are like on-ramps feeding the freeways, the forthcoming bandwidth increase is kind of like that nightmarish traffic scenario. As service providers operating these “freeways” continue migrating to and accommodating 5G, they must take into consideration the impact its expanded capabilities will have across the entire network. Let’s delve into how service providers can match the speed of their access networks with their underlying transport network to deliver on the 5G promise.
Service providers are faced with an all-too-common paradox when seeking to optimize their transport infrastructure: how to achieve high performance at scale while lowering cost per bit in a network that has an increasingly diverse and complex topology. Due to cost pressures, the old operating model that would have been used to handle the forthcoming traffic surge is now economically unfeasible and architecturally complex. As they have done in other parts of their networks, service providers need to look to software-defined networking (SDN) technologies with flexible traffic optimization and engineering capabilities that enable them to run their networks “hotter” while ensuring predictability, resiliency, and service-level guarantees. Solving this challenge is even more imperative as service providers plan their 5G strategy across their entire networks.
Let’s look at the practical application of this technology in a live network.
As with the traffic analogy, network operators must control the paths taken by packets across the network. This means instantiating Traffic Engineered Label Switched Paths (TE LSPs) across the network in accordance with the user's intent. However, disturbances such as link failures or traffic congestion perturb the network state. Streaming telemetry gives the SDN controller visibility of the current state of the network. Using this information, the controller modifies the paths of LSPs as needed in order to get closer to the desired network state.
All of the information shown in Figure 1 has been acquired automatically by the NorthStar SDN controller via protocols or telemetry. The topology of the network is known via BGP Link-State (BGP-LS). The link traffic data shown on each link are known via streaming telemetry.
Figure 1: Network topology showing percentage link utilization
Figure 2 highlights the path of a particular LSP. As you can see, it travels from vmx102 to vmx104 via vmx105 and vmx107.
Figure 2: Path of LSP from vmx102 to vmx104, highlighted in orange
The NorthStar Controller has been configured to move LSPs away from a link if the traffic on that link exceeds 85% utilization. Let’s now increase the amount of traffic in the network. The screenshot in Figure 3 shows that some of the links are now only just below the threshold. The graph in the lower part of the figure shows the traffic as a function of time on the link from vmx102 to vmx105; you can see that the link utilization has increased from about 60% to about 80%.
Figure 3: Status of the network after increasing the traffic volume
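To make the threshold check concrete, here is a minimal Python sketch of how per-link utilization could be derived from telemetry samples and compared against the 85% threshold. The `LinkSample` type, the `congested_links` helper and the capacity and traffic numbers are all hypothetical, chosen only to mirror the roughly 80% reading described above; this is not NorthStar's API or data model.

```python
from dataclasses import dataclass

# From the article: LSPs are moved when link utilization exceeds 85%.
CONGESTION_THRESHOLD = 0.85


@dataclass(frozen=True)
class LinkSample:
    """One telemetry reading for a link (hypothetical structure)."""
    src: str
    dst: str
    capacity_bps: float
    traffic_bps: float

    @property
    def utilization(self) -> float:
        # Fraction of the link capacity currently in use.
        return self.traffic_bps / self.capacity_bps


def congested_links(samples, threshold=CONGESTION_THRESHOLD):
    """Return the samples whose utilization is strictly above the threshold."""
    return [s for s in samples if s.utilization > threshold]


# Illustrative values only: a 10 Gbps link at 80% and another at 90%.
samples = [
    LinkSample("vmx102", "vmx105", capacity_bps=10e9, traffic_bps=8e9),
    LinkSample("vmx105", "vmx107", capacity_bps=10e9, traffic_bps=9e9),
]
hot = congested_links(samples)
```

With these made-up numbers, only the second link is flagged; the first sits just below the threshold, as in Figure 3.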
Increasing the traffic some more, we cross the threshold. As you can see in Figure 4, the LSP that we looked at before in Figure 2 has been moved by the NorthStar Controller to a different path. It selected which LSP to move, recomputed the path and sent the details of the new path to the ingress router, vmx102, via PCEP. In turn, vmx102 moved the traffic onto the new path in a make-before-break manner, so no traffic was lost. The LSP now follows the shortest IGP metric path such that the link traffic threshold is not exceeded (the IGP metric on the link between vmx102 and vmx106 is greater than the sum of the IGP metrics on the link between vmx102 and vmx101, the link between vmx101 and vmx105 and the link between vmx105 and vmx106). As you can see, the traffic on each link is now comfortably below the 85% congestion threshold.
Figure 4: New path of LSP, highlighted in orange
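The path recomputation described above, picking the shortest IGP metric path on which no link would exceed the threshold, can be sketched as a constrained shortest-path search: prune every link that the LSP's traffic would push over the threshold, then run Dijkstra on what remains. This is a generic illustration and not necessarily the algorithm NorthStar uses. The function name, the four-node topology, the metrics and the traffic values are hypothetical, and the traffic figures are assumed to exclude the LSP being placed.

```python
import heapq


def constrained_shortest_path(links, src, dst, demand, threshold=0.85):
    """Find the lowest-metric path that keeps every link at or below threshold.

    links maps a directed (u, v) pair to (igp_metric, capacity, traffic),
    where traffic excludes the LSP being placed (an assumption of this sketch).
    Returns (total_metric, [nodes]) or None if no feasible path exists.
    """
    # Step 1: keep only links that can carry the extra demand.
    adjacency = {}
    for (u, v), (metric, capacity, traffic) in links.items():
        if (traffic + demand) / capacity <= threshold:
            adjacency.setdefault(u, []).append((v, metric))

    # Step 2: plain Dijkstra over the pruned graph.
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(heap, (cost + metric, neighbor, path + [neighbor]))
    return None


# Hypothetical topology: A-B-D has the lower metric, but A->B is already busy.
links = {
    ("A", "B"): (10, 100, 80),
    ("B", "D"): (10, 100, 30),
    ("A", "C"): (15, 100, 40),
    ("C", "D"): (15, 100, 40),
}
small_lsp = constrained_shortest_path(links, "A", "D", demand=2)
large_lsp = constrained_shortest_path(links, "A", "D", demand=10)
```

In this toy example the small LSP still fits on the lowest-metric path through B, while the larger one is steered through C because placing it on A to B would take that link to 90%. A production controller would also have to decide which LSP to move and signal the result to the ingress router, which this sketch leaves out.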
Traditionally, the above workflow is carried out manually when someone on the network operations team notices that a link is getting congested. Based on the conditions at that moment in time, they then have to work out which LSP(s) to move and what path the LSP(s) should follow. They then need to configure new paths for the LSPs on the ingress routers, perhaps using a couple of loose hops to pin the LSPs away from the congestion point. This sequence of steps is quite time-consuming and error-prone, so having the job done automatically is much more efficient and accurate.
Service providers face a significant challenge in economically optimizing their network traffic flow. Just like the innovators creating autonomous cars that will banish the monotony of driving ourselves, we’re helping service providers adopt software-defined technologies that can automatically control and manage traffic across their backbone networks.
To find out more, see this recording of a NANOG presentation on this topic, and come and hear more about it at NXTWORK 2018.