Junos Telemetry : Detecting Microbursts

by Juniper Employee ‎08-06-2017 06:53 AM - edited ‎08-08-2017 08:47 AM



Managing networks is actually not that difficult when things are working as designed. Operational headaches happen when things go wrong. And even then, when they go fantastically wrong (like hard failures that are easily identifiable), troubleshooting or remediation can be relatively straightforward.


Rather, the biggest challenge for network operators is diagnosing transient issues. Often the only information available is an observation about some downstream consequence (“the network is slow” or “my application isn’t responding”). To diagnose these issues correctly, there must be real-time telemetry that is fine-grained enough to provide meaningful input.


Take microbursts as an example. A microburst is a short spike of packets received in a relatively small interval at a rate much higher than the configured guaranteed bandwidth for a given queue.


It’s not hard to imagine scenarios where microbursts might impact the business, such as high-frequency trading platforms. Those platforms depend on real-time market data to formulate trading strategies. Microbursts delay that data, so trading algorithms work from stale prices and fall out of sync with the market, which can be catastrophic to the business.


What network operators need are fine-grained monitoring tools that can detect issues as they are happening. Snapshots of average queue depths do not help identify issues, much less provide real-time remediation. This is why we have introduced a queue monitoring sensor as part of the Junos Telemetry Interface (JTI) in Junos release 17.1.


What can cause Microbursts?


The main factors that can cause microbursts in a network are:

  • Multiple sources sending packets to a single queue
  • Significant speed mismatch between ingress and egress interfaces (for example, a 100G/40G ingress interface forwarding packets to 10G/1G egress interfaces or to a queue which is shaped at a lower rate)
  • Multicast replication done by egress Packet Forwarding Engine (PFE) to a large number of receivers on the same egress interface
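
The speed-mismatch case can be made concrete with some arithmetic. The following is a minimal sketch, not a model of any particular line card: the function name `queue_growth`, the burst duration, and the buffer sizes are all assumptions chosen for illustration. It estimates how much backlog a burst builds in an egress queue, how much of it overflows the buffer, and how long the queued bytes take to drain.

```python
def queue_growth(ingress_gbps, egress_gbps, burst_us, buffer_bytes):
    """Estimate the effect of one burst on an egress queue.

    All inputs are illustrative assumptions. Returns a tuple of
    (queued_bytes, dropped_bytes, added_latency_us).
    """
    # Traffic arriving faster than the queue drains accumulates as backlog.
    excess_gbps = max(ingress_gbps - egress_gbps, 0)
    # 1 Gbps sustained for 1 microsecond is 1000 bits.
    backlog_bytes = excess_gbps * burst_us * 1000 / 8
    # Whatever does not fit in the buffer is tail-dropped.
    dropped_bytes = max(backlog_bytes - buffer_bytes, 0)
    queued_bytes = min(backlog_bytes, buffer_bytes)
    # The last queued byte waits until everything ahead of it has drained.
    added_latency_us = queued_bytes * 8 / (egress_gbps * 1000)
    return queued_bytes, dropped_bytes, added_latency_us


# Hypothetical case: a 100 microsecond burst from a 100G port into a 10G queue.
for buffer_bytes in (2_000_000, 500_000):
    queued, dropped, latency = queue_growth(100, 10, 100, buffer_bytes)
    print(f"buffer={buffer_bytes} queued={queued:.0f} "
          f"dropped={dropped:.0f} added_latency_us={latency:.0f}")
```

Under these assumed numbers, the larger buffer absorbs the whole burst at the cost of roughly 900 microseconds of added delay, while the smaller buffer drops part of the burst and adds less delay.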


A microburst may result in dropped packets if queues are configured with small buffers. If queues are configured with adequate buffers to absorb the microburst, there won’t be any drops, but the increased queue utilization adds latency to packet delivery. Dropped packets are properly accounted for and easy to troubleshoot. However, determining the source of additional latency can be quite challenging for several reasons:


  • Typical network topologies consist of multiple routers, and it is difficult to identify the router that is introducing the latency.
  • Monitoring tools (SNMP or CLI based polling) query interface statistics every 30 or 60 seconds by default. That interval provides good average utilization but is not sufficient to detect microbursts. The polling interval needs to be less than 1ms in order to detect microbursts reliably. It is not practical to poll at that high a rate from the routing engine or line card CPU.
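
To see why coarse polling misses microbursts, consider the sketch below. It is an illustration under assumed numbers, not measured data: the helper `utilization_views`, the synthetic trace, and the 1 ms bucket size are all hypothetical. It compares the average load over a 30-second window with the worst single millisecond in the same window.

```python
def utilization_views(bytes_per_ms, drain_gbps):
    """Return (average, peak) load as a fraction of the queue's drain rate.

    bytes_per_ms holds the bytes offered to the queue in each 1 ms bucket.
    """
    capacity_bytes_per_ms = drain_gbps * 1e9 / 8 / 1000
    average = sum(bytes_per_ms) / len(bytes_per_ms) / capacity_bytes_per_ms
    peak = max(bytes_per_ms) / capacity_bytes_per_ms
    return average, peak


# Synthetic 30 s trace for a queue draining at 10 Gbps: a steady 10% load,
# plus 5 ms during which traffic arrives at five times the drain rate.
capacity = 1_250_000  # bytes per millisecond at 10 Gbps
trace = [capacity // 10] * 30_000
for ms in range(12_000, 12_005):
    trace[ms] = 5 * capacity

average, peak = utilization_views(trace, 10)
print(f"30 s average load: {average:.1%}")
print(f"worst 1 ms load:   {peak:.0%}")
```

In this synthetic trace the 30-second average stays close to 10%, so a poller working at that interval would report a lightly loaded queue even though the queue was briefly offered five times what it could drain.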



How to detect Microbursts?


For the MPC7E, MPC8E, and MPC9E line cards, the Junos Telemetry Interface adds a new queue monitoring sensor. The sensor periodically exports peak queue depth information to an external collector.



The microcode engine in the Trio ASIC will monitor queue depths for all configured queues and build/export JVision telemetry packets encoded in Google Protocol Buffer (GPB) format with all necessary information. Since all required tasks are performed in-line in the Trio ASIC without adding any additional load on the line card and routing engine CPUs, the queue monitoring sensor can monitor a large number of queues (32,000 queues for MPC7 and 64,000 queues for MPC8/9) simultaneously.
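
On the collector side, the exported peak depths can be checked against a threshold. The sketch below assumes the GPB payload has already been decoded, which requires the sensor's proto definition and is not shown here. The `PeakDepthSample` record, its field names, and the 80% threshold are hypothetical and do not reflect the actual queue-mon message layout.

```python
from dataclasses import dataclass


@dataclass
class PeakDepthSample:
    """Hypothetical decoded record; field names are assumptions."""
    interface: str
    queue: int
    peak_depth_bytes: int
    buffer_bytes: int


def flag_microbursts(samples, threshold=0.8):
    """Return samples whose peak depth reached `threshold` of the buffer."""
    return [
        s for s in samples
        if s.peak_depth_bytes >= threshold * s.buffer_bytes
    ]


# Made-up samples for two queues on one interface.
samples = [
    PeakDepthSample("et-5/0/0", 0, 120_000, 1_000_000),
    PeakDepthSample("et-5/0/0", 3, 910_000, 1_000_000),
]
for s in flag_microbursts(samples):
    print(f"{s.interface} queue {s.queue}: "
          f"peak {s.peak_depth_bytes} of {s.buffer_bytes} bytes")
```

Because the sensor reports the peak depth seen during each export interval, a simple threshold check like this can surface short spikes that an average would hide.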





The following configuration will enable the queue-mon sensor on interface et-5/0/0:

[Screenshot: queue-mon sensor configuration for et-5/0/0]

The GPB proto format for the queue-mon sensor:

[Screenshots: GPB proto definition for the queue-mon sensor]




With the addition of the queue monitoring sensor to the existing library of rich sensors in JTI, network operators will have much better visibility into queue utilization as compared to the average utilization supported on most routers. And this continues Juniper’s commitment to producing the single most automation-friendly network operating system in the industry.



by sgopalkr
on ‎08-08-2017 07:58 AM

Wow! That's a good read.


Just a couple of thoughts on the future work:


Do you think this data can be collected and used by some kind of AI system to predict the traffic in a router? I did a school project that collected CPU data to predict usage and scale VMs accordingly. I think something similar can be done here to get a traffic prediction and take preventive actions.


Also, how difficult is it to translate these effects to the actual services that get affected? I think it's still a huge task for the network admin to interpret this into business-meaningful data.




by Distinguished Expert
on ‎08-13-2017 05:44 AM



You are correct to see the connection between telemetry and predictive network operations. Juniper has outlined a vision of the "other SDN", Self-Driving Networks, that behave just as you describe: taking in data from all these sources and making automated corrections to drive the network on demand.


White Papers on the Self Driving Network Part 1 and 2



on ‎08-21-2017 11:34 AM
Now this is my kind of topic.
What would be awesome would be for this functionality to be provided on SRX, moreover, on SRX HE devices running ExpressPath/Services-Offload. I note that this capability seems to have a dependency on Trio, suggesting it is an MX only feature. Is that the case and are other platforms like SRX on the viable roadmap for this?
by Juniper Employee
on ‎08-21-2017 01:50 PM
QFX5100 supports microburst threshold monitoring/data export.
by Mohit Singh
on ‎09-04-2017 11:04 AM
Simply great! You are also making it possible to counter shrewd DDoS attacks through your 'other SDN', and it's surely the first SDN++. Machine learning can help congestion control make sense of all these figures when choosing and tuning algorithms and their parameters.

The 'other control' is seemingly simply great.
by Mohit Singh
on ‎09-04-2017 11:11 AM
15 years ago, I got an IEEE Communications Magazine paper on COPS-based management (Salsano et al.), and its realisation efforts led me to bear comments like: how do you know when, how, and what to control?

15 years later, the future seems a better understood history through things like this.