Ethernet Switching

EX4600: IPv6 multicast traffic causing all IRB interfaces to become unresponsive

01-19-2020 12:31 PM

Hi,

I'm hoping that before I contact support, someone could point me in the right direction here, as I'm not sure whether I'm simply misunderstanding some fundamental aspect of the box.

 

I have recently been testing streaming over IPv6 multicast at a decent bandwidth (~40 Mbps). I ran the test from a computer connected to an EX2300, which in turn connects to an EX4600 (a typical star topology), from where traffic goes on to the router.

Oddly, running the test for just a few seconds caused management traffic on the EX4600 to become very slow, and I could no longer ping or SSH to the box, even though management is on a different VLAN. The switch was still forwarding traffic, but it was impossible to manage. Ping latencies kept increasing, up to several seconds, and then the box stopped responding altogether.

 

I have determined that the trigger was an irb.20 interface with family inet6, which is the L3 interface for our VLAN 20. Only a link-local address was assigned (the use case at the time was an MLD querier/router). Oddly, when I ran

show system processes extensive

there was no change in CPU utilization during the incident. It therefore looks like some resource or queue became exhausted, although I'm not sure which one.
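
For context, the relevant configuration boils down to roughly the following (trimmed; the VLAN name is illustrative). Note that family inet6 has no address statement, so only the auto-configured link-local address is present:

set vlans vlan20 vlan-id 20
set vlans vlan20 l3-interface irb.20
set interfaces irb unit 20 family inet6
set protocols mld interface irb.20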

Also, the multicast traffic was clearly visible using

 

monitor traffic interface irb.20
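
(For what it's worth, a narrower capture such as monitor traffic interface irb.20 no-resolve matching "ip6" should show just the IPv6 traffic, although I used the plain command above.)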

 

This issue does not happen with IPv4 multicast at all.
It also does not matter whether MLD, MLD snooping, or IGMP snooping is enabled; the issue still occurs. It disappears when family inet6 is deactivated on irb.20.
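
For the record, the workaround I'm using for now is simply the following in configuration mode:

deactivate interfaces irb unit 20 family inet6
commit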

 

The situation seems very similar to this issue, although I'm fairly sure the causes are different (my issue does not show up on the EX2300):
https://forums.juniper.net/t5/Ethernet-Switching/irb-interface-dhcp-periodically-stops-being-reachab...

 

EX4600 SW version: JUNOS 18.4R2-S2.4

 

I'd be very grateful for any hint or advice.
Best regards,
-Pavel


Re: EX4600: IPv6 multicast traffic causing all IRB interfaces to become unresponsive

01-19-2020 07:16 PM

Hi paulosv,

 

Did you try accessing the device out of band? Check whether any CPU queues are being hit, using the following:

show chassis routing-engine
start shell
cprod -A fpc0 -c 'set dc bc "show c cpu"'    <-- run this a few times to spot any queue drops
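
If it helps, something like this small loop from the RE shell (illustrative; adjust the interval and filter to taste) makes the counter deltas easier to spot:

# run from a Bourne-style shell on the RE (e.g. /bin/sh after "start shell");
# prints the Broadcom CPU counters every 5 seconds so MCQ_DROP_* growth stands out
while true; do
    date
    cprod -A fpc0 -c 'set dc bc "show c cpu"' | egrep 'MCQ_DROP|MC_PERQ'
    sleep 5
done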

 

Hope this helps.

Regards,
-r.

--------------------------------------------------

If this solves your problem, please mark this post as "Accepted Solution."
Kudos are always appreciated. :)

 


Re: EX4600: IPv6 multicast traffic causing all IRB interfaces to become unresponsive

01-20-2020 01:31 PM

Hi mriyaz,

thanks for your reply. The serial console stays responsive, so I was able to collect some stats with the commands you suggested.

 

I simulated a lighter version of the IPv6 multicast "flood", just enough to make the IRB interfaces unreachable again.

CPU utilization went from around 10 percent or less for user and kernel, and 0 percent for interrupts:

    5 sec CPU utilization:
      User                       8 percent
      Background                 0 percent
      Kernel                     6 percent
      Interrupt                  0 percent
      Idle                      86 percent

to

    5 sec CPU utilization:
      User                      20 percent
      Background                 0 percent
      Kernel                    15 percent
      Interrupt                  4 percent
      Idle                      61 percent

On the queue subject:
Without the multicast flood, the MCQ_DROP_* lines are absent from the output, and the MC_PERQ_BYTE(28) counter only grows on the order of tens of thousands. With the flood, the drops can clearly be seen:

root@eight:RE:0% cprod -A fpc0 -c 'set dc bc "show c cpu"'


HW (unit 0)
IBCAST.cpu0             :             3,090,549                  +2               1/s
ING_NIV_RX_FR.cpu0      :             6,454,862                  +2               1/s
MC_PERQ_PKT(8).cpu0     :            13,845,492                 +10
MC_PERQ_PKT(14).cpu0    :                14,126                  +1
MC_PERQ_PKT(16).cpu0    :             3,939,840                  +7
MC_PERQ_PKT(19).cpu0    :            43,736,339                +427               1/s
MC_PERQ_PKT(28).cpu0    :            52,214,959             +17,153              87/s
MC_PERQ_PKT(33).cpu0    :            22,367,484                +294
MC_PERQ_PKT(34).cpu0    :            17,237,213                 +45
MC_PERQ_PKT(43).cpu0    :            12,787,910                 +52
MC_PERQ_BYTE(8).cpu0    :         1,945,899,342              +3,148             445/s
MC_PERQ_BYTE(14).cpu0   :               959,596                 +68              41/s
MC_PERQ_BYTE(16).cpu0   :           365,627,616                +786              57/s
MC_PERQ_BYTE(19).cpu0   :         9,152,981,497             +80,180           4,494/s
MC_PERQ_BYTE(28).cpu0   :        42,519,089,155         +23,662,826       3,771,190/s
MC_PERQ_BYTE(33).cpu0   :         1,768,896,420             +25,026             913/s
MC_PERQ_BYTE(34).cpu0   :         1,185,830,552              +3,702             560/s
MC_PERQ_BYTE(43).cpu0   :         1,175,344,202              +4,282             768/s
MCQ_DROP_PKT(28).cpu0   :             1,946,432              +1,522             123/s
MCQ_DROP_BYTE(28).cpu0  :         2,658,538,788          +1,832,544         323,657/s

So the drops are clearly happening... Since I can't find anything else in my config that could cause this, I'm probably going to contact our partner to see what they have to say about it.


Best regards,
-Pavel