Ethernet Switching

QFX VC suddenly stops handling packets

‎04-22-2020 02:12 PM

Hi,

Twice in the last couple of weeks, a QFX (5100-48s) VC (two switches) has simply stopped handling packets on two different L2 trunk ports.

The only way I have found to get it to start handling packets again is to change the MTU on the port.

What should I search for in the logs, and how, to get to the bottom of this problem?

 

After I change the MTU on the port (from, let's say, 1600 to 1599, or from 1514 to 1516) I get this in the logs:

LBCM-L2,brcm_port_init_def(),1110:(brcm_port_init_def:1110) Setting L2 Learn (unit 0, port_num 3), learn_flg 0
LBCM-L2,brcm_ifl_l2_init(),2300:(brcm_ifl_l2_init:2300) Setting port learning config :ge-0/0/2, learn_flg 5, move_flg 13
LBCM-L2,brcm_port_learning_config(),1259:(brcm_port_learning_config:1259) Setting L2 learning unit:0,port_num:3, learn_flg 5

And the traffic flows as normal again.
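
When searching saved logs for the moments at which L2 learning was reprogrammed, a small parser can help. The following is a minimal sketch, assuming the messages always look like the three lines above; the regular expression and the function name `parse_learn_events` are illustrative and not part of Junos, and the meaning of the individual `learn_flg` values is not known from this thread.

```python
import re

# Pattern modelled only on the three sample lines in this post; other
# Junos releases may format these messages differently (assumption).
LEARN_RE = re.compile(
    r"LBCM-L2,(?P<func>\w+)\(\),\d+:.*?learn_flg\s+(?P<learn_flg>\d+)"
)


def parse_learn_events(lines):
    """Return a list of (function_name, learn_flg) for each matching line."""
    events = []
    for line in lines:
        match = LEARN_RE.search(line)
        if match:
            events.append((match.group("func"), int(match.group("learn_flg"))))
    return events


# The three log lines quoted in the post.
SAMPLE = [
    "LBCM-L2,brcm_port_init_def(),1110:(brcm_port_init_def:1110) "
    "Setting L2 Learn (unit 0, port_num 3), learn_flg 0",
    "LBCM-L2,brcm_ifl_l2_init(),2300:(brcm_ifl_l2_init:2300) "
    "Setting port learning config :ge-0/0/2, learn_flg 5, move_flg 13",
    "LBCM-L2,brcm_port_learning_config(),1259:(brcm_port_learning_config:1259) "
    "Setting L2 learning unit:0,port_num:3, learn_flg 5",
]

if __name__ == "__main__":
    for func, flag in parse_learn_events(SAMPLE):
        print(f"{func}: learn_flg={flag}")
```

Run against a saved copy of the log file, this lists every point at which the learning flags were rewritten, which can then be lined up with the time the traffic stopped.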

 

Kind Regards

ehsab

7 REPLIES

Re: QFX VC suddenly stops handling packets

‎04-22-2020 07:29 PM
Hello ehsab,

Changing the MTU on an interface is a disruptive process that causes the interface address to be reset, so I don't believe these log messages offer much of a clue; they are probably expected on any QFX when you change the MTU. Also check whether the interface itself flaps after the change. You may just be masking the problem, which is temporarily cleared by this reset, unless there is a technical reason that correlates with MTU, such as large packets being dropped.

It's better to spend a few minutes troubleshooting while the issue is occurring to get to the bottom of this. Some questions that come to mind are:
a) Check if there are protocol adjacencies over the impacted trunks and if they remain stable.
b) Check if all traffic is impacted or some? When you say VC stops handling packets, does it mean there's ingress but no egress?
c) Check and clear interface statistics - for broadcast packet count, errors if any.
d) Check if MAC learning works as expected:
show ethernet-switching table
show ethernet-switching mac-learning-log <<<<<<<<<<<
e) Do we have STP enabled? Check if there are frequent topology changes.
Etc.

Hope this helps.

Regards,
-r.


Re: QFX VC suddenly stops handling packets

‎04-23-2020 12:16 AM

Hi, and thanks for taking the time.

I'll share the information I gathered in both cases.

 

Case 1

> show configuration interfaces ge-0/0/24
mtu 1600;
unit 0 {
    family ethernet-switching {
        interface-mode trunk;
        vlan {
            members [ v804 v805 ];
        }
    }
}

Traffic stops for no obvious reason, no mac-addresses are learnt on that port. The interface has no errors and has not flapped.

> show ethernet-switching interface ge-0/0/24.0
Routing Instance Name : default-switch

Logical          Vlan          TAG     MAC         STP         Logical           Tagging
interface        members               limit       state       interface flags
ge-0/0/24.0                            294912                                     tagged
                 v804          804     294912      Forwarding                     tagged
                 v805          805     294912      Forwarding                     tagged

Looks normal, but no traffic or macs learnt.

Physical interface: ge-0/0/24, Enabled, Physical link is Up
  Interface index: 676, SNMP ifIndex: 573
  Description: 
  Link-level type: Ethernet, MTU: 1600, MRU: 0, Speed: 1000mbps, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Auto-negotiation: Enabled, Remote fault: Online, Media type: Fiber
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Current address: ec:3e:f7:97:8d:db, Hardware address: ec:3e:f7:97:8d:db
  Last flapped   : 2020-04-16 08:17:42 UTC (01:37:03 ago)
  Input rate     : 1304 bps (2 pps)
  Output rate    : 1040 bps (1 pps)
  Active alarms  : None
  Active defects : None
  Interface transmit statistics: Disabled
 
  Logical interface ge-0/0/24.0 (Index 591) (SNMP ifIndex 574)
    Flags: Up SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Input packets : 0
    Output packets: 0
    Protocol eth-switch, MTU: 1600
      Flags: Trunk-Mode

[Attached graphs for ge-0/0/24: bps, non-unicast, packets]

 

 

Case 2

Traffic drops at 20:48:37 (hh:mm:ss).

> show configuration interfaces ae0
description "Trunk to sw01";
aggregated-ether-options {
    minimum-links 1;
    lacp {
        active;
    }
}
unit 0 {
    family ethernet-switching {
        interface-mode trunk;
        vlan {
            members [ v357 v361 v362 v364 v365 v356 v159 ];
        }
    }
}
> show ethernet-switching interface ae0.0
Routing Instance Name : default-switch

Logical          Vlan          TAG     MAC         STP         Logical           Tagging
interface        members               limit       state       interface flags
ae0.0                                  294912                                     tagged
                 v159          159     294912      Forwarding                     tagged
                 v356          356     294912      Forwarding                     tagged
                 v357          357     294912      Forwarding                     tagged
                 v361          361     294912      Forwarding                     tagged
                 v362          362     294912      Forwarding                     tagged
                 v364          364     294912      Forwarding                     tagged
                 v365          365     294912      Forwarding                     tagged

There is OSPF/BGP running over VLAN 159 with an irb interface; I see no OSPF neighbor and the BGP session is down.

No MAC addresses are seen on the port.

Flapping the interface has no effect.

At 22:52:08 I changed the MTU from 1514 to 1516 and did a commit confirmed 4 sync. During those 4 minutes nothing changed: still no traffic. But when the VC rolled back the commit, the traffic flowed fine again.

 

[Attached graphs for ae0: bps, non-unicast, packets]

 

STP is globally disabled.

Any ideas what could cause this, or what to look for when it happens? Is there any particular logfile that I should look at or search in?

 

Kind Regards

ehsab


Re: QFX VC suddenly stops handling packets

‎04-23-2020 05:44 AM

Hello ehsab,

 

You can keep "set system syslog file messages any any" configured to see whether the device logs anything at the time. However, from your notes, it seems as though traffic is not even coming in on this interface, is that right? If so, there are a lot of possibilities here. In that state the L3 protocols cannot be up and you won't learn any MAC addresses on the port. So it is better to take a copy of the MAC table ("show ethernet-switching table") in the working state and compare it with the non-working state.
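
The comparison of the two MAC table snapshots can be automated. This is a minimal sketch, assuming each table row contains one MAC address and one logical interface name such as ge-0/0/24.0 or ae0.0; the exact column layout of "show ethernet-switching table" varies between Junos releases, so the regular expressions, the function names, and the sample rows (which use made-up documentation MAC addresses) are all illustrative assumptions.

```python
import re
from collections import defaultdict

# A MAC address in colon-separated form.
MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)
# A logical interface such as ge-0/0/24.0 or ae0.0 (assumed naming).
IF_RE = re.compile(r"\b(?:[a-z]+-\d+/\d+/\d+|ae\d+)\.\d+\b")


def macs_by_interface(table_text):
    """Map logical interface name -> set of MAC addresses seen on it."""
    result = defaultdict(set)
    for line in table_text.splitlines():
        mac = MAC_RE.search(line)
        ifname = IF_RE.search(line)
        if mac and ifname:
            result[ifname.group(0)].add(mac.group(0).lower())
    return result


def missing_macs(working_text, broken_text):
    """Return MACs present in the working snapshot but absent in the broken one."""
    working = macs_by_interface(working_text)
    broken = macs_by_interface(broken_text)
    missing = {}
    for ifname, macs in working.items():
        gone = macs - broken.get(ifname, set())
        if gone:
            missing[ifname] = sorted(gone)
    return missing


# Hypothetical snapshots for illustration only; not real output.
WORKING = """\
v804  00:00:5e:00:53:01  D  -  ge-0/0/24.0
v805  00:00:5e:00:53:02  D  -  ge-0/0/24.0
v159  00:00:5e:00:53:03  D  -  ae0.0
"""
BROKEN = """\
v159  00:00:5e:00:53:03  D  -  ae0.0
"""

if __name__ == "__main__":
    for ifname, macs in missing_macs(WORKING, BROKEN).items():
        print(ifname, macs)
```

Saving the command output to two text files, one per state, and feeding them to `missing_macs` shows at a glance whether only the affected port lost its entries or whether other ports are involved as well.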

 

Also, you've shown two different interfaces altogether (ae0 and ge-0/0/24). Does "ae0" have links in both FPCs, and are both interfaces expected to be uplinks, or do they connect to completely different network segments? As far as I can tell, the data so far doesn't give much to go on.

 

Hope this helps.

Regards,
-r.


Re: QFX VC suddenly stops handling packets

‎04-23-2020 05:45 AM

Hello Ehsab,

 

It would be good to open a case with JTAC. Those kinds of issues require some live troubleshooting. I suspect several things, but in the end a troubleshooting session is a better approach than just assuming things.

 

Those kinds of situations are most often related to memory leaks, high CPU, DDoS policers, file system corruption or a misprogramming issue.

 

Possible workarounds could be a mastership change, a reboot of the affected member or all members in a virtual-chassis, an upgrade or a format install.  However, you will need some troubleshooting to determine the root cause.

 

 

Regards,

 

 

Randall


Re: QFX VC suddenly stops handling packets

‎04-23-2020 12:19 PM

@mriyaz

I'm not sure if there is any traffic coming in to the interfaces when this happens. I will set up a mirror port so that I can troubleshoot this better and see what is actually on the interface next time.

I did take a copy of the MAC table when the problem was active and compared it with the table once it was "fixed"; it was only missing MAC addresses on that port.

ge-0/0/24 is connected to a different segment than ae0, and ae0 only has one member interface (from the same FPC as ge-0/0/24) at the moment.

There must be some logic to why this has happened to two different trunks/interfaces. There are lots of other interfaces and ae's that could be affected, so I would like to narrow down what's causing this.

There surely must be some logfiles I could search in to find more information?

What happens in the switch/interface when the MTU is changed?


Re: QFX VC suddenly stops handling packets

‎04-23-2020 12:22 PM

@randero

Hi and thanks for your reply.

I don't have a valid support contract on the switches, so I'm fairly sure that JTAC won't handle the case.

The workarounds you suggest are last resort for me 🙂

 

Kind Regards

ehsab

 


Re: QFX VC suddenly stops handling packets

‎04-24-2020 05:45 AM

Hi Ehsab,

 

There are some outputs you may check to find a root cause.

 

>show system processes extensive | except 0.00

>show chassis fpc 

>show system core-dumps

>request pfe execute command "show syslog messages" target fpc0 

>show ddos-protection protocols statistics terse

 

 

I'm not sure whether those workarounds are an option for you, but you can try rebooting the devices, upgrading to another version, or doing a recovery install to refresh the file system.

 

Hope this helps 🙂

 

Randall

 
