Junos

EX8208 high CPU load

03-17-2017 12:49 AM

Can anybody help us identify the root cause of the high CPU load on our EX8208?

 

Our EX8208 runs Junos 15.1R5.5. A few weeks ago we had to replace the chassis because of a bad slot. After the replacement (chassis only; all cards and the REs remained the same) we started having problems. First of all, at random intervals it logs these messages:

Mar 17 09:22:17 EX8208-ST19-re1 eswd[19882]: ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL: Bridge Address Add: 80:71:1f:77:52:01 SMAC is equal to bridge mac hence don't learn
Mar 17 09:24:22 EX8208-ST19-re1 eswd[19882]: ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL: Bridge Address Add: 80:71:1f:77:52:01 SMAC is equal to bridge mac hence don't learn
Mar 17 09:26:27 EX8208-ST19-re1 eswd[19882]: ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL: Bridge Address Add: 80:71:1f:77:52:01 SMAC is equal to bridge mac hence don't learn
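
As a rough way to check how regular these messages really are, the timestamps can be pulled out of the log lines and the gaps between them computed. This is only a minimal Python sketch: it assumes the syslog timestamp format shown above, the year (2017) is an assumption taken from the post date because the log lines do not carry one, and the function name `message_intervals` is made up for illustration.

```python
import re
from datetime import datetime

# The three log lines quoted above, copied verbatim.
LOG = """\
Mar 17 09:22:17 EX8208-ST19-re1 eswd[19882]: ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL: Bridge Address Add: 80:71:1f:77:52:01 SMAC is equal to bridge mac hence don't learn
Mar 17 09:24:22 EX8208-ST19-re1 eswd[19882]: ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL: Bridge Address Add: 80:71:1f:77:52:01 SMAC is equal to bridge mac hence don't learn
Mar 17 09:26:27 EX8208-ST19-re1 eswd[19882]: ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL: Bridge Address Add: 80:71:1f:77:52:01 SMAC is equal to bridge mac hence don't learn
"""

# Capture the "Mon DD HH:MM:SS" prefix of lines carrying this message tag.
STAMP = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL"
)


def message_intervals(text, year=2017):
    """Return the gaps, in seconds, between consecutive matching log lines.

    The year is an assumption; syslog lines in this format do not include it.
    """
    times = []
    for line in text.splitlines():
        match = STAMP.match(line)
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(message_intervals(LOG))  # prints [125, 125]
```

For the three lines quoted above the gap is 125 seconds both times. Three lines are far too small a sample to draw conclusions from, so the same check would need to be run over a longer stretch of the log.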

 

We've also noticed a high CPU load: according to the graphs it averages 80%, and the maximum, where it sits most of the time, is 99-100%!

If we check which processes increase CPU load, we see this:

# run show system processes summary
last pid: 20706; load averages: 1.75, 2.36, 2.72 up 8+05:50:26 09:32:11
169 processes: 10 running, 129 sleeping, 30 waiting

Mem: 698M Active, 69M Inact, 246M Wired, 396M Cache, 112M Buf, 574M Free
Swap:


  PID USERNAME      PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   10 root          155   52     0K    16K RUN    141.5H 47.07% idle
19978 root           76    0     0K    16K RUN     36:41  8.15% peerproxy80000006
19882 root           70    0 67004K 43484K RUN     30:26  5.08% eswd
 6778 root           64    0   151M 99104K RUN    229:54  3.71% rpd
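
To turn a snapshot like this into numbers that can be compared between switches or over time, the WCPU column can be parsed and the busy share derived from the idle row. The following is a minimal Python sketch, not a Juniper tool: it assumes the ten-column layout shown above, and `parse_wcpu` is a hypothetical helper name.

```python
# Process rows copied from the output above.
OUTPUT = """\
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
10 root 155 52 0K 16K RUN 141.5H 47.07% idle
19978 root 76 0 0K 16K RUN 36:41 8.15% peerproxy80000006
19882 root 70 0 67004K 43484K RUN 30:26 5.08% eswd
6778 root 64 0 151M 99104K RUN 229:54 3.71% rpd
"""


def parse_wcpu(text):
    """Return {command: WCPU percent} for every process row in the text."""
    usage = {}
    for line in text.splitlines():
        # Split into at most 10 fields so a command containing spaces
        # stays in one piece.
        fields = line.split(None, 9)
        # Process rows start with a numeric PID; this skips the header.
        if len(fields) == 10 and fields[0].isdigit():
            usage[fields[9]] = float(fields[8].rstrip("%"))
    return usage


usage = parse_wcpu(OUTPUT)
idle = usage.pop("idle")
busy = round(100.0 - idle, 2)
top = sorted(usage.items(), key=lambda item: item[1], reverse=True)

print(f"busy: {busy}%")  # prints busy: 52.93%
for name, pct in top:
    print(f"{pct:6.2f}%  {name}")
```

The three processes listed add up to about 17%, well short of the 52.93% busy figure, because only the top rows of the output were pasted here. The rest of the load is spread over processes that are not shown.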

 

The rpd load I understand, and it is fine, but why does eswd use so much CPU? We have no problems with STP (we actually use MSTP), the last topology change was about 4-5 hours ago, and there is no MAC flapping (apart from the log messages above). Why does this process keep driving the CPU load up?

Also, what is the peerproxy80000006 process?

With SNMP enabled we have just 10% idle; if we deactivate SNMP, idle rises to 45%. Is that normal?

 

We also have a problem with multicast traffic flowing through this EX (the EX runs PIM with our RP, and IGMP snooping is enabled downstream towards the clients). When a host joins a multicast group, the EX does not forward the traffic to the client's port, even though the EX receives the host's IGMP join messages and we can see the client requesting that group on port ge-x/x/x. It only starts forwarding the multicast traffic after about 2-5 minutes. This is probably caused by the high CPU load, because at night, when the CPU load drops to 40-70%, it works fine.

 

 

Also, does anybody know whether a CPU load of 40-60% on an EX8208 is OK? We have a lot of EX8208s in our network, and most of them have just 35-45% idle. Still, it depends on how many clients are attached to the EX.

I've attached the graphs showing the CPU load on the EX that is causing us problems.

8 REPLIES

Re: EX8208 high CPU load

03-17-2017 09:11 AM

Do you have a layer 2 loop? Do you see any unexpected duplication of MAC addresses on ports? Do you see yourself as an LLDP neighbor?

Re: EX8208 high CPU load

03-17-2017 09:45 AM

1) No, there is no layer 2 loop. If there were one, the EX would log messages such as "Duplicate MAC address received" or "MAC flapping between two ports". We also have MSTP enabled on all ports, and with a layer 2 loop we would see topology changes every few seconds or minutes, or the STP process looping.

 

2) We don't see any strange logs from the EX; the only one is:

"Mar 17 18:40:04 EX8208-ST19-re1 eswd[19882]: ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL: Bridge Address Add: 80:71:1f:77:52:01 SMAC is equal to bridge mac hence don't learn
Mar 17 18:40:37 EX8208-ST19-re1 eswd[19882]: ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL: Bridge Address Add: 80:71:1f:77:52:01 SMAC is equal to bridge mac hence don't learn"

If there were duplicate MAC addresses, the EX would log them. We can't monitor this directly, because this EX has 6 EX8200-48F FPCs and most of the ports are active, so it is a big layer 2 network. But according to the logs, there is no duplication of MAC addresses.

 

3) We don't actually have LLDP enabled, but I enabled it as a test, and I don't see myself in the LLDP neighbors table.

 

 

Do you know what the peerproxy80000006 process is? I always see it in the RUN state with 7-10% CPU load.

 

 

# run show system processes summary 
last pid: 22371; load averages: 5.43, 4.30, 4.07 up 8+15:03:10 18:44:55
164 processes: 12 running, 121 sleeping, 31 waiting
Mem: 689M Active, 68M Inact, 247M Wired, 395M Cache, 112M Buf, 584M Free
Swap:

  PID USERNAME      PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   10 root          155   52     0K    16K RUN    145.0H 35.16% idle
19978 root           76    0     0K    16K RUN     94:52 10.74% peerproxy80000006
19882 root           76    0 67004K 43876K RUN     65:15  5.03% eswd
19890 root           76    0 36136K 16716K RUN     43:07  5.03% mcsnoopd
 6783 root           73    0 55984K 25648K RUN     45:41  3.66% chassisd

 

Re: EX8208 high CPU load

03-17-2017 07:43 PM

Hi folks,

I found a reference document which states that the messages below are harmless.

 

ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL

In this case the message is categorized as harmless and may be triggered when the management interface and one of the RVIs on the switch belong to the same network range.

 

https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR442818

 

So we have to concentrate on the other observations...

 

-Python JNCIP-DC|ENT|SP CCIP JNCDA ITIL

Re: EX8208 high CPU load

03-18-2017 02:04 AM

Thanks, Python, for your answer.

 

I've checked, and that is not the case here:

> show configuration interfaces | display set | match 172.16.
set interfaces lo0 unit 10 family inet address 172.16.178.19/32
set interfaces me0 unit 0 family inet address 172.16.1.19/24 master-only
set interfaces vlan unit 92 family inet address 172.16.216.254/24
set interfaces vlan unit 93 family inet address 172.16.215.254/24
set interfaces vlan unit 96 family inet address 172.16.201.254/24
set interfaces vlan unit 98 family inet address 172.16.213.254/24
set interfaces vlan unit 99 family inet address 172.16.214.254/24
set interfaces vlan unit 100 family inet address 172.16.64.129/26
set interfaces vlan unit 120 family inet address 172.16.116.1/24
set interfaces vlan unit 121 family inet address 172.16.180.77/30
set interfaces vlan unit 174 family inet address 172.16.118.254/24
set interfaces vlan unit 911 family inet address 172.16.180.62/30
set interfaces vlan unit 919 family inet address 172.16.180.6/30

The IP address on the me0 interface does not overlap with anything; it is in a different subnet from the IP addresses on the other interfaces.
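
The same check can be done mechanically rather than by eye. Below is a minimal Python sketch using the standard `ipaddress` module with the addresses from the configuration above; the interface labels used as dictionary keys and the helper name `overlapping` are just for illustration.

```python
import ipaddress

# Management address (me0 unit 0) from the configuration above.
MGMT = "172.16.1.19/24"

# All other configured addresses in 172.16.x.x from the same output.
OTHERS = {
    "lo0.10": "172.16.178.19/32",
    "vlan.92": "172.16.216.254/24",
    "vlan.93": "172.16.215.254/24",
    "vlan.96": "172.16.201.254/24",
    "vlan.98": "172.16.213.254/24",
    "vlan.99": "172.16.214.254/24",
    "vlan.100": "172.16.64.129/26",
    "vlan.120": "172.16.116.1/24",
    "vlan.121": "172.16.180.77/30",
    "vlan.174": "172.16.118.254/24",
    "vlan.911": "172.16.180.62/30",
    "vlan.919": "172.16.180.6/30",
}


def overlapping(mgmt, others):
    """Return the names of interfaces whose subnet overlaps the mgmt subnet."""
    mgmt_net = ipaddress.ip_interface(mgmt).network
    return [
        name
        for name, addr in others.items()
        if ipaddress.ip_interface(addr).network.overlaps(mgmt_net)
    ]


print(overlapping(MGMT, OTHERS))  # prints []
```

An empty list means none of the loopback or RVI subnets overlaps 172.16.1.0/24, which matches the manual check.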

 

Any other ideas?

Re: EX8208 high CPU load

03-18-2017 04:35 AM

Hello,

 

What are the peer devices of the EX8208?

Is the EX8208 standalone or part of a VC?

 

Also, is there any correlation between the appearance of the 'ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL' log message and the CPU spikes (from 80 to 99)?

 

And about 'peerproxy80000006':

 

The peer proxy is a kernel thread which gets created when a new peer comes up. Thereafter it is responsible for receiving/sending any updates from/to its peer. 

 

Regards,

 

Rushi

Re: EX8208 high CPU load

03-20-2017 01:18 AM

Hi Rushi,

 

Thanks for your answer; I finally understand what the peerproxy process is :)

 

The EX is standalone. It has 2 OSPF and 2 PIM sessions with 2 Cisco C6509s (they act as RPs for PIM and are configured with Anycast RP). Only a few networks, about 10-15, are announced via OSPF from both sides, and there are no other protocols that could establish an adjacency. We have about 7 more EX8208s with the same topology that work fine, with a CPU load of 40-50%.

The EX has 2x10G uplinks to a PE router; no routing protocols run on them, there is only 1 default route towards the router.

We also have about 6 EX8200-48F FPCs (48x 100Base-FX/1000Base-X). On most of the cards 100% of the ports are in use, and on just 1 of them about 50% of the ports are in use. On every port we have 1-3 access switches (Cisco 2950) connected in a ring to 2 different cards of the EX8208 (for redundancy), and we run MSTP on the EX and the C2950s.

I've attached our topology to make it clear.

 

"ESWD_MAC_SMAC_BRIDGE_MAC_IDENTICAL" appears every 30 seconds to 2 minutes. There are not many spikes to 100% load, but during peak hours it is always at 90-100%. We disabled SNMP and the CPU load dropped a little, but it is still too high. I've attached one more graph, where you can see the CPU load for the last few days.

 

Does anybody have any ideas?

Re: EX8208 high CPU load

03-21-2017 01:52 PM

I forgot to say that we have nonstop routing enabled on the EX. I've just noticed that when I reboot the backup Routing Engine, the peerproxy process disappears and the CPU load drops below 40%; I see about 50-65% idle.

Is this a problem? Or is this how it is supposed to work with nonstop routing enabled?

Re: EX8208 high CPU load

03-22-2017 12:53 AM

Today I've deactivated graceful-switchover and CPU load went down to 40%:

# show chassis 
inactive: redundancy {
    graceful-switchover;
}

And now my CPU has about 50% idle:

# run show system processes summary    
last pid:  4383;  load averages:  1.27,  2.38,  2.84  up 0+11:13:27    09:52:27
166 processes: 9 running, 127 sleeping, 30 waiting

Mem: 665M Active, 62M Inact, 207M Wired, 356M Cache, 112M Buf, 693M Free
Swap:


  PID USERNAME      PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   10 root          155   52     0K    16K RUN    339:42 53.03% idle
 2698 root           55    0 48488K 31740K RUN     22:54  6.30% mib2d
 2673 root           59    0 67004K 43784K RUN     28:42  5.03% eswd
 1236 root           49    0 56712K 26316K RUN     19:25  4.69% chassisd
 2070 root           51    0 89344K 34396K RUN     14:08  2.05% rpd
 2678 root           50    0 36136K 16016K RUN      9:30  1.95% mcsnoopd
   40 root          -84 -187     0K    16K RUN     13:33  1.17% irq38: tsec3

Is it normal to have a high CPU load with graceful-switchover enabled?