
High CPU load on FPC with little traffic

‎03-08-2020 01:33 PM

Hello,

 

On our MX240 running Junos 17.3R3.10 with 2x MPC 3D 16x 10GE, FPC2 is nearly always at 100% CPU load and FPC1 is around 80%. We only have about 1.2 GB/s of traffic on this device.

 

I already disabled traffic sampling and rpf-checks, but it didn't help.

show chassis fpc    
                     Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
  0  Empty           
  1  Online            46     80         23       71     72     75    2048       36         29
  2  Online            46    100         26       96     92     94    2048       36         43

 

request pfe execute target fpc2 command "show threads"    
SENT: Ukern command: show threads

PID PR State     Name                   Stack Use  Time (Last/Max/Total) cpu
--- -- -------   ---------------------  ---------  ---------------------
  1 H  asleep    Maintenance            304/2048   0/0/2 ms  0%
  2 L  running   Idle                   320/2048   0/0/2701361988 ms 16%
  3 H  asleep    Timer Services         312/2056   0/0/1387505 ms  0%
  5 L  asleep    Ukern Syslog           312/4096   0/0/0 ms  0%
  6 L  asleep    Heap Accouting         496/4104   0/1/32472580 ms  0%
  7 L  asleep    Sheaf Background       400/2056   0/0/24175 ms  0%
  8 H  ready     IGMP                  1248/16384  0/66/3626921 ms  0%
  9 H  asleep    IPv4 PFE Control Background   296/8200   0/0/0 ms  0%
 10 M  asleep    OTN                    376/8200   0/0/115551 ms  0%
 11 M  asleep    GR253                  408/4096   0/0/107172 ms  0%
 12 M  asleep    CXP                    424/4104   0/0/103846 ms  0%
 13 M  asleep    QSFP                   536/4096   0/0/148381 ms  0%
 14 M  asleep    DCC Background         280/4096   0/0/0 ms  0%
 15 M  asleep    DSX50ms                328/4104   0/0/2179498 ms  0%
 16 M  asleep    DSXonesec              320/4096   0/0/106497 ms  0%
 17 M  asleep    mac_db                 296/8192   0/0/0 ms  0%
 18 M  asleep    RSMON syslog thread   2424/4104   0/35/12051441 ms  0%
 19 M  asleep    MSA300PIN              376/4096   0/0/17077 ms  0%
 20 M  asleep    CFP                    448/4104   0/0/111144 ms  0%
 21 M  asleep    XFP                    424/4096   0/0/106951 ms  0%
 22 M  asleep    SFP                   1856/4096   0/34/38453557 ms  0%
 23 L  asleep    Firmware Upgrade       320/4104   0/0/0 ms  0%
 24 L  asleep    Syslog                1072/4104   0/0/64668 ms  0%
 25 M  asleep    50ms Periodic          224/8192   0/0/0 ms  0%
 26 M  asleep    100ms Periodic         224/8200   0/0/0 ms  0%
 27 M  asleep    1s Medium Periodic     632/8192   0/0/1458605 ms  0%
 28 M  asleep    10s Medium Periodic   1536/8200   0/0/12421 ms  0%
 29 L  asleep    1s Low Periodic        896/8192   0/0/792266 ms  0%
 30 L  asleep    10s Low Periodic       320/8200   0/0/181667 ms  0%
 31 M  asleep    TTRACE Creator         360/4104   0/0/0 ms  0%
 32 M  asleep    TTRACE Tracer          432/4096   0/0/0 ms  0%
 33 L  asleep    LKUP ASIC Wedge poll thread   696/4104   0/0/5695833 ms  0%
 34 L  asleep    TOE Coredump           408/4104   0/0/0 ms  0%
 35 L  asleep    L2PD                   392/4096   0/0/2168328 ms  0%
 36 L  asleep    PQ3 PCI Periodic      1472/8192   0/0/77768 ms  0%
 37 M  asleep    Host Loopback Periodic   448/8200   0/0/438680 ms  0%
 38 M  asleep    HSL2                   944/4096   0/1/33411471 ms  0%
 39 H  asleep    TCP Timers            1544/8200   0/0/3662812 ms  0%
 40 H  asleep    TCP Receive           1760/8192   0/1/246461063 ms  1%
 41 H  asleep    TNP Hello              504/2048   0/0/523088 ms  0%
 42 M  asleep    UDP Input              344/2048   0/0/3876 ms  0%
 43 H  asleep    TTP Receive           1896/4096   0/1/224172092 ms  1%
 44 H  asleep    TTP Transmit          1528/4104   0/9/2452163527 ms 15%
 45 H  asleep    RDP Timers             208/4096   0/0/0 ms  0%
 46 H  asleep    RDP Input              280/2056   0/0/0 ms  0%
 47 M  asleep    RCM Pfe Manager        824/4104   0/0/1 ms  0%
 48 L  asleep    CLNS Err Input         280/4096   0/0/0 ms  0%
 49 L  asleep    CLNS Option Input      280/4104   0/0/0 ms  0%
 50 H  asleep    L2TP-SF KA Transmit    296/4096   0/0/0 ms  0%
 51 M  asleep    RPM Msg thread         368/8200   0/0/0 ms  0%
 52 M  asleep    RFC2544 periodic       456/8192   0/0/104912 ms  0%
 53 H  asleep    Pfesvcsor              592/8192   0/0/8816419 ms  0%
 54 M  asleep    PIC Periodic          1736/8192   0/1/141307810 ms  0%
 55 M  asleep    PIC                    208/4104   0/0/0 ms  0%
 56 M  asleep    TNPC CM               8576/16384  0/494/67588052 ms  0%
 57 M  asleep    CLKSYNC Manager       1616/8200   0/0/19812 ms  0%
 58 M  asleep    RDMAN                 1536/4096   0/0/4911 ms  0%
 59 H  asleep    CFM Manager           1384/32776  0/0/9086111 ms  0%
 60 M  asleep    CFM Data thread       1384/8192   0/0/0 ms  0%
 61 M  asleep    PPM Manager           4760/8200   0/0/65659 ms  0%
 62 M  ready     PPM Data thread       1712/16392  0/0/15890963 ms  0%
 63 L  asleep    IFCM                  1640/4096   0/0/6453 ms  0%
 64 M  asleep    VRRP Manager          1440/8200   1/1/1 ms  0%
 65 M  asleep    L2ALM Manager         2256/8192   0/0/147167 ms  0%
 67 L  asleep    ICMP6 Input           1576/4104   0/1/4925313 ms  0%
 68 L  asleep    IP6 Option Input      1400/4096   0/0/0 ms  0%
 69 L  asleep    ICMP Input            1104/4096   0/1/14828845 ms  0%
 70 L  asleep    IP Option Input       1384/4104   0/0/191 ms  0%
 71 M  asleep    IGMP Input            1384/4096   0/0/0 ms  0%
 72 L  asleep    DFW Alert              688/4104   0/0/91414 ms  0%
 73 L  asleep    cos halp stats daemon  1296/4104   0/1/28315712 ms  0%
 74 L  asleep    NH Probe Service       304/4096   0/0/96 ms  0%
 75 L  asleep    IPC Test Daemon        672/2056   0/0/0 ms  0%
 76 M  asleep    PFE Manager           9760/32776  0/82/3007834430 ms 18%
 77 L  asleep    PFEMAN Service Thread  1176/16384  0/0/0 ms  0%
 78 L  asleep    PFEMAN SRRD Thread     504/16392  0/0/8291 ms  0%
 79 H  asleep    SNTP Daemon           1488/8200   0/0/6421 ms  0%
 81 L  asleep    Console               2224/16384  0/0/0 ms  0%
 82 L  asleep    Console               2224/16392  0/0/0 ms  0%
 83 M  asleep    PFE Statistics        4392/16384  0/2/35420061 ms  0%
 84 L  asleep    VBF Walker             352/16384  0/0/0 ms  0%
 85 L  asleep    VBF MC Purge           256/8200   0/0/0 ms  0%
 86 M  asleep    PZARB Timeout          336/4104   0/0/0 ms  0%
 87 L  asleep    LU Background Service  1664/4104   0/1/708882863 ms  4%
 88 L  ready     LKUP ASIC UCODE Rebalance Service  1760/4096   1/8/182377472 ms  1%
 89 M  asleep    MQ Chip                800/4104   0/0/569788 ms  0%
 90 L  asleep    MQ Chip Stats          640/4096   0/1/75520530 ms  0%
 91 M  asleep    PZARB Timeout          336/4096   0/0/0 ms  0%
 92 M  asleep    MQ Chip                584/4096   0/0/575772 ms  0%
 93 L  asleep    MQ Chip Stats          640/4104   0/1/75106159 ms  0%
 94 M  asleep    PZARB Timeout          336/4104   0/0/0 ms  0%
 95 M  asleep    MQ Chip                800/4096   0/0/573469 ms  0%
 96 L  asleep    MQ Chip Stats          640/4104   0/1/75075026 ms  0%
 97 M  asleep    PZARB Timeout          320/4104   0/0/0 ms  0%
 98 M  asleep    MQ Chip                584/4104   0/0/570315 ms  0%
 99 L  asleep    MQ Chip Stats          640/4096   0/1/74962658 ms  0%
100 M  asleep    Cassis Free Timer     1024/4104   0/4/1608653986 ms 10%
101 M  asleep    JNH Partition Mem Recovery  1080/4096   0/1/872710 ms  0%
102 M  asleep    LU-CNTR Reader         392/8200   0/0/4005 ms  0%
103 M  asleep    Stats Page Ager        384/8200   0/0/15808 ms  0%
104 H  asleep    Cube Server           1392/4104   0/0/484073 ms  0%
105 L  asleep    IP Reassembly         2224/4096   1/1/2814085 ms  0%
106 M  asleep    Services TOD          1192/4096   0/0/2819106 ms  0%
107 M  asleep    Trap_Info Read PFE 0.0   704/4104   0/0/60620 ms  0%
108 M  asleep    Services TOD          1192/4104   0/0/2817964 ms  0%
109 M  asleep    Trap_Info Read PFE 1.0   704/4104   0/0/55583 ms  0%
110 M  asleep    Services TOD          1192/4096   0/0/2817119 ms  0%
111 M  asleep    Trap_Info Read PFE 2.0   704/4096   0/0/56795 ms  0%
112 M  asleep    Services TOD          1192/4104   0/0/2847280 ms  0%
113 M  asleep    Trap_Info Read PFE 3.0   704/4104   0/0/96236 ms  0%
114 L  asleep    JNH Exception Counter Background Thread  1640/4096   0/2/4613397 ms  0%
115 L  asleep    DDOS Policers         2560/4096   0/4/80878031 ms  0%
116 L  asleep    jnh errors daemon      376/4104   0/0/15745 ms  0%
117 L  asleep    JNH KA Transmit       1504/4104   0/0/474769 ms  0%
118 L  asleep    VBF PFE Events         352/4104   7/7/7 ms  0%
119 M  asleep    bulkget Manager       4920/8192   0/1/34584727 ms  0%
120 M  asleep    PRECL Chip Generic     488/4096   0/0/182022 ms  0%
121 M  asleep    PRECL Chip Generic     488/4096   0/0/180806 ms  0%
122 M  asleep    PRECL Chip Generic     488/4104   0/0/176042 ms  0%
123 M  asleep    PRECL Chip Generic     488/4104   0/0/181378 ms  0%
163 L  asleep    Virtual Console        944/32776  0/0/0 ms  0%
167 L  running   Cattle-Prod Daemon    4272/32768  0/0/6 ms  0%
168 L  asleep    Cattle-Prod Daemon    2128/32776  0/0/0 ms  0%

I also noticed these log messages, which appear on both FPCs roughly 10 times per day:

show log messages | grep fpc
Mar 8 18:05:15 fpc2 io_err bus 0 busy timeout
Mar 8 18:05:15 fpc2 Failed to disable PCA9548(0x76)->channel(0-7)
Mar 8 18:05:15 fpc2 i2c_npc_pca8548_cleanup: Failed to disable I2C Mux
Mar 8 18:05:15 fpc2 PQ3_IIC(WR): bus 0 busy timeout
Mar 8 18:05:15 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0xa1, i2c_ctl[0]=0x80, bus_addr=0x76)
Mar 8 18:05:16 fpc2 io_err bus 0 busy timeout
Mar 8 18:05:16 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
Mar 8 18:05:16 fpc2 PQ3_IIC(WR): bus 0 busy timeout, attempting to clear
Mar 8 18:39:03 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 0
Mar 8 18:39:03 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x85, i2c_ctl[0]=0xb0, bus_addr=0x70)
Mar 8 18:39:03 fpc2 Failed to disable PCA9548(0x70)->channel(0-7)
Mar 8 19:03:03 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 1
Mar 8 19:03:03 fpc2 PQ3_IIC(WR): transfer not complete on byte 1
Mar 8 19:03:03 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x21, i2c_ctl[0]=0xb0, bus_addr=0x76)
Mar 8 19:03:03 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
Mar 8 19:48:55 fpc1 PQ3_IIC(RD): bus transfer timeout on byte 0
Mar 8 19:48:55 fpc1 PQ3_IIC(RD): transfer not complete on byte 0
Mar 8 19:48:55 fpc1 PQ3_IIC(RD): I/O error (i2c_stat=0x25, i2c_ctl[0]=0xb0, bus_addr=0x1c)
Mar 8 19:48:55 fpc1 PQ3_IIC(WR): bus arbitration lost on byte 0
Mar 8 19:48:55 fpc1 PQ3_IIC(WR): I/O error (i2c_stat=0x93, i2c_ctl[0]=0x80, bus_addr=0x76)
Mar 8 19:48:55 fpc1 PQ3_IIC(WR): bus 0 busy timeout
Mar 8 19:48:56 fpc1 PQ3_IIC(WR): I/O error (i2c_stat=0xa1, i2c_ctl[0]=0x90, bus_addr=0x76)
Mar 8 19:48:56 fpc1 io_err bus 0 busy timeout
Mar 8 19:48:56 fpc1 Failed to disable PCA9548(0x76)->channel(0-7)
Mar 8 19:48:56 fpc1 i2c_npc_pca8548_cleanup: Failed to disable I2C Mux
Mar 8 19:48:56 fpc1 PQ3_IIC(WR): bus 0 busy timeout, attempting to clear
Mar 8 20:04:01 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 0
Mar 8 20:04:01 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x81, i2c_ctl[0]=0xb0, bus_addr=0x76)
Mar 8 20:04:01 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
Mar 8 20:08:57 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 1
Mar 8 20:08:57 fpc2 PQ3_IIC(WR): transfer not complete on byte 1
Mar 8 20:08:57 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x21, i2c_ctl[0]=0xb0, bus_addr=0x76)
Mar 8 20:08:57 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
Mar 8 20:21:53 fpc2 PQ3_IIC(RD): bus transfer timeout on byte 0
Mar 8 20:21:53 fpc2 PQ3_IIC(RD): transfer not complete on byte 0
Mar 8 20:21:53 fpc2 PQ3_IIC(RD): I/O error (i2c_stat=0x25, i2c_ctl[0]=0xb0, bus_addr=0x1c)
Mar 8 20:21:54 fpc2 PQ3_IIC(WR): bus arbitration lost on byte 0
Mar 8 20:21:54 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x93, i2c_ctl[0]=0x80, bus_addr=0x76)
Mar 8 20:21:54 fpc2 PQ3_IIC(WR): bus 0 busy timeout
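To get a rough count of how often these I2C errors occur per FPC and per day, here is a minimal sketch (assuming the messages file is exported to a local file; "messages.txt" is just a placeholder name):

#!/usr/bin/env python3
# Count I2C-related error lines per day and per FPC from a saved copy of the
# syslog. "messages.txt" is a placeholder filename.
import re
from collections import Counter

i2c = re.compile(r'\b(fpc\d+)\b.*\b(PQ3_IIC|PCA9548|I2C|io_err)\b')
counts = Counter()

with open("messages.txt") as fh:
    for line in fh:
        m = i2c.search(line)
        if m:
            day = " ".join(line.split()[:2])   # e.g. "Mar 8"
            counts[(day, m.group(1))] += 1

for (day, fpc), n in sorted(counts.items()):
    print(f"{day} {fpc}: {n} I2C error lines")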

Any ideas how to track this down? Maybe a software bug?


Re: High CPU load on FPC with little traffic

‎03-08-2020 06:45 PM

Hi, it looks like these messages match PR1374450.


Re: High CPU load on FPC with little traffic

‎03-09-2020 02:20 AM

Thank you. And what about the high CPU usage?

We have 100% CPU usage on FPC2; how can this be with so little traffic?

 

Disabling firewall filters and rpf-checks didn't push the CPU usage down. The problem is that our SNMP graphs have gaps, because the FPC stops responding to SNMP polling.


Re: High CPU load on FPC with little traffic

‎03-09-2020 11:20 AM

Hi Freemind

 

I hope you are doing great,

 

Please do:

 

>monitor traffic interface irb no-resolve size 1500

 

Do not specify the IRB unit number (capture on the bare "irb" interface).

 

Check what kind of traffic is hitting the RE and make sure it is legitimate. You can also share the output here if possible, and I can provide some feedback.
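If the capture gets long, a rough sketch to summarize it by source address (assumes you paste the monitor traffic output into a local file, here "re-capture.txt", and that the lines contain tcpdump-style "IP src > dst:" fragments; adjust the regex to your output):

#!/usr/bin/env python3
# Summarize a saved "monitor traffic" capture by source address so the hosts
# hitting the RE hardest stand out. "re-capture.txt" is a placeholder filename.
import re
from collections import Counter

flow = re.compile(r'\bIP6?\s+(\S+)\s+>\s+(\S+?):')
talkers = Counter()

with open("re-capture.txt") as fh:
    for line in fh:
        m = flow.search(line)
        if m:
            talkers[m.group(1)] += 1

for src, count in talkers.most_common(20):
    print(f"{count:8d}  {src}")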

 

Warm regards!

Pablo Restrepo -


Re: High CPU load on FPC with little traffic

‎03-09-2020 11:50 AM

Hi Freemind,

 

Greetings! Are you using an MPC2 or MPC5 by any chance? Also, can you share the output of the following commands?

 

From CLI

show interfaces extensive | match "phy|uni|br|mul"   (check if a specific interface shows abnormal utilization)
show chassis routing-engine
show system virtual-memory | no-more
show chassis environment   (check temperatures)
show task io
show system alarms
show chassis alarms
show system core-dumps
request pfe execute target fpc2 command "show halp-pkt pkt-stats"
show system processes extensive all-members | except 0.00   (check which process is going high; take 3 to 5 iterations)
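If it helps, here is a rough sketch that gathers most of the show outputs above in one file over SSH, e.g. to attach to a case (assumes key-based SSH CLI access; the hostname and output filename are placeholders; add or remove commands as needed):

#!/usr/bin/env python3
# Run a list of Junos CLI show commands over SSH and write them to one file.
import subprocess

ROUTER = "mx240"    # placeholder hostname
COMMANDS = [
    'show chassis fpc',
    'show chassis routing-engine',
    'show system virtual-memory | no-more',
    'show chassis environment',
    'show task io',
    'show system alarms',
    'show chassis alarms',
    'show system core-dumps',
    'request pfe execute target fpc2 command "show halp-pkt pkt-stats"',
]

with open("fpc-cpu-outputs.txt", "w") as out:
    for cmd in COMMANDS:
        result = subprocess.run(["ssh", ROUTER, cmd],
                                capture_output=True, text=True)
        out.write(f"===== {cmd} =====\n{result.stdout}\n")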

 

From shell:

 

Root% rtsockmon -nt (gather the output for 5 minutes or so)
Root% top -H (gather the output for 10 to 20 seconds)

 

If this solves your problem, please mark this post as "Accepted Solution" so we can help others too.

Regards,

Lil Dexx
JNCIE-ENT#863, 3X JNCIP-[SP-ENT-DC], 4X JNCIA [cloud-DevOps-Junos-Design], Champions Ingenius, SSYB

 

 


Re: High CPU load on FPC with little traffic

‎03-09-2020 01:12 PM

Looks like some traffic is congesting the PFE. Please put the firewall filter back with a final discard/syslog term, and check whether you can capture some culprit flows.

 

Other things to look at:

1. show policer   (check for ARP policer drops)

2. DDoS violations, e.g. show ddos-protection protocols violations


Mengzhe Hu
JNCIE x 3 (SP DC ENT)

Re: High CPU load on FPC with little traffic

‎03-09-2020 01:15 PM

Hello Freemind,

 

This definitely looks like a bug. I tried searching for something that could provide a solution but could not find anything; based on the logs, the PR1374450 mentioned earlier may fit, but since that PR has no resolution it is almost like having nothing.

 

Is there any impact on FPC performance? You could try disconnecting all cables on the FPC and see if the usage goes down, then reconnect the ports systematically to verify whether the problem is related to one port or a group of ports, and go from there.

 

If you open a case, the engineers may be able to check the FPC scheduler and see whether a diagnosis can be made.

 

The easiest approach, for me, would be to upgrade the chassis to the latest recommended release and check the results. You could also try the latest 18.x or 19.x code and check the FPC usage.

 

Cheers.

Benjamin

 


Re: High CPU load on FPC with little traffic

‎03-09-2020 03:43 PM

Hello guys,

 

first of all, thanks for your effort, really nice!

 

I tested all your suggestions and most commands didn't show any anomalies. The best find was "rtsockmon -nt", which showed a real spam of route add/delete events in a very short time (1 MB of text in just 10 seconds), so I grepped for one IP:

 

% rtsockmon -nt | grep "1.2.3.4"
[23:10:00:237.935] kernel P route add inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=85960 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
[23:10:03:971.249] kernel P route delete inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=85960 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
[23:10:22:334.759] kernel P route add inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=102202 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
[23:10:26:361.518] kernel P route delete inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=102202 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
[23:11:07:530.544] kernel P route add inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=51565 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
[23:11:11:238.831] kernel P route delete inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=51565 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0

I don't know what is causing this; shouldn't this route stay in place longer without being removed and re-added?

I will try to disable the ports on the FPC tomorrow during a maintenance window. Do you guys have any other ideas?

 

kind regards


Re: High CPU load on FPC with little traffic

‎03-10-2020 02:12 AM

Hi,

 

You can check for any link or neighborship flaps on the device. Such an event would trigger the continuous re-programming of routes we are seeing. This re-programming chokes up the KRT queue and eventually results in high CPU on the FPC. You can monitor the KRT queue status using the commands below:

 

show krt queue

show krt state

 

You would see the route updates pending to be sent to the PFE.
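If you want to watch the queue over time, here is a rough sketch that polls it over SSH and prints only the non-empty queues (assumes key-based SSH access to the CLI; the hostname and the 10-second interval are placeholders):

#!/usr/bin/env python3
# Poll "show krt queue" over SSH and print only queues with pending entries,
# so bursts of route updates waiting for the PFE stand out.
import subprocess
import time

ROUTER = "mx240"    # placeholder hostname
INTERVAL = 10       # seconds between polls

while True:
    out = subprocess.run(["ssh", ROUTER, "show krt queue"],
                         capture_output=True, text=True).stdout
    busy = [line for line in out.splitlines()
            if "queued" in line and not line.rstrip().endswith(" 0 queued")]
    if busy:
        print(time.strftime("%H:%M:%S"))
        print("\n".join(busy))
    time.sleep(INTERVAL)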

 

While you check for neighbor flaps, also check for incorrect routing configuration, such as a static route with a wrong or flapping next hop.

 

 

Vishal


Re: High CPU load on FPC with little traffic

‎03-10-2020 07:51 AM

I did not find any flapping interfaces or BGP sessions.

show krt queue
Routing table add queue: 0 queued
Interface add/delete/change queue: 0 queued
Top-priority deletion queue: 0 queued
Top-priority change queue: 0 queued
Top-priority add queue: 0 queued
high priority V4oV6 tcnh delete queue: 0 queued
high prioriy anchor gencfg delete queue: 0 queued
High-priority multicast add/change: 0 queued
Indirect next hop top priority add/change: 0 queued
Indirect next hop add/change: 0 queued
high prioriy anchor gencfg add-change queue: 0 queued
MPLS add queue: 0 queued
Indirect next hop delete: 0 queued
High-priority deletion queue: 0 queued
MPLS change queue: 0 queued
High-priority change queue: 0 queued
High-priority add queue: 0 queued
Normal-priority indirect next hop queue: 0 queued
Normal-priority deletion queue: 0 queued
Normal-priority composite next hop deletion queue: 0 queued
Low prioriy Statistics-id-group deletion queue: 0 queued
Normal-priority change queue: 0 queued
Normal-priority add queue: 0 queued
Least-priority delete queue: 0 queued
Least-priority change queue: 0 queued
Least-priority add queue: 0 queued
Normal-priority pfe table nexthop queue: 0 queued
EVPN gencfg queue: 0 queued
Normal-priority gmp queue: 0 queued
Routing table delete queue: 0 queued
Low priority route retry queue: 0 queued
show krt state

General state:
Install job is not running
Number of operations queued: 0
Routing table adds: 0
Interface routes: 0
High pri multicast Adds/Changes: 0
Indirect Next Hop Adds/Changes: 0 Deletes: 0
MPLS Adds: 0 Changes: 0
High pri Adds: 0 Changes: 0 Deletes: 0
Normal pri Indirects: 0
Normal pri Adds: 0 Changes: 0 Deletes: 0
GMP GENCFG Objects: 0
Routing Table deletes: 0
Number of operations deferred: 0
Number of operations canceled: 0
Number of async queue entries: 0
Number of async non queue entries: 0
Time until next queue run: 0
Routes learned from kernel: 549437

Routing socket lossage:
Time until next scan: 35

Any other ideas?


Re: High CPU load on FPC with little traffic

‎03-11-2020 09:15 PM

Hello,

Upgrading to the latest recommended release did NOT help. The high CPU load is now on fpc1 instead of fpc2; nothing else changed. Any ideas?

 

show chassis fpc    
                     Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
  0  Empty           
  1  Online            48     99         22       91     74     39    2048       35         24
  2  Online            49     60         13       59     51     27    2048       34         24

Re: High CPU load on FPC with little traffic

‎03-12-2020 02:10 AM

Hi,

I'd suggest following up on your finding of a "real spam of route add/delete in a very short time (1 MB of text in just 10 seconds)". Can you check that output for the top churning routes, find out via which protocol you are learning them, and follow that direction?
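For example, here is a rough sketch to rank the churn by prefix (assuming you save the rtsockmon capture to a local file, here "rtsock.log", and that the lines look like the ones you posted):

#!/usr/bin/env python3
# Rank the prefixes that appear most often in an rtsockmon capture.
# "rtsock.log" is a placeholder filename; the regex matches lines like
# "... route add inet 1.2.3.4 tid=0 plen=32 ...".
import re
from collections import Counter

event = re.compile(r'route (add|delete|change) (inet6?|mpls) (\S+).*?plen=(\d+)')
churn = Counter()

with open("rtsock.log") as fh:
    for line in fh:
        m = event.search(line)
        if m:
            family, addr, plen = m.group(2), m.group(3), m.group(4)
            churn[f"{addr}/{plen} ({family})"] += 1

for prefix, count in churn.most_common(20):
    print(f"{count:8d}  {prefix}")

The top prefixes should point you towards the neighbor or protocol that keeps re-installing them.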

Regards

Ulf

--
If this worked for you please flag my post as an 'Accepted Solution' so others can benefit. A kudo would be cool if you think I earned it.

Re: High CPU load on FPC with little traffic

‎03-12-2020 10:22 AM

Freemind,

 

This looks weird. The upgrade only helped in moving the problem from one FPC to the other. The only thing that comes to my mind is that the problem moved due to a redundancy feature, so maybe the impacting port/interface is now active on the other FPC.

 

One thing: are logs like

fpc1 i2c_npc_pca8548_cleanup: Failed to disable I2C Mux

still showing up in the messages file after the upgrade?

 

As Ulf mentioned, we need to track down the 'real spam' hogging the FPC. For a deep dive JTAC may be the best option, but I also think that checking where the add/delete changes shown in rtsockmon come from should help you resolve the issue. Personally, I would systematically check interfaces as possible causes; it may be dumb and slow, but the problem may come down to a particular port or group of ports.

 

One more thing I wonder: if you shut down FPC1, will the CPU go high on FPC2? Can you move the problematic state between FPCs somehow?

 

Also, at this point you may want to ask JTAC whether they can help with the diagnosis.

 

Good Luck,

Cheers,

Benjamin