Junos OS


Ask questions and share experiences about Junos OS.

High CPU load on FPC with little traffic

  • 1.  High CPU load on FPC with little traffic

    Posted 03-08-2020 13:33

    Hello,

     

    on our MX240 running Junos 17.3R3.10 with 2x MPC 3D 16x 10GE, we have the problem that FPC2 is nearly always at 100% CPU load and FPC1 is around 80%, even though we only have about 1.2 GB/s of traffic on this device.

     

    I already disabled traffic sampling and RPF checks, but it didn't help.

    show chassis fpc    
                         Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
    Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
      0  Empty           
      1  Online            46     80         23       71     72     75    2048       36         29
      2  Online            46    100         26       96     92     94    2048       36         43

     

    request pfe execute target fpc2 command "show threads"    
    SENT: Ukern command: show threads
    
    PID PR State     Name                   Stack Use  Time (Last/Max/Total) cpu
    --- -- -------   ---------------------  ---------  ---------------------
      1 H  asleep    Maintenance            304/2048   0/0/2 ms  0%
      2 L  running   Idle                   320/2048   0/0/2701361988 ms 16%
      3 H  asleep    Timer Services         312/2056   0/0/1387505 ms  0%
      5 L  asleep    Ukern Syslog           312/4096   0/0/0 ms  0%
      6 L  asleep    Heap Accouting         496/4104   0/1/32472580 ms  0%
      7 L  asleep    Sheaf Background       400/2056   0/0/24175 ms  0%
      8 H  ready     IGMP                  1248/16384  0/66/3626921 ms  0%
      9 H  asleep    IPv4 PFE Control Background   296/8200   0/0/0 ms  0%
     10 M  asleep    OTN                    376/8200   0/0/115551 ms  0%
     11 M  asleep    GR253                  408/4096   0/0/107172 ms  0%
     12 M  asleep    CXP                    424/4104   0/0/103846 ms  0%
     13 M  asleep    QSFP                   536/4096   0/0/148381 ms  0%
     14 M  asleep    DCC Background         280/4096   0/0/0 ms  0%
     15 M  asleep    DSX50ms                328/4104   0/0/2179498 ms  0%
     16 M  asleep    DSXonesec              320/4096   0/0/106497 ms  0%
     17 M  asleep    mac_db                 296/8192   0/0/0 ms  0%
     18 M  asleep    RSMON syslog thread   2424/4104   0/35/12051441 ms  0%
     19 M  asleep    MSA300PIN              376/4096   0/0/17077 ms  0%
     20 M  asleep    CFP                    448/4104   0/0/111144 ms  0%
     21 M  asleep    XFP                    424/4096   0/0/106951 ms  0%
     22 M  asleep    SFP                   1856/4096   0/34/38453557 ms  0%
     23 L  asleep    Firmware Upgrade       320/4104   0/0/0 ms  0%
     24 L  asleep    Syslog                1072/4104   0/0/64668 ms  0%
     25 M  asleep    50ms Periodic          224/8192   0/0/0 ms  0%
     26 M  asleep    100ms Periodic         224/8200   0/0/0 ms  0%
     27 M  asleep    1s Medium Periodic     632/8192   0/0/1458605 ms  0%
     28 M  asleep    10s Medium Periodic   1536/8200   0/0/12421 ms  0%
     29 L  asleep    1s Low Periodic        896/8192   0/0/792266 ms  0%
     30 L  asleep    10s Low Periodic       320/8200   0/0/181667 ms  0%
     31 M  asleep    TTRACE Creator         360/4104   0/0/0 ms  0%
     32 M  asleep    TTRACE Tracer          432/4096   0/0/0 ms  0%
     33 L  asleep    LKUP ASIC Wedge poll thread   696/4104   0/0/5695833 ms  0%
     34 L  asleep    TOE Coredump           408/4104   0/0/0 ms  0%
     35 L  asleep    L2PD                   392/4096   0/0/2168328 ms  0%
     36 L  asleep    PQ3 PCI Periodic      1472/8192   0/0/77768 ms  0%
     37 M  asleep    Host Loopback Periodic   448/8200   0/0/438680 ms  0%
     38 M  asleep    HSL2                   944/4096   0/1/33411471 ms  0%
     39 H  asleep    TCP Timers            1544/8200   0/0/3662812 ms  0%
     40 H  asleep    TCP Receive           1760/8192   0/1/246461063 ms  1%
     41 H  asleep    TNP Hello              504/2048   0/0/523088 ms  0%
     42 M  asleep    UDP Input              344/2048   0/0/3876 ms  0%
     43 H  asleep    TTP Receive           1896/4096   0/1/224172092 ms  1%
     44 H  asleep    TTP Transmit          1528/4104   0/9/2452163527 ms 15%
     45 H  asleep    RDP Timers             208/4096   0/0/0 ms  0%
     46 H  asleep    RDP Input              280/2056   0/0/0 ms  0%
     47 M  asleep    RCM Pfe Manager        824/4104   0/0/1 ms  0%
     48 L  asleep    CLNS Err Input         280/4096   0/0/0 ms  0%
     49 L  asleep    CLNS Option Input      280/4104   0/0/0 ms  0%
     50 H  asleep    L2TP-SF KA Transmit    296/4096   0/0/0 ms  0%
     51 M  asleep    RPM Msg thread         368/8200   0/0/0 ms  0%
     52 M  asleep    RFC2544 periodic       456/8192   0/0/104912 ms  0%
     53 H  asleep    Pfesvcsor              592/8192   0/0/8816419 ms  0%
     54 M  asleep    PIC Periodic          1736/8192   0/1/141307810 ms  0%
     55 M  asleep    PIC                    208/4104   0/0/0 ms  0%
     56 M  asleep    TNPC CM               8576/16384  0/494/67588052 ms  0%
     57 M  asleep    CLKSYNC Manager       1616/8200   0/0/19812 ms  0%
     58 M  asleep    RDMAN                 1536/4096   0/0/4911 ms  0%
     59 H  asleep    CFM Manager           1384/32776  0/0/9086111 ms  0%
     60 M  asleep    CFM Data thread       1384/8192   0/0/0 ms  0%
     61 M  asleep    PPM Manager           4760/8200   0/0/65659 ms  0%
     62 M  ready     PPM Data thread       1712/16392  0/0/15890963 ms  0%
     63 L  asleep    IFCM                  1640/4096   0/0/6453 ms  0%
     64 M  asleep    VRRP Manager          1440/8200   1/1/1 ms  0%
     65 M  asleep    L2ALM Manager         2256/8192   0/0/147167 ms  0%
     67 L  asleep    ICMP6 Input           1576/4104   0/1/4925313 ms  0%
     68 L  asleep    IP6 Option Input      1400/4096   0/0/0 ms  0%
     69 L  asleep    ICMP Input            1104/4096   0/1/14828845 ms  0%
     70 L  asleep    IP Option Input       1384/4104   0/0/191 ms  0%
     71 M  asleep    IGMP Input            1384/4096   0/0/0 ms  0%
     72 L  asleep    DFW Alert              688/4104   0/0/91414 ms  0%
     73 L  asleep    cos halp stats daemon  1296/4104   0/1/28315712 ms  0%
     74 L  asleep    NH Probe Service       304/4096   0/0/96 ms  0%
     75 L  asleep    IPC Test Daemon        672/2056   0/0/0 ms  0%
     76 M  asleep    PFE Manager           9760/32776  0/82/3007834430 ms 18%
     77 L  asleep    PFEMAN Service Thread  1176/16384  0/0/0 ms  0%
     78 L  asleep    PFEMAN SRRD Thread     504/16392  0/0/8291 ms  0%
     79 H  asleep    SNTP Daemon           1488/8200   0/0/6421 ms  0%
     81 L  asleep    Console               2224/16384  0/0/0 ms  0%
     82 L  asleep    Console               2224/16392  0/0/0 ms  0%
     83 M  asleep    PFE Statistics        4392/16384  0/2/35420061 ms  0%
     84 L  asleep    VBF Walker             352/16384  0/0/0 ms  0%
     85 L  asleep    VBF MC Purge           256/8200   0/0/0 ms  0%
     86 M  asleep    PZARB Timeout          336/4104   0/0/0 ms  0%
     87 L  asleep    LU Background Service  1664/4104   0/1/708882863 ms  4%
     88 L  ready     LKUP ASIC UCODE Rebalance Service  1760/4096   1/8/182377472 ms  1%
     89 M  asleep    MQ Chip                800/4104   0/0/569788 ms  0%
     90 L  asleep    MQ Chip Stats          640/4096   0/1/75520530 ms  0%
     91 M  asleep    PZARB Timeout          336/4096   0/0/0 ms  0%
     92 M  asleep    MQ Chip                584/4096   0/0/575772 ms  0%
     93 L  asleep    MQ Chip Stats          640/4104   0/1/75106159 ms  0%
     94 M  asleep    PZARB Timeout          336/4104   0/0/0 ms  0%
     95 M  asleep    MQ Chip                800/4096   0/0/573469 ms  0%
     96 L  asleep    MQ Chip Stats          640/4104   0/1/75075026 ms  0%
     97 M  asleep    PZARB Timeout          320/4104   0/0/0 ms  0%
     98 M  asleep    MQ Chip                584/4104   0/0/570315 ms  0%
     99 L  asleep    MQ Chip Stats          640/4096   0/1/74962658 ms  0%
    100 M  asleep    Cassis Free Timer     1024/4104   0/4/1608653986 ms 10%
    101 M  asleep    JNH Partition Mem Recovery  1080/4096   0/1/872710 ms  0%
    102 M  asleep    LU-CNTR Reader         392/8200   0/0/4005 ms  0%
    103 M  asleep    Stats Page Ager        384/8200   0/0/15808 ms  0%
    104 H  asleep    Cube Server           1392/4104   0/0/484073 ms  0%
    105 L  asleep    IP Reassembly         2224/4096   1/1/2814085 ms  0%
    106 M  asleep    Services TOD          1192/4096   0/0/2819106 ms  0%
    107 M  asleep    Trap_Info Read PFE 0.0   704/4104   0/0/60620 ms  0%
    108 M  asleep    Services TOD          1192/4104   0/0/2817964 ms  0%
    109 M  asleep    Trap_Info Read PFE 1.0   704/4104   0/0/55583 ms  0%
    110 M  asleep    Services TOD          1192/4096   0/0/2817119 ms  0%
    111 M  asleep    Trap_Info Read PFE 2.0   704/4096   0/0/56795 ms  0%
    112 M  asleep    Services TOD          1192/4104   0/0/2847280 ms  0%
    113 M  asleep    Trap_Info Read PFE 3.0   704/4104   0/0/96236 ms  0%
    114 L  asleep    JNH Exception Counter Background Thread  1640/4096   0/2/4613397 ms  0%
    115 L  asleep    DDOS Policers         2560/4096   0/4/80878031 ms  0%
    116 L  asleep    jnh errors daemon      376/4104   0/0/15745 ms  0%
    117 L  asleep    JNH KA Transmit       1504/4104   0/0/474769 ms  0%
    118 L  asleep    VBF PFE Events         352/4104   7/7/7 ms  0%
    119 M  asleep    bulkget Manager       4920/8192   0/1/34584727 ms  0%
    120 M  asleep    PRECL Chip Generic     488/4096   0/0/182022 ms  0%
    121 M  asleep    PRECL Chip Generic     488/4096   0/0/180806 ms  0%
    122 M  asleep    PRECL Chip Generic     488/4104   0/0/176042 ms  0%
    123 M  asleep    PRECL Chip Generic     488/4104   0/0/181378 ms  0%
    163 L  asleep    Virtual Console        944/32776  0/0/0 ms  0%
    167 L  running   Cattle-Prod Daemon    4272/32768  0/0/6 ms  0%
    168 L  asleep    Cattle-Prod Daemon    2128/32776  0/0/0 ms  0%

    I also noticed the log messages below, which appear for both FPCs roughly 10 times per day:

    show log messages | grep fpc
    Mar 8 18:05:15 fpc2 io_err bus 0 busy timeout
    Mar 8 18:05:15 fpc2 Failed to disable PCA9548(0x76)->channel(0-7)
    Mar 8 18:05:15 fpc2 i2c_npc_pca8548_cleanup: Failed to disable I2C Mux
    Mar 8 18:05:15 fpc2 PQ3_IIC(WR): bus 0 busy timeout
    Mar 8 18:05:15 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0xa1, i2c_ctl[0]=0x80, bus_addr=0x76)
    Mar 8 18:05:16 fpc2 io_err bus 0 busy timeout
    Mar 8 18:05:16 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
    Mar 8 18:05:16 fpc2 PQ3_IIC(WR): bus 0 busy timeout, attempting to clear
    Mar 8 18:39:03 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 0
    Mar 8 18:39:03 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x85, i2c_ctl[0]=0xb0, bus_addr=0x70)
    Mar 8 18:39:03 fpc2 Failed to disable PCA9548(0x70)->channel(0-7)
    Mar 8 19:03:03 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 1
    Mar 8 19:03:03 fpc2 PQ3_IIC(WR): transfer not complete on byte 1
    Mar 8 19:03:03 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x21, i2c_ctl[0]=0xb0, bus_addr=0x76)
    Mar 8 19:03:03 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
    Mar 8 19:48:55 fpc1 PQ3_IIC(RD): bus transfer timeout on byte 0
    Mar 8 19:48:55 fpc1 PQ3_IIC(RD): transfer not complete on byte 0
    Mar 8 19:48:55 fpc1 PQ3_IIC(RD): I/O error (i2c_stat=0x25, i2c_ctl[0]=0xb0, bus_addr=0x1c)
    Mar 8 19:48:55 fpc1 PQ3_IIC(WR): bus arbitration lost on byte 0
    Mar 8 19:48:55 fpc1 PQ3_IIC(WR): I/O error (i2c_stat=0x93, i2c_ctl[0]=0x80, bus_addr=0x76)
    Mar 8 19:48:55 fpc1 PQ3_IIC(WR): bus 0 busy timeout
    Mar 8 19:48:56 fpc1 PQ3_IIC(WR): I/O error (i2c_stat=0xa1, i2c_ctl[0]=0x90, bus_addr=0x76)
    Mar 8 19:48:56 fpc1 io_err bus 0 busy timeout
    Mar 8 19:48:56 fpc1 Failed to disable PCA9548(0x76)->channel(0-7)
    Mar 8 19:48:56 fpc1 i2c_npc_pca8548_cleanup: Failed to disable I2C Mux
    Mar 8 19:48:56 fpc1 PQ3_IIC(WR): bus 0 busy timeout, attempting to clear
    Mar 8 20:04:01 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 0
    Mar 8 20:04:01 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x81, i2c_ctl[0]=0xb0, bus_addr=0x76)
    Mar 8 20:04:01 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
    Mar 8 20:08:57 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 1
    Mar 8 20:08:57 fpc2 PQ3_IIC(WR): transfer not complete on byte 1
    Mar 8 20:08:57 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x21, i2c_ctl[0]=0xb0, bus_addr=0x76)
    Mar 8 20:08:57 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
    Mar 8 20:21:53 fpc2 PQ3_IIC(RD): bus transfer timeout on byte 0
    Mar 8 20:21:53 fpc2 PQ3_IIC(RD): transfer not complete on byte 0
    Mar 8 20:21:53 fpc2 PQ3_IIC(RD): I/O error (i2c_stat=0x25, i2c_ctl[0]=0xb0, bus_addr=0x1c)
    Mar 8 20:21:54 fpc2 PQ3_IIC(WR): bus arbitration lost on byte 0
    Mar 8 20:21:54 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x93, i2c_ctl[0]=0x80, bus_addr=0x76)
    Mar 8 20:21:54 fpc2 PQ3_IIC(WR): bus 0 busy timeout

    Any ideas on how to track this down? Maybe a software bug?



  • 2.  RE: High CPU load on FPC with little traffic

    Posted 03-08-2020 18:46

    Hi, it looks like these messages match PR1374450.



  • 3.  RE: High CPU load on FPC with little traffic

    Posted 03-09-2020 02:21

    Thank you. And what about the high CPU usage?

    We have 100% CPU usage on FPC2; how can that be with so little traffic?

     

    Disabling firewall filters and RPF checks didn't push the CPU usage down. A further problem is that our SNMP graphs have gaps, because the FPC does not respond to SNMP polling while the CPU is pegged.



  • 4.  RE: High CPU load on FPC with little traffic

     
    Posted 03-09-2020 11:50

    Hi Freemind,

     

    Greetings! Are you using an MPC2 or MPC5 by any chance? Also, can you share the output of the following commands?

     

    From CLI

    show interfaces extensive | match "phy|uni|br|mul"   (check whether a specific interface shows abnormal utilization)
    show chassis routing-engine
    show system virtual-memory | no-more
    show chassis environment   (check temperatures)
    show task io
    show system alarms
    show chassis alarms
    show system core-dumps
    request pfe execute target fpc2 command "show halp-pkt pkt-stats"
    show system processes extensive all-members | except 0.00   (see which process runs high; take 3 to 5 iterations)

     

    From shell:

     

    root% rtsockmon -nt   (gather the output for 5 minutes or so)
    root% top -H          (gather the output for 10 to 20 seconds)
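
    If an interactive capture is awkward, a few snapshots can also be written to a file in batch mode (a sketch; -b, -H, -d and -s are standard FreeBSD top flags, since the RE shell is FreeBSD-based, and the output path is just an example):

    root% top -b -H -d 3 -s 10 > /var/tmp/top-threads.txt   (3 snapshots, 10 seconds apart)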

     

    If this solves your problem, please mark this post as "Accepted Solution" so we can help others too :)

    Regards,

    Lil Dexx
    JNCIE-ENT#863, 3X JNCIP-[SP-ENT-DC], 4X JNCIA [cloud-DevOps-Junos-Design], Champions Ingenius, SSYB

     

     



  • 5.  RE: High CPU load on FPC with little traffic

    Posted 03-09-2020 15:44

    Hello guys,

     

    first of all, thanks for your effort, really nice! 🙂

     

    I tested all your suggestions, and most of the commands didn't show any anomalies. The best find was "rtsockmon -nt", which showed a real spam of route add/delete events in a very short time (1 MB of text in just 10 seconds), so I grepped for one IP:

     

    % rtsockmon -nt | grep "1.2.3.4"
    [23:10:00:237.935] kernel P route add inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=85960 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
    [23:10:03:971.249] kernel P route delete inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=85960 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
    [23:10:22:334.759] kernel P route add inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=102202 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
    [23:10:26:361.518] kernel P route delete inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=102202 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
    [23:11:07:530.544] kernel P route add inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=51565 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0
    [23:11:11:238.831] kernel P route delete inet 1.2.3.4 tid=0 plen=32 type=dest flags=0x180 nh=hold nhflags=0x1 nhidx=51565 rt_nhiflist = 25484 altfwdnhidx=0 filtidx=0 lr_id = 0 featureid=0 rpf:len=1 idx=25484 rt_mcast_nhiflist=0

    I don't know what is causing this. Shouldn't this route stay in place longer without being removed and re-added?

    I will try to disable the ports on the FPC tomorrow during a maintenance window. Do you have any ideas?

     

    kind regards



  • 6.  RE: High CPU load on FPC with little traffic

     
    Posted 03-10-2020 02:12

    Hi,

     

    You can check for any link or neighborship flaps on the device. Such an event would trigger the continuous re-programming of routes we are seeing. This re-programming chokes the KRT queue and eventually results in high CPU on the FPC. You can monitor the KRT queue status using the commands below:

     

    show krt queue

    show krt state

     

    You would see the route updates pending to be sent to the PFE.
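
    To spot a backlog at a glance, you can hide the empty queues with a CLI pipe filter (a small sketch, assuming the ": 0 queued" formatting of the output):

    show krt queue | except ": 0 queued"

    Any queue that still shows up has updates pending toward the kernel/PFE.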

     

    While you check for neighbor flaps, also check for incorrect routing configuration, such as a static route with a wrong or flapping next hop.

     

     

    Vishal



  • 7.  RE: High CPU load on FPC with little traffic

    Posted 03-10-2020 07:52

    I did not find any flapping interfaces or BGP sessions.

    show krt queue
    Routing table add queue: 0 queued
    Interface add/delete/change queue: 0 queued
    Top-priority deletion queue: 0 queued
    Top-priority change queue: 0 queued
    Top-priority add queue: 0 queued
    high priority V4oV6 tcnh delete queue: 0 queued
    high prioriy anchor gencfg delete queue: 0 queued
    High-priority multicast add/change: 0 queued
    Indirect next hop top priority add/change: 0 queued
    Indirect next hop add/change: 0 queued
    high prioriy anchor gencfg add-change queue: 0 queued
    MPLS add queue: 0 queued
    Indirect next hop delete: 0 queued
    High-priority deletion queue: 0 queued
    MPLS change queue: 0 queued
    High-priority change queue: 0 queued
    High-priority add queue: 0 queued
    Normal-priority indirect next hop queue: 0 queued
    Normal-priority deletion queue: 0 queued
    Normal-priority composite next hop deletion queue: 0 queued
    Low prioriy Statistics-id-group deletion queue: 0 queued
    Normal-priority change queue: 0 queued
    Normal-priority add queue: 0 queued
    Least-priority delete queue: 0 queued
    Least-priority change queue: 0 queued
    Least-priority add queue: 0 queued
    Normal-priority pfe table nexthop queue: 0 queued
    EVPN gencfg queue: 0 queued
    Normal-priority gmp queue: 0 queued
    Routing table delete queue: 0 queued
    Low priority route retry queue: 0 queued
    show krt state

    General state:
    Install job is not running
    Number of operations queued: 0
    Routing table adds: 0
    Interface routes: 0
    High pri multicast Adds/Changes: 0
    Indirect Next Hop Adds/Changes: 0 Deletes: 0
    MPLS Adds: 0 Changes: 0
    High pri Adds: 0 Changes: 0 Deletes: 0
    Normal pri Indirects: 0
    Normal pri Adds: 0 Changes: 0 Deletes: 0
    GMP GENCFG Objects: 0
    Routing Table deletes: 0
    Number of operations deferred: 0
    Number of operations canceled: 0
    Number of async queue entries: 0
    Number of async non queue entries: 0
    Time until next queue run: 0
    Routes learned from kernel: 549437

    Routing socket lossage:
    Time until next scan: 35

    Any other ideas?



  • 8.  RE: High CPU load on FPC with little traffic

    Posted 03-11-2020 21:16

    Hello,

    upgrading to the latest (recommended) release did NOT help. Now the high CPU load is on FPC1 instead of FPC2, and nothing else changed. Any ideas?

     

    show chassis fpc    
                         Temp  CPU Utilization (%)   CPU Utilization (%)  Memory    Utilization (%)
    Slot State            (C)  Total  Interrupt      1min   5min   15min  DRAM (MB) Heap     Buffer
      0  Empty           
      1  Online            48     99         22       91     74     39    2048       35         24
      2  Online            49     60         13       59     51     27    2048       34         24
    


  • 9.  RE: High CPU load on FPC with little traffic

     
    Posted 03-12-2020 02:11

    Hi,

    I'd suggest following up on your finding of a "real spam of route add/delete in a very short time (1 MB of text in just 10 seconds)". Can you check that output for the top churning routes, find out via which protocol you are learning them, and follow that direction? A quick way to rank the churn is sketched below.
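
    For example, a shell pipeline along these lines could rank the most frequently re-added prefixes (a sketch; the awk field position is taken from the rtsockmon lines pasted earlier, and the file path is just an example):

    % rtsockmon -nt > /var/tmp/rtsock.txt   (let it run for a minute, then Ctrl-C)
    % grep "route add" /var/tmp/rtsock.txt | awk '{print $7}' | sort | uniq -c | sort -rn | head

    The prefixes at the top of that list are the ones worth looking up with "show route <prefix> detail" to see which protocol keeps reinstalling them.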

    Regards

    Ulf



  • 10.  RE: High CPU load on FPC with little traffic

    Posted 03-12-2020 10:22

    Freemind,

     

    This looks weird: the upgrade only helped in moving the problem from one FPC to the other. The only thing that comes to my mind is that the problem moved due to a redundancy feature, so maybe the impacting port/interface is now active on the other FPC.

     

    One thing: are the logs for "fpc1 i2c_npc_pca8548_cleanup: Failed to disable I2C Mux" still showing in the messages file after the upgrade?

     

    As Ulf mentioned, we need to track down the 'real spam' hogging the FPC. For a deep dive, JTAC may be the best option, but I also think that checking where the add/delete changes shown in rtsockmon come from should help you resolve the issue. Personally, I would systematically check interfaces as possible causes; it may be dumb and slow, but the problem may be tied to a particular port or group of ports.

     

    One thing I wonder now: if you take FPC1 offline, will the CPU go high on FPC2? Can you move the problematic state between FPCs somehow? (A sketch of how to do that from the CLI is below.)
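
    An FPC can be taken offline and brought back from operational mode; note that this drops all traffic on that slot (a sketch, using your slot numbers):

    request chassis fpc slot 1 offline
    request chassis fpc slot 1 online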

     

    Also, at this point I think you may want to ask JTAC whether they can help with the diagnosis.

     

    Good Luck,

    Cheers,

    Benjamin



  • 11.  RE: High CPU load on FPC with little traffic

     
    Posted 03-09-2020 13:13

    Looks like some traffic is congesting the PFE. Please put the filter back with a last term that does syslog/discard (a sketch is below) and check whether you can capture some culprit flows.
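
    A minimal sketch of such a last term in set-command form (the filter name and the lo0 attachment point are assumptions; merge with whatever terms you already had):

    set firewall family inet filter PROTECT-RE term catch-rest then syslog
    set firewall family inet filter PROTECT-RE term catch-rest then discard
    set interfaces lo0 unit 0 family inet filter input PROTECT-RE

    The syslog entries will then show the source/destination of whatever falls through to the last term.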

     

    Other things to look at:

    1. show policer   (check for ARP policer drops)

    2. DDoS violations (the commands to check are sketched below)
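
    To check for DDoS protection violations and drops (standard MX commands):

    show ddos-protection protocols violations
    show ddos-protection statistics

    A protocol group in the violated state identifies the traffic class being punted to and policed on the FPC.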



  • 12.  RE: High CPU load on FPC with little traffic

    Posted 03-09-2020 11:20

    Hi Freemind

     

    I hope you are doing great,

     

    Please do:

     

    >monitor traffic interface irb no-resolve size 1500

     

    Do not specify the IRB unit number (use plain "irb").

     

    Check what kind of traffic is hitting the RE and make sure it is legit. You can also share the output here if possible, and I can provide some feedback.
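
    If the capture is too noisy, monitor traffic also accepts a pcap-style match expression, so traffic you already expect can be filtered out (a sketch; adjust the expression to your own control-plane protocols):

    monitor traffic interface irb no-resolve size 1500 matching "not (tcp port 179) and not icmp"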

     

    Warm regards!

    Pablo Restrepo -



  • 13.  RE: High CPU load on FPC with little traffic

    Posted 03-09-2020 13:15

    Hello Freemind,

     

    This definitely looks like a bug. I tried searching for something that could provide a solution, but I could not find anything. Based on the logs, the PR1374450 mentioned above may fit, but since it doesn't have a resolution, it's almost like having nothing.

     

    Is there impact on the FPC's performance? You can try disconnecting all cables on the FPC and see whether the usage goes down, then reconnect the ports systematically to verify whether the problem is related to one port or a few ports, and move on from there. (Instead of pulling cables, you can also disable ports from the CLI, as sketched below.)
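
    A sketch of disabling and re-enabling a port from configuration mode (xe-2/0/0 is a placeholder; substitute your own FPC ports):

    set interfaces xe-2/0/0 disable
    commit

    and afterwards:

    delete interfaces xe-2/0/0 disable
    commit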

     

    If you open a case, the engineers may be able to check the FPC scheduler and verify whether a diagnosis can be provided.

     

    The easiest solution, for me, would be to upgrade the chassis to the latest recommended release and check the results. You could also try the latest 18.x or 19.x code and check the FPC usage.

     

    Cheers.

    Benjamin