Hello,
On our MX240 running Junos 17.3R3.10 with two MPC 3D 16x 10GE cards, FPC2 is almost always at 100% CPU load and FPC1 sits at around 80%. The device only carries about 1.2 GB/s of traffic.
I have already disabled traffic sampling and RPF checks, but that didn't help.
show chassis fpc
                 Temp  CPU Utilization (%)   CPU Utilization (%)   Memory     Utilization (%)
Slot State        (C)  Total  Interrupt      1min   5min   15min   DRAM (MB)  Heap     Buffer
  0  Empty
  1  Online        46     80         23        71     72      75   2048         36         29
  2  Online        46    100         26        96     92      94   2048         36         43
request pfe execute target fpc2 command "show threads"
SENT: Ukern command: show threads
PID PR State Name Stack Use Time (Last/Max/Total) cpu
--- -- ------- --------------------- --------- ---------------------
1 H asleep Maintenance 304/2048 0/0/2 ms 0%
2 L running Idle 320/2048 0/0/2701361988 ms 16%
3 H asleep Timer Services 312/2056 0/0/1387505 ms 0%
5 L asleep Ukern Syslog 312/4096 0/0/0 ms 0%
6 L asleep Heap Accouting 496/4104 0/1/32472580 ms 0%
7 L asleep Sheaf Background 400/2056 0/0/24175 ms 0%
8 H ready IGMP 1248/16384 0/66/3626921 ms 0%
9 H asleep IPv4 PFE Control Background 296/8200 0/0/0 ms 0%
10 M asleep OTN 376/8200 0/0/115551 ms 0%
11 M asleep GR253 408/4096 0/0/107172 ms 0%
12 M asleep CXP 424/4104 0/0/103846 ms 0%
13 M asleep QSFP 536/4096 0/0/148381 ms 0%
14 M asleep DCC Background 280/4096 0/0/0 ms 0%
15 M asleep DSX50ms 328/4104 0/0/2179498 ms 0%
16 M asleep DSXonesec 320/4096 0/0/106497 ms 0%
17 M asleep mac_db 296/8192 0/0/0 ms 0%
18 M asleep RSMON syslog thread 2424/4104 0/35/12051441 ms 0%
19 M asleep MSA300PIN 376/4096 0/0/17077 ms 0%
20 M asleep CFP 448/4104 0/0/111144 ms 0%
21 M asleep XFP 424/4096 0/0/106951 ms 0%
22 M asleep SFP 1856/4096 0/34/38453557 ms 0%
23 L asleep Firmware Upgrade 320/4104 0/0/0 ms 0%
24 L asleep Syslog 1072/4104 0/0/64668 ms 0%
25 M asleep 50ms Periodic 224/8192 0/0/0 ms 0%
26 M asleep 100ms Periodic 224/8200 0/0/0 ms 0%
27 M asleep 1s Medium Periodic 632/8192 0/0/1458605 ms 0%
28 M asleep 10s Medium Periodic 1536/8200 0/0/12421 ms 0%
29 L asleep 1s Low Periodic 896/8192 0/0/792266 ms 0%
30 L asleep 10s Low Periodic 320/8200 0/0/181667 ms 0%
31 M asleep TTRACE Creator 360/4104 0/0/0 ms 0%
32 M asleep TTRACE Tracer 432/4096 0/0/0 ms 0%
33 L asleep LKUP ASIC Wedge poll thread 696/4104 0/0/5695833 ms 0%
34 L asleep TOE Coredump 408/4104 0/0/0 ms 0%
35 L asleep L2PD 392/4096 0/0/2168328 ms 0%
36 L asleep PQ3 PCI Periodic 1472/8192 0/0/77768 ms 0%
37 M asleep Host Loopback Periodic 448/8200 0/0/438680 ms 0%
38 M asleep HSL2 944/4096 0/1/33411471 ms 0%
39 H asleep TCP Timers 1544/8200 0/0/3662812 ms 0%
40 H asleep TCP Receive 1760/8192 0/1/246461063 ms 1%
41 H asleep TNP Hello 504/2048 0/0/523088 ms 0%
42 M asleep UDP Input 344/2048 0/0/3876 ms 0%
43 H asleep TTP Receive 1896/4096 0/1/224172092 ms 1%
44 H asleep TTP Transmit 1528/4104 0/9/2452163527 ms 15%
45 H asleep RDP Timers 208/4096 0/0/0 ms 0%
46 H asleep RDP Input 280/2056 0/0/0 ms 0%
47 M asleep RCM Pfe Manager 824/4104 0/0/1 ms 0%
48 L asleep CLNS Err Input 280/4096 0/0/0 ms 0%
49 L asleep CLNS Option Input 280/4104 0/0/0 ms 0%
50 H asleep L2TP-SF KA Transmit 296/4096 0/0/0 ms 0%
51 M asleep RPM Msg thread 368/8200 0/0/0 ms 0%
52 M asleep RFC2544 periodic 456/8192 0/0/104912 ms 0%
53 H asleep Pfesvcsor 592/8192 0/0/8816419 ms 0%
54 M asleep PIC Periodic 1736/8192 0/1/141307810 ms 0%
55 M asleep PIC 208/4104 0/0/0 ms 0%
56 M asleep TNPC CM 8576/16384 0/494/67588052 ms 0%
57 M asleep CLKSYNC Manager 1616/8200 0/0/19812 ms 0%
58 M asleep RDMAN 1536/4096 0/0/4911 ms 0%
59 H asleep CFM Manager 1384/32776 0/0/9086111 ms 0%
60 M asleep CFM Data thread 1384/8192 0/0/0 ms 0%
61 M asleep PPM Manager 4760/8200 0/0/65659 ms 0%
62 M ready PPM Data thread 1712/16392 0/0/15890963 ms 0%
63 L asleep IFCM 1640/4096 0/0/6453 ms 0%
64 M asleep VRRP Manager 1440/8200 1/1/1 ms 0%
65 M asleep L2ALM Manager 2256/8192 0/0/147167 ms 0%
67 L asleep ICMP6 Input 1576/4104 0/1/4925313 ms 0%
68 L asleep IP6 Option Input 1400/4096 0/0/0 ms 0%
69 L asleep ICMP Input 1104/4096 0/1/14828845 ms 0%
70 L asleep IP Option Input 1384/4104 0/0/191 ms 0%
71 M asleep IGMP Input 1384/4096 0/0/0 ms 0%
72 L asleep DFW Alert 688/4104 0/0/91414 ms 0%
73 L asleep cos halp stats daemon 1296/4104 0/1/28315712 ms 0%
74 L asleep NH Probe Service 304/4096 0/0/96 ms 0%
75 L asleep IPC Test Daemon 672/2056 0/0/0 ms 0%
76 M asleep PFE Manager 9760/32776 0/82/3007834430 ms 18%
77 L asleep PFEMAN Service Thread 1176/16384 0/0/0 ms 0%
78 L asleep PFEMAN SRRD Thread 504/16392 0/0/8291 ms 0%
79 H asleep SNTP Daemon 1488/8200 0/0/6421 ms 0%
81 L asleep Console 2224/16384 0/0/0 ms 0%
82 L asleep Console 2224/16392 0/0/0 ms 0%
83 M asleep PFE Statistics 4392/16384 0/2/35420061 ms 0%
84 L asleep VBF Walker 352/16384 0/0/0 ms 0%
85 L asleep VBF MC Purge 256/8200 0/0/0 ms 0%
86 M asleep PZARB Timeout 336/4104 0/0/0 ms 0%
87 L asleep LU Background Service 1664/4104 0/1/708882863 ms 4%
88 L ready LKUP ASIC UCODE Rebalance Service 1760/4096 1/8/182377472 ms 1%
89 M asleep MQ Chip 800/4104 0/0/569788 ms 0%
90 L asleep MQ Chip Stats 640/4096 0/1/75520530 ms 0%
91 M asleep PZARB Timeout 336/4096 0/0/0 ms 0%
92 M asleep MQ Chip 584/4096 0/0/575772 ms 0%
93 L asleep MQ Chip Stats 640/4104 0/1/75106159 ms 0%
94 M asleep PZARB Timeout 336/4104 0/0/0 ms 0%
95 M asleep MQ Chip 800/4096 0/0/573469 ms 0%
96 L asleep MQ Chip Stats 640/4104 0/1/75075026 ms 0%
97 M asleep PZARB Timeout 320/4104 0/0/0 ms 0%
98 M asleep MQ Chip 584/4104 0/0/570315 ms 0%
99 L asleep MQ Chip Stats 640/4096 0/1/74962658 ms 0%
100 M asleep Cassis Free Timer 1024/4104 0/4/1608653986 ms 10%
101 M asleep JNH Partition Mem Recovery 1080/4096 0/1/872710 ms 0%
102 M asleep LU-CNTR Reader 392/8200 0/0/4005 ms 0%
103 M asleep Stats Page Ager 384/8200 0/0/15808 ms 0%
104 H asleep Cube Server 1392/4104 0/0/484073 ms 0%
105 L asleep IP Reassembly 2224/4096 1/1/2814085 ms 0%
106 M asleep Services TOD 1192/4096 0/0/2819106 ms 0%
107 M asleep Trap_Info Read PFE 0.0 704/4104 0/0/60620 ms 0%
108 M asleep Services TOD 1192/4104 0/0/2817964 ms 0%
109 M asleep Trap_Info Read PFE 1.0 704/4104 0/0/55583 ms 0%
110 M asleep Services TOD 1192/4096 0/0/2817119 ms 0%
111 M asleep Trap_Info Read PFE 2.0 704/4096 0/0/56795 ms 0%
112 M asleep Services TOD 1192/4104 0/0/2847280 ms 0%
113 M asleep Trap_Info Read PFE 3.0 704/4104 0/0/96236 ms 0%
114 L asleep JNH Exception Counter Background Thread 1640/4096 0/2/4613397 ms 0%
115 L asleep DDOS Policers 2560/4096 0/4/80878031 ms 0%
116 L asleep jnh errors daemon 376/4104 0/0/15745 ms 0%
117 L asleep JNH KA Transmit 1504/4104 0/0/474769 ms 0%
118 L asleep VBF PFE Events 352/4104 7/7/7 ms 0%
119 M asleep bulkget Manager 4920/8192 0/1/34584727 ms 0%
120 M asleep PRECL Chip Generic 488/4096 0/0/182022 ms 0%
121 M asleep PRECL Chip Generic 488/4096 0/0/180806 ms 0%
122 M asleep PRECL Chip Generic 488/4104 0/0/176042 ms 0%
123 M asleep PRECL Chip Generic 488/4104 0/0/181378 ms 0%
163 L asleep Virtual Console 944/32776 0/0/0 ms 0%
167 L running Cattle-Prod Daemon 4272/32768 0/0/6 ms 0%
168 L asleep Cattle-Prod Daemon 2128/32776 0/0/0 ms 0%
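To make the thread list easier to read, here is a minimal Python sketch that ranks the non-idle threads by the last column. It is only an offline helper for the pasted text, not a Juniper tool. The regular expression and the `busiest_threads` name are my own assumptions, based on the row layout shown above.

```python
import re

# One row of the pasted "show threads" output:
# PID, priority, state, name, stack use, Last/Max/Total time in ms, cpu %.
# The layout is assumed from the paste above, not from any official spec.
ROW = re.compile(
    r"^\s*(\d+)\s+([HML])\s+(\w+)\s+(.+?)\s+(\d+)/(\d+)"
    r"\s+(\d+)/(\d+)/(\d+) ms\s+(\d+)%\s*$"
)

def busiest_threads(text, min_cpu=1):
    """Return (cpu_percent, name) pairs for non-idle threads, busiest first."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers, separators and blank lines
        name, cpu = m.group(4), int(m.group(10))
        if name != "Idle" and cpu >= min_cpu:
            rows.append((cpu, name))
    return sorted(rows, reverse=True)

# A few rows copied from the output above.
sample = """\
2 L running Idle 320/2048 0/0/2701361988 ms 16%
44 H asleep TTP Transmit 1528/4104 0/9/2452163527 ms 15%
76 M asleep PFE Manager 9760/32776 0/82/3007834430 ms 18%
87 L asleep LU Background Service 1664/4104 0/1/708882863 ms 4%
100 M asleep Cassis Free Timer 1024/4104 0/4/1608653986 ms 10%
"""

for cpu, name in busiest_threads(sample, min_cpu=5):
    print(f"{cpu:3d}%  {name}")
```

Run against the full FPC2 output, the top non-idle entries are PFE Manager (18%), TTP Transmit (15%), Cassis Free Timer (10%) and LU Background Service (4%).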
I also noticed these log messages, which appear for both FPCs about 10 times per day:
show log messages | grep fpc
Mar 8 18:05:15 fpc2 io_err bus 0 busy timeout
Mar 8 18:05:15 fpc2 Failed to disable PCA9548(0x76)->channel(0-7)
Mar 8 18:05:15 fpc2 i2c_npc_pca8548_cleanup: Failed to disable I2C Mux
Mar 8 18:05:15 fpc2 PQ3_IIC(WR): bus 0 busy timeout
Mar 8 18:05:15 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0xa1, i2c_ctl[0]=0x80, bus_addr=0x76)
Mar 8 18:05:16 fpc2 io_err bus 0 busy timeout
Mar 8 18:05:16 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
Mar 8 18:05:16 fpc2 PQ3_IIC(WR): bus 0 busy timeout, attempting to clear
Mar 8 18:39:03 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 0
Mar 8 18:39:03 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x85, i2c_ctl[0]=0xb0, bus_addr=0x70)
Mar 8 18:39:03 fpc2 Failed to disable PCA9548(0x70)->channel(0-7)
Mar 8 19:03:03 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 1
Mar 8 19:03:03 fpc2 PQ3_IIC(WR): transfer not complete on byte 1
Mar 8 19:03:03 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x21, i2c_ctl[0]=0xb0, bus_addr=0x76)
Mar 8 19:03:03 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
Mar 8 19:48:55 fpc1 PQ3_IIC(RD): bus transfer timeout on byte 0
Mar 8 19:48:55 fpc1 PQ3_IIC(RD): transfer not complete on byte 0
Mar 8 19:48:55 fpc1 PQ3_IIC(RD): I/O error (i2c_stat=0x25, i2c_ctl[0]=0xb0, bus_addr=0x1c)
Mar 8 19:48:55 fpc1 PQ3_IIC(WR): bus arbitration lost on byte 0
Mar 8 19:48:55 fpc1 PQ3_IIC(WR): I/O error (i2c_stat=0x93, i2c_ctl[0]=0x80, bus_addr=0x76)
Mar 8 19:48:55 fpc1 PQ3_IIC(WR): bus 0 busy timeout
Mar 8 19:48:56 fpc1 PQ3_IIC(WR): I/O error (i2c_stat=0xa1, i2c_ctl[0]=0x90, bus_addr=0x76)
Mar 8 19:48:56 fpc1 io_err bus 0 busy timeout
Mar 8 19:48:56 fpc1 Failed to disable PCA9548(0x76)->channel(0-7)
Mar 8 19:48:56 fpc1 i2c_npc_pca8548_cleanup: Failed to disable I2C Mux
Mar 8 19:48:56 fpc1 PQ3_IIC(WR): bus 0 busy timeout, attempting to clear
Mar 8 20:04:01 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 0
Mar 8 20:04:01 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x81, i2c_ctl[0]=0xb0, bus_addr=0x76)
Mar 8 20:04:01 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
Mar 8 20:08:57 fpc2 PQ3_IIC(WR): bus transfer timeout on byte 1
Mar 8 20:08:57 fpc2 PQ3_IIC(WR): transfer not complete on byte 1
Mar 8 20:08:57 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x21, i2c_ctl[0]=0xb0, bus_addr=0x76)
Mar 8 20:08:57 fpc2 Failed to enable PCA9548(0x76):grp(0x0)->channel(5)
Mar 8 20:21:53 fpc2 PQ3_IIC(RD): bus transfer timeout on byte 0
Mar 8 20:21:53 fpc2 PQ3_IIC(RD): transfer not complete on byte 0
Mar 8 20:21:53 fpc2 PQ3_IIC(RD): I/O error (i2c_stat=0x25, i2c_ctl[0]=0xb0, bus_addr=0x1c)
Mar 8 20:21:54 fpc2 PQ3_IIC(WR): bus arbitration lost on byte 0
Mar 8 20:21:54 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x93, i2c_ctl[0]=0x80, bus_addr=0x76)
Mar 8 20:21:54 fpc2 PQ3_IIC(WR): bus 0 busy timeout
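To see whether the I2C errors cluster on one FPC or one bus address, the log lines can be tallied offline. This is a rough Python sketch. The `summarize` helper and both regular expressions are my own assumptions, based on the line format pasted above, and treating one minute as one "burst" is an arbitrary choice.

```python
import re
from collections import Counter

# Syslog line shape as pasted above: "Mar 8 18:05:15 fpc2 <message>".
LINE = re.compile(r"^(\w{3})\s+(\d+)\s+(\d\d:\d\d:\d\d)\s+(fpc\d+)\s+(.*)$")
BUS_ADDR = re.compile(r"bus_addr=(0x[0-9a-f]+)")

def summarize(text):
    """Count I/O error lines per (fpc, bus_addr) and error bursts per fpc.

    A "burst" here is simply a distinct (month, day, HH:MM) bucket,
    which is an assumption made for this sketch.
    """
    per_addr = Counter()
    minutes = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # not an fpc syslog line
        month, day, clock, fpc, msg = m.groups()
        minutes.setdefault(fpc, set()).add((month, day, clock[:5]))
        addr = BUS_ADDR.search(msg)
        if addr:
            per_addr[(fpc, addr.group(1))] += 1
    bursts = {fpc: len(seen) for fpc, seen in minutes.items()}
    return per_addr, bursts

# A few lines copied from the log above.
sample = """\
Mar 8 18:05:15 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0xa1, i2c_ctl[0]=0x80, bus_addr=0x76)
Mar 8 18:05:16 fpc2 io_err bus 0 busy timeout
Mar 8 18:39:03 fpc2 PQ3_IIC(WR): I/O error (i2c_stat=0x85, i2c_ctl[0]=0xb0, bus_addr=0x70)
Mar 8 19:48:55 fpc1 PQ3_IIC(RD): I/O error (i2c_stat=0x25, i2c_ctl[0]=0xb0, bus_addr=0x1c)
Mar 8 19:48:55 fpc1 PQ3_IIC(WR): I/O error (i2c_stat=0x93, i2c_ctl[0]=0x80, bus_addr=0x76)
"""

per_addr, bursts = summarize(sample)
for (fpc, addr), count in sorted(per_addr.items()):
    print(fpc, addr, count)
print(bursts)
```

In the excerpt above, most of the I/O errors on both FPCs reference bus_addr=0x76, with occasional hits on 0x70 and 0x1c.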
Any ideas on how to track this down? Could this be a software bug?