Junos

High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

06-27-2019 01:20 AM

Hey community,

 

we have a non-mixed VC with 5x EX4300 series switches. For roughly 20 days now, the CPU usage has been higher than it should be. During this timeframe we migrated from one NetApp to another, and we assumed the migration was causing the additional utilization.

 

Unfortunately, the migration is now complete and the high CPU usage is still there.

 

After some research, I found the following KBs:

https://kb.juniper.net/InfoCenter/index?page=content&id=KB31605&cat=&actp=LIST

https://kb.juniper.net/InfoCenter/index?page=content&id=KB26261

 

All the VC members are connected via DAC cables, and there are currently only three SFP connections (2x 10G, 1x 1G) on the EX4300-32F.

 

 

Here is some information about the REs:

 

root@vx-core-b6-01> show chassis routing-engine
Routing Engine status:
Slot 0:
Current state Master
Temperature 63 degrees C / 145 degrees F
CPU temperature 63 degrees C / 145 degrees F
DRAM 2048 MB
Memory utilization 70 percent
5 sec CPU utilization:
User 65 percent
Background 0 percent
Kernel 27 percent
Interrupt 8 percent
Idle 0 percent
Model EX4300-48T
Serial ID PE3717480060
Start time 2019-04-15 06:16:00 UTC
Uptime 73 days, 1 hour, 57 minutes, 26 seconds
Last reboot reason Router rebooted after a normal shutdown.
Load averages: 1 minute 5 minute 15 minute
1.55 1.52 1.25
Routing Engine status:
Slot 1:
Current state Backup
Temperature 56 degrees C / 132 degrees F
CPU temperature 56 degrees C / 132 degrees F
DRAM 2048 MB
Memory utilization 51 percent
5 sec CPU utilization:
User 15 percent
Background 0 percent
Kernel 13 percent
Interrupt 0 percent
Idle 72 percent
Model EX4300-48T
Serial ID PE3717480078
Start time 2019-04-15 06:16:00 UTC
Uptime 73 days, 1 hour, 57 minutes, 26 seconds
Last reboot reason Router rebooted after a normal shutdown.
Load averages: 1 minute 5 minute 15 minute
0.52 0.41 0.35

 

 

After checking the usage of the individual processes via "top", I got the following output:

 

last pid: 72354; load averages: 1.11, 0.97, 0.99 up 73+01:55:52 08:11:22
71 processes: 2 running, 68 sleeping, 1 zombie
CPU states: 34.6% user, 0.0% nice, 20.0% system, 4.0% interrupt, 41.4% idle
Mem: 1070M Active, 88M Inact, 144M Wired, 324M Cache, 112M Buf, 235M Free
Swap:

PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
1787 root 2 -52 -52 564M 209M select 549.0H 38.09% pfex_junos
72354 root 1 44 0 23784K 2280K RUN 0:00 3.00% top
1781 root 1 43 0 67352K 32676K select 36.9H 0.98% chassisd
1844 root 1 42 0 54488K 36532K select 20.6H 0.00% l2ald
1865 root 1 40 0 196M 99344K select 751:58 0.00% authd
1842 root 1 40 0 59072K 38708K select 552:06 0.00% mib2d
1338 root 1 41 0 28700K 14288K select 411:25 0.00% eventd
1876 root 2 4 0 14064K 3276K select 406:15 0.00% repd
1861 root 1 4 0 65700K 37552K kqread 363:04 0.00% l2cpd
1847 root 1 40 0 41232K 23148K select 349:52 0.00% pfed
1864 root 2 40 0 108M 28624K select 341:01 0.00% jdhcpd
1877 root 1 4 0 77560K 31632K kqread 294:27 0.00% mcsnoopd
1788 root 1 4 0 36024K 14444K kqread 276:43 0.00% chassism
1878 root 1 41 0 26456K 14864K select 246:50 0.00% license-check
1841 root 1 40 0 35760K 25056K select 177:30 0.00% snmpd
1873 root 1 4 0 75288K 31036K kqread 129:59 0.00% dot1xd-secure
1852 root 1 40 0 25448K 14652K select 113:30 0.00% ppmd
1786 root 1 4 -20 57788K 23800K kqread 93:34 0.00% vccpd
1870 root 1 40 0 21508K 8208K select 66:33 0.00% shm-rtsdbd
1867 root 1 40 0 19724K 6224K select 64:56 0.00% bdbrepd
1850 root 2 40 0 34244K 19512K select 39:06 0.00% cosd
1843 root 2 4 0 155M 58804K kqread 30:22 0.00% rpd
1836 root 1 40 0 2752K 1280K select 30:07 0.00% bslockd
1838 root 1 40 0 24552K 14424K select 27:23 0.00% alarmd
1785 root 1 40 0 12856K 5404K select 20:16 0.00% irsd
1860 root 1 40 0 28476K 15748K select 20:10 0.00% lacpd
1846 root 1 40 0 3476K 1500K select 15:32 0.00% tnp.sntpd
1869 root 1 40 0 27044K 10016K select 13:05 0.00% smid
1857 root 1 40 0 22096K 12032K select 11:19 0.00% bfdd
1889 root 1 40 0 64788K 24164K select 10:39 0.00% dcd
1882 root 1 40 0 23152K 12384K select 10:18 0.00% pkid
1863 root 1 40 0 22992K 12980K select 9:52 0.00% lfmd
1862 root 1 40 0 21740K 11848K select 5:33 0.00% oamd
1840 root 1 40 0 9792K 8844K select 5:07 0.00% ntpd
1851 root 1 40 0 35036K 16556K select 4:59 0.00% kmd
1859 root 1 40 0 61360K 18220K select 4:50 0.00% dfcd
1849 root 1 40 0 28256K 14760K select 4:16 0.00% rmopd
1848 root 1 40 0 21432K 11936K select 3:59 0.00% ilmid
1845 root 1 25 -15 21572K 11832K select 3:54 0.00% apsd
1839 root 1 40 0 21436K 11088K select 3:51 0.00% craftd
1853 root 1 40 0 13228K 5644K select 2:56 0.00% lmpd
1783 root 1 42 0 7168K 2596K select 2:06 0.00% inetd
1782 root 1 40 0 58424K 28672K select 1:44 0.00% mgd
2141 root 1 40 0 74804K 41744K select 1:03 0.00% cli
1855 root 1 40 0 42212K 25888K select 0:59 0.00% dfwd
1623 root 1 8 0 2324K 684K nanslp 0:22 0.00% cron
2152 root 1 40 0 59232K 13352K select 0:11 0.00% mgd
1789 root 1 8 0 2724K 1292K nanslp 0:10 0.00% getty
1790 root 1 8 0 2724K 1304K nanslp 0:10 0.00% getty
1835 root 1 40 0 2328K 952K select 0:08 0.00% watchdog
1858 root 1 40 0 16336K 6376K select 0:06 0.00% rdd
1911 root 1 40 0 2596K 1412K select 0:02 0.00% rlogind
1910 root 1 5 0 2400K 1276K ttyin 0:01 0.00% rlogin
71957 root 1 40 0 11692K 5540K select 0:00 0.00% sshd
1913 root 1 8 0 8908K 3336K wait 0:00 0.00% login
1868 root 1 4 0 14476K 5272K select 0:00 0.00% sendd
1866 root 1 4 0 47956K 5176K select 0:00 0.00% mplsoamd
71964 root 1 40 0 4888K 3180K pause 0:00 0.00% csh
1914 root 1 4 0 2400K 300K sbwait 0:00 0.00% rlogin

 

 

In addition to that top output:

 

root@vx-core-b6-01> show system processes extensive
last pid: 72363; load averages: 1.58, 1.15, 1.06 up 73+01:56:56 08:12:26
180 processes: 6 running, 152 sleeping, 22 waiting

Mem: 1101M Active, 87M Inact, 145M Wired, 324M Cache, 112M Buf, 203M Free
Swap:


PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND
1787 root 76 0 564M 209M RUN 549.0H 39.01% pfex_junos
1787 root -52 -52 564M 209M ll_blo 549.0H 29.44% pfex_junos
1844 root 72 0 54488K 36532K RUN 20.6H 5.18% l2ald
45 root 12 0 0K 16K sleep 44.5H 4.25% netdaemon
10 root 155 52 0K 16K RUN 895.6H 3.37% idle
72358 root 41 0 74716K 41332K select 0:01 2.25% cli
35 root -60 -163 0K 16K WAIT 26.9H 1.76% swi1: ipfwd
1781 root 48 0 67352K 32676K RUN 36.9H 1.07% chassisd
11 root -56 -159 0K 16K ll_blo 17.8H 0.93% swi2: netisr 0
2 root -16 0 0K 16K jfe_jo 973:19 0.00% jfe_job_0_0
1865 root 40 0 196M 99344K select 751:59 0.00% authd
2135 root 4 0 0K 16K select 714:08 0.00% ppt_07_80000005
1842 root 41 0 59072K 38708K select 552:06 0.00% mib2d
23 root -68 -171 0K 16K WAIT 516:06 0.00% irq41: besw0
12 root -36 -139 0K 16K RUN 476:38 0.00% swi7: clock
1338 root 41 0 28700K 14288K select 411:26 0.00% eventd
1876 root 40 0 14064K 3276K select 406:15 0.00% repd
1876 root 4 0 14064K 3276K select 406:15 0.00% repd
1861 root 4 0 65700K 37552K kqread 363:05 0.00% l2cpd
1847 root 40 0 41232K 23148K select 349:52 0.00% pfed
1864 root 40 0 108M 28624K select 341:01 0.00% jdhcpd
1864 root 40 0 108M 28624K select 341:01 0.00% jdhcpd
1877 root 4 0 77560K 31632K kqread 294:27 0.00% mcsnoopd
1788 root 4 0 36024K 14444K kqread 276:43 0.00% chassism
1878 root 42 0 26456K 14864K select 246:50 0.00% license-check
1841 root 40 0 35772K 25068K select 177:30 0.00% snmpd
14 root -16 0 0K 16K - 149:52 0.00% rand_harvestq
8 root 8 0 0K 16K - 133:23 0.00% thread taskq
1873 root 4 0 75288K 31036K kqread 129:59 0.00% dot1xd-secure
1852 root 40 0 25448K 14652K select 113:30 0.00% ppmd
1918 root 4 0 0K 16K select 106:36 0.00% ppt_11_80000013
36 root -16 0 0K 16K client 101:42 0.00% ifstate notify
1919 root 4 0 0K 16K select 96:56 0.00% ppt_11_80000014
1786 root 4 -20 57788K 23800K kqread 93:34 0.00% vccpd
1940 root 4 0 0K 16K select 91:50 0.00% ppt_11_80000010
1922 root 4 0 0K 16K select 79:59 0.00% ppt_11_80000011
1921 root 4 0 0K 16K select 74:58 0.00% ppt_11_80000012
1870 root 40 0 21508K 8208K select 66:33 0.00% shm-rtsdbd
1867 root 40 0 19724K 6224K select 64:56 0.00% bdbrepd
1850 root 40 0 34244K 19512K select 39:06 0.00% cosd
1850 root 40 0 34244K 19512K select 39:06 0.00% cosd
3 root -16 0 0K 16K jfe_jo 35:18 0.00% jfe_job_1_0
1843 root 4 0 155M 58804K kqread 30:22 0.00% rpd
1843 root 4 0 155M 58804K kqread 30:22 0.00% rpd
1836 root 40 0 2752K 1280K select 30:07 0.00% bslockd
1838 root 40 0 24552K 14424K select 27:23 0.00% alarmd
1785 root 40 0 12856K 5404K select 20:16 0.00% irsd
1860 root 40 0 28476K 15748K select 20:10 0.00% lacpd
1846 root 40 0 3476K 1500K select 15:32 0.00% tnp.sntpd
63 root 12 0 0K 16K - 14:29 0.00% schedcpu
41 root -4 0 0K 16K syncer 14:13 0.00% syncer
38 root 155 52 0K 16K pgzero 13:54 0.00% pagezero
1869 root 40 0 27044K 10016K select 13:05 0.00% smid
42 root 40 0 0K 16K vnlrum 11:59 0.00% vnlru_mem
1857 root 40 0 22096K 12032K select 11:19 0.00% bfdd
1889 root 40 0 64788K 24164K select 10:39 0.00% dcd
1882 root 40 0 23152K 12384K select 10:18 0.00% pkid
1863 root 40 0 22992K 12980K select 9:52 0.00% lfmd
57 root -16 0 0K 16K psleep 8:16 0.00% vmkmemdaemon
4 root -8 0 0K 16K - 5:53 0.00% g_event
5 root -8 0 0K 16K - 5:48 0.00% g_up
1862 root 40 0 21740K 11848K select 5:33 0.00% oamd
1840 root 40 0 9792K 8844K select 5:07 0.00% ntpd
1851 root 40 0 35036K 16556K select 4:59 0.00% kmd
6 root -8 0 0K 16K - 4:54 0.00% g_down
1859 root 40 0 61360K 18220K select 4:50 0.00% dfcd
1849 root 40 0 28256K 14760K select 4:16 0.00% rmopd
1848 root 40 0 21432K 11936K select 3:59 0.00% ilmid
1845 root 25 -15 21572K 11832K select 3:54 0.00% apsd
1839 root 40 0 21436K 11088K select 3:51 0.00% craftd
43 root -16 0 0K 16K sdflus 3:41 0.00% softdepflush
27 root -84 -187 0K 16K WAIT 3:28 0.00% irq96: fman0
21 root -68 -171 0K 16K WAIT 3:01 0.00% irq38: i2c0 i2c1
1853 root 40 0 13228K 5644K select 2:56 0.00% lmpd
1783 root 52 0 7168K 2596K select 2:06 0.00% inetd
9 root 8 0 0K 16K - 1:57 0.00% kqueue taskq
1782 root 40 0 58424K 28672K select 1:44 0.00% mgd
54 root 40 0 0K 16K select 1:38 0.00% jsr_kkcm
50 root 4 0 0K 16K select 1:34 0.00% devrt_kthread
28 root -80 -183 0K 16K WAIT 1:24 0.00% irq44: ehci0
34 root -52 -155 0K 16K WAIT 1:07 0.00% swi3: ip6opt ipopt
2141 root 40 0 74804K 41744K select 1:03 0.00% cli
1855 root 40 0 42212K 25888K select 0:59 0.00% dfwd
19 root -44 -147 0K 16K WAIT 0:52 0.00% swi5: cambio
40 root -4 0 0K 16K vlruwt 0:34 0.00% vnlru
1 root 8 0 1632K 736K wait 0:33 0.00% init
53 root -16 0 0K 16K psleep 0:32 0.00% vmuncachedaemon
39 root -16 0 0K 16K psleep 0:31 0.00% bufdaemon
1623 root 8 0 2324K 684K nanslp 0:22 0.00% cron
64 root 12 0 0K 16K no_rs 0:22 0.00% rtimeshare_thr
65 root -8 0 0K 16K mdwait 0:17 0.00% md0
490 root -8 0 0K 16K mdwait 0:12 0.00% md22
2152 root 40 0 59232K 13352K select 0:11 0.00% mgd
1789 root 8 0 2724K 1292K nanslp 0:10 0.00% getty
1790 root 8 0 2724K 1304K nanslp 0:10 0.00% getty
327 root -8 0 0K 16K mdwait 0:08 0.00% md14
1835 root 40 0 2328K 952K select 0:08 0.00% watchdog
37 root -16 0 0K 16K psleep 0:07 0.00% pagedaemon
1823 root 4 0 0K 16K select 0:07 0.00% ppt_11_80000010
1858 root 40 0 16336K 6376K select 0:06 0.00% rdd
58 root 8 0 0K 16K ifscli 0:03 0.00% ifsclientclosed
1911 root 40 0 2596K 1412K select 0:02 0.00% rlogind
356 root -8 0 0K 16K mdwait 0:02 0.00% md16
1910 root 5 0 2400K 1276K ttyin 0:01 0.00% rlogin
182 root -8 0 0K 16K mdwait 0:01 0.00% md4
32 root -8 0 0K 16K usbevt 0:01 0.00% usb1
29 root -8 0 0K 16K usbevt 0:01 0.00% usb0
153 root -8 0 0K 16K mdwait 0:00 0.00% md2
454 root -8 0 0K 16K mdwait 0:00 0.00% md21
5736 root -8 0 0K 16K mdwait 0:00 0.00% md34
641 root -8 0 0K 16K mdwait 0:00 0.00% md29
317 root -8 0 0K 16K mdwait 0:00 0.00% md13
22 root -64 -167 0K 16K WAIT 0:00 0.00% swi0: uart uart
71957 root 40 0 11704K 5552K select 0:00 0.00% sshd
269 root -8 0 0K 16K mdwait 0:00 0.00% md10
1913 root 8 0 8908K 3336K wait 0:00 0.00% login
211 root -8 0 0K 16K mdwait 0:00 0.00% md6
1868 root 4 0 14476K 5272K select 0:00 0.00% sendd
1866 root 4 0 47956K 5176K select 0:00 0.00% mplsoamd
72359 root 41 0 58464K 6160K select 0:00 0.00% mgd
71964 root 40 0 4888K 3180K pause 0:00 0.00% csh
1914 root 4 0 2400K 300K sbwait 0:00 0.00% rlogin
2103 root 64 0 5120K 3056K pause 0:00 0.00% csh
1837 root 4 0 4812K 2020K select 0:00 0.00% tnetd
415 root -8 0 0K 16K mdwait 0:00 0.00% md20
1856 root 4 0 8332K 2752K select 0:00 0.00% rtspd
1791 root 8 0 2956K 1492K wait 0:00 0.00% sh
346 root -8 0 0K 16K mdwait 0:00 0.00% md15
654 root -8 0 0K 16K mdwait 0:00 0.00% md30
0 root 12 0 0K 0K WAIT 0:00 0.00% swapper
5726 root -8 0 0K 16K mdwait 0:00 0.00% md33
628 root -8 0 0K 16K mdwait 0:00 0.00% md28
386 root -8 0 0K 16K mdwait 0:00 0.00% md18
288 root -8 0 0K 16K mdwait 0:00 0.00% md11
72363 root 44 0 23560K 1944K RUN 0:00 0.00% top
240 root -8 0 0K 16K mdwait 0:00 0.00% md8
589 root -8 0 0K 16K mdwait 0:00 0.00% md25
376 root -8 0 0K 16K mdwait 0:00 0.00% md17
667 root -8 0 0K 16K mdwait 0:00 0.00% md31
3223 root 40 0 58432K 2804K select 0:00 0.00% mgd
201 root -8 0 0K 16K mdwait 0:00 0.00% md5
405 root -8 0 0K 16K mdwait 0:00 0.00% md19
680 root -8 0 0K 16K mdwait 0:00 0.00% md32
602 root -8 0 0K 16K mdwait 0:00 0.00% md26
230 root -8 0 0K 16K mdwait 0:00 0.00% md7
1854 root 8 0 2400K 1228K nanslp 0:00 0.00% smartd
563 root -8 0 0K 16K mdwait 0:00 0.00% md23
143 root -8 0 0K 16K mdwait 0:00 0.00% md1
576 root -8 0 0K 16K mdwait 0:00 0.00% md24
172 root -8 0 0K 16K mdwait 0:00 0.00% md3
615 root -8 0 0K 16K mdwait 0:00 0.00% md27
259 root -8 0 0K 16K mdwait 0:00 0.00% md9
298 root -8 0 0K 16K mdwait 0:00 0.00% md12
51 root 4 0 0K 16K select 0:00 0.00% if_pic_listen0
56 root 12 0 0K 16K condsl 0:00 0.00% delayedexecd
31 root -80 -183 0K 16K WAIT 0:00 0.00% irq45: ehci1
52 root 4 0 0K 16K purge_ 0:00 0.00% kern_pir_proc
25 root -84 -187 0K 16K WAIT 0:00 0.00% irq16: bman0 qman0+
7 root 8 0 0K 16K - 0:00 0.00% mastership taskq
49 root 4 0 0K 16K pfenoc 0:00 0.00% if_pfe_listen
48 root 4 0 0K 16K dump_r 0:00 0.00% kern_dump_proc
59 root 8 0 0K 16K - 0:00 0.00% nfsiod 0
1449 root 8 0 0K 16K crypto 0:00 0.00% crypto
60 root 8 0 0K 16K - 0:00 0.00% nfsiod 1
61 root 8 0 0K 16K - 0:00 0.00% nfsiod 2
62 root 8 0 0K 16K - 0:00 0.00% nfsiod 3
1450 root 8 0 0K 16K crypto 0:00 0.00% crypto returns
55 root -20 0 0K 16K jsr_js 0:00 0.00% jsr_jsm
30 root 8 0 0K 16K usbtsk 0:00 0.00% usbtask
17 root -28 -131 0K 16K WAIT 0:00 0.00% swi9: Giant taskq
18 root -28 -131 0K 16K WAIT 0:00 0.00% swi9: task queue
16 root -32 -135 0K 16K WAIT 0:00 0.00% swi8: +
13 root -40 -143 0K 16K WAIT 0:00 0.00% swi6: vm
33 root -48 -151 0K 16K WAIT 0:00 0.00% swi4: ip6_mismatch+
44 root -56 -159 0K 16K WAIT 0:00 0.00% swi2: ndpisr-E
46 root -56 -159 0K 16K WAIT 0:00 0.00% swi2: ndpisr-I
20 root -68 -171 0K 16K WAIT 0:00 0.00% irq25: syspld0
15 root -84 -187 0K 16K WAIT 0:00 0.00% mpfe_drv_taskq16: +
24 root -84 -187 0K 16K WAIT 0:00 0.00% irq105: bman0
26 root -84 -187 0K 16K WAIT 0:00 0.00% irq104: qman0

 

 

Can someone clarify why there are two pfex_junos entries and what the state "ll_blo" means? I could not find anything explaining it. Could there be a hung process that is causing the additional utilization?

 

Thanks in advance,

Julian

5 REPLIES

Re: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

06-27-2019 07:58 AM

Hi,

 

PFEX is the Packet Forwarding Engine Manager process on the EX4300. The EX4300 is a microkernel-based platform, and it requires this process to make the BSD OS calls. The cause of the CPU spike in this process requires further troubleshooting; a JTAC case may be needed.
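
As a side note on the two pfex_junos lines in your "show system processes extensive" output: that command lists one line per thread, and your plain top output shows the process has two threads (THR column, PID 1787), so both lines belong to the same process. If you just want to keep an eye on that process from the CLI, a pipe match works, for example:

labroot@router> show system processes extensive | match pfex_junos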

 

The following symptom data could be checked:

1. Any MAC move notifications on the system?

2. Any scheduler slips?

3. Non-Juniper/unsupported optics?

       - request pfe execute command "show sfp list" target fpc<no>

       - show chassis pic fpc-slot <> pic-slot <>

4. Check uKernel CPU utilization for all FPCs in the VC (see the filled-in example after this list):

      - request pfe execute command "show threads" target fpc<no>

      - request pfe execute command "show sched" target fpc<no>

      - request pfe execute command "show sfp list" target fpc<no>
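
To make item 4 concrete, this is how the commands would look when run against FPC 1 from the master RE (hypothetical example; substitute the member/FPC number you want to inspect):

labroot@router> request pfe execute command "show threads" target fpc1
labroot@router> request pfe execute command "show sched" target fpc1
labroot@router> request pfe execute command "show sfp list" target fpc1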


Re: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

06-28-2019 01:00 AM

Hello pranavs,

 

thank you for your answer!

 

 

1)

I have configured this VC to send syslog messages to our Graylog. This is the current configuration:

 

root@vx-core-b6-01# show system syslog
user * {
    any emergency;
}
host 192.168.1.100 {
    authorization error;
    daemon error;
    ftp none;
    security error;
    kernel error;
    user none;
    dfc error;
    external error;
    firewall none;
    pfe error;
    conflict-log error;
    change-log any;
    interactive-commands none;
    source-address 192.168.1.1;
}
file messages {
    any notice;
    authorization info;
}
file interactive-commands {
    interactive-commands any;
}

 

In order to see MAC move notifications in our Graylog, do I have to explicitly configure the following, or will the VC report unusual MAC moves to our Graylog anyway?

 

[edit switch-options]
user@switch# set mac-notification
 
Currently nothing regarding MAC move notifications is configured, and I see no MAC-move messages in our Graylog.
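
For what it's worth, this is what I would try once I get a maintenance window (just a sketch, not committed yet; I am assuming the switch-options knob quoted above applies to our ELS release and that the resulting log entries carry a MAC-move tag I can match on):

[edit]
root@vx-core-b6-01# set switch-options mac-notification
root@vx-core-b6-01# commit and-quit

root@vx-core-b6-01> show log messages | match MAC_MOVE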
 
 
2)
Regarding scheduler slips: there are no syslog entries about scheduler slips. Can you maybe clarify what exactly to check?
 
 
3)
After checking, there are indeed unsupported optics in use. There are even multiple syslog messages about them:
 
vx-core-b6-01 fpc1 [EX-BCM PIC] ex_bcm_pic_optics_diags: Failed to read eeprom of Optic 1 of Pic 2
 
and 
 
vx-core-b6-01 fpc1 Link 51 FAILED
 
But the device behind that optic/SFP is reachable and functional. Should I replace the optic as soon as possible?
 
Here is some information on that SFP:

{master:0}

root@vx-core-b6-01> show interfaces diagnostics optics ge-1/2/1

(no output here, because the VC cannot read data from the unsupported sfp?)

 

{master:0}
root@vx-core-b6-01> show interfaces ge-1/2/1
Physical interface: ge-1/2/1, Enabled, Physical link is Up
Interface index: 857, SNMP ifIndex: 813
Description: L12/08
Link-level type: Ethernet, MTU: 1514, LAN-PHY mode, Speed: 1000mbps, Duplex: Full-Duplex, BPDU Error: None, Loop Detect PDU Error: None, Ethernet-Switching Error: None,
MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled, Flow control: Enabled, Auto-negotiation: Enabled, Remote fault: Online, Media type: Fiber
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x0
Link flags : None
CoS queues : 12 supported, 12 maximum usable queues
Current address: c0:42:d0:c0:d2:78, Hardware address: c0:42:d0:c0:d2:78
Last flapped : 2019-04-15 06:20:36 UTC (10w4d 00:15 ago)
Input rate : 83192 bps (66 pps)
Output rate : 2520680 bps (254 pps)
Active alarms : None
Active defects : None
PCS statistics Seconds
Bit errors 0
Errored blocks 0
Ethernet FEC statistics Errors
FEC Corrected Errors 0
FEC Uncorrected Errors 0
FEC Corrected Errors Rate 0
FEC Uncorrected Errors Rate 0
Interface transmit statistics: Disabled

Logical interface ge-1/2/1.0 (Index 776) (SNMP ifIndex 814)
Flags: Up SNMP-Traps 0x0 Encapsulation: Ethernet-Bridge
Input packets : 2774878
Output packets: 333076
Protocol eth-switch, MTU: 1514

 
Here is some additional output regarding the pic-slot 2 fpc-slot 1:
 

root@vx-core-b6-01> show chassis pic pic-slot 2 fpc-slot 1
FPC slot 1, PIC slot 2 information:
Type 4x 1G/10G SFP/SFP+
State Online
PIC version 3.2
Uptime 74 days, 18 minutes, 28 seconds

PIC port information:
Fiber Xcvr vendor Wave- Xcvr JNPR
Port Cable type type Xcvr vendor part number length Firmware Rev
0 10GBASE SR MM OEM GBIC-SFP+-SR-JEX 850 nm 0.0 REV 01
1 GIGE 1000LX10 SM FINISAR CORP. FTRJ1319P1BTL-PT 1310 nm 0.0
2 GIGE 1000SX MM Juniper OEM SFP-GE-SX-J 850 nm 0.0 REV 01
3 GIGE 1000SX MM Juniper OEM SFP-GE-SX-JEX 850 nm 0.0 REV 01

 

The "faulty" sfp should be the one from "FINISAR CORP." - I have read in another thread, that some "LX10" Gbics are not supported and could cause trouble, but Iam unable to find the thread right now.

 

4)

I am not able to use these commands. Can you please clarify?

Our VC uses Junos with ELS.

 

Greetings,

Julian_V


Re: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

06-28-2019 07:37 AM

Based on the data you have shared, it makes sense to at least temporarily remove the optic on ge-1/2/1 and see whether the CPU utilization subsides.

 

For the "show sched" and "show thread" commands, what is the exact command you are using? If you suspect FPC1 contributing to the high CPU, you want to use the following -

 

labroot@router> start shell pfe network fpc1


BSD platform (QorIQ P202H processor, 0MB memory, 0KB flash)

(vty)# show threads
PID PR State     Name                   Stack Use  Time (Last/Max/Total) cpu
--- -- -------   ---------------------  ---------  ---------------------
  1 H  asleep    Maintenance            896/50272  0/8/8 ms  0%
  2 L  running   Idle                  1976/50280  2/726/3561366458 ms 84%
  3 H  asleep    Timer Services        1944/50280  0/76/5849591 ms  0%
<...>

(vty)# show threads    
PID PR State     Name                   Stack Use  Time (Last/Max/Total) cpu
--- -- -------   ---------------------  ---------  ---------------------
  1 H  asleep    Maintenance            896/50272  0/8/8 ms  0%
  2 L  running   Idle                  1976/50280  0/726/3561489801 ms 84
<...>

(vty)# show sfp list    
SFP Toolkit summary:
wakeup count: 4188089, debug: 0, periodic enabled: 1, diagnostics enabled: 1
  thread: 0x049ec378, itable: 0x049df0f8, itable iterator: 0, sem: 0x049f96f8
  polling interval delay:  1000 ms, polling max cpu:  100 ms
  poll for diags every  3 wakeups , SFPs polled for diags last time:  4
  last periodic CPU time:    1 ms, maximum periodic CPU time:     21 ms
  max SFPs to poll per interval: 0
SFP Toolkit syslog throttling parameters:
  period: 120 samples , disable threshold:  10, enable threshold   0
  SFP periodic diagnostics polling count: 65535

                                                                         diag
Index   Name             Presence     ID Eprom  PNO         SNO          calibr   Toxic
-----   --------------   ----------   --------  ----------  ----------- -------  -------
    1   Uplink SFP+ PIC(0)      Present   Complete  740-031980  AA1745300XB     int    Unknown
    2   Uplink SFP+ PIC(1)      Present   Complete  740-021308  ANM1PUP         int    Unknown
    3   Uplink SFP+ PIC(2)      Present   Complete  740-031980  MUC2CL8         int    Unknown
    4   Uplink SFP+ PIC(3)      Present   Complete  740-031980  ALP12X4         int    Unknown

I2C Acceleration table 
Index   Name             Presence     ID Eprom     Reg ID    I2C Master    I2C Group
-----   --------------   ----------   --------    --------    ------------   -------
<...>

From the above output, get the index for ge-1/2/1 and run "show sfp <index> info", as in the following example:

(vty)# show sfp 1 info      
  index:                  0x01
  sfp name:               Uplink SFP+ PIC
  pic context:            0x05B6CE30
  id mem scanned:         true
  linkstate:              Up
  sfp_present:            true
  sfp_changed:            false
  i2c failure count:      0x0
  diag polling count:     0x5BC1
  no diag polling from RE:0x0
  run_periodic:           false

 

 

Regarding your question about the sched slips: one of the symptoms that occurs with high kernel CPU usage is RPD_SCHED_SLIP messages in the logs, and that is what you could grep for.

show log messages | match RPD_SCHED_SLIP

Jul 30 12:24:11 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 7 sec scheduler slip, user: 1 sec 339119 usec, system: 0 sec, 0 usec
Jul 30 12:25:29 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 4 sec scheduler slip, user: 1 sec 431622 usec, system: 0 sec, 0 usec
Jul 30 12:25:37 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 4 sec scheduler slip, user: 1 sec 528918 usec, system: 0 sec, 74784 usec
Jul 30 17:47:55 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 4 sec scheduler slip, user: 0 sec 526784 usec, system: 0 sec, 4608 usec

Jul 30 17:48:03 m10i-a-re0 rpd[1304]: RPD_SCHED_SLIP: 4 sec scheduler slip, user: 1 sec 285283 usec, system: 0 sec, 19077 usec

 


Re: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

06-30-2019 11:52 PM

Hello pranavs,

 

thank you very much for your insightful and very detailed reply.

 

I was able to collect the following information:

 

 

root@vx-core-b6-01> start shell pfe network fpc1

 

BSD platform (QorIQ P202H processor, 0MB memory, 0KB flash)

(vty)# show threads
PID PR State Name Stack Use Time (Last/Max/Total) cpu
--- -- ------- --------------------- --------- ---------------------
1 H asleep Maintenance 904/50272 0/0/0 ms 0%
2 L running Idle 41864/50280 1/859/5810875810 ms 87%
3 H asleep Timer Services 2880/50272 0/119/7273751 ms 0%
5 L asleep Ukern Syslog 832/52328 0/0/0 ms 0%
6 L asleep Sheaf Background 2008/50272 0/54/13518 ms 0%
7 M asleep mac_db 808/56416 0/0/0 ms 0%
8 M asleep QSFP 41568/52328 0/231/2530975 ms 0%
9 M asleep SFP 6744/52320 0/96/232068 ms 0%
10 L asleep Firmware Upgrade 832/52320 0/0/0 ms 0%
11 L asleep Syslog 3456/52328 0/48/3257 ms 0%
12 M asleep Periodic 1928/56416 0/115/192475 ms 0%
13 H asleep TTP Receive 3608/52328 0/142/14222264 ms 0%
14 H asleep TTP Transmit 808/52320 1/1/1 ms 0%
15 L asleep DFW Alert 2336/52320 0/49/59387 ms 0%
16 M asleep L2ALM Manager 43904/56424 0/222/4657739 ms 0%
17 M asleep ARP Proxy 2136/56424 0/104/713219 ms 0%
18 M asleep RPM Msg thread 816/56416 0/0/0 ms 0%
19 H asleep Pfesvcsor 1936/56424 0/118/41958 ms 0%
20 L asleep NH Probe Service 816/52320 0/0/0 ms 0%
21 L asleep ICMP6 Input 1080/52320 2/2/2 ms 0%
22 L asleep IP6 Option Input 3360/52328 1/1/1 ms 0%
23 L asleep ICMP Input 3384/52328 2/2/2 ms 0%
24 L asleep IP Option Input 3344/52320 0/0/0 ms 0%
25 M asleep IGMP Input 3344/52328 0/0/0 ms 0%
26 M asleep PRIVATE PFE Manager 4568/64608 0/46/1759 ms 0%
27 M asleep PPM Manager 6712/56416 0/100/103003 ms 0%
28 M asleep PPM Data thread 2080/64608 0/111/498920 ms 0%
29 M asleep VRRP Manager 3384/56424 8/8/8 ms 0%
30 L ready Virtual Console 3648/64616 0/24/45 ms 0%
31 L asleep Virtual Console (private) 2560/64616 0/0/0 ms 0%
33 L asleep Console 3624/64616 4/4/4 ms 0%
34 M asleep PFE Manager 54776/64608 0/521/26630505 ms 0%
35 L asleep SFlow sample deamon 2008/56424 0/117/257284 ms 0%
36 M asleep bcmDPC 976/57440 0/0/0 ms 0%
37 M asleep socdmadesc.0 2184/57440 0/120/4403265 ms 0%
38 M asleep bcmIbodSync.0 2104/57448 0/116/2243868 ms 0%
40 M asleep bcmXGS3AsyncTX 1080/57440 2/2/2 ms 0%
41 M asleep bcmL2MOD.0 2520/57448 0/119/1835539 ms 0%
42 L asleep pfe_bcm_age 6320/64616 0/148/878017 ms 0%
43 L asleep pfe_bcm_sync 1592/56416 0/117/114075 ms 0%
44 M asleep PIC Periodic 2992/56424 1/678/22693966 ms 0%
45 M asleep PIC 808/52320 0/0/0 ms 0%
46 M asleep bcmLINK.0 43472/106600 2/729/383510049 ms 5%
47 M asleep bcmCNTR.0 42152/57448 0/540/36695080 ms 0%
48 M asleep CMVC 42376/64608 1/15/33 ms 0%
49 M asleep TNPC CM 45512/64608 0/120/1850993 ms 0%
50 M asleep PFE Statistics 3584/64616 1/122/6608168 ms 0%
51 L asleep AS Packet Process 2912/56416 7/7/12 ms 0%
52 M asleep bulkget Manager 6952/56416 0/58/9020 ms 0%
53 L asleep Cattle-Prod Daemon 3744/64608 0/0/0 ms 0%
54 L asleep Virtual Console 2560/64616 0/0/0 ms 0%

 


(vty)# show sfp list
SFP Toolkit summary:
wakeup count: 6645201, debug: 0, periodic enabled: 1, diagnostics enabled: 1
thread: 0x4e2fe80, itable: 0x4e22bc0, itable iterator: 0, sem: 0x4e30250
polling interval delay: 1000 ms, polling max cpu: 100 ms
poll for diags every 3 wakeups , SFPs polled for diags last time: 4
last periodic CPU time: 1 ms, maximum periodic CPU time: 96 ms
max SFPs to poll per interval: 0
SFP Toolkit syslog throttling parameters:
period: 120 samples , disable threshold: 10, enable threshold 0
SFP periodic diagnostics polling count: 65535

diag
Index Name Presence ID Eprom PNO SNO calibr Toxic
----- -------------- ---------- -------- ---------- ----------- ------- -------
1 Uplink SFP+ PIC(0) Present Complete 740-021308 M4301605 int Unknown
2 Uplink SFP+ PIC(1) Present Complete NON-JNPR P8R11ST EXT** Unknown
3 Uplink SFP+ PIC(2) Present Complete 740-011613 1171324439 int Unknown
4 Uplink SFP+ PIC(3) Present Complete 740-011613 1174522093 int Unknown

I2C Acceleration table
Index Name Presence ID Eprom Reg ID I2C Master I2C Group
----- -------------- ---------- -------- -------- ------------ -------


This should be a supported Gbic:

(vty)# show sfp 1 info
index: 0x01
sfp name: Uplink SFP+ PIC
pic context: 0x5f370e8
id mem scanned: true
linkstate: Up
sfp_present: true
sfp_changed: false
i2c failure count: 0x0
diag polling count: 0xE198
no diag polling from RE:0x0
run_periodic: false


This should be the unsupported one:

(vty)# show sfp 2 info
index: 0x02
sfp name: Uplink SFP+ PIC
pic context: 0x5f370e8
id mem scanned: true
linkstate: Up
sfp_present: true
sfp_changed: false
i2c failure count: 0x1
diag polling count: 0xCCB1
no diag polling from RE:0x0
run_periodic: false

 

(vty)# show sched
Total uptime 77+00:18:38, (6653918068 ms), 3496177384 thread dispatches
CPU load is 26% (5 second), 12% (1 minute)
Total network interrupt time 0 (usec)

CPU Name Time(ms)
87% Idle 5811219219
12% Threads 842698849
0% ISR 0


root@vx-core-b6-01> show log messages | match RPD_SCHED_SLIP

{master:0}

No output here.

 

Can you see anything strange, or do you think the GBIC could be the cause? Right now I am not able to unplug the GBIC, because there is storage behind it which is still in use.

 

And again, thank you for your time,

Julian_V

Solution
Accepted by topic author Julian_V
07-03-2019 12:53 AM

Re: High CPU usage/Process question (4x 4300-48t and 1x 4300-32f non mixed VC)

07-01-2019 07:02 AM

From this output, the i2c failure count = 0x1 points to a failure on the I2C bus, which is used to provide temperature, voltage, and ID EEPROM info. Rather than being an unsupported optic, this optic may have a minor HW fault. When it is feasible, it would be worthwhile to unplug this optic and monitor the overall CPU usage and the PFEX process usage to confirm or rule out this being the root cause.

 

 

(vty)# show sfp 2 info
index: 0x02
sfp name: Uplink SFP+ PIC
pic context: 0x5f370e8
id mem scanned: true
linkstate: Up
sfp_present: true
sfp_changed: false
i2c failure count: 0x1
diag polling count: 0xCCB1
no diag polling from RE:0x0
run_periodic: false
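
Once a maintenance window is available, a simple before/after comparison should be enough to confirm or rule this out; for example (illustrative only), note the RE idle/user/kernel figures with the optic inserted, pull it, and compare again after a few minutes:

labroot@router> show chassis routing-engine | match "Idle|User|Kernel"

It may also be worth re-running "show sfp 2 info" a few hours apart while the optic is still inserted to see whether the i2c failure count keeps incrementing, which would further point to a hardware fault in that optic.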