Junos

Linecard restart repeating.

‎06-17-2019 07:22 PM

We have recently had an issue where a line card rebooted a few times, on several different Juniper MX routers.

 

Every time, we see the following messages before the LC goes down:


Jun 16 11:11:59.000 CAMGW-R01 : %PFE-5: fpc4 user.notice logger: /usr/bin/pfe-app-wrapper: Starting pfe application /var/app/smpc.elf
Jun 16 11:12:06.524 CAMGW-R01 chassisd[13727]: %DAEMON-5-CHASSISD_SNMP_TRAP10: SNMP trap generated: Fru Offline (jnxFruContentsIndex 7, jnxFruL1Index 5, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC: MPC9E 3D @ 4/*/*, jnxFruType 3, jnxFruSlot 4, jnxFruOfflineReason 23, jnxFruLastPowerOff 2928237, jnxFruLastPowerOn 1065858324)
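For anyone triaging these traps by script, here is a minimal Python sketch that pulls the jnxFru* fields out of a CHASSISD_SNMP_TRAP10 line like the one above. The helper name `parse_fru_trap` and the regex are assumptions made for illustration, not Juniper tooling, and the sample line is shortened from the log above.

```python
import re

# Hypothetical helper (not part of any Juniper tool): extract the jnxFru* fields
# from the parenthesised part of a CHASSISD_SNMP_TRAP10 syslog line.
FIELD_RE = re.compile(r"(jnxFru\w+) ([^,)]+)")

def parse_fru_trap(line: str) -> dict:
    """Return a dict mapping jnxFru* field names to their raw string values."""
    start = line.find("(")
    if start == -1:
        return {}
    return {name: value.strip() for name, value in FIELD_RE.findall(line[start:])}

# Sample shortened from the trap logged in this post.
sample = (
    "Jun 16 11:12:06.524 CAMGW-R01 chassisd[13727]: %DAEMON-5-CHASSISD_SNMP_TRAP10: "
    "SNMP trap generated: Fru Offline (jnxFruContentsIndex 7, jnxFruL1Index 5, "
    "jnxFruName FPC: MPC9E 3D @ 4/*/*, jnxFruType 3, jnxFruSlot 4, "
    "jnxFruOfflineReason 23)"
)

fields = parse_fru_trap(sample)
print(fields["jnxFruSlot"], fields["jnxFruOfflineReason"], fields["jnxFruName"])
```

The values are kept as strings on purpose; the numeric offline reason still has to be looked up in the Juniper MIB or KB.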

 

 

12 REPLIES
Junos

Re: Linecard restart repeating.

‎06-17-2019 07:26 PM

Hi xavierpaul,

 

Do you see any core dumps generated on the box? Also check the NVRAM logs from the FPC shell for more clues.

Junos

Re: Linecard restart repeating.

‎06-17-2019 07:35 PM

You may refer to this KB to find the reason for the FPC reboot; here the reason is reconnect (jnxFruOfflineReason 23 in the trap above).

https://kb.juniper.net/InfoCenter/index?page=content&id=KB19695&smlogin=true&actp=search

We might need to check the logs to confirm the exact reason for the FRU reconnect.

Junos

Re: Linecard restart repeating.

‎06-17-2019 08:14 PM

This looks like a segmentation fault issue. Please get the FPC syslog and NVRAM outputs:

start shell pfe network fpc<fpc number>

show nvram

show syslog messages

 

Tks,

Abhishek.

Junos

Re: Linecard restart repeating.

‎06-17-2019 08:20 PM

Hi Xavierpaul,

There can be multiple reasons for the reboot. You need to provide more information to isolate the cause further.

 

> Check for core dumps using 'show system core-dumps'; any core file has to be analyzed.

> Check for errors on the link between the CB and the FPC using "show chassis ethernet-switch statistics".

> Check the NVRAM logs using: request pfe execute target fpc4 command "show nvram"

> The issue could be improper seating of the card in slot 4.

> We cannot rule out a hardware problem if this is happening repeatedly.
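If the same checks have to be repeated across several routers or slots, a small Python sketch like the one below can generate the command list. The function name `fpc_reboot_checks` is hypothetical, and the script only formats strings; it does not log in to a router. The fourth command assumes that "show syslog messages", which other replies run from the PFE shell, can also be passed through "request pfe execute".

```python
# Hypothetical helper: builds the Junos CLI commands suggested in this thread
# for a given FPC slot. It only formats strings; it does not connect to a router.
def fpc_reboot_checks(slot: int) -> list[str]:
    if slot < 0:
        raise ValueError("FPC slot must be non-negative")
    return [
        "show system core-dumps",
        "show chassis ethernet-switch statistics",
        f'request pfe execute target fpc{slot} command "show nvram"',
        # Assumption: the PFE shell syslog command is run the same way as "show nvram".
        f'request pfe execute target fpc{slot} command "show syslog messages"',
    ]

# Slot 4 is the FPC that went offline in the original post.
for command in fpc_reboot_checks(4):
    print(command)
```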

Junos

Re: Linecard restart repeating.

‎06-17-2019 08:22 PM

Hi Xavier,

 

Below are the commands to check 

 

show system core-dumps  <<< will show if there are any new core dumps generated

 

<<< To check the FPC NVRAM logs:

start shell pfe network fpc<>

#show nvram                           <<< this will give us some clue on why the fpc rebooted 

#show syslog messages 

Junos

Re: Linecard restart repeating.

‎06-17-2019 10:42 PM

I have seen this condition with a memory issue in the FPC. Check for any DDRIF memory errors or UCODE data errors in the messages log or in the FPC's syslog messages. Also check thread usage, the CPU graph, and memory fragmentation on the FPC. Is there a core dump for this FPC? Any DDoS violations for this FPC? We have seen software issues pertaining to thread usage and to core dumps being generated. Please let us know the Junos version involved.

 

E.g:

 

Segmentation Fault!

/usr/bin/pfe-app-wrapper: Starting pfe application /var/app/smpc.elf

 

Ukern boot

[LOG] jdid_main: JDID mode check failed

[LOG] iffpc_jam_core_module_init: Registering valid jam vectors

[LOG] Set the IP IRI for table #1 to 0x22000080

[LOG] IPV4 Init: Set the IP IRI to 0x22000080

[LOG] ddos_issu_helper_register: issu state is 0 at ddos startup

[LOG] ddos sock support init ...

[LOG] ddos sock connection proto in use 7 ...

[LOG] RSMON rsmon_msg_thread_init

[LOG] if_module_init: Zeroing out jam vectors
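When the NVRAM or syslog output is saved to a file, a quick scan for the strings mentioned above can save some scrolling. This is a rough Python sketch; the helper name `flag_lines` and the pattern list are assumptions for illustration, not an official list of Junos error signatures.

```python
# Hypothetical triage helper: flag lines in saved FPC output that contain the
# strings mentioned in this thread. The patterns are plain substrings chosen
# for illustration, not an official list of Junos error signatures.
PATTERNS = ("Segmentation Fault", "DDRIF", "UCODE")

def flag_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines matching any pattern, ignoring case."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if any(pattern.lower() in lowered for pattern in PATTERNS):
            hits.append((number, line.strip()))
    return hits

# Sample taken from the NVRAM excerpt in this post.
nvram = """Segmentation Fault!
/usr/bin/pfe-app-wrapper: Starting pfe application /var/app/smpc.elf
Ukern boot
[LOG] jdid_main: JDID mode check failed
"""

for number, line in flag_lines(nvram):
    print(number, line)
```

A match only tells you where to look; the surrounding lines and any core file still need to be analyzed.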

Junos

Re: Linecard restart repeating.

‎06-18-2019 08:43 AM

This is not an issue to be discussed on the JNET forum. Please open a JTAC ticket.


Mengzhe Hu
JNCIE x 3 (SP DC ENT)
Junos

Re: Linecard restart repeating.

‎06-18-2019 12:33 PM

Hi Xavier 

 

Normally, for an FPC rebooting issue, we need to check more information from the PFE shell, starting with "show syslog messages" and "show nvram". There could also be a core dump file generated. It is better for you to open a JTAC ticket with the RSI and /var/log uploaded to the case.

 

Best

 

Mu

Junos

Re: Linecard restart repeating.

‎06-19-2019 05:44 PM

Thanks Abhishek,

 

NVRAM does show a segmentation fault, and there is an FPC core too, along with CPU spikes. We have asked our partner team to get a JTAC case opened. Model: MX2020, Junos: 15.1F5-S4.6

 

Ukern boot
[LOG] jdid_main: JDID mode check failed
[LOG] iffpc_jam_core_module_init: Registering valid jam vectors
[LOG] Set the IP IRI for table #1 to 0x80000014
[LOG] IPV4 Init: Set the IP IRI to 0x80000014
[LOG] ddos_issu_helper_register: issu state is 0 at ddos startup
[LOG] ddos sock support init ...
[LOG] ddos sock connection proto in use 7 ...
[LOG] RSMON rsmon_msg_thread_init
[LOG] if_module_init: Zeroing out jam vectors
--------------------------------------
Segmentation Fault!
/usr/bin/pfe-app-wrapper: Starting pfe application /var/app/smpc.elf


So this segmentation fault is a memory issue on the FPC.

Thanks,
Paul

Junos

Re: Linecard restart repeating.

‎06-19-2019 05:48 PM

Yes, saw similar NVRAM messages:

 

Ukern boot
[LOG] jdid_main: JDID mode check failed
[LOG] iffpc_jam_core_module_init: Registering valid jam vectors
[LOG] Set the IP IRI for table #1 to 0x80000014
[LOG] IPV4 Init: Set the IP IRI to 0x80000014
[LOG] ddos_issu_helper_register: issu state is 0 at ddos startup
[LOG] ddos sock support init ...
[LOG] ddos sock connection proto in use 7 ...
[LOG] RSMON rsmon_msg_thread_init
[LOG] if_module_init: Zeroing out jam vectors
--------------------------------------
Segmentation Fault!
/usr/bin/pfe-app-wrapper: Starting pfe application /var/app/smpc.elf

Junos

Re: Linecard restart repeating.

‎06-19-2019 06:23 PM

A segmentation fault is raised by the hardware whenever a program or process tries to read or write a restricted memory location. The hardware reports the fault to the OS kernel, which delivers it to the offending process; unless the process handles the fault, it crashes. So in this case the segmentation fault looks to be the effect and not the cause. Since you say there was high CPU, the process that caused the high CPU probably also caused the crash/segfault. FPC core analysis would show what the offending process was; however, a core is best analyzed on a JTAC case.
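To illustrate the mechanism on an ordinary host (not on an FPC), here is a minimal Python sketch. It assumes a POSIX system such as Linux: a child process reads from address 0, the kernel delivers SIGSEGV to it, and the parent sees that the child was killed by the signal. The function name `run_crashing_child` is made up for the example.

```python
import signal
import subprocess
import sys

# Child program: ctypes.string_at(0) reads from address 0, which is not mapped,
# so the access faults and the kernel delivers SIGSEGV to the child.
CHILD_CODE = "import ctypes; ctypes.string_at(0)"

def run_crashing_child() -> int:
    """Run the child and return its exit status as seen by the parent."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode

status = run_crashing_child()
# On POSIX, subprocess reports death-by-signal as a negative signal number.
if status < 0:
    print("child killed by", signal.Signals(-status).name)
else:
    print("child exited with status", status)
```

The child installs no handler for the fault, so it is terminated, which is the "otherwise crashes" case described above.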

Junos
Solution
Accepted by topic author xavierpaul
‎06-21-2019 10:53 PM

Re: Linecard restart repeating.

‎06-19-2019 07:19 PM

You are running an MPC9E with 15.1F5-S4.6 code. I believe your MXs are seeing a lot of route churn, which is causing the high CPU.
Perhaps you can take a look at this PR:

https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR1305994