SRX Services Gateway

Entire FPC restart on both node on SRX5800 for second time in this month?

‎09-18-2017 05:02 AM

Hi All,

I'm seeing every FPC reboot on both nodes of the cluster. The log is below. I have already configured RE protection for TCP using a policer. Is there any way to also protect against a UDP or broadcast storm that can drive the FPC CPU high?

 

{secondary:node1}

test@node2> ...ch "Sep 15" | match "FPC" | match "cpu"

Sep 15 14:06:22.606 2017  node1 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 13 PIC 0 CPU utilization exceeds threshold, current value=99

Sep 15 14:06:27.838 2017  node1 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 12 PIC 0 CPU utilization exceeds threshold, current value=99

Sep 15 14:06:28.511 2017  node1 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 13 PIC 0 CPU utilization exceeds threshold, current value=88

Sep 15 14:06:33.093 2017 node1 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 13 PIC 0 CPU utilization exceeds threshold, current value=91

Sep 15 14:06:56.145 2017  node1 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 15 PIC 3 CPU utilization exceeds threshold, current value=86

Sep 15 14:06:58.377 2017  node1 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 15 PIC 3 CPU utilization exceeds threshold, current value=85

 

 

 

{primary:node0}

test@node0> show log messages.3.gz | match "Sep 15" | match "FPC" | match "cpu"

Sep 15 14:05:34.638 2017  node0 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 0 PIC 1 CPU utilization exceeds threshold, current value=99

Sep 15 14:05:39.562 2017  node0 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 0 PIC 1 CPU utilization exceeds threshold, current value=99

Sep 15 14:05:42.497 2017  node0 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 1 PIC 0 CPU utilization exceeds threshold, current value=99

Sep 15 14:05:43.667 2017  node0 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC
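For anyone triaging similar output, here is a minimal Python sketch that tallies the PERF_MON threshold events per node, FPC and PIC, so it is easier to see which cards were hit and how hard. The regular expression is only inferred from the log lines above, and the function name `summarize` is made up for illustration; truncated lines are simply skipped.

```python
import re
from collections import defaultdict

# Pattern inferred from the PERF_MON lines in this thread (an assumption,
# not an official Junos log grammar).
PATTERN = re.compile(
    r"(?P<node>node\d+)\s+PERF_MON: .*RTPERF_CPU_THRESHOLD_EXCEEDED: "
    r"FPC (?P<fpc>\d+) PIC (?P<pic>\d+) CPU utilization exceeds threshold, "
    r"current value=(?P<cpu>\d+)"
)

def summarize(lines):
    """Return {(node, fpc, pic): (event_count, peak_cpu)}."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        m = PATTERN.search(line)
        if not m:
            continue  # skip truncated or unrelated lines
        key = (m["node"], int(m["fpc"]), int(m["pic"]))
        stats[key][0] += 1
        stats[key][1] = max(stats[key][1], int(m["cpu"]))
    return {k: tuple(v) for k, v in stats.items()}

# A few lines copied from the output above, including the truncated one.
SAMPLE = """\
Sep 15 14:06:22.606 2017  node1 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 13 PIC 0 CPU utilization exceeds threshold, current value=99
Sep 15 14:06:27.838 2017  node1 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 12 PIC 0 CPU utilization exceeds threshold, current value=99
Sep 15 14:06:28.511 2017  node1 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 13 PIC 0 CPU utilization exceeds threshold, current value=88
Sep 15 14:05:43.667 2017  node0 PERF_MON: %USER-2-RTPERF_CPU_THRESHOLD_EXCEEDED: FPC
"""

for key, (count, peak) in sorted(summarize(SAMPLE.splitlines()).items()):
    print(key, "events:", count, "peak CPU:", peak)
```

Saving the `show log` output to a file and passing its lines to `summarize` gives the same summary for the full log.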

 

Thanks, and I'd appreciate any feedback.

7 REPLIES 7

Re: Entire FPC restart on both node on SRX5800 for second time in this month?

‎09-19-2017 02:45 AM

How do we know that the CPU usage caused the FPC restart? Do you see any core files or other messages?

Thanks,
Suraj
Please Mark My Solution Accepted if it Helped, Kudos are Appreciated too

Re: Entire FPC restart on both node on SRX5800 for second time in this month?

‎09-19-2017 07:58 AM

Hi rsuraj,

 

Is there any method to prevent a broadcast storm on the SRX? I need a second opinion, if you or anybody else has one.

 

{primary:node0}
test@srx5800> show system queues node 1 | match arpintrq
arpintrq                        0         3000        0       50 109804063 ----> Engineering cited this output as evidence of a broadcast storm

 

This ARP flood can drive the CP and RE CPU high and in turn cause the control link to flap, which also happened in this case.
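To check whether the ARP input queue is still dropping, one rough approach is to capture the `arpintrq` line twice and compare the drop counters. The sketch below assumes the numeric columns are bytes, max bytes, packets, max packets and drops, in that order; that ordering is my reading of the output above, not something confirmed in this thread, and the helper names are hypothetical.

```python
def parse_queue_line(line):
    """Parse one queue line into a dict.

    Assumed column order: name, bytes, max bytes, packets, max packets, drops.
    Anything after the sixth field (such as a trailing note) is ignored.
    """
    fields = line.split()
    name = fields[0]
    cur_bytes, max_bytes, packets, max_packets, drops = (int(x) for x in fields[1:6])
    return {
        "name": name,
        "bytes": cur_bytes,
        "max_bytes": max_bytes,
        "packets": packets,
        "max_packets": max_packets,
        "drops": drops,
    }

def new_drops(before_line, after_line):
    """Return how many drops were added between two samples of the same queue."""
    before = parse_queue_line(before_line)
    after = parse_queue_line(after_line)
    return after["drops"] - before["drops"]

# The line from the output above.
line = "arpintrq                        0         3000        0       50 109804063"
print(parse_queue_line(line)["drops"])
```

If `new_drops` stays at zero between two samples taken a few minutes apart, the counter is most likely left over from the earlier storm rather than an ongoing one.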

 

 

Thanks


Re: Entire FPC restart on both node on SRX5800 for second time in this month?

‎09-19-2017 08:49 AM
The question I have is: why are so many packets hitting node 1, which is the secondary node? Ideally, the connected switch or device should not forward any packets to the secondary node.
Thanks,
Suraj

Re: Entire FPC restart on both node on SRX5800 for second time in this month?

‎09-19-2017 03:55 PM
Hi,

Before the incident happened, node1 was active and node0 was the backup.

Re: Entire FPC restart on both node on SRX5800 for second time in this month?

‎09-22-2017 05:10 AM

Hi all,

 

 

Just to update this issue: according to Engineering, it was caused by a broadcast storm that broke communication between the RE and the FPCs.

 

May I know whether the SRX5800 has a feature that can block or limit the impact of a broadcast storm?

 

Thanks, and I'd appreciate any feedback.


Re: Entire FPC restart on both node on SRX5800 for second time in this month?

‎09-24-2017 04:41 AM

Not on the SRX. And to be effective, storm control really needs to be enabled at the access layer to stop the storm at the source. You would do this on your EX stacks.

 

https://www.juniper.net/documentation/en_US/junos/topics/concept/rate-limiting-storm-control-underst...

Steve Puluka BSEET - Juniper Ambassador
IP Architect - DQE Communications Pittsburgh, PA (Metro Ethernet & ISP)
http://puluka.com/home

Re: Entire FPC restart on both node on SRX5800 for second time in this month?

‎09-24-2017 11:23 PM

Hi all,

 

Can anyone share information on internal PR 1236354?

 

 

Thanks
