SRX220H2 - Cluster Issues (secondary node flapping): High CPU JSRPD

1. SRX220H2 - Cluster Issues (secondary node flapping): High CPU JSRPD

Erdem

Posted 05-14-2018 10:04

I have a branch office with a cluster of SRX220H2s that recently started having flapping issues with the secondary node. Every 5-10 minutes, the secondary node is kicked out of the cluster, then re-added several minutes later, and the cycle starts over. We've tried hard-rebooting the secondary node to see if it would rejoin and stay in the cluster, but it doesn't help.

Additionally, I've noticed that the control-plane CPU on the primary node is consistently at 100%, with the jsrpd process consuming a large share of it. We have a number of essentially identical branch clusters elsewhere, and none of them show jsrpd consuming this much CPU. I know that this process handles messaging for the cluster. Checking the jsrpd logs, I'm seeing something very unusual:

May 14 16:55:04 TCP-S: accepted client connection.
May 14 16:55:04 TCP-S: TCP client from 130.16.0.1/56547 connected
May 14 16:55:04 TCP-S: TCP peer closed connection
May 14 16:55:04 last message repeated 100 times (hit threshold of (100))
May 14 16:55:04 last message repeated 200 times (hit threshold of (200))
May 14 16:55:04 last message repeated 300 times (hit threshold of (300))
May 14 16:55:04 last message repeated 400 times (hit threshold of (400))
May 14 16:55:04 last message repeated 500 times (hit threshold of (500))
May 14 16:55:04 last message repeated 600 times (hit threshold of (600))
May 14 16:55:05 last message repeated 700 times (hit threshold of (700))
May 14 16:55:05 last message repeated 800 times (hit threshold of (800))
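
To get a feel for how often a message like this is being repeated, the log can be summarized with a short script. This is only a rough sketch: the function name `summarize_repeats` is made up for illustration, and it assumes the "repeated N times" figures are running totals for the message logged just before them, which is how the rising thresholds above appear to behave.

```python
import re

# A few lines copied from the jsrpd log above.
SAMPLE = """\
May 14 16:55:04 TCP-S: TCP client from 130.16.0.1/56547 connected
May 14 16:55:04 TCP-S: TCP peer closed connection
May 14 16:55:04 last message repeated 100 times (hit threshold of (100))
May 14 16:55:04 last message repeated 600 times (hit threshold of (600))
May 14 16:55:05 last message repeated 800 times (hit threshold of (800))
"""

# "Mon DD HH:MM:SS <body>"
LINE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (.*)$")
REPEAT = re.compile(r"^last message repeated (\d+) times")


def summarize_repeats(text):
    """Map each repeated message to the highest repeat count seen for it.

    Assumption: the counts are cumulative, so the maximum is the total.
    """
    summary = {}
    current = None  # most recent ordinary (non-"repeated") message
    for raw in text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        body = m.group(2)
        rep = REPEAT.match(body)
        if rep is None:
            current = body
        elif current is not None:
            count = int(rep.group(1))
            summary[current] = max(summary.get(current, 0), count)
    return summary


if __name__ == "__main__":
    for message, count in summarize_repeats(SAMPLE).items():
        print(f"{count:>6}  {message}")
```

Run against the full log file, this would show which message is being repeated and roughly how many times.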

Here's the system process extensive command output:

show system processes extensive
node0:
--------------------------------------------------------------------------
last pid: 47616;  load averages:  1.28,  1.26,  1.42  up 431+22:43:27    16:59:15
140 processes: 19 running, 108 sleeping, 2 zombie, 11 waiting

Mem: 210M Active, 149M Inact, 1036M Wired, 145M Cache, 112M Buf, 432M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1403 root        5  76    0   996M 58812K RUN    0    ??? 102.20% flowd_octeon_hm
 1406 root        1 139    0 14096K  7032K RUN    0 727.7H 76.66% jsrpd
   22 root        1 171   52     0K    16K RUN    0 7574.2  0.00% idle: cpu0
   23 root        1 -20 -139     0K    16K RUN    0 118.8H  0.00% swi7: clock
    5 root        1 -16    0     0K    16K rtfifo 0  42.7H  0.00% rtfifo_kern_recv
   25 root        1 -40 -159     0K    16K WAIT   0  40.4H  0.00% swi2: netisr 0
 1413 root        1  76    0 12452K  5768K select 0  33.9H  0.00% license-check

show chassis cluster interfaces:

Control link status: Up

Control interfaces:
    Index   Interface        Status   Internal-SA
    0       fxp1             Up       Disabled

Fabric link status: Up

Fabric interfaces:
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/5           Up   / Up
    fab0
    fab1    ge-3/0/5           Up   / Up
    fab1

Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Up          1
    reth2        Up          1

Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0

Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-3/0/0          255       Down      1
    ge-0/0/0          255       Up        1

{primary:node0}

last 100 of show log chassisd

show log chassisd | last 100
May 14 16:39:58 SCC: pseudo_create_devs_swfab: Skipping creation of swfab1, since fabric presence is set to true
May 14 16:39:58 SCC: lcc_detach_interfaces_not_online lcc 1
May 14 16:39:58 CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(3)
May 14 16:39:58 CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(4)
May 14 16:39:58 CHASSISD_IFDEV_DETACH_FPC: ifdev_detach_fpc(5)
May 14 16:40:06 SCC: pfpc ready fpc 3 i2c 1897
May 14 16:40:06 SCC: fpc 3 clean, bringing online
May 14 16:40:06 SCC: lcc_send_fpc_online_cmd_generic:  lcc 1 fpc 0
May 14 16:40:06 SCC: pic_online_req for fpc 3, pic 0  lcc_slot 1 in lcc_recv_pic_online_req
May 14 16:40:06 SCC: lcc_send_pic_online_ack: On Switch-chassis: fpc 3 pic 0 pic_type 0x669 msg_len 20 tlv_len 0
May 14 16:40:06 SCC: From SCC send: fru 13361152 lcc_slot 1 online ack to LCC
May 14 16:40:06 SCC: From Switch-Chassis send: fpc 3 pic 0 online ack to LCC
May 14 16:40:08 SCC: lcc_recv_pic_attach: pic attach pic 0, flags 0x0, portcount 8, fpc 3
May 14 16:40:08 SCC: pic_set_online: i2c 0x669 pic 0 fpc 3 state 5 in_issu 0
May 14 16:40:08 SCC:  pic_type=1641 pic_slot=0 fpc_slot=3 pic_i2c_id=1641

May 14 16:40:08 SCC: fpc slot 3 pic_present 0x0 => 0x1
May 14 16:40:08 SCC: FPC 3 PIC 0, attaching clean
May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 0

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 0
May 14 16:40:08 SCC: Created pic for ge-3/0/0

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 1

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 1
May 14 16:40:08 SCC: Created pic for ge-3/0/1

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 2

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 2
May 14 16:40:08 SCC: Created pic for ge-3/0/2

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 3

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 3
May 14 16:40:08 SCC: Created pic for ge-3/0/3

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 4

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 4
May 14 16:40:08 SCC: Created pic for ge-3/0/4

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 5

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 5
May 14 16:40:08 SCC: Created pic for ge-3/0/5

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 6

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 6
May 14 16:40:08 SCC: Created pic for ge-3/0/6

May 14 16:40:08 SCC: Creating pic entry, baseport 0, nports 8, port 7

May 14 16:40:08 SCC: create_pic_entry: pic i2c 0x669, hw qs 8 supported qs 8, flags 0x0, pic port 7
May 14 16:40:08 SCC: Created pic for ge-3/0/7

May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/0
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/0
May 14 16:40:08 SCC: ge-3/0/0: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/1
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/1
May 14 16:40:08 SCC: ge-3/0/1: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/2
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/2
May 14 16:40:08 SCC: ge-3/0/2: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/3
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/3
May 14 16:40:08 SCC: ge-3/0/3: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/4
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/4
May 14 16:40:08 SCC: ge-3/0/4: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/5
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/5
May 14 16:40:08 SCC: ge-3/0/5: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/6
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/6
May 14 16:40:08 SCC: ge-3/0/6: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-3/0/7
May 14 16:40:08 SCC: ifdev_create entered ge-3/0/7
May 14 16:40:08 SCC: ge-3/0/7: large delay buffer cleared
May 14 16:40:08 SCC: fpc_is_q_neompc: no valid ideeprom for slot 3
May 14 16:40:08 SCC: fpc_is_q_sangria: no valid ideeprom for slot 3
May 14 16:40:08 SCC: PIC (fpc 3 pic 0) message operation: add. ifd count 8, flags 0x3 in mesg
May 14 16:40:08 LCC: ignoring PIC message on LCC

For the moment, I've disabled the switch ports for the second node (node1) that keeps flapping, just so I don't keep seeing it go up and down, but I can re-enable them if needed.

Any thoughts are appreciated!

2. RE: SRX220H2 - Cluster Issues (secondary node flapping): High CPU JSRPD

Nellikka
Posted 05-14-2018 18:37

Hi,

Please provide the output of the following command:

show chassis cluster information detail

Did the node0 CPU return to normal after you disabled the node1 interfaces on the switch?

3. RE: SRX220H2 - Cluster Issues (secondary node flapping): High CPU JSRPD

Erdem

Posted 05-15-2018 04:00

The control-plane CPU for node0 did not return to normal, so I ended up turning the switch ports back on and permitting connectivity between the two nodes. The cluster reformed and has actually been stable, but I'm still seeing 100% control-plane CPU on node0, much of which is still consumed by the jsrpd process. I'm tempted to forcibly restart the jsrpd process on node0, but I don't know what effect that would have on the cluster or on the operational status of node0.

Here's the output of that command:

show chassis cluster information detail
node0:
--------------------------------------------------------------------------
Redundancy mode:
    Configured mode: active-active
    Operational mode: active-active
Cluster configuration:
    Heartbeat interval: 1000 ms
    Heartbeat threshold: 3
    Control link recovery: Disabled
    Fabric link down timeout: 66 sec
Node health information:
    Local node health: Healthy
    Remote node health: Healthy

Redundancy group: 0, Threshold: 255, Monitoring failures: none
    Events:
        Mar  8 18:16:20.013 : hold->secondary, reason: Hold timer expired
        Jul 20 01:42:14.787 : secondary->primary, reason: Remote yield (100/0)

Redundancy group: 1, Threshold: 255, Monitoring failures: none
    Events:
        Mar  8 18:22:52.767 : hold->secondary, reason: Hold timer expired
        Mar  8 18:22:55.679 : secondary->primary, reason: Better priority (100/1)

Redundancy group: 2, Threshold: 255, Monitoring failures: none
    Events:
        Mar  8 18:28:36.929 : hold->secondary, reason: Hold timer expired
        Mar  8 18:28:40.658 : secondary->primary, reason: Better priority (100/1)
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 37289640
        Heartbeat packets received: 37179519
        Heartbeat packet errors: 0
        Duplicate heartbeat packets received: 0
    Control recovery packet count: 0
    Sequence number of last heartbeat packet sent: 37289640
    Sequence number of last heartbeat packet received: 65685
Fabric link statistics:
    Child link 0
        Probes sent: 74763886
        Probes received: 74261735
    Child link 1
        Probes sent: 0
        Probes received: 0
Switch fabric link statistics:
    Probe state : DOWN
    Probes sent: 0
    Probes received: 0
    Probe recv errors: 0
    Probe send errors: 0
    Probe recv dropped: 0
    Sequence number of last probe sent: 0
    Sequence number of last probe received: 0

Chassis cluster LED information:
    Current LED color: Green
    Last LED change reason: No failures
Control port tagging:
    Disabled

Cold Synchronization:
    Status:
        Cold synchronization completed for: N/A
        Cold synchronization failed for: N/A
        Cold synchronization not known for: N/A
        Current Monitoring Weight: 0

    Statistics:
        Number of cold synchronization completed: 0
        Number of cold synchronization failed: 0

    Events:
        Mar  8 18:20:04.632 : Cold sync for PFE  is RTO sync in process
        Mar  8 18:20:05.450 : Cold sync for PFE  is Post-req check in process
        Mar  8 18:20:07.439 : Cold sync for PFE  is Completed

Loopback Information:

    PIC Name        Loopback        Nexthop     Mbuf
    -------------------------------------------------
                    Success         Success     Success

Interface monitoring:
    Statistics:
        Monitored interface failure count: 110

    Events:
        May 14 16:00:47.618 : Interface ge-3/0/0 monitored by rg 1, changed state from Up to Down
        May 14 16:04:49.508 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up
        May 14 16:08:41.523 : Interface ge-3/0/0 monitored by rg 1, changed state from Up to Down
        May 14 16:12:42.731 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up
        May 14 16:16:29.162 : Interface ge-3/0/0 monitored by rg 1, changed state from Up to Down
        May 14 16:20:31.862 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up
        May 14 16:36:14.067 : Interface ge-3/0/0 monitored by rg 1, changed state from Up to Down
        May 14 16:40:14.408 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up
        May 14 16:46:42.724 : Interface ge-3/0/0 monitored by rg 1, changed state from Up to Down
        May 14 22:39:28.700 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up

Fabric monitoring:
    Status:
        Fabric Monitoring: Enabled
        Activation status: Active
        Fabric Status reported by data plane: Up
        JSRPD internal fabric status: Up

Fabric link events:
        May 14 16:40:10.483 : Fabric link fab0 is down
        May 14 16:40:10.487 : Child ge-0/0/5 of fab0 is down
        May 14 16:40:10.493 : Child ge-3/0/5 of fab1 is up
        May 14 16:40:12.742 : Fabric link fab0 is up
        May 14 16:40:12.753 : Child ge-0/0/5 of fab0 is up
        May 14 16:40:13.458 : Fabric link fab1 is up
        May 14 16:40:13.473 : Child ge-3/0/5 of fab1 is up
        May 14 16:40:16.529 : Child link-0 of fab0 is up, pfe notification
        May 14 16:40:16.684 : Child link-0 of fab1 is up, pfe notification
        May 14 16:40:17.573 : Fabric link up, link status timer

Control link status: Up
    Server information:
        Server status : Running
        Server connected to None
    Client information:
        Client status : Inactive
        Client connected to None
Control port tagging:
    Disabled

Control link events:
        May 14 16:04:33.419 : Control link fxp1 is up
        May 14 16:08:27.790 : Control link down, link status timer
        May 14 16:08:41.583 : Control link fxp1 is up
        May 14 16:12:25.540 : Control link fxp1 is up
        May 14 16:16:16.535 : Control link down, link status timer
        May 14 16:16:29.293 : Control link fxp1 is up
        May 14 16:20:14.217 : Control link fxp1 is up
        May 14 16:36:00.062 : Control link down, link status timer
        May 14 16:36:14.143 : Control link fxp1 is up
        May 14 16:39:58.684 : Control link fxp1 is up

Hardware monitoring:
    Status:
        Activation status: Enabled
        Redundancy group 0 failover for hardware faults: Enabled
        Hardware redundancy group 0 errors: 0
        Hardware redundancy group 1 errors: 0

Schedule monitoring:
    Status:
        Activation status: Disabled
        Schedule slip detected: None
        Timer ignored: No

    Statistics:
        Total slip detected count: 3510
        Longest slip duration: 9(s)

    Events:
        May 15 10:32:01.782 : Detected schedule slip
        May 15 10:33:01.972 : Cleared schedule slip
        May 15 10:37:04.209 : Detected schedule slip
        May 15 10:38:04.528 : Cleared schedule slip
        May 15 10:42:06.585 : Detected schedule slip
        May 15 10:43:06.675 : Cleared schedule slip
        May 15 10:47:08.831 : Detected schedule slip
        May 15 10:48:08.890 : Cleared schedule slip
        May 15 10:52:10.837 : Detected schedule slip
        May 15 10:53:10.993 : Cleared schedule slip

node1:
--------------------------------------------------------------------------
Redundancy mode:
    Configured mode: active-active
    Operational mode: active-active
Cluster configuration:
    Heartbeat interval: 1000 ms
    Heartbeat threshold: 3
    Control link recovery: Disabled
    Fabric link down timeout: 66 sec
Node health information:
    Local node health: Healthy
    Remote node health: Healthy

Redundancy group: 0, Threshold: 255, Monitoring failures: none
    Events:
        May 14 16:30:59.516 : hold->secondary, reason: Hold timer expired

Redundancy group: 1, Threshold: 255, Monitoring failures: none
    Events:
        May 14 16:30:59.761 : hold->secondary, reason: Hold timer expired

Redundancy group: 2, Threshold: 255, Monitoring failures: none
    Events:
        May 14 16:30:59.781 : hold->secondary, reason: Hold timer expired
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 65686
        Heartbeat packets received: 64764
        Heartbeat packet errors: 0
        Duplicate heartbeat packets received: 0
    Control recovery packet count: 0
    Sequence number of last heartbeat packet sent: 65686
    Sequence number of last heartbeat packet received: 37289641
Fabric link statistics:
    Child link 0
        Probes sent: 131398
        Probes received: 131397
    Child link 1
        Probes sent: 0
        Probes received: 0
Switch fabric link statistics:
    Probe state : DOWN
    Probes sent: 0
    Probes received: 0
    Probe recv errors: 0
    Probe send errors: 0
    Probe recv dropped: 0
    Sequence number of last probe sent: 0
    Sequence number of last probe received: 0

Chassis cluster LED information:
    Current LED color: Green
    Last LED change reason: No failures
Control port tagging:
    Disabled

Cold Synchronization:
    Status:
        Cold synchronization completed for: N/A
        Cold synchronization failed for: N/A
        Cold synchronization not known for: N/A
        Current Monitoring Weight: 0

    Statistics:
        Number of cold synchronization completed: 0
        Number of cold synchronization failed: 0

    Events:
        May 14 16:31:50.807 : Cold sync for PFE  is RTO sync in process
        May 14 16:31:54.200 : Cold sync for PFE  is Post-req check in process
        May 14 16:31:56.205 : Cold sync for PFE  is Completed

Loopback Information:

    PIC Name        Loopback        Nexthop     Mbuf
    -------------------------------------------------
                    Success         Success     Success

Interface monitoring:
    Statistics:
        Monitored interface failure count: 1

    Events:
        May 14 16:31:16.111 : Interface ge-0/0/0 monitored by rg 1, changed state from Down to Up
        May 14 16:31:50.756 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up
        May 14 16:38:19.150 : Interface ge-3/0/0 monitored by rg 1, changed state from Up to Down
        May 14 22:31:04.275 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up

Fabric monitoring:
    Status:
        Fabric Monitoring: Enabled
        Activation status: Active
        Fabric Status reported by data plane: Up
        JSRPD internal fabric status: Up

Fabric link events:
        May 14 16:31:45.984 : Child ge-0/0/5 of fab0 is down
        May 14 16:31:46.310 : Child ge-3/0/5 of fab1 is down
        May 14 16:31:46.408 : Child ge-3/0/5 of fab1 is up
        May 14 16:31:47.850 : Fabric monitoring suspension is revoked by remote node
        May 14 16:31:48.986 : Fabric link fab0 is up
        May 14 16:31:48.996 : Child ge-0/0/5 of fab0 is up
        May 14 16:31:49.737 : Fabric link fab1 is up
        May 14 16:31:49.745 : Child ge-3/0/5 of fab1 is up
        May 14 16:31:52.416 : Child link-0 of fab1 is up, pfe notification
        May 14 16:31:53.431 : Fabric link up, link status timer

Control link status: Up
    Server information:
        Server status : Inactive
        Server connected to None
    Client information:
        Client status : Inactive
        Client connected to None
Control port tagging:
    Disabled

Control link events:
        May 14 16:30:27.927 : Control link fxp1 is down
        May 14 16:30:29.828 : Control link fxp1 is down
        May 14 16:30:37.315 : Control link fxp1 is up
        May 14 16:31:14.583 : Control link fxp1 is up
        May 14 16:31:25.891 : Control link fxp1 is up

Hardware monitoring:
    Status:
        Activation status: Enabled
        Redundancy group 0 failover for hardware faults: Enabled
        Hardware redundancy group 0 errors: 0
        Hardware redundancy group 1 errors: 0

Schedule monitoring:
    Status:
        Activation status: Disabled
        Schedule slip detected: None
        Timer ignored: No

    Statistics:
        Total slip detected count: 1
        Longest slip duration: 3(s)

    Events:
        May 14 16:30:36.691 : Detected schedule slip
        May 14 16:31:37.251 : Cleared schedule slip

{primary:node0}
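
For anyone wanting to quantify the flapping from the interface monitoring events above, here is a minimal sketch that pairs each "Up to Down" event with the next "Down to Up" event and reports how long the interface stayed down. The function names are made up for illustration, and the year is assumed to be 2018 (taken from the post dates) because the events themselves carry no year.

```python
import re
from datetime import datetime

# First six interface-monitoring events from node0, copied from the output above.
EVENTS = """\
May 14 16:00:47.618 : Interface ge-3/0/0 monitored by rg 1, changed state from Up to Down
May 14 16:04:49.508 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up
May 14 16:08:41.523 : Interface ge-3/0/0 monitored by rg 1, changed state from Up to Down
May 14 16:12:42.731 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up
May 14 16:16:29.162 : Interface ge-3/0/0 monitored by rg 1, changed state from Up to Down
May 14 16:20:31.862 : Interface ge-3/0/0 monitored by rg 1, changed state from Down to Up
"""

EVENT = re.compile(
    r"^\s*(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d+) : Interface (\S+) "
    r"monitored by rg \d+, changed state from (\w+) to (\w+)"
)


def parse_time(stamp, year=2018):
    """Parse 'May 14 16:00:47.618'; the year is an assumption, not in the log."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")


def down_durations(text):
    """Return (interface, seconds_down) for each completed down/up cycle."""
    went_down = {}   # interface -> time it last went down
    durations = []
    for line in text.splitlines():
        m = EVENT.match(line)
        if not m:
            continue
        stamp, ifname, _old, new = m.groups()
        when = parse_time(stamp)
        if new == "Down":
            went_down[ifname] = when
        elif new == "Up" and ifname in went_down:
            delta = when - went_down.pop(ifname)
            durations.append((ifname, delta.total_seconds()))
    return durations


if __name__ == "__main__":
    for ifname, seconds in down_durations(EVENTS):
        print(f"{ifname} was down for {seconds:.1f} s")
```

On the six sample events, this reports three down periods for ge-3/0/0 of roughly 241 to 243 seconds each.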

4. RE: SRX220H2 - Cluster Issues (secondary node flapping): High CPU JSRPD

Nellikka
Posted 05-15-2018 20:42

Hi,

From the output you posted, I see a lot of jsrpd schedule slips on node0. This may be caused by the high RE CPU on node0. You could try failing over all RGs to node1 and then observe the RE CPU on node0.

Statistics: Total slip detected count: 3510 Longest slip duration: 9(s)

Are the control link and fabric links connected through a switch? I see a lot of flaps on May 14.
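
To put the slip durations in context, here is a rough sketch comparing them with the heartbeat window from the posted output (heartbeat interval 1000 ms, threshold 3). It assumes the peer is treated as lost after interval x threshold of silence; the function names are made up for illustration, and whether a slip of a given length actually causes missed heartbeats on this platform is not confirmed here.

```python
def heartbeat_window_s(interval_ms, threshold):
    """Seconds of missed heartbeats tolerated (assumed: interval x threshold)."""
    return interval_ms * threshold / 1000.0


def slip_exceeds_window(longest_slip_s, interval_ms, threshold):
    """True if the longest schedule slip is longer than the heartbeat window."""
    return longest_slip_s > heartbeat_window_s(interval_ms, threshold)


if __name__ == "__main__":
    # Values taken from "show chassis cluster information detail" above.
    window = heartbeat_window_s(1000, 3)
    print(f"heartbeat window: {window:.1f} s")
    print("node0 (longest slip 9 s) exceeds window:", slip_exceeds_window(9, 1000, 3))
    print("node1 (longest slip 3 s) exceeds window:", slip_exceeds_window(3, 1000, 3))
```

With these numbers the window is 3 seconds, so the 9-second slip on node0 is longer than the window while the 3-second slip on node1 is not.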
5. RE: SRX220H2 - Cluster Issues (secondary node flapping): High CPU JSRPD
Best Answer

Erdem
Posted 05-21-2018 05:15

Just to update and close: we still could not determine the root cause, but we were able to resolve the issue by rebooting node0. The cluster reformed, and both the jsrpd process utilization and the overall CPU utilization returned to normal levels.
