SRX Services Gateway

JSRPD_SET_HW_MON_FAILURE : hw-mon failed for redundancy-group 1

‎02-16-2017 12:02 PM

Dear Expert,

 

Has anybody already experienced this issue? I have two SRX1400s in a cluster and, all of a sudden, I got the logs below.

Does anybody have an idea of what is going on?

 

JSRPD

============
XXX xx xx:xx:xx Received hw-mon error: RG-0 errors:0 RG-1 errors:1
XXX xx xx:xx:xx RG0 errors: 0 setting RG0 hw-mon weight to 0
XXX xx xx:xx:xx RG1 errors: 1 setting RG1 hw-mon weight to 255
XXX xx xx:xx:xx hw-mon errors detected, suspending fabric monitoring.
XXX xx xx:xx:xx Current threshold for rg-0 is 255. Failures: none
XXX xx xx:xx:xx hw-mon failure, computed-weight 0, hw-mon-weight 255
XXX xx xx:xx:xx Current threshold for rg-1 is 0. Setting priority to 0. Failures: hardware-monitoring
XXX xx xx:xx:xx Both the nodes are primary. RG-1 PRIMARY->SECONDARY_HOLD due to preempt/yield, my priority 0 is worse than other node's priority 100
XXX xx xx:xx:xx Successfully sent an snmp-trap due to a failover from primary to secondary-hold on RG-1 on cluster 1 node 0. Reason: Monitor failed: HW
XXX xx xx:xx:xx updated rg_info for RG-1 with failover-cnt 22 state: secondary-hold into ssam. Result = success, error: 0
XXX xx xx:xx:xx reth0 ifd state changed from node0-primary -> node1-primary for RG-1
XXX xx xx:xx:xx reth1 ifd state changed from node0-primary -> node1-primary for RG-1

 

Messages logs

================

XXX xx xx:xx:xx  srx1400 node0.cpp0 fpc_slot 3 pfe 1 (chan_id 2) plane 0 hsl2 link error.
XXX xx xx:xx:xx  srx1400 node0.cpp0 CMALARM: Error (code: 1281, type:Major) encountered, cmalarm_passive_alarm_signal
XXX xx xx:xx:xx  srx1400 alarmd[1210]: Alarm set: FPC color=RED, class=CHASSIS, reason=FPC 3 Major Errors
XXX xx xx:xx:xx  srx1400 craftd[1211]:  Major alarm set, FPC 3 Major Errors
XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_SET_HW_MON_FAILURE: hw-mon failed for redundancy-group 1
XXX xx xx:xx:xx  srx1400 node0.cpp0 CMALARM: Error (code: 1281, type:Major) encountered, cmalarm_passive_alarm_signal
XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'primary' to 'secondary-hold' state due to Monitor failed: HW
XXX xx xx:xx:xx  srx1400 node0.fpc1.pic0 ha_ssam_handler: Updating the primary_node_id(2) for rg id(1) using ssam blob for node1
XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'secondary-hold' to 'secondary' state due to Back to back failover interval expired
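
With a failover count this high, it can help to pull every redundancy-group state change out of the messages log and see how often each reason shows up. Below is a minimal Python sketch that does this, assuming the JSRPD_RG_STATE_CHANGE lines always have the shape shown above. The function name parse_state_changes and the SAMPLE list are illustrative only and are not part of any Juniper tool.

```python
import re

# Pattern for the JSRPD_RG_STATE_CHANGE lines as they appear in the messages log above.
STATE_CHANGE = re.compile(
    r"JSRPD_RG_STATE_CHANGE: Redundancy-group (?P<rg>\d+) transitioned "
    r"from '(?P<old>[\w-]+)' to '(?P<new>[\w-]+)' state due to (?P<reason>.+)$"
)


def parse_state_changes(lines):
    """Return one dict per redundancy-group state change found in the lines."""
    events = []
    for line in lines:
        match = STATE_CHANGE.search(line.strip())
        if match:
            event = match.groupdict()
            event["rg"] = int(event["rg"])
            events.append(event)
    return events


# Sample input copied from the log excerpt in this post (timestamps are masked there too).
SAMPLE = [
    "XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_SET_HW_MON_FAILURE: hw-mon failed for redundancy-group 1",
    "XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'primary' to 'secondary-hold' state due to Monitor failed: HW",
    "XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'secondary-hold' to 'secondary' state due to Back to back failover interval expired",
]

if __name__ == "__main__":
    for event in parse_state_changes(SAMPLE):
        print(f"RG{event['rg']}: {event['old']} -> {event['new']} ({event['reason']})")
```

Running it over a saved copy of the full messages file, rather than SAMPLE, would show whether every failover was caused by "Monitor failed: HW" or whether other reasons are involved.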

 

=======

Cluster status

==============

 

Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   250         primary        no       no  
    node1                   100         secondary      no       no  

Redundancy group: 1 , Failover count: 24
    node0                   0           secondary      yes      no  
    node1                   100         primary        yes      no 

 

 

Control link status: Up

Control interfaces:
    Index   Interface        Status
    0       em0              Up    
    1       em1              Up

Fabric link status: Up

Fabric interfaces:
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/8           Up   / Up  
    fab0   
    fab1    ge-4/0/8           Up   / Up  
    fab1   

Redundant-ethernet Information:     
    Name         Status      Redundancy-group
    reth0        Up          1                
    reth1        Up          1                
   
Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0                

Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-4/0/3          255       Up        1   
    ge-4/0/2          255       Up        1   
    ge-0/0/3          255       Up        1   
    ge-0/0/2          255       Up        1   
    ge-4/0/1          255       Up        1   
    ge-4/0/0          255       Up        1   
    ge-0/0/1          255       Up        1   
    ge-0/0/0          255       Up        1  

1 REPLY 1

Re: JSRPD_SET_HW_MON_FAILURE : hw-mon failed for redundancy-group 1

‎02-16-2017 07:53 PM

Hello,

 

It looks like FPC 3 reported a Major error (the hsl2 link error in your messages log), which made hardware monitoring fail for redundancy group 1 and caused the failover. Please check for any core dumps:

 

> show system core-dumps

 

If you see any core dump from the time of the issue, it needs to be investigated with JTAC. Even if you do not see any core dump, it is a good idea to open a ticket with JTAC and have this investigated in detail. It may also be due to a software issue.
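
To make the priority 0 in the cluster status easier to read, here is a rough Python sketch of the arithmetic the jsrpd log describes. It assumes the threshold starts at 255, the weights of failed monitors are subtracted from it, and the node's priority is forced to 0 once the threshold reaches 0. The function names and the configured priority of 200 for node0 in RG1 are hypothetical, since the post does not show the configured value. This is only a model of the log lines, not Juniper's implementation.

```python
INITIAL_THRESHOLD = 255  # starting threshold, as in "Current threshold for rg-0 is 255"


def rg_threshold(failed_weights):
    """Threshold left after subtracting the weights of failed monitors, floored at 0."""
    return max(0, INITIAL_THRESHOLD - sum(failed_weights))


def effective_priority(configured_priority, failed_weights):
    """Priority is forced to 0 once the threshold reaches 0, else stays as configured."""
    if rg_threshold(failed_weights) == 0:
        return 0
    return configured_priority


if __name__ == "__main__":
    # RG0: no hw-mon errors, so the configured priority of 250 is kept.
    print("RG0:", rg_threshold([]), effective_priority(250, []))

    # RG1: one hw-mon failure with weight 255 ("setting RG1 hw-mon weight to 255").
    # 200 is a hypothetical configured priority; the real value is not in the post.
    print("RG1:", rg_threshold([255]), effective_priority(200, [255]))
```

With priority 0 on node0 and 100 on node1, and preempt enabled on RG1, node1 takes over. That matches the "my priority 0 is worse than other node's priority 100" line in the log.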


Thanks,
Sam

Please Mark My Solution Accepted if it Helped, Kudos are Appreciated too .....