SRX

Ask questions and share experiences about the SRX Series, vSRX, and cSRX.
  • 1.  JSRPD_SET_HW_MON_FAILURE : hw-mon failed for redundancy-group 1

    Posted 02-16-2017 12:02

    Dear Expert,

     

    Has anybody already experienced this issue? I have two SRX1400s in a cluster, and all of a sudden I got the logs below.

    Does anybody have an idea of what is going on?

     

    JSRPD

    ============
    XXX xx xx:xx:xx Received hw-mon error: RG-0 errors:0 RG-1 errors:1
    XXX xx xx:xx:xx RG0 errors: 0 setting RG0 hw-mon weight to 0
    XXX xx xx:xx:xx RG1 errors: 1 setting RG1 hw-mon weight to 255
    XXX xx xx:xx:xx hw-mon errors detected, suspending fabric monitoring.
    XXX xx xx:xx:xx Current threshold for rg-0 is 255. Failures: none
    XXX xx xx:xx:xx hw-mon failure, computed-weight 0, hw-mon-weight 255
    XXX xx xx:xx:xx Current threshold for rg-1 is 0. Setting priority to 0. Failures: hardware-monitoring
    XXX xx xx:xx:xx Both the nodes are primary. RG-1 PRIMARY->SECONDARY_HOLD due to preempt/yield, my priority 0 is worse than other node's priority 100
    XXX xx xx:xx:xx Successfully sent an snmp-trap due to a failover from primary to secondary-hold on RG-1 on cluster 1 node 0. Reason: Monitor failed: HW
    XXX xx xx:xx:xx updated rg_info for RG-1 with failover-cnt 22 state: secondary-hold into ssam. Result = success, error: 0
    XXX xx xx:xx:xx reth0 ifd state changed from node0-primary -> node1-primary for RG-1
    XXX xx xx:xx:xx reth1 ifd state changed from node0-primary -> node1-primary for RG-1

     

    Messages logs

    ================

    XXX xx xx:xx:xx  srx1400 node0.cpp0 fpc_slot 3 pfe 1 (chan_id 2) plane 0 hsl2 link error.
    XXX xx xx:xx:xx  srx1400 node0.cpp0 CMALARM: Error (code: 1281, type:Major) encountered, cmalarm_passive_alarm_signal
    XXX xx xx:xx:xx  srx1400 alarmd[1210]: Alarm set: FPC color=RED, class=CHASSIS, reason=FPC 3 Major Errors
    XXX xx xx:xx:xx  srx1400 craftd[1211]:  Major alarm set, FPC 3 Major Errors
    XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_SET_HW_MON_FAILURE: hw-mon failed for redundancy-group 1
    XXX xx xx:xx:xx  srx1400 node0.cpp0 CMALARM: Error (code: 1281, type:Major) encountered, cmalarm_passive_alarm_signal
    XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'primary' to 'secondary-hold' state due to Monitor failed: HW
    XXX xx xx:xx:xx  srx1400 node0.fpc1.pic0 ha_ssam_handler: Updating the primary_node_id(2) for rg id(1) using ssam blob for node1
    XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'secondary-hold' to 'secondary' state due to Back to back failover interval expired

     

    =======

    Cluster status

    ==============

     

    Cluster ID: 1
    Node                  Priority          Status    Preempt  Manual failover

    Redundancy group: 0 , Failover count: 1
        node0                   250         primary        no       no  
        node1                   100         secondary      no       no  

    Redundancy group: 1 , Failover count: 24
        node0                   0           secondary      yes      no  
        node1                   100         primary        yes      no 

     

     

    Control link status: Up

    Control interfaces:
        Index   Interface        Status
        0       em0              Up    
        1       em1              Up

    Fabric link status: Up

    Fabric interfaces:
        Name    Child-interface    Status
                                   (Physical/Monitored)
        fab0    ge-0/0/8           Up   / Up  
        fab0   
        fab1    ge-4/0/8           Up   / Up  
        fab1   

    Redundant-ethernet Information:     
        Name         Status      Redundancy-group
        reth0        Up          1                
        reth1        Up          1                
       
    Redundant-pseudo-interface Information:
        Name         Status      Redundancy-group
        lo0          Up          0                

    Interface Monitoring:
        Interface         Weight    Status    Redundancy-group
        ge-4/0/3          255       Up        1   
        ge-4/0/2          255       Up        1   
        ge-0/0/3          255       Up        1   
        ge-0/0/2          255       Up        1   
        ge-4/0/1          255       Up        1   
        ge-4/0/0          255       Up        1   
        ge-0/0/1          255       Up        1   
        ge-0/0/0          255       Up        1  
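
    For readers following the jsrpd lines above, the weight arithmetic can be sketched roughly as follows. This is a simplified model inferred from the log output (threshold 255, hw-mon weight 255, priority forced to 0), not Juniper's implementation. The function name and the configured priority of 250 for RG1 are assumptions for illustration; only the RG0 priority of 250 appears in the output above.

```python
# Simplified model of redundancy-group monitoring weights.
# Assumption: inferred from the jsrpd log lines, not Juniper's actual code.
RG_THRESHOLD = 255  # starting threshold reported in the jsrpd log


def effective_priority(configured_priority, failed_weights):
    """Return (remaining_threshold, priority) after subtracting failed monitor weights."""
    remaining = max(RG_THRESHOLD - sum(failed_weights), 0)
    # Once the threshold is exhausted, the node advertises priority 0.
    priority = configured_priority if remaining > 0 else 0
    return remaining, priority


# RG1 on node0: one hardware-monitoring failure with weight 255.
# The configured priority of 250 is hypothetical.
rg1 = effective_priority(configured_priority=250, failed_weights=[255])

# RG0 on node0: no failures, so the configured priority is kept.
rg0 = effective_priority(configured_priority=250, failed_weights=[])

print(rg1, rg0)
```

    With preempt enabled on RG1, a priority of 0 on node0 against 100 on node1 is enough to move the group, which matches the "my priority 0 is worse than other node's priority 100" line.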



  • 2.  RE: JSRPD_SET_HW_MON_FAILURE : hw-mon failed for redundancy-group 1

     
    Posted 02-16-2017 19:54

    Hello ,

     

    It looks like FPC 3 reported a Major error, which caused hardware monitoring to fail for redundancy group 1 and triggered the failover. Please check for any core dumps:

     

    > show system core-dumps

     

    If you see any core dumps from the time of the issue, this needs to be investigated by JTAC. Even if you do not see any, it is a good idea to open a ticket with JTAC and have this investigated in detail. It may also be due to a software issue.
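
    With a failover count of 24 on redundancy group 1, it may also help to attach a timeline of the state changes to the ticket. Below is a minimal sketch in Python that pulls the JSRPD_RG_STATE_CHANGE entries out of a copy of the messages log. The regular expression is based only on the two lines quoted in the first post, so other Junos releases may format them differently, and the function name is just illustrative.

```python
import re

# Pattern based on the JSRPD_RG_STATE_CHANGE lines quoted in the first post.
# Assumption: other releases may word these messages differently.
STATE_CHANGE = re.compile(
    r"JSRPD_RG_STATE_CHANGE: Redundancy-group (\d+) transitioned "
    r"from '([^']+)' to '([^']+)' state due to (.+)$"
)


def rg_transitions(lines):
    """Return (rg, old_state, new_state, reason) for each state-change line."""
    found = []
    for line in lines:
        match = STATE_CHANGE.search(line.rstrip())
        if match:
            rg, old, new, reason = match.groups()
            found.append((int(rg), old, new, reason))
    return found


# Sample input copied from the messages log in the first post.
sample = [
    "XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_SET_HW_MON_FAILURE: "
    "hw-mon failed for redundancy-group 1",
    "XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: "
    "Redundancy-group 1 transitioned from 'primary' to 'secondary-hold' "
    "state due to Monitor failed: HW",
    "XXX xx xx:xx:xx  srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: "
    "Redundancy-group 1 transitioned from 'secondary-hold' to 'secondary' "
    "state due to Back to back failover interval expired",
]

for transition in rg_transitions(sample):
    print(transition)
```

    The same function accepts an open file object, so it can be run over a messages file copied off the device. Grouping the results by reason should show whether every failover traces back to the same hardware-monitoring failure.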