Dear Experts,
Has anybody experienced this issue before? I have two SRX1400s in a chassis cluster and, all of a sudden, I got the logs below.
Does anybody have an idea of what is going on?
JSRPD
============
XXX xx xx:xx:xx Received hw-mon error: RG-0 errors:0 RG-1 errors:1
XXX xx xx:xx:xx RG0 errors: 0 setting RG0 hw-mon weight to 0
XXX xx xx:xx:xx RG1 errors: 1 setting RG1 hw-mon weight to 255
XXX xx xx:xx:xx hw-mon errors detected, suspending fabric monitoring.
XXX xx xx:xx:xx Current threshold for rg-0 is 255. Failures: none
XXX xx xx:xx:xx hw-mon failure, computed-weight 0, hw-mon-weight 255
XXX xx xx:xx:xx Current threshold for rg-1 is 0. Setting priority to 0. Failures: hardware-monitoring
XXX xx xx:xx:xx Both the nodes are primary. RG-1 PRIMARY->SECONDARY_HOLD due to preempt/yield, my priority 0 is worse than other node's priority 100
XXX xx xx:xx:xx Successfully sent an snmp-trap due to a failover from primary to secondary-hold on RG-1 on cluster 1 node 0. Reason: Monitor failed: HW
XXX xx xx:xx:xx updated rg_info for RG-1 with failover-cnt 22 state: secondary-hold into ssam. Result = success, error: 0
XXX xx xx:xx:xx reth0 ifd state changed from node0-primary -> node1-primary for RG-1
XXX xx xx:xx:xx reth1 ifd state changed from node0-primary -> node1-primary for RG-1
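
My reading of the JSRPD lines above, which I would like confirmed: each redundancy group starts with a threshold of 255, the weight of every failed monitor is subtracted from it, and once the threshold reaches 0 the node's priority for that group is set to 0, so the peer (priority 100) takes over. A minimal Python sketch of that arithmetic follows. The function name and the configured priority of 200 are hypothetical, chosen only for illustration, and this is not jsrpd's actual code.

```python
RG_THRESHOLD = 255  # starting threshold, as in "Current threshold for rg-0 is 255"


def effective_priority(configured_priority, failure_weights):
    """Return (priority, remaining_threshold) for one redundancy group.

    failure_weights lists the weights of the monitors that are currently
    failing, e.g. [255] for the hw-mon failure reported on RG1.
    """
    remaining = max(RG_THRESHOLD - sum(failure_weights), 0)
    # Assumption: priority drops to 0 only when the threshold is exhausted.
    priority = 0 if remaining == 0 else configured_priority
    return priority, remaining


# RG0: no errors, so the threshold stays at 255 and the priority is unchanged.
rg0 = effective_priority(250, [])
# RG1: hw-mon weight 255 exhausts the threshold (200 is a hypothetical value).
rg1 = effective_priority(200, [255])
print(rg0, rg1)
```

If that reading is right, node0 will stay at priority 0 for RG1 for as long as the FPC 3 hardware error is present, which matches the cluster status further down.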
Messages logs
================
XXX xx xx:xx:xx srx1400 node0.cpp0 fpc_slot 3 pfe 1 (chan_id 2) plane 0 hsl2 link error.
XXX xx xx:xx:xx srx1400 node0.cpp0 CMALARM: Error (code: 1281, type:Major) encountered, cmalarm_passive_alarm_signal
XXX xx xx:xx:xx srx1400 alarmd[1210]: Alarm set: FPC color=RED, class=CHASSIS, reason=FPC 3 Major Errors
XXX xx xx:xx:xx srx1400 craftd[1211]: Major alarm set, FPC 3 Major Errors
XXX xx xx:xx:xx srx1400 jsrpd[1242]: JSRPD_SET_HW_MON_FAILURE: hw-mon failed for redundancy-group 1
XXX xx xx:xx:xx srx1400 node0.cpp0 CMALARM: Error (code: 1281, type:Major) encountered, cmalarm_passive_alarm_signal
XXX xx xx:xx:xx srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'primary' to 'secondary-hold' state due to Monitor failed: HW
XXX xx xx:xx:xx srx1400 node0.fpc1.pic0 ha_ssam_handler: Updating the primary_node_id(2) for rg id(1) using ssam blob for node1
XXX xx xx:xx:xx srx1400 jsrpd[1242]: JSRPD_RG_STATE_CHANGE: Redundancy-group 1 transitioned from 'secondary-hold' to 'secondary' state due to Back to back failover interval expired
=======
Cluster status
==============
Cluster ID: 1
Node     Priority  Status     Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0    250       primary    no       no
node1    100       secondary  no       no

Redundancy group: 1 , Failover count: 24
node0    0         secondary  yes      no
node1    100       primary    yes      no

Control link status: Up
Control interfaces:
    Index   Interface   Status
    0       em0         Up
    1       em1         Up
Fabric link status: Up
Fabric interfaces:
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/8           Up / Up
    fab0
    fab1    ge-4/0/8           Up / Up
    fab1
Redundant-ethernet Information:
    Name    Status    Redundancy-group
    reth0   Up        1
    reth1   Up        1
Redundant-pseudo-interface Information:
    Name    Status    Redundancy-group
    lo0     Up        0
Interface Monitoring:
    Interface   Weight   Status   Redundancy-group
    ge-4/0/3    255      Up       1
    ge-4/0/2    255      Up       1
    ge-0/0/3    255      Up       1
    ge-0/0/2    255      Up       1
    ge-4/0/1    255      Up       1
    ge-4/0/0    255      Up       1
    ge-0/0/1    255      Up       1
    ge-0/0/0    255      Up       1