SRX Services Gateway

Losing Connectivity to SRX

08-04-2017 05:56 AM

Hi,

I have a very strange issue with an SRX4100 pair configured as an active/passive cluster. The status LED blinks red, turns green for a short while, and then blinks red again. I know it is a non-critical alarm, but the client wants it addressed and I can't get it to go away. The cluster configuration and status output are below. Any help would be appreciated.

Do the switch ports need to be aggregated to accommodate the reth child interfaces? (No link aggregation is currently configured on the switch.)

Thanks in advance.

root@MTMFW01> show configuration chassis cluster
reth-count 8;
redundancy-group 0 {
    node 0 priority 100;
    node 1 priority 1;
}
redundancy-group 1 {
    node 0 priority 100;
    node 1 priority 1;
}

********************************

{primary:node0}
root@MTMFW01> show chassis cluster status
Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring

Cluster ID: 1
Node    Priority   Status      Preempt   Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0   100        primary     no        no       None
node1   1          secondary   no        no       None

Redundancy group: 1 , Failover count: 1
node0   100        primary     no        no       None
node1   1          secondary   no        no       None

{primary:node0}
root@MTMFW01>


**********************************

root@MTMFW01> show chassis cluster interfaces
Control link status: Up

Control interfaces:
    Index   Interface   Monitored-Status   Internal-SA   Security
    0       em0         Up                 Disabled      Disabled

Fabric link status: Up

Fabric interfaces:
    Name    Child-interface    Status                    Security
                               (Physical/Monitored)
    fab0    xe-0/0/8           Up / Up                   Disabled
    fab0
    fab1    xe-7/0/8           Up / Up                   Disabled
    fab1

Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Up          1
    reth2        Up          1
    reth3        Up          1
    reth4        Up          1
    reth5        Down        Not configured
    reth6        Down        Not configured
    reth7        Up          1

Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0


*********************************

root@MTMFW01> show chassis cluster statistics
Control link statistics:
    Control link 0:
        Heartbeat packets sent: 188878
        Heartbeat packets received: 186597
        Heartbeat packet errors: 0
Fabric link statistics:
    Child link 0
        Probes sent: 374487
        Probes received: 373725
    Child link 1
        Probes sent: 0
        Probes received: 0
Services Synchronized:
    Service name                            RTOs sent    RTOs received
    Translation context                     0            0
    Incoming NAT                            0            0
    Resource manager                        0            0
    DS-LITE create                          0            0
    Session create                          73043        0
    IPv6 session create                     0            0
    Session close                           31738        0
    IPv6 session close                      0            0
    Session change                          2269         0
    IPv6 session change                     0            0
    ALG Support Library                     0            0
    Gate create                             0            0
    Session ageout refresh requests         0            3267
    IPv6 session ageout refresh requests    0            0
    Session ageout refresh replies          3208         0
    IPv6 session ageout refresh replies     0            0
    IPSec VPN                               0            0
    Firewall user authentication            0            0
    MGCP ALG                                0            0
    H323 ALG                                0            0
    SIP ALG                                 0            0
    SCCP ALG                                0            0
    PPTP ALG                                0            0
    JSF PPTP ALG                            0            0
    RPC ALG                                 0            0
    RTSP ALG                                0            0
    RAS ALG                                 0            0
    MAC address learning                    0            0
    GPRS GTP                                0            0
    GPRS SCTP                               0            0
    GPRS FRAMEWORK                          0            0
    JSF RTSP ALG                            0            0
    JSF SUNRPC MAP                          0            0
    JSF MSRPC MAP                           0            0
    DS-LITE delete                          0            0
    JSF SLB                                 0            0
    APPID                                   181          0
    JSF MGCP MAP                            0            0
    JSF H323 ALG                            0            0
    JSF RAS ALG                             0            0
    JSF SCCP MAP                            0            0
    JSF SIP MAP                             0            0
    PST_NAT_CREATE                          0            0
    PST_NAT_CLOSE                           0            0
    PST_NAT_UPDATE                          0            0
    JSF TCP STACK                           0            0
    JSF IKE ALG                             0            0

{primary:node0}


************************************************************

root@MTMFW01> show log jsrpd | last 100

Aug 4 08:31:30 printing fpc_num h4
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface reth4 is up
Aug 4 08:31:30 reth4 from jsrpd_ssam_reth_read reth_rg_id=1

Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-7/0/4 is up
Aug 4 08:31:30 printing fpc_num h5
Aug 4 08:31:30 Interface reth5 is going down
Aug 4 08:31:30 reth5 jsrpd not ready

Aug 4 08:31:30 Handle signal SIGCHLD
Aug 4 08:31:30 printing fpc_num h6
Aug 4 08:31:30 Interface reth6 is going down
Aug 4 08:31:30 reth6 jsrpd not ready

Aug 4 08:31:30 printing fpc_num h7
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface reth7 is up
Aug 4 08:31:30 reth7 from jsrpd_ssam_reth_read reth_rg_id=1

Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-0/0/0 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-0/0/1 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-0/0/2 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-0/0/3 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-0/0/4 is up
Aug 4 08:31:30 printing fpc_num 0
Aug 4 08:31:30 Interface xe-0/0/5 is going down
Aug 4 08:31:30 printing fpc_num 0
Aug 4 08:31:30 Interface xe-0/0/6 is going down
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-0/0/7 is up
Aug 4 08:31:30 printing fpc_num 0
Aug 4 08:31:30 fab0 child xe-0/0/8 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-0/0/8 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-7/0/0 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-7/0/1 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-7/0/2 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-7/0/3 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-7/0/4 is up
Aug 4 08:31:30 printing fpc_num 7
Aug 4 08:31:30 Interface xe-7/0/5 is going down
Aug 4 08:31:30 printing fpc_num 7
Aug 4 08:31:30 Interface xe-7/0/6 is going down
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-7/0/7 is up
Aug 4 08:31:30 printing fpc_num 7
Aug 4 08:31:30 fab1 child xe-7/0/8 is up
Aug 4 08:31:30 jsrpd_ifd_msg_handler: Interface xe-7/0/8 is up
Aug 4 11:50:13 ISSU state: 0
Aug 4 11:50:13 Error he.re.mcluster_ha_secure Sucess
Aug 4 11:50:26 ISSU state: 0
Aug 4 13:25:23 last message repeated 9 times
Aug 4 13:25:23 received SIGHUP, pid 1516
Aug 4 13:25:23 received SIGHUP - re-reading configuration, pid 1516
Aug 4 13:25:23 successfully set default traceoptions cfg
Aug 4 13:25:23 reading the cluster part of the config
Aug 4 13:25:23 reading the cluster member list
Aug 4 13:25:23 reading the cluster attributes
Aug 4 13:25:23 initial hold set to: 30
Aug 4 13:25:23 hardware monitoring is enabled
Aug 4 13:25:23 fabric monitoring is enabled
Aug 4 13:25:23 RG-0 failover for HW errors is enabled
Aug 4 13:25:23 schedule monitoring is disabled
Aug 4 13:25:23 Failover for loopback error is disabled
Aug 4 13:25:23 Failover for fabric nexthop error is disabled
Aug 4 13:25:23 Failover for mbuf error is disabled
Aug 4 13:25:23 data plane mode is active-active
Aug 4 13:25:23 fwdd monitoring is disabled
Aug 4 13:25:23 fabric time out is set to 0
Aug 4 13:25:23 control link recovery is disabled
Aug 4 13:25:23 ha-config-sync: feature knob is not set. Default to enabled
Aug 4 13:25:23 deleting rd ifd6 from ssam. Result = failed, 2
Aug 4 13:25:23 deleting rd ifd0 from ssam. Result = failed, 2
Aug 4 13:25:23 last message repeated 30 times
Aug 4 13:25:23 Current threshold for rg-1 is 255. Failures: none
Aug 4 13:25:23 Successfully updated GARP count for RG-1 (count 4) in to SSAM
Aug 4 13:25:23 Setting hold-down interval to 1 for RG-1
Aug 4 13:25:23 Set IP monitoring global weight to 0 global threshold to 0 for rg-1
Aug 4 13:25:23 Set IP monitoring retry interval to 0 retry count to 0 for rg-1
Aug 4 13:25:23 All global IP monitoring parameters are set to 0 because all IPs are deleted for rg-1
Aug 4 13:25:23 Current threshold for rg-1 is 255. Failures: none
Aug 4 13:25:23 Ctrl-link (1) timer started
Aug 4 13:25:23 Current threshold for rg-0 is 255. Failures: none
Aug 4 13:25:23 Current threshold for rg-1 is 255. Failures: none
Aug 4 13:28:50 ISSU state: 0
Aug 4 15:11:51 Error he.re.mcluster_ha_secure Sucess
Aug 4 15:12:41 TLV : RG_INFO
Aug 4 15:12:41 TLV send counter 188878
Aug 4 15:12:41 TLV last send Fri Aug 4 15:12:41 2017

Aug 4 15:12:41 TLV recv counter 373190
Aug 4 15:12:41 TLV last recv Fri Aug 4 15:12:41 2017

Aug 4 15:12:41 TLV RG MONITOR_OBJECT send counter 377756
Aug 4 15:12:41 TLV RG MONITOR_OBJECT recv counter 373190
Aug 4 15:12:41 TLV RG MONITOR_OBJECT err counter 0
Aug 4 15:12:41 TLV RG RG_WEIGHT send counter 377756
Aug 4 15:12:41 TLV RG RG_WEIGHT recv counter 0
Aug 4 15:12:41 TLV RG RG_WEIGHT err counter 373190
Aug 4 15:12:41 RG-0 weight :255 Remote weight 255
Aug 4 15:12:41 RG-1 weight :255 Remote weight 255

{primary:node0}

Re: Losing Connectivity to SRX

08-04-2017 07:52 AM

I suggest starting with:

show chassis cluster information

Look for "Last LED change reason". With this information we can troubleshoot further.
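
As a rough illustration of where that field sits in the output, the Python sketch below pulls the current LED color and the "Last LED change reason" for each node out of saved `show chassis cluster information` text. The helper name `led_status` and its regular expressions are assumptions based only on the output format quoted in this thread; other Junos releases may word these lines differently.

```python
import re

# Hypothetical helper: the patterns assume the line format shown in this
# thread's "show chassis cluster information" output.
NODE_RE = re.compile(r"^(node\d+):$")
COLOR_RE = re.compile(r"^Current LED color:\s*(.+)$")
REASON_RE = re.compile(r"^Last LED change reason:\s*(.+)$")


def led_status(output: str) -> dict:
    """Return {node: {"color": ..., "reason": ...}} parsed from CLI text."""
    result = {}
    node = None
    for raw in output.splitlines():
        line = raw.strip()
        match = NODE_RE.match(line)
        if match:
            # A "nodeN:" header starts that node's section.
            node = match.group(1)
            result[node] = {}
            continue
        if node is None:
            # Ignore anything printed before the first node header.
            continue
        match = COLOR_RE.match(line)
        if match:
            result[node]["color"] = match.group(1)
            continue
        match = REASON_RE.match(line)
        if match:
            result[node]["reason"] = match.group(1)
    return result


sample = """node0:
Chassis cluster LED information:
Current LED color: Green
Last LED change reason: No failures

node1:
Chassis cluster LED information:
Current LED color: Green
Last LED change reason: No failures
"""

for name, info in led_status(sample).items():
    print(name, info.get("color"), "-", info.get("reason"))
```

Running it against output captured while the LED is blinking red would report whatever reason the device gives at that moment, per node.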

Reth child interfaces need to be aggregated on the switch only when you use more than one child interface per node. For example, if you use 4 child interfaces (2 per node), you would have to configure 2 LAGs on the switch, one per node. This is explained in more detail in the SRX HA Deployment Guide: https://kb.juniper.net/InfoCenter/index?page=content&id=TN260
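
The rule in this reply can be written as a small Python sketch: for a single reth, count one switch-side LAG for every node that contributes more than one child interface, and none when each node contributes only one. The function name and the interface lists in the example are hypothetical; this thread does not show which xe- ports belong to which reth.

```python
def switch_lags_needed(children_by_node: dict) -> int:
    """Count switch LAGs needed for one reth (hypothetical helper).

    A node that contributes two or more child interfaces needs its own LAG
    on the switch; a node with a single child needs none. Each node gets a
    separate LAG, so the result is at most the number of nodes.
    """
    return sum(1 for children in children_by_node.values() if len(children) > 1)


# Hypothetical reth with one child interface per node (port names assumed).
single = {"node0": ["xe-0/0/0"], "node1": ["xe-7/0/0"]}

# Hypothetical reth with 4 child interfaces, 2 per node (the reply's example).
dual = {
    "node0": ["xe-0/0/0", "xe-0/0/1"],
    "node1": ["xe-7/0/0", "xe-7/0/1"],
}

print(switch_lags_needed(single))  # 0
print(switch_lags_needed(dual))    # 2
```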


Regards, Wojtek

Re: Losing Connectivity to SRX

08-04-2017 06:49 PM

I will grab the cluster information when I go back to the site.

Thanks for confirming the link aggregation requirements on the switch.

Re: Losing Connectivity to SRX

08-07-2017 05:33 PM

Below is the "show chassis cluster information" output. I hope someone can tell me what is happening with the cluster.

root@MTMFW01> show chassis cluster information
node0:
--------------------------------------------------------------------------
Redundancy Group Information:

Redundancy Group 0 , Current State: primary, Weight: 255

Time             From        To          Reason
Aug 2 17:02:23   hold        secondary   Hold timer expired
Aug 2 17:02:39   secondary   primary     Only node present

Redundancy Group 1 , Current State: primary, Weight: 255

Time             From        To          Reason
Aug 2 17:02:26   hold        secondary   Hold timer expired
Aug 2 17:02:39   secondary   primary     Only node present

Chassis cluster LED information:
Current LED color: Green
Last LED change reason: No failures

node1:
--------------------------------------------------------------------------
Redundancy Group Information:

Redundancy Group 0 , Current State: secondary, Weight: 255

Time             From        To          Reason
Aug 2 19:09:10   hold        secondary   Hold timer expired

Redundancy Group 1 , Current State: secondary, Weight: 255

Time             From        To          Reason
Aug 2 19:09:10   hold        secondary   Hold timer expired

Chassis cluster LED information:
Current LED color: Green
Last LED change reason: No failures

{primary:node0}
root@MTMFW01>