SRX Services Gateway

Has trouble with failover on SRX650-cluster

‎05-03-2018 09:58 PM

Hello!

 

I have an SRX650 cluster in active/standby mode.

Manual failover works fine.

But if I physically disconnect the interfaces, I have a problem.

Preemption is off.

Example: node0 is primary, node1 is secondary.

When I unplug the patch cords from node0, RG-1 moves to node1 and everything works fine.

Then I plug the patch cords back in (preemption is off).

And now I see some strange symptoms:

>show chassis cluster information

displays the following information for node1:

Chassis cluster LED information:
    Current LED color: Amber
    Last LED change reason: Monitored objects are down
Control port tagging:
    Disabled

 

But all interfaces are up.

At the same time, a core dump, ksyncd_1734.core.1525378097.0, appears in /var/tmp/.

If I then unplug the patch cords from node1, failover does not work.

What is happening?

Is my SRX node1 broken?

---

> show version                        

node0:
--------------------------------------------------------------------------
Hostname: ro-324.srx-cluster.colo
Model: srx650
JUNOS Software Release [12.3X48-D65.1]

node1:
--------------------------------------------------------------------------
Hostname: ro-352.srx-cluster.colo
Model: srx650
JUNOS Software Release [12.3X48-D65.1]

> show chassis cluster status

Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 3
node0  100      primary        no      no       None           
node1  200      secondary      no      no       None           

Redundancy group: 1 , Failover count: 7
node0  100      primary        no      no       None           
node1  200      secondary      no      no       None           

 

> show chassis cluster interfaces 

Control link status: Up

Control interfaces: 
    Index   Interface   Monitored-Status   Internal-SA
    0       fxp1        Up                 Disabled   

Fabric link status: Up

Fabric interfaces: 
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/2           Up   / Up  
    fab0    ge-0/0/3           Up   / Up  
    fab1    ge-9/0/2           Up   / Up  
    fab1    ge-9/0/3           Up   / Up  

Redundant-ethernet Information:     
    Name         Status      Redundancy-group
    reth0        Up          1                
    reth1        Up          1                
   
Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0                

Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-2/0/15         255       Up        1   
    ge-11/0/15        255       Up        1   
    ge-11/0/1         130       Up        1   
    ge-11/0/0         130       Up        1   
    ge-2/0/1          130       Up        1   
    ge-2/0/0          130       Up        1   

 


Re: Has trouble with failover on SRX650-cluster

‎05-03-2018 11:09 PM

The cluster status looks fine. The interfaces are configured with weight 130, so you need to unplug two of them to initiate a failover.
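
To spell out the arithmetic behind this: a redundancy group starts with a threshold of 255, the weight of each failed monitored interface is subtracted from it, and the group fails over when the threshold reaches 0. Below is a minimal Python sketch using the weights from the Interface Monitoring output in the first post. The slot offset of 9 for node1 is an assumption based on the fab1 child interfaces (ge-9/0/x), and the function names are illustrative only, not part of any Junos tool.

```python
# Weights copied from the "Interface Monitoring" output in the first post.
MONITOR_WEIGHTS = {
    "ge-2/0/15": 255,
    "ge-11/0/15": 255,
    "ge-11/0/1": 130,
    "ge-11/0/0": 130,
    "ge-2/0/1": 130,
    "ge-2/0/0": 130,
}

RG_THRESHOLD = 255      # initial threshold of a redundancy group
NODE1_SLOT_OFFSET = 9   # assumption: node1 slots are numbered from 9 on SRX650


def node_of(interface):
    """Return 0 or 1 based on the slot number in a name like ge-11/0/1."""
    slot = int(interface.split("-", 1)[1].split("/")[0])
    return 1 if slot >= NODE1_SLOT_OFFSET else 0


def remaining_threshold(down_interfaces, node):
    """Threshold left on `node` after the given monitored links go down."""
    lost = sum(
        MONITOR_WEIGHTS[name]
        for name in down_interfaces
        if node_of(name) == node
    )
    return max(RG_THRESHOLD - lost, 0)


def triggers_failover(down_interfaces, node):
    """True when the lost weight uses up the whole threshold on `node`."""
    return remaining_threshold(down_interfaces, node) == 0


if __name__ == "__main__":
    # One 130-weight link down on node0: 255 - 130 = 125, no failover.
    print(triggers_failover(["ge-2/0/0"], node=0))
    # Two 130-weight links down on node0: 260 >= 255, failover.
    print(triggers_failover(["ge-2/0/0", "ge-2/0/1"], node=0))
```

With these weights, a single 255-weight link (ge-2/0/15 or ge-11/0/15) would also use up the threshold on its own node.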

Thanks,
Suraj
Please Mark My Solution Accepted if it Helped, Kudos are Appreciated too

Re: Has trouble with failover on SRX650-cluster

‎05-04-2018 12:44 AM

Hi,

 

Of course, I unplugged both patch cords (from node0).
The migration of RG-1 from node0 to node1 works.
Then I plugged the patch cords back into node1.
But if I unplug the patch cords from node1, RG-1 does not migrate to node0.

 

--

Regards,
Valery



Re: Has trouble with failover on SRX650-cluster

‎05-04-2018 12:45 AM

Sorry, in my last reply the line about plugging the patch cords back in should say node0, not node1: I plugged the patch cords back into node0.


Re: Has trouble with failover on SRX650-cluster

‎05-04-2018 12:49 AM
Can you share the cluster status and cluster information at each of the stages below?


1. All cables plugged in
2. Cables removed from Node 0
3. Cables plugged back in Node 0
4. Cables removed from Node 1
Thanks,
Suraj

Re: Has trouble with failover on SRX650-cluster

‎05-04-2018 12:56 AM

Hi,

 

Stages 1-3 work fine.
But at stage 4 the status does not change.

 

After stage 2, the cluster shows the following information on node1:

Chassis cluster LED information:
    Current LED color: Amber
    Last LED change reason: Monitored objects are down
Control port tagging:
    Disabled

This status persists after stage 3, and that is strange.
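
The mismatch described here (an amber LED with "Monitored objects are down" while every monitored interface reports Up) can be checked mechanically once the command output is saved as text. The following is only a rough Python sketch: the sample strings are copied from this thread, and the helper names are made up for illustration.

```python
import re

# Sample text copied from the outputs posted in this thread.
LED_OUTPUT = """\
Chassis cluster LED information:
    Current LED color: Amber
    Last LED change reason: Monitored objects are down
"""

MONITORING_OUTPUT = """\
Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-2/0/15         255       Up        1
    ge-11/0/15        255       Up        1
    ge-11/0/1         130       Up        1
    ge-11/0/0         130       Up        1
    ge-2/0/1          130       Up        1
    ge-2/0/0          130       Up        1
"""


def parse_led(text):
    """Return (color, reason) from the LED section, or None for missing parts."""
    color = re.search(r"Current LED color:\s*(\S+)", text)
    reason = re.search(r"Last LED change reason:\s*(.+)", text)
    return (
        color.group(1) if color else None,
        reason.group(1).strip() if reason else None,
    )


def parse_monitoring(text):
    """Return {interface: status} for each row of the monitoring table."""
    rows = re.findall(
        r"^\s*([a-z]+-\d+/\d+/\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s*$",
        text,
        flags=re.MULTILINE,
    )
    return {name: status for name, _weight, status, _rg in rows}


def led_contradicts_monitoring(led_text, monitoring_text):
    """True when the LED is not green although all monitored links are Up."""
    color, _reason = parse_led(led_text)
    statuses = parse_monitoring(monitoring_text)
    all_up = bool(statuses) and all(s == "Up" for s in statuses.values())
    return color != "Green" and all_up


if __name__ == "__main__":
    print(parse_led(LED_OUTPUT))
    print(led_contradicts_monitoring(LED_OUTPUT, MONITORING_OUTPUT))
```

The check only flags the inconsistency; it says nothing about which monitored object the node believes is down.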

 

--

Regards,

Valery


Re: Has trouble with failover on SRX650-cluster

‎05-04-2018 01:54 AM

Can you share the cluster status and cluster information from stage 3?

Thanks,
Suraj

Re: Has trouble with failover on SRX650-cluster

‎05-04-2018 07:56 AM

Hi, rsuraj!

> show chassis cluster status 

Monitor Failure codes:
    CS  Cold Sync monitoring        FL  Fabric Connection monitoring
    GR  GRES monitoring             HW  Hardware monitoring
    IF  Interface monitoring        IP  IP monitoring
    LB  Loopback monitoring         MB  Mbuf monitoring
    NH  Nexthop monitoring          NP  NPC monitoring
    SP  SPU monitoring              SM  Schedule monitoring
    CF  Config Sync monitoring      RE  Relinquish monitoring

Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 3
node0  100      primary        no      no       None
node1  200      secondary      no      no       None

Redundancy group: 1 , Failover count: 7
node0  100      primary        no      no       None
node1  200      secondary      no      no       None

> show chassis cluster interfaces 

 

Control link status: Up

Control interfaces:
    Index   Interface   Monitored-Status   Internal-SA
    0       fxp1        Up                 Disabled

Fabric link status: Up

Fabric interfaces:
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/2           Up   / Up
    fab0    ge-0/0/3           Up   / Up
    fab1    ge-9/0/2           Up   / Up
    fab1    ge-9/0/3           Up   / Up

Redundant-ethernet Information:
    Name         Status      Redundancy-group
    reth0        Up          1
    reth1        Up          1

Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0

Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-2/0/15         255       Up        1
    ge-11/0/15        255       Up        1
    ge-11/0/1         130       Up        1
    ge-11/0/0         130       Up        1
    ge-2/0/1          130       Up        1
    ge-2/0/0          130       Up        1

> show chassis cluster information 

 

node0:
--------------------------------------------------------------------------
Redundancy Group Information:

    Redundancy Group 0 , Current State: primary, Weight: 255

        Time            From                 To                   Reason
        Apr 30 21:01:32 hold                 secondary            Hold timer expired
        Apr 30 21:01:48 secondary            primary              Only node present
        May  2 19:41:49 primary              secondary-hold       Manual failover
        May  2 19:46:49 secondary-hold       secondary            Ready to become secondary
        May  2 20:15:29 secondary            primary              Remote is in secondary hold

    Redundancy Group 1 , Current State: primary, Weight: 255

        Time            From                 To                   Reason
        May  2 20:41:58 primary              secondary-hold       Monitor failed: IF
        May  2 20:41:59 secondary-hold       secondary            Ready to become secondary
        May  3 10:35:36 secondary            primary              Remote is in secondary hold
        May  3 20:07:06 primary              secondary-hold       Monitor failed: IF
        May  3 20:07:07 secondary-hold       secondary            Ready to become secondary
        May  3 20:16:30 secondary            primary              Remote is in secondary hold

Chassis cluster LED information:
    Current LED color: Green
    Last LED change reason: No failures
Control port tagging:
    Disabled

node1:
--------------------------------------------------------------------------
Redundancy Group Information:

    Redundancy Group 0 , Current State: secondary, Weight: 255

        Time            From                 To                   Reason
        May  3 13:42:22 hold                 secondary            Hold timer expired

    Redundancy Group 1 , Current State: secondary, Weight: 255

        Time            From                 To                   Reason
        May  3 13:42:22 hold                 secondary            Hold timer expired
        May  3 20:07:06 secondary            primary              Remote is in secondary hold
        May  3 20:16:30 primary              secondary-hold       Manual failover
        May  3 20:16:31 secondary-hold       secondary            Ready to become secondary

Chassis cluster LED information:
    Current LED color: Amber
    Last LED change reason: Monitored objects are down
Control port tagging:
    Disabled
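
The RG-1 state histories of the two nodes are easier to read when merged into one timeline. Here is a small Python sketch of that, using the RG-1 rows from the output above. The year 2018 is an assumption, since the CLI output does not print one, and the function names are illustrative only.

```python
import re
from datetime import datetime

ASSUMED_YEAR = 2018  # assumption: the CLI output omits the year

# RG-1 history rows copied from the output above.
NODE0_RG1 = """\
May 2 20:41:58 primary secondary-hold Monitor failed: IF
May 2 20:41:59 secondary-hold secondary Ready to become secondary
May 3 10:35:36 secondary primary Remote is in secondary hold
May 3 20:07:06 primary secondary-hold Monitor failed: IF
May 3 20:07:07 secondary-hold secondary Ready to become secondary
May 3 20:16:30 secondary primary Remote is in secondary hold
"""

NODE1_RG1 = """\
May 3 13:42:22 hold secondary Hold timer expired
May 3 20:07:06 secondary primary Remote is in secondary hold
May 3 20:16:30 primary secondary-hold Manual failover
May 3 20:16:31 secondary-hold secondary Ready to become secondary
"""

# Month, day, time, old state, new state, free-text reason.
EVENT_RE = re.compile(
    r"^\s*(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+(\S+)\s+(.+?)\s*$"
)


def parse_events(node, text, year=ASSUMED_YEAR):
    """Turn history rows into (timestamp, node, old, new, reason) tuples."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line)
        if not match:
            continue  # skip headers and blank lines
        month, day, clock, old, new, reason = match.groups()
        when = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
        events.append((when, node, old, new, reason))
    return events


def merge_timelines(histories):
    """Merge per-node histories into one list sorted by timestamp."""
    merged = []
    for node, text in histories.items():
        merged.extend(parse_events(node, text))
    return sorted(merged, key=lambda event: event[0])


if __name__ == "__main__":
    timeline = merge_timelines({"node0": NODE0_RG1, "node1": NODE1_RG1})
    for when, node, old, new, reason in timeline:
        print(f"{when:%b %d %H:%M:%S}  {node}  {old} -> {new}  ({reason})")
```

In the merged view, the RG-1 move back to node0 at 20:16:30 lines up with the "Manual failover" entry logged on node1.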

> show configuration chassis 

 

aggregated-devices {
    ethernet {
        device-count 2;
    }
}
cluster {
    control-link-recovery;
    reth-count 2;
    redundancy-group 0 {
        node 0 priority 100;
        node 1 priority 200;
    }
    redundancy-group 1 {
        node 0 priority 100;
        node 1 priority 200;
        inactive: preempt;
        interface-monitor {
            ge-2/0/0 weight 130;
            ge-2/0/1 weight 130;
            ge-11/0/0 weight 130;
            ge-11/0/1 weight 130;
            ge-11/0/15 weight 255;
            ge-2/0/15 weight 255;
        }
    }
}

> show chassis cluster information issu 

node0:
--------------------------------------------------------------------------

Cold Synchronization Progress:
    CS Prereq               1 of 1 SPUs completed
       1. if_state sync          1 SPUs completed
       2. fabric link            1 SPUs completed
       3. policy data sync       1 SPUs completed
       4. cp ready               1 SPUs completed
       5. VPN data sync          1 SPUs completed
       6. Dynamic addr sync      1 SPUs completed
    CS RTO sync             1 of 1 SPUs completed
    CS Postreq              1 of 1 SPUs completed

node1:
--------------------------------------------------------------------------

Cold Synchronization Progress:
    CS Prereq               1 of 1 SPUs completed
       1. if_state sync          1 SPUs completed
       2. fabric link            1 SPUs completed
       3. policy data sync       1 SPUs completed
       4. cp ready               1 SPUs completed
       5. VPN data sync          1 SPUs completed
       6. Dynamic addr sync      1 SPUs completed
    CS RTO sync             1 of 1 SPUs completed
    CS Postreq              1 of 1 SPUs completed

> show chassis cluster statistics 

Control link statistics:
    Control link 0:
        Heartbeat packets sent: 323261
        Heartbeat packets received: 322363
        Heartbeat packet errors: 0
Fabric link statistics:
    Child link 0
        Probes sent: 647071
        Probes received: 645236
    Child link 1
        Probes sent: 647071
        Probes received: 645236
Services Synchronized:
    Service name                              RTOs sent    RTOs received
    Translation context                       0            0
    Incoming NAT                              0            0
    Resource manager                          0            0
    DS-LITE create                            0            0
    Session create                            263048       47619
    IPv6 session create                       0            0
    Session close                             118682       22879
    IPv6 session close                        0            0
    Session change                            39           16
    IPv6 session change                       0            0
    ALG Support Library                       0            0
    Gate create                               0            0
    Session ageout refresh requests           68           680
    IPv6 session ageout refresh requests      0            0
    Session ageout refresh replies            656          62
    IPv6 session ageout refresh replies       0            0
    IPSec VPN                                 0            0
    Firewall user authentication              0            0
    MGCP ALG                                  0            0
    H323 ALG                                  0            0
    SIP ALG                                   0            0
    SCCP ALG                                  0            0
    PPTP ALG                                  0            0
    JSF PPTP ALG                              0            0
    RPC ALG                                   0            0
    RTSP ALG                                  0            0
    RAS ALG                                   0            0
    MAC address learning                      0            0
    GPRS GTP                                  0            0
    GPRS SCTP                                 0            0
    GPRS FRAMEWORK                            0            0
    JSF RTSP ALG                              0            0
    JSF SUNRPC MAP                            0            0
    JSF MSRPC MAP                             0            0
    DS-LITE delete                            0            0
    JSF SLB                                   0            0
    APPID                                     0            0
    JSF MGCP MAP                              0            0
    JSF H323 ALG                              0            0
    JSF RAS ALG                               0            0
    JSF SCCP MAP                              0            0
    JSF SIP MAP                               0            0
    PST_NAT_CREATE                            0            0
    PST_NAT_CLOSE                             0            0
    PST_NAT_UPDATE                            0            0
    JSF TCP STACK                             0            0
    JSF IKE ALG                               0            0

> show version 

node0:
--------------------------------------------------------------------------
Hostname: ro-324.srx-cluster.colo
Model: srx650
JUNOS Software Release [12.3X48-D65.1]

node1:
--------------------------------------------------------------------------
Hostname: ro-352.srx-cluster.colo
Model: srx650
JUNOS Software Release [12.3X48-D65.1]

 


Re: Has trouble with failover on SRX650-cluster

‎05-15-2018 02:44 AM
Hi!
 
Does anyone have any ideas?
My other two SRX clusters do not have this problem.
I'm inclined to think that this is a hardware problem, but I do not know how to diagnose it.
 
--
Regards,
Valery M.