SRX Services Gateway
Contributor
Posts: 23
Registered: ‎06-20-2011
0 Kudos

SRX650 Chassis Cluster Failover

Hi

 

We have two SRX650s configured as a chassis cluster. Node0 was damaged by overheating and shut down, the traffic failed over to node1, and we took node0 out for repair. Node0 has now been repaired and reconnected to the cluster.

 

The output below shows that node1 is still in production and handling the traffic as primary:

 

operator@SRX650> show chassis cluster status                        

Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   200         secondary      no       no  
    node1                   100         primary        no       no  

Redundancy group: 1 , Failover count: 1
    node0                   200         secondary      no       no  
    node1                   100         primary        no       no  

 

{primary:node1}
operator@SRX650> show chassis cluster interfaces                  
Control link status: Up

Control interfaces:
    Index   Interface        Status
    0       fxp1             Up    

Fabric link status: Up

Fabric interfaces:
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/2           Up   / Up  
    fab0   
    fab1    ge-9/0/2           Up   / Up  
    fab1   

Redundant-ethernet Information:     
    Name         Status      Redundancy-group
    reth0        Up          1                
    reth1        Down        Not configured   
   
Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0                

Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    xe-15/0/0         255       Up        1   
    xe-6/0/0          255       Up        1 

 

What is the correct order to make:

 

1) Manual failover so that node0 becomes primary and node1 becomes secondary. Some documentation says you need to do a reset on redundancy-group 0 and 1 first, before you request failover on redundancy group x node x?

 

Please clarify

 

What is the correct order to check:

 

1) If node1 eventually stops because of a damaged filesystem, power loss, etc., how do we make sure the cluster fails over to node0 automatically, without manual intervention?

 

----

 

Configuration information on the chassis setup

 

chassis {
    cluster {
        reth-count 2;
        redundancy-group 0 {
            node 0 priority 200;
            node 1 priority 100;
        }
        redundancy-group 1 {
            node 0 priority 200;
            node 1 priority 100;
            gratuitous-arp-count 5;
            interface-monitor {
                xe-6/0/0 weight 255;
                xe-15/0/0 weight 255;
            }
        }
    }
}
interfaces {
    xe-6/0/0 {
        gigether-options {
            redundant-parent reth0;
        }
    }
    xe-15/0/0 {
        gigether-options {
            redundant-parent reth0;
        }
    }
    fab0 {
        fabric-options {
            member-interfaces {
                ge-0/0/2;
            }
        }
    }
    fab1 {
        fabric-options {
            member-interfaces {         
                ge-9/0/2;
            }
        }
    }
    lo0 {
        unit 0 {
            family inet;
        }
    }
    reth0 {
        vlan-tagging;
        redundant-ether-options {
            redundancy-group 1;
        }
        unit 100

....

....

etc

MICDUF
Visitor
Posts: 9
Registered: ‎11-10-2016
0 Kudos

Re: SRX650 Chassis Cluster Failover

I believe the reset is just to clear the manual failover. You do not need to do a reset. The reset only applies if you did a manual failover. You did not, therefore the reset does not apply in your case. You can just do a request chassis cluster failover redundancy-group 0 node 0 and request chassis cluster failover redundancy-group 1 node 0 to get you back to the state you want.

Trusted Contributor
Posts: 87
Registered: ‎07-19-2016
0 Kudos

Re: SRX650 Chassis Cluster Failover

Hi MICDUF,

 

In this case you would need to reset the failover, not before but after you have initiated the failover of both RGs. The moment you initiate a manual failover to node 0, the priority of node 0 is raised to 255 and its Manual failover flag is set to yes. The failover reset brings the priority back to the configured value and clears that flag.

 

So in your case:

 

request chassis cluster failover redundancy-group 0 node 0

request chassis cluster failover redundancy-group 1 node 0

 

request chassis cluster failover reset redundancy-group 0 

request chassis cluster failover reset redundancy-group 1
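
To confirm the result after each step, you could check the Status and Manual failover columns of show chassis cluster status. Below is a minimal Python sketch that parses that output as text. The function names (parse_cluster_status, needs_reset, primary_of) are made up for illustration, and the sample text is simply the output posted earlier in this thread; how you collect the text from the device is left out.

```python
import re

# Sample text copied from the "show chassis cluster status" output in this thread.
SAMPLE = """\
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   200         secondary      no       no
    node1                   100         primary        no       no

Redundancy group: 1 , Failover count: 1
    node0                   200         secondary      no       no
    node1                   100         primary        no       no
"""


def parse_cluster_status(text):
    """Return {rg_number: {node_name: fields}} parsed from the status text."""
    groups = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"\s*Redundancy group:\s*(\d+)", line)
        if header:
            current = int(header.group(1))
            groups[current] = {}
            continue
        row = re.match(r"\s*(node\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$", line)
        if row and current is not None:
            name, priority, status, preempt, manual = row.groups()
            groups[current][name] = {
                "priority": int(priority),
                "status": status,
                "preempt": preempt == "yes",
                "manual_failover": manual == "yes",
            }
    return groups


def needs_reset(groups):
    """List the redundancy groups that still show Manual failover = yes."""
    return sorted(
        rg for rg, nodes in groups.items()
        if any(node["manual_failover"] for node in nodes.values())
    )


def primary_of(groups, rg):
    """Return the name of the node that is primary for the given group, or None."""
    for name, node in groups[rg].items():
        if node["status"] == "primary":
            return name
    return None


if __name__ == "__main__":
    status = parse_cluster_status(SAMPLE)
    for rg in sorted(status):
        print("RG", rg, "primary:", primary_of(status, rg))
    print("Groups still needing a failover reset:", needs_reset(status))
```

After the two failover requests you would expect node0 to be reported as primary for both groups, and after the two resets needs_reset should return an empty list.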

 

1) If node1 eventually stops because of a damaged filesystem, power loss, etc., how do we make sure the cluster fails over to node0 automatically, without manual intervention?

 

-- Cluster failover occurs only when a monitored object goes down. Filesystem and power are not monitored objects, so by default there is no automatic failover for them. If a power issue causes one of the line cards or the whole device to shut down or reset, then there will be a failover.
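
For the interface-monitor part of the posted configuration, the arithmetic can be sketched as follows. This assumes the commonly documented model in which each redundancy group starts from a threshold of 255, every monitored interface that goes down subtracts its configured weight, and the group fails over once the remainder reaches 0. The names RG_THRESHOLD, remaining_threshold and triggers_failover are hypothetical, and the sketch ignores which node each interface belongs to.

```python
# Assumed starting threshold for a redundancy group (see the lead-in above).
RG_THRESHOLD = 255

# Weights taken from the interface-monitor stanza posted in this thread.
WEIGHTS = {"xe-6/0/0": 255, "xe-15/0/0": 255}


def remaining_threshold(weights, down_interfaces):
    """Subtract the weight of every monitored interface that is down."""
    return RG_THRESHOLD - sum(weights[name] for name in down_interfaces)


def triggers_failover(weights, down_interfaces):
    """True once the accumulated weight has used up the whole threshold."""
    return remaining_threshold(weights, down_interfaces) <= 0


if __name__ == "__main__":
    print(triggers_failover(WEIGHTS, []))             # no interface down
    print(triggers_failover(WEIGHTS, ["xe-15/0/0"]))  # one monitored link down
```

With a weight of 255 on each interface, a single monitored link going down is enough to use up the whole threshold. Lower weights would require several links to fail before redundancy group 1 moves.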

 

If you need automatic failover on filesystem or power failures, you could look into custom event scripts, provided those failures generate usable events.

 

Regards,

Anand
