SRX Services Gateway

Redundancy Group - would inactive node's interface failures cause a failover?

‎06-28-2016 12:32 PM

redundancy-group 1 {
   node 0 priority 100;
   node 1 priority 1;
   preempt;
   interface-monitor {
      ge-0/0/0 weight 128;
      ge-0/0/1 weight 128;
      ge-5/0/0 weight 128;
      ge-5/0/1 weight 128;
   }
}

 

Pretty straightforward.  But here is one thing I don't get, have never seen explicitly explained, and have never tested myself.

 

If redundancy-group 1 is active on node 0, and node 1 interfaces ge-5/0/0 and ge-5/0/1 fail, would that cause a failover from node 0 to node 1?

 

Normally you'd only want failover when the monitored interfaces on the currently active node fail, right?  We don't care what happens to the interfaces on the other node.  Is this inherently how interface monitoring and redundancy group failover work?  If not, how do I configure for that?

 

I *think* the answer is that when ge-5/0/0 and/or ge-5/0/1 fail, node 1 priority drops to zero so the redundancy group cannot fail over to node 1, despite the weight threshold being met.

 

Where this might be practical is with more interfaces:

 

redundancy-group 2 {
   node 0 priority 100;
   node 1 priority 1;
   preempt;
   interface-monitor {
      ge-0/0/4 weight 85;
      ge-0/0/5 weight 85;
      ge-0/0/6 weight 85;
      ge-0/0/7 weight 85;
      ge-5/0/4 weight 85;
      ge-5/0/5 weight 85;
      ge-5/0/6 weight 85;
      ge-5/0/7 weight 85;
   }
}

 

In this case, assume ge 4-7 on each SRX are link-aggregated to a single switch per node.  So node 0, 4-7 -> reth2 (lacp) -> Switch-A and node 1, 4-7 -> reth2 (lacp) -> Switch-B.  Reth2 is attached to redundancy group 2.

 

If three ports in the lacp group for reth2 fail, redundancy group 2 will fail over.  But here again, if the three failed ports are on node ONE while the RG is active on node 0, would THAT cause node 1's priority to fall to zero, or would the RG fail over to node 1, which has three of its four ports down?

 

Hopefully what I mean is clear.

 

Bottom line: I would not want interface-monitoring failures on the passive/inactive node for an RG to cause the RG to fail over to that node.

 

I guess the root of my question is that I need a thorough technical understanding of how interface monitoring failures affect node priority, if that is the relevant mechanism here.

Solution
Accepted by topic author david.hostetter@tdcemail.com
‎06-29-2016 11:03 AM

Re: Redundancy Group - would inactive node's interface failures cause a failover?

‎06-28-2016 01:55 PM

Hi,

 

This document would help answer the questions:

http://kb.juniper.net/library/CUSTOMERSERVICE/GLOBAL_JTAC/NT260/SRX_HA_Deployment_Guide.pdf

 

With interface monitoring, the redundancy group has a default threshold of 255 [hardcoded]. Whenever a monitored interface fails, the threshold is reduced by that link's weight. E.g., in redundancy group 1, if ge-0/0/0 fails, the new threshold is 127 [255-128]. If ge-0/0/1 also fails, the threshold drops to 0 [127-128 is below zero], which sets node 0's priority to 0 [ineligible] and triggers a redundancy group 1 failover.
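
The arithmetic above can be sketched in a few lines of Python. This is only a toy model of the description in this reply, not anything Junos runs; the function name `remaining_threshold` is made up for illustration, and clamping the result at 0 is a modelling assumption.

```python
# Toy model of interface-monitor weights; not Junos code.
THRESHOLD = 255  # starting threshold, as described in the reply above


def remaining_threshold(monitored, failed):
    """Subtract the weight of every failed monitored interface.

    The result is clamped at 0 (a modelling assumption for this sketch).
    """
    lost = sum(weight for name, weight in monitored.items() if name in failed)
    return max(THRESHOLD - lost, 0)


# Node 0's monitored interfaces from redundancy-group 1.
node0 = {"ge-0/0/0": 128, "ge-0/0/1": 128}

print(remaining_threshold(node0, {"ge-0/0/0"}))              # 127
print(remaining_threshold(node0, {"ge-0/0/0", "ge-0/0/1"}))  # 0
```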

 

If redundancy-group 1 is active on node 0 and node 1 interfaces ge-5/0/0 and ge-5/0/1 fail would that cause a failover from node 0 to node 1?

--> Node 1's priority becomes 0 due to its threshold of 0 being reached. Node 1 is then ineligible for failover in the cluster. This cannot cause failover from node0 to node1 for redundancy group 1.

If node 0's interfaces then fail as well, node 1, with a priority of 0, would still be ineligible for failover. This would result in complete reth1 failure.
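
As a rough illustration of the eligibility rule just described, here is a simplified Python sketch. The helper names `effective_priority` and `primary_after` are hypothetical, and the model deliberately ignores preempt, manual failover and every other cluster mechanism; it only encodes "a node at threshold 0 has priority 0 and cannot take over".

```python
# Simplified decision model; not Junos code, and preempt is ignored.

def effective_priority(configured, threshold):
    """A node whose threshold has reached 0 is treated as priority 0."""
    return configured if threshold > 0 else 0


def primary_after(current, priorities, thresholds):
    """Return the node holding the redundancy group after the failures."""
    peer = 1 - current
    mine = effective_priority(priorities[current], thresholds[current])
    theirs = effective_priority(priorities[peer], thresholds[peer])
    # Fail over only when the current primary is ineligible and the peer is not.
    if mine == 0 and theirs > 0:
        return peer
    return current


priorities = {0: 100, 1: 1}  # from redundancy-group 1

print(primary_after(0, priorities, {0: 255, 1: 0}))  # 0: node 1 links down, no failover
print(primary_after(0, priorities, {0: 0, 1: 255}))  # 1: node 0 links down, failover
print(primary_after(0, priorities, {0: 0, 1: 0}))    # 0: both ineligible, reth is down
```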

 

Hope this helps.

 

Cheers,

Ashvin

 


Re: Redundancy Group - would inactive node's interface failures cause a failover?

‎06-29-2016 11:09 AM

Ashvin,

 

This is the concept I was lacking:  "Node 1's priority becomes 0 due to its threshold of 0 being reached"

 

If that is the case, then the situation is very clear.  However I cannot find this documented anywhere.  Even in the document you linked, redundancy group is described as having one threshold, not each node having a separate redundancy group threshold.

 

So I am still left with a question: if the monitored interfaces on the currently secondary node fail, wouldn't their weights be subtracted from the redundancy group's threshold and, once that threshold reaches zero, cause a failover to that secondary node, the very node whose interfaces have failed?

 

I am going to test this now and see which way it works.


Re: Redundancy Group - would inactive node's interface failures cause a failover?

‎06-29-2016 11:39 AM

Ok, it works as you have described.  The output of 'show chassis cluster information' clearly shows that each redundancy group has a separate weight per node.

 

Maybe I am exhibiting poor reading comprehension, but I have read a number of Junos documents over the years and this has escaped me.  All our testing has involved failing the currently primary node over to the secondary; I never considered what would happen to the redundancy group if the secondary node's interfaces failed.

 

All this is maybe esoteric until one considers assigning weights.  The thing to keep in mind is you are assigning weights PER NODE to subtract from 255.  So back to my original example:

 

      ge-0/0/0 weight 128;
      ge-0/0/1 weight 128;
      ge-5/0/0 weight 128;
      ge-5/0/1 weight 128;

 

One might think that if ge-0/0/0 and ge-5/0/0 both failed, both weights would be subtracted from a single 255 and this would cause a failover.  But that would not make any sense in the real world.

 

I understand now that there is a threshold of 255 per redundancy group per chassis/node, and a failed interface's weight is subtracted only from the redundancy group instance on that interface's own chassis.
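
To make the per-node bookkeeping concrete, here is a small Python sketch of it. It is a toy model only: the function `per_node_thresholds` is hypothetical, and the interface-to-node mapping (ge-0/* on node 0, ge-5/* on node 1) is an assumption taken from the example in this thread, since slot numbering differs between platforms.

```python
# Toy model of per-node thresholds; not Junos code.
THRESHOLD = 255


def per_node_thresholds(monitored, node_of, failed):
    """Return {node: remaining threshold}.

    Each failed interface's weight is subtracted only from the node
    that owns the interface. Results are clamped at 0.
    """
    remaining = {node: THRESHOLD for node in sorted(set(node_of.values()))}
    for name in failed:
        remaining[node_of[name]] -= monitored[name]
    return {node: max(value, 0) for node, value in remaining.items()}


rg1 = {"ge-0/0/0": 128, "ge-0/0/1": 128, "ge-5/0/0": 128, "ge-5/0/1": 128}
# Assumption for this sketch: ge-0/* is on node 0 and ge-5/* is on node 1.
owner = {name: (0 if name.startswith("ge-0/") else 1) for name in rg1}

# One link down on each node: neither node reaches 0, so no failover.
print(per_node_thresholds(rg1, owner, {"ge-0/0/0", "ge-5/0/0"}))  # {0: 127, 1: 127}
# Both of node 1's links down: only node 1 becomes ineligible.
print(per_node_thresholds(rg1, owner, {"ge-5/0/0", "ge-5/0/1"}))  # {0: 255, 1: 0}
```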

 

And the idea that a redundancy group instance on a node reaching a weight of zero merely changes that node's priority for that redundancy group to zero, and that it is the change in priority that causes the failover, not the redundancy group weight per se, is kind of subtle and something I hadn't seen documented before.  Thanks for the link; I had not seen that document previously.

 

This all makes perfect sense.  Thanks again!

 

 

 


Re: Redundancy Group - would inactive node's interface failures cause a failover?

‎06-29-2016 12:16 PM

Hi,

 

Yes, each node has a threshold of 255 for the redundancy group.

When assigning weights, it is important that the threshold reaches 0 whenever failover is desired, as you rightly said:

"The thing to keep in mind is you are assigning weights PER NODE to subtract from 255"

 

With that in mind, you could also cause a failover on failure of a single interface if that interface had a weight of 255 (the maximum configurable weight).

For example:

      ge-0/0/0 weight 255;
      ge-0/0/1 weight 128;
      ge-5/0/0 weight 128;
      ge-5/0/1 weight 128;

For argument's sake, assume you know that interface ge-0/0/1 operates at a speed of 100M and all the other interfaces operate at full 1G.  With a weight of 255 on ge-0/0/0, should that interface fail, the reth group would fail over to the other node and avoid any capacity issues.
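
When planning weights, it can help to list which combinations of failures on one node would drive its threshold to 0. The Python sketch below does that by brute force; it is a planning aid under the per-node model discussed in this thread, not a Junos tool, and the name `minimal_trigger_sets` is made up for illustration.

```python
# Planning aid for interface-monitor weights; not Junos code.
from itertools import combinations

THRESHOLD = 255


def minimal_trigger_sets(node_weights):
    """List the smallest sets of failed interfaces on one node whose
    combined weight is at least 255 (i.e. the node becomes ineligible)."""
    names = sorted(node_weights)
    found = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if any(set(prev) <= set(combo) for prev in found):
                continue  # a smaller set inside this one already triggers
            if sum(node_weights[name] for name in combo) >= THRESHOLD:
                found.append(combo)
    return found


# Node 0 with a weight of 255 on the 1G link: that link alone triggers.
print(minimal_trigger_sets({"ge-0/0/0": 255, "ge-0/0/1": 128}))
# [('ge-0/0/0',)]

# One node of redundancy-group 2: any three of the four links trigger.
rg2_node0 = {"ge-0/0/4": 85, "ge-0/0/5": 85, "ge-0/0/6": 85, "ge-0/0/7": 85}
print(len(minimal_trigger_sets(rg2_node0)))  # 4
```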

 

Cheers,

Ashvin