SRX Services Gateway
Contributor
yheffen
Posts: 42
Registered: ‎09-03-2009

Dual ISP Failover via RPM

I have a site with connections to two ISPs. One link is the primary, and the other is (at this point) essentially a backup. Both are Ethernet links. The most likely failure, and the kind we really want to address, is a loss of logical connectivity to the Internet. That is, when Internet connectivity fails through ISP A on the primary link, the interface on the firewall isn't going to go down.

After doing some searches in the forums and knowledge base, it appears that using commit and event scripts combined with Real-Time Performance Monitoring (RPM) is the way to do this? For example,

http://forums.juniper.net/t5/SRX-Services-Gateway/SRX240-with-CX111-as-backup-WAN/m-p/34832/highligh...

All of that is, however, a bit complicated, and I'm not entirely clear on how it works or what options are available.

What I'd like is to specify two or three checks of Internet connectivity, say an HTTP GET of http://google.com and ICMP pings to another reliable address or two, and then switch my default route to the backup only when it has fewer failures on those checks than the primary.
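
The switching rule described here can be pinned down in a few lines. This is a minimal sketch in Python, used only to illustrate the logic; it is not Junos configuration or SLAX. The function names `count_failures` and `prefer_backup` and the per-link result lists are hypothetical stand-ins for whatever RPM reports for each check.

```python
def count_failures(results):
    """Count failed checks; each entry is True (passed) or False (failed)."""
    return sum(1 for ok in results if not ok)


def prefer_backup(primary_results, backup_results):
    """Return True only when the backup link fails strictly fewer checks."""
    return count_failures(backup_results) < count_failures(primary_results)


# Hypothetical outcomes for three checks (HTTP GET, ping 1, ping 2) on each link.
primary = [False, False, True]   # two failures via the primary ISP
backup = [True, True, True]      # no failures via the backup ISP

print(prefer_backup(primary, backup))  # True
print(prefer_backup(backup, backup))   # False
```

Using a strict comparison means that when both links are equally bad (for example, the remote site itself is down), the default route stays where it is.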

Setting up RPM probes is pretty easy (and pretty neat, I must say), but how they are used by the event scripts discussed in the above post and the documentation referenced therein is not as clear. Does anyone have this set up, or have more details on how to get this to work?

Or is there an entirely different and simpler way to do this?

Super Contributor
colemtb
Posts: 313
Registered: ‎09-30-2009
Contributor
yheffen
Posts: 42
Registered: ‎09-03-2009

Re: Dual ISP Failover via RPM

Thanks for the great lead. I've been trying to figure out how to go from the simple single ping test in the example to something more complicated. I want to have several ping tests and only fail over when a ping test fails on the primary link and is not failing on the backup.

I'm also not quite sure I have the logic of the example down. It seems like you're testing the availability of the remote side more than your own link. When the ping fails, it switches to the backup route and disables the primary interface. Then when the ping succeeds, it switches back... So it seems that if the primary link were to go bad, you would just start flapping. The test on the primary would fail, so you would go over to the backup connection. And once there, the ping test would succeed, since the remote side has been up this whole time and now we can reach it, so we go back to the primary connection... Which is still down, so the ping fails again and we go to the backup, which is up, so the ping succeeds, so we go to the primary, which is down, so... Am I missing something there?

Super Contributor
colemtb
Posts: 313
Registered: ‎09-30-2009

Re: Dual ISP Failover via RPM

This script uses a source IP for RPM, so the source of the ping will always be your primary. Therefore, it will only have a successful RPM event when the primary comes back up.

At least that's how the CX script works.
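
To make the difference concrete, here is a toy simulation in Python of the two behaviours discussed in this exchange. It is not the CX script, only a sketch of the reasoning. The function `simulate` and its arguments are hypothetical, and it assumes the remote target is always reachable and the backup link never fails.

```python
def simulate(primary_up, pinned_to_primary, start="primary"):
    """Return the active link after each probe cycle.

    primary_up: list of booleans, one per cycle, True if the primary link works.
    pinned_to_primary: True if the probe always leaves via the primary link,
        False if it follows whichever link is currently active.
    Assumes the remote target is reachable and the backup link never fails.
    """
    active = start
    history = []
    for up in primary_up:
        if pinned_to_primary or active == "primary":
            probe_ok = up          # probe travels over the primary link
        else:
            probe_ok = True        # probe travels over the (working) backup
        active = "primary" if probe_ok else "backup"
        history.append(active)
    return history


outage = [False] * 4 + [True] * 2   # primary down for four cycles, then restored

print(simulate(outage, pinned_to_primary=False))
print(simulate(outage, pinned_to_primary=True))
```

When the probe follows the active route, the result alternates backup, primary, backup, primary for the length of the outage. When the probe is pinned to the primary, the route stays on the backup until the primary actually recovers.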

Contributor
yheffen
Posts: 42
Registered: ‎09-03-2009

Re: Dual ISP Failover via RPM

Sorry, I was referring to the link you posted. In that case, the RPM configuration is,

services {
    rpm {
        probe icmp-ping-probe {
            test ping-probe-test {
                probe-type icmp-ping;
                target address 10.63.0.50; /* EDIT HERE */
                test-interval 60;
            }
        }
    }
}

And in that example, the event script actually disables the primary interface, so you wouldn't be able to test from it anyway.

Super Contributor
colemtb
Posts: 313
Registered: ‎09-30-2009

Re: Dual ISP Failover via RPM

Ah, here is the link to the CX one. If your primary is static, this one works just fine.

If not, that's when you get into scripting to pull the interface IP address from the XML and put it into the source address for the RPM pings.

www.juniper.net/support/products/cx/#sw

Documentation starts on page 10, I believe.

http://www.juniper.net/us/en/local/pdf/app-notes/3500184-en.pdf

Contributor
yheffen
Posts: 42
Registered: ‎09-03-2009

Re: Dual ISP Failover via RPM

OK. Running into some bumps in the road here. I want to be able to monitor whether some IPs and URLs are available. RPM seems perfect for this. And if I want to monitor them through the ISP that is currently my default route, my RPM configuration works great.

However, I want to be able to monitor the availability of the sites via each ISP. From all of the examples and documentation I've seen, I thought the way to do this would be to create a "forwarding" routing instance for each ISP, where the default route in each routing instance goes out its respective ISP. I think I nailed that part.

But when I create RPMs that use a routing instance or try to ping out of a routing instance, it doesn't work. When I tried to research this in the forum, this thread,

http://forums.juniper.net/t5/SRX-Services-Gateway/PBR-ISP-transitioning-help/m-p/55296/highlight/tru...

seems to indicate that this is expected behavior. Forwarding routing instances don't work from the SRX itself. (The same apparently goes for "virtual-router" instances, so what's the point of the "routing-instance" option to the ping command?)

So what is the right way to monitor the same set of destinations out of different ISP links within RPM?

Contributor
yheffen
Posts: 42
Registered: ‎09-03-2009

Re: Three-way ISP Failover via RPM

Reviving my own old thread here.

I was diverted from implementing this and have since had more time to solve it. I have managed to get my RPM working (yea!) out of multiple ISPs (three). Now I am trying to figure out how to configure my "event-options" to actually do something with the information RPM collects.

So for anyone interested (several people have asked if I've gotten anywhere on this), here's the RPM and routing setup. I use groups to avoid duplicating a lot of stuff in the RPM configuration,

groups {
    internet-rpm {
        services {
            rpm {
                probe <*> {
                    test google {
                        probe-type http-get;
                        target url http://google.com/;
                        test-interval 20;
                    }
                    test speedtest {
                        probe-type http-get;
                        target url http://speedtest.net/;
                        test-interval 20;
                    }
                }
            }
        }
    }
    test-primary {
        services {
            rpm {
                probe <*> {
                    test <*> {
                        source-address 10.100.20.14;  ## Primary ISP
                        routing-instance primary;
                    }
                }
            }
        }
    }
    test-backup {
        services {
            rpm {
                probe <*> {
                    test <*> {
                        source-address 172.20.46.190;  ## Backup ISP
                        routing-instance backup;
                    }
                }
            }
        }
    }
    test-dsl {
        services {
            rpm {
                probe <*> {
                    test <*> {
                        source-address 192.168.1.2;  ## When all else fails
                        routing-instance dsl;
                    }
                }
            }
        }
    }
}
routing-options {
    static {
        route 172.18.0.0/16 next-hop 172.18.108.1;
    }
    rib-groups {
        primary_int_rib {               
            import-rib [ internal.inet.0 primary.inet.0 ];
        }
        backup_int_rib {
            import-rib [ internal.inet.0 backup.inet.0 ];
        }
        dsl_int_rib {
            import-rib [ internal.inet.0 dsl.inet.0 ];
        }
    }
}
routing-instances {
    backup {
        instance-type virtual-router;
        interface reth0.0;
        routing-options {
            interface-routes {
                rib-group inet backup_int_rib;
            }
            static {
                route 0.0.0.0/0 next-hop 172.20.46.185;
            }                           
        }
    }
    dsl {
        instance-type virtual-router;
        interface reth1.991;
        routing-options {
            interface-routes {
                rib-group inet dsl_int_rib;
            }
            static {
                route 0.0.0.0/0 next-hop 192.168.1.254;
            }
        }
    }
    internal {
        instance-type virtual-router;
        interface reth2.32;
        interface reth2.300;
        routing-options {
            static {
                route 10.160.16.0/24 next-hop 172.18.3.2;
                route 10.160.240.0/24 next-hop 172.18.3.2;
                route 10.160.251.0/24 next-hop 172.18.3.2;
                route 172.16.1.0/24 next-hop 172.18.3.2;
                route 172.18.3.0/24 next-hop 172.18.3.2;
                route 172.18.4.0/22 next-hop 172.18.3.2;
                route 172.18.10.0/24 next-hop 172.18.3.2;
                route 172.18.108.0/24 next-hop 172.18.3.2;
                route 172.18.131.0/24 next-hop 172.18.3.2;
                route 172.18.140.0/24 next-hop 172.18.3.2;
                route 172.18.160.0/24 next-hop 172.18.3.2;
                route 0.0.0.0/0 next-table primary.inet.0;
            }
        }
    }
    primary {
        instance-type virtual-router;
        interface reth1.720;
        routing-options {
            interface-routes {
                rib-group inet primary_int_rib;
            }
            static {
                route 0.0.0.0/0 next-hop 10.100.20.1;
            }
        }
    }
}
services {
    rpm {
        probe internet-primary {
            apply-groups [ internet-rpm test-primary ];
        }
        probe internet-backup {         
            apply-groups [ internet-rpm test-backup ];
        }
        probe internet-dsl {
            apply-groups [ internet-rpm test-dsl ];
        }
    }
}

This is only working in a lab setting, but it sure looks like it works.

My next steps are to actually use the RPM information I'm collecting to change the default route to the best choice available. Here's some pseudo-code showing how I'd like to use these links,

if (primary is ok)
    use primary;
elsif (backup is ok)
    use backup;
elsif (dsl is ok)
    use dsl;
else
    use primary;

So I'm trying to turn that into an event-driven configuration under the "event-options" hierarchy. It's not obvious to me how to do it. Anyone care to nudge me towards some elegant way to handle this? In the meantime, I'm going to keep staring at the couple of examples and the documentation to see if I have any revelations.
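
For reference, the selection order in the pseudo-code can be written as a small runnable function. This is Python used only to pin down the logic; it is not an event-options configuration or a SLAX script. The name `choose_link` and the `ok` dictionary are hypothetical, with `ok` standing in for a per-link summary of the RPM results.

```python
def choose_link(ok, order=("primary", "backup", "dsl")):
    """Pick the first healthy link in priority order.

    ok: dict mapping link name -> True if its RPM checks are passing.
    Falls back to the first (most preferred) link when nothing is healthy,
    matching the final 'else' branch of the pseudo-code.
    """
    for link in order:
        if ok.get(link, False):
            return link
    return order[0]


print(choose_link({"primary": True, "backup": True, "dsl": True}))     # primary
print(choose_link({"primary": False, "backup": False, "dsl": True}))   # dsl
print(choose_link({"primary": False, "backup": False, "dsl": False}))  # primary
```

Keeping the priority order as data means a fourth link could be added without touching the selection logic.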

Thanks.

Trusted Expert
Automate
Posts: 784
Registered: ‎11-01-2007

Re: Three-way ISP Failover via RPM

This is a great candidate for the config contest!

Contributor
yheffen
Posts: 42
Registered: ‎09-03-2009

Re: Three-way ISP Failover via RPM

Once again reviving my own thread.

The piece I'm stuck on now is how to "score" the RPM on a given link. I can write the SLAX script to do this for a given configuration, but I want to generalize it so that if I want to modify my RPM configuration, I don't have to go in and customize the SLAX script each time.

For example, say I have three tests within an RPM probe. Say one test is more important than the other two. Test A checks connectivity to some critical resource whereas the other two tests, B and C, are just some generic Internet availability tests. If test A is good on one link, that's the one I want to use. If test A is good (or bad) on both, then I care about which link is doing better with B and C. This is easy enough to do with some simple scoring and weighting the tests,

  Test A is worth 4

  Test B is worth 1

  Test C is worth 1

Whichever interface scores highest, I use. If there is a tie, I do nothing (or maybe I prefer one link due to its bandwidth or cost, whatever).
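
The weighting scheme can be sketched as follows. This is a minimal Python illustration of the arithmetic, not SLAX and not anything built into RPM. The names `WEIGHTS`, `score`, and `best_link`, and the shape of the result dictionaries, are assumptions made for the example.

```python
WEIGHTS = {"A": 4, "B": 1, "C": 1}   # weights from the post


def score(results, weights=WEIGHTS):
    """Sum the weights of the tests that passed on one link.

    results: dict mapping test name -> True if the test passed.
    """
    return sum(w for name, w in weights.items() if results.get(name, False))


def best_link(per_link, weights=WEIGHTS):
    """Return the highest-scoring link name, or None on a tie for first place."""
    if not per_link:
        return None
    scores = {link: score(res, weights) for link, res in per_link.items()}
    top = max(scores.values())
    leaders = [link for link, s in scores.items() if s == top]
    return leaders[0] if len(leaders) == 1 else None


links = {
    "primary": {"A": False, "B": True, "C": True},   # score 2
    "backup": {"A": True, "B": False, "C": False},   # score 4
}
print(best_link(links))  # backup
```

Because test A outweighs B and C combined, a link that passes only A still beats a link that passes only B and C. A return value of None corresponds to the "do nothing on a tie" case.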

I keep looking to see if there is some available way to do this within RPM. I don't see one. Or is this something I need to do with a script? Is there a slick way to do that in a generalized way (i.e. as I mention above, a single script can handle different RPM configurations without having to mess with the script itself each time a test is modified, test is added, weighting changed, etc.)?
