SRX Services Gateway
Contributor
RichardF
Posts: 15
Registered: 07-23-2010

HA cluster management

Hi All,

 

Can I get some feedback as to how many of you guys manage a HA cluster?

 

I have been trying to work out an easy way for us to manage clusters without anything terribly complex.

Specifically, being able to manage each node individually. 99% of the time this isn't actually needed, but when doing upgrades and such it's required.

 

My experience so far:

 

OK, so when you cluster the unit, it creates fxp0 and fxp1, which you can configure via groups.

Yes, this works; however, it seems to be purely OOB management. The interface IPs and routes via this interface are installed into the global route table, which means you can't also route traffic through the devices from your management network.
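For reference, the standard shape of that groups config is something like the below (a sketch; the fxp0 addresses are made up for this example):

set groups node0 system host-name fw-02-01
set groups node0 interfaces fxp0 unit 0 family inet address 172.20.187.250/26
set groups node1 system host-name fw-02-02
set groups node1 interfaces fxp0 unit 0 family inet address 172.20.187.251/26
set apply-groups "${node}"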

 

So, routing-instances.

Can't put the fxp interfaces in a new routing instance.

 

OK, so I'm willing to burn two ports in the name of being able to manage each node.

Interfaces ge-6/0/15 and ge-15/0/15 were selected for this task.

Can't put IP addresses on them directly under [edit interfaces], as this drops the secondary out of an HA state.

Can put the IP addressing on the interfaces via groups; however, for some reason this only works under node0.

Getting somewhere with this; however, when creating a security zone to put the interfaces in and applying services, the commit is rejected because ge-15/0/15 is not configured under [edit interfaces].

 

So I'd really like someone to point out where I'm going wrong, or whether I'm just going about this the wrong way.

 

Configs below

 

 

Cheers, and many thanks for any input.

 

 

 

node0 {
    system {
        host-name fw-02-01;
    }
    interfaces {
        ge-6/0/15 {
            unit 0 {
                family inet {
                    address 172.20.187.253/26;
                }
            }
        }
        ge-15/0/15 {
            unit 0 {
                family inet {
                    address 172.20.187.252/26;
                }
            }
        }
    }
}
node1 {
    system {
        host-name fw-02-02;
    }
}

 

zones {
    functional-zone management {
        interfaces {                    
            ge-6/0/15.0;
            ge-15/0/15.0;
        }
    }
}

 

 

Contributor
gxc11
Posts: 24
Registered: 01-30-2008

Re: HA cluster management

You are not alone in your struggles!  Such a task can get nasty pretty fast.  Right now, the way I do this at sites that do not have out-of-band management is to SSH to the cluster, which puts you on the primary (in my case node 0), and then jump over to the secondary as follows:

 

(I believe this command arrived in 10.1; if you're not at 10.1 or higher, there is no hope!)

 

root@SRX-Node0> request routing-engine login node 1

 

That puts you on node 1, and off you go...

 

The next issue, of course, is upgrading.  If you are local, bust out the USB stick, mount it, copy the files off, etc. If you're remote, or just to see how to copy stuff around, see this post from Thorsten:

 

http://forums.juniper.net/t5/SRX-Services-Gateway/Minimum-effort-SRX-Cluster-upgrade-procedure/m-p/5...
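From memory, the short version of that procedure goes roughly like this (the image name is invented, so double-check everything against the post above):

root@srx-node0> file copy /var/tmp/junos-srxsme-10.2R3.10-domestic.tgz node1:/var/tmp/
root@srx-node0> request system software add /var/tmp/junos-srxsme-10.2R3.10-domestic.tgz no-copy
root@srx-node0> request routing-engine login node 1
root@srx-node1> request system software add /var/tmp/junos-srxsme-10.2R3.10-domestic.tgz no-copy

...and then reboot both nodes together.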

 

-Gerry

Distinguished Expert
keithr
Posts: 979
Registered: 09-10-2009

Re: HA cluster management

SRX Cluster management.  Fun, ain't it?

 

Personally, I would love to meet the person or persons responsible for the design of how these devices operate for management, and introduce those parties to what I refer to as the "brick of education."

 

As far back as the pre-Juniper days of NetScreen, I have always had issues with the way their so-called "out-of-band management" ports operate.  It seems Juniper took a bad idea and kept it alive and kicking.

 


RichardF wrote:

Yes, this works; however, it seems to be purely OOB management. The interface IPs and routes via this interface are installed into the global route table, which means you can't also route traffic through the devices from your management network.


You hit it on the head here.  There is a bit of dissension about what OOB means.  When a network device puts the routes for its supposedly out-of-band network management interface into the system's master routing table, that does NOT fit my definition of "out-of-band," because as you said, the system will then start to route transit traffic to/from your management network through that interface! WRONG, Juniper! WRONG! If the device installs routes through that interface into its primary routing table, that is by definition *IN BAND*.

 

Juniper's answer is that we're supposed to create an entirely separate network for the sole purposes of managing our Juniper devices.  Yeah, that's great if you have a small, simple network.  What are customers with large, diverse, and complicated networks supposed to do? That's not exactly a trivial task.

 

Your configuration examples of putting interfaces into your node0 and node1 groups aren't going to work, as you've mostly discovered. The thing to remember about how the SRXs do clustering is that two devices logically become one device.  That is the biggest hurdle to get past when trying to wrap your head around how these things operate.

You can think of an SRX cluster as you would a single switch, maybe a Cisco 6500, for example, that has 2 supervisor cards.  One card is active in the chassis, the other is standby.  If the active card fails, the standby card picks up.  Both cards live in the same switch and service the same ports.  Your two SRX devices become one device with two "supervisor cards" (Routing Engines).  Think of node1 as just extra ports, not as an individual device.  The RE in node0 is active (usually), and the RE in node1 basically shuts down and goes to sleep until it's needed. That turns node1 into a dumb box of network interfaces, for all intents and purposes, and is why the interfaces you configure under node1 don't work.  If you fail the routing engine over to node1, then those interfaces on that node would start working and the ones on node0 would stop working.

Juniper's way around this is with the blasted fxp0 interfaces.  If you have the luxury of building an isolated network segment for your SRX management, then you're in good shape to manage those devices.  For the rest of us in the real world, though, it turns into a hassle.

 

Depending on what your needs are, with Junos 10.1R2 (I believe) and newer you can use "Virtual Chassis" mode for managing the SRX cluster.  It brings a little bit of sanity to the way they operate, but I would say it's still got a long way to go.  I honestly don't know if VC mode solves the issue of being able to download IDP or other software updates to the node1 box, or if it even solves the issue of node1 not being able to sync its clock with your NTP servers (another glaring oversight in my opinion...). Perhaps someone who's used VC mode can chime in on what it solves and what it doesn't solve.

 

Here is a KB that describes VC mode:  http://kb.juniper.net/InfoCenter/index?page=content&id=KB18228

 

Good luck.  :smileyhappy:

 

-kr


---
If this solves your problem, please mark this post as "Accepted Solution."
Kudos are always appreciated.
Distinguished Expert
firewall72
Posts: 825
Registered: 05-04-2008

Re: HA cluster management

Hi,

 

I have my SRX240 in VC mode, managed via NSM.  I also have an SRX210 VC.  Both are able to download IDP updates without issue.  The thing I haven't tested yet is whether they make it over to the secondary.  In my opinion, VC is the only way to go, especially when factoring in a remote NSM.  The "backup-router" approach is buggy across all versions, and JTAC was unable to resolve my issues.  I ran packet captures and found the management traffic leaving the fxp0 interfaces for the NSM was hit and miss.

 

John

John Judge
JNCIS-SEC, JNCIS-ENT,

If this solves your problem, please mark this post as "Accepted Solution". Kudos are appreciated.
Distinguished Expert
muttbarker
Posts: 2,362
Registered: 01-29-2008

Re: HA cluster management

Hey John - IDP updates DO NOT propagate from primary to secondary. You must update manually. I had a JTAC case on this one and got validation of this ever-so-sad fact.....

Kevin Barker
JNCIP-SEC
JNCIS-ENT, FWV, SSL, WLAN
JNCIA-ER, EX, IDP, UAC, WX
Juniper Networks Certified Instructor
Juniper Networks Ambassador

Juniper Elite Reseller
J-Partner Service Specialist - Implementation

If this worked for you please flag my post as an "Accepted Solution" so others can benefit. A kudo would be cool if you think I earned it.
Distinguished Expert
keithr
Posts: 979
Registered: 09-10-2009

Re: HA cluster management

 


muttbarker wrote:

Hey John - IDP updates DO NOT propagate from primary to secondary. You must update manually. I had a JTAC case on this one and got validation of this ever-so-sad fact.....


 

But if it's in VC mode, can the secondary at least reach out to the internet to download the updates?  That's always been the primary problem with the "old" chassis cluster mode: the secondary could only reach out via its fxp0, and it's no mystery what a nightmare that can be to manage.

-kr


---
If this solves your problem, please mark this post as "Accepted Solution."
Kudos are always appreciated.
Distinguished Expert
muttbarker
Posts: 2,362
Registered: 01-29-2008

Re: HA cluster management

Nope - you need to FTP the updates from the primary to the secondary. A BSD script could be written, I guess. It is sadly another example of just how these boxes lack functionality in the production world.
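A rough sketch of what such a script would boil down to, assuming the node1: copy target works on your release (the IDP download path is from memory, so verify it on your box first):

root@srx-node0> file copy /var/db/idpd/sec-download/SignatureUpdate.xml node1:/var/db/idpd/sec-download/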

 

Kevin Barker
JNCIP-SEC
JNCIS-ENT, FWV, SSL, WLAN
JNCIA-ER, EX, IDP, UAC, WX
Juniper Networks Certified Instructor
Juniper Networks Ambassador

Juniper Elite Reseller
J-Partner Service Specialist - Implementation

If this worked for you please flag my post as an "Accepted Solution" so others can benefit. A kudo would be cool if you think I earned it.
Visitor
jerome
Posts: 7
Registered: 05-24-2008

Re: HA cluster management

Hello,

There's also something disappointing about cluster management in the real world: when one cluster member gets disabled (after a fabric link failure, for example), the fxp0 interface is down (physically up, but it does not answer requests). The only way to manage the disabled box (to issue a reboot) is to log in via serial console, or to use the "request routing-engine login node 1" trick, but I discovered that does not work in all cases.

 

I can also confirm that using fxp0 routing is a nightmare in real-world situations (different behaviours on primary/secondary for egress and ingress packets, managing route priority between the backup-router routing table and the inet.0 routing table, etc.).

 

I think real and robust out-of-band management should be a priority on the product roadmap (look at the NetScreen MGT interface, which can be considered the beginning of a decent management interface). The lack of such a feature is a design flaw...

Super Contributor
motd
Posts: 221
Registered: 12-16-2008

Re: HA cluster management

 


muttbarker wrote:

Nope - you need to FTP the updates from the primary to the secondary. A BSD script could be written, I guess. It is sadly another example of just how these boxes lack functionality in the production world.

 


An event script already exists. No idea who wrote it; maybe JTAC has a copy. Lots of people have written scripts to work around SRX/NSM limitations (even I wrote a few very basic ones), but most of them aren't public, unfortunately. I'm still looking for one that will keep RG0 on the same node as RG1 for all the features that are not supported in A/A :smileyhappy:
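For a rough idea of the shape it could take, a crude time-based trigger can even be done with plain event-options, no script at all (the file name and path are hypothetical, and it assumes the node1: copy target works on your release):

set event-options generate-event idp-sync time-interval 86400
set event-options policy push-idp-to-node1 events idp-sync
set event-options policy push-idp-to-node1 then execute-commands commands "file copy /var/db/idpd/sec-download/SignatureUpdate.xml node1:/var/db/idpd/sec-download/"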

 

 

SRX cluster management has always been a bit of a challenge. It's doable, but I have had to use some ugly tricks at times. The backup-router behavior especially is difficult for most to understand (or is just poorly documented); it's not an entirely separate table.

Oh well, as soon as we can terminate VPN connections in non-default routing instances, we can use the old NetScreen trick: dedicate inet.0 to management and use a routing instance for all transit traffic.

Distinguished Expert
dfex
Posts: 708
Registered: 04-17-2008

Re: HA cluster management

Hi All,

 

I've been watching this thread with interest, as I too have struggled with a way around the limitations of chassis-cluster and the fxp0 silliness.

 

What I have deployed in the past is as follows:

 

Connect the fxp0 interface on each SRX to redundant switches that have a management VLAN stretched between them. Next, configure a Management security zone on the firewall, and put an interface from this zone into the same management VLAN on that switch - essentially you're connecting all the SRX fxp0 and management ge interfaces to each other in the same VLAN.

 

Because of the JUNOS stupidity with the fxp0 interface being present in the global table, but still out-of-band, you need to place all "Security" interfaces in their own virtual router.  This separates the fxp0 routes from the VR table (which is now used essentially as the new "global" table).  It would be nice to do it the other way around, but you can't place fxp0 in a VR.

 

When configuring the backup-router command under your node groups, point it at the IP address on the Management security zone interface and add appropriate security policies to allow traffic to/from this zone, referencing the fxp0 IP address as if it were a host.  A word of caution here - when using backup-router, I've had all sorts of issues specifying 0.0.0.0/0 as the destination route, so make it the specific prefix(es) of your management network and you shouldn't have (as many) issues.
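To make that concrete, the skeleton of the above looks something like this (the interface names, gateway address, and prefixes are all invented for the example; adjust to your network):

# all transit/security interfaces live in a virtual router; fxp0 stays in inet.0
set routing-instances transit instance-type virtual-router
set routing-instances transit interface reth0.0
set routing-instances transit interface reth1.0
set routing-instances transit interface ge-0/0/5.0
# ge-0/0/5.0 sits in the Management zone, in the same VLAN as both fxp0 ports
# backup-router points at the Management zone interface IP, management prefixes only
set groups node0 system backup-router 192.168.100.1 destination 192.168.200.0/24
set groups node1 system backup-router 192.168.100.1 destination 192.168.200.0/24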

 

If you are doing IPsec on your SRX, the above topology may be problematic, as IKE can't yet be negotiated on interfaces inside a virtual router (coming in either 10.4 or 11.1, I believe).

 

As someone mentioned above, IDP updates are also problematic with HA clusters.

 

From JUNOS 10.1R2 onwards, there is a hidden command:

 

 

set chassis cluster network-management cluster-master

which allows in-band management via any interface, and makes the chassis cluster appear/behave more like an EX Virtual Chassis in NSM (e.g., 2x REs).  Again, there are caveats around deploying this way, but it might help out some of you who are struggling with fxp0 issues.

 

Good luck!

 

Ben

 

 

 

 

Ben Dale
JNCIP-ENT, JNCIS-SP, JNCIE-SEC #63
Juniper Ambassador
Follow me @labelswitcher