SRX


Ask questions and share experiences about the SRX Series, vSRX, and cSRX.

SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

  • 1.  SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 02-10-2013 01:36

    Hi guys,

    2 independent ISP links (PPPoE), one terminated on each SRX for redundancy, with multiple RIBs for failover (the ISP links are not in redundancy groups). 3 sets of dual links from each switch uplink to the SRX cluster (one per SRX). The SRX220s run the JTAC-recommended 11.4R6.6, which supports ethernet switching (we've already lost ports to fxp0, fxp1, the fab0s, and the swfabs). I'm trying to work out the best design for redundancy/throughput/simplicity across system and sub-system failure modes.

     

    I keep coming back to a simple dual uplink from each switch (secondary held down with RSTP), as the Junos requirements for LACP and LAGs/reths (combined with the lack of port density and cable runs) make it unclear whether we can get a 2 Gbps uplink from each switch spread across the SRXs. Can I just run a reth as an LACP bundle, via one link from each SRX to a single AE on a single EX, and get 2 Gbps? (Been through a lot of doco and it all seems to talk about local LAGs bundled into reths, however I have only one link per SRX to each switch.)
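    For concreteness, a minimal sketch of the arrangement being asked about: one physical link per node as children of a single reth, opposite a single AE on the EX (interface, reth, and RG numbers are illustrative, not a tested config).

    SRX side:

    set chassis cluster reth-count 5
    set interfaces ge-0/0/4 gigether-options redundant-parent reth4
    set interfaces ge-3/0/4 gigether-options redundant-parent reth4
    set interfaces reth4 redundant-ether-options redundancy-group 4
    set interfaces reth4 redundant-ether-options lacp active

    EX side (both cable runs in one AE):

    set chassis aggregated-devices ethernet device-count 1
    set interfaces ge-0/0/0 ether-options 802.3ad ae0
    set interfaces ge-0/0/1 ether-options 802.3ad ae0
    set interfaces ae0 aggregated-ether-options lacp passive
    set interfaces ae0 unit 0 family ethernet-switching port-mode trunk

    The open question is exactly the one posed above: whether both child links forward simultaneously, or only the one on the RG-primary node.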

     

    I'm trying to get the most active bandwidth during normal operation, e.g. a 2 Gbps bundled uplink from each switch to the SRX cluster, with failover occurring only when a chassis/RE fails. Anyone got any ideas other than single uplinks in single AEs inside a reth? (Which means the switch will shut down one AE due to STP, and/or we only get a 1 Gbps active uplink.)

     

    Note: EX Virtual Chassis is out of the question, as its licensing price point is too high for this small pod.



  • 2.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 02-13-2013 03:29

    srx-cluster-research.jpg

     

    SRX:

    set interfaces ge-0/0/4 description "=== channel to au-mel-XXXXXX-sw03c ==="
    set interfaces ge-0/0/4 gigether-options redundant-parent reth4
    set interfaces ge-3/0/4 description "=== channel to au-mel-XXXXXX-sw03c ==="
    set interfaces ge-3/0/4 gigether-options redundant-parent reth4
    set interfaces fab0 fabric-options member-interfaces ge-0/0/5
    set interfaces fab1 fabric-options member-interfaces ge-3/0/5
    set interfaces reth2 redundant-ether-options redundancy-group 2
    set interfaces reth2 redundant-ether-options lacp passive
    set interfaces reth4 vlan-tagging
    set interfaces reth4 redundant-ether-options redundancy-group 4
    set interfaces reth4 redundant-ether-options lacp active
    set interfaces reth4 unit 0 vlan-id 1
    set interfaces reth4 unit 33 vlan-id 33
    set interfaces reth4 unit 40 vlan-id 40
    set interfaces reth4 unit 48 vlan-id 48
    set interfaces reth4 unit 50 vlan-id 50
    set interfaces swfab0 fabric-options member-interfaces ge-0/0/1
    set interfaces swfab1 fabric-options member-interfaces ge-3/0/1
    set security zones security-zone trust host-inbound-traffic system-services all
    set security zones security-zone trust host-inbound-traffic protocols all
    set chassis cluster redundancy-group 4 node 0 priority 100
    set chassis cluster redundancy-group 4 node 1 priority 1
    set chassis cluster redundancy-group 4 preempt
    set chassis cluster redundancy-group 4 interface-monitor ge-0/0/0 weight 255
    set chassis cluster redundancy-group 4 interface-monitor ge-0/0/4 weight 255
    set chassis cluster redundancy-group 4 interface-monitor ge-3/0/4 weight 255

    reth4 and its sub-interfaces have been placed in the 'trust' zone also...

     

    EX (SCENARIO B):

    set interfaces ae4 aggregated-ether-options lacp passive
    set interfaces ae4 unit 0 family ethernet-switching port-mode trunk
    set interfaces ae4 unit 0 family ethernet-switching vlan members Data_40
    set interfaces ae4 unit 0 family ethernet-switching vlan members Guest_48
    set interfaces ae4 unit 0 family ethernet-switching native-vlan-id 33
    set interfaces ae14 aggregated-ether-options lacp passive
    set interfaces ae14 unit 0 family ethernet-switching port-mode trunk
    set interfaces ae14 unit 0 family ethernet-switching vlan members Data_40
    set interfaces ae14 unit 0 family ethernet-switching vlan members Guest_48
    set interfaces ae14 unit 0 family ethernet-switching native-vlan-id 33
    set interfaces ge-0/0/0 description "=== channel to au-mel-hub-fw03a ==="
    set interfaces ge-0/0/0 ether-options 802.3ad ae4
    set interfaces ge-0/0/0 unit 0 family ethernet-switching
    set interfaces ge-0/0/1 description "=== channel to au-mel-hub-fw03a ==="
    set interfaces ge-0/0/1 ether-options 802.3ad ae14
    set interfaces ge-0/0/1 unit 0 family ethernet-switching

    Any help on the above regarding 2 Gbps over LACP, native VLAN on the SRX, and why packets aren't coming back would be appreciated.



  • 3.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 02-13-2013 16:20

    Does a new reth L3 sub-interface totally replace the VLAN RVI I had previously (in a layer 2 environment trunked up to a VLAN RVI), or can I do true ethernet-switching on the reth?



  • 4.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 02-13-2013 17:05

    So KB article http://kb.juniper.net/InfoCenter/index?page=content&id=KB21422 says "NOTE: As of this writing, while using ethenet-switching in chassis cluster deployment Layer3 routing from L2 ethernet-switching network via L3-interface Vlan.X is not supported." ....

     

    Originally 'ethernet switching' was not supported on SRX as per http://www.juniper.net/techpubs/en_US/junos10.4/topics/task/operational/chassis-cluster-before-enable-on-srx100-srx210-srx240-device-switching-disabling.html

     

    Then 'ethernet switching' was introduced as per JunOS 11.2 on SRX220's http://kb.juniper.net/InfoCenter/index?page=content&id=KB21312

     

    I'm starting to think we can get 2 Gbps upstream from the EX but only 1 Gbps downstream from the SRX: both LAGs are up from the switch's point of view, but the swfab link completes the connection to the primary SRX node, and that's where the uplinks terminate. Both links from the EX are active in terms of LACP and forwarding for STP.
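    One way to check from the SRX side which child link is actually forwarding is with the standard cluster operational commands (interface names as per the configs above); comparing input/output counters on the two children shows whether the secondary node's link is passing anything beyond LACPDUs:

    run show chassis cluster status
    run show chassis cluster interfaces
    run show interfaces ge-0/0/4 statistics
    run show interfaces ge-3/0/4 statistics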

     

    EX:

    root@au-mel-XXX-sw03c# run show lacp interfaces
    Aggregated interface: ae4
        LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
          ge-0/0/0       Actor    No    No   Yes  Yes  Yes   Yes     Fast   Passive
          ge-0/0/0     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
        LACP protocol:        Receive State  Transmit State          Mux State
          ge-0/0/0                  Current   Fast periodic Collecting distributing
    
    Aggregated interface: ae14
        LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
          ge-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast   Passive
          ge-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Fast    Active
        LACP protocol:        Receive State  Transmit State          Mux State
          ge-0/0/1                  Current   Fast periodic Collecting distributing
    
    root@au-mel-XXX-sw03c# run show spanning-tree interface
    
    Spanning tree interface parameters for instance 0
    
    Interface    Port ID    Designated      Designated         Port    State  Role
                             port ID        bridge ID          Cost
    ae4.0            128:5        128:5  32768.54e0320ad0c1     20000  FWD    DESG
    ae14.0          128:15       128:15  32768.54e0320ad0c1     20000  FWD    DESG
    
    root@au-mel-XXX-sw03c# run show vlans
    Name           Tag     Interfaces
    Data_40        40
                           ae4.0*, ae14.0*, ge-0/0/4.0, ge-0/0/5.0, ge-0/0/6.0,
                           ge-0/0/7.0, ge-0/0/8.0, ge-0/0/9.0, ge-0/0/10.0,
                           ge-0/0/11.0, ge-0/0/12.0, ge-0/0/13.0, ge-0/0/14.0,
                           ge-0/0/15.0, ge-0/0/16.0, ge-0/0/17.0, ge-0/0/18.0,
                           ge-0/0/19.0, ge-0/0/20.0, ge-0/0/21.0, ge-0/0/22.0,
                           ge-0/0/23.0
    Guest_48       48
                           ae4.0*, ae14.0*, ge-0/0/10.0, ge-0/0/11.0, ge-0/0/12.0,
                           ge-0/0/13.0
    Management_33  33
                           ae4.0*, ae14.0*, ge-0/0/10.0, ge-0/0/11.0, ge-0/0/12.0,
                           ge-0/0/13.0
    default
                           ge-0/0/2.0, ge-0/0/3.0

     

     

     



  • 5.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 02-13-2013 18:33

    Well, spanning tree doesn't run, and it looks like the whole 'ethernet-switching' thing is a misnomer in this case, as are the swfab interfaces. If trunking to an L3 SRX, reths with L3 sub-interfaces will suffice (with vlan-tagging enabled). The VLAN L3 interface, and configuring a reth as a purely layer 2 construct, doesn't in fact work, it seems. So much for that concept. Now to re-jig native VLANs on the EX side and ensure all VLANs are tagged... anyone got any comments on the design, and/or experience with L2 constructs/switching domains on chassis-clustered SRXs?
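    A sketch of the "reth with L3 sub-interfaces as the gateways" approach described above, in place of vlan.X RVIs (VLAN IDs from the earlier config; addresses are illustrative):

    set interfaces reth4 vlan-tagging
    set interfaces reth4 unit 40 vlan-id 40
    set interfaces reth4 unit 40 family inet address 192.168.40.1/24
    set interfaces reth4 unit 48 vlan-id 48
    set interfaces reth4 unit 48 family inet address 192.168.48.1/24
    set security zones security-zone trust interfaces reth4.40
    set security zones security-zone trust interfaces reth4.48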

     

    It looks like only 1 Gbps at any one time (based on ICMP packet counts on the SRX's links)... a shame we can't get 2 Gbps to the cluster, i.e. active/active with a single LACP bundle that would theoretically lose only one link at a time (the SRX220 is starved of ethernet ports in its base configuration).

     

    I may try the single LACP bundle from the EX to the SRX and see whether I lose packets or get asymmetric flows... (that's what I thought the 'ethernet-switching' and swfab interfaces would provide for).



  • 6.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 02-15-2013 21:24

    So, just to confirm: a very basic attempt at 3 separate LAGs, one to each switch, i.e. an AE spanning the SRX chassis while relying on simple L2 switching with L3 VLANs, doesn't work, as an AE's child links *must* come from the same node. SRX220H, Junos 11.4R6.6 chassis cluster:

     

    [edit interfaces ge-3/0/2 gigether-options]
      '802.3ad'
         In Clustering mode, child links ge-3/0/2, ge-0/0/2 of bundle ae2  must be from same chassis
    [edit security zones security-zone trust]
      'interfaces ge-0/0/2.0'
        Interface ge-0/0/2.0 must be configured under interfaces

     I'm going back to a reth with 6 members terminating on 3 switches: 2 remote AEs per switch, and one reth on the SRX cluster, as I need the L3 interface on the SRX to be the default gateway for multiple L2 domains. Unless I can put an L3 address on reth4 and have it reachable via vlan tagging from another reth? (To keep the 3 reths separate, e.g. 3x2 rather than 1x6 links)...



  • 7.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 02-17-2013 06:16

    So...

     

    Reth ints do not support VRRP http://kb.juniper.net/InfoCenter/index?page=content&id=KB12881

    Known limitations for 11.4 and Chassis Cluster http://www.juniperpodcast.com/techpubs/en_US/junos11.4/information-products/topic-collections/release-notes/11.4/index.html?topic-62169.html

     

    a) Can the 6 trunks from 3 discrete switches be in the same reth4?

    b) How would 3 reths have a single shared layer 3 gateway? (VLAN sub-interfaces cannot be used with chassis cluster.)

    c) Separate AEs cannot be used on the SRX side but must be used on the switch side.

    d) Any ideas on how to achieve this?

     

    srx-cluster-research-stage2.jpg

     

     

     

     



  • 8.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).
    Best Answer

    Posted 02-19-2013 19:26

    OK, so after testing 12.1, which doesn't support the L2/L3 mix of VLAN sub-interfaces and reths, my design has fallen back to good old switching with 6 separate trunks (actually wrapped in 6 separate AEs, just because I already had config on the remote switches)... and good old Rapid Spanning Tree.

     

    I only have a basic chassis failover for RG0, and 'ethernet-switching' is enabled. During failover I dropped ~10 ICMP packets; on failback I got 14 ICMP drops and 9 DUPs, but it's OK for now.

     

    The idea was to have multiple shared RIBs for active/active ISP load balancing, failover, and traffic engineering with 1 ISP link per node, and then 3 switches with redundant connections to the cluster carrying all VLANs, with the L3 gateway on the cluster.
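    The "multiple shared RIBs for active/active ISP load balancing" part would typically be filter-based forwarding with forwarding instances and a rib-group, along these lines (instance names, prefixes, and the pp0 units are illustrative, not from this thread):

    set routing-instances isp-a instance-type forwarding
    set routing-instances isp-a routing-options static route 0.0.0.0/0 next-hop pp0.0
    set routing-instances isp-b instance-type forwarding
    set routing-instances isp-b routing-options static route 0.0.0.0/0 next-hop pp0.1
    set routing-options rib-groups fbf-ribs import-rib inet.0
    set routing-options rib-groups fbf-ribs import-rib isp-a.inet.0
    set routing-options rib-groups fbf-ribs import-rib isp-b.inet.0
    set routing-options interface-routes rib-group inet fbf-ribs
    set firewall family inet filter fbf term guests from source-address 192.168.48.0/24
    set firewall family inet filter fbf term guests then routing-instance isp-b
    set firewall family inet filter fbf term rest then routing-instance isp-a
    set interfaces reth4 unit 48 family inet filter input fbf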

     

    I'm going back to the JTAC-recommended 11.4R6.6 and simple switching. I don't get 2 Gbps (it seems I never could) or seamless failover, but RSTP works, the AEs work, and I can lose 1 of anything and the whole site keeps going.

     

    N+1 for everything from the edge inwards at this SME, except the UPS... working on it :D



  • 9.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 02-19-2013 21:53

    New problem: on 11.4R6.6 a manual failover works fine with a few dropped packets, but pulling the power from cluster node0 as primary results in LACP not running on the new node1 primary. Basically the cluster was dead in the water. I'm going to go back to 12.1 and see whether I can reproduce the bug.

     

    Watch this space... Grrrrrrrr....



  • 10.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 02-19-2013 23:33

    Long story short: staying on 11.4R6.6 (didn't bother going back up to 12.1 to test) and dropped LACP from the configs, as it's just double encapsulation at this point and adds no value on the SRX220H given the lack of available ports.

     

    With simple trunks and spanning tree plus a chassis cluster on RG0, I can now pull the power from a cluster member, drop 20-40 packets, and be back up on at least a 1 Gbps link (1 of the 2 redundant links) from the EX to the cluster.
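    For reference, the shape of this fallback, plain switched trunks on both nodes, a vlan.X L3 gateway, and RSTP, would be roughly as follows (VLAN names from the earlier posts; addresses illustrative; note the KB quoted earlier says vlan.X routing is unsupported with clustering on these releases, so treat this as the shape rather than a guarantee):

    set interfaces ge-0/0/2 unit 0 family ethernet-switching port-mode trunk
    set interfaces ge-0/0/2 unit 0 family ethernet-switching vlan members all
    set interfaces ge-3/0/2 unit 0 family ethernet-switching port-mode trunk
    set interfaces ge-3/0/2 unit 0 family ethernet-switching vlan members all
    set vlans Data_40 vlan-id 40
    set vlans Data_40 l3-interface vlan.40
    set interfaces vlan unit 40 family inet address 192.168.40.1/24
    set protocols rstp interface all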

     

    I will now do the same for the other EX and the Cisco... this has been a very frustrating exercise and the long way round to a simplified config. N+1... with blood, sweat, and tears, no fast failover, and stateful failover only in limited scenarios. :(



  • 11.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 03-14-2013 02:23

    JTAC assures me this failure of the 'ethernet-switching' and LACP service on an SRX cluster will be fixed in 12.1X46.



  • 12.  RE: SRX 220 Chassis cluster throughput to 3 individual switches (2 x EX2200 and 1 x CiscoLinksys).

    Posted 08-13-2013 00:17

    Running 11.4R7.5, I had to dual-home the secondary PPPoE backup link in a reth with an RG2 to get it to work for FBF'd packets arriving at either node. When power is pulled on node0, everything fails over to node1 fine and runs on the secondary PPPoE link. Note: RSTP restarts fine and the IRB interface advertises itself as root, *however*:

     

    The main issue now is that if only RG0 fails over to node1, the STP root bridge comes across to node1 for a few seconds, spanning tree reconverges, and then the root bridge via the IRB goes back to node0, which is subsequently blocked on the downstream switches.

     

    Clustering + the six-pack config + (R)STP is a mess. So only a full node failure or an upstream ISP link failure is covered in our HA design. If RG0 fails for whatever reason (control link, internal sub-component), then STP blackholes traffic and/or the primary ISP's filtered forwarding gets blackholed on the primary pp link, by virtue of the FBF still seeing the PPPoE/PPP session.

     

    The only way I can see is to reth both upstream ISP links (if the individual modems support dual-attached cabling)... but this doesn't address the (R)STP bug... thinking about Fortinet now for clusters and per-client shaping...

     

    Note: even the JTAC-supplied custom build didn't work.