SRX

Ask questions and share experiences about the SRX Series, vSRX, and cSRX.
  • 1.  SRX340 control link wont establish over layer 2 network

    Posted 01-11-2019 06:55

    We currently have other branch devices clustered over a layer 2 network.

    Now, with a newer cluster based on SRX340 and the recommended Junos release, the cluster worked fine over a direct cable, but once it is connected through the layer 2 network it won't work.

    We have EX switches between the nodes.

    I can see the ARP/MAC addresses for the control link.

    The interface statistics show traffic both incoming and outgoing, but the control link won't come up logically, and each node says it doesn't see the other cluster member.

    Any ideas?

    The layer 2 network is configured with jumbo frames, and the ports are access ports (same setup as the other branch devices running older Junos).

     

    //Rob



  • 2.  RE: SRX340 control link wont establish over layer 2 network

    Posted 01-11-2019 18:49

    Please share the output of the following commands from both nodes:

    show interfaces terse | no-more

    show chassis cluster status

    show chassis cluster interfaces | no-more

    show chassis cluster information detail | no-more (run it twice)

     



  • 3.  RE: SRX340 control link wont establish over layer 2 network

    Posted 01-14-2019 00:52

    Hi,

    The devices send control link messages, but neither node receives them.

    But there is traffic both ways on the control link interfaces.

    fab0 up up
    fab0.0 up up inet 30.65.0.200/24
    fxp1 up up
    fxp1.0 up up inet 129.64.0.1/2
    tnp 0x1400001
    fxp2 up up
    fxp2.0 up up tnp 0x1400001

     

     

    ---node0:

    Cluster ID: 4
    Node    Priority  Status     Preempt  Manual  Monitor-failures

    Redundancy group: 0 , Failover count: 0
    node0   100       secondary  no       no      None
    node1   0         lost       n/a      n/a     n/a

    Redundancy group: 1 , Failover count: 0
    node0   0         secondary  no       no      None
    node1   0         lost       n/a      n/a     n/a

    ---node1:

    Cluster ID: 4
    Node    Priority  Status     Preempt  Manual  Monitor-failures

    Redundancy group: 0 , Failover count: 1
    node0   0         lost       n/a      n/a     n/a
    node1   1         primary    no       no      None

    Redundancy group: 1 , Failover count: 1
    node0   0         lost       n/a      n/a     n/a
    node1   0         primary    no       no      None

     

    --node0:

    Control link status: Up

    Control interfaces:
    Index Interface Monitored-Status Internal-SA Security
    0 fxp1 Down Disabled Disabled

    Fabric link status: Down

    Fabric interfaces:
    Name Child-interface Status Security
    (Physical/Monitored)
    fab0
    fab0

    Redundant-pseudo-interface Information:
    Name Status Redundancy-group
    lo0 Up 0

     

    --node1:

    Control link status: Up

    Control interfaces:
    Index Interface Monitored-Status Internal-SA Security
    0 fxp1 Down Disabled Disabled

    Fabric link status: Down

    Fabric interfaces:
    Name Child-interface Status Security
    (Physical/Monitored)
    fab1 ge-5/0/2 Up / Down Disabled
    fab1 ge-5/0/7 Up / Down Disabled

    Redundant-ethernet Information:
    Name Status Redundancy-group
    reth0 Down 1
    reth1 Down Not configured

    Redundant-pseudo-interface Information:
    Name Status Redundancy-group
    lo0 Up 0

     

    ---node1: yes, it is sending control link heartbeats, but nothing is received. Same on both nodes.

    Control link statistics:
    Control link 0:
    Heartbeat packets sent: 233944
    Heartbeat packets received: 0
    Heartbeat packet errors: 0
    Fabric link statistics:
    Child link 0
    Probes sent: 7822
    Probes received: 0

     

    --node0: the sent heartbeat counter is increasing, but nothing is received.

    Control link statistics:
    Control link 0:
    Heartbeat packets sent: 2667
    Heartbeat packets received: 0
    Heartbeat packet errors: 0
    Fabric link statistics:
    Child link 0
    Probes sent: 0
    Probes received: 0
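
    The one-way symptom in the counters above (heartbeats sent, none received) can also be checked mechanically. A minimal Python sketch, assuming the statistics have been saved as plain text in the format shown in this post and that there is a single control link; the function names are made up for illustration and are not part of any Juniper tool:

```python
import re

# Sample text copied from the node0 output in this post.
SAMPLE = """\
Control link statistics:
Control link 0:
Heartbeat packets sent: 2667
Heartbeat packets received: 0
Heartbeat packet errors: 0
"""

def parse_control_link_stats(text):
    """Return a dict with the sent/received/errors heartbeat counters."""
    stats = {}
    pattern = re.compile(r"Heartbeat packets? (sent|received|errors):\s*(\d+)")
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats

def control_link_is_one_way(stats):
    """True when heartbeats leave the node but none ever arrive."""
    return stats.get("sent", 0) > 0 and stats.get("received", 0) == 0

if __name__ == "__main__":
    stats = parse_control_link_stats(SAMPLE)
    print(stats, control_link_is_one_way(stats))
```

    If both nodes report a one-way link like this, the path between them (here the switch) is the first place to look.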

     

    From that point of view, things point to the switch, a Juniper EX4300.

    set interfaces ge-1/0/16 description ***SEVIS004FW-NODE1_FAB***
    set interfaces ge-1/0/16 mtu 9216
    set interfaces ge-1/0/16 unit 0 family ethernet-switching interface-mode access
    set interfaces ge-1/0/16 unit 0 family ethernet-switching vlan members FW811
    set protocols rstp interface ge-1/0/16 edge

    set interfaces ge-1/0/22 description ***SEVIS004FW-NODE1_Control_HA_Link***
    set interfaces ge-1/0/22 mtu 9216
    set interfaces ge-1/0/22 unit 0 family ethernet-switching interface-mode access
    set interfaces ge-1/0/22 unit 0 family ethernet-switching vlan members FW810
    set protocols rstp interface ge-1/0/22 edge

     

    The RSTP port is in the forwarding state.

    ge-1/0/22

    Input rate : 17536 bps (32 pps) - seems to be communicating, but the cluster won't form
    Output rate : 16960 bps (31 pps)

     

    ge-1/0/16 (fab)

    Input rate : 0 bps (0 pps) <-- nothing coming in here. Not sure whether that's because the control link isn't coming up or something else is causing it. Will look into it.
    Output rate : 256 bps (1 pps)

     

     



  • 4.  RE: SRX340 control link wont establish over layer 2 network

    Posted 01-14-2019 20:04

    Hi RJ,

     

    I can see that node 0 shows secondary/lost, which I find odd. If the nodes cannot see each other, I would expect both of them to go to primary/lost, since each would assume that its peer doesn't exist and that it is in charge of the traffic.

     

    Cluster ID: 4
    Node    Priority  Status     Preempt  Manual  Monitor-failures

    Redundancy group: 0 , Failover count: 0
    node0   100       secondary  no       no      None
    node1   0         lost       n/a      n/a     n/a

    Redundancy group: 1 , Failover count: 0
    node0   0         secondary  no       no      None
    node1   0         lost       n/a      n/a     n/a

     

    Can you confirm it says secondary/lost? Can you share "show version" and "show chassis hardware" from each node? Can you also reboot node 0 and check whether the secondary/lost state remains?

     



  • 5.  RE: SRX340 control link wont establish over layer 2 network

    Posted 01-14-2019 23:48

    The cluster is not in production yet. I first configured it with directly connected cables, and the cluster looked just fine, as it should.

    Since I have an SRX240 cluster set up in the same way, I configured the switches the same way: access ports, with separate VLANs for fab and control.

    I can see MAC/ARP entries on the control VLAN but not on the fab VLAN. I don't know why.

    I have rebooted both nodes at the same time several times, and one at a time as well.

    After some time without the cluster forming (the other node is shown as "lost"), the FPC/PIC, i.e. the Ethernet ports, is shut down and no longer visible.

    If I disable the control port from the switch side, the hardware looks fine after a reboot. So in some way there is a conflict when the nodes try to form the control link.

     

    node1 jsrpd log, just before and during the reboot:

    Jan 14 13:53:52 TCP-S: TCP peer closed connection
    Jan 14 13:54:12 PFE Rx client is shutdown (socket id: 22, session 820440)
    Jan 14 13:58:21 new PFE Rx client connection established (socket id: 22, session 820440)
    Jan 14 13:58:21 Received fabric monitor child status
    Jan 14 13:58:21 Received fabrics child link status from PFE
    Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_status: lnk_idx:0, link_state(curr:0, new:0)
    Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_status: lnk_idx:1, link_state(curr:0, new:0)
    Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_status: lnk_idx:2, link_state(curr:0, new:255)
    Jan 14 13:58:21 State of lnk-0 of fab1 remains DOWN
    Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_status: lnk_idx:3, link_state(curr:0, new:255)
    Jan 14 13:58:21 State of lnk-1 of fab1 remains DOWN
    Jan 14 13:58:21 HA Fabric Info: After fabric child status is updated
    Jan 14 13:58:21 node0: fab0 is Inactive with 0 child (AggId: 0)
    Jan 14 13:58:21 node1: fab1 is Active with 2 child (AggId: 131)
    Jan 14 13:58:21 link-0: ge-5/0/2 (5/0/2) is Active : ifd_state: Up pfe_state:Down secure_state Disabled
    Jan 14 13:58:21 link-1: ge-5/0/7 (5/0/7) is Active : ifd_state: Down pfe_state:Down secure_state Disabled
    Jan 14 13:58:21 Received fabric monitor child secure status
    Jan 14 13:58:21 Received fabrics child link secure status from PFE
    Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_secure_status: lnk_idx:0, link_secure_state(curr:0, new:0)
    Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_secure_status: lnk_idx:1, link_secure_state(curr:0, new:0)
    Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_secure_status: lnk_idx:2, link_secure_state(curr:0, new:255)
    Jan 14 13:58:21 State of lnk-0 of fab1 remains DISABLED
    Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_secure_status: lnk_idx:3, link_secure_state(curr:0, new:255)
    Jan 14 13:58:21 State of lnk-1 of fab1 remains DISABLED
    Jan 14 13:58:21 HA Fabric Info: After fabric child status is updated
    Jan 14 13:58:21 node0: fab0 is Inactive with 0 child (AggId: 0)
    Jan 14 13:58:21 node1: fab1 is Active with 2 child (AggId: 131)
    Jan 14 13:58:21 link-0: ge-5/0/2 (5/0/2) is Active : ifd_state: Up pfe_state:Down secure_state Disabled
    Jan 14 13:58:21 link-1: ge-5/0/7 (5/0/7) is Active : ifd_state: Down pfe_state:Down secure_state Disabled
    Jan 14 13:59:58 TCP-S: accepted client connection.
    Jan 14 13:59:58 TCP-S: TCP client from 129.64.0.1/49677 connected
    Jan 14 13:59:58 TCP-S: Peer msg: hello:node-1
    Jan 14 15:02:32 ISSU state: 0

     

    Current status: both control links are up toward the switch and the devices are sending control link messages, but nothing is received.

    node1

    Redundancy group: 0 , Failover count: 1
    node0   0         lost       n/a      n/a     n/a
    node1   1         primary    no       no      None

    Redundancy group: 1 , Failover count: 1
    node0   0         lost       n/a      n/a     n/a
    node1   0         primary    no       no      CS

     

    node0

    Redundancy group: 0 , Failover count: 0
    node0   100       secondary  no       no      None
    node1   0         lost       n/a      n/a     n/a

    Redundancy group: 1 , Failover count: 0
    node0   0         secondary  no       no      None
    node1   0         lost       n/a      n/a     n/a

     

    [15.1X49-D150.2]

     

    node1:
    --------------------------------------------------------------------------
    Hardware inventory:
    Item Version Part number Serial number Description
    Chassis CY5117AF1619 SRX340
    Routing Engine REV 0x12 650-065043 CY5117AF1619 RE-SRX340
    FPC 0 FPC
    PIC 0 8xGE,8xGE SFP Base PIC
    Power Supply 0

     

    ...and as I've seen before, one node is shutting down its FPC.

    node0:
    --------------------------------------------------------------------------
    Hardware inventory:
    Item Version Part number Serial number Description
    Chassis SRX340
    FPC 0 FPC
    Power Supply 0

     



  • 6.  RE: SRX340 control link wont establish over layer 2 network

    Posted 01-15-2019 08:22

    RJ,

     

    If you run "show chassis fpc pic-status", do you see the FPC/PIC offline?

     

    Can you disable the cluster and, once the nodes boot up in standalone mode, delete the whole configuration and then retry the cluster formation?

     

         > set chassis cluster disable reboot

     

    After bootup:

     

         # delete (type yes)
         # set system root-authentication plain-text-password (set a root password)
         # commit
         # show (confirm that only the root password is configured)
         # run set chassis cluster cluster-id [value] node [value] reboot

    Please log the boot process and confirm the cluster status on both nodes.

     



  • 7.  RE: SRX340 control link wont establish over layer 2 network
    Best Answer

    Posted 01-23-2019 00:37

    I pinpointed the issue to the switches/L2 segment.

    IGMP snooping (igmp-snooping) was enabled.

    Strange that the SRX240 cluster (running the recommended version) was working just fine.

     

    I removed igmp-snooping. Since we don't use a full multicast configuration with an RP etc., it's OK if multicast is flooded like broadcast.

    Once it was removed, the cluster came up fully working.

    On the switch side, the EX4300 behaves differently from the EX4200: it is not possible to disable IGMP snooping on selected VLANs (very bad design by Juniper). So if you want igmp-snooping on all but a few VLANs, you need to configure igmp-snooping explicitly each time you create a new VLAN. That just adds administrative work and configuration, when we should be moving the other way, toward less specific config.
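
    The per-VLAN approach described above could be scripted, so that every VLAN except the cluster VLANs gets snooping. A rough Python sketch, assuming the ELS-style statement "set protocols igmp-snooping vlan <name>" (verify it against your Junos release before use); the helper name and the VLAN names other than FW810/FW811 are hypothetical:

```python
# VLANs carrying the SRX control and fabric links (names from this thread).
HA_VLANS = {"FW810", "FW811"}

def igmp_snooping_commands(vlans, excluded=HA_VLANS):
    """Build one 'set' line per VLAN that should have IGMP snooping.

    The statement syntax is an assumption (ELS-style Junos); check it
    on the switch before committing anything.
    """
    return [
        f"set protocols igmp-snooping vlan {name}"
        for name in sorted(vlans)
        if name not in excluded
    ]

if __name__ == "__main__":
    # USERS and VOICE are made-up VLAN names for illustration.
    for line in igmp_snooping_commands(["USERS", "FW810", "VOICE", "FW811"]):
        print(line)
```

    Keeping the excluded VLANs in one place means the control and fabric VLANs are skipped even when the VLAN list grows.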