SRX Services Gateway

SRX340 control link wont establish over layer 2 network

‎01-11-2019 06:55 AM

We currently have other branch devices clustered over a layer 2 network.

Now, with a newer cluster based on SRX340 and the recommended Junos release, the cluster worked fine with direct cables, but once connected over the layer 2 network it won't work.

We have EX-switches between them.

I can see the ARP/MAC addresses for the control link.

I can see from the traffic statistics that there is incoming and outgoing traffic on the interface. But the control link won't come logically up, and it says it doesn't see the other cluster member.

Any ideas?

The layer 2 network is configured with jumbo frames, and the ports are access ports (same setup as the other branch devices with older Junos).

 

//Rob

6 REPLIES

Re: SRX340 control link wont establish over layer 2 network

‎01-11-2019 06:48 PM

Please share the output of the following commands from both nodes:

show interfaces terse | no-more

show chassis cluster status

show chassis cluster interfaces | no-more

show chassis cluster information detail | no-more (run it twice)

 

Thanks,
Nellikka
JNCIE x3 (SEC #321; SP #2839; ENT #790)

Re: SRX340 control link wont establish over layer 2 network

‎01-14-2019 12:51 AM

Hi,

The devices send control link messages, but neither one receives them.

But there is traffic both ways on the control link interfaces.

fab0 up up
fab0.0 up up inet 30.65.0.200/24
fxp1 up up
fxp1.0 up up inet 129.64.0.1/2
tnp 0x1400001
fxp2 up up
fxp2.0 up up tnp 0x1400001

 

 

---node0:

Cluster ID: 4
Node Priority Status Preempt Manual Monitor-failures

Redundancy group: 0 , Failover count: 0
node0 100 secondary no no None
node1 0 lost n/a n/a n/a

Redundancy group: 1 , Failover count: 0
node0 0 secondary no no None
node1 0 lost n/a n/a n/a

---node1:

Cluster ID: 4
Node Priority Status Preempt Manual Monitor-failures

Redundancy group: 0 , Failover count: 1
node0 0 lost n/a n/a n/a
node1 1 primary no no None

Redundancy group: 1 , Failover count: 1
node0 0 lost n/a n/a n/a
node1 0 primary no no None

 

--node0:

Control link status: Up

Control interfaces:
Index Interface Monitored-Status Internal-SA Security
0 fxp1 Down Disabled Disabled

Fabric link status: Down

Fabric interfaces:
Name Child-interface Status Security
(Physical/Monitored)
fab0
fab0

Redundant-pseudo-interface Information:
Name Status Redundancy-group
lo0 Up 0

 

--node1:

Control link status: Up

Control interfaces:
Index Interface Monitored-Status Internal-SA Security
0 fxp1 Down Disabled Disabled

Fabric link status: Down

Fabric interfaces:
Name Child-interface Status Security
(Physical/Monitored)
fab1 ge-5/0/2 Up / Down Disabled
fab1 ge-5/0/7 Up / Down Disabled

Redundant-ethernet Information:
Name Status Redundancy-group
reth0 Down 1
reth1 Down Not configured

Redundant-pseudo-interface Information:
Name Status Redundancy-group
lo0 Up 0

 

---node1: yes, it's sending control link heartbeats, but nothing is received. Same on both nodes.

Control link statistics:
Control link 0:
Heartbeat packets sent: 233944
Heartbeat packets received: 0
Heartbeat packet errors: 0
Fabric link statistics:
Child link 0
Probes sent: 7822
Probes received: 0

 

--node0: the control link sent counter is increasing, but nothing is received.

Control link statistics:
Control link 0:
Heartbeat packets sent: 2667
Heartbeat packets received: 0
Heartbeat packet errors: 0
Fabric link statistics:
Child link 0
Probes sent: 0
Probes received: 0
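
As a rough sketch, the "sent increases, received stays at 0" check above can be automated against pasted CLI output. This is only an illustration in Python: the helper names `parse_counters` and `one_way` are made up here, and the parsing assumes the counters appear as "Name: number" lines exactly as in the output above.

```python
import re

def parse_counters(text):
    """Collect 'Name: integer' pairs from pasted CLI output."""
    counters = {}
    for line in text.splitlines():
        # Only lines of the form "<words>: <digits>" are counters;
        # section headers such as "Control link 0:" do not match.
        m = re.match(r"\s*([A-Za-z ]+?):\s*(\d+)\s*$", line)
        if m:
            counters[m.group(1).strip()] = int(m.group(2))
    return counters

def one_way(counters, sent_key, recv_key):
    """True when packets are being sent but none have been received."""
    return counters.get(sent_key, 0) > 0 and counters.get(recv_key, 0) == 0

# Sample copied from the node0 output in this post.
SAMPLE = """\
Control link statistics:
Control link 0:
Heartbeat packets sent: 2667
Heartbeat packets received: 0
Heartbeat packet errors: 0
Fabric link statistics:
Child link 0
Probes sent: 0
Probes received: 0
"""

stats = parse_counters(SAMPLE)
print("control link one-way:",
      one_way(stats, "Heartbeat packets sent", "Heartbeat packets received"))
print("fabric link one-way:",
      one_way(stats, "Probes sent", "Probes received"))
```

Running it on the output of both nodes makes it easy to see that each side is transmitting heartbeats into the L2 segment and getting nothing back.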

 

From that point of view, things point to the switch, a Juniper EX4300.

set interfaces ge-1/0/16 description ***SEVIS004FW-NODE1_FAB***
set interfaces ge-1/0/16 mtu 9216
set interfaces ge-1/0/16 unit 0 family ethernet-switching interface-mode access
set interfaces ge-1/0/16 unit 0 family ethernet-switching vlan members FW811
set protocols rstp interface ge-1/0/16 edge

set interfaces ge-1/0/22 description ***SEVIS004FW-NODE1_Control_HA_Link***
set interfaces ge-1/0/22 mtu 9216
set interfaces ge-1/0/22 unit 0 family ethernet-switching interface-mode access
set interfaces ge-1/0/22 unit 0 family ethernet-switching vlan members FW810
set protocols rstp interface ge-1/0/22 edge

 

The RSTP port is in forwarding state.

ge-1/0/22

Input rate : 17536 bps (32 pps) - seems to be communicating, but the cluster won't form
Output rate : 16960 bps (31 pps)

 

ge-1/0/16 (fab)

Input rate : 0 bps (0 pps) <-- nothing coming in here. Not sure if that's because the control link isn't coming up or why that might be happening. Will look into it.
Output rate : 256 bps (1 pps)

 

 


Re: SRX340 control link wont establish over layer 2 network

‎01-14-2019 08:04 PM

Hi RJ,

 

I can see that node0 shows secondary/lost, which I find odd. If the nodes cannot see each other, I would expect both of them to go primary/lost, since each would think that its peer doesn't exist and that it is in charge of the traffic.

 

Cluster ID: 4
Node Priority Status Preempt Manual Monitor-failures

Redundancy group: 0 , Failover count: 0
node0 100 secondary no no None
node1 0 lost n/a n/a n/a

Redundancy group: 1 , Failover count: 0
node0 0 secondary no no None
node1 0 lost n/a n/a n/a

 

Can you confirm it says secondary/lost? Can you share the "show version" and "show chassis hardware" output from each node? Can you also reboot node0 and check whether the secondary/lost state remains?
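
To make the expectation concrete, here is a small Python sketch that reads a pasted "show chassis cluster status" block and flags redundancy groups where the peer is "lost" but the local node has not taken the primary role. The function names are hypothetical, and the parsing assumes the column layout shown above (node name, priority, status, ...).

```python
def parse_status(text):
    """Map redundancy group number -> {node name: status}."""
    views = {}
    group = None
    for line in text.splitlines():
        parts = line.split()
        if line.startswith("Redundancy group:"):
            group = int(parts[2])
        elif group is not None and len(parts) >= 3 and parts[0].startswith("node"):
            # Columns: node, priority, status, preempt, manual, monitor-failures
            views.setdefault(group, {})[parts[0]] = parts[2]
    return views

def unexpected_groups(views, local, peer):
    """Groups where the peer is lost but the local node is not primary."""
    return [g for g, nodes in sorted(views.items())
            if nodes.get(peer) == "lost" and nodes.get(local) != "primary"]

# Output as seen from node0 in this thread.
NODE0_VIEW = """\
Cluster ID: 4
Node Priority Status Preempt Manual Monitor-failures

Redundancy group: 0 , Failover count: 0
node0 100 secondary no no None
node1 0 lost n/a n/a n/a

Redundancy group: 1 , Failover count: 0
node0 0 secondary no no None
node1 0 lost n/a n/a n/a
"""

views = parse_status(NODE0_VIEW)
print(unexpected_groups(views, local="node0", peer="node1"))
```

For the node0 output quoted here, both redundancy groups are flagged, which is the secondary/lost combination I am asking about.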

 


Re: SRX340 control link wont establish over layer 2 network

‎01-14-2019 11:47 PM

The cluster is not in production yet. I configured it with directly connected cables. The cluster looked just fine, as it should.

Since I have an SRX240 cluster set up in the same way, I configured the switches the same way: access ports, with separate VLANs for fab and control.

I can see MAC/ARP entries on the control link but not on the fab. I don't know why.

I have rebooted both of them at the same time several times, and one at a time as well.

After some time without forming the cluster (the other node is shown as "lost"), the FPC/PIC, i.e. the Ethernet ports, is shut down / no longer visible.

If I disable the control port from the switch side, the hardware looks fine after reboot. So in some way it conflicts when trying to form the control link.

 

node1 jsrpd log, just before and during the reboot:

Jan 14 13:53:52 TCP-S: TCP peer closed connection
Jan 14 13:54:12 PFE Rx client is shutdown (socket id: 22, session 820440)
Jan 14 13:58:21 new PFE Rx client connection established (socket id: 22, session 820440)
Jan 14 13:58:21 Received fabric monitor child status
Jan 14 13:58:21 Received fabrics child link status from PFE
Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_status: lnk_idx:0, link_state(curr:0, new:0)
Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_status: lnk_idx:1, link_state(curr:0, new:0)
Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_status: lnk_idx:2, link_state(curr:0, new:255)
Jan 14 13:58:21 State of lnk-0 of fab1 remains DOWN
Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_status: lnk_idx:3, link_state(curr:0, new:255)
Jan 14 13:58:21 State of lnk-1 of fab1 remains DOWN
Jan 14 13:58:21 HA Fabric Info: After fabric child status is updated
Jan 14 13:58:21 node0: fab0 is Inactive with 0 child (AggId: 0)
Jan 14 13:58:21 node1: fab1 is Active with 2 child (AggId: 131)
Jan 14 13:58:21 link-0: ge-5/0/2 (5/0/2) is Active : ifd_state: Up pfe_state:Down secure_state Disabled
Jan 14 13:58:21 link-1: ge-5/0/7 (5/0/7) is Active : ifd_state: Down pfe_state:Down secure_state Disabled
Jan 14 13:58:21 Received fabric monitor child secure status
Jan 14 13:58:21 Received fabrics child link secure status from PFE
Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_secure_status: lnk_idx:0, link_secure_state(curr:0, new:0)
Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_secure_status: lnk_idx:1, link_secure_state(curr:0, new:0)
Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_secure_status: lnk_idx:2, link_secure_state(curr:0, new:255)
Jan 14 13:58:21 State of lnk-0 of fab1 remains DISABLED
Jan 14 13:58:21 jsrpd_pfe_fabmon_update_lnk_secure_status: lnk_idx:3, link_secure_state(curr:0, new:255)
Jan 14 13:58:21 State of lnk-1 of fab1 remains DISABLED
Jan 14 13:58:21 HA Fabric Info: After fabric child status is updated
Jan 14 13:58:21 node0: fab0 is Inactive with 0 child (AggId: 0)
Jan 14 13:58:21 node1: fab1 is Active with 2 child (AggId: 131)
Jan 14 13:58:21 link-0: ge-5/0/2 (5/0/2) is Active : ifd_state: Up pfe_state:Down secure_state Disabled
Jan 14 13:58:21 link-1: ge-5/0/7 (5/0/7) is Active : ifd_state: Down pfe_state:Down secure_state Disabled
Jan 14 13:59:58 TCP-S: accepted client connection.
Jan 14 13:59:58 TCP-S: TCP client from 129.64.0.1/49677 connected
Jan 14 13:59:58 TCP-S: Peer msg: hello:node-1
Jan 14 15:02:32 ISSU state: 0

 

Current status: both control links are link up to the switch and the devices are sending control link messages, but nothing is received.

node1

Redundancy group: 0 , Failover count: 1
node0 0 lost n/a n/a n/a
node1 1 primary no no None

Redundancy group: 1 , Failover count: 1
node0 0 lost n/a n/a n/a
node1 0 primary no no CS

 

node0

Redundancy group: 0 , Failover count: 0
node0 100 secondary no no None
node1 0 lost n/a n/a n/a

Redundancy group: 1 , Failover count: 0
node0 0 secondary no no None
node1 0 lost n/a n/a n/a

 

[15.1X49-D150.2]

 

node1:
--------------------------------------------------------------------------
Hardware inventory:
Item Version Part number Serial number Description
Chassis CY5117AF1619 SRX340
Routing Engine REV 0x12 650-065043 CY5117AF1619 RE-SRX340
FPC 0 FPC
PIC 0 8xGE,8xGE SFP Base PIC
Power Supply 0

 

...and as I've seen before, one node is shutting down its FPC.

node0:
--------------------------------------------------------------------------
Hardware inventory:
Item Version Part number Serial number Description
Chassis SRX340
FPC 0 FPC
Power Supply 0

 


Re: SRX340 control link wont establish over layer 2 network

‎01-15-2019 08:21 AM

RJ,

 

If you run a "show chassis fpc pic-status" do you see the FPC/PIC offline?

 

Can you disable the cluster and, once the nodes boot up in standalone mode, delete the whole configuration and then retry the cluster formation?

 

     > set chassis cluster disable reboot

 

After bootup:

 

     #delete (type yes)
     #set system root-authentication plain-text-password (set a root password)
     #commit.
     # show (confirm that only the root password is configured)
     # run set chassis cluster cluster-id [value] node [value] reboot

Please log the bootup process and confirm the cluster status on both nodes.

 

Solution
Accepted by topic author R_J
‎01-23-2019 12:37 AM

Re: SRX340 control link wont establish over layer 2 network

‎01-23-2019 12:37 AM

I pinpointed the issue to the switches/L2 segment.

igmp-snooping was enabled.

Strange that the SRX240 cluster (running recommended version) was working just fine.

 

I removed igmp-snooping. Since we don't use a full multicast configuration with an RP etc., it's OK if multicast is flooded like broadcast.

Once it was removed the cluster came up fully working.

On the switch side, the EX4300 behaves differently from the EX4200: it is not possible to disable igmp-snooping on selected VLANs (very bad design by Juniper). So if you want igmp-snooping on all but a few VLANs, you need to configure igmp-snooping explicitly each time you create a new VLAN. That just adds more administrative work/configuration, when we should be moving the other way, towards less specific config.
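
For anyone who wants to script that extra per-VLAN work, here is a minimal Python sketch that prints one igmp-snooping statement per VLAN while skipping the HA VLANs. The VLAN names USERS and SERVERS are made-up examples (FW810 and FW811 are the control and fab VLANs from this thread), the function name is hypothetical, and the `set protocols igmp-snooping vlan <name>` form is assumed to be what your EX4300 release accepts, so check it on your own switch before committing.

```python
def igmp_snooping_commands(vlans, exclude):
    """Return one 'set' line per VLAN, leaving out the excluded VLANs."""
    skip = set(exclude)
    return [f"set protocols igmp-snooping vlan {name}"
            for name in vlans if name not in skip]

# Hypothetical VLAN inventory; FW810/FW811 carry the SRX control and fabric links.
all_vlans = ["USERS", "SERVERS", "FW810", "FW811"]
ha_vlans = ["FW810", "FW811"]

commands = igmp_snooping_commands(all_vlans, ha_vlans)
for cmd in commands:
    print(cmd)
```

The point is only that the HA VLANs never get an igmp-snooping statement, so the cluster heartbeats are flooded within those VLANs.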