SRX Services Gateway

Do I have an MTU-VPN-OSPF ticking time bomb?

‎10-26-2018 02:58 AM

To start with, here is my point of reference:

 

http://networkingbodges.blogspot.com/2015/07/ospf-stuck-in-exchange-exstart.html - in particular the second paragraph under sub-heading 'Papering Over the Cracks'

 

I have recently (over the last 6 months) replaced our estate of NetScreen and SSG devices with SRXs. Most use a VPN back to the 'hub'. The older devices only supported VPN tunnels with a maximum MTU of 1500. When the SRXs went in, a third-party Juniper consultant advised that this limitation no longer applied, i.e. we could leave the tunnels at their default MTU of 9192, the jumbo-frame maximum. Sure enough, all of our new tunnels have been functioning happily with this value. However, two of these connections have recently become stuck in the Exchange state. JTAC got involved, and for whatever reason OSPF will now only function if the tunnel MTU is set to 1388 (over a VDSL link). JTAC could offer no explanation as to why this is now the case.

 

The article above may or may not be relevant in this instance, but if it is, I fear the sites will be lost one by one. However, I do not want to lower the MTU of every tunnel significantly without good reason. The 1388 value above was arrived at purely by trial and error.

 

Can anyone help me avoid a bit of a disaster?


Re: Do I have an MTU-VPN-OSPF ticking time bomb?

‎10-30-2018 12:32 AM

How about any advice or suggestions?


Re: Do I have an MTU-VPN-OSPF ticking time bomb?

‎11-04-2018 11:44 PM

spuluka - I thought you might have some ideas?


Re: Do I have an MTU-VPN-OSPF ticking time bomb?

‎11-07-2018 01:58 AM

I have now lost a third site to the same issue, and only 'resolved' it with the same fix, i.e. an MTU of 1388.


Re: Do I have an MTU-VPN-OSPF ticking time bomb?

‎11-07-2018 02:56 AM

Hi,

Here is the rough calculation:
Physical MTU of the interface: 1500
DSL (PPPoE) header = 8 bytes
ESP header (ESP tunnel mode header + ESP header + ESP IV + ESP Trailer) = 60 bytes
IP header = 20 bytes
OSPF shared header size = 24 bytes

1500 - (8+60+20+24) = 1388 bytes
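
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same rough calculation: the names are illustrative, and the 60-byte ESP figure is the estimate used above, which will vary with the encryption and authentication algorithms in use.

```python
# Rough tunnel MTU budget, using the figures from the calculation above.
# All names are illustrative; the ESP figure is an estimate, not a constant.
PHYSICAL_MTU = 1500

OVERHEAD_BYTES = {
    "pppoe": 8,          # DSL (PPPoE) header
    "esp": 60,           # ESP tunnel-mode header + ESP header + IV + trailer
    "ip": 20,            # IP header
    "ospf_header": 24,   # OSPF shared header
}


def usable_mtu(physical_mtu, overheads):
    """Subtract every per-packet overhead from the physical MTU."""
    return physical_mtu - sum(overheads.values())


if __name__ == "__main__":
    print(usable_mtu(PHYSICAL_MTU, OVERHEAD_BYTES))
```

Changing any single entry (for example the ESP estimate for a different cipher) shows how sensitive the 1388 figure is to the assumed overheads.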

 

Please go through this article, which will help you understand the IP/ESP overhead: https://packetpushers.net/ipsec-bandwidth-overhead-using-aes/

 

Thanks,
Nellikka
JNCIE x3 (SEC #321; SP #2839; ENT #790)
Please Mark My Solution Accepted if it Helped, Kudos are Appreciated too!!!

Re: Do I have an MTU-VPN-OSPF ticking time bomb?

‎11-07-2018 05:00 AM

So why was I initially able to use a jumbo-frame MTU of 9192?


Re: Do I have an MTU-VPN-OSPF ticking time bomb?

‎11-07-2018 05:05 AM

What is the MTU of the physical (exit) interface on both sides? Is it configured for jumbo frames, and do the intermediate devices support jumbo frames?

Thanks,
Nellikka

Re: Do I have an MTU-VPN-OSPF ticking time bomb?

‎11-07-2018 06:09 AM

Physical interfaces are set to 1500.

 

I did not specify a jumbo-frame MTU of 9192; I simply didn't define one, so the default is used, which apparently is 9192.

 

There are no intermediate devices; it's SRX-to-SRX (hub-to-spoke VPN).


Re: Do I have an MTU-VPN-OSPF ticking time bomb?

‎11-07-2018 11:21 AM

Do you mean the SRXs are directly connected, with no ISP devices in between? Please confirm.
How many OSPF routes did the hub and spoke devices carry? Was there any increase in routes, specifically external routes?
As the number of routes increases, the OSPF Database Description (DBD) and LS Update packets grow until they use the full packet size (1500, based on the exit interface MTU).
Do you set df-bit in the VPN config?

If any intermediate device (ISP device) does not support an MTU of 1500, that device will send a "fragmentation needed" message to the SRX.
If df-bit is set in the VPN, the SRX will not honor the "fragmentation needed" message and will keep sending the DBD and LS Update packets at full size, so the OSPF neighbor stays stuck in the Exchange state.
I believe that in your case it worked initially because the number of routes was small; the number of routes has since increased, and whichever tunnels flapped were then affected.
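
To illustrate why a growing database matters, here is a rough sketch of how the size of a single OSPFv2 Database Description packet scales with the number of LSA headers it carries. The header sizes follow the OSPFv2 packet format (RFC 2328); the function names are made up for illustration, and how many headers a given router actually packs into one packet is implementation-dependent, so treat this as an estimate under those assumptions.

```python
# Sizes in bytes, per the OSPFv2 packet format (RFC 2328).
IP_HEADER = 20      # IPv4 header without options
OSPF_HEADER = 24    # common OSPFv2 packet header
DBD_FIXED = 8       # interface MTU, options, flags, DD sequence number
LSA_HEADER = 20     # one LSA header per database entry described


def dbd_packet_size(lsa_headers):
    """IP packet size of one Database Description packet (illustrative)."""
    return IP_HEADER + OSPF_HEADER + DBD_FIXED + LSA_HEADER * lsa_headers


def max_lsa_headers(mtu):
    """Most LSA headers that fit in one DBD packet of at most `mtu` bytes."""
    fixed = IP_HEADER + OSPF_HEADER + DBD_FIXED
    return max(0, (mtu - fixed) // LSA_HEADER)


if __name__ == "__main__":
    for mtu in (1388, 1500, 9192):
        print(mtu, max_lsa_headers(mtu))
```

On these assumptions, a small database produces packets well under any of these limits, while a larger one lets a router that believes the tunnel MTU is 9192 build packets that a 1500-byte path cannot carry unfragmented.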

You can test this theory by changing the df-bit settings.

show security ipsec vpn vpn-name <vpn-name> detail
set security ipsec vpn <vpn-name> df-bit clear

If you hit the OSPF stuck issue again, ping the peer IP with different packet sizes to find the maximum MTU supported by the ISP and the other SRX.
If all the devices support an MTU of 1500, you should be able to ping with a packet size of 1472 (1500 minus the 20-byte IP header and the 8-byte ICMP header), or 1464 on DSL (a further 8 bytes for the DSL header). If not, fix the MTU issue.

ping <remote peer ip> size 1472 do-not-fragment
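
The size sweep can be automated as a binary search. The sketch below is hypothetical: `find_max_payload` and `simulated_probe` are names invented here, and the probe only simulates a path rather than sending real traffic. In practice the probe would wrap a do-not-fragment ping like the command above and report whether a reply came back.

```python
IP_ICMP_OVERHEAD = 28  # 20-byte IP header + 8-byte ICMP header


def find_max_payload(probe, low=0, high=1472):
    """Largest size in [low, high] for which probe(size) is True.

    Assumes that if a size passes, every smaller size passes too.
    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always shrinks
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def simulated_probe(path_mtu):
    """Stand-in for a real ping: passes when payload + headers fit the path."""
    def probe(size):
        return size + IP_ICMP_OVERHEAD <= path_mtu
    return probe


if __name__ == "__main__":
    payload = find_max_payload(simulated_probe(1492))
    print(payload, payload + IP_ICMP_OVERHEAD)
```

Adding 28 bytes to the largest payload that gets through gives the effective path MTU for unfragmented packets.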

Thanks,
Nellikka