Over the last six months I have replaced our estate of NetScreen and SSG devices with SRXs. Most employ a VPN back to the 'hub'. The older devices only supported VPN tunnels with a maximum MTU of 1500. When the SRXs went in, a third-party Juniper consultant advised that this limitation no longer applied, i.e. we could use the default MTU for the tunnels, which is the jumbo-frame maximum of 9192. Sure enough, all of our new tunnels have been functioning happily with this value. However, OSPF adjacencies over two of these connections have recently become stuck in the Exchange state. JTAC got involved, and for whatever reason OSPF will now only function if the tunnel carries an MTU of 1388 (over a VDSL link). JTAC could offer no explanation as to why this is now the case.
The article above may or may not be relevant in this instance, but if it is, I fear the sites will be lost one by one. However, I do not want to lower the MTU of all tunnels needlessly and significantly. The 1388 value above was arrived at merely by trial and error.
Do you mean the SRXs are directly connected, with no ISP devices in between? Please confirm. What was the size of the OSPF route table on the hub/spoke devices? Was there any increase in routes, specifically external routes? When the number of routes increases, OSPF Database Description (DB) and LS Update packets will use the full packet size (1500, based on the exit interface MTU). Do you set df-bit in the VPN config?
If any intermediate device (an ISP device) does not support a 1500 MTU, that device will send a fragmentation-needed message to the SRX. If df-bit is set in the VPN, the SRX will not honor the fragmentation-needed message and will keep sending the DB and LS Update packets at full size, so the OSPF neighbor state will stay stuck in Exchange. I believe that in your case it worked initially because the number of routes was small; the number of routes has recently increased, and whichever tunnels flapped were then affected.
You can test this theory by changing the df-bit settings.
show security ipsec vpn vpn-name <vpn-name> detail
set security ipsec vpn <vpn-name> df-bit clear
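To make the reasoning above concrete, here is a rough Python sketch of how a packet fares across a path when one hop has a smaller MTU. It is a simplified model of generic IPv4 forwarding, not of SRX internals; the function name `traverse` and the example hop MTUs are made up for illustration.

```python
def traverse(packet_size, hop_mtus, df_set):
    """Simplified model of an IPv4 packet crossing a list of hops.

    packet_size: total IP packet size in bytes
    hop_mtus:    MTU of each hop on the path (illustrative values)
    df_set:      True if the don't-fragment bit is set on the packet
    """
    fragmented = False
    size = packet_size
    for mtu in hop_mtus:
        if size <= mtu:
            continue  # fits, forwarded unchanged
        if df_set:
            # The hop drops the packet and would send back an ICMP
            # fragmentation-needed message to the sender.
            return "dropped"
        # DF clear: the hop fragments, so each piece is at most `mtu` bytes.
        fragmented = True
        size = mtu
    return "delivered in fragments" if fragmented else "delivered"


# Hypothetical path where the middle hop only supports 1492 bytes.
path = [1500, 1492, 1500]
print(traverse(100, path, df_set=True))    # small packet, e.g. a Hello
print(traverse(1500, path, df_set=True))   # full-size DB / LS Update
print(traverse(1500, path, df_set=False))  # same packet with df-bit clear
```

In this model small packets always get through, which is consistent with the adjacency forming and then stalling once full-size packets are needed.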
If you face the OSPF stuck issue again, ping the peer IP with different sizes to find the maximum MTU supported by the ISP and the other SRX. If all the devices support a 1500 MTU, you should be able to ping with a packet size of 1472 (1500 minus the 20-byte IP header and the 8-byte ICMP header), or 1464 on DSL (a further 8 bytes for the DSL header). If not, fix the MTU issue.
ping <remote peer ip> size 1472 do-not-fragment
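The ping sizes above, and the trial-and-error search for the largest working size, can be sketched in Python as follows. This is only an illustration: `probe` is a hypothetical callable standing in for "send one do-not-fragment ping of this size and report whether a reply came back", and the 8-byte DSL figure is taken from the post above.

```python
IP_HEADER = 20     # bytes, IPv4 header without options
ICMP_HEADER = 8    # bytes, ICMP echo header
DSL_OVERHEAD = 8   # bytes, DSL header as quoted in the post


def max_ping_size(link_mtu, extra_overhead=0):
    """Largest ping payload that fits in one unfragmented packet."""
    return link_mtu - extra_overhead - IP_HEADER - ICMP_HEADER


def find_max_payload(probe, low=0, high=1472):
    """Binary-search the largest payload size for which probe(size) is True.

    Assumes that if a size works, every smaller size works too.
    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid works, look for something larger
        else:
            high = mid - 1  # mid fails, the answer is smaller
    return low


def path_mtu_from_payload(payload):
    """Convert a working ping payload size back to an IP packet size."""
    return payload + IP_HEADER + ICMP_HEADER


print(max_ping_size(1500))                # plain Ethernet path
print(max_ping_size(1500, DSL_OVERHEAD))  # DSL path

# Simulated path (made-up limit) in place of real pings.
def simulated(size):
    return size <= 1400

best = find_max_payload(simulated)
print(best, path_mtu_from_payload(best))
```

With real pings in place of `simulated`, about a dozen probes are enough to narrow down the limit instead of guessing values by hand.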
Thanks, Nellikka JNCIE x3 (SEC #321; SP #2839; ENT #790)