We are experiencing a weird problem with our HA configuration. The nodes were just installed and configured with a basic HA configuration. The problem is that a node transitions to the disabled state after missing heartbeats. The nodes are connected back to back, and we have tried changing the SFPs, the cables, and even both nodes, but the problem persists. Note that a similar pair is working fine at another location with the same software and hardware.
We upgraded the software to the latest release as recommended by JTAC, but the issue remains. The case is now pending with ATAC, and all the related logs have been provided.
Please let me know if any of you have faced a similar situation and what the solution might be. For Juniper employees, the case number is
May 23 21:14:04 Successfully sent jnxJsChClusterIntfTrap trap with severity minor to inform that Control link - em0 state changed from UP to DOWN on cluster 1; reason: missed heartbeats
May 23 21:14:07 missed heartbeats on control link between 25 to 33
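For anyone triaging similar logs, here is a minimal Python sketch that pulls the missed-heartbeat range out of the second log line and counts it. The regex is written only against the line quoted above and is an assumption about the message format, and counting 25 to 33 as 9 heartbeats assumes the range is inclusive at both ends. Neither detail comes from Juniper documentation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern, matched only against the log line quoted in this post;
# other Junos releases may word the message differently.
MISSED_RE = re.compile(r"missed heartbeats on control link between (\d+) to (\d+)")


@dataclass
class MissedRange:
    first: int  # first missed heartbeat number reported
    last: int   # last missed heartbeat number reported

    @property
    def count(self) -> int:
        # Assumes the reported range is inclusive at both ends.
        return self.last - self.first + 1


def parse_missed(line: str) -> Optional[MissedRange]:
    """Return the missed-heartbeat range in a log line, or None if there is none."""
    m = MISSED_RE.search(line)
    if m is None:
        return None
    return MissedRange(int(m.group(1)), int(m.group(2)))


if __name__ == "__main__":
    r = parse_missed("May 23 21:14:07 missed heartbeats on control link between 25 to 33")
    print(r, r.count)
```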
We were able to figure out what the issue was, but first, a little backstory.
We had to deactivate the fxp management addresses because of an asymmetric routing issue, and because Junos 15 doesn't support putting management interfaces in a routing instance. Monitoring devices in the trust zone reached the device through the fxp interface, but the replies went back out through the untrust interface. This caused polling issues.
The problem turned out to be L3 broadcast traffic from the management network being routed back out the untrust interfaces, even with no IP address active on the fxp interfaces. The device was routing L3 broadcast packets that arrived on an fxp interface with no IP address configured. A single ping to the management network's broadcast IP created roughly 6000x amplification on the fxp interfaces and roughly 3300x amplification on the untrust interfaces. Whenever this happened, the amplified traffic overwhelmed our RE, and that is where the missed heartbeat messages were coming from.
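To illustrate why a short burst of RE load shows up as a control-link failure rather than just a slowdown, here is a toy Python model of threshold-based failure detection: the link is declared down once a set number of consecutive heartbeats are missed. This is not Juniper's implementation. The threshold of 3 is an assumption (on SRX it is tunable with `heartbeat-threshold` under `chassis cluster`, so check the default for your release), and the heartbeat numbering simply mirrors the 25 to 33 gap in the log above.

```python
from typing import Iterable, Optional


def first_down(received: Iterable[int], total: int, threshold: int = 3) -> Optional[int]:
    """Return the heartbeat number at which the control link is declared DOWN.

    Heartbeats are numbered 1..total; any number missing from `received` is
    treated as missed. The link is declared DOWN as soon as `threshold`
    consecutive heartbeats are missed. Returns None if that never happens.
    Toy model only: threshold=3 is an assumed value, not read from a device.
    """
    got = set(received)
    misses = 0  # length of the current run of consecutive misses
    for seq in range(1, total + 1):
        if seq in got:
            misses = 0  # any received heartbeat resets the run
        else:
            misses += 1
            if misses >= threshold:
                return seq
    return None


if __name__ == "__main__":
    # Heartbeats 25..33 lost in one burst, as in the log above.
    burst = [s for s in range(1, 41) if not 25 <= s <= 33]
    # The same number of losses (9), spread out so no two are consecutive.
    scattered = [s for s in range(1, 41) if s not in (3, 6, 9, 12, 15, 18, 21, 24, 27)]
    print(first_down(burst, 40))      # 27
    print(first_down(scattered, 40))  # None
```

Nine heartbeats lost in a row take the link down, while nine scattered losses do not. That is consistent with a brief but intense broadcast storm on the RE being enough to trigger the failure even when the cables and SFPs are healthy.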
There are four solutions to this:
1. Shut down the fxp interface
2. Configure an IP address from the management network on the fxp interface, or remove family inet from the interface
3. Put production traffic in a routing instance and leave the fxp interface in the default instance
4. Run Junos 17 and put the management interfaces in a routing instance