
iBGP flap but IGP didn't

‎05-12-2020 02:43 AM

Can anyone think of a reason why BGP would flap on a particular router while the IGP reports no issues?

To add some context: we have two MX5 routers in a DC and two MX5 routers at a remote location. We use two third-party service providers, which provide us with L2 services to the remote location. DC router 1 has a direct connection to remote router 1 via ISP1, and DC router 2 has a direct connection to remote router 2 via ISP2. The two remote routers are directly connected to each other, as they are in the same cabinet, and the same goes for the DC routers. We run OSPF on all links, with iBGP running between the routers, so both routers at the remote location have BGP sessions established to the two upstream routers.

What I have started to notice is that the service via ISP1 deteriorates but does not go down completely, so OSPF does not reconverge. There are some packet drops and jitter for a few seconds, then it recovers; a few minutes later the same thing happens, and so on. During these degradations iBGP flaps even though the IGP reports no underlying issues.

My question is why this would happen. I understand the normal timeout for BGP is 180 and for OSPF 30, but why would service degradation bring down iBGP and not the IGP? This would all have worked fine if the service had gone down completely, because the IGP would then have rerouted the BGP sessions via the other link and they would have stayed up. I know there are optimizations that can be done, but my question is more about why iBGP would fail in this scenario.
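
The timer reasoning in the question can be sketched in a few lines of Python. This is a deliberately simplified model, not vendor behaviour: it assumes a session drops only when nothing at all is received from the peer for a full hold or dead interval. The function name `survives_outage` is hypothetical, and the 180 and 30 second values are simply the ones quoted in the question, not verified defaults.

```python
def survives_outage(outage_seconds: float, hold_seconds: float) -> bool:
    """Return True if a session stays up through a total outage.

    Simplified assumption: the hold/dead timer was refreshed just before
    the outage began, and no packet from the peer arrives until it ends.
    """
    return outage_seconds < hold_seconds


# Timer values as quoted in the question (assumptions, not verified defaults).
BGP_HOLD_S = 180
OSPF_DEAD_S = 30

for outage in (5, 20, 45, 200):
    print(
        f"outage={outage:>3}s  "
        f"BGP up: {survives_outage(outage, BGP_HOLD_S)}  "
        f"OSPF up: {survives_outage(outage, OSPF_DEAD_S)}"
    )
```

Under this model a few seconds of loss would bring down neither protocol, and a long outage would take OSPF down before BGP. Timers alone therefore do not explain iBGP flapping while OSPF stays up, which suggests the two protocols are being affected differently by the link.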

5 REPLIES

Re: iBGP flap but IGP didn't

[ Edited ]
‎05-12-2020 02:49 AM

Hi MFB,

 

Could you paste logs from the timestamp when you noticed an iBGP flap on the device most recently?

 

Meanwhile, you can also refer to the document below on troubleshooting BGP sessions, which could be of great help to you:

https://www.juniper.net/documentation/en_US/junos/topics/topic-map/troubleshooting-bgp-sessions.html...

 

Hope this helps 🙂

 

Please mark this "Accepted Solution" in case the above document helps you solve your query.

Kudos are much appreciated too 🙂

 

 


Re: iBGP flap but IGP didn't

‎05-12-2020 05:00 AM

Hi MFB,

 

To answer your question in simple terms, it actually depends on how paths converge in a path-vector protocol like BGP.

 

Like any distance-vector or path-vector protocol, BGP accepts all incoming routing updates into its update process and advertises only the selected best routes to its peers. BGP does not use periodic updates, so route invalidation is not based on the expiry of soft-state information (for instance, per-prefix timers as in RIP). Instead, BGP uses an explicit withdrawal section in a triggered UPDATE message to signal the loss of a particular path to its neighbours. In addition to explicit withdrawals, BGP also supports implicit signalling, where new information for the same prefix from the same peer replaces the previously learned information.

 

Some factors that add to the delay are:

1. When a route becomes invalid, BGP has to go through best-route selection again.

2. Let's say a particular route vanishes soon after it is advertised; it may take as long as "Number-of-Hops x Advertisement-Interval" for the change to reach a given node.

 

In short, BGP takes about as long to select the next best path as it would to re-learn it.
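
As a rough worked example of the "Number-of-Hops x Advertisement-Interval" bound above, here is a small Python sketch. The function name `worst_case_delay_s` and the sample values (4 hops, a 5 second advertisement interval) are illustrative assumptions, not figures from this network.

```python
def worst_case_delay_s(hops: int, advertisement_interval_s: float) -> float:
    """Upper bound on propagation delay if every hop waits one full interval."""
    return hops * advertisement_interval_s


# Illustrative values only: 4 BGP hops, 5 seconds between advertisements.
print(worst_case_delay_s(4, 5.0))  # 20.0
```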

 

On the other hand, IGPs are more dynamic in the way they propagate changes across the network.

 

Now, to improve failure detection:

1. Allow BGP speakers to use multiple paths at the same time, which can alleviate the load to an extent.

2. Use BFD for BGP to fine-tune detection further.
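
To give a feel for what BFD buys you, the detection time is roughly the packet interval multiplied by the detection multiplier. The sketch below assumes a 300 ms interval and a multiplier of 3 purely for illustration; the function name `bfd_detection_time_ms` is hypothetical.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Time without received BFD packets before the session is declared down."""
    return interval_ms * multiplier


# Illustrative values only: 300 ms interval, multiplier of 3.
print(bfd_detection_time_ms(300, 3))  # 900
```

With values like these a failure is detected in under a second, compared with the tens of seconds a BGP hold timer would need.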

 

I hope I was able to give you an idea with this post. If so, please mark this as "accepted solution" for the benefit of other fellow enthusiasts and give that kudos a hit.

 

//Nex


Re: iBGP flap but IGP didn't

‎05-12-2020 05:15 AM

I appreciate the feedback. The issue is more related to the client-to-server TCP connections between BGP speakers than to BGP routing in general. When there is deterioration on one of the links that this TCP session runs over, BGP flaps between the speakers, but the IGP stays up, unaware of the underlying issues. So I would like to know why iBGP fails before the IGP. Is there an option for dual-homed BGP to the same peer over different paths?


Re: iBGP flap but IGP didn't

‎05-12-2020 05:21 AM

Hello,

2 possible causes I can think of:

1/ Are Your MX5s running full tables, maybe even with Netflow turned on? The MX5 has a slow RE CPU and is memory-restricted (2GB), so even 1 full table at 800K prefixes is a stretch for this platform nowadays. If the answer is yes, please look at MX5 CPU and memory utilization during these events.

2/ Do You have Trio DDOS log messages popping up during these events? It could be that there is a transient L2 loop in the provider network that bombards Your MX5 with all sorts of L2 crap, which may trigger MX5 Trio DDOS protection, but Trio DDOS with default settings is not protective enough.

HTH

Thx

Alex  

 


Re: iBGP flap but IGP didn't

‎05-17-2020 07:24 AM

After deeper investigation, packet captures showed that unicast and broadcast traffic were being dropped, whereas multicast went through. I picked this up during an outage when I cleared the ARP table on both ends of the link: the entries did not populate again, yet OSPF came back up. A PCAP showed the same. With the default, basic OSPF setup the protocol stayed up because its updates etc. went through on multicast, whereas LDP and BGP, which run over TCP, broke, causing an outage on MPLS services. We will be reporting this to the service provider, but we will also be implementing countermeasures now. Thanks for your feedback.
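
The finding above can be illustrated with a short Python sketch that models a link passing multicast but dropping unicast. It is only a toy model: the peer address 192.0.2.2 is a documentation address assumed for illustration, the function name `delivered` is hypothetical, and broadcast ARP is left out because it is not an IP destination.

```python
import ipaddress

# Illustrative packets: (description, destination IPv4 address).
# 192.0.2.2 is an assumed peer address from the documentation range.
PACKETS = [
    ("OSPF hello", "224.0.0.5"),             # AllSPFRouters multicast group
    ("LDP hello (UDP 646)", "224.0.0.2"),    # all-routers multicast group
    ("LDP session (TCP 646)", "192.0.2.2"),  # unicast to the peer
    ("BGP session (TCP 179)", "192.0.2.2"),  # unicast to the peer
]


def delivered(dst: str, drop_unicast: bool = True) -> bool:
    """Model a faulty L2 service: multicast passes, unicast may be dropped."""
    addr = ipaddress.ip_address(dst)
    return addr.is_multicast or not drop_unicast


for name, dst in PACKETS:
    status = "delivered" if delivered(dst) else "dropped"
    print(f"{name:24} {dst:12} {status}")
```

In this model the multicast hellos keep arriving, so the OSPF adjacency never times out, while the unicast TCP sessions carrying BGP and LDP lose their packets and eventually reset.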
