#TheRoutingChurn
MPLS: LDP and IP Fast Re-route Mechanisms
10.10.16

 

 

Introduction

 

Many service providers deploy LDP to establish transport LSPs in their MPLS networks. These LDP transport LSPs are confined to a single IGP routing domain. The services that ride on these transport LSPs depend on the connectivity that the LSPs provide, so any failure in the network disrupts the services. To minimize service disruption during network failures, service providers need to minimize connectivity disruption, and for that they rely on fast re-route technologies.

 

There are various fast re-route schemes available in LDP networks. This article compares and contrasts several of them: LFA, R-LFA, TI-LFA and TI-FRR. It also shows why one particular fast re-route scheme is much simpler than the others and how it provides topology independent local protection - that is, local protection in any topology so long as there is an alternate path in the network that avoids the failed network resource.

 

 

Loop-Free Alternates (LFA)

 

LFA fast re-route (backup path) coverage depends on topology. In certain topologies such as rings, LFA FRR coverage is close to zero. Because LFA places restrictions on network topology, service providers generally design their networks to accommodate it. Even so, it is difficult, if not impossible, to get hundred percent backup coverage. For instance, in figure 1, R1 will not have a backup path to R5 to protect against link R1-R6 failure, as the metric from R2 to R5 via R3 is not less than the metric from R2 to R5 via the protected link R1-R6; R2 could therefore send the packet back towards R1. To cover such cases and improve backup coverage, Remote LFA, described next, was designed.
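To make the LFA condition concrete, here is a minimal Python sketch of the basic loop-free inequality, dist(N, D) < dist(N, S) + dist(S, D), checked for every neighbor N of the source S. The figure is not reproduced in this text, so the topology is an assumption: a six-router ring R1 through R6 with a metric of 1 on every link. The helper names (dijkstra, build_ring, loop_free_alternates) are made up for illustration and do not come from any router implementation.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, metric in graph[node].items():
            nd = d + metric
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def build_ring(nodes, metric=1):
    """Assumed topology: an undirected ring with the same metric on every link."""
    graph = {n: {} for n in nodes}
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        graph[a][b] = metric
        graph[b][a] = metric
    return graph


def loop_free_alternates(graph, source, primary_next_hop, destination):
    """Neighbors N of source with dist(N, D) < dist(N, S) + dist(S, D)."""
    dist_s = dijkstra(graph, source)
    alternates = []
    for neighbor in graph[source]:
        if neighbor == primary_next_hop:
            continue  # the primary next hop is what we are protecting
        dist_n = dijkstra(graph, neighbor)
        if dist_n[destination] < dist_n[source] + dist_s[destination]:
            alternates.append(neighbor)
    return alternates


ring = build_ring(["R1", "R2", "R3", "R4", "R5", "R6"])
# R2 is 3 hops from R5 either way round the ring, so 3 < 1 + 2 is false.
print(loop_free_alternates(ring, "R1", "R6", "R5"))  # []
```

On this assumed ring the check returns no alternate for R1 towards R5, which is the coverage gap that the rest of the article deals with.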

 

 

Remote LFA  (R-LFA) 

 

As the name suggests, Remote LFA is an extension of LFA. Remote LFA extends LFA backup coverage by tunneling the packet to a remote node. When the packet emerges out of the tunnel on the remote node, it follows the regular hop-by-hop label switched path to reach the destination. How do we find such a remote node to tunnel the packet to? Consider the topology in figure 1.

 

rlfa.png

                                                                                Figure 1

 

 

R-LFA needs to find a remote node such that when a packet is tunneled to it, it will not be looped back. It uses the following algorithm to find such a remote node:

 

P-Space : Set of routers that the source router can reach on its shortest paths without traversing the protected link.

Extended P-Space : Union of the P-spaces of the neighbors of the source router, excluding the neighbor over the protected link.

Q-Space : Set of routers that can reach the destination on their shortest paths without traversing the protected link.

PQ nodes : Set of routers that are common to both the P-space (or Extended P-space) and the Q-space.

The remote node is chosen as one of the nodes from the set of PQ nodes - generally the PQ node that is closest to the source.
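The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topology is taken to be a six-router ring R1 through R6 with a metric of 1 on every link (the figure is not reproduced in this text), metrics are symmetric, a router is excluded from a space if any of its equal-cost shortest paths crosses the protected link, and the Q-space is computed with respect to the destination as defined in this article. All function names are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for neighbor, metric in graph[node].items():
            nd = d + metric
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def build_ring(nodes, metric=1):
    """Assumed topology: an undirected ring with the same metric on every link."""
    graph = {n: {} for n in nodes}
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        graph[a][b] = metric
        graph[b][a] = metric
    return graph


def space(graph, root, link):
    """Routers whose shortest paths to/from root never cross the protected link.

    With symmetric metrics the same test serves for P-space (root = source)
    and Q-space (root = destination).
    """
    a, b = link
    w = graph[a][b]
    d_root, d_a, d_b = dijkstra(graph, root), dijkstra(graph, a), dijkstra(graph, b)
    members = set()
    for x in graph:
        if x == root:
            continue
        # Length of the best root-to-x path that is forced across the link.
        via_link = min(d_root[a] + w + d_b[x], d_root[b] + w + d_a[x])
        if via_link > d_root[x]:
            members.add(x)
    return members


def extended_p_space(graph, source, link):
    """Union of the neighbors' P-spaces, skipping the neighbor across the link."""
    far_end = link[1] if link[0] == source else link[0]
    members = set()
    for neighbor in graph[source]:
        if neighbor != far_end:
            members |= {neighbor} | space(graph, neighbor, link)
    return members - {source}


ring = build_ring(["R1", "R2", "R3", "R4", "R5", "R6"])
protected = ("R1", "R6")
p = space(ring, "R1", protected)
ext_p = extended_p_space(ring, "R1", protected)
q = space(ring, "R5", protected)
pq = ext_p & q
dist_r1 = dijkstra(ring, "R1")
closest_pq = min(sorted(pq), key=lambda n: dist_r1[n])
print(sorted(p), sorted(ext_p), sorted(q), sorted(pq), closest_pq)
```

On the assumed ring this gives a P-space of R2 and R3, an extended P-space of R2, R3 and R4, a Q-space of R3, R4 and R6, and R3 as the PQ node closest to the source. R4 is not in R1's own P-space because one of R1's two equal-cost paths to R4 crosses the protected link.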

 

After source R1 determines the PQ node, which is R3 in this case, it uses two hierarchical LDP LSPs to send the packet on a backup path to destination D (R5 in this example). The first LSP is an LDP LSP from R1 to R3 and the second is an LDP LSP from R3 to R5. As R1 needs to learn the LDP label that PQ node R3 advertises for the second LSP, it establishes an automatic tLDP session to R3 to obtain that FEC-D to label mapping. Because R1 establishes the tLDP session automatically, it can administratively control the number of tLDP sessions that it initiates. However, R3 cannot administratively control the number of tLDP sessions that it accepts. Further, today R1 and R3 exchange all LDP FEC-label bindings over the tLDP session, although they need to exchange only the necessary ones. To address these issues, we have proposed a new draft, Application aware targeted LDP, at the IETF.

 

In the event of link R1-R6 failure, R1 pushes two labels: first, the label to reach destination D as advertised by R3 over the tLDP session, and second, the label to reach PQ node R3. When the packet emerges out of the tunnel at the PQ node, it follows the regular hop-by-hop label switched path to destination D.

 

What if Remote LFA is unable to calculate the remote node (PQ node)? This can happen if the metric of links such as R3-R4 or R4-R5 is high compared to the other links. Thus, Remote LFA increases LFA coverage, but it is still topology dependent. That is, Remote LFA still does not provide hundred percent backup coverage.

 

So there is a need for a solution that provides hundred percent backup coverage. Let's talk about TI-LFA, which is designed to do just that.

 

 

Topology independent LFA  (TI-LFA) 

 

TI-LFA is the SPRING (segment routing) version of Remote LFA. In addition, it also provides an explicit repair path.

TI LFA.png

                                                                                Figure 2

 

 

   

TI-LFA calculates the remote node the same way Remote LFA does. However, because source R1 already knows the node SIDs of R3 (the PQ node) and R5 (the destination), it does not need a tLDP session to remote node R3 to learn the node SID of R5. The repair path to destination D consists of two segments: the first is a node segment from R1 to R3 and the second is a node segment from R3 to R5.

 

In the event of link R1-R6 failure, R1 (source) pushes two labels. The first label is the node SID of R5 and the second label is the node SID of R3. Once the traffic emerges at PQ node R3, it follows regular shortest path forwarding to reach destination D. What if the metric of the link R3-R4 is high compared to the other links? To address this case, TI-LFA also provides an explicit repair path.

 

TI-LFA Explicit Repair Path

 

 

TI LFA ERP.png 

                                                                                 Figure 3

 

If the metric of the link R3-R4 is high compared to the other links, then no node is common to the P-space and the Q-space, so there is no PQ node. In such a scenario, TI-LFA uses three segments to construct a repair path. The first segment is a node segment from R1 to R3, the second is an adjacency segment from R3 to R4 (that is, from the P node to the Q node), and the third is a node segment from R4 to R5. When the traffic emerges out of the adjacency segment on R4, it follows the regular shortest path using the node SID of R5 to reach the destination.

 

What if the metric of the link R4-R5 is also high compared to the other links? Then when the traffic emerges on R4, it will be looped back towards R1. In such a scenario, TI-LFA uses four segments to construct a repair path. The first two segments are the same as above. The third segment is an adjacency segment from R4 to R5 and the fourth is the node segment of R5. In summary, if the metrics are not helpful, the TI-LFA repair path consists of as many segments as the number of hops from the point of local repair to the destination. Hence R1's hardware must be capable of pushing as many as n labels to construct the TI-LFA repair path. The n could be 5, 10, 20 etc. depending on the topology. Today not many routers' hardware can support that requirement. So we are back to a situation in which a service provider may not get hundred percent backup coverage in a multi-vendor environment with TI-LFA.
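The way the repair segment list grows as the metrics become less helpful can be sketched as follows. This is a simplified illustration of the three cases walked through in this article, not the actual TI-LFA computation: the repair path and the P-space and Q-space memberships are supplied by hand as assumptions, and the function name repair_segments is hypothetical.

```python
def repair_segments(path, p_space, q_space):
    """Build a segment list along a repair path from the PLR to the destination.

    path    : list of routers, PLR first, destination last
    p_space : routers the PLR can reach with a single node segment
    q_space : routers that reach the destination without looping back
    """
    destination = path[-1]
    segments = []

    # Farthest router along the path that is still reachable with a node segment.
    p_index = 0
    for i, node in enumerate(path[1:], start=1):
        if node not in p_space:
            break
        p_index = i
    if p_index > 0:
        segments.append(("node", path[p_index]))

    # Adjacency segments, hop by hop, until we land on a Q node or the destination.
    i = p_index
    while path[i] != destination and path[i] not in q_space:
        segments.append(("adj", path[i], path[i + 1]))
        i += 1

    # Finish with the node segment of the destination unless it is already last.
    if not segments or segments[-1] != ("node", destination):
        segments.append(("node", destination))
    return segments


path = ["R1", "R2", "R3", "R4", "R5"]  # assumed repair path around link R1-R6

# R3 is a PQ node: two segments.
two = repair_segments(path, {"R2", "R3"}, {"R3", "R4"})
# High R3-R4 metric, no PQ node, R4 is still a Q node: three segments.
three = repair_segments(path, {"R2", "R3"}, {"R4"})
# High R3-R4 and R4-R5 metrics, R4 is no longer a Q node: four segments.
four = repair_segments(path, {"R2", "R3"}, set())

for case in (two, three, four):
    print(len(case), case)
```

Each segment in the list is one more label that the point of local repair has to push, which is where the hardware label-depth concern comes from.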

 

We have seen many versions of LFA and are still unable to get hundred percent backup coverage. Perhaps it's time to step back and design a simpler solution that provides hundred percent backup coverage by pushing just one extra label. TI-FRR, which we demoed at MPLS World Congress, is one such solution.

 

 

Topology independent Fast Re-route (TI-FRR) 

 

TI-FRR provides topology independent local protection using RSVP-TE auto-bypass LSPs.

 

 

 

 TI FRR1.png

                                                                                Figure 4

 

 

 

 

 

Consider an LDP LSP from R1 to R5, and suppose we want to protect node R6. With respect to this LSP, R1 is the point of local repair (PLR) and R5, which is the next-next-hop of the PLR, is the merge point (MP). The PLR can determine the next-next-hop as a by-product of the interior gateway protocol (IGP) shortest path calculation; no additional SPFs are required. In contrast, LFA and R-LFA run multiple SPFs, which could potentially delay convergence, and still do not provide hundred percent backup coverage. In the event of node R6 failure, R1 begins forwarding traffic on the green RSVP-TE auto-bypass LSP to R5, avoiding node R6. The bypass LSP is signaled and programmed prior to the failure. The same bypass LSP protects all the LDP LSPs that traverse the point of local repair (R1), the protected node (R6), and the merge point (R5). This mechanism is guaranteed to provide a repair path with just one extra label. Link metrics do not affect the availability of the repair path, as the auto-bypass LSP is signaled using CSPF (constrained shortest path first).
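A rough Python sketch of the two computations described above: reading the next-next-hop off the primary shortest path, and finding a bypass path with the protected node excluded as the only constraint. The topology is an assumption (a six-router ring R1 through R6, since the figure is not reproduced in this text), the function names are hypothetical, and a real CSPF run would apply further traffic engineering constraints that are not modeled here.

```python
import heapq


def shortest_path(graph, source, target, excluded=frozenset()):
    """Dijkstra that ignores excluded nodes; returns the node list or None."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, metric in graph[node].items():
            if neighbor in excluded:
                continue  # constraint: never route through the protected node
            nd = d + metric
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def build_ring(nodes, metric=1):
    """Assumed topology: an undirected ring with the same metric on every link."""
    graph = {n: {} for n in nodes}
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        graph[a][b] = metric
        graph[b][a] = metric
    return graph


ring = build_ring(["R1", "R2", "R3", "R4", "R5", "R6"])

# The primary path already gives the next hop and the next-next-hop.
primary = shortest_path(ring, "R1", "R5")
protected_node = primary[1]
merge_point = primary[2] if len(primary) > 2 else None

# Bypass from the PLR to the merge point, avoiding the protected node.
bypass = shortest_path(ring, "R1", merge_point, excluded={protected_node})
print(primary, protected_node, merge_point, bypass)

# A high metric on R3-R4 changes nothing about whether a bypass exists.
ring["R3"]["R4"] = ring["R4"]["R3"] = 100
bypass_heavy = shortest_path(ring, "R1", merge_point, excluded={protected_node})
print(bypass_heavy)
```

On the assumed ring the primary path is R1, R6, R5, so R6 is the protected node and R5 the merge point, and the bypass runs R1, R2, R3, R4, R5 regardless of the metric on R3-R4.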

 

So what will it take to enable TI-FRR in LDP networks? Just a single line of configuration. Everything else that is necessary to signal the RSVP-TE auto-bypass LSP and build the repair path is done automatically under the covers, so it is very easy for the service provider to deploy this solution. As before, TI-FRR with manual configuration is also supported. The RSVP-TE auto-bypass LSPs can be traffic engineered using a variety of constraints. TI-FRR places no restriction on network topology - that is, it provides topology independent local protection. In contrast, LFA and R-LFA place certain restrictions on network topology and hence do not provide hundred percent backup coverage. Additionally, the provisioning and configuration required is fairly small. As an extra bonus, both unicast and multicast LDP use the same mechanism to provide fast re-route.

 

Granted, the packet may take a longer path in some scenarios with TI-FRR. However, this occurs only for the duration of the fast re-route, which is supposed to be very short. If it is not, we need to worry about other problems in the network rather than fast re-route. After global convergence, the packet should take the shortest path to reach the destination.

 

In conclusion, the various flavors of LFA do not provide hundred percent backup coverage, for the reasons described in this article. Service providers should not have to design their networks to accommodate a particular fast re-route scheme such as LFA. It should be the other way round - a fast re-route scheme should be able to accommodate any topology that the service provider designs. And TI-FRR provides a repair path in any topology.

 

How does TI-FRR work in intra-area, inter-area, and other scenarios? We will cover those topics in the next #TheRoutingChurn posts. In the interim, if you have questions or feedback on the fast re-route mechanisms, please let us know via comments either here or on Twitter at #TheRoutingChurn!

 

 

 

 

References

(1) LFA [Configuration]

(2) R-LFA [Configuration]

(3) TI-LFA [Demo available]

(4) TI-FRR [Demo available][Manual configuration]

12.15.15
Recognized Expert

Great stuff Santosh, very useful and complete!

Ato

12.20.15
chinar.trivedi

Indeed awesome. This explanation of the basis of #rLFA and the PQ node algorithm has been needed on the web for a long time.

Thanks Santosh for a very neat explanation of rLFA and other LDP convergence mechanisms.

 

Cheers,

Chinar

 

12.21.15
nsmagt

Hi Santosh,

 

Thanks for the overview. Couple of comments:

 

The n could be 5, 10, 20 etc. depending on the topology. Today not many routers hardware will be able to support that requirement. So we are back to a situation in which service provider may not get hundred percent backup coverage in a multi-vendor environment with TI LFA

 

 

- How many carrier networks have you seen with 10 or 20 hops in any path? I've seen none, and I've seen quite a few carrier networks. I would argue that network architectures with 10+ hops (even upon contingency) are broken.

- Carrier-grade silicon such as Trio has flexible pipelines that allow for large label stacks (large enough for any non-silly network architecture). Carrier networks are built with these silicon architectures. They support large enough label stacks for 100% coverage with TI-LFA (unless there are software limits that need to be removed first).

- The only place where TI-LFA coverage may be less than 100% is where merchant silicon is used. Fixed pipelines tend to support just 3 labels today. This is likely to change in the future though, and anyway TI-LFA is not really designed for data center architectures (where merchant silicon makes sense today).

 

Cheers,

Nicolai

 

01.04.16
Juniper Employee

Hello Nicolai,

 

Thanks for the information! Please see inline for some answers -

(1) The article aims to describe fast re-route not only for present but also for future network architectures in terms of number of hops.

(2) If the network has no more than 10 hops and only carrier-grade silicon such as Trio, TI-LFA may fit well.

(3) Right. The aim is to describe fast re-route for most of the network architectures.

 

Cheers

Santosh

07.02.16
chinar.trivedi

Hi Santosh,  @Ato 

In RLFA's case, I have always wondered, from the source's perspective (R1 in your example), why only R2, R3 and R4 are in the P-space, and how and why not R5 and R6?

From the perspective of R1, which is the source, shouldn't all the nodes that are reached without traversing the link under protection (R1-R6) be in the P-space?

 

The same goes for the Q-space from the destination's (R5) perspective: how and why are only three nodes (R6, R4 and R3) in the Q-space? Why not all the other nodes too?

 

How and when does the P-Space/Q-space node calculation stop?

 

 
