02-19-2009 01:14 AM
We currently have a small MPLS network that we want to expand to some remote sites. To cut costs, we bought some "layer 2" Ethernet services from an Ethernet carrier.
For the site to site interconnection we are using IQ2 Gigabit Ethernet on M20s.
For the LSP Signalling we are using RSVP (with no strict paths, just link protection).
For the routing protocol we use OSPF.
MPLS over the carrier Ethernet L2 circuits has worked fine so far (we had to tune the MTUs).
Now my question is, since we are planning to migrate some "voice" and signalling traffic over these new links, what are the best protocols to detect failures in the carrier's network at both the link level and the circuit level (e.g. a VLAN may be down, or there may be a misconfiguration in the carrier's network or in their MPLS, etc.)?
I have come across the following so far, but I am not sure which one (or what combination) will cover me fully.
- bfd over ospf
Let me add that the carrier does not currently support any OAM, so I need a CPE-to-CPE solution.
04-26-2009 05:08 PM
This is a common question for organizations building MPLS cores across non-native environments such as DWDM services, packet-based L2 links, etc.
The real goal of all these protocols is to provide in-band bidirectional monitoring of connectivity with a very short interval timer (typically 150ms or less). BFD and Ethernet OAM (IEEE 802.3ah and 802.1ag) can both be used to accomplish this task. Typically each protocol is used to detect byzantine failures on a link where connectivity is interrupted but there is no physical link-down event to trigger a reconvergence.
There are a couple of points worth considering:
1) BFD is an IP protocol, and typically operates with a routing protocol as a client. So, it provides a fast-hello path for routing protocols like OSPF and IS-IS. Because it is a layer 3 IP packet it can traverse any encapsulation valid for IP. So it doesn't care if the link is ethernet, serial etc.
2) Ethernet OAM uses a specific L2 frame format on Ethernet links, and is intended for either a single segment (802.3ah) or multiple segments (802.1ag). Both are also newer protocols than BFD, which can affect details of how they're processed in various architectures. Also, because they are L2 PDUs, they can be affected by layer 2 devices in the transit path. I have found a couple of scenarios where legacy Ethernet switches can inadvertently "eat" Ethernet OAM PDUs.
Most important in the selection of these protocols is how you want them to affect your failover.
- If you're running a standard IP routed network, BFD is typically used to quickly detect failures between OSPF neighbors. Because it is significantly more lightweight than processing OSPF hellos, it's a much better alternative to OSPF fast-hellos.
- If you are running an MPLS core, BFD sessions between neighbors can help speed convergence on soft-failures of transit links. But, BFD will typically signal the IGP to set neighbor-down, which triggers an OSPF/IS-IS reconvergence. This is typically a slower event that won't trigger detour/bypass LSPs when using fast-reroute.
- Ethernet OAM, because it operates at L2 and isn't tied to the dynamic routing protocol, will typically trigger an interface-down (IFD event in JUNOS), which is treated by the operating system as a link-down, not a connectivity loss. The advantage here is that in an MPLS fast-reroute environment, this can trigger a local repair to a bypass LSP, thus speeding convergence.
Note: in JUNOS, 802.3ah is called "link-fault-management" whereas 802.1ag is called "connectivity-fault-management". Both can be configured under:
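As a rough sketch of the Ethernet OAM behaviour described in the last point, an 802.1ag continuity check between the two CPE routers could be tied to an interface-down action along these lines. The names cfm-down, md-cpe and ma-site1, the MEP IDs, the level and the interface ge-0/0/0.100 are all placeholders, and whether these statements and a 100ms interval are supported on a given release and PIC is an assumption to verify before use.

```
# Hypothetical names and values throughout; verify against your JUNOS release
set protocols oam ethernet connectivity-fault-management action-profile cfm-down event adjacency-loss
set protocols oam ethernet connectivity-fault-management action-profile cfm-down action interface-down
set protocols oam ethernet connectivity-fault-management maintenance-domain md-cpe level 5
set protocols oam ethernet connectivity-fault-management maintenance-domain md-cpe maintenance-association ma-site1 continuity-check interval 100ms
set protocols oam ethernet connectivity-fault-management maintenance-domain md-cpe maintenance-association ma-site1 continuity-check loss-threshold 3
set protocols oam ethernet connectivity-fault-management maintenance-domain md-cpe maintenance-association ma-site1 mep 1 interface ge-0/0/0.100
set protocols oam ethernet connectivity-fault-management maintenance-domain md-cpe maintenance-association ma-site1 mep 1 direction down
set protocols oam ethernet connectivity-fault-management maintenance-domain md-cpe maintenance-association ma-site1 mep 1 remote-mep 2 action-profile cfm-down
```

The far-end router would need the mirror image (mep 2 with remote-mep 1) in the same maintenance domain and association for the continuity check to come up.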
[ edit protocols oam ethernet ]
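For example, a minimal 802.3ah sketch under that hierarchy might look like the following. The profile name lfm-down and the interface ge-0/0/0 are placeholders, and the timer values are only illustrative; it also assumes the carrier's L2 service forwards the 802.3ah PDUs end to end, which is not a given.

```
# Hypothetical profile name and interface; timers are illustrative only
set protocols oam ethernet link-fault-management action-profile lfm-down event link-adjacency-loss
set protocols oam ethernet link-fault-management action-profile lfm-down action link-down
set protocols oam ethernet link-fault-management interface ge-0/0/0 pdu-interval 100
set protocols oam ethernet link-fault-management interface ge-0/0/0 pdu-threshold 3
set protocols oam ethernet link-fault-management interface ge-0/0/0 apply-action-profile lfm-down
```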
Also, because both BFD and Ethernet OAM are extremely fast hello protocols, they can generate a large volume of "hello" traffic when implemented on multiple links/partners. It's critical that the processing of these Hello packets is distributed within the architecture of the router. In JUNOS, you can enable this by adding the following config:
# set routing-options ppm delegate-processing
When using BFD/OAM on a moderate number of peers, you can easily achieve real-world failure detection in roughly 450ms using an interval of 150ms and a multiplier of 3 (150ms x 3). The interval can also be decreased to as low as 50ms, but the safety of this tight setting depends on the specific hardware implementation and the number of peers being processed.
So, in summary, the real answer is: it depends. Both protocols are well suited to solving a soft-link-failure detection problem in your environment. The choice of which protocol to use is up to you. Many large customers are using BFD in their core environments. Ethernet OAM is equally attractive, particularly in metro Ethernet environments. I would recommend starting with BFD and investigating the behavior of 802.3ah and 802.1ag in your L2 circuit environments.
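A minimal sketch of those timers with BFD running as a client of OSPF, assuming the link is ge-0/0/0.0 in area 0.0.0.0 (both placeholders for your own interface and area):

```
# Placeholder area and interface; 150ms x 3 gives a detection time of about 450ms
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection minimum-interval 150
set protocols ospf area 0.0.0.0 interface ge-0/0/0.0 bfd-liveness-detection multiplier 3
```

The same statements are needed on the neighbor at the other end of the circuit, since the BFD session only comes up when both sides run it.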
JNCIE-ER #6 / JNCIE-M #265 / JNCI
02-24-2010 03:12 AM
I would like to know if we can use BFD for link fault management, especially when an aggregate link is used. I have had problems with OAM over an SDH circuit; I don't know whether the cause is the layer 2 circuit or the JUNOS kernel, since there are a PR and cases open with JTAC related to OAM. In any case, I cannot sniff any OAM PDUs on that circuit, while I can in the lab over a directly connected Fast Ethernet link.
I have thought about using BFD and dismantling the aggregate links, but I would then need to check load balancing over the released links, even when they are traversed by LSPs.
JNCIE-M&T(#526) /CCNP certified /JNCIS-ER