Optimizing EVPN for Virtual Machine Mobility over the WAN
Dec 6, 2013
Organizations need to ensure that their applications are available and performing well. Server virtualization helps by enabling virtual machine mobility. If a server is overloaded or will be unavailable, vMotion can be used to migrate live workloads to another server in the current data center or in another data center. This requires that the addressing, including the MAC address, IP address and VLAN ID, remain the same so that sessions are not dropped when the VM moves. This is done by extending the L2 domain to the new location, known as Layer 2 stretch. Within a subnet this is easy to do. Across subnets in the data center it becomes more difficult. Doing live migration over the WAN introduces considerable challenges. Juniper has introduced a number of technologies to make virtual machine live migration possible.
The challenge with VM mobility is how to do the Layer 2 stretch in a way that ensures that the VM can be reached after it is moved. There are a number of issues that need to be dealt with. The MAC and IP addresses are no longer pinned to a site or to an interface, as they have moved with the VM. You need fast convergence of network paths as the VM moves so that traffic will reach it quickly. You need ingress and egress traffic convergence and optimization to avoid having traffic go through the former default gateway after the VM has moved. You need the network to learn the effects of the live migration event, with control over how that information is distributed, so that the network isn't impacted by signaling traffic. You need proper L2 and L3 interaction so that everything happens in a timely manner to ensure the best experience for the users of the applications that are affected by the VM move. VPLS has been the traditional method of doing this, and now Juniper is supporting EVPN to provide enhancements to the solution.
Virtual Private LAN Services
Juniper has supported VPLS for VM mobility on the MX Series routers for some time. VPLS provides VLAN extension over a shared IP/MPLS network, with segmentation provided by separate VPLS instances per VLAN. It can support a full mesh of any-to-any connectivity regardless of the physical path. VPLS can support a split subnet and be provisioned to advertise members when the IP address has moved to a new location. VPLS can deal with ingress and egress L2 requirements automatically and be provisioned to support ingress optimization at L3 by leveraging VRRP. L2 learning and information control are handled in the data plane without advertisement, and for L3 learning BGP policies are used. It supports forwarding with multicast and point-to-multipoint capabilities. It can be provisioned for new site auto-discovery and for multicast, broadcast and flooding control. For availability it relies on the underlying MPLS capabilities, ECMP and Fast Reroute.
VPLS is well suited to providing network segmentation, and it has served the purpose of enabling VM mobility as well. There are, however, some issues that need to be addressed as the scale of deployments grows and to enhance the usability of the L2 extension technology. With VPLS, frames to unknown unicast MACs are always flooded. It doesn't support active-active multi-homing and only supports active-standby multi-homing. Provisioning can be complex. Network re-convergence on failure can be slow, depending on the number of MACs in the broadcast domain. Multicast can only use P2MP LSPs, and it has no provision for MP2MP LSPs to provide for optimization. There are also issues with VLAN-aware bundling services and translation, and there is no provision for administrative control on MAC routes. As a result there is a need for additional flexible VPN topologies and policies, and that is where EVPN comes in.
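To make the flooding behavior concrete, here is a minimal Python sketch of data-plane (flood-and-learn) MAC learning on a single provider edge router. It is an illustration only, not Junos behavior or configuration: the class name `FloodAndLearnPE` and the port names are hypothetical, and the model assumes a simple split-horizon rule where frames arriving on a pseudowire are not flooded into other pseudowires.

```python
class FloodAndLearnPE:
    """Toy model of data-plane MAC learning on one PE (hypothetical, for illustration)."""

    def __init__(self, attachment_circuits, pseudowires):
        self.acs = set(attachment_circuits)   # local, customer-facing ports
        self.pws = set(pseudowires)           # pseudowires toward remote sites
        self.mac_table = {}                   # MAC -> port it was last seen on

    def forward(self, src_mac, dst_mac, in_port):
        """Learn the source MAC, then return the list of ports the frame is sent to."""
        # Learning happens only when traffic is seen; nothing is advertised.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is not None:
            return [out_port]

        # Unknown unicast: flood. Frames received on a pseudowire go only to
        # local ports (assumed split horizon); local frames go everywhere else.
        targets = set(self.acs)
        if in_port in self.acs:
            targets |= self.pws
        targets.discard(in_port)
        return sorted(targets)


pe = FloodAndLearnPE(["ac1"], ["pw-dc2", "pw-dc3"])
first = pe.forward("vm-a", "vm-b", "ac1")     # vm-b unknown: flooded to every site
reply = pe.forward("vm-b", "vm-a", "pw-dc2")  # reply teaches the PE where vm-b lives
second = pe.forward("vm-a", "vm-b", "ac1")    # now sent on a single pseudowire
print(first, reply, second)
```

In this model the first frame crosses every pseudowire because the destination has not yet been seen, which is the WAN cost that grows with the number of sites and MACs.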
Ethernet Virtual Private Networks
To respond to the need for a more robust implementation of L2 extension, Juniper is supporting a new Layer 2 VPN protocol on the MX Series routers called EVPN, or Ethernet VPN. EVPN delivers the capabilities of VPLS and provides a number of enhancements. EVPN delivers multi-point connectivity among Ethernet LAN sites across an MPLS backbone. EVPN is similar to VPLS in the forwarding plane but adds MAC address learning driven by the BGP control plane. This provides better control over MAC learning and avoids flooding the network with signaling traffic. ARP flooding is minimized by additional attributes that are added during MAC advertisement. It increases the scale of MAC addresses and VLANs that can be supported. EVPN can load-balance over multiple active paths with multi-homing. L3 egress traffic forwarding is optimized by use of the default gateway extended community. BGP capabilities such as constrained distribution, route reflectors, and inter-AS operation are reused to provide better convergence in the event of network failures. EVPN allows hosts to relocate within the same subnet without requiring renumbering. With these enhancements EVPN simplifies operations and enables more flexible L2 extension topologies.
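The idea of control-plane MAC learning can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the BGP encoding or the MX implementation: `MacRoute`, `PE`, and `full_mesh` are hypothetical names, peers are updated by direct method calls instead of BGP sessions, and the `seq` counter is an assumed stand-in for a mobility sequence number that lets the most recent location win after a VM move.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MacRoute:
    """Simplified stand-in for an advertised MAC route (not a real BGP encoding)."""
    mac: str
    ip: str        # host IP carried with the MAC, so remote PEs know the binding
    next_hop: str  # PE that currently hosts the MAC
    seq: int       # assumed mobility counter: higher means a more recent move


@dataclass
class PE:
    name: str
    peers: list = field(default_factory=list)
    mac_table: dict = field(default_factory=dict)  # MAC -> MacRoute

    def learn_local(self, mac, ip):
        """A host appeared on a local port: install it and advertise it to all peers."""
        previous = self.mac_table.get(mac)
        seq = previous.seq + 1 if previous else 0
        route = MacRoute(mac, ip, self.name, seq)
        self.mac_table[mac] = route
        for peer in self.peers:
            peer.receive(route)

    def receive(self, route):
        """Install a remote advertisement unless a newer one is already held."""
        current = self.mac_table.get(route.mac)
        if current is None or route.seq > current.seq:
            self.mac_table[route.mac] = route

    def lookup(self, mac):
        """Return the PE to forward to, or None if the MAC was never advertised."""
        route = self.mac_table.get(mac)
        return route.next_hop if route else None


def full_mesh(*pes):
    """Make every PE a peer of every other PE."""
    for pe in pes:
        pe.peers = [p for p in pes if p is not pe]


pe1, pe2, pe3 = PE("PE1"), PE("PE2"), PE("PE3")
full_mesh(pe1, pe2, pe3)

vm_mac = "00:50:56:aa:bb:cc"
pe1.learn_local(vm_mac, "10.1.1.10")  # VM starts behind PE1
before = pe3.lookup(vm_mac)
pe2.learn_local(vm_mac, "10.1.1.10")  # VM migrates to the site behind PE2
after = pe3.lookup(vm_mac)
print(before, "->", after)
```

In this model PE3 never has to flood to find the VM: it knows the location from the advertisement before any data traffic is sent, and the location is corrected as soon as the new site advertises the same MAC.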
Fixing the Trombone Effect with VMTO
While EVPN provides a number of useful improvements over VPLS, there is still one issue that needs to be fully resolved. This is what is called the routing trombone effect: ingress and egress traffic loops through the former default gateway, which is no longer the right place to route after the VM move. To solve this Juniper has implemented Virtual Machine Traffic Optimization (VMTO) on the MX Series routers.
Let's look at the case of egress traffic without VMTO. Let's say that a virtual server in a data center needs to send packets to a server in another data center. The problem is that the active default gateway for the server's VLAN is in yet another data center because the VM has moved. The effect is that the traffic must travel from the first data center to the second data center to reach the VLAN's active default gateway. The packet must reach the default gateway in order to be routed towards the destination data center. This results in duplicate traffic on WAN links and suboptimal routing, hence the “Egress Trombone Effect”. With VMTO on the MX Series this situation is fixed and the traffic is optimized. The solution is to virtualize and distribute the default gateway so that it is active on every router that participates in the VLAN. The effect is that egress packets can be sent to any router on the VLAN in question, allowing the routing to be done in the local data center. This eliminates the “Egress Trombone Effect” and creates the most direct forwarding path for the inter-DC traffic.
For ingress routing the situation is similar. When a server in one data center needs to send packets to a server in another data center, the problem is that the first edge router prefers the path to a different data center for the subnet, as it has no knowledge of the host IP's new location after the VM move. The effect is that traffic from the server is first routed across the WAN to the original data center because of a lower cost route for the subnet in question. Then the edge router in that data center sends the packet on to the destination data center. With VMTO this is fixed: traffic is optimized and the trombone effect is eliminated. In addition to sending a summary route for the subnet in question, the MX data center edge routers also send host routes that represent the location of local servers. The effect is that ingress traffic destined for the servers is sent directly across the WAN from the first data center to the destination data center. This eliminates the “Ingress Trombone Effect” and creates the most direct forwarding path for the inter-DC traffic.
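The egress case can be illustrated with a small Python sketch that counts WAN crossings. It is a toy model under assumptions, not a description of how the MX implements VMTO: the site names `DC1` to `DC3` and the functions `egress_path` and `wan_crossings` are hypothetical, and every hop between two different sites is treated as one WAN crossing.

```python
def egress_path(vm_site, dest_site, gateway_sites):
    """Return the sites a routed packet visits on its way from the VM to the destination.

    gateway_sites is the set of sites where the VLAN's default gateway is active.
    """
    # Use the local gateway when one is active in the VM's site; otherwise the
    # packet has to be carried over the L2 stretch to a remote gateway first.
    gateway = vm_site if vm_site in gateway_sites else sorted(gateway_sites)[0]

    path = [vm_site]
    if gateway != vm_site:
        path.append(gateway)
    if dest_site != path[-1]:
        path.append(dest_site)
    return path


def wan_crossings(path):
    """Each hop between two different sites is counted as one WAN crossing."""
    return len(path) - 1


# The VM has moved to DC1, its gateway is still active only in DC2,
# and the destination server is in DC3.
without_vmto = egress_path("DC1", "DC3", {"DC2"})
with_vmto = egress_path("DC1", "DC3", {"DC1", "DC2", "DC3"})
print(without_vmto, wan_crossings(without_vmto))
print(with_vmto, wan_crossings(with_vmto))
```

With the gateway active only in DC2, the packet visits DC1, DC2 and then DC3, which is two WAN crossings. With the gateway active at every site it is routed locally and crosses the WAN once.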
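The reason a host route fixes the ingress case is longest-prefix matching: a more specific route wins over a summary route regardless of cost. The following Python sketch shows this with the standard `ipaddress` module. The addresses, costs, site names, and the function `best_route` are made up for illustration and are not taken from any real deployment.

```python
import ipaddress


def best_route(rib, destination):
    """Pick a next hop by longest prefix match; cost only breaks ties between equal prefixes."""
    dest = ipaddress.ip_address(destination)
    candidates = [route for route in rib if dest in route[0]]
    if not candidates:
        return None
    # Sort key: longer prefix first, then lower cost.
    network, next_hop, cost = max(candidates, key=lambda r: (r[0].prefixlen, -r[2]))
    return next_hop


# Both data centers advertise the stretched subnet; DC1 has the lower cost.
summary_only = [
    (ipaddress.ip_network("10.1.1.0/24"), "DC1", 10),
    (ipaddress.ip_network("10.1.1.0/24"), "DC2", 20),
]

# Assumed scenario: VM 10.1.1.10 has moved to DC2, so DC2 also advertises a host route.
with_host_route = summary_only + [
    (ipaddress.ip_network("10.1.1.10/32"), "DC2", 20),
]

moved_before = best_route(summary_only, "10.1.1.10")
moved_after = best_route(with_host_route, "10.1.1.10")
unmoved = best_route(with_host_route, "10.1.1.20")
print(moved_before, moved_after, unmoved)
```

With only the summary routes, traffic for the moved VM is drawn to DC1 by the lower cost and then has to cross the WAN again. Once the /32 is present the same lookup selects DC2 directly, while hosts that have not moved, such as 10.1.1.20 here, still follow the summary route.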