With the increasing deployment of public-cloud data-centers, there is a need for security measures that allow virtual public cloud providers to limit malicious traffic from compromised servers in the data-centers. One possible attack is traffic injection into an arbitrary customer virtual private network (VPN) from a compromised server through the Gateway router. This blog describes a method to implement anti-spoofing checks on overlay tunnels to ensure that only legitimate sources inject traffic, and only via their designated tunnels.
In order to understand the problem in detail, refer to Figure-1, which depicts a hyper-scale data-center with a server farm (Server-1 to Server-8) hosting virtual machines (VM-1 to VM-32), where customers can run their applications. The servers communicate with the data-center edge Gateway router using overlay tunnels (T1 to T8) over the data-center fabric. The Gateway router is scaled up to have a VPN context for each of the cloud customers (Red, Yellow, Blue, Pink, Brown) and a bi-directional tunnel (T1 to T8) to each server. Each customer has several VMs spread across the servers, which are color-coded to match the customer.
An SDN controller orchestrates the network by assigning IP prefixes for customer VMs, and publishing these prefixes to the Gateway. In addition, the SDN controller manages label allocation and provisions hypervisor software forwarding state with these labels and tunnels to reach other endpoints within the data center. Customer VMs cannot normally manipulate these labels and under normal conditions cannot change the forwarding state of the hypervisor.
The Gateway resolves the VM prefixes on dynamically created IP tunnels to each server, and populates these prefixes into the appropriate customer VRFs. This architecture facilitates secure L3-VPN connectivity for each customer within the public-cloud infrastructure. The customer VMs within the data-center reach each other and devices across the Internet using the overlay tunnels that terminate and originate on the Gateway.
Note that there may be direct tunnels between the servers within the data center; however, this blog focuses on secure communication through the Gateway, so only paths through the Gateway are considered.
Figure-1 : A Hyper-Scale Public Cloud Data Center
Using Figure-1 as a reference, suppose customer Blue has VM-13 and VM-20 that need to communicate. To reach VM-20, the Server-4 hypervisor software qualifies VM-13's data with the VPN label ‘Blue’ and uses the overlay tunnel T4, which originates on Server-4 and terminates on the Gateway. The Gateway uses the VPN label ‘Blue’ within the packet to determine that it belongs to customer Blue, looks up VM-20 and finds that it is reachable via tunnel T5, then encapsulates the data with VPN label ‘Blue’ and tunnel T5 to reach Server-5 and then VM-20. Figure-2 shows the packets from VM-13 towards VM-20, as seen by the Gateway.
Figure-2 : Legitimate East-West Traffic, received by the Gateway
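The forwarding steps described above can be sketched in a few lines of Python. This is only an illustration, not the Gateway's actual implementation: the `Packet` class, the `VRFS` table and the `forward` function are hypothetical names, and VM names stand in for the IP prefixes that the controller would publish.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    tunnel: str     # outer overlay tunnel, e.g. "T4"
    vpn_label: str  # VPN label, e.g. "Blue"
    src: str        # inner source (VM name stands in for its IP address)
    dst: str        # inner destination

# Hypothetical per-VPN forwarding tables: destination -> tunnel next hop.
VRFS = {"Blue": {"VM-13": "T4", "VM-20": "T5"}}

def forward(pkt: Packet) -> Packet:
    """Baseline Gateway forwarding: VPN label -> VRF -> destination lookup."""
    vrf = VRFS[pkt.vpn_label]   # the VPN label selects the customer context
    out_tunnel = vrf[pkt.dst]   # the destination lookup gives the egress tunnel
    # Re-encapsulate with the same VPN label on the egress tunnel.
    return Packet(out_tunnel, pkt.vpn_label, pkt.src, pkt.dst)

out = forward(Packet("T4", "Blue", "VM-13", "VM-20"))
print(out.tunnel, out.vpn_label)  # T5 Blue
```

Note that in this baseline the arrival tunnel and the inner source address play no part in the forwarding decision.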
Similarly, VM-20 can reach VM-13 via tunnel T5 to the Gateway and then to Server-4 via tunnel T4. Note that tunnels T4 and T5 are shared by all the customers’ VMs within Server-4 and Server-5 respectively; it is the VPN label that identifies the customer and provides the data segregation on the shared resources. Traffic isolation between the VMs is managed by the hypervisor.
As another example, customer Red can reach its VM-1 from across the Internet through the Gateway router. The Gateway stamps the packet with the VPN label ‘Red’ and uses tunnel T1 to reach Server-1, where the packet is de-multiplexed using the VPN label ‘Red’ and VM-1’s IP address. These packets are depicted in Figure-3.
Figure-3 : Legitimate North-South Traffic, received by the Gateway
In an ideal scenario, all the customers co-exist securely in the shared cloud environment. However, malicious users may compromise the hypervisor software and breach the security of the private networks.
VPN Label Spoofing
In this scenario, a misbehaving customer application compromises the hypervisor software and starts spoofing VPN labels, leaking traffic to arbitrary VPNs through the Gateway. For example, customer Pink has VM-23 on Server-6, which spoofs the VPN label ‘Brown’ and injects traffic towards the Gateway via tunnel T6. The Gateway uses the Brown label to switch context to the Brown VPN and leaks the packet to VM-31, as shown in Figure-4. This violates the VPN property and puts customer Brown at risk of exposing private and sensitive data to the malicious customer Pink.
Figure-4 : Malicious Data with Spoofed VPN Label
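To make the leak concrete, here is a minimal sketch of a Gateway that forwards on the VPN label and destination alone. The table contents and the `forward_unchecked` function are hypothetical, VM names stand in for IP prefixes, and placing VM-31 behind tunnel T8 is an assumption made for illustration.

```python
# Hypothetical VRF contents; VM names stand in for IP prefixes.
VRFS = {
    "Pink": {"VM-23": "T6"},
    "Brown": {"VM-31": "T8"},  # T8 is an assumed placement for VM-31
}

def forward_unchecked(in_tunnel, vpn_label, src, dst):
    """Forward on label and destination only; in_tunnel and src are ignored."""
    return VRFS[vpn_label].get(dst)  # egress tunnel, or None if no route

# VM-23 (customer Pink, Server-6) sends on T6 but stamps the 'Brown' label.
egress = forward_unchecked("T6", "Brown", "VM-23", "VM-31")
print(egress)  # T8
```

Because nothing ties the arrival tunnel or the inner source to the selected VPN, the spoofed packet is delivered inside Brown's VPN.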
IP Address Spoofing
In a virtualized multi-tenant network, VMs may be moved and provisioned dynamically in the data-center. A malicious application may use stale or transient VPN prefixes to leak traffic into unsuspecting VPNs. For example, suppose VM-6 belonging to customer Brown has terminated gracefully, but an attacker reuses its IP address and label to inject traffic into the VPN, as shown in Figure-5.
Figure-5 : Malicious Data with Spoofed IP
In this architecture, the transport tunnels and the customer VPNs are completely independent, and one transport tunnel can carry traffic for several customers residing on the server at the tunnel end-point. Hence the Gateway needs to rely on the VPN label in the data packet to switch context to a VPN and forward the inner IP packet. This means any malicious customer can spoof a label and leak data into another customer’s VPN via the Gateway.
The solution is an anti-spoofing check implemented on the Gateway at the point of tunnel termination, also referred to as the tunnel downstream direction. The check ensures that the source IP address in the customer data packet is legitimate, meaning that it is reachable via the same tunnel in the upstream direction (the encapsulation point).
Normally, when the Gateway receives a tunneled packet, it decapsulates the tunnel header, uses the VPN label in the packet to switch context to the VPN, and does a destination IP lookup of the inner header in the VPN’s VRF to forward the packet. To implement anti-spoofing, the Gateway also does a source IP lookup, and drops the packet if the source does not exist or if the source is not reachable via the same tunnel that the packet came in on. The algorithm used by the Gateway is shown in Figure-6.
Figure-6 : Anti-Spoofing Check Algorithm
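The check described above can be sketched as follows. This is a hedged illustration rather than the Gateway's real data path: the `Verdict` enum and the `anti_spoof_check` function are hypothetical names, VM names stand in for IP prefixes, and the handling of an unknown VPN label or a missing destination route is an assumption, since the text only specifies the two source-related drop conditions. The order of the source and destination lookups is also an assumption.

```python
from enum import Enum

class Verdict(Enum):
    FORWARD = "forward"
    DROP_UNKNOWN_VPN = "drop: unknown VPN label"            # assumed case
    DROP_UNKNOWN_SOURCE = "drop: source not in VRF"
    DROP_WRONG_TUNNEL = "drop: source reachable via a different tunnel"
    DROP_NO_ROUTE = "drop: no route to destination"         # assumed case

def anti_spoof_check(vrfs, in_tunnel, vpn_label, src, dst):
    """Return (verdict, egress_tunnel) for one decapsulated packet."""
    vrf = vrfs.get(vpn_label)       # VPN label selects the VRF
    if vrf is None:
        return Verdict.DROP_UNKNOWN_VPN, None
    src_tunnel = vrf.get(src)       # source lookup in the VPN's VRF
    if src_tunnel is None:
        return Verdict.DROP_UNKNOWN_SOURCE, None
    if src_tunnel != in_tunnel:     # reverse path must match the arrival tunnel
        return Verdict.DROP_WRONG_TUNNEL, None
    out_tunnel = vrf.get(dst)       # normal destination lookup
    if out_tunnel is None:
        return Verdict.DROP_NO_ROUTE, None
    return Verdict.FORWARD, out_tunnel

vrfs = {"Blue": {"VM-13": "T4", "VM-20": "T5"}}
verdict, tunnel = anti_spoof_check(vrfs, "T4", "Blue", "VM-13", "VM-20")
print(verdict.value, tunnel)  # forward T5
```

The only extra state the check needs is the arrival tunnel of the packet; the source lookup reuses the same VRF that already serves the destination lookup.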
Note that the anti-spoofing check algorithm does not prevent malicious communication attempts between VMs hosted on the same server. This is expected: a VM can spoof labels only if the hypervisor software is compromised, and if it is compromised, the attacker can probably reach other VMs directly without sending traffic to the Gateway.
Also, if a compromised server hosts multiple VMs, malicious traffic can be injected directly at the server into the VPN contexts to which those VMs belong.
All other illegitimate communication paths are blocked at the Gateway.
To understand the anti-spoofing checks, let’s take a few examples. Using Figure-1 as a reference, the controller distributes the routes that populate the Gateway’s VPN VRF for Brown, as depicted in Figure-7. The white box represents the FIB used by the Gateway to reach customer Brown’s prefixes; for example, VM-6 has a tunnel next hop of T2, VM-15 has a tunnel next hop of T4, and so on.
Figure-7 : VPN Brown’s Routing Table on the Gateway
VPN Label Spoofing
As described in the anti-spoofing algorithm in Figure-6, the Gateway does a source-IP lookup of the inner packet and ensures that the source exists in the VPN and is reachable via the same tunnel that the packet is received on. Figure-8 shows three scenarios. The first is legitimate data and is forwarded as per the destination lookup. The second fails because VM-23 does not exist in VPN Brown’s VRF table (see Figure-7). The third fails because the packet is received on tunnel T3, but the inner source-IP VM-6 has a next hop of tunnel T2.
Figure-8 : Anti-Spoofing Checks
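The three scenarios can be replayed against a small table. This is a sketch under stated assumptions: `BROWN_VRF` holds only the two entries named in the text for Figure-7, `check_source` is a hypothetical helper, VM names stand in for IP prefixes, and the legitimate flow is assumed to be VM-6 sending on its own tunnel T2, since the text does not spell out the first scenario.

```python
# Brown's VRF entries named in the text (VM names stand in for IP prefixes).
BROWN_VRF = {"VM-6": "T2", "VM-15": "T4"}

def check_source(vrf, in_tunnel, src):
    """Return None if the source passes the check, else a short drop reason."""
    expected = vrf.get(src)
    if expected is None:
        return "source not in VRF"
    if expected != in_tunnel:
        return f"source is behind {expected}, packet arrived on {in_tunnel}"
    return None

scenarios = [
    ("T2", "VM-6"),   # legitimate (assumed flow): VM-6 sends on its own tunnel
    ("T6", "VM-23"),  # Pink's VM-23 using the Brown label
    ("T3", "VM-6"),   # VM-6's address arriving on the wrong tunnel
]
for in_tunnel, src in scenarios:
    reason = check_source(BROWN_VRF, in_tunnel, src)
    print(in_tunnel, src, "forward" if reason is None else f"drop ({reason})")
```

Only the first packet proceeds to the destination lookup; the other two are dropped before any forwarding decision is made.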
IP Address Spoofing
The anti-spoofing algorithm also prevents the attack scenario described in the IP Address Spoofing section above. When VMs are moved or terminated, the controller withdraws the respective prefixes from the Gateway. In this example, when VM-6 belonging to customer Brown terminates, its route is removed from the Gateway. The packet in Figure-9 is dropped because VM-6 no longer exists in the customer VRF, thus preventing this attack.
Figure-9 : Anti-Spoofing Checks
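The effect of a route withdrawal on the check can be shown with a short sketch. The `source_ok` and `withdraw` helpers are hypothetical, and VM names again stand in for IP prefixes; how the controller actually signals the withdrawal is not covered here.

```python
# Brown's VRF before VM-6 terminates (VM names stand in for IP prefixes).
brown_vrf = {"VM-6": "T2", "VM-15": "T4"}

def source_ok(vrf, in_tunnel, src):
    """True only if src is in the VRF and is reachable via the arrival tunnel."""
    return vrf.get(src) == in_tunnel

def withdraw(vrf, prefix):
    """Hypothetical stand-in for the controller withdrawing a prefix."""
    vrf.pop(prefix, None)

before = source_ok(brown_vrf, "T2", "VM-6")  # VM-6 is still running
withdraw(brown_vrf, "VM-6")                  # VM-6 terminates
after = source_ok(brown_vrf, "T2", "VM-6")   # stale address is reused
print(before, after)  # True False
```

After the withdrawal the packet is dropped even if it arrives on T2, the tunnel that VM-6 used while it was alive.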
This concept is inspired by the traditional IP RPF check, where an incoming interface can be tied to a source IP. However, the RPF check could not be applied to traffic coming from the network core, because there is no concept of an upstream entity that identifies the source of the transport. For example, the MPLS transport LSPs between any two PEs are two independent uni-directional LSPs that are not tied together; hence it is not possible to say deterministically, by examining the reverse path for the source address in the VPN, that a packet for a VPN came in on the correct transport LSP.
With IP overlay, the tunnels are symmetric, so we can identify the incoming tunnel from the data packet itself. Leveraging our programmable hardware, we can introduce the concept of an incoming overlay interface (tunnel-id in this case) to implement anti-spoofing for the very first time, for the Internet or data-center core.
The anti-spoofing check can be extended to any overlay tunnels where traffic forwarding is symmetrical in nature. The overlay tunnels can be GRE/UDP and the data traffic may be Layer-2, for example VXLAN. Given the recent explosion of solutions using overlay tunnels, this can become a viable option for implementing secure VPNs over a shared infrastructure.