DevOps against DDoS II: Monitoring BGP FlowSpec with Junos PyEz
Apr 28, 2015

Introduction

This article continues my previous post, DevOps against DDoS I: Programming BGP FlowSpec with Junos PyEz, on mitigating Distributed Denial of Service (DDoS) attacks by now providing basic machinery to monitor BGP Flow Specification [RFC5575] route installation and enforcement.

 

Turning now to the monitoring and troubleshooting aspects of this scenario, I am going to make use of YAML data structures again, but instead of applying them for provisioning as in my previous post, they will now be used for operational purposes. For this concept, I consider the relevant datapoints to scan to be:

 

  • Proper flow route creation at our central route reflector/SDN controller (R6) and automatic propagation with BGP
  • Flow route compilation into firewall filter terms at the network perimeter (R10, R11)

These performance indicators will be captured in this Junos PyEz approach using the following structured YAML abstractions:

 

  • Operational tables: which can be interpreted as a collection of items or records, each of them identified with a unique key or distinguisher
  • Views: which can be understood as flexible ways to display and render the previously captured keys in a particular fashion

Thus, YAML views are linked to operational tables and determine how to present the captured data in an abstract manner, so that the orchestrator in this case does not really need to understand any Junos OS CLI or XML output, but purely parses the views' variables.

 

Again, this provides full flexibility to offer a Junos OS-agnostic API at the orchestrator for monitoring and troubleshooting, complementing our previous parallel provisioning proposal.

 

Monitoring BGP Flow Specification

 

It is crucial to have a monitoring toolkit available in order to accurately thwart DDoS attacks and ascertain the effectiveness and longevity of the mitigation over time. [RFC5575] Section 9 also covers some minimal monitoring requirements that SHOULD be met.

 

As explained in my previous article, Junos OS provides dedicated per-instance routing tables or RIBs to store flow route information (<x>.inetflow.0). This implementation grants effective routing information separation, allows resources to scale and operate deterministically, and follows the same guidelines as other similar Junos OS concepts.

 

Once the flow route is accepted (see the previous validation options in DevOps against DDoS I: Programming BGP FlowSpec with Junos PyEz), Junos OS compiles and incorporates it as a term in a single internal and orthogonal input forwarding table filter (FTF), named __flowspec_default_inet__ (let's assume the default instance as in this use case). The order in which flow routes are compiled into terms follows the local term-order directive, where standard is considered the best practice to match the latest advice in [RFC5575] Section 5.1.

 

As part of the filter compilation, Junos OS also creates internal packet and byte counters for each term, meaning for each flow route. This is aligned with [RFC5575] Section 9 and provides precise monitoring samples to follow up the impact assessment of a given attack.

 

Therefore, we can monitor in Junos OS first how the flow route is created (as we have a separate <x>.inetflow.0 RIB for that) and then how the flow route is compiled into a filter term with associated counters at the network boundaries (as we have separate __flowspec_default_inet__ filter terms with counters).

 

In this use case, these datapoints can be extracted as RPC outputs from the following Junos OS commands:

 

  • show route table inetflow.0 extensive: shows flow route table details and the specific route actions (accept, discard, etc.) associated with the flow route

 

root@R6> show route table inetflow.0 extensive 

inetflow.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
198.51.100.1,*,proto=17,dstport=123/term:1 (1 entry, 1 announced)
TSI:
KRT in dfwd;
Action(s): discard,count
Page 0 idx 0, (group iBGP-inet-flow type Internal) Type 1 val 0x9583330 (adv_entry)
   Advertised metrics:
     Nexthop: Self
     Localpref: 100
     AS path: [65000] I
     Communities: 65000:667 traffic-rate:0:0
Path 198.51.100.1,*,proto=17,dstport=123 Vector len 4.  Val: 0
        *Flow   Preference: 5
                Next hop type: Fictitious
                Address: 0x92a07c4
                Next-hop reference count: 1
                State: <Active>
                Local AS: 65000
                Age: 2:42
                Validation State: unverified
                Task: RT Flow
                Announcement bits (2): 0-Flow 1-BGP_RT_Background
                AS path: I
                Communities: traffic-rate:0:0

 

  • show firewall filter __flowspec_default_inet__: shows byte and packet counts for each flow route filter term (firewall filter counter entry)

root@R10> show firewall filter __flowspec_default_inet__    

Filter: __flowspec_default_inet__                              
Counters:
Name                                                Bytes              Packets
198.51.100.1,*,proto=17,dstport=123                  2052                   27

 

YAML operational tables and views

 

As in my previous article, I found good information resources for these concepts in our http://techwiki.juniper.net/Automation_Scripting/010_Getting_Started_and_Reference/Junos_PyEZ/Troubl... and https://techwiki.juniper.net/Automation_Scripting/010_Getting_Started_and_Reference/Junos_PyEZ/Examp... wiki pages.

 

YAML operational table

 

A YAML operational table is a collection of items or records, each of them identified with a unique key or distinguisher. In this example, an operational table can be arranged out of an RPC output to pinpoint those key variables to extract and parse.

 

Another good example from our wiki describes how to define an operational table: https://techwiki.juniper.net/Automation_Scripting/010_Getting_Started_and_Reference/Junos_PyEZ/Troub....

 

But I'm also going to illustrate this with our use case here. I first need to identify the correct RPC format in order to also obtain the operational table args (RPC command arguments) and args_key (the command argument that does not require a specific keyword):

 

root@R6> show route table inetflow.0 extensive | display xml rpc  
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1R2/junos">
    <rpc>
        <get-route-information>
                <extensive/>
                <table>inetflow.0</table>
        </get-route-information>
    </rpc>
    <cli>
        <banner></banner>
    </cli>
</rpc-reply>

 

And once the RPC format and arguments are properly identified, I would need to dissect the item (XPath expression to select table items from the command response) and the ultimate key (complete XPath expression for the key performance indicator) to capture the intended data. This can be extracted from the direct XML command output:

 

root@R6> show route table inetflow.0 extensive | display xml 
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1R2/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1R2/junos-routing">
        <!-- keepalive -->
        <route-table>
            <table-name>inetflow.0</table-name>
            <destination-count>1</destination-count>
            <total-route-count>1</total-route-count>
            <active-route-count>1</active-route-count>
            <holddown-route-count>0</holddown-route-count>
            <hidden-route-count>0</hidden-route-count>
            <rt junos:style="detail">
                <rt-destination>198.51.100.1,*,proto=17,dstport=123</rt-destination>
                <rt-prefix-length junos:emit="emit">term:1</rt-prefix-length>         
[...]

 

These parameters are incorporated into the following sample YAML operational table:

 

---
FlowRoutesTable:  
 rpc: get-route-information
 args:
  extensive: True
  table: 'inetflow.0'
 args_key: table
 item: route-table/rt
 key: rt-destination
 view: FlowRoutesView

 

And you may notice at the end a reference to a view, which becomes the actual binding between this YAML operational table (data collection) and its YAML view (data representation).

 

YAML view

 

A YAML view is a customized way to display and render items previously captured with a YAML operational table. As an analogy with the Junos OS CLI, think of it as a tailored "| display <yaml-view>" option for the customized RPC built with the YAML operational table.

 

Our PyEz wiki also includes a very explanatory page describing a table view: http://techwiki.juniper.net/Automation_Scripting/010_Getting_Started_and_Reference/Junos_PyEZ/Troubl... that can be checked for further explanations.

 

Following up on the same example, and given the same RPC output, I am interested in gathering (in YAML syntax) these items:

 

FlowRoutesView:
 fields:
  destination: rt-destination
  term: rt-prefix-length
  active: rt-entry/active-tag
  age: rt-entry/age
  action: rt-entry/communities/extended-community

which correspond to the XPaths for the actual NLRI description (rt-destination), the compiled filter term as per term-order (rt-prefix-length), the route active state (active-tag), the route age (age) and the final flow route action (extended-community), as the most relevant key performance indicators from the flow route:

 

root@R6> show route table inetflow.0 extensive | display xml 
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1R2/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1R2/junos-routing">
        <!-- keepalive -->
        <route-table>
[...]
            <rt junos:style="detail">
                <rt-destination>198.51.100.1,*,proto=17,dstport=123</rt-destination>
                <rt-prefix-length junos:emit="emit">term:1</rt-prefix-length>
                <rt-entry-count junos:format="1 entry">1</rt-entry-count>
                <rt-announced-count>1</rt-announced-count>
                <tsi junos:indent="0">
[...]
                </tsi>
                <rt-entry>
                    <active-tag>*</active-tag>
                    <current-active/>
                    <last-active/>
                    <protocol-name>Flow</protocol-name>
                    <preference>5</preference>
                    <nh-type>Fictitious</nh-type>
                    <nh-address>0x92a07c4</nh-address>
                    <nh-reference-count>1</nh-reference-count>
                    <nh-kernel-id>0</nh-kernel-id>
                    <rt-entry-state>Active</rt-entry-state>
                    <local-as>65000</local-as>
                    <age junos:seconds="833607">1w2d 15:33:27</age>
                    <validation-state>unverified</validation-state>
                    [...]
                    <communities>
                        <extended-community>traffic-rate:0:0</extended-community>
                    </communities>
[...]

 

So far I have quickly reviewed both YAML abstractions that will be used in this second article. Let's see how they crystallize in this proposal...

 

Examples with Junosphere and Junos PyEz: Monitoring BGP FlowSpec


Continuing with the previous topology, I now just want to deliver some basic tools to monitor BGP Flow Specification route installation and enforcement based on the Junos PyEz framework, using these new YAML data structures.

Assuming that the previous scenario is already present:

 

 

[Figure: flow-spec-route-advertisement-II.jpg]

 

where R10 and R11 act as Autonomous System Border Routers (ASBRs) requiring DDoS attack thwarting, R6 acts as BGP Flow Specification route reflector and SDN controller, and an external system interacts with it via Netconf Remote Procedure Calls (RPCs) as instructed with Junos PyEz, I will now provide some minimal additional Junos automation snippets for surveillance and observation purposes.


Monitoring BGP Flow Specification routes with Junos PyEz

 

Taking into account the previously presented concepts of YAML operational tables and views to extract and present output data, they will be combined with Junos PyEz to meet these requirements:

 

  • Invoke a single Python file at the orchestrator to extract flow routes from inetflow.0 table based on a YAML file that includes specific flow route details (operational table and view)
  • Parse YAML file variable values to extract flow route details (simply printing to standard output in this example)

As in DevOps against DDoS I: Programming BGP FlowSpec with Junos PyEz, I am assuming local name resolution when instantiating the Device class using host='r6' or similar. This would need to be addressed when testing it in Junosphere, either by adding temporary local name resolution entries, or by directly providing the management IPv4 address that the system assigns to the corresponding Virtual Machine.
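
For instance, a minimal connection sketch (the credentials here are assumptions matching this lab setup; use the assigned management IPv4 address as host if 'r6' cannot be resolved locally):

from jnpr.junos import Device

# Hypothetical lab credentials; replace host with the Junosphere management
# IPv4 address if local name resolution for 'r6' is not in place.
dev = Device(host='r6', user='juniper', passwd='Clouds')
dev.open()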

 

[Figure: python-show-flow-route.jpg]

 

 

Basic Python script using PyEz to connect and issue RPCs

 

Moving on with the same premises as before, a basic Junos PyEz script can connect and issue direct RPCs to extract individual datapoints (I actually do this to obtain the count of active routes in the inetflow.0 table), as sketched below.
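
A hedged illustration of that direct RPC, building on the dev session sketched earlier (the element names follow the XML output shown before, and the print format mirrors the script output further below):

[...]
# Issue get-route-information directly and read the active route count
# for the inetflow.0 table from the returned <route-information> element.
inetflow = dev.rpc.get_route_information(table='inetflow.0')
active = inetflow.findtext('route-table/active-route-count')
print "---- Total number of active flowroutes: %s ----" % active
[...]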

 

But it can also leverage YAML operational tables and views to obtain more structured and comprehensive data than a single value. This can be achieved by importing the loadyaml function, extracting the RPC output and storing it in a table object, as explained in the final paragraphs at http://techwiki.juniper.net/Automation_Scripting/010_Getting_Started_and_Reference/Junos_PyEZ/Troubl...:

 

from jnpr.junos.factory import loadyaml
[...]

# Load YAML file with Table and View for 'show route table inetflow.0 extensive'
flowroutedefs = loadyaml('show-flow-routes.yml')
globals().update(flowroutedefs)
flowroutes = FlowRoutesTable(dev)
flowroutes.get()

[...]

where loadyaml effectively loads the previously described YAML file:

 

# $Id$
# YAML widgets for BGP flow route data extraction
# - FlowRoutesTable extracts 'show route table inetflow.0 extensive'
#    and the rt-destination values are used as table keys()
# - FlowRoutesView identifies the most relevant fields
#    including destination as NLRI composition, term, age and action
---
FlowRoutesTable:  
 rpc: get-route-information
 args:
  extensive: True
  table: 'inetflow.0'
 args_key: table
 item: route-table/rt
 key: rt-destination
 view: FlowRoutesView

FlowRoutesView:
 fields:
  destination: rt-destination
  term: rt-prefix-length
  active: rt-entry/active-tag
  age: rt-entry/age
  action: rt-entry/communities/extended-community

and at that point, it suffices to iterate through each table entry, with each route exposing the view fields as attributes, to gather the details:

 

[...]
# Extract relevant information per route
for route in flowroutes:
 print "Filter %s -- flowroute %s, action %s, age %s" % (route.term,route.destination,route.action,route.age)
[...]

and handle them conveniently (printing them to standard output in this example):

 

user scripts $ python show-flow-routes.py
---- Total number of active flowroutes: 2 ----
Filter term:1 -- flowroute 198.51.100.1,*,proto=17,dstport=53, action traffic-rate:0:0, age 18:25
Filter term:2 -- flowroute 198.51.100.1,*,proto=17,dstport=123, action traffic-rate:0:0, age 29:16

In the end, this is very easy instrumentation for an external, Junos OS-agnostic orchestrator to scan current flow route existence, persistence and state. It is just focused on capturing data from R6, as the central route reflector and SDN controller, assuming that flow routes are propagated from there.

 

It could easily be expanded to issue the same Netconf RPCs to the R10 and R11 systems, but better than that, let's look directly into the __flowspec_default_inet__ filter composition, so that we implicitly check route validation and acceptance, and also gather the filter counters!

 

Monitoring BGP Flow Specification filters with Junos PyEz

 

Since we want to scan all ASBRs in the network (R10, R11) to verify __flowspec_default_inet__ filter compilation and gather counter results, this exercise now requires explicit Netconf RPCs to each of these devices. While R6 keeps full control of control-plane rule dissemination, the orchestrator just needs to minimally get in touch with the ASBRs to collect these details from the network perimeter:

 

 

[Figure: flow-spec-filter-inspection.jpg]

 

As explained in the previous chapter, this just requires enabling Netconf over SSH services on these border routers, as the interaction medium for our Junos PyEz automation concept:

 

root@R10> show configuration system services 
ftp;
ssh;
telnet;
netconf {
    ssh;
}

Once this is ensured to be enabled across the network, and making use of the same YAML operational tables and views as before to extract and present output data, this requirement can be covered in the following fashion:

 

  • Invoke a single Python file at the orchestrator to extract __flowspec_default_inet__ firewall filter counters and information using a YAML file that includes specific filter term details (operational table and view)
  • Parse YAML file variable values to extract key counter details (simply printing to standard output in this example)

 

[Figure: python-show-flow-filter.jpg]

The same previous YAML operational tables and views apply here, where loadyaml loads a YAML file such as:

 

# $Id$
# YAML widgets for BGP flow filter data extraction
# - FlowFilterTable extracts 'show firewall filter __flowspec_default_inet__'
# - FlowFilterView identifies the most relevant filter fields
#    including counter-name, packet-count and byte-count
---
FlowFilterTable:  
 rpc: get-firewall-filter-information
 args:
  filtername: __flowspec_default_inet__
 args_key: filtername
 item: filter-information/counter
 key: counter-name
 view: FlowFilterView

FlowFilterView:
 fields:
  name: counter-name
  packet_count: packet-count
  byte_count: byte-count

which corresponds to XPaths from the following RPC output:

 

root@R10> show firewall filter __flowspec_default_inet__ | display xml 
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1R2/junos">
    <firewall-information xmlns="http://xml.juniper.net/junos/14.1R2/junos-filter">
        <filter-information>
            <filter-name>__flowspec_default_inet__</filter-name>
            <counter>
                <counter-name>198.51.100.1,*,proto=17,dstport=123</counter-name>
                <packet-count>35</packet-count>
                <byte-count>2444</byte-count>
            </counter>
            <counter>
                <counter-name>198.51.100.1,*,proto=17,dstport=53</counter-name>
                <packet-count>55779</packet-count>
                <byte-count>4239204</byte-count>
            </counter>
        </filter-information>
    </firewall-information>
    <cli>
        <banner></banner>
    </cli>
</rpc-reply>

Note there is an additional factor to consider now: how can we obtain the connectivity information and details needed to reach each ASBR system (as there is no single RPC target anymore)?

I have taken the approach here of leveraging the csv Python library and storing these details in a separate local .csv file, keeping hostname (assuming DNS resolution works here), username and password per device on each line:

 

user scripts $ cat asbrs.csv 
hostname,user,password
R10,juniper,Clouds
R11,juniper,Clouds

There are only 2 ASBRs in this example, but a .csv file surely scales to a much higher number of routers in the field. This is obviously a flexible approach and can be adjusted at will (e.g. include the IPv4 management address if there is no direct hostname resolution).

 

The main Python script scans these details, stores them as elements of a Python list and uses them to connect to each ASBR. It then iterates through every list element, one per ASBR, captures its data with the same loadyaml usage and a sibling FlowFilterTable object, and extracts counter details from every filter term view tuple inside each of these objects, as sketched below.
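
A hedged sketch of that collection loop (the file names show-flow-filter.yml and asbrs.csv are illustrative assumptions; the table, view and field names match the YAML definitions above):

import csv
from jnpr.junos import Device
from jnpr.junos.factory import loadyaml

# Load the FlowFilterTable/FlowFilterView definitions shown above.
globals().update(loadyaml('show-flow-filter.yml'))

# Read the per-ASBR connectivity details from the local .csv file.
with open('asbrs.csv') as f:
    asbrs = list(csv.DictReader(f))

for asbr in asbrs:
    # Connect to each ASBR and retrieve its flowspec filter counters.
    dev = Device(host=asbr['hostname'], user=asbr['user'], passwd=asbr['password'])
    dev.open()
    flowfilter = FlowFilterTable(dev)
    flowfilter.get()
    print "---- Global BGP Flowspec filter for %s ----" % asbr['hostname']
    for counter in flowfilter:
        print "Counter %s -- Packets %s, Bytes %s" % \
            (counter.name, counter.packet_count, counter.byte_count)
    dev.close()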

 

When executing the Python file in the orchestrator, just printing to standard output for easy illustration purposes:

 

user scripts $ python show-flow-filter.py
---- Global BGP Flowspec filter for R10 ----
Counter 198.51.100.1,*,proto=17,dstport=123 -- Packets 27, Bytes 2052
Counter 198.51.100.1,*,proto=17,dstport=53 -- Packets 64, Bytes 4864
---- Global BGP Flowspec filter for R11 ----
Counter 198.51.100.1,*,proto=17,dstport=123 -- Packets 8, Bytes 608
Counter 198.51.100.1,*,proto=17,dstport=53 -- Packets 8, Bytes 608

With this simplistic approach, we can provide basic instrumentation to follow up on filter installation and verify how the DDoS attack is being effectively thwarted across the network.

 

A final remark to consider here about the Junos OS implementation: the __flowspec_default_inet__ firewall filter counter name (filter-information/counter/counter-name) uses exactly the same nomenclature as the inetflow.0 NLRI route description (route-table/rt/rt-destination). Because the same string appears in both the route description and the filter counter name, we could directly correlate route-to-filter implementation. I ended up not actually needing it, but this is surely very appealing to correlate ;-)
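
As a hedged illustration only (assuming a flowroutes table collected from R6 and a flowfilter table collected from one ASBR, as built above, with every route present on that ASBR):

[...]
# rt-destination and counter-name carry the same NLRI string, so each flow
# route seen on R6 can be looked up by key in the ASBR's filter table.
for route in flowroutes:
    counter = flowfilter[route.destination]
    print "Flow route %s matched %s packets on this ASBR" % (route.destination, counter.packet_count)
[...]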

 

 

Recommendations for BGP Flow Specification live rollout

 

As the final section of this article, and considering the wide span and implications of BGP FlowSpec, I would also like to provide some suggestions and heads-ups for a live rollout:

 

- PR734453 / PSN-2012-10-733: 2012-10 Security Bulletin: Junos: RPD crash when receiving BGP UPDATE wi...

 

  • Receipt of a BGP UPDATE message containing a crafted flow specification NLRI (RFC 5575) may cause RPD to crash.  
    The update creates an invalid inetflow prefix which causes the RPD process to allocate memory until it reaches its assigned memory limit.
    After trying to exceed the process memory limit, RPD will crash and restart.  
    The system recovers after the crash, however a constant stream of malformed updates could cause an extended outage.
  • This is fixed in 10.0S18, 10.4R11, 11.4R5, 12.1R3 and 12.2R1; RPD now logs an error when receiving the malformed update and avoids the memory allocation loop.

Note that this issue was actually behind the http://blog.cloudflare.com/todays-outage-post-mortem-82515 incident.

 

- PR1004575: The rpd process may crash with more than 65535 flow-spec routes

 

  • When there are more than 65535 flow-spec routes in the routing table, the rpd process might crash because this exceeds the currently supported scaling numbers, which are in the range of 10K~16K.
  • This issue may be seen if the following conditions are met:
    • BGP is enabled for Flow-Specification Routes
    • More than 65535 traffic flow specifications routes in the routing table (Including both routes learned from BGP and locally configured flow routes)
    • Flapping the flow routes
  • This is fixed in 12.3R8, 13.2R6, 13.3R4, 14.1R3 and 14.2R1, but note that current scaling numbers remain in the 10K~16K order.

 

- PR1047271: The rpd process might crash and dump core files after executing "show route table .inet...

 

  • If flow routes (a flow route is an aggregation of match conditions for IP packets) are active in the kernel, the rpd process might crash after executing the command "show route table <X>.inetflow.0 extensive".
  • This issue may be seen if the following conditions are met:
    • The flow routes are active in the kernel
    • Executing command “show route table <X>.inetflow.0 extensive” in that situation
  • This is fixed in 12.3R9, 13.1R1, 13.2R1 and 13.3R1

Beyond this specific example over Junosphere, please take these as considerations, fixed issues and guidelines for BGP Flow Specification testing and/or live deployment at scale.

 

Conclusions

 

BGP Flow Specification [RFC5575] provides an appealing and programmatic DDoS mitigation SDN framework for service provider and large enterprise networks. Even though this standard is eligible for certain extensions and implementation improvements, it already provides the necessary toolset for operators to deterministically enforce thwarting actions, even across networks, such as redirection to an L3VPN towards a scrubbing center or programming firewall filter actions at the input forwarding table level.

 

Furthermore, this application leverages the well-known BGP protocol as an SDN protocol, with its wide range of attributes and routing policy options, just using another address family and providing transit services across networks.

 

In this simplistic use case explained across both articles, my intention was to demonstrate that, with just a handful of lines of code using Junos PyEz (which would surely need refinement for a proper solution) and existing infrastructure based on Netconf services and a baseline BGP configuration, such an SDN scenario can easily be created, offering an external API to the orchestrator both for provisioning and monitoring purposes.

 

The DDoS mitigation orchestrator just needs to fill in the YAML file variables with values for every flow route to create or delete in the provisioning case, and simply execute the Python files for the monitoring examples. There is no need to know about Junos OS configuration schemes or details. And this could easily be integrated into a scrubbing center, with a portal or any other user interface, as an API.

 

Indirectly, I have also described my own Junos PyEz learning process with a basic virtual scenario in Junosphere. However, these same principles and guidelines could become the cornerstone of a proper DDoS thwarting solution.

 

We would love to hear feedback and share experiences about DDoS attack mitigation, given that this is a current trending topic and affects all of us in the networking industry, or to hear whether you have conceived, intend to learn, or plan to use Junos PyEz to build up a solution.

 

We want to improve and learn from these lessons! Please be our guest and chime in: drop your comments here or via the #TheRoutingChurn Twitter hashtag.

 

 
