Putting SDN in a Box

So when I started this blog, I said that I would tackle technology topics from a strategic perspective. In terms of technologies, is there a hotter topic in our industry than SDN? And is there anything more “strategic” than the ubiquitous 2x2 matrix? Methinks the answer to both is ‘no’.


Tons of people have written volumes of material on SDN, and I don’t really want to re-create many of those conversations, but I need to anchor on a couple of points. First, SDN is a tool. SDN by itself is not an end state but rather a means of accomplishing some useful objective. Second, one of the objectives of SDN is to deliver a dynamic, highly automated network.


Now when we talk about automating the network, what we are really saying is that we want to automate the set of tasks – or workflow – that drives how the network is provisioned, monitored, and troubleshot. The real question is whether or not SDN is the right tool for the job.


So how do we decide if SDN is the right tool?


As it turns out, there are lots of tools that execute workflow. Most of us use some or all of: CLI, scripts, NMS, OSS/BSS, or the increasingly popular SDN solutions. It’s not that any one of these is better or worse than others but rather they are all optimized to execute different types of workflow. The exact tool that you select depends on what you are trying to do.


Workflows are trying to solve two primary challenges: complexity and frequency of change. In this context, complexity is a function of how many lines or commands are applied across how many devices, and frequency of change is how often the workflow is executed. Using these axes, we can build out our strategic 2x2:



In this model, the lower left of our matrix includes simple, relatively static workflows like initial device configuration. The configuration is isolated to a single device, and it is typically done at deployment and infrequently updated thereafter. For these workflows, CLI is actually fine. Given the infrequency of execution, it might not even be worth automating in some environments.


As the changes become more complex, we start to see network management systems like Junos Space, OpenView, or Tivoli. These are useful when successful provisioning requires making changes to many disparate devices. Static class-of-service schemes are good examples here. COS is notoriously device-dependent from a configuration perspective, and network management systems provide a means of translating intent into the appropriate set of box-specific configuration statements.
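To make the "translating intent" idea concrete, here is a minimal sketch. It is not how any particular NMS works: the platform names (`vendor_a`, `vendor_b`), the configuration syntax, and every function name are hypothetical, invented only to show one intent fanning out into box-specific statements.

```python
from typing import Callable, Dict, List

# Hypothetical renderers: each turns one class-of-service intent entry into
# made-up, platform-specific configuration lines. The syntax is invented.
def render_vendor_a(cls_name: str, percent: int) -> List[str]:
    return [f"set qos class {cls_name} bandwidth {percent}"]

def render_vendor_b(cls_name: str, percent: int) -> List[str]:
    return [f"policy {cls_name}", f" bandwidth percent {percent}"]

RENDERERS: Dict[str, Callable[[str, int], List[str]]] = {
    "vendor_a": render_vendor_a,
    "vendor_b": render_vendor_b,
}

def translate_intent(intent: Dict[str, int],
                     inventory: Dict[str, str]) -> Dict[str, List[str]]:
    """Map one intent (class name -> bandwidth percent) onto every device."""
    configs: Dict[str, List[str]] = {}
    for device, platform in inventory.items():
        render = RENDERERS[platform]  # pick the box-specific dialect
        lines: List[str] = []
        for cls_name, percent in intent.items():
            lines.extend(render(cls_name, percent))
        configs[device] = lines
    return configs
```

The operator states the intent once; the per-platform renderers absorb the device dependence, which is the part that is painful to do by hand at the CLI.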


The lower right quadrant is for workflows that are relatively simple but that are fairly repetitive. Troubleshooting tasks, for example, might be frequently executed. Whenever some system or network event happens, you want to execute a set of commands to collect all the diagnostic information you can. This type of workflow can be captured in a script that is triggered by the event. The script then is the workflow tool. Other examples include edge port or VLAN provisioning in a datacenter.
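An event-triggered diagnostic script of the kind described above might look roughly like this. The event names, the command strings, and the `run_command` callable are all assumptions for illustration; in practice `run_command` would wrap whatever mechanism you already use to reach the device.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from event type to the commands worth capturing.
# The command strings are illustrative, not tied to a specific platform.
DIAGNOSTICS: Dict[str, List[str]] = {
    "link_down": ["show interfaces", "show log"],
    "high_cpu": ["show processes"],
}

def collect_diagnostics(event: str,
                        run_command: Callable[[str], str]) -> Dict[str, str]:
    """Run every command registered for the event and keep the output.

    run_command is injected so the same workflow can be exercised against
    a fake device in testing and a real session in production.
    """
    commands = DIAGNOSTICS.get(event, [])  # unknown events collect nothing
    return {cmd: run_command(cmd) for cmd in commands}
```

The workflow itself is simple (a short, fixed list of commands on one device); the value comes from running it every time the event fires without a human in the loop.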


So three of the four quadrants have workflow tools that exist today. (Note that I am not commenting on the efficacy of these tools, merely stating that they exist.) But what about workflows that are highly complex and that are executed with a high frequency? The CLI misses on both the complexity and the rate-of-change angles, scripts are not well-suited to handle complexity, and network management systems do not handle dynamic and short-lived changes very well. There is a need for a new type of workflow tool.
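The four quadrants can be written down as a small classifier. This is only a sketch of the 2x2 as described in this post: the thresholds are arbitrary placeholders, and scoring complexity as commands multiplied by devices is my simplifying assumption, not a formula from any standard.

```python
from dataclasses import dataclass

# Placeholder cut-offs; where "simple" ends and "frequent" begins will
# differ in every environment.
COMPLEXITY_THRESHOLD = 50   # commands x devices
FREQUENCY_THRESHOLD = 10    # executions per month

@dataclass(frozen=True)
class Workflow:
    name: str
    devices_touched: int
    commands_per_device: int
    runs_per_month: float

def suggest_tool(wf: Workflow) -> str:
    """Place a workflow in the 2x2 and return that quadrant's tool."""
    complexity = wf.devices_touched * wf.commands_per_device
    is_complex = complexity >= COMPLEXITY_THRESHOLD
    is_frequent = wf.runs_per_month >= FREQUENCY_THRESHOLD
    if is_complex and is_frequent:
        return "SDN"     # upper right: complex and dynamic
    if is_complex:
        return "NMS"     # upper left: complex but static
    if is_frequent:
        return "script"  # lower right: simple but repetitive
    return "CLI"         # lower left: simple and static
```

For example, a one-time initial configuration of a single device lands in the CLI quadrant, while a workflow that touches dozens of devices many times a month lands in the SDN quadrant.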


From a strategic perspective, this actually makes pretty good sense. We know that SDN initiatives and whole new companies are rising up. (Since I started typing this blog post out, I am pretty sure that at least 4 new companies have been founded with the words software-defined networking in their business plan.) The rise of new businesses only makes sense if there is a need that is currently unmet in the marketplace. Where needs are unmet, you get new opportunities, and where there are new opportunities, there is new money.


But to be clear here, SDN in this context is a tool that is solving a specific set of workflow problems. As companies race to add SDN capabilities to their portfolio, the question for discerning customers will be whether they think vendors understand the problem that needs to be solved or whether they think the buzz around specific technologies is driving the change. If a vendor has the technology right but the problem wrong, there is no guarantee that the tools will be applied correctly. Similarly, having the problem right but the solution wrong is unlikely to yield better results. Ultimately, networking vendors need to get both right.


We need to capture and truly understand which of these workflows are actually problems and how we want to solve them. Also realize that workflow is not one big problem that needs one big solution (this is where network management has always failed, due to scope creep). We should look at specific use cases through this lens to classify what problem we're trying to solve and which tools are best suited to solve it. In the end, complexity is solved through policy and abstraction, and the frequency of change you can support is increased by adding programmatic control. But even with both abstraction and automation, there has to be a place to do policy definition and enforcement. This is the key to making these systems truly operational. You can automate something, but it's important to define a sandbox in which it's allowed to play.


Good network design has always taught us that you must build networks the same way you build buildings: anticipate and contain failure modes. If you introduce programmatic control to speed up edge provisioning, you must add a corresponding sandbox. For example, if you're doing edge port provisioning, you shouldn't be able to accidentally misconfigure core-facing ports. This lets you add the functionality you need with enough policy control to make it actually supportable. It is also why you should break these workflow-based solutions into more discrete elements: doing so is what allows you to create solutions with a definite scope and failure domain.
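A sandbox of this kind can be as simple as a role check in front of the provisioning call. The sketch below assumes a hypothetical inventory that tags each port as `edge` or `core`; the port names, the exception, and the returned string (a stand-in for actually pushing the change) are all invented for illustration.

```python
from typing import Dict, FrozenSet

class SandboxViolation(Exception):
    """Raised when automation tries to touch a port outside its scope."""

# Hypothetical inventory: which role each port plays.
PORT_ROLES: Dict[str, str] = {
    "eth1": "edge",
    "eth2": "edge",
    "eth48": "core",
}

def provision_port(port: str, vlan: int, roles: Dict[str, str],
                   allowed_roles: FrozenSet[str] = frozenset({"edge"})) -> str:
    """Assign a VLAN to a port, but only if the port is inside the sandbox."""
    role = roles.get(port)  # unknown ports have no role and are rejected
    if role not in allowed_roles:
        raise SandboxViolation(f"{port} (role={role}) is outside the sandbox")
    # Placeholder for the real change; here we just describe it.
    return f"assign {port} to vlan {vlan}"
```

The edge-provisioning workflow can run as often as it likes, and the worst it can do is misconfigure an edge port; the core-facing ports sit outside its failure domain by construction.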
