/var/blog
Roll back or roll forward: How to tell if you are really looking for DevOps
10.09.17

A lot of companies look at what the major cloud and XaaS providers are doing with their operations and develop some form of cloud envy. They dream of a highly automated infrastructure capable of making lots of changes at a moment’s notice. Their CFOs envision a lean workforce capable of supporting thousands of devices with a small number of generalist resources. Indeed, the vision of a more efficient, more agile IT is quite enticing. 
 
In plotting their course to this operational nirvana, many people look first to the tools. They view the transition as primarily technical. Of course, these kinds of major transformations are rarely so simple.
 
Start with culture
There is a ton of writing out in the blogosphere on the role of culture in IT, and there is little need to offer a fresh take on a fairly noncontroversial point of view. So let me just assert that if you think that your company will execute a transition without accounting for cultural elements of change management, you are likely going to be in for a disappointing transition. 
 
In short, culture is going to matter, which means you need to spend time getting beyond the tooling to understand where your opportunities and bottlenecks are going to reside. 
 
Automation
I have spent a lot of time over the last 10 years talking to networking customers about automation and, over the last few years, about DevNetOps (the networking instantiation of DevOps). More often than not, when people come in wanting to talk DevOps, what they actually want is an automation discussion. What’s the difference?
 
I think of automation as primarily focused on workflows. You identify workflows, and then you take steps to make them execute automatically.
 
But automation is more than scripting. If your view of automation is that you are going to write a script to run a sequence of steps for you, chances are that what you have really built is a solid abstraction. Reducing 134 steps to a single command abstracts those 134 steps behind some set of inputs and executes them in sequence. This is a hugely useful thing to do, but it doesn’t get to the heart of automation.
 
Automation is more about collaboration. When two things (people, systems, organizations) need to work together, there is a requirement that they share information. Put simply, the currency of automation is data. And this means that automation is primarily an exercise in figuring out how to make two things talk (data distribution) and speak the same language (data normalization). This is why you see message buses and translation layers (API brokers) in so many automation discussions.
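
To make the distribution and normalization split concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two vendor payload shapes, the field names, the `interface.status` topic, and the in-process `MessageBus` class standing in for a real message bus. It is not any particular product's API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Data normalization: two hypothetical vendors report the same fact
# (interface state) with different field names and encodings.
def normalize_vendor_a(raw: dict) -> dict:
    return {
        "device": raw["hostname"],
        "interface": raw["ifName"],
        "up": raw["operStatus"] == "up",
    }

def normalize_vendor_b(raw: dict) -> dict:
    return {
        "device": raw["node"],
        "interface": raw["port"],
        "up": raw["link_state"] == 1,
    }

# Data distribution: a toy in-process publish/subscribe bus.
class MessageBus:
    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver the message to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(message)

bus = MessageBus()
received: List[dict] = []
bus.subscribe("interface.status", received.append)

# Both vendors end up on the same topic in the same shape.
bus.publish("interface.status", normalize_vendor_a(
    {"hostname": "edge1", "ifName": "ge-0/0/0", "operStatus": "down"}))
bus.publish("interface.status", normalize_vendor_b(
    {"node": "core2", "port": "eth1", "link_state": 1}))
```

The subscriber never needs to know which vendor produced a message. That is the point of putting normalization in front of distribution.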
 
DevOps
Where automation is about workflows, DevOps is more about treating infrastructure as code. It’s about how you make changes to your code (infrastructure), test those changes, and then deploy them. This is why you hear continuous integration and continuous delivery used in many DevOps discussions. 
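
As a rough sketch of that change, test, deploy loop, the Python below treats a desired interface configuration as plain data, runs checks against it, and only calls a deploy step if the checks pass. The config schema, the MTU bounds, the description rule, and the `deploy` callback are all hypothetical. A real pipeline would run checks like these in a CI system against configuration held in version control.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, dict]

def validate(config: Config) -> List[str]:
    """Return a list of human-readable errors (empty means the change passes)."""
    errors = []
    for name, iface in config.items():
        mtu = iface.get("mtu", 1500)
        # Illustrative bounds only; real limits depend on the platform.
        if not 576 <= mtu <= 9216:
            errors.append(f"{name}: mtu {mtu} out of range")
        # Illustrative policy: every enabled interface must be described.
        if iface.get("enabled") and not iface.get("description"):
            errors.append(f"{name}: enabled interface needs a description")
    return errors

def pipeline(config: Config, deploy: Callable[[Config], None]) -> Tuple[bool, List[str]]:
    """Test the proposed config, then deploy it only if every check passes."""
    errors = validate(config)
    if errors:
        return False, errors
    deploy(config)
    return True, []

deployed: List[Config] = []
proposed = {"ge-0/0/0": {"mtu": 9000, "enabled": True, "description": "uplink"}}
ok, errors = pipeline(proposed, deployed.append)
```

The design choice worth noticing is that the change is reviewed and tested as data before anything touches a device.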
 
For the vast majority of enterprises, moving straight from ITIL to DevNetOps is a bridge too far (and frequently an unnecessary objective). Most companies are just looking for ways to automate their infrastructure to some degree. And the evolution from CLI-driven to something more event-driven or even machine-driven is a fairly substantial step. 
 
When things go wrong…
I use a pretty simple litmus test when I talk to teams about their journey down these two paths.
 
Imagine you are working during a change window on a Saturday night. You make a bunch of changes to the network, and then you realize that things have gone horribly wrong. In that instance, is your instinct to hit Ctrl-Z and roll back? Or is your instinct to double down on the change and roll forward?
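
The two instincts can be sketched in a few lines of Python. This is a toy model under stated assumptions: device state is a dictionary, `health_check` is whatever test tells you things have gone wrong, and `fix` is a hypothetical follow-up change you would push if you chose to roll forward.

```python
import copy
from typing import Callable, Optional

def apply_change(
    state: dict,
    change: dict,
    health_check: Callable[[dict], bool],
    fix: Optional[dict] = None,
) -> str:
    # Take a snapshot first so that rolling back is always possible.
    snapshot = copy.deepcopy(state)
    state.update(change)
    if health_check(state):
        return "ok"

    # Roll forward: push a further change on top of the broken one.
    if fix is not None:
        state.update(fix)
        if health_check(state):
            return "rolled forward"

    # Roll back: restore exactly what was there before the window opened.
    state.clear()
    state.update(snapshot)
    return "rolled back"

def healthy(state: dict) -> bool:
    # Hypothetical health rule for the example.
    return state["mtu"] <= 9000

network = {"mtu": 1500}
outcome = apply_change(network, {"mtu": 9500}, healthy)
```

With no `fix` supplied, the only option left is the snapshot. A team that habitually supplies one is telling you something about how it thinks about failure.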
 
Filtering out failure
Philosophically, I think companies take one of two approaches to failure. The first is to filter it out. These are companies that have lengthy evaluation cycles. They stress test everything they do. When they make changes, they review those changes with a fairly heavy hand. And then they schedule strict change windows. These companies probably also have fairly draconian policies around changes outside the scheduled windows (including long lockdown periods around key dates, like end of year). 
 
In this mode of operation, the goal is to catch failures before they hit, under the basic premise that failure cannot be allowed to happen in a production environment. 
 
If something goes wrong and you have to change plans, the new plan has to go back through that same review process. Because the process is arduous and time-consuming, it’s better to roll back the change and take another run at the problem in some future change window.
 
Designing for resiliency
The counter approach to filtering out failure is to assume that failure is not just a possibility but a certainty. In this case, the best you can do is to build out a resilient infrastructure, where failure doesn’t necessarily spell doom. 
 
Netflix is perhaps the most famous of those with this approach. Their work around Chaos Monkey (and the broader Simian Army) basically introduces failures into their production environment under the premise that the infrastructure ought to be resilient. I don’t know how many companies would feel comfortable doing that to their own infrastructure, but it certainly gets the difference in design philosophy across.
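
To illustrate the idea only, and not Netflix's actual tooling, here is a small Python sketch of fault injection against a toy service. The `Service` class, the replica names, and the `inject_failure` helper are all made up for the example. The test being run is simply whether requests still get answered after a randomly chosen replica is removed.

```python
import random
from typing import Iterable

class Service:
    """A toy service backed by a set of interchangeable replicas."""

    def __init__(self, replicas: Iterable[str]) -> None:
        self.healthy = set(replicas)

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError("no healthy replicas")
        # Any healthy replica can serve; pick one deterministically.
        replica = sorted(self.healthy)[0]
        return f"{replica} served {request}"

def inject_failure(service: Service, rng: random.Random) -> str:
    """Take down one randomly chosen healthy replica and return its name."""
    victim = rng.choice(sorted(service.healthy))
    service.healthy.discard(victim)
    return victim

rng = random.Random()
service = Service(["replica-a", "replica-b", "replica-c"])
victim = inject_failure(service, rng)
response = service.handle("GET /")  # still answered by a surviving replica
```

A design that passes this kind of test repeatedly, in production, is one where the loss of any single component is a routine event and not an outage.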
 
It’s worth pointing out that neither approach is inherently right or wrong. Companies have different missions. For some companies, if their network goes down, people literally die. And for others, it simply means waiting another day to find out if Jon Snow is really dead. 
 
The bottom line
If you are in IT, it’s worth having a very honest conversation within your company about where your sensibilities are. It’s good to have ambition and to start to look at more advanced tooling. But you need to understand where you are from a people, process, and culture perspective as well. 
 
For many companies, the leap to full DevNetOps might be unnecessary. Automation might unlock most of the useful value, and do it in a way that doesn’t require a multi-year transformation to reap the rewards. 
 
Whatever your desired end state, it’s important to understand your starting position, and to plan accordingly. And a little precision around the vocabulary you use to engage your teams and your suppliers will go a long way to cutting through the BS and getting to the meaningful set of changes that will impact your business.
