/var/blog
Organizational risk of next-generation networking
08.20.17

As the emphasis in networking shifts from device management to over-the-top integrations, driven by the move towards automation and DevOps, our network engineers will spend an increasing percentage of their time working above the devices that form the foundation of the systems over which they have dominion. This means that engineers will have to add to their already broad skillsets.
 
Because of the increased emphasis on these layers of integration and automation, there will be an emerging class of network engineers whose dominant skills are not in the plumbing but rather in the operational frameworks that make the plumbing work.
 
And therein lies the risk.
 
The educational problem with incrementalism
Networking has been a discipline of incremental evolution since its inception. With very rare exceptions, the technologies that we add to networking are in addition to the existing technologies that make things work. And because of the extreme importance of backward compatibility, it’s sometimes even impractical to remove things.
 
While there are architectural impacts of such an evolutionary path, I want to focus more on the people impacts. If you are a network engineer, the very practical implication here is that you have to be a master in everything you have learned to date, and then you must continually add new skills as technology gets layered on top.
 
Minimally, this means that the burden to stay current is constantly increasing. Not only do engineers have to brush up on their existing skills, but they must dedicate new time each year to learning and putting into practice the new things that have recently been introduced. 
 
If you assume that an engineer (or an organization of engineers) has a finite amount of time, and that keeping the lights on takes up some base chunk of that time, it means that the remaining time is getting spread thinner and thinner amongst the topics that matter.
 
Organizational impacts of a learning bottleneck
While this can wreak havoc on a network engineer’s time, the cumulative effect over a company’s networking team is also worth understanding. As companies evaluate new technologies and develop strategies to adopt all of the new shiny, shinier, and shiniest things, if there is not sufficient discretionary time within the engineering base, these technologies will languish. 
 
Imagine a C-level executive responding to a conference keynote or airport advertisement that is peddling the latest technology. She pushes on her Vice Presidents to see where the organization is. If the underlying teams do not have enough time dedicated to learning, they will have to somewhat disappointingly respond that they are nowhere with whatever the new thing is. 
 
A systemic change in behavior
This will drive a systemic change in how people spend their time, especially in the upcoming crop of infrastructure engineers. Because the organizational gaps in understanding will necessarily be in the new areas, newer engineers will find that they are more valued if they focus on newer things. For example, who has a better chance of landing a great job: someone who is excellent in BGP or someone who has relevant experience with Terraform?
 
This dynamic will be reinforced as the purveyors of new things flood the market with education opportunities. We are nascent in the cloud space, but we are already starting to see lots of training courses around the various cloud ecosystems. These will further eat into precious skills development time, and with good reason.
 
So as the years go on, what is a network engineer to do? For every hour spent reviewing relevant foundational skills, there is one hour less to become a force in the new stuff. And that means that the newer employees will get to work on the sexier things while the individual infrastructure person gets more and more cast into her traditional role.
 
A real impact on certifications
This will actually have a very real impact on the nature of certifications. As the market fractures along a variety of new skills lines, the value of any individual certification will go down. And this will further reinforce the movement from the old to the new. 
 
If a certification is no longer valued by management, will individuals pour in the hours required to cert and re-cert? Perhaps at first, but this will undeniably change behavior in the aggregate, leaving the basic set of foundational skills that already exist to wither a bit on the vine.
 
These skills don’t matter as much, until they do
Of course, people will talk about how understanding some of the more esoteric parts of a particular protocol or technology implementation is increasingly less important. Indeed, as we move towards abstracted control through both centralized controllers and declarative config (or intent-based networking if you want to sound more cool), the details will be called upon less frequently. 
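The declarative model mentioned above can be made concrete with a small sketch: the operator states desired state, and a reconciler computes the changes needed to converge, rather than issuing device commands imperatively. Everything here (the VLAN structures, the action names) is invented for illustration, not any vendor's API.

```python
# Hypothetical sketch of declarative config: the system computes
# "desired state minus actual state" and derives the actions itself.
# The data shapes and action names below are illustrative only.

desired = {"vlan10": {"ports": {"eth1", "eth2"}}, "vlan20": {"ports": {"eth3"}}}
actual  = {"vlan10": {"ports": {"eth1"}},         "vlan30": {"ports": {"eth4"}}}

def reconcile(desired, actual):
    """Yield the changes required to move `actual` toward `desired`."""
    for vlan, cfg in desired.items():
        if vlan not in actual:
            yield ("create", vlan, cfg["ports"])
        else:
            to_add = cfg["ports"] - actual[vlan]["ports"]
            to_del = actual[vlan]["ports"] - cfg["ports"]
            if to_add:
                yield ("add_ports", vlan, to_add)
            if to_del:
                yield ("remove_ports", vlan, to_del)
    for vlan in actual:
        if vlan not in desired:
            yield ("delete", vlan, set())

# The operator never wrote these steps; the reconciler derived them.
plan = sorted(reconcile(desired, actual), key=lambda a: a[:2])
```

The point of the abstraction is exactly what the paragraph above warns about: the derived plan is correct only as long as the engineer can still reason about what it does underneath.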
 
But we ought not equate frequency with importance. 
 
If the major AWS S3 outages have taught us anything, they should have at least taught us this: new or even outsourced does not mean you can ignore solid architecture and understanding. Those who pawn off the details to “the cloud” (in quotes here as I mean it metaphorically more than explicitly) will find that they are ill-equipped to respond when things go wrong. And things always go wrong, even if only from time to time.
 
What should enterprises do?
It’s important to value the new. It’s important to bring in new skills. But this cannot happen at the expense of solid, on-the-ground understanding of what is actually going on. Visualization tools are fantastic, but they offer evidence that can only be used if someone is capable of matching the patterns and determining both the cause and the remediation. Abstractions are enterprise-changing, but they can mask underlying behavior that needs to be understood at times. And network engineers need to be particularly cognizant of leaky abstractions (finding the right balance between unnecessary control and not enough visibility might prove fatal in some networks).
 
Clever enterprises will insist that next-generation engineers spend time learning the foundational skills from the current generation. But this simply adds additional learning burden, reinforcing the learning bottleneck. So what has to happen?
 
The role of vendors
Ultimately, this dynamic ought to force design behavior within the vendor community. The same emphasis that has been put on new technology development will need to be put on workflows, particularly for diagnosing and remediating issues. It’s not enough to throw off syslog messages and hope that someone can correlate those across the infrastructure to figure out what to do. 
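To illustrate why correlation is the hard part of that workflow: a tool has to group messages from many devices into one probable event, not just display them. This toy sketch (the message tuples and the 5-second window are invented) clusters syslog entries that land close together in time, which is the inference a human is otherwise left to make by hand.

```python
from datetime import datetime, timedelta

# Toy sketch: cluster syslog messages from different devices into
# candidate "events" when they occur within a short window of one
# another. The messages and the window size are invented examples.
logs = [
    (datetime(2017, 8, 20, 3, 14, 1), "core-sw1", "OSPF neighbor down"),
    (datetime(2017, 8, 20, 3, 14, 3), "core-sw2", "BGP session reset"),
    (datetime(2017, 8, 20, 3, 14, 4), "edge-rtr1", "interface flap eth0/1"),
    (datetime(2017, 8, 20, 3, 22, 9), "edge-rtr2", "fan speed warning"),
]

def correlate(logs, window=timedelta(seconds=5)):
    """Group time-sorted messages into clusters separated by > `window`."""
    clusters, current = [], []
    for entry in sorted(logs):
        if current and entry[0] - current[-1][0] > window:
            clusters.append(current)
            current = []
        current.append(entry)
    if current:
        clusters.append(current)
    return clusters

events = correlate(logs)
# The first cluster spans three devices: a hint of one underlying
# failure, which is the kind of cross-device inference vendors
# currently leave to the operator.
```

Real correlation is far harder than this (topology awareness, clock skew, message semantics), which is precisely why it belongs in vendor workflows rather than in an engineer's head at 3 a.m.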
 
And understanding how to handle these workflows means thinking like a user, not a vendor. Solving for perfect-case scenarios (homogeneous environments, for example) is to ignore the reality that people deal with. And providing tooling that helps with easily detectable things is nice but not meaningful. A link is down? That’s easy. Some latency threshold is surpassed? Yawn. It’s the intermittent, difficult-to-replicate (and typically soft) failures that need attention. Gradual degradation is just as bad as a complete failure, and arguably worse: if all of those lessons on redundancy paid off, hard failures get absorbed while slow degradation quietly erodes the service.
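The soft-failure point can be sketched in a few lines: a static threshold misses gradual degradation, while comparing recent samples against a rolling baseline flags the drift early. The latency figures, the 100 ms alarm level, and the baseline rule below are all assumptions for illustration, not any product's method.

```python
from statistics import mean, stdev

# Toy sketch: latency samples (ms) drift upward slowly. The static
# alarm (an invented 100 ms level) never fires, while a baseline
# comparison catches the degradation much earlier.
samples = [10, 11, 10, 12, 11, 10, 14, 17, 21, 26, 32, 39]
STATIC_THRESHOLD_MS = 100

def drifting(samples, baseline_n=6, k=3.0):
    """True if any later sample exceeds baseline mean + k * stdev."""
    baseline = samples[:baseline_n]
    limit = mean(baseline) + k * stdev(baseline)
    return any(s > limit for s in samples[baseline_n:])

static_alarm = any(s > STATIC_THRESHOLD_MS for s in samples)
baseline_alarm = drifting(samples)
```

The asymmetry is the point: the easy check stays quiet while the service quietly gets worse, and only the baseline-relative check surfaces the soft failure.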
 
The bottom line
We need to be reinforcing the foundational elements of networking. And we need to do it through a combination of continued learning across the ranks and a renewed focus on real operational innovation.
 
If we continue to view trends like automation as primarily a means of improving provisioning (one more ZTP press release and I am going to lose it), we will be missing the forest for the bowling balls (i.e., looking in the wrong place entirely). And with the rise of controller architectures, streaming telemetry, and event-driven automation, we might actually be in a position to make some progress here.
