2018 and the Dawn of Network Reliability Engineering (NRE)
Dec 20, 2017
Last week at its flagship customer event, NXTWORK, Juniper Networks set a valiant vision for its role in the future of networking: “Engineering. Simplicity.” Next week, as we take a respite from engineering and set some 2018 goals, simplicity sounds like a good one. Here are some of my ideas, inspired by engineering for simplicity and by Juniper’s new #simpleAF tag.
Da Vinci called simplicity the ultimate sophistication. It comes more easily to those solving more primitive challenges, but if you, like Juniper, audaciously tackle cloud-grade problems, then in such domains and at such scale simplicity is anything but simple to achieve.
The thing about creating simplicity is that it’s not just about tools or products. If you’re a network operator, another tool won’t make a revolutionary or lasting impact on simplifying your life, any more than the momentary joy of a holiday gift does. Big impacts and life changes start from the inside out. They don’t happen in the order have-do-be; rather, they follow be-do-have. Juniper is doing its own work to put simplicity at the core of its being as a company, but this article, besides some gratuitous predictions, is about transforming your network operations and putting simplicity at your core for a happier, more prosperous 2018.
Be… the NetOps Change
To be the engineer of network simplicity, it’s time to lose the title of network admin, operator or architect and embrace a new identity as a:
Network Reliability Engineer
Just like sysadmins have graduated from technicians to technologists as SREs, the NRE title is a declaration of a new culture and serves as the zenith for all that we do and have as engineers of network invincibility. And where could invincibility be more important than at the base of the infrastructure that connects everything?
Start with this bold title, and fake it until you make it, until it becomes who you truly are.
Do… It Like They Do on the DevOps Channel
Just like SREs describe their practices as DevOps, network reliability engineering embraces DevNetOps.
While Dev and (app) Ops work closely together atop cloud-native infrastructure like Kubernetes, the cluster SRE is the crucial ops role that creates operational simplicity by designing for separation of concerns. Similarly, the NRE can design for simplicity by offering an API contract for the network to its consumers (in fact, probably the IaaS and cluster SREs).
The lesson here is that boundaries are healthy. Separate concerns with APIs and SLA contracts to deliver simplicity to yourself as well as up the chain, whether that is to another overlay network or to another kind of orchestration system.
It’s important that your foundational network layers achieve simplicity and elegance too. Trying to put an abstraction layer around and on top of a hot mess is a gift to no one, least of all yourself if it’s your mess.
So how do we clean up the painful mess that is operating networks today? I’ve previously examined borrowing the good habits of SREs and DevOps pros, in the form of DevNetOps blog posts and SlideShare presentations. Here is a quick recap of good behaviors to move you from the naughty list to the nice list, along with my predictions for 2018.
1. Start with Networks as Code
Prediction: When people say the network CLI is dead, they jump straight to thinking about GUIs. For provisioning changes, though, you ought to start thinking about Eclipse Che or another IDE instead of product GUIs, and think about GUIs more for observability and management by metrics. Networks as code starts with good coding logistics. In the coming year, I think we’ll see the DevNetOpserati practice this simple truth and realize the harder one: networks as code works better on top of automated networking systems themselves. In other words, networking and config as code belong on top of SDN “intent-driven” systems, not in box-by-box configurations in your GitHub repo; nevertheless, box-by-box config in a repo may be a good step, depending on where you are in your journey.
2. Orchestrate Your Timeline as a Pipeline
Prediction: It follows from good coding habits that you adopt a review and testing process for continuous integration (CI). While vendors dabble in test automation services and simulation tools, I predict that 2018 will bring more focus on these, along with opinionated tools that orchestrate the CI process workflow and, eventually, continuous delivery/deployment (CD). This is whitespace for vendor offerings right now, and the task is ripe for truly upleveling operator simplicity with process innovation. While the industry talks about automation systems like event-driven infrastructure (EDI) and continuous response, mature DevOps tooling is ready and waiting for DevNetOps CI pipeline automation. Furthermore, pipeline automation is easier to codify or program, and it offers a higher reliability ROI.
3. Micro and Immutable Architecture
Prediction: 2017 was the year everyone went koo koo for Kubernetes and containers. They have mainstream adoption for many kinds of applications, but networking systems, from OSs to SDNs to VNFs, are all lagging behind on the curve of refactoring into containers. I’ve reported on how finer-grained architectures are the most felicitous for DevOps, DevNetOps, and the software and hardware transformation that networking needs in order to achieve the agility of CD. We must properly architect before we automate. We started to see containers hit some SDN systems in 2017, but I predict 2018 will be the year we start to see VNF waypoints as containers with multi-interface support in CNI, and a maturation of higher-order networking in the ruling orchestrator, Kubernetes.
4. Orchestrated CD from Your Pipeline
Prediction: I’m doubtful that we’ll hit this mark in 2018 in the area of DevNetOps, mostly because some of the prerequisites above are not yet in place. On the flip side, it’s obvious that in 2018 the network will play a big role in microservices CD, thanks to the rise of service meshes that do a much better job of canary and blue-green deployments. CD for actual networking systems is likely to be experimental in 2018, or bleeding edge for some SDN systems.
5. Resiliency Design and Drills
Prediction: This is the fun practice of chaos engineering: designing and automating around failures to stave off black-swan catastrophes. This design is already showing up in a preference for more, smaller boxes instead of fewer, larger boxes. Most enterprises are also getting around to SD-WAN to use hybrid WAN connectivity options simultaneously. There is more to do here in terms of tooling and testing drills in CI pipelines, as well as embracing the “sadistic” side of the NRE culture that kills things for fun to measure and plan for invincibility or automatic recovery. In 2018, I think the focus will remain on evolving availability designs until we make more progress in areas 2 and 3 above.
6. Continuous Measurement
Prediction: The SRE and NRE culture is one of management by metrics; thus, analytics are imperative. Telemetry and analytics are always progressing, and at Juniper we continue to push forward OpenNTI, JTI and AppFormix. 2018 beckons with the opportunity to do more with collection systems like Prometheus and AppFormix, employing narrow-AI deep learning for the first time to raise new insights beyond normal human observation.
7. Continuous Response
Prediction: With the 2017 rage around intent-based or intent-driven networking, there is sure to be progress in 2018. While the past few years have focused on EDI, I think the most useful EDI actions are largely poised to be subsumed into SDN systems with controllers in them, similar to how Kubernetes controllers continuously observe the actual state and correct it toward the declared state. There is, however, a tribe of NREs looking to automate across networking systems and other IT systems like incident management, ticketing and ChatOps tools. As I wrote in May, I think the maturing serverless, or FaaS, space will eventually win out as the right platform for these custom EDI actions.
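Items 1 and 2 argue for keeping network intent as code and gating every change through a CI review and testing process. As a rough illustration only, here is a Python sketch of a few pre-merge validation stages. The config schema (`vlans`, `interfaces`), the interface names and the check functions are all hypothetical; the only external fact relied on is that IEEE 802.1Q reserves VLAN IDs 0 and 4095, leaving 1 to 4094 usable.

```python
"""Pre-merge CI checks for network config kept as code (hypothetical sketch).

Assumption: intent lives in a repo as structured data (a dict here). The
schema and rules are illustrative examples of checks a pipeline could run
before any change reaches a device.
"""
from typing import Callable, Dict, List

Config = Dict[str, dict]
Check = Callable[[Config], List[str]]


def check_vlan_range(cfg: Config) -> List[str]:
    """IEEE 802.1Q reserves VLAN IDs 0 and 4095, leaving 1..4094 usable."""
    return [f"vlan {name}: id {vid} out of range"
            for name, vid in sorted(cfg.get("vlans", {}).items())
            if not 1 <= vid <= 4094]


def check_unique_vlan_ids(cfg: Config) -> List[str]:
    """Two named VLANs must not share an ID."""
    seen: Dict[int, str] = {}
    errors: List[str] = []
    for name, vid in sorted(cfg.get("vlans", {}).items()):
        if vid in seen:
            errors.append(f"vlan {name}: id {vid} already used by {seen[vid]}")
        else:
            seen[vid] = name
    return errors


def check_interface_refs(cfg: Config) -> List[str]:
    """Every interface must reference a VLAN defined in the same config."""
    vlans = cfg.get("vlans", {})
    return [f"interface {ifname}: unknown vlan {vlan}"
            for ifname, vlan in sorted(cfg.get("interfaces", {}).items())
            if vlan not in vlans]


# Pipeline stages, run in order.
CHECKS: List[Check] = [check_vlan_range, check_unique_vlan_ids, check_interface_refs]


def run_pipeline(cfg: Config, checks: List[Check] = CHECKS) -> List[str]:
    """Run every stage and collect all errors; an empty list means mergeable."""
    errors: List[str] = []
    for check in checks:
        errors.extend(check(cfg))
    return errors


if __name__ == "__main__":
    change = {"vlans": {"web": 10, "db": 10, "mgmt": 5000},
              "interfaces": {"ge-0/0/1": "voice"}}
    for err in run_pipeline(change):
        print("FAIL:", err)
```

Collecting every error rather than stopping at the first failure is a design choice: it gives a reviewer the full picture of a proposed change in a single pipeline run.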
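The resiliency drills in item 5 can start very small. This hypothetical Python sketch kills one randomly chosen link per trial in a toy topology and uses a breadth-first search to check whether every node is still reachable. The topology, the "all nodes reachable" invariant and the `chaos_drill` helper are assumptions for illustration; a real drill would act on a lab or production fabric and check SLO-level metrics instead.

```python
"""Toy chaos drill: kill random links, check the network stays connected.

Assumption: the topology is a hypothetical undirected graph given as a list
of links, and "every node reachable" stands in for a real SLO check.
"""
import random
from collections import deque
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def is_connected(nodes: Set[str], links: List[Link]) -> bool:
    """Breadth-first search from one node; True if every node is reached."""
    if not nodes:
        return True
    adjacency: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes


def chaos_drill(links: List[Link], trials: int, rng: random.Random) -> List[Link]:
    """Fail one random link per trial; return the failures that partitioned the graph."""
    nodes = {n for link in links for n in link}
    partitions: List[Link] = []
    for _ in range(trials):
        i = rng.randrange(len(links))          # pick the victim by index so that
        survivors = links[:i] + links[i + 1:]  # parallel links are removed one at a time
        if not is_connected(nodes, survivors):
            partitions.append(links[i])
    return partitions


if __name__ == "__main__":
    ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    chain = [("a", "b"), ("b", "c")]
    rng = random.Random(42)  # seeded so the drill is reproducible in CI
    print("ring partitions:", chaos_drill(ring, 10, rng))
    print("chain partitions:", chaos_drill(chain, 10, rng))
```

A ring survives any single link failure, so its drill reports no partitions, while every kill in a simple chain partitions it. Seeding `random.Random` keeps a drill repeatable when it runs as a CI stage.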
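To make item 7’s point about controllers concrete, here is a minimal Python sketch of a reconcile loop in the Kubernetes style: it diffs a declared state against an observed state and applies corrections until no drift remains. The dict-based state model, the interface names and the `plan`/`apply_actions`/`reconcile` helpers are hypothetical illustrations, not a Kubernetes, SDN or Juniper API; a real controller would watch an API server for intent and push changes to devices.

```python
"""Toy reconcile loop in the style of a Kubernetes controller.

Assumption: desired and observed state are plain dicts mapping hypothetical
interface names to an admin state. A real controller would read intent from
an API and observe state from devices.
"""
from typing import Dict, List, Tuple

State = Dict[str, str]
Action = Tuple[str, str, str]  # (verb, interface, value)


def plan(desired: State, observed: State) -> List[Action]:
    """Diff observed against desired and return corrective actions."""
    actions: List[Action] = []
    for name, want in sorted(desired.items()):
        have = observed.get(name)
        if have is None:
            actions.append(("create", name, want))
        elif have != want:
            actions.append(("update", name, want))
    for name in sorted(set(observed) - set(desired)):
        actions.append(("delete", name, observed[name]))
    return actions


def apply_actions(observed: State, actions: List[Action]) -> State:
    """Return a new state with actions applied (stand-in for device pushes)."""
    new_state = dict(observed)
    for verb, name, value in actions:
        if verb == "delete":
            new_state.pop(name, None)
        else:
            new_state[name] = value
    return new_state


def reconcile(desired: State, observed: State, max_rounds: int = 5) -> State:
    """Observe, diff and correct until no drift remains or rounds run out."""
    for _ in range(max_rounds):
        actions = plan(desired, observed)
        if not actions:
            break
        observed = apply_actions(observed, actions)
    return observed


if __name__ == "__main__":
    desired = {"ge-0/0/0": "up", "ge-0/0/1": "up"}
    observed = {"ge-0/0/0": "down", "ge-0/0/2": "up"}  # drifted from intent
    print(plan(desired, observed))
    print(reconcile(desired, observed))
```

Because the loop only ever compares declared state with observed state, the same code handles a missed event, a manual change on a box or a failed push: each shows up as drift on the next round. That is roughly why controller-based systems can subsume many hand-written EDI actions.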
Have… a Happier Holiday
Being an NRE is not just about doing DevNetOps behaviors or processes, nor is it about having the greatest tools or code. When you’re network reliability engineering for simplicity, it’s equally about what’s really important when you take the NRE cape off and go home: not getting called in, and enjoying a happier holiday. And so, simplicity is what I wish for you this holiday season; by the next one, may you be further down the road of engineering simplicity.