QFX Series switches? Reflections on changes to our switching system
Jul 23, 2015
Like you, I am a reader of these IT blogs, trying to pay attention to publications that can improve my professional environment (I must admit I also read them out of personal curiosity!). In this respect, the network architecture team I run has recently faced a number of recurring challenges:
How do we transition to a new organizational structure for our datacenters, especially in terms of switching and the progressive, large-scale commissioning of 10 Gbps interfaces on connected equipment?
This post is based on those two points, and I hope that my small contribution will help you think about switching evolutions in your technical environments.
Changing the switching in our information system
Our functional model is a traditional “top-of-rack” structure. Speeds are satisfactory for the traffic we have to handle, and aggregated links let us gradually work around the limitations imposed on us by an overly rigid spanning tree.
In spite of these points, the change in network traffic related to virtualization (massively East-West) and the expected standardization of 10 Gbps solutions for all servers prompted us to plan ahead for next-generation switching in the datacenter.
The selection criteria were simple:
Retain 1 Gbps distribution capability to integrate existing equipment
Be able to accommodate growth to 10 Gbps and beyond
Simplify day-to-day administration
Incorporate other current ideas, such as information systems automation and orchestration
The solution therefore needs to enable the reconstruction of an “enterprise-class data center”: flexible enough to comply with the specifications, and scalable enough to be used over the next few years without rethinking everything from the ground up.
Let’s not beat about the bush! We only approached and shortlisted a limited number of suppliers, and I can imagine you know them all.
Thanks to them, we were able to progress gradually, driven both by the constraints I have just mentioned and by promising functionalities that were then at the roadmap (or beta-test) stage:
The first EX4300 switches were integrated using the model we were already familiar with: Virtual Chassis (VC) top-of-rack stacks connected to the core of the existing LAN, simplifying cabling at the same time. The number of uplinks, for example, was reduced by systematically using 10 Gbps uplinks per VC instead of 1 Gbps aggregated links on each switch.
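To make this concrete, here is a minimal, hypothetical sketch of the kind of Junos configuration involved: a preprovisioned EX4300 Virtual Chassis with a single aggregated 10 Gbps uplink. The serial numbers, interface names and the ae0 bundle are placeholders, not our actual configuration.

```
# Hypothetical EX4300 Virtual Chassis sketch (serial numbers are placeholders)
set virtual-chassis preprovisioned
set virtual-chassis member 0 serial-number PE0000000001 role routing-engine
set virtual-chassis member 1 serial-number PE0000000002 role routing-engine
set virtual-chassis member 2 serial-number PE0000000003 role line-card

# One aggregated 10 Gbps uplink for the whole VC,
# instead of 1 Gbps aggregated links on each switch
set chassis aggregated-devices ethernet device-count 1
set interfaces xe-0/2/0 ether-options 802.3ad ae0
set interfaces xe-1/2/0 ether-options 802.3ad ae0
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 unit 0 family ethernet-switching interface-mode trunk vlan members all
```

With the VC preprovisioned, adding or replacing a member switch is a matter of matching a serial number to a role, which is part of what simplified day-to-day administration for us.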
Integration of the QFX devices is a gradual process. Initially they are being used in “Distribution/Aggregation” switching before taking on a more important role, with dedicated Layer 3 functions and more advanced Layer 2 use (802.1ad, L2/L3 tunnels, VXLAN transport, etc.)
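As an illustration of the VXLAN transport mentioned above, a hypothetical QFX fragment mapping a VLAN to a VXLAN VNI. The names, IDs and loopback source are placeholders, and a real deployment also needs remote VTEPs or an EVPN control plane on top of this.

```
# Hypothetical QFX sketch: stretching VLAN v100 over VXLAN (IDs are placeholders)
set vlans v100 vlan-id 100
set vlans v100 vxlan vni 5100
set switch-options vtep-source-interface lo0.0
```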
Implementing these more advanced functions reflects our desire to simplify the switching layers, since using these solutions results in a simpler model.
Moreover, the economic model has been improved rather than disrupted:
We are still using unit service costs for “standalone” gigabit links (no need for a QFX; the EX can run standalone!)
We can now offer gigabit and 10 Gbps links that are either standalone for dedicated projects, or shared and integrated into the VC.
I would also like to share two elements that played a part in the final decision: automation of configurations and the operational scalability of the solution.
By automation, I mean integration with the solutions used for orchestration, management or integration of new switches into the existing stacks.
I was particularly conscious of templating capabilities: for example, the ability to handle VLAN tags in a uniform manner, from the core of the LAN down to the switch ports connecting the hypervisors.
Integrating a new hypervisor comes down to one question: which virtualization cluster will this server be connected to? All the rest of the configuration follows from the answer.
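To illustrate that idea, here is a sketch in Python using only the standard library’s string.Template. The cluster table, VLAN IDs and Junos-style “set” lines are illustrative assumptions, not our actual templates: the point is that naming the cluster is enough to derive the whole port configuration.

```python
from string import Template

# Hypothetical per-cluster parameters: which VLAN tags a hypervisor port must carry
CLUSTERS = {
    "vmware-prod": {"vlans": "110 120 130"},
    "vmware-lab": {"vlans": "210 220"},
}

# Junos-style "set" lines for the switch port facing a new hypervisor
PORT_TEMPLATE = Template(
    "set interfaces $port description $host\n"
    "set interfaces $port unit 0 family ethernet-switching interface-mode trunk\n"
    "set interfaces $port unit 0 family ethernet-switching vlan members [ $vlans ]\n"
)

def port_config(host: str, port: str, cluster: str) -> str:
    """Answering 'which cluster?' is enough to derive the whole port configuration."""
    return PORT_TEMPLATE.substitute(host=host, port=port, vlans=CLUSTERS[cluster]["vlans"])

print(port_config("esx-42", "xe-0/0/10", "vmware-prod"))
```

In production this kind of template lives in the orchestration tooling rather than in an ad-hoc script, but the principle is the same: one input, a fully determined configuration.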
Obviously, the ability to exchange with and configure equipment using orchestrators, or even direct requests (XML, batches, web services, etc.), was something we had to take into consideration, as it meets internal expectations for the deployment of new IT solutions.
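As a sketch of what such a request can look like, the following Python builds a Junos-style XML configuration fragment for a new VLAN using only the standard library. The VLAN name and ID are placeholders; a real orchestrator would wrap this in an edit-config RPC and send it over a NETCONF session.

```python
import xml.etree.ElementTree as ET

def vlan_request(name: str, vlan_id: int) -> str:
    """Build a Junos-style <configuration> fragment declaring one VLAN."""
    conf = ET.Element("configuration")
    vlans = ET.SubElement(conf, "vlans")
    vlan = ET.SubElement(vlans, "vlan")
    ET.SubElement(vlan, "name").text = name
    ET.SubElement(vlan, "vlan-id").text = str(vlan_id)
    return ET.tostring(conf, encoding="unicode")

print(vlan_request("v100", 100))
# → <configuration><vlans><vlan><name>v100</name><vlan-id>100</vlan-id></vlan></vlans></configuration>
```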
The operational scalability of the solution is already in evidence with the integration of a fabric in the data centers.
A single administration point for the connections in a datacenter aisle, and advanced upgrade capabilities without interruption (or, initially, with minimal downtime): these are ideas at the implementation phase that take nothing away from the technical choices and are prompting increasing interest in the solution!
In the end?
My team and I took quite a gamble in adopting this solution, since none of us had advanced knowledge of the Junos operating system, but the deployment went smoothly, without obstacles or incidents.
Looking back after a few months, everything is going well and the range was a great discovery from every point of view!