I would like some feedback on whether the VCF feature is the best option for what I am trying to accomplish.
I am setting up a WISP core network inside a datacenter. We have p2p fiber access to several buildings that are all backhauled into the datacenter. I have two QFX5100-48S switches that I would like to use as my core/edge units. My company acquired them through a partnership before my arrival and I am just trying to make the best use of them. I have attached a diagram that illustrates the overall core design and carrier handoffs. They are giving us 2 separate DIA handoffs and 2 VPLS handoffs via 2 4x10G LACP groups. My diagram shows the VPLS handoff as a 40G port coming from them, but their core routers only utilize 10G and 100G ports, so they will give us 10 total cross-connects. Ideally I would like the QFXs to provide redundancy in case the carrier loses one of their core routers or we lose a QFX. VCF is mainly used for spine/leaf and my design is nowhere near that. The QFXs will connect to the core Mikrotik firewalls that will NAT all customer traffic, to the mgmt switch, and to the mgmt firewall that I will use for remote access and server NAT.
Am I better off just letting the QFXs operate as individual routers with an iBGP mesh towards the Mikrotiks? Since I have 4 10G cross-connects for each VPLS, I could connect two to each carrier router and that would protect against losing one of my QFX units.
First of all: VCF will require at least 4 devices to work. What you can do with two units is Virtual Chassis (VC), which likewise presents one logical switch towards the outside.
Depending on your uptime requirements and potential failure points, I see two overall options:
1. Running the two QFX5100s as a VC, giving you one place to configure everything. That makes things much simpler from a configuration point of view. You can also build LACP groups across the two QFX5100s, which are only degraded if a member fails. It also removes the need for VRRP.
The drawback is that upgrades will potentially cause a short downtime, as you have to upgrade via NSSU, and my experience is that it's not necessarily hitless when the daemons are started again on the backup member after the upgrade. Also, if you do something wrong during a change, both members are affected.
2. Running the two QFX5100s as individual nodes. That will require VRRP, and you will have to make sure yourself that the two units stay in sync configuration-wise. In return, an error on one unit will not directly affect the other one.
This will also make it possible to do standalone upgrades of each unit with minimal downtime on the solution.
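To give an idea of what option 1 involves, here is a rough set-style sketch, not a tested configuration. The serial numbers, port numbers and the ae0 index are placeholders I made up, and the VC ports themselves still have to be set up separately before the two members will join. It only shows the preprovisioned two-member VC and one LACP bundle with links on both members:

```
# Placeholder serial numbers, replace with the real ones
set virtual-chassis preprovisioned
set virtual-chassis member 0 role routing-engine serial-number AAAA00000000
set virtual-chassis member 1 role routing-engine serial-number BBBB00000000

# One aggregated interface with two links on each member (hypothetical ports)
set chassis aggregated-devices ethernet device-count 1
set interfaces xe-0/0/0 ether-options 802.3ad ae0
set interfaces xe-0/0/1 ether-options 802.3ad ae0
set interfaces xe-1/0/0 ether-options 802.3ad ae0
set interfaces xe-1/0/1 ether-options 802.3ad ae0
set interfaces ae0 aggregated-ether-options lacp active
```

With the links spread like this, losing one member takes ae0 from four links down to two instead of dropping the bundle. The logical unit configuration on ae0 is left out here.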
I think I would go with option 2 if I had to choose.
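For the VRRP part of option 2, a minimal sketch could look like the following. The VLAN name and ID, the unit number and the 192.0.2.0/24 addressing are all assumptions for illustration (192.0.2.0/24 is a documentation range), so treat it as a starting point rather than a finished config:

```
# First QFX: higher priority, so it is normally the VRRP master (hypothetical values)
set vlans CORE-FW vlan-id 100
set vlans CORE-FW l3-interface irb.100
set interfaces irb unit 100 family inet address 192.0.2.2/24 vrrp-group 1 virtual-address 192.0.2.1
set interfaces irb unit 100 family inet address 192.0.2.2/24 vrrp-group 1 priority 200
set interfaces irb unit 100 family inet address 192.0.2.2/24 vrrp-group 1 preempt
set interfaces irb unit 100 family inet address 192.0.2.2/24 vrrp-group 1 accept-data

# Second QFX: same group and virtual address, lower priority
set vlans CORE-FW vlan-id 100
set vlans CORE-FW l3-interface irb.100
set interfaces irb unit 100 family inet address 192.0.2.3/24 vrrp-group 1 virtual-address 192.0.2.1
set interfaces irb unit 100 family inet address 192.0.2.3/24 vrrp-group 1 priority 100
set interfaces irb unit 100 family inet address 192.0.2.3/24 vrrp-group 1 accept-data
```

The downstream devices would then use 192.0.2.1 as their gateway. This is exactly the part you have to keep in sync by hand on both units.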
P.S. MC-LAG is also an option for switch-redundancy but I don't see it as something which you should consider in this use case.
Thanks for your reply. I am leaning towards option 2 as well, but that raises the question of how I can utilize both DIA/VPLS uplinks in a load-balancing fashion. Would I have to create two VRRP instances where one QFX is primary for DIA traffic and the other QFX is primary for VPLS traffic? Do I have other options beyond that?