Hello, posting here in the run-up to opening cases left and right.
Last Friday I had the following issue occur, which we only resolved by taking the affected switch(es) out of the network.
We are in the process of upgrading our network from EX4200 to EX4300 and QFX5100 to accommodate our mixed infrastructure, with heavy production demand from our VMware platform. To do this I want to use the 5100s as our "core", using 2 Virtual Chassis configured independently as VCs in 2 of our 3 fire zones in the datacenter. Currently I only have a functional interconnect between core01 and core02 (the QFXs), and as leaves I have the EX4300s connected to each of them. Firezone 2 is only connected to firezone 2 and, between rooms, to each other as a daisy chain (for now).
Altogether there are 4 QFX5100s and 10 EX4300s interconnected with each other; until I sort out the issue with the QFXs, only as a daisy chain.
Now to the problem:
Initially it ran fine, with about 40 VMware hosts cross-connected to the switches (not every host to every switch, but to the switches in its respective racks/rooms). Last Friday, all of a sudden, the QFX in firezone 3 stopped passing IP traffic. Initially we suspected the switch was dead, but it was running, and with LLDP I could see its respective neighbors; however, anything connected to this switch was not reachable via IP. We rebooted the switch and looked at the logs, but apart from the known software bug that spams the logs full, there was nothing to see.
In the end we restored traffic by patching the interconnect directly onto the EX4300s in either firezone and completely excluding the QFXs from the switched network.
As we are planning on using VXLAN with the QFXs, we have no STP of any form. Resolving the redundancy is an open to-do, but that is why it is not configured.
I (Systems Engineering) and our Backbone team differ in opinion on the cause, effect and resolution here, so I want to see if this sounds familiar to anyone.