A branch SRX (e.g. 240) is used in packet mode as a PE router and has customers attached. Customers may be connected to L3 interfaces or VLAN interfaces. If a customer introduces an L2 loop in their directly connected network, it kills the SRX. The serial console stays responsive, but all routing protocols and remote access are lost for as long as the loop persists, and so is service for all other customers attached to the same SRX.
Is there any way to prevent this? The ideal solution would be to rate-limit BUM traffic and generate a log message of some sort. Things I have tried unsuccessfully so far:
- DDoS protection (works only on MX)
- storm control (not supported)
- firewall filter matching on traffic-type (not supported)
- loopback (lo0) filter (cannot match on e.g. ARP)
bpdu-block helps only partially, in scenarios where the customer is on a VLAN interface and the looping device transmits or passes BPDUs.
I believe you need to rethink your PE-CE connectivity design and never allow more than one CE L2 port/switchport to be connected to the same PE in the same VLAN without LAG. Below are some arguments you may wish to consider:
- If the customer needs more than one switchport connected to the same PE in the same VLAN, what is the valid reason for it? Bandwidth increase? LAG can provide that. Redundancy? Connecting two CEs to the same PE, or a single CE with two parallel links to the same PE, does not help if the PE goes down.
- Next, assuming the customer does not care about, or does not pay for, PE redundancy:
-- then the "single CE with two parallel links to the same PE, same VLAN" scenario is a prime candidate for enabling LAG.
-- Finally, consider the scenario where two CEs are connected by two physical links to the same PE in the same VLAN. What is the valid reason for that? Does the customer need to route between CE[1,2] and the PE and switch between CE1 and CE2 at the same time? If the customer is able to introduce an L2 loop in this scenario, then they surely have a direct connection between CE1 and CE2, and I see no reason to connect more than one CE L2 port/switchport to the same PE in the same VLAN without LAG.
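For the "single CE with two parallel links" case, a LAG on the PE side could look roughly like the sketch below. The interface names (ge-0/0/2, ge-0/0/3, ae0) and the address are placeholders, and it assumes the platform and release support aggregated Ethernet with LACP and that the CE runs LACP on its end:

```
# Hypothetical member ports and address; adjust to the actual setup.
set chassis aggregated-devices ethernet device-count 1
set interfaces ge-0/0/2 gigether-options 802.3ad ae0
set interfaces ge-0/0/3 gigether-options 802.3ad ae0
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 unit 0 family inet address 192.0.2.1/30
```

With both links bundled into ae0, the two ports no longer act as independent paths in the same VLAN.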
Thanks for your comments. They are valid, but unfortunately they address a slightly different issue.
The SRX loop problem exists even if I assign the customer a single L3 router port. If the customer connects that port to a switch and accidentally loops the switch/VLAN, the SRX dies. So the problem does not require an L2 or VLAN interface (or several of them) on the SRX.
With the default config an MX (RE) will die too, but that can be fixed with a DDoS policer.
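As an illustration of what I mean on the MX, here is a minimal sketch for ARP only. The values 500 pps and burst 500 are arbitrary examples, not recommendations, and other looped traffic types would need their own protocol groups:

```
# MX only; example values, tune to the environment.
set system ddos-protection protocols arp aggregate bandwidth 500
set system ddos-protection protocols arp aggregate burst 500
```

Policer hits can then be checked with `show ddos-protection protocols arp statistics`.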