07-09-2012 10:50 AM - edited 07-10-2012 12:31 AM
Hi -- has anyone come across this:
We have a spanning tree (VSTP) topology with 1 to 6 VLANs shared between around 20 EX2200 access switches and two Cisco 6509s serving as distribution switches (the VLANs are also trunked on the links between the 6509s).
The setup ran without problems on JUNOS 10.4R5.5. On Thursday a colleague upgraded some of the switches to 11.4R2.14 and STP got out of sync with the 6509s. Needless to say, absolutely no changes were made to the configs.
Symptoms: the EX2200 log fills with endlessly repeating entries like these:
Jul 9 13:48:32 ex2200-1 eswd[979]: ESWD_STP_STATE_CHANGE_INFO: STP state for interface ge-0/1/1.0 context id 2 changed from BLOCKING to LEARNING
Jul 9 13:48:32 ex2200-1 eswd[979]: ESWD_STP_STATE_CHANGE_INFO: STP state for interface ge-0/1/1.0 context id 3 changed from BLOCKING to LEARNING
Jul 9 13:48:33 ex2200-1 eswd[979]: ESWD_STP_STATE_CHANGE_INFO: STP state for interface ge-0/1/1.0 context id 2 changed from LEARNING to FORWARDING
Jul 9 13:48:33 ex2200-1 eswd[979]: ESWD_STP_STATE_CHANGE_INFO: STP state for interface ge-0/1/1.0 context id 3 changed from LEARNING to FORWARDING
Jul 9 13:48:34 ex2200-1 eswd[979]: ESWD_STP_STATE_CHANGE_INFO: STP state for interface ge-0/1/1.0 context id 2 changed from FORWARDING to BLOCKING
Jul 9 13:48:34 ex2200-1 eswd[979]: ESWD_STP_STATE_CHANGE_INFO: STP state for interface ge-0/1/1.0 context id 3 changed from FORWARDING to BLOCKING
If the following statement is added to the VSTP configuration:
set protocols vstp vlan all interface uplinks bpdu-timeout-action log
then the following message is logged as well:
Loop_Protect: Port ge-0/1/1.0: Received information expired on Loop Protect enabled port
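For anyone who wants to check a saved syslog file for the same pattern, here is a minimal Python sketch that counts STP state changes per interface and context id. It assumes the message format is exactly as in the lines quoted above; the function names and the threshold of 3 are only illustrative and not part of any Juniper tooling.

```python
import re
from collections import Counter

# Matches the ESWD_STP_STATE_CHANGE_INFO lines quoted in the post.
STATE_CHANGE = re.compile(
    r"ESWD_STP_STATE_CHANGE_INFO: STP state for interface (\S+) "
    r"context id (\d+) changed from (\w+) to (\w+)"
)


def count_transitions(lines):
    """Count state changes per (interface, context id)."""
    counts = Counter()
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            interface, context, _old, _new = match.groups()
            counts[(interface, int(context))] += 1
    return counts


def flapping(lines, threshold=3):
    """Return the (interface, context id) pairs with at least `threshold` changes."""
    counts = count_transitions(lines)
    return sorted(key for key, n in counts.items() if n >= threshold)


PREFIX = "Jul 9 13:48:3{} ex2200-1 eswd[979]: ESWD_STP_STATE_CHANGE_INFO: "
BODY = "STP state for interface ge-0/1/1.0 context id {} changed from {} to {}"
STEPS = [("BLOCKING", "LEARNING"), ("LEARNING", "FORWARDING"), ("FORWARDING", "BLOCKING")]

# Rebuild the six sample lines from the log excerpt (seconds 32, 33, 34).
SAMPLE = [
    PREFIX.format(second) + BODY.format(context, old, new)
    for second, (old, new) in zip((2, 3, 4), STEPS)
    for context in (2, 3)
]

if __name__ == "__main__":
    print(flapping(SAMPLE))
```

A port that cycles BLOCKING, LEARNING, FORWARDING and back within a few seconds shows up quickly with a low threshold; on a real log the threshold would need tuning against normal topology changes.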
The 2nd Cisco 6509 (configured as backup root bridge) shows an interesting state:
boa-ft-b#sh spanning-tree int gi2/8
Vlan Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
VLAN0255 Desg LRN 4 128.136 P2p Dispute
VLAN1733 Desg BLK 4 128.136 P2p Dispute
VLAN1734 Desg BLK 4 128.136 P2p Dispute
Researching what "Dispute" means for rapid-pvst on the Cisco side suggests that the Cisco box concludes, from the BPDUs it receives from the downstream EX2200, that the link to that EX2200 is effectively unidirectional (the EX2200 seems to ignore the BPDUs the 2nd 6509 sends). That matches what the EX2200 logs. We have confirmed that there is no UDLD problem underneath.
We've now been able to recreate that scenario in our test environment and have tested several code versions to see which JUNOS versions show this behaviour. Our results:
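If you have many uplinks to check, the Dispute flag can be picked out of saved `show spanning-tree interface` output with a few lines of Python. This is only a sketch and assumes the column layout is the one pasted above (VLAN name first, type flags last); `disputed_vlans` is a made-up helper name.

```python
def disputed_vlans(show_output):
    """Return the VLAN names whose Type column ends in 'Dispute'."""
    vlans = []
    for line in show_output.splitlines():
        fields = line.split()
        # Data rows start with a VLAN name such as VLAN0255; the header
        # row ("Vlan Role Sts ...") and the dashed rule are skipped.
        if fields and fields[0].startswith("VLAN") and fields[-1] == "Dispute":
            vlans.append(fields[0])
    return vlans


SAMPLE_OUTPUT = """\
Vlan Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
VLAN0255 Desg LRN 4 128.136 P2p Dispute
VLAN1733 Desg BLK 4 128.136 P2p Dispute
VLAN1734 Desg BLK 4 128.136 P2p Dispute
"""

if __name__ == "__main__":
    print(disputed_vlans(SAMPLE_OUTPUT))
```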
10.4R5.5 ok
10.4R9.2 ok
11.1R6.4 ok
11.2R7.4 error
11.4R2.14 error
11.4R3.7 error
12.1R2.9 error
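To narrow down where the behaviour changed, the results above can be sorted by release number. The Python sketch below does that and reports the newest release that worked and the oldest that failed. Note the assumption: sorting by number only approximates the order in which code changes landed, since maintenance releases of an older train can be built later than early releases of a newer one.

```python
import re

# Test results exactly as listed in the post.
RESULTS = {
    "10.4R5.5": "ok",
    "10.4R9.2": "ok",
    "11.1R6.4": "ok",
    "11.2R7.4": "error",
    "11.4R2.14": "error",
    "11.4R3.7": "error",
    "12.1R2.9": "error",
}


def parse_release(name):
    """Turn '11.4R2.14' into (11, 4, 2, 14) so releases sort numerically."""
    match = re.fullmatch(r"(\d+)\.(\d+)R(\d+)\.(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognised release name: {name}")
    return tuple(int(part) for part in match.groups())


def regression_window(results):
    """Return (newest working release, oldest failing release)."""
    good = [name for name, status in results.items() if status == "ok"]
    bad = [name for name, status in results.items() if status == "error"]
    last_ok = max(good, key=parse_release) if good else None
    first_error = min(bad, key=parse_release) if bad else None
    return last_ok, first_error


if __name__ == "__main__":
    print(regression_window(RESULTS))
```

With the table above this points at a change somewhere between 11.1R6.4 and 11.2R7.4.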
Our VSTP setup on the EX2200:
set protocols vstp bpdu-block-on-edge
set protocols vstp vlan all bridge-priority 48k
set protocols vstp vlan all interface all edge
set protocols vstp vlan all interface uplinks mode point-to-point
set protocols vstp vlan all interface uplinks bpdu-timeout-action log
(the interface range uplinks holds ge-0/1/0 (to root bridge) and ge-0/1/1 (to backup root bridge))
Our rapid-pvst setup on the 6509 non-root bridge:
spanning-tree mode rapid-pvst
no spanning-tree optimize bpdu transmission
spanning-tree extend system-id
! the root bridge has priority 4096
spanning-tree vlan 255,1733-1734 priority 8192
spanning-tree vlan 255,1733-1734 forward-time 6
spanning-tree vlan 255,1733-1734 max-age 6
interface GigabitEthernet2/8
description trunk to Juniper EX2200
switchport
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 255,1733,1734
switchport mode trunk
switchport nonegotiate
no cdp enable
spanning-tree link-type point-to-point
spanning-tree guard loop
Now our question:
We've already checked the release notes but could not find any hint of changed default values or new config settings we would have to apply, so currently we're at a loss as to how to fix the issue.
Any clues anyone?
Thank you in advance for any pointer on this.
Regards, Joachim
07-09-2012 05:06 PM
07-10-2012 04:19 AM
Hi Nikolay,
good idea -- however the results are
jbrauer@boa-ka-xyz> show spanning-tree interface ge-0/1/1 detail
Spanning tree interface parameters for VLAN 1733
Interface name : ge-0/1/1.0
Port identifier : 128.610
Designated port ID : 128.610
Port cost : 20000
Port state : Learning
Designated bridge ID : 50885.50:c5:8d:a9:0f:c1
Port role : Designated
Link type : Pt-Pt/NONEDGE
Boundary port : NA
Edge delay while expiry count : 708
Rcvd info while expiry count : 708
Spanning tree interface parameters for VLAN 1734
Interface name : ge-0/1/1.0
Port identifier : 128.610
Designated port ID : 128.610
Port cost : 20000
Port state : Learning
Designated bridge ID : 50886.50:c5:8d:a9:0f:c1
Port role : Designated
Link type : Pt-Pt/NONEDGE
Boundary port : NA
Edge delay while expiry count : 707
Rcvd info while expiry count : 707
I've even tried to replace the command
set protocols vstp vlan all interface all edge
with commands for the access ports, e.g.
set protocols vstp vlan all interface IA_YZ_PROJ_B_1733 edge
set protocols vstp vlan all interface IA_YZ_PROJ_C_1734 edge
set protocols vstp vlan all interface uplinks mode point-to-point
thus excluding the interface range "uplinks" from being specified as both edge and p2p, but to no avail.
So I think it might be time to open a JTAC case...
Thanks anyway for your help!
07-16-2012 08:39 AM
We're seeing something similar where we have Cisco 6504s running IOS 12.2(33)SXI in the distribution layer and Juniper EX4200s running Junos 12.1 in the access layer:
d-WQUAD-2#sho spann vlan 543
VLAN0543
Spanning tree enabled protocol rstp
Root ID Priority 8735
Address c89c.1d4d.18c0
Cost 5
Port 2 (GigabitEthernet1/2)
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Bridge ID Priority 33311 (priority 32768 sys-id-ext 543)
Address 0024.972f.a540
Hello Time 2 sec Max Age 20 sec Forward Delay 15 sec
Aging Time 14400
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi2/11 Desg BLK 4 128.139 P2p Dispute
On the Juniper switch the connected interface flaps back and forth between blocking and
forwarding:
khillig@WQUAD_L222_150-10> show spanning-tree interface ge-1/1/0.0 vlan-id 543 detail
Spanning tree interface parameters for VLAN 543
Interface name : ge-1/1/0.0
Port identifier : 128.617
Designated port ID : 128.139
Port cost : 20000
Port state : Blocking
Designated bridge ID : 12831.00:24:97:2f:a5:40
Port role : Alternate
Link type : Pt-Pt/NONEDGE
khillig@WQUAD_L222_150-10> show spanning-tree interface ge-1/1/0.0 vlan-id 543 detail
Spanning tree interface parameters for VLAN 543
Interface name : ge-1/1/0.0
Port identifier : 128.617
Designated port ID : 128.617
Port cost : 20000
Port state : Forwarding
Designated bridge ID : 33311.28:c0:da:40:40:01
Port role : Designated
Link type : Pt-Pt/EDGE
I thought this might be that we hadn't explicitly created VLAN1 on the Junipers -
something we've had to do in older JunOS versions to get LLDP to work properly -
but that's not the answer in our case. I'm still poking around, but if you come up with
the answer first please post it here!
07-16-2012 06:08 PM
This isn't an answer, but it may be helpful.
I was just messing with this over the weekend between a 3560 and 2200 running 12.1R2. Worked fine using vstp with the 3560 as the root. When I changed to make the 2200 become root, the 3560 got cut off because the 2200 blocks facing the 3560.
Poking through the KB pulled up KB18291. Hmm, and I also just found KB22111. Neither is the explicit answer, but they did give me something to try. After reading through the first article I tried setting up RSTP in parallel with VSTP. That got the port unblocked.
I also found that if the Cisco side was changed from rapid-pvst to pvst, it worked fine with just VSTP on the EX side.
That's as far as I got. I don't know what is happening on other VLANs since the trunk I was using only had VLAN 1 on it at the time. I was going to mess with it further but got sidetracked into something else after it was working again.
-Chad
07-17-2012 07:52 AM
07-17-2012 11:08 AM
11-27-2012 01:54 PM
Did you ever get a resolution?
12-05-2012 05:10 AM
There is no real resolution yet, but a workaround has been figured out...
Somehow this issue is related to DHCP snooping and/or dynamic ARP inspection. Removing them helped to prevent the looping/blocking.
I am not sure whether all of the following statements are involved here; however, at least the first one can cause it:
1) set ethernet-switching-options secure-access-port interface *uplinks* dhcp-trusted
2) set ethernet-switching-options secure-access-port vlan vlan_X arp-inspection
3) set ethernet-switching-options secure-access-port vlan vlan_X examine-dhcp
A PR for this should have been opened in August, but so far it does not seem to be visible in the PR Tracking Tool.
If you have the same issue you should contact support; it might help them to fix this...
12-28-2012 03:59 AM
A short update: we were told the issue will be fixed in the following Junos versions:
11.4R7 12.1R5 12.2R3 12.3R1
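To check whether a switch is already on a release that should carry the fix, a comparison against the list above can be sketched in Python as follows. It only knows the four trains named in this post; for any other train it returns `None` rather than guessing, and `has_fix` is just an illustrative name.

```python
import re

# First fixed revision per release train, as reported above.
FIXED_IN = {(11, 4): 7, (12, 1): 5, (12, 2): 3, (12, 3): 1}


def has_fix(release):
    """Return True or False for a listed train, None if the train is not listed."""
    match = re.match(r"(\d+)\.(\d+)R(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised release name: {release}")
    major, minor, revision = (int(part) for part in match.groups())
    first_fixed = FIXED_IN.get((major, minor))
    if first_fixed is None:
        return None  # train not mentioned in the post, so no claim is made
    return revision >= first_fixed


if __name__ == "__main__":
    for name in ("11.4R2.14", "11.4R3.7", "12.1R2.9", "11.2R7.4"):
        print(name, has_fix(name))
```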