04-02-2012 08:07 AM
Hello guys,
Our company owns a couple of ERX-310s running the following version: Version: 8.2.4 patch-0.18 [BuildId 11363].
We did some throughput testing on them with EXFO devices and noticed ~1% drops. The total switch fabric bandwidth utilization is far below 10 Gb/s. However, the show fabric-queue command shows the following when the network is in "idle" mode:
clear fabric
ERX3#show fabric-queue
traffic     egress           forwarded  forwarded     dropped dropped
class       slot   type      packets    bytes         packets bytes
----------- ------ --------- ---------- ------------- ------- ----------
best-effort all    committed 644775     502592569     7       3016
best-effort all    conformed 0          0             0       0
best-effort all    exceeded  0          0             0       0
ERX3#sleep 9
Please wait....
ERX3#show fabric-queue
traffic     egress           forwarded  forwarded     dropped dropped
class       slot   type      packets    bytes         packets bytes
----------- ------ --------- ---------- ------------- ------- ----------
best-effort all    committed 2293120    1790155211    14      11901
best-effort all    conformed 0          0             0       0
best-effort all    exceeded  0          0             0       0
ERX3L#sleep 9
Please wait....
ERX3#show fabric-queue
traffic     egress           forwarded  forwarded     dropped dropped
class       slot   type      packets    bytes         packets bytes
----------- ------ --------- ---------- ------------- ------- ----------
best-effort all    committed 3926284    3079770141    6       3536
best-effort all    conformed 0          0             0       0
best-effort all    exceeded  0          0             0       0
#show fabric-queue detail
traffic     egress           forwarded  forwarded     dropped dropped
class       slot   type      packets    bytes         packets bytes
----------- ------ --------- ---------- ------------- ------- ----------
best-effort 1      committed 1663807634 1405200338040 2814184 2380074511
best-effort 1      conformed 176205     123402547     147     69361
best-effort 1      exceeded  0          0             0       0
best-effort 2      committed 377073267  212249576022  20      15368
best-effort 2      conformed 5395647    7877781029    0       0
best-effort 2      exceeded  0          0             0       0
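To put these counters in proportion, the dropped packets can be expressed as a share of everything offered to each queue. This is only a rough Python sketch: it assumes that offered packets equal forwarded plus dropped (the output itself does not say how the ERX counts this), and `drop_percent` is a hypothetical helper name, not an ERX feature.

```python
# Rows copied from the 'show fabric-queue detail' output above:
# (egress slot, type, forwarded packets, dropped packets)
ROWS = [
    (1, "committed", 1663807634, 2814184),
    (1, "conformed", 176205, 147),
    (2, "committed", 377073267, 20),
    (2, "conformed", 5395647, 0),
]


def drop_percent(forwarded, dropped):
    """Dropped packets as a percentage of forwarded + dropped.

    Assumption: offered = forwarded + dropped.
    """
    offered = forwarded + dropped
    if offered == 0:
        return 0.0
    return 100.0 * dropped / offered


for slot, qtype, fwd, drop in ROWS:
    print(f"slot {slot} {qtype:<9} {drop_percent(fwd, drop):.4f}%")
```

For the slot 1 committed row this comes out at roughly 0.17%. The counters are cumulative, so that figure is an average since they were last cleared, not a peak-time rate.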
It seems that the switch fabric is dropping packets. Why could that be? The device is using the best-effort setting.
Also, the uptime on the device is ~850 days.
I noticed that traffic processed on slot 1 is experiencing very heavy drops; users can easily notice it.
Any ideas?
04-10-2012 05:49 AM
The drop counts displayed by the 'show fabric-queue detail' command don't necessarily indicate a problem.
The drop count is calculated by comparing the number of packets sent by one slot with the number of packets received by the other slot. With heavy or bursty traffic between the cards, the command may report drops all the time because some packets are "in flight" and have yet to be acknowledged by the receiving card.
As long as the drop count isn't steadily incrementing, it is not a problem. You may see the drop count decrementing or going to zero, which means those 'in-flight' packets were received.
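As a rough illustration of that rule, here is a small Python sketch that checks whether a series of drop-count readings only ever goes up. The sample values are the three dropped-packet readings from the first post (7, 14, 6); the function name and the "strictly increasing" test are assumptions made for this example, not something the ERX provides.

```python
def is_steadily_incrementing(samples):
    """Return True only if every reading is larger than the previous one.

    With fewer than two readings there is no trend to judge, so return False.
    """
    if len(samples) < 2:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Dropped-packet readings taken 9 seconds apart in the first post.
readings = [7, 14, 6]
print(is_steadily_incrementing(readings))
```

For these readings the check is False, because the count fell from 14 to 6 between the second and third sample.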
However, incrementing drops seen in the fabric may be indicative of a problem.
Other common causes are an oversubscribed egress card or a shaped/rate-limited egress port, either of which can backpressure the fabric and result in drops.
If none of the above applies, further investigation may be needed.
04-10-2012 08:48 AM - edited 04-10-2012 08:52 AM
Hello, thanks for the reply.
Most of the drops occur at peak time. I know about these "in flight" drops, but the drop count, especially at peak time, increases very fast with only very small decreases (which, as you mentioned, are the "in flight" packets), so it keeps increasing. The ERX has two user slots, one with 8 ports and one with 2. The traffic we have a problem with is on slot 1, where 6 of the 8 ports are forwarding, at about 50%. That slot has 2 LAGs: one with 4 ports and one with 2 ports. There is no shaping or rate limiting on the ports, only on user subinterfaces. After working hours the drops are smaller but still increasing, and the ERX works fine then; the problem occurs only at peak time. When some traffic is redirected to slot 2 the drops are smaller, but since slot 2 has only two ports, it's not enough.
So what can I do?
Since the drops are in one slot, does the traffic cross the switch fabric or not?
Thanks in advance.
04-17-2012 12:48 AM
The traffic does traverse the switch fabric, even if you see drops on one slot alone.
Since you have LAGs in slot 1, have you verified whether any of the member ports is at full capacity?
The SA/DA hash may result in uneven distribution, and if this causes any particular port to run at full capacity, it may backpressure the fabric.
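To show how a source/destination hash can load LAG members unevenly, here is a small Python sketch. Everything in it is an assumption for illustration: CRC32 stands in for the hash (the ERX's actual algorithm is not described here), and the addresses and rates are made-up sample flows.

```python
import zlib
from collections import defaultdict


def pick_member(src, dst, n_members):
    """Map a source/destination pair to a LAG member index.

    CRC32 is a stand-in hash, not the ERX's real algorithm.
    """
    key = f"{src}-{dst}".encode()
    return zlib.crc32(key) % n_members


def load_per_member(flows, n_members):
    """Sum the offered load (Mbit/s) that lands on each member port."""
    load = defaultdict(float)
    for src, dst, mbps in flows:
        load[pick_member(src, dst, n_members)] += mbps
    return [load[i] for i in range(n_members)]


# Hypothetical flows: (source, destination, Mbit/s)
FLOWS = [
    ("10.0.0.1", "192.0.2.10", 600.0),
    ("10.0.0.2", "192.0.2.10", 450.0),
    ("10.0.0.3", "192.0.2.20", 50.0),
    ("10.0.0.4", "192.0.2.30", 20.0),
    ("10.0.0.5", "192.0.2.40", 10.0),
]

# Load per member of a 4-port LAG.
print(load_per_member(FLOWS, 4))
```

Every packet of a given pair follows the same member port, so a handful of heavy pairs can fill one port while the others stay lightly loaded, even when the LAG as a whole is far from full.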
And just to make sure that the high drop counts are not directly proportional to the peak traffic, you can perform this check:
1) Clear the fabric-queue counters before peak time starts.
2) Check the drops during peak time, or just note a high drop count value for reference.
3) During non-peak hours, verify the fabric-queue stats; if the drop count has fallen back to a lower value, then it should be fine.
To verify whether there are real drops between ingress and egress within the box, you will have to plug in test devices and check the statistics for transit traffic through the ERX.