SRX300 latest Junos 18.4R1.8 possible performance/throughput issues
Wasn't sure if this needed to go in the SRX forum or the Junos forum. Sorry, this may be a little long-winded.
I wanted to post here both to share what I found and to ask what else can be done to track down an issue like this. Since this is for personal home use I don't have support on the SRX300, but I wanted other people here to know about it as well.
I am running an SRX300 at my house which I use to learn, test, and try new features of Junos on. This SRX300 is connected to gigabit service and performed flawlessly (until 18.4) as a gateway for 10 security zones, basic firewall functions, and NAT. I can hear what some people are thinking. It was previously running 18.3R1.
This helps with my daily job as we work with Junos for SRX/EX/QFX platforms. Still learning the ins/outs of Junos after 2 years.
So I installed the latest 18.4R1.8 that came out 12/21/2018. The upgrade went smoothly and everything seemed fine. I then ran some downloads that nearly saturate the 1Gbps link using multiple sessions. I had done this on prior releases with no issues. During this high throughput scenario the throughput dropped to about 20 Mbps and latency went to 400-1000ms.
I checked the messages log and saw alerts about a CPU threshold being crossed and to expect packet loss.
I checked "show chassis routing-engine" and the CPU looked great. Then I found that the command "show security monitoring fpc 0" showed the CPU Utilization at 100%. Memory looked good, and session flows were what I was expecting based on previous experience.
If I killed the high throughput download everything came back down to normal and all was fine. So I was able to reproduce the drop in throughput and the increase in latency.
I decided to check "show pfe statistics traffic" which was giving me the current pps. I was sitting around 97k pps during the test. From what I can see I felt like I was still within the limits of the hardware. Someone please correct me if I am wrong and/or interpreting this incorrectly.
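For anyone who wants to sanity-check the pps number, here is a rough Python sketch of the arithmetic. It assumes the 97k pps reading corresponds to roughly 1 Gbps of traffic (which is what the link does when things are working); the function names are just made up for illustration, and the result should be compared against whatever the SRX300 datasheet lists rather than taken as a hardware limit.

```python
# Rough arithmetic relating throughput, packet rate, and packet size.
# Assumption: ~1 Gbps of traffic at the ~97k pps seen in "show pfe statistics traffic".

def avg_packet_bytes(throughput_bps: float, pps: float) -> float:
    """Average packet size in bytes implied by a bit rate and a packet rate."""
    return throughput_bps / 8 / pps


def pps_needed(throughput_bps: float, packet_bytes: float) -> float:
    """Packets per second required to carry a bit rate at a given packet size."""
    return throughput_bps / (8 * packet_bytes)


if __name__ == "__main__":
    size = avg_packet_bytes(1e9, 97_000)
    print(f"average packet size at 1 Gbps and 97k pps: {size:.0f} bytes")
    print(f"pps needed for 1 Gbps at 1500-byte packets: {pps_needed(1e9, 1500):.0f}")
    print(f"pps needed for 1 Gbps at 64-byte packets: {pps_needed(1e9, 64):.0f}")
```

An average size close to a full-size Ethernet frame would be consistent with bulk downloads, which is the easy case for a firewall compared with small-packet traffic at the same bit rate.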
From this point I didn't know what else to look at so I decided to roll back Junos versions. I rolled back to Junos 18.3R1 service release S1.4, re-tested the scenario, and everything worked fine. The output of "show security monitoring fpc 0" showed the CPU at less than 70%, and the pfe statistics showed the same pps as before. Everything was humming along fine: full throughput as expected and no change in latency.
I assume there is some bug in Junos 18.4? Does anyone have any suggestions on additional troubleshooting or other data I could gather to track down what the issue may have been?
It definitely sounds like a bug because the platform, the configuration, and the traffic pattern are the same. Thanks for the heads-up; if I find more information related to your situation I will pass it along. Please let us know if the traffic is inspected or affected by any extra features beyond the regular security policies and NAT.
For now I would advise staying with the recommended Junos version for the SRX300 (15.1X49-D150) unless you are looking to implement a feature that's only available in the latest 18 version.
I would like to stay on the recommended Junos version, but we have run across bugs in those as well on multiple occasions. We actually run a lot of stuff on the 17.4 code due to features that we need. Of course 17.4 has its own set of bugs to work with. So I have been experimenting on the side with newer releases like this.
But this is also what frustrates me about Junos and the recommended versions. New builds keep getting released with new features that make these platforms competitive with other vendors and/or make a huge difference in the functionality and manageability of the platform (custom mgmt_junos routing-instance, Unified Security Policies, SSL Inspection Whitelist Exemptions based on Custom URL Categories, on-box Python scripting with event scripts for SRX). I could go on and on. These features don't seem to be backported to the recommended Junos version.
At what point will Juniper move on from 15.1X49 and start recommending the newer builds? Why keep releasing new features for the last 2 years (how long it has been since I started using Junos) if you aren't going to recommend the code version that they run on?