SRX Services Gateway

SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-16-2014 02:08 PM

Hi, all,

 

We have been using SSG5s to build IPsec tunnels to our HQ for IPMI access to our data center servers. Occasionally we also need to image servers over the IPsec VPN, and we never had any problem with the SSG5. Now that the SSG series is EoS, and the SRX series seems to have far better performance than the SSGs, we purchased an SRX210H to replace the SSG5 in a brand new data center. We had no problem bringing up the IPsec tunnels and running BGP to the head-end ISG 2000, but when we push production traffic through the SRX210 (the IPsec VPN traffic rate is far less than the 85 Mbps specified), the SRX210H has trouble keeping the BGP sessions up due to hold time expiration, and the following message keeps popping up:

 

PERF_MON: RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 0 PIC 0 CPU utilization exceeds threshold, current value=97

 

I opened a JTAC case. The JTAC engineer suggested we upgrade the software to 12.1X44-D35.5, but that did not resolve the problem: as long as there is over 20 Mbps of IPsec traffic, the BGP session over the IPsec tunnel will flap.

 

Based on your experience, could this be a simple configuration issue, or can the SRX210 not handle the same rate of IPsec throughput that an ancient SSG5 can?


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-16-2014 08:11 PM

Hi oldcreek,

 

PERF_MON: RTPERF_CPU_THRESHOLD_EXCEEDED: FPC 0 PIC 0 CPU utilization exceeds threshold, current value=97

 

This indicates that data plane CPU (traffic) utilization is very high.

 

These alerts will be seen when any of the ALG resources are used up.

 

Check whether any ALG is being triggered unintentionally for the VPN traffic.

 

BGP going down could be because of 2 things:

 

1. High Control Plane CPU ( show chassis routing-engine)

2. Packet drops across VPN tunnel

 

Try adjusting the MSS/MTU settings and check the status:

 

security {
    flow {
        tcp-mss {
            ipsec-vpn {
                mss 1420;
            }
        }
    }
}
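As a rough sanity check on the MSS value in the snippet above, here is a minimal Python sketch that estimates how large an ESP tunnel-mode packet becomes for a given TCP MSS. The overhead figures are assumptions, not taken from any configuration in this thread: IPv4, no NAT-T, no TCP options, 3DES-CBC (8-byte block and IV), HMAC-SHA1-96 (12-byte ICV), and a 1500-byte path MTU. The function names are made up for illustration.

```python
# Rough ESP tunnel-mode size estimate. Assumptions (not from this thread's
# configs): IPv4, no NAT-T, no TCP options, 3DES-CBC (8-byte block and IV),
# HMAC-SHA1-96 (12-byte ICV), 1500-byte path MTU.
OUTER_IP = 20       # outer IPv4 header
ESP_HEADER = 8      # SPI + sequence number
ESP_TRAILER = 2     # pad-length + next-header bytes
INNER_HEADERS = 40  # inner IPv4 + TCP headers, no options


def esp_packet_size(inner_len, block=8, iv=8, icv=12):
    """Size on the wire of an ESP tunnel-mode packet carrying inner_len bytes."""
    payload = inner_len + ESP_TRAILER
    padded = -(-payload // block) * block  # round up to the cipher block size
    return OUTER_IP + ESP_HEADER + iv + padded + icv


def max_tcp_mss(path_mtu=1500, block=8, iv=8, icv=12):
    """Largest TCP MSS whose encrypted packet still fits in path_mtu."""
    budget = path_mtu - OUTER_IP - ESP_HEADER - iv - icv
    padded = budget - (budget % block)     # round down to the cipher block size
    return padded - ESP_TRAILER - INNER_HEADERS


for mss in (1420, max_tcp_mss()):
    print(mss, esp_packet_size(mss + INNER_HEADERS))
```

Under these assumptions an MSS of 1420 yields a 1512-byte ESP packet, which would not fit a 1500-byte path without fragmentation, while an MSS up to 1406 stays within it. Treat this as an estimate only; the real overhead depends on the negotiated proposal and on whether NAT-T is in use.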

As per the datasheet, VPN throughput (large packets) is 85 Mbps.

 

http://www.juniper.net/us/en/local/pdf/datasheets/1000281-en.pdf

 

Regards
rparthi
 


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-16-2014 09:28 PM

Hi, rparthi,

 

Thank you for your reply. JTAC escalation has concluded that this is a limitation of the SRX210 and there is no workaround, so we will switch back to the SSG5.


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-16-2014 10:09 PM

Hi ,

 

May I have the JTAC case number for reference?

 

Regards,

rparthi


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-16-2014 10:29 PM

2014-0814-0787

 

Please do let me know if you have a different opinion; we do want to move to the SRX platform.


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-16-2014 10:46 PM

Hi ,

 

I went through the case.

It looks like the SRX is overwhelmed by a very high packets-per-second rate.

Is it legitimate VPN traffic? Did you capture the packets and confirm that legitimate traffic is what is causing the high data plane CPU?

You could have tried static routing and tested it.

If the traffic is legitimate, then the engineer's analysis holds good.
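To put the packets-per-second point in numbers, here is a small Python sketch that converts a bit rate into a packet rate for a few packet sizes. The 20 Mbps figure is the rate reported earlier in this thread; the packet sizes are hypothetical examples, and the helper name is made up for illustration.

```python
def packets_per_second(rate_mbps, packet_bytes):
    """Packets per second needed to carry rate_mbps at a fixed packet size."""
    return rate_mbps * 1_000_000 / (packet_bytes * 8)


# 20 Mbps comes from the original post; the packet sizes are illustrative.
for size in (64, 512, 1400):
    print(f"{size:>5} bytes: {packets_per_second(20, size):,.0f} pps")
```

The same 20 Mbps works out to roughly 39,000 pps with 64-byte packets but under 1,800 pps with 1400-byte packets, which is why a packet capture showing the actual size mix matters more than the bit rate alone.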

 

Regards
rparthi
 


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-16-2014 11:00 PM

Hi oldcreek,

 

What type of VPN are you using, e.g. SHA/AES, route-based or policy-based? Also, are you using the same encryption on the SRX as you were on the SSG5?


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-16-2014 11:28 PM

ESP with 3DES/SHA, nothing out of the ordinary. Since I am running BGP on top of it, it has to be a route-based VPN. The encryption proposals are set on the head-end ISG2000, so yes, the SSG5 and the SRX are using the same encryption.


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-16-2014 11:39 PM

Of course that is legitimate VPN traffic. Maybe the SRX210 is having difficulty keeping up, but it should always prioritize control traffic and selectively drop data traffic, right? Dropping data traffic makes applications slow; dropping control traffic causes an outage.

 

During the debugging with JTAC, I pumped the same amount and rate of traffic over an SSG5 (by simultaneously copying a large core dump file from the same HQ server to 4 servers behind an SSG5 in a nearby data center). SSG5 admin access became sluggish, but the BGP sessions never dropped, and the transfer rate was much better than with the SRX210. The SRX210 would keep up for about a minute or so and then totally stop transmitting or receiving: CPU utilization went to zero, but I could not ping the IP address on the other side of the tunnel. At this point, manually clearing the IPsec SA brings the network back; otherwise it recovers by itself after a longer period of time.

I fail to understand why you suggest static routing could be a solution here, not to mention that it is absolutely impossible for us to use static routes to manage the network.


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-17-2014 12:17 AM

Thanks for the info.

 

Is it actually the original 210H or the newer 210HE or even 210H2 model?


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-17-2014 12:21 AM

RE-SRX210HE2

2x GE, 6x FE, 1x 3G


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-17-2014 03:11 AM

Okay, so I just set up the following in my lab:

 

- SSG 5 256MB ScreenOS 6.3

- SRX 100B 512MB JunOS 12.1X46 (the SRX210HE2 has a faster CPU and more memory so should be even quicker).

- Another BGP Router

 

I connected the SSG and SRX directly via 100 Mbit and then set up a route-based 3DES/SHA1 IPsec VPN between the two.

 

I then connected both the SSG and the SRX (using a different port on each) to another router and set up some BGP sessions. Both the SSG and the SRX received about 300 routes via BGP.

 

I then ran a secure copy (scp) from my MacBook Air on the SSG5 side to a low-end Asus Atom box on the SRX100 side. I copied a couple of different ISO files (so large files), about 5 GB in total.


I was able to just hit 7 megabytes/sec between the two machines via the IPsec VPN.

 

While testing I watched my external BGP router and both SSG and SRX kept the BGP session going without issues.

 

I also checked the CPU load on both devices.

 

The SSG 5 was running at 87% CPU (get perf cpu), the command line interface was slow, and the status light no longer flashed evenly. The device was only just keeping up.

 

The SRX 100 was running at 73% CPU (show chassis forwarding, Real-time threads CPU utilization), so almost 15 percentage points less. The command line interface was also responsive and worked correctly.

 

Therefore I believe there is probably a configuration issue or some other bug you are hitting. You should easily be able to push 20 Mbit/sec of image files through the SRX.
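For comparison with the Mbit/s figures quoted elsewhere in this thread, here is a quick Python sketch of the unit conversion for the lab result. It assumes scp's "megabytes/sec" means 10^6 bytes of payload per second and ignores TCP, IP and ESP overhead, so the rate on the wire would be somewhat higher. The helper name is made up for illustration.

```python
def mbytes_to_mbits(mbytes_per_sec):
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_sec * 8


lab_rate = mbytes_to_mbits(7)   # lab scp rate through the SRX100
print(lab_rate)                 # payload rate in Mbit/s
print(lab_rate / 20)            # ratio to the 20 Mbit/s where BGP flaps were reported
print(lab_rate / 100)           # share of the 100 Mbit lab link
```

That puts the lab transfer at about 56 Mbit/s of payload, 2.8 times the rate at which the flapping was reported and a bit over half of the 100 Mbit link.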

 

Can you please post your SRX configuration for us to look at? Also check to ensure that you are getting full duplex on your ethernet connection.

 

Thanks!


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-17-2014 08:34 AM

Hi,

 

Thank you so much for spending time on this issue. I would also like to believe there is something wrong with my configuration that the JTAC engineers might have overlooked. The configuration is attached. The other side is a NetScreen ISG2000, which has 50 or so route-based IPsec VPN tunnels configured, and I have no reason to suspect the ISG2000 side because its configuration does not treat this SRX peer differently.

 

All interfaces are in full duplex. The uplink to the Internet is ge-0/0/0; the downlinks are fe-0/0/2 and fe-0/0/3.

 

My setup is a little different from yours: my BGP sessions run over the st0 interface bound to the IPsec tunnel, not over a separate interface. Regarding traffic rate, I did not collect real-time traffic information; all I know is that for the same amount of scp traffic, the SSG5 was able to sustain a higher transfer rate than the SRX210 and never broke the network under load.


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-17-2014 11:42 AM
I see you're running a cluster with what looks like a gig module installed? I wonder if that has something to do with it.

I doubt BGP is the cause unless you have thousands of routes.

Things to do/test:
1) Ensure 1350 MTU is set on both sides of the VPN
2) Please disable the existing trace and logging you have enabled. These will slow the device down a lot and should only be on for debugging.
3) After the above try disabling the cluster.

Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-17-2014 11:53 AM

Thanks. If the problem is in forwarding, I am not sure how changing the MTU will have any effect, but I will try. Trace is disabled. Although it is a cluster configuration, the standby is not actually online. Do you see anything else that could be wrong in the configuration?


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-17-2014 03:37 PM

Sorry, I missed the line: deactivate security flow traceoptions

 

It looks like a pretty basic config. MTU is really something you should test, though.

 

Nothing is jumping out at me as anything that should cause your issue.

 

If I get a chance in the next couple of days I will load up your configuration on my device and test it.

 

You can also check: show system processes extensive


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-17-2014 04:46 PM
You may also wish to try removing the IPSec vpn-monitor options, in case cpu or circuit saturation is causing it to tear down the tunnel.

Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-17-2014 07:41 PM

JTAC suggested adding the VPN monitor option so that when traffic stalls, the IPsec SA can be renegotiated faster.


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-19-2014 06:16 PM

Don't mean to hijack your thread, but I've had a case (2012-1129-0977) that was open for 10 months back in Nov 2012 with a very similar issue (a flood of PERF_MON: RTPERF_CPU_THRESHOLD_EXCEEDED messages). Engineers were on the box for hours taking logs, checking the CPU, etc. Eventually, they closed the case and suggested I upgrade to 12.1X44. I just recently upgraded to this version and sure enough the problem is still there. We do not run BGP, just a simple IPsec tunnel that hits about 40 Mbps between 2 SRX220s replicating a storage array. It seemed to all start when we added a firewall policer, but even after removing it we were still getting those messages. Hope you find a fix, and please post if any updates come along.


Re: SSG5 vs SRX210H IPsec throughput performance, RTPERF_CPU_THRESHOLD_EXCEEDED

‎08-19-2014 09:58 PM

Not at all. We spent another solid four hours with JTAC because we experienced the same issue with an SRX210 we recently deployed in a remote office (again, we used to use SSGs for similar remote offices and never had a single issue). This remote office's uplink is capped at 10Mb/s, yet we lost the connection out of nowhere during business hours.

 

I don't think the problem has anything to do with BGP. When we send (via scp) a large core dump file over IPsec from a server behind the ISG2000 (the hub VPN gateway that all remote office and remote data center management networks connect to) to another server behind the SRX210, the transfer rate starts at around 1.5Mb/s, then continuously slows down until the transfer stops completely. At that point, IP connectivity between the two servers is completely lost, and it recovers by itself after 200 pings (~400 seconds). Manually clearing the IPsec SA recovers the connection immediately. It seems to me that the encryption/decryption engine on the SRX210 somehow just stops working under a small traffic load. Between exactly the same two servers, if we scp the file over the public Internet, the transfer rate is over 20Mb/s, so there is nothing wrong with the two servers.