SRX Services Gateway
Contributor
ecables

Active sessions timeout @ 14,400 seconds (4 hours)

SRX240H-POE running 11.1R3.5 in a HA cluster

 

I've been battling this issue for a while now; thought I would pose it to this forum to see if anyone else had seen a similar behavior, or get suggestions on how to diagnose/troubleshoot the problem.

 

The observed behavior is that any TCP session that is active for longer than 14,400 seconds is torn down.  This includes BGP sessions from the SRX itself, or transit sessions from clients to servers between zones (trust -> untrust, for example).

 

The session expiry seems to only affect the session-id in question, and not the forwarding abilities of the SRX itself. Let me provide an example scenario.

 

 - Client A connects to Server B via SSH (tcp/22)

 - Once the session duration reaches 14,400 seconds the SSH connection is lost

 - If Client A runs a continuous ping to Server B when the 14,400-second mark is reached, no packets are dropped

 - If Client A opens a second SSH connection to Server B, that session gets its own 14,400-second expiry, and is maintained when the first session times out/closes.

 

Is there an internal timeout of 14,400 configured somewhere in the SRX that I am not seeing?

Recognized Expert
Visitor

Re: Active sessions timeout @ 14,400 seconds (4 hours)

Hi,

 

The default TCP timeout on the SRX is 30 minutes, i.e. 1,800 seconds. In your case you mention that the session is torn down after 14,400 seconds. Is that the configured timeout? I am not sure why it would be 14,400 unless you have defined a custom application.

Can you paste the custom application configuration and the output of this command?

>show security flow session summary
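
For reference, a custom application with a long inactivity timeout would look something like this (the application name, zones, and policy name below are only placeholders, and the 14,400-second value is just to illustrate the kind of configuration that could produce what you are seeing):

set applications application my-long-tcp protocol tcp
set applications application my-long-tcp destination-port 22
set applications application my-long-tcp inactivity-timeout 14400
set security policies from-zone trust to-zone untrust policy allow-ssh match application my-long-tcp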

 

Regards,

Visitor

Contributor
ecables

Re: Active sessions timeout @ 14,400 seconds (4 hours)


> show security flow session summary 

node0:

--------------------------------------------------------------------------

Unicast-sessions: 26

Multicast-sessions: 0

Failed-sessions: 0

Sessions-in-use: 38 

Valid sessions: 26 

Pending sessions: 0 

Invalidated sessions: 12 

Sessions in other states: 0

Maximum-sessions: 65536

 

node1:

--------------------------------------------------------------------------

Unicast-sessions: 16

Multicast-sessions: 0

Failed-sessions: 0

Sessions-in-use: 16 

Valid sessions: 16 

Pending sessions: 0 

Invalidated sessions: 0 

Sessions in other states: 0

Maximum-sessions: 65536

 

What I have noticed is that when a new session is created, node0's "Current timeout:" field in the session is maintained based on the activity of the session, hovering around 1780-1800 seconds:

 

> show security flow session session-identifier 50159 | match timeout        

Maximum timeout: 1800, Current timeout: 1788

 

The same session, mirrored to node1, shows the following output for its "Current timeout" field:

> ...on-identifier 10350 node 1 | match timeout        

Maximum timeout: 1800, Current timeout: 2014

 

Node1's timeout starts at 14,400 and decrements down to 0, which coincides with the session teardown that I'm observing.


Can anyone else compare node0 and node1 current timeout values to see if the secondary node timeout value is being refreshed?
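
In case anyone wants to compare, these are the commands I'm using (the session IDs are placeholders and differ between the two nodes, so the backup copy has to be looked up separately, e.g. by source/destination address):

> show security flow session session-identifier <node0-session-id> | match timeout
> show security flow session session-identifier <node1-session-id> node 1 | match timeout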

Contributor
ecables

Re: Active sessions timeout @ 14,400 seconds (4 hours)

Here's another example of the secondary node's timeout values, all of which begin at 14400 seconds, and decrement to zero regardless of the session activity.  Is this normal behavior?  How is the secondary node's Timeout value higher than the "Maximum timeout" value specified in the session details (1800)?

 

> show security flow session node 1 | match Timeout    

Session ID: 10350, Policy name: 5/5, State: Active, Timeout: 1544, Valid

Session ID: 10502, Policy name: 5/5, State: Active, Timeout: 4024, Valid

Session ID: 10518, Policy name: 1/6, State: Active, Timeout: 4222, Valid

Session ID: 10880, Policy name: 5/5, State: Active, Timeout: 8210, Valid

Session ID: 11058, Policy name: 1/6, State: Active, Timeout: 10086, Valid

Session ID: 11091, Policy name: 5/5, State: Active, Timeout: 10500, Valid

Session ID: 11139, Policy name: 1/6, State: Active, Timeout: 10940, Valid

Session ID: 11172, Policy name: 1/6, State: Active, Timeout: 11298, Valid

Session ID: 11303, Policy name: self-traffic-policy/1, State: Active, Timeout: 13062, Valid

Session ID: 11315, Policy name: 1/6, State: Active, Timeout: 13236, Valid

Session ID: 11363, Policy name: 1/6, State: Active, Timeout: 14214, Valid

Session ID: 11389, Policy name: self-traffic-policy/1, State: Active, Timeout: 1792, Valid

Recognized Expert
Visitor

Re: Active sessions timeout @ 14,400 seconds (4 hours)


Hi,

 

The teardown should not be based on the node1 timeout. The node1 timeout is always 8 times the node0 timeout.
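
That ratio also lines up with the value being reported here: the default TCP timeout is 1,800 seconds, and 1,800 x 8 = 14,400 seconds, i.e. exactly the 4-hour mark at which the sessions are being torn down.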

When the session timeout is down to roughly 60 seconds on node1, what is the session status on node0?

What version of Junos are you using? Also, please provide the output of >show chassis cluster status

 

Regards,

Visitor

Contributor
ecables

Re: Active sessions timeout @ 14,400 seconds (4 hours)


Sorry, I indicated the version in the first post, but here it is again:

SRX240H-POE running 11.1R3.5 in a HA cluster

 

To test my theory that the secondary node is somehow causing the session teardown, I powered off node1, and a session has now surpassed the 4 hour mark.  Prior to this point no single session has ever exceeded 14,400 seconds.

 

Here is one of my sessions now:

> ...n session-identifier 50159 | match Duration       

Start time: 332551, Duration: 14711

 

Sooo, what is the mechanism causing the secondary node to disrupt sessions @ the 4 hour mark?

Distinguished Expert
keithr

Re: Active sessions timeout @ 14,400 seconds (4 hours)

I had a very similar issue on a ScreenOS NSRP cluster.

 

The key to fixing that was to "set nsrp rto-mirror session ageout-ack".

 

I wonder what the equivalent would be for JSRP?

-kr


Recognized Expert
Visitor

Re: Active sessions timeout @ 14,400 seconds (4 hours)

Hi,

 

The only way this could happen is because of something like this:

 - node0 is primary, the session establishes between the end peers, and the session timeout is 1,800 seconds on node0 and 14,400 seconds on node1.

 - The sessions are synced the very first time but never refreshed afterwards, or traffic has failed over to the secondary node.

 - Once that happens, the session on the secondary never stays at 14,400 but keeps counting down.

This is not ideal behaviour. Check the chassis cluster status: do you see an RG doing back-to-back failovers?

Have you configured preempt on the chassis cluster? If you have, try removing it and test again.
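
For example, something along these lines (redundancy-group 1 is only an example; check whichever RGs have preempt configured):

> show configuration chassis cluster
# delete chassis cluster redundancy-group 1 preempt
# commit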

 

Regards,

Visitor

Contributor
ecables

Re: Active sessions timeout @ 14,400 seconds (4 hours)

So the cluster (before I powered down node1) was configured as such:

node0 primary for rg0 and rg1

node1 secondary for rg0 and rg1

 

No failover occurred, and control-plane statistics appeared normal between the peers.  The 'show chassis cluster status' output showed everything functioning per the documentation (priority 255 for the primary, 1 for the secondary) for rg0 and rg1 on node0 and node1 respectively.

 

The session timeout value on node0 refreshed as data was seen on the session, while node1 began at 14,400 seconds and decremented to zero, never refreshing the timeout value based on session activity.

 

So do session timeout values get updated over the fabric links or some other mechanism?  In what situation would node0 not update node1 on the "activeness" of a session, resulting in the timeout value not being refreshed on node1?
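
Presumably that refresh would ride the fabric link as RTO messages, so these are the counters I've been watching, though I'm not sure they break out session refresh messages specifically:

> show chassis cluster interfaces
> show chassis cluster statistics
> show chassis cluster data-plane statistics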

Contributor
ecables

Re: Active sessions timeout @ 14,400 seconds (4 hours)

Continuing this test, I've configured the junos-ssh inactivity timer to be 30 seconds, so that I can work in a smaller window than 4 hours.  The result is that SSH connections through the SRX are terminated once node1's timeout value reaches zero.
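
For reference, the timeout change was along these lines (if shadowing the predefined junos-ssh application by name isn't supported on this release, an equivalent custom application referenced in the policy does the same job):

set applications application junos-ssh inactivity-timeout 30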

 

Here's some interesting output using a 'show security flow' with the src/dst pair specified:

node0:

--------------------------------------------------------------------------

Session ID: 31363, Policy name: 1/6, State: Active, Timeout: 30, Valid

node1:

--------------------------------------------------------------------------

Session ID: 255, Policy name: 1/6, State: Active, Timeout: 1700, Valid

 

node0:

--------------------------------------------------------------------------

Session ID: 31363, Policy name: 1/6, State: Active, Timeout: 30, Valid

node1:

--------------------------------------------------------------------------

Session ID: 255, Policy name: 1/6, State: Active, Timeout: 1698, Valid

 

node0:

--------------------------------------------------------------------------

Session ID: 31363, Policy name: 1/6, State: Active, Timeout: 30, Valid

node1:

--------------------------------------------------------------------------

Session ID: 255, Policy name: 1/6, State: Active, Timeout: 1682, Valid

 

 

And here's where things get weird, watch the Timeout value on node1 drop from 1682 to 26 in a matter of seconds:

 

node0:

--------------------------------------------------------------------------

Session ID: 31363, Policy name: 1/6, State: Active, Timeout: 30, Valid

node1:

--------------------------------------------------------------------------

Session ID: 255, Policy name: 1/6, State: Active, Timeout: 26, Valid

 

node0:

--------------------------------------------------------------------------

Session ID: 31363, Policy name: 1/6, State: Active, Timeout: 28, Valid

node1:

--------------------------------------------------------------------------

Session ID: 255, Policy name: 1/6, State: Active, Timeout: 22, Valid

 

The node1 timer continues to decrement until it hits zero, while the node0 timer remains near 30, and the session is terminated.

 

 

 
