Hi,
First, a bit of background.
I have two VMware ESX servers, each connected to an SSG140 (running 6.2r01). I have configured a route-based site-to-site VPN between the two firewalls, with an any/any policy installed on both devices. The two SSGs are on permanent internet connections, with a latency of about 15ms between them.
Everything appears to be OK in terms of connectivity. I can ssh between the servers and manage them happily. However (and here's the problem):
I am trying to do large file transfers (20GB+) between the two ESX servers, using scp or sftp. Some way into the transfer, it stalls and then times out. For example:
[mph@ESX1 veeam]$ scp bigfile.test veeam@10.5.0.5:/home/veeam/bigtestback.file
mph@10.5.0.5's password:
bigfile.test 10% 140MB 124.4KB/s - stalled -
Read from remote host 10.5.0.5: Connection timed out
This is very typical of what I get, though the amount of data transferred before the stall varies from connection to connection.
From the perspective of the ESX server, this would normally mean that connectivity between the hosts has been lost. However, a continuous ping running between the two servers shows that the network stays up throughout, and that ping is going down the same IPSec VPN tunnel.
The problem is symmetric: it doesn't matter which end initiates the transfer. I've checked that the servers themselves are OK, as I can scp huge files between servers on the same LAN. There don't appear to be any errors on the physical interfaces.
I'm at a loss to explain what's going on here. Things that I think it could be:
(a) The firewalls will almost certainly be having to fragment packets to get them into the IPSec tunnel. Could the firewalls be running out of resources?
(b) The firewall is closing the connection down for some reason (though the logs don't indicate that it is).
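To put a rough number on (a), here is a small Python sketch of the encapsulation arithmetic. The sizes are assumptions on my part, not values read from the firewalls: IPv4, ESP in tunnel mode, AES-CBC (16-byte block and IV), a 12-byte HMAC-SHA1-96 ICV, no NAT-T, and a 1500-byte MTU on the internet links. The function names are just for illustration.

```python
# Rough estimate of ESP tunnel-mode overhead.
# All sizes below are ASSUMPTIONS, not values taken from the SSG configuration:
# IPv4, AES-CBC (16-byte block and IV), 12-byte HMAC-SHA1-96 ICV, no NAT-T.
IP_HEADER = 20       # outer IPv4 header, no options
ESP_HEADER = 8       # SPI + sequence number
ESP_TRAILER = 2      # pad-length byte + next-header byte
TCP_IP_HEADERS = 40  # inner IPv4 + TCP headers, no options


def esp_tunnel_packet_size(inner_len, block=16, iv=16, icv=12):
    """Size of the outer IPv4 packet carrying an inner packet of inner_len bytes."""
    to_encrypt = inner_len + ESP_TRAILER
    padded = -(-to_encrypt // block) * block  # round up to the cipher block size
    return IP_HEADER + ESP_HEADER + iv + padded + icv


def max_inner_packet(link_mtu=1500, **kwargs):
    """Largest inner packet that still fits in link_mtu after encapsulation."""
    size = link_mtu
    while esp_tunnel_packet_size(size, **kwargs) > link_mtu:
        size -= 1
    return size


if __name__ == "__main__":
    inner = max_inner_packet(1500)
    print("1500-byte inner packet becomes", esp_tunnel_packet_size(1500), "bytes")
    print("largest inner packet without fragmentation:", inner)
    print("matching TCP MSS:", inner - TCP_IP_HEADERS)
```

Under those assumptions a full 1500-byte packet grows to 1560 bytes, so every full-size TCP segment from scp would need fragmenting, and the largest MSS that avoids it is about 1398. The small packets from a default ping would never hit this, which might be why the ping keeps working.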
Can anyone suggest what else to check?