I’ve been working with a company that integrates with several partners. One of these partners requires an AWS to ASA VPN to access their services.
That shouldn’t be a problem at all, of course. The company in question has ASAs running Firepower Threat Defense (FTD), which supports site-to-site VPNs in much the same way as the traditional ASA.
So, I configured an ‘always on’ policy-based VPN (there is no VTI support in FTD yet), which seemed to work fine. Well, for a while anyway.
So, What’s the Problem?
On further investigation on both sides, we found that the VPN tunnel was dropping for a few seconds, and coming back up.
While running a continual ping, we saw two pings lost consistently every hour. This doesn’t sound like much, but it did make SQL unhappy. The continual ping also ruled out an idle timeout as the cause.
There were no alerts on our internet connection or any other part of the network. Transport seemed to be stable, and there was no packet fragmentation.
With the basics out of the way, it’s time to look deeper. Looking into the logs in FMC, I found these errors:
Removing peer from correlator table failed, no match
Rejecting IPSec tunnel: no matching crypto map entry for remote proxy
While it looks like we’re onto something here, AWS reports that this is an expected error in some cases. AWS provides an option to configure a backup VPN tunnel. When we don’t use a backup tunnel, we get these errors. In this case, we can ignore these logs.
On the FTD device, we can still connect to the classic ASA CLI. From here we can run the old commands that we’re used to, such as show vpn-sessiondb l2l.
That command shows us, among other things, how long the session has been up.
From this, I was able to see that the session never went over 60 minutes. In fact, it was dropping at exactly 60 minutes. This was definitely looking like a timer expiring somewhere.
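If you would rather not eyeball the output, the duration value can be converted to seconds and compared against the 60-minute mark. This is only a rough sketch. It assumes the duration is printed in a form like 0h:59m:58s, possibly with a day count in front, so check that against your own output. The function names are made up for the example.

```python
import re

# Assumed shape of the duration value, e.g. "0h:59m:58s" or "1d 2h:03m:04s".
# Verify this against the real CLI output before relying on it.
DURATION_RE = re.compile(r"(?:(\d+)d\s+)?(\d+)h:(\d+)m:(\d+)s")


def duration_to_seconds(text: str) -> int:
    """Convert a duration string such as '0h:59m:58s' to whole seconds."""
    match = DURATION_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def never_exceeds(samples, limit_seconds=3600):
    """True if no sampled session duration went past the limit."""
    return all(duration_to_seconds(s) <= limit_seconds for s in samples)
```

Sampling the duration every few minutes and feeding the values to never_exceeds would show the same thing I saw by hand: a session that never lives longer than an hour.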
We now have two debugs to run:
debug crypto isakmp 127
debug crypto ipsec 127
These debugs help us to determine if there’s a problem with phase-1 or phase-2 stability.
And what do you know… The device on the AWS side of the tunnel is sending a termination message every hour.
I discussed this with TAC, and they agreed that the lifetime behind this timer should be a negotiated value. That is, the two IKE peers should agree to use the lower of their two values. But this didn’t seem to be working.
Normally when this timer expires, the peers should negotiate new session keys. This should be transparent, and not drop any data. In my case, AWS was ready for a new key, but the ASA wasn’t. This caused the entire session to drop, and a new session to be created from scratch.
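To make the expected behaviour concrete, here is a small sketch of the rule TAC described: both peers settle on the lower value. The numbers are illustrative assumptions, not the values from this tunnel, and the function names are invented for the example.

```python
def negotiated_lifetime(local_seconds: int, remote_seconds: int) -> int:
    """Lifetime both peers should settle on: the lower of the two."""
    return min(local_seconds, remote_seconds)


def mismatch_seconds(local_seconds: int, remote_seconds: int) -> int:
    """If negotiation does not happen, how far apart the two timers are."""
    return abs(local_seconds - remote_seconds)


# Illustrative values only (assumptions, not taken from this post).
asa_lifetime = 28800
aws_lifetime = 3600

agreed = negotiated_lifetime(asa_lifetime, aws_lifetime)
gap = mismatch_seconds(asa_lifetime, aws_lifetime)
print(f"Expected agreed lifetime: {agreed} seconds")
print(f"Timers disagree by {gap} seconds if each side keeps its own value")
```

When each side keeps its own value instead, the peer with the shorter timer expires the session while the other still considers it valid, which matches the hourly drop seen here.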
The ultimate fix was to manually configure the two endpoints to use the same values. AWS is not flexible on this point, so I reconfigured the ASA. Once that was done, the tunnel was 100% stable.
Friday July 20, 2018