ASA Cluster Design Considerations
Welcome to ASA Cluster Design. If you’re reading this, you probably have a plan to deploy an ASA cluster. If that’s the case, make sure you understand how clusters work before you dig into the design phase. To this end, please consider reading the article on ASA Clustering first. When you’ve read it, move on to ASA Cluster Configuration.
This is a good opportunity for a quick review of the two ASA software types. There’s the traditional ASA image, also known as “ASA with Firepower Services”. The “Firepower services” part is optional. This ASA code has been around for years. Firepower services, if you have it, will run as a separate software module.
The other software type is Firepower Threat Defense (FTD). This is the new ‘unified’ software image. It takes the ASA firewall code and the Firepower IPS code and combines them into one platform.
You may be aware of ASA 5500-X series appliances and Firepower appliances. While they’re quite different internally, both appliances can run either software image. In this article, the term ASA refers to both appliances, unless otherwise specified.
Hardware and Software
To narrow down your selection, first be aware of which models support clustering. In the 5500-X series platform, anything from the 5515-X and up is supported. The 5512-X is also supported but needs to run the Security Plus license. Only the traditional image supports clustering for now.
The 4100 and 9300 series support clustering with the traditional and FTD software. Today, clustering is not available on the 2100 series appliance with FTD. This will likely be added soon.
Next, consider how many members will need to run in the cluster. You may not have the answer to this yet, but there is a critical piece of information to be aware of. All appliances from the 5512-X to the 5555-X can have two members per cluster.
If you need more members, you will need to look at the 5585s or the Firepower appliances. The 5585s can have up to 16 members per cluster. Currently, the Firepower appliances support up to 6 members per cluster. They seem capable of far more, but aren’t yet validated. In the future, they too will support up to 16 members.
The tricky part is to determine what performance you will need. There are three main metrics for you to consider:
- The peak throughput (or bandwidth)
- The maximum number of connections
- The maximum number of connections per second
Below are two tables with approximate capabilities of the various ASAs on offer. The values here come from Cisco’s data sheets, which are in the reference section below. When you examine these values, keep in mind that they are best-case figures for a single appliance. Treat them only as a ‘comparison rate’ for comparing different models.
During planning, you will need to factor in realistic performance values. Your account manager can help you get these values. You will also need to plan for growth.
Clusters are very scalable. But you can’t double your performance by doubling the number of members. Use these guidelines:
- You will get 70% of the total throughput of all members in the cluster
- You will get 60% of the total connections of all members in the cluster
- You will get 50% of the connections per second of all members in the cluster
Consider this example. You need 1.5 Gbps of throughput, 800K connections, and up to 40K new connections per second. In this case, you could consider a cluster of two 5555-X appliances.
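The scaling guidelines above can be sketched as a small calculation. This is only a rough planning aid, not a Cisco sizing tool. The `Capacity` class and both function names are made up for illustration, and the per-member figures are assumed placeholders; substitute the realistic values you get for your chosen model.

```python
from dataclasses import dataclass

# Scaling factors taken from the guidelines above.
THROUGHPUT_FACTOR = 0.70
CONNECTIONS_FACTOR = 0.60
CPS_FACTOR = 0.50


@dataclass
class Capacity:
    throughput_gbps: float
    connections: int
    connections_per_sec: int


def cluster_capacity(member: Capacity, members: int) -> Capacity:
    """Estimate usable cluster capacity from single-member figures."""
    return Capacity(
        throughput_gbps=member.throughput_gbps * members * THROUGHPUT_FACTOR,
        connections=round(member.connections * members * CONNECTIONS_FACTOR),
        connections_per_sec=round(member.connections_per_sec * members * CPS_FACTOR),
    )


def meets(cluster: Capacity, required: Capacity) -> bool:
    """True if the estimate covers all three requirements."""
    return (
        cluster.throughput_gbps >= required.throughput_gbps
        and cluster.connections >= required.connections
        and cluster.connections_per_sec >= required.connections_per_sec
    )


# ASSUMED placeholder figures for one member, not data sheet values.
member = Capacity(throughput_gbps=2.0, connections=1_000_000, connections_per_sec=50_000)
# Requirements from the example above.
required = Capacity(throughput_gbps=1.5, connections=800_000, connections_per_sec=40_000)

estimate = cluster_capacity(member, members=2)
print(estimate)
print("Meets requirements:", meets(estimate, required))
```

Remember to leave headroom for growth on top of whatever the estimate says.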
|Model |Connections per Sec
|5585-X (SSP 10) |
|5585-X (SSP 20) |
|5585-X (SSP 40) |
|5585-X (SSP 60) |

|Model |Connections per Sec
|9300 (1x SM-24) |
|9300 (1x SM-36) |
|9300 (1x SM-44) |
|9300 (3x SM-44) |
Switches and Routers
The cluster is usually connected to upstream switches. This is definitely the case with spanned-etherchannel. Any upstream switch should be fine, as long as it runs LACP. The good news is Cisco has validated a range of their switches. For an up-to-date list, see the BRKSEC-3032 presentation, page 9.
For the sake of redundancy, use more than one upstream device. This is easy in individual interface mode, due to the routed interfaces (more on this later). In spanned-etherchannel mode, the switches will need to use vPC or VSS.
If vPC is your choice, use the peer-switch command. This makes the vPC switches respond as if they were a single unit. This prevents unnecessary usage of the vPC peer-link. If you need to know more about vPC, have a look at the Virtual Port Channels article.
Now for a tricky one. Do you want to use spanned-etherchannel? Will you use vPC to connect the switches to the cluster? Do you also need to use dynamic routing? If you answer yes to all three, pay attention to this next part. Dynamic routing over vPC is not supported in all cases. You will need to make sure you have a suitable Nexus switch. If you have a 7000 series, you will also need to use appropriate line cards. This is a new-ish feature, so you also need to run an appropriate version of NX-OS.
Generally, the newer releases have more features. The normal considerations apply here. Select a version known to be stable, read the release notes, and be aware of any caveats. Also consider that the ‘even’ releases (9.4, 9.6 and so on) are the extended maintenance releases. The odd releases are standard maintenance.
Do you need Firepower? If so, you may want to run FTD. Currently, FTD clustering is only supported on the Firepower 4100 and 9300 appliances. If you want to cluster Firepower appliances with FTD, you will need FTD 6.2 or newer. Each FTD release adds more features, so stay up to date with the latest release information.
Clustering has been around for a while on the 5585 platform. Version 9.1(4) saw the addition of clustering to other 5500-X series. Unfortunately, clustering on the 5500-X platform is only in the traditional image. It is unlikely that FTD will support clustering on the 5500-X platform until at least 2019.
There are two interface modes you can choose from: spanned-etherchannel and individual. In general, try to use spanned-etherchannel. Of course, you may have a good reason to use individual interfaces, which is fine.
Do you plan to use transparent mode? You will need to use spanned-etherchannel. Do you plan to use contexts? The interface mode needs to be the same for all contexts. If one context is transparent, all contexts will need to use spanned-etherchannel mode.
Are you going to use spanned-etherchannel mode? Be sure to pay close attention to the load balancing section below.
If you’re going to use individual interfaces, you can select between PBR, ECMP, and ITD. In general, ECMP is the simplest to use. If you already have PBR or ITD in your environment, they may be more suitable for you.
When using PBR, remember to use object tracking. This way, if an ASA fails, PBR will be ‘aware’ of it and will not forward traffic to an unreachable member. If you want ITD, remember it is an extra license on the Nexus platform.
Load Balancing and Hashing
In spanned-etherchannel mode, the cluster connects to switches with etherchannel. The switch handles the load-balancing of connections over the cluster. The switch’s hashing algorithm will determine how effective it will be.
There are two good choices for hashing algorithms. These are src-dst-ip-l4port and src-dst-ip. Both of these factor in source and destination IP address. As a general rule, use L4 port number as well, as it provides a more granular spread of traffic.
Do not use VLAN in the hashing algorithm. This is because the VLAN will change as the traffic passes across the ASA interfaces. A change in VLAN would result in a change in hashing. A change in hashing would result in asymmetric traffic.
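To see why VLAN breaks symmetry, here is a toy model of a symmetric hash. It is not the algorithm any real switch uses. The function names, the CRC32 choice, and the addresses are all assumptions made for illustration. The point is only that the forward and return packets of a connection produce the same hash input when it is built from IPs and ports, and a different one once the VLAN is mixed in.

```python
import zlib


def hash_key(src_ip, dst_ip, src_port, dst_port, vlan=None):
    """Build the hash input for a packet (toy model, not a real switch hash)."""
    # Sorting the two endpoints gives the same key in both directions.
    endpoints = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return repr((endpoints, vlan)).encode()


def pick_link(src_ip, dst_ip, src_port, dst_port, links, vlan=None):
    """Choose an etherchannel member link from the hash of the key."""
    return zlib.crc32(hash_key(src_ip, dst_ip, src_port, dst_port, vlan)) % links


LINKS = 4  # assumed number of links in the etherchannel

# Forward packet: client to server. Return packet: server to client.
fwd = ("10.1.1.10", "203.0.113.10", 40000, 443)
ret = ("203.0.113.10", "10.1.1.10", 443, 40000)

# IPs and ports only: both directions always land on the same link.
print(pick_link(*fwd, LINKS), pick_link(*ret, LINKS))

# With VLAN included: the inside VLAN (100) and outside VLAN (200) differ,
# so the hash input differs and the two directions may pick different links.
print(pick_link(*fwd, LINKS, vlan=100), pick_link(*ret, LINKS, vlan=200))
```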
There is a case where src-dst-ip may be the better choice. When the cluster uses PAT, it is assigned a pool of IPs. Each cluster member takes one or more IPs from the pool to use with PAT. Each IP in the pool can only belong to one member.
Think about what happens if hashing uses the L4 port. Each PAT IP will support several connections with different ports. These connections will all have different hashes, which are spread across the links. This means that the traffic for a single IP will be asymmetrically spread across the cluster.
If you use PAT in your environment, consider using a hashing method that uses IP addresses, not ports.
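The PAT case can be illustrated with the same kind of toy hash. Again, this is a sketch under assumptions: CRC32 stands in for whatever the switch really does, and the addresses and port range are invented. It compares where return traffic for a single PAT IP lands when the hash uses IPs only, and when it also uses L4 ports.

```python
import zlib

LINKS = 4  # assumed number of links in the etherchannel


def link_ip_only(ip_a, ip_b):
    """Toy src-dst-ip hash: only the two IP addresses are used."""
    key = repr(sorted([ip_a, ip_b])).encode()
    return zlib.crc32(key) % LINKS


def link_with_ports(ip_a, port_a, ip_b, port_b):
    """Toy src-dst-ip-l4port hash: IPs and L4 ports are used."""
    key = repr(sorted([(ip_a, port_a), (ip_b, port_b)])).encode()
    return zlib.crc32(key) % LINKS


server_ip, server_port = "203.0.113.10", 443  # assumed outside server
pat_ip = "198.51.100.5"                       # assumed PAT IP owned by one member
pat_ports = range(1024, 1074)                 # 50 translated connections

# Return traffic from the server to the PAT IP, one packet per connection.
ip_only_links = {link_ip_only(server_ip, pat_ip) for _ in pat_ports}
port_links = {link_with_ports(server_ip, server_port, pat_ip, p) for p in pat_ports}

# IP-only hashing keeps every connection for this PAT IP on one link.
print("src-dst-ip links used:", sorted(ip_only_links))
# Port-aware hashing will typically spread them over several links,
# even though only one member owns the PAT IP.
print("src-dst-ip-l4port links used:", sorted(port_links))
```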
Cluster Control Link
The CCL is vital in the cluster. If the CCL goes down, the member is removed from the cluster. If possible, cable the CCL with multi-chassis etherchannel, such as vPC or VSS.
The CCL throughput should match or exceed the throughput of the data links. Why? In a worst case scenario, all data packets are forwarded across the CCL. Also, state updates use the CCL. To plan for this scenario, try to size the CCL to 110% of the data link. For example, if each member has 4x 1Gbps data links, then the CCL on each member should also have at least 4x 1Gbps links.
In an ideal scenario, data traffic is completely symmetric. In this case, the CCL is not used for data traffic, only state updates. This is why it’s important to try to get the hashing algorithm right. Eliminating NAT where possible will also help.
In the case of an inter-site cluster, the DCI should have a one-way latency of 20ms or less. Versions before 9.6 recommended 5ms or less. If the latency is higher, for example 50ms, you are unlikely to have a problem. 20ms is the highest latency that Cisco have validated.
Now for the most critical part. The CCL passes data packets between members with a 100-byte trailer. This means the MTU on the CCL must be at least 100 bytes higher than the data interfaces. If this isn’t the case, the messages will have to be fragmented. This will cause performance problems.
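The MTU rule is simple enough to check with a few lines of code. This is a minimal sketch of the arithmetic only. The function names are made up for illustration, and the MTU values are examples, not recommendations.

```python
CCL_TRAILER_BYTES = 100  # trailer added to data packets forwarded over the CCL


def min_ccl_mtu(data_mtus):
    """Smallest CCL MTU that avoids fragmentation for the given data interfaces."""
    return max(data_mtus) + CCL_TRAILER_BYTES


def ccl_mtu_ok(ccl_mtu, data_mtus):
    """True if the CCL MTU leaves room for the largest data packet plus trailer."""
    return ccl_mtu >= min_ccl_mtu(data_mtus)


data_mtus = [1500, 1500]  # example data interface MTUs
print(min_ccl_mtu(data_mtus))       # 1600
print(ccl_mtu_ok(1500, data_mtus))  # False
print(ccl_mtu_ok(1600, data_mtus))  # True
```

Keep in mind that the switch ports carrying the CCL need to pass frames of that size too.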
Network Address Translation
Do you need NAT in your environment? Probably. But, be aware that NAT causes traffic asymmetry. This happens because NAT changes IP addresses and ports. This causes the hashing algorithms to get a different result on ingress and egress.
Do you use PAT? Probably. It’s common for an edge firewall. PAT needs to use a pool of outside IP addresses. The number of IPs in the pool must match or exceed the number of members in the cluster. That is, each member must have at least one IP address for PAT.
If there aren’t enough IPs in the pool, some members will not get an IP, and PAT will fail on those members. This happens regardless of what resources are available to other members.
Do you want to use interface PAT? If you do, you will need to use spanned-etherchannel interface mode.
From a design perspective, think about whether you can do NAT on another device in the network. NAT is the natural enemy of ASA clusters.
Do you need to use VPN? For now, Remote Access (RA) VPN is not supported.
Clustering supports site-to-site VPNs, but they are a centralised feature. They should be decentralised in 9.8.1, which is planned for release sometime before the end of April 2017.
If you do need VPN, consider whether it needs to be on the ASA cluster, or whether it could be on another device.
Management interfaces should be in individual interface mode. This is to make members accessible during a CCL or data link failure. If the data interfaces use spanned-etherchannel, management interfaces need to use static routing.
If possible, consider using a separate out-of-band management network. Or at least, use a separate VLAN for management traffic.
Do you need to use AAA for network access? If you do, be aware that this is a centralised feature, and will not scale by adding more cluster members.
Cisco – ASA Cluster (v9.6)
Cisco – Cisco Firepower NGFW