vPC Design and Best Practices
Like any technology, you can do vPC the right way and the wrong way. Throughout this article, we’ll discuss best practices and design considerations for vPC.
The keepalive link sends heartbeat messages between vPC peers. It is a layer-3 link and should be part of a dedicated VRF. The keepalive link should never be a VLAN on the peer-link. Keep the two links separate: if the keepalive ran over the peer-link, a peer-link failure would also take down the keepalive, and the peers could not tell a failed peer-link from a failed peer switch.
A low-bandwidth link is suitable for the keepalive, as it carries only small messages. The link does not need to be an EtherChannel.
There are three ways to configure the keepalive link:
- A single link between the switches, or several links in a port channel
- A direct link between the switches’ management ports
- Connecting through existing layer-3 infrastructure, such as an out-of-band management network. This may also use the management ports
The right method depends on whether the switches are chassis-based or fixed-port. On fixed-port switches, use the management ports.
A chassis should use a dedicated link or port channel. This is because a chassis uses supervisors, often in active/passive mode, and only the active supervisor’s management port is live. If the management ports are cabled directly and a supervisor fails over, the active management port on one switch may end up connected to the inactive one on the other, and the keepalive link fails.
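As a rough sketch of the dedicated-link option, the Python helper below renders an NX-OS-style keepalive configuration on a routed port placed in its own VRF. The function name, the VRF name `vpc-keepalive`, the interface and the addresses are all hypothetical, and the command syntax is shown as commonly documented for NX-OS; check it against the configuration guide for your platform and release before use.

```python
import ipaddress


def render_direct_keepalive(domain_id: int, vrf: str, interface: str,
                            local_ip: str, peer_ip: str,
                            prefix_len: int = 30) -> str:
    """Render an NX-OS-style keepalive over a dedicated routed link in its own VRF.

    Hypothetical helper: interface, VRF name and addressing are examples only.
    """
    local = ipaddress.ip_address(local_ip)
    peer = ipaddress.ip_address(peer_ip)
    subnet = ipaddress.ip_network(f"{local}/{prefix_len}", strict=False)
    # On a direct point-to-point link, both ends must share one subnet.
    if peer not in subnet:
        raise ValueError(f"{peer} is not in {subnet}")
    return "\n".join([
        f"vrf context {vrf}",               # dedicated VRF for the keepalive
        f"interface {interface}",
        "  no switchport",                  # routed port, not a peer-link VLAN
        f"  vrf member {vrf}",
        f"  ip address {local}/{prefix_len}",
        "  no shutdown",
        f"vpc domain {domain_id}",
        f"  peer-keepalive destination {peer} source {local} vrf {vrf}",
    ])


print(render_direct_keepalive(10, "vpc-keepalive", "Ethernet1/48",
                              "10.255.255.1", "10.255.255.2"))
```

The peer switch would get the mirror image, with the source and destination addresses swapped. The same-subnet check only applies to a direct link. A keepalive routed across an out-of-band management network would not need it.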
The peer-link is the most important component of the vPC domain. Protect it from failure wherever possible. If the peer-link does fail, traffic will not stop flowing completely, but orphan ports are likely to be isolated and new member ports will not be able to come up.
Protect the peer-link by using at least two 10G interfaces in a port channel. This must be a point-to-point link that does not pass through any other infrastructure.
When using a chassis-based switch, be sure to use ports from different line cards. This eliminates the line card as a single point of failure.
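These rules are simple enough to check automatically before deployment. The sketch below assumes a hypothetical `Member` record holding each interface's name, speed and line-card (module) number. It flags a peer-link with fewer than two members, members slower than 10G, or, on a chassis, all members on one line card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str        # interface name, e.g. "Ethernet3/1" (illustrative)
    speed_gbps: int  # configured interface speed
    module: int      # line card (slot) the port lives on


def check_peer_link(members: list, chassis: bool) -> list:
    """Return a list of problems with a proposed peer-link; empty means OK."""
    problems = []
    if len(members) < 2:
        problems.append("peer-link needs at least two member interfaces")
    slow = [m.name for m in members if m.speed_gbps < 10]
    if slow:
        problems.append(f"members slower than 10G: {', '.join(slow)}")
    # On a chassis, spread members across line cards to avoid a single point of failure.
    if chassis and len({m.module for m in members}) < 2:
        problems.append("all members are on one line card (single point of failure)")
    return problems


good = [Member("Ethernet3/1", 10, 3), Member("Ethernet4/1", 10, 4)]
print(check_peer_link(good, chassis=True))
```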
In some failure scenarios, traffic may be redirected over the peer-link. For example, this can happen if the routed ports from the aggregation layer to the core fail. Size the peer-link to carry this throughput. For example, consider a case where 10 vPCs are configured, each with two 10G ports, one on each switch. At full load, 100G (10 × 10G) can pass through each switch. If a failure forces this traffic across the peer-link, the peer-link needs to carry up to 100G.
Additionally, multicast traffic uses the peer-link on a regular basis. Even if there is no failure, the peer-link must be large enough to carry all the multicast traffic.
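The sizing arithmetic above can be captured in a small helper. This is a sketch under one conservative assumption that is not in the article itself: failover traffic and multicast traffic are simply added together. The result is the worst-case load and the number of peer-link members needed at a given link speed, never fewer than two.

```python
import math


def peer_link_requirement(vpc_member_gbps_per_switch: list,
                          multicast_gbps: float = 0.0,
                          link_gbps: int = 10) -> tuple:
    """Return (worst-case Gbps, member links needed) for the peer-link.

    Assumption: worst case is all vPC member bandwidth on one switch being
    redirected over the peer-link, plus the normal multicast load on top.
    """
    worst_case = sum(vpc_member_gbps_per_switch) + multicast_gbps
    links = max(2, math.ceil(worst_case / link_gbps))  # never fewer than two members
    return worst_case, links


# The example from the text: 10 vPCs, one 10G member per switch each.
print(peer_link_requirement([10] * 10))
```

For the example in the text this returns `(100, 10)`. With 5G of multicast added, the requirement rises to 105G and eleven 10G links.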
There is no need to enable UDLD on the peer-link. By default, the peer-link ports are spanning tree network ports, which run Bridge Assurance. Bridge Assurance sends BPDUs in both directions on the link, so it is able to detect unidirectional links.
Use manual VLAN pruning on the peer-link to allow only vPC VLANs. This ensures that traffic for other VLANs does not cross the peer-link.
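Manual pruning is easier to maintain when the allowed list is generated from the set of vPC VLANs. The sketch below compresses VLAN IDs into range syntax and renders an NX-OS-style peer-link trunk. The helper names and the port-channel number are hypothetical, and the command syntax should be verified for your platform.

```python
def vlan_ranges(vlans: set) -> str:
    """Compress VLAN IDs into range syntax, e.g. {10, 11, 12, 20} -> '10-12,20'."""
    if not vlans:
        raise ValueError("no VLANs given")
    if any(not 1 <= v <= 4094 for v in vlans):
        raise ValueError("VLAN IDs must be in 1-4094")
    ordered = sorted(vlans)
    parts = []
    start = prev = ordered[0]
    for v in ordered[1:]:
        if v == prev + 1:          # still inside a contiguous run
            prev = v
            continue
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = v
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


def peer_link_trunk(port_channel: int, vpc_vlans: set) -> str:
    """Render a peer-link port channel that allows only the vPC VLANs."""
    return "\n".join([
        f"interface port-channel{port_channel}",
        "  switchport mode trunk",
        f"  switchport trunk allowed vlan {vlan_ranges(vpc_vlans)}",
        "  vpc peer-link",
    ])


print(peer_link_trunk(1, {10, 11, 12, 20}))
```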
During the design phase, decide which switch should be the primary. Manually set the role priorities to match the design (the lower value becomes primary). In any diagrams, depict the left switch as the primary and the right as the secondary. This makes the primary/secondary roles predictable, which makes troubleshooting easier.
If the switches also route traffic at Layer 3, make sure to use the peer-gateway command. It lets each switch route traffic addressed to its peer’s router MAC, so traffic that arrives at the "wrong" switch does not need to be forwarded over the peer-link.
Consider using the auto-recovery command. This helps the vPC domain recover from failures without administrative intervention. However, the command may cause both switches to forward traffic during a split-brain scenario, which could cause a switching loop or duplicate frames. The best advice here is to understand the effects of the command and decide during the design phase whether it is right for your network.
Failures that auto recovery can help with include:
- Peer-link failure, followed by primary switch failure
- Double switch reload, where only one switch comes back
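The role, peer-gateway and auto-recovery decisions can all be recorded in one place and rendered for both peers from the same inputs, which keeps the two configurations consistent. In this sketch the helper name and the priority values 100 and 200 are illustrative assumptions. It relies only on the rule that the lower role priority wins the primary role. Auto-recovery is off by default here, reflecting the advice to make that a deliberate design decision.

```python
def render_vpc_domain(domain_id: int, is_primary: bool,
                      peer_gateway: bool = True,
                      auto_recovery: bool = False) -> str:
    """Render an NX-OS-style vPC domain block for one peer (hypothetical helper)."""
    if not 1 <= domain_id <= 1000:
        raise ValueError("vPC domain ID must be in 1-1000")
    # Lower role priority wins; 100/200 are example values only.
    lines = [f"vpc domain {domain_id}",
             f"  role priority {100 if is_primary else 200}"]
    if peer_gateway:
        lines.append("  peer-gateway")   # route frames sent to the peer's router MAC
    if auto_recovery:
        lines.append("  auto-recovery")  # opt-in: understand split-brain risk first
    return "\n".join(lines)


# Same inputs for both peers, differing only in the intended role.
print(render_vpc_domain(10, is_primary=True))
print(render_vpc_domain(10, is_primary=False))
```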
If there is more than one pair of vPC switches, it is critical that each pair uses a different System ID. The System ID is an LACP value, based on the switch MAC address and the Domain ID. If the System ID is not unique and there is a cabling error, or multi-level vPCs are in use, the pairs will conflict. Keep the System ID unique on each pair of switches by configuring each pair with a different Domain ID.
Another reason to keep the Domain IDs unique is that the System MAC is based on the Domain ID: the last byte of the System MAC comes from the domain number. So, if two different pairs of vPC switches use the same domain, they may also end up with the same System MAC.
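A quick inventory check can catch duplicate Domain IDs before they cause a conflict. The sketch below takes a hypothetical mapping of vPC pair names to Domain IDs and reports any ID used by more than one pair. The `vpc_system_mac` helper is an assumption for illustration. It uses the `00:23:04:ee:be` prefix commonly seen in `show vpc role` output with the domain number as the last byte, and it deliberately models only Domain IDs 1-255.

```python
from collections import defaultdict


def duplicate_domains(pairs: dict) -> dict:
    """Map each Domain ID used by more than one vPC pair to the pairs using it."""
    by_domain = defaultdict(list)
    for pair, domain in pairs.items():
        by_domain[domain].append(pair)
    return {d: sorted(p) for d, p in by_domain.items() if len(p) > 1}


def vpc_system_mac(domain_id: int) -> str:
    # Assumption: 00:23:04:ee:be prefix with the Domain ID as the last byte.
    # Only Domain IDs 1-255 are modelled by this sketch.
    if not 1 <= domain_id <= 255:
        raise ValueError("this sketch only models Domain IDs 1-255")
    return f"00:23:04:ee:be:{domain_id:02x}"


inventory = {"agg-A": 10, "agg-B": 10, "access-1": 20}  # hypothetical pair names
print(duplicate_domains(inventory))
print(vpc_system_mac(10))
```

In this example `agg-A` and `agg-B` share domain 10, so they would also share the same System MAC. One of the two pairs should be given a different Domain ID.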
There are several short best-practice guidelines for connecting devices to a vPC:
- Wherever possible, use LACP
- The configuration must match on both switches
- Use the same value for the port channel ID and the vPC ID
- If the connected device is another Nexus switch, leave graceful convergence enabled
- If the connected device is not another Nexus switch, disable graceful convergence
- Use as many fields as possible for the hashing algorithm. For example, use source IP, destination IP, Layer-4 port number, and VLAN, if possible
- Do not enable UDLD on vPC member ports. LACP is able to handle unidirectional links
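Several of these guidelines can be enforced by generating member port configuration instead of typing it by hand. The sketch below renders the same port-channel ID and vPC ID, LACP in active mode, and disables graceful convergence only when the attached device is not a Nexus switch, since it is enabled by default. The helper name and interface names are hypothetical. Running it with identical inputs on both peers keeps the configuration matched, and the syntax should be checked for your platform.

```python
def render_vpc_member(vpc_id: int, interfaces: list, peer_is_nexus: bool) -> str:
    """Render an NX-OS-style vPC member port channel (hypothetical helper)."""
    # Port-channel ID deliberately equals the vPC ID.
    lines = [f"interface port-channel{vpc_id}",
             "  switchport mode trunk"]
    if not peer_is_nexus:
        # Graceful convergence is on by default; turn it off for non-Nexus devices.
        lines.append("  no lacp graceful-convergence")
    lines.append(f"  vpc {vpc_id}")
    for intf in interfaces:
        lines += [f"interface {intf}",
                  f"  channel-group {vpc_id} mode active"]  # LACP active
    return "\n".join(lines)


print(render_vpc_member(20, ["Ethernet1/1"], peer_is_nexus=False))
```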
The most important piece of advice is: don’t disable spanning tree! Yes, vPC helps you avoid the downsides of spanning tree, but it does not replace it.
What would happen if someone accidentally or maliciously connected a rogue switch to your network? What if both the keepalive link and the peer-link failed? What about cabling mistakes? Loops can still happen, and spanning tree will protect you from them.
If you can, plan the network to use a single ‘flavour’ of spanning tree. Nexus switches support both Rapid-PVST+ and MST, and either is a good option. Where possible, make the vPC domain the spanning tree root bridge.
If you can, use the peer-switch command. With peer-switch, both switches use the same bridge priority and appear to the rest of the network as a single root bridge. Otherwise, make the configured primary switch the root bridge. Either way, management and troubleshooting are simpler.
Will there be non-vPC switches in the spanning tree domain? In larger deployments this is likely, so use the spanning-tree pseudo-information command, which sets the root and designated priorities used in this kind of hybrid topology.
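For the hybrid case, a sketch of the spanning tree settings for one peer might look like the following. The assumption, based on the usual peer-switch hybrid design, is that both peers share the same pseudo-information root priority while each uses a different designated priority. The helper name, VLAN range and priority values are illustrative and should be confirmed against the NX-OS spanning tree configuration guide.

```python
def render_hybrid_stp(domain_id: int, vlans: str, designated_priority: int,
                      root_priority: int = 4096) -> str:
    """Render peer-switch plus pseudo-information for one vPC peer (sketch)."""
    for p in (root_priority, designated_priority):
        # Bridge priorities are multiples of 4096 between 0 and 61440.
        if p % 4096 or not 0 <= p <= 61440:
            raise ValueError(f"bridge priority {p} must be a multiple of 4096 in 0-61440")
    return "\n".join([
        "spanning-tree mode rapid-pvst",
        f"vpc domain {domain_id}",
        "  peer-switch",                    # both peers act as one logical root
        "spanning-tree pseudo-information",
        f"  vlan {vlans} root priority {root_priority}",               # same on both peers
        f"  vlan {vlans} designated priority {designated_priority}",   # differs per peer
    ])


print(render_hybrid_stp(10, "10-20", designated_priority=8192))   # primary
print(render_hybrid_stp(10, "10-20", designated_priority=16384))  # secondary
```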
When the peer-link is configured, it is set to spanning tree port type network by default. This means that Bridge Assurance is also enabled by default. Do not disable Bridge Assurance on the peer-link.
You may disable Bridge Assurance on the vPC member ports. These ports will then not take part in Bridge Assurance.
If using back-to-back vPC across DCI links, enable BPDU filtering. This disables spanning tree between the data centres. The reason for this is to prevent a single large spanning tree domain from forming across the data centres. If this were to happen, traffic might utilise the DCI links unnecessarily.
Configure the DCI link ports as edge ports, so that the links come up faster after a failure. Also, disable Bridge Assurance on the DCI links. The aggregation switches (not the DCI link switches) should have root guard enabled, so that the DCI link switches can’t become the root of the domain.
Only connect two data centres this way, as connecting more can cause loops. If more are needed, consider OTV or VXLAN.
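The DCI settings can be sketched in the same way, as two fragments: one for the DCI link port channel and one for an aggregation switch port facing the DCI switches. Setting the port type to edge also means Bridge Assurance does not run on that port, because it only runs on network ports. The helper name, port-channel number and interface name are hypothetical, and the exact placement of root guard should follow your own topology.

```python
def render_dci(dci_port_channel: int, agg_intf_facing_dci: str) -> tuple:
    """Return (DCI-link fragment, aggregation fragment) as NX-OS-style text (sketch)."""
    dci_side = "\n".join([
        f"interface port-channel{dci_port_channel}",
        "  spanning-tree port type edge trunk",  # faster link-up; no Bridge Assurance
        "  spanning-tree bpdufilter enable",     # keep STP domains separate per DC
    ])
    agg_side = "\n".join([
        f"interface {agg_intf_facing_dci}",
        "  spanning-tree guard root",            # DCI switches can't become root
    ])
    return dci_side, agg_side


dci_cfg, agg_cfg = render_dci(100, "port-channel50")
print(dci_cfg)
print(agg_cfg)
```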
vPCs can co-exist with routing. vPCs are usually used in the aggregation and access layers, where they connect to hosts and other network devices, while routed ports connect to the core. In fact, Cisco recommends that vPC is not used to connect to the core, even if the vPC links are ‘routed’ using SVIs.
When routing over vPC with SVIs, configure the SVIs as passive interfaces to prevent a large number of IGP neighbour relationships from forming. Leave a single SVI non-passive and use it for IGP peering.
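As a sketch, assuming OSPF as the IGP, the helper below makes every SVI passive except the one chosen for peering. The OSPF tag, area, VLANs and addresses are illustrative only, and the per-interface OSPF commands follow common NX-OS syntax, which should be confirmed for your release.

```python
def render_svis(ospf_tag: str, area: str, svis: dict, peering_vlan: int) -> str:
    """Render SVIs in OSPF, passive except for the single peering SVI (sketch)."""
    if peering_vlan not in svis:
        raise ValueError("peering VLAN must be one of the SVIs")
    lines = []
    for vlan in sorted(svis):
        lines += [f"interface Vlan{vlan}",
                  f"  ip address {svis[vlan]}",
                  f"  ip router ospf {ospf_tag} area {area}"]
        if vlan != peering_vlan:
            lines.append("  ip ospf passive-interface")  # no adjacencies here
    return "\n".join(lines)


print(render_svis("1", "0.0.0.0",
                  {10: "10.0.10.2/24", 99: "10.0.99.2/29"}, peering_vlan=99))
```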
When a switch reloads, the routing protocols will take some time to reconverge. If the vPC member ports come back up before convergence is complete, some traffic may be black-holed, because the switch will accept traffic before it has a complete routing table. Use the delay restore command to give the routing protocols time to converge. The delay restore command delays bringing the member ports up. With a correctly tuned timer, there should be no packet loss.
By default, delay restore is already enabled with a timer of 30 seconds. This can be tuned from 1 – 3600 seconds. Make sure that the configuration matches on both switch peers.
Tune the timer according to the needs of the network. Base the delay on how long the IGP takes to converge. A tightly tuned IGP will converge faster and need less delay time. BGP, on the other hand, will take longer and need more delay.
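One way to turn a measured convergence time into a timer value is to add a safety margin and clamp the result to the allowed 1-3600 second range. The 1.5× margin below is an assumption of this sketch, not a Cisco figure. The helper names are hypothetical, and the same rendered value should be applied to both peers.

```python
import math

DELAY_RESTORE_MIN, DELAY_RESTORE_MAX = 1, 3600  # allowed range from the text


def delay_restore_seconds(convergence_s: float, margin: float = 1.5) -> int:
    """Choose a delay restore timer from measured routing convergence time."""
    if convergence_s <= 0:
        raise ValueError("convergence time must be positive")
    proposed = math.ceil(convergence_s * margin)  # assumed safety margin
    return min(max(proposed, DELAY_RESTORE_MIN), DELAY_RESTORE_MAX)


def render_delay_restore(domain_id: int, seconds: int) -> str:
    """Render the timer for one peer; use the same value on both peers."""
    return f"vpc domain {domain_id}\n  delay restore {seconds}"


# e.g. an IGP measured to converge in 120 s after a reload
print(render_delay_restore(10, delay_restore_seconds(120)))
```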
When using HSRP (or VRRP), one switch is active on the control plane while the other is standby, but both are active on the data plane. For ease of administration, make the configured vPC primary switch the active HSRP switch. Also, disable IP redirects on the SVIs. This prevents traffic from being passed over the peer-link, where it would be dropped.
When using HSRP (or VRRP), use the default timers. It may be tempting to tweak the timers, but this can do more harm than good. Shortening the timers doesn’t speed up failover, as both switches are already forwarding traffic, and short timers generate more control-plane traffic, which adds stress to the control plane. Cisco recommends keeping the default timers rather than aggressive values such as timers 1 3.
vPC object tracking is not recommended in HSRP or VRRP designs. If a tracked port fails on one of the peer switches, that switch shuts down its HSRP SVIs. When this switch then receives traffic, it forwards it across the peer-link to the active SVI on the other switch, which will likely drop the traffic because of the duplicate frame prevention rule.
Sample HSRP configuration with vPC: