Spanning Tree and VLANs in the Campus


VLAN design in the campus is an important topic for the CCDP ARCH exam, largely because of how VLANs affect spanning-tree.

A good spanning-tree design is important to limit fate-sharing. Fate-sharing is when a problem in one part of the network affects another part.

VLANs in the Distribution Block

Stability


The best solution, if possible, is to limit a VLAN to a single access switch. This is the routed access-layer design.

While this is a nice design, it is not always possible. When it isn't, VLANs may span several switches. This requires more thought, because FHRPs and spanning-tree come into play.

One issue that can occur is unicast flooding. Consider the example in the diagram below.

Traffic from a server passes through an access layer switch. It travels to the active HSRP switch in the distribution layer, then up to the core.

There are ECMP links between the core and distribution layers, so return traffic coming down from the core may be sent on the link to the other (standby) distribution switch. That switch then floods the traffic to all connected access layer switches that carry the VLAN.

So why does this happen? In short, it comes down to ARP and CAM timeouts.

The CAM table lists MAC addresses and the ports they are bound to. The ARP table lists IP-to-MAC bindings. Each entry in these tables has a timer; when the timer expires, the switch removes the entry.

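In the example above, the standby switch still has the server's IP-to-MAC binding in its ARP table, so it can forward the return traffic. However, its CAM entry for the server's MAC has expired, because the server's outbound traffic goes through the active HSRP switch and none of it passes through the standby. Without a CAM entry, the switch does not know which port the MAC is on, so it floods the frame out of every port in the VLAN, which is sub-optimal behaviour.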

Tune this behaviour by setting the ARP timeout equal to or shorter than the CAM aging time. By default, the ARP timeout is 4 hours and the CAM aging time is 300 seconds. With the shorter ARP timeout, the switch re-ARPs for the host before its CAM entry ages out, and the host's ARP reply refreshes the CAM entry.
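To see why the relationship between the two timers matters, here is a rough Python model of the standby switch. It is a simplification, assuming that the server's only traffic through the standby switch is the ARP reply sent each time the standby's ARP entry expires, and that return traffic arrives continuously. The function name and the 270-second "tuned" value are illustrative only.

```python
def flood_fraction(arp_timeout_s: int, cam_aging_s: int) -> float:
    """Fraction of time the standby switch floods return traffic for one host.

    Simplified model (an assumption for illustration):
      * The host's outbound traffic goes via the active HSRP switch, so the
        standby switch only learns the host's MAC from ARP replies.
      * When the ARP entry expires, the standby re-ARPs; the reply refreshes
        both the ARP entry and the CAM entry.
      * After each reply, the CAM entry stays valid for cam_aging_s seconds.
    """
    if arp_timeout_s <= 0 or cam_aging_s <= 0:
        raise ValueError("timers must be positive")
    # For the rest of each ARP cycle, the destination MAC is unknown -> flooding.
    flooding_s = max(0, arp_timeout_s - cam_aging_s)
    return flooding_s / arp_timeout_s


DEFAULT_ARP_S = 4 * 60 * 60  # 4-hour default ARP timeout
DEFAULT_CAM_S = 300          # 300-second default CAM aging time

print(f"defaults: {flood_fraction(DEFAULT_ARP_S, DEFAULT_CAM_S):.1%}")  # defaults: 97.9%
print(f"tuned:    {flood_fraction(270, DEFAULT_CAM_S):.1%}")            # tuned:    0.0%
```

With the default timers, the standby switch spends almost every ARP cycle without a CAM entry; once the ARP timeout is at or below the CAM aging time, the model shows no flooding window.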

This design has another issue: spanned VLANs create larger layer-2 domains, which generally means more flooding and broadcast traffic.

One way to improve this is to use VSS at the distribution layer. VLANs can still span switches, but the topology improves. For one, FHRPs are no longer required, as the pair of distribution switches appears as a single device. Also, VSS uses a shared control plane, so both switches share the MAC addresses they learn.

If you need to span VLANs, stop and think for a moment. Do you really need every VLAN on every switch? Manually pruning unneeded VLANs off trunk links keeps each layer-2 domain as small as possible.
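Manual pruning is easier to keep consistent if the allowed list on each trunk is generated from what each access switch actually needs. The Python sketch below assumes a hypothetical inventory (the switch names and VLAN IDs are made up) and compresses each list into the comma-and-range syntax accepted by the IOS `switchport trunk allowed vlan` command; the interface lines are omitted.

```python
from typing import Dict, Iterable, List


def vlan_ranges(vlans: Iterable[int]) -> str:
    """Compress VLAN IDs into IOS-style range syntax, e.g. [10, 11, 12, 20] -> '10-12,20'."""
    ids = sorted(set(vlans))
    if not ids:
        return "none"  # an empty allowed list
    if ids[0] < 1 or ids[-1] > 4094:
        raise ValueError("VLAN IDs must be in the range 1-4094")

    def fmt(first: int, last: int) -> str:
        return str(first) if first == last else f"{first}-{last}"

    parts: List[str] = []
    start = prev = ids[0]
    for vid in ids[1:]:
        if vid == prev + 1:
            prev = vid          # still inside a contiguous run
        else:
            parts.append(fmt(start, prev))
            start = prev = vid  # begin a new run
    parts.append(fmt(start, prev))
    return ",".join(parts)


# Hypothetical inventory: which user VLANs each access switch actually needs.
needed: Dict[str, List[int]] = {
    "access-1": [10, 11, 12, 20],
    "access-2": [30, 10],
}

for switch, vlans in needed.items():
    print(f"! trunk from distribution to {switch}")
    print(f" switchport trunk allowed vlan {vlan_ranges(vlans)}")
```

Each trunk then carries only the VLANs that its access switch uses (10-12,20 and 10,30 here), rather than every VLAN in the block.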

Trunking Recommendations

Trunks run between layers in the layer-2 and virtual-switch designs, and a few best practices apply here.

First, VTP may be useful in the virtual-switch model but is not recommended otherwise. Instead, manually prune VLANs on each trunk.

If you do use VTP, use version 3, which is safer because only the designated primary server can change the VLAN database. Make the distribution switches VTP servers and the access switches VTP clients.

Second, think about optimising convergence. Rather than relying on DTP, hard-code the port as a trunk (switchport mode trunk) and disable negotiation with switchport nonegotiate. Cutting out DTP negotiation saves about 2 seconds of downtime.

Third, think about security. Assign an unused VLAN ID as the native VLAN. This helps prevent VLAN-hopping attacks that rely on double-tagged frames sent in the native VLAN.
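The three trunking recommendations can be combined into one per-interface template. This Python sketch renders them as IOS interface configuration; the interface name and native VLAN 999 are hypothetical, and some older platforms may also need `switchport trunk encapsulation dot1q` before `switchport mode trunk`.

```python
from typing import Iterable, List


def trunk_config(interface: str, allowed_vlans: Iterable[int], native_vlan: int) -> List[str]:
    """Render a trunk using the recommendations above (a sketch, not a complete template)."""
    allowed = sorted(set(allowed_vlans))
    if not allowed:
        raise ValueError("at least one VLAN must be allowed on the trunk")
    if native_vlan in allowed:
        # The native VLAN should be an unused VLAN, not one carrying user traffic.
        raise ValueError("native VLAN should be an unused VLAN")
    return [
        f"interface {interface}",
        " switchport mode trunk",                        # hard-code trunking
        " switchport nonegotiate",                       # no DTP negotiation
        f" switchport trunk native vlan {native_vlan}",  # unused VLAN as native
        " switchport trunk allowed vlan "                # manual pruning
        + ",".join(str(v) for v in allowed),
    ]


print("\n".join(trunk_config("GigabitEthernet1/0/1", [20, 10, 30], native_vlan=999)))
```

Generating trunks from one function keeps these settings identical on both ends of every link, which matters because a mismatched native VLAN or trunk mode can stop the trunk from working.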

Spanning Tree

Spanning-tree was first defined in IEEE 802.1D. After a link failure, the network could take 30 to 50 seconds to converge. That may have been acceptable back then, but today we also need to support applications like voice, video, and virtual desktops. To address this, modern versions of spanning-tree include several improvements.
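The 30 to 50 second range follows from the default 802.1D timers. This Python sketch shows the arithmetic, assuming default timers (max age 20 seconds, forward delay 15 seconds) and ignoring the hello interval.

```python
MAX_AGE_S = 20        # 802.1D default max age
FORWARD_DELAY_S = 15  # 802.1D default forward delay, spent once in listening and once in learning


def stp_convergence_s(direct_failure: bool) -> int:
    """Approximate 802.1D convergence time after a link failure."""
    # Indirect failure: the switch must first wait for the stored BPDU
    # information to age out (max age) before reacting.
    wait_s = 0 if direct_failure else MAX_AGE_S
    # Then the new forwarding port passes through listening and learning.
    return wait_s + 2 * FORWARD_DELAY_S


print(stp_convergence_s(direct_failure=True))   # 30
print(stp_convergence_s(direct_failure=False))  # 50
```

Most of this time is spent waiting on timers rather than detecting the failure, which is why the newer versions, which use an explicit handshake instead of timers, converge so much faster.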

More recently, routed networks have become more popular. They support fast convergence, use redundant links at the same time (ECMP), and are easy to configure. Spanning-tree, by contrast, blocks redundant links, converges more slowly, and increases the size of the failure domain.

So why not deploy a routed network, and disable spanning-tree? There are two reasons. One, some networks need to span VLANs across switches. This is not possible across routed links. Two, what if someone accidentally (or maliciously) creates a loop in your network? Your old friend spanning-tree will be there to save you.

Spanning-Tree Tool Set

There are several spanning-tree tools at our disposal:

BPDUGuard shuts down (err-disables) a port if it receives a BPDU. Enable this on any port where BPDUs are not expected, such as client-facing ports.

BPDUFilter stops a port from sending or processing BPDUs, which effectively disables spanning-tree on that port. While there may be some rare corner cases where this is useful, it can be dangerous. Use this with caution.

RootGuard blocks a port (placing it in a root-inconsistent state) if it receives a superior BPDU, one that advertises a better root bridge than the current one. Use it on ports where the root bridge should never be. This stops an attacker from inserting their own switch and taking over the spanning-tree domain.

LoopGuard watches for BPDUs on a port and blocks the port (placing it in a loop-inconsistent state) if BPDUs stop arriving. BPDUs are a critical part of loop prevention: without them, a blocked port could move to forwarding and a loop could form. This may happen if the OS or spanning-tree process crashes, or if a link becomes unidirectional. Enable this on any port that connects to another switch.
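The reactions of BPDUGuard, RootGuard, and LoopGuard can be summarised in a small Python model. This is a simplification for illustration: it compares only the advertised root bridge ID (priority, then MAC), ignores path cost and port IDs, and assumes LoopGuard sits on a root or alternate port that was previously receiving BPDUs. The function and type names are hypothetical, not a switch API.

```python
from typing import NamedTuple, Optional


class BridgeId(NamedTuple):
    priority: int  # bridge priority; lower is better
    mac: int       # bridge MAC address as an integer; breaks priority ties


def guard_reaction(guard: str, bpdu_root: Optional[BridgeId], current_root: BridgeId) -> str:
    """Return the port outcome for one guard feature.

    bpdu_root is the root bridge ID advertised in a BPDU received on the port,
    or None if BPDUs have stopped arriving.
    """
    if guard == "bpduguard":
        # Any BPDU at all on a port that should never see one.
        return "err-disabled" if bpdu_root is not None else "no action"
    if guard == "rootguard":
        # NamedTuple comparison checks priority first, then MAC: lower is superior.
        if bpdu_root is not None and bpdu_root < current_root:
            return "root-inconsistent (blocking)"
        return "no action"
    if guard == "loopguard":
        # BPDUs stopped: block instead of letting the port move to forwarding.
        return "loop-inconsistent (blocking)" if bpdu_root is None else "no action"
    raise ValueError(f"unknown guard: {guard}")


root = BridgeId(24576, 0x00000C000A01)   # hypothetical distribution-layer root
rogue = BridgeId(0, 0x00000C00BEEF)      # hypothetical attacker switch with priority 0

print(guard_reaction("rootguard", rogue, root))  # root-inconsistent (blocking)
print(guard_reaction("bpduguard", rogue, root))  # err-disabled
print(guard_reaction("loopguard", None, root))   # loop-inconsistent (blocking)
```

Note the difference in outcome: BPDUGuard err-disables the port, while RootGuard and LoopGuard block it and leave it up.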

There are also a few older, deprecated spanning-tree tools. UplinkFast and BackboneFast are Cisco enhancements to classic spanning-tree. Newer versions of spanning-tree (for example, RSTP) have this functionality built in.

Spanning-Tree Design

It is becoming common to use a routed network design. In a network like this, there is an argument that disabling spanning-tree adds to security. The reasoning is that an attacker is unable to compromise a root bridge if there is no spanning-tree.

While this makes sense on the surface, there’s another aspect to consider. What would happen if an end user accidentally connected two wall sockets with a single cable? With no spanning-tree at all, a loop would form. This is a higher risk than an attacker trying to compromise the root bridge.

The recommendation is to leave spanning-tree turned on. In the routed model, use BPDUGuard to protect ports facing clients or servers.

In the layer-2 model, spanning-tree extends further than the access layer. There is a greater need for planning in this model.

Control where the root bridge lives. The best place for the root bridge is in the distribution layer. One of the distribution switches can be the root bridge, and the other can be the backup.

Prevent an access layer switch from becoming the root bridge by enabling RootGuard on the distribution-layer ports that face the access layer.
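Root placement follows from the election rule: the bridge with the lowest bridge ID (priority first, then MAC address) wins. This Python sketch uses hypothetical switches and illustrative priorities (24576 for the primary and 28672 for the backup); with the extended system ID, the advertised priority is the configured priority plus the VLAN ID.

```python
from typing import List, NamedTuple, Tuple


class Bridge(NamedTuple):
    name: str
    priority: int  # configured priority (a multiple of 4096 on Cisco switches)
    mac: int       # base MAC address as an integer


def bridge_id(bridge: Bridge, vlan: int) -> Tuple[int, int]:
    # Extended system ID: advertised priority = configured priority + VLAN ID.
    return (bridge.priority + vlan, bridge.mac)


def elect_root(bridges: List[Bridge], vlan: int) -> Bridge:
    # Lowest bridge ID wins: compare priority first, then MAC.
    return min(bridges, key=lambda b: bridge_id(b, vlan))


dist1 = Bridge("dist-1", 24576, 0x00000C000A01)      # intended root
dist2 = Bridge("dist-2", 28672, 0x00000C000A02)      # intended backup root
access1 = Bridge("access-1", 32768, 0x00000C000001)  # default priority, lowest MAC

print(elect_root([dist1, dist2, access1], vlan=10).name)  # dist-1
print(elect_root([dist2, access1], vlan=10).name)         # dist-2 (dist-1 has failed)

# If every switch keeps the default priority, the lowest MAC wins,
# which here is the access switch.
defaults = [b._replace(priority=32768) for b in (dist1, dist2, access1)]
print(elect_root(defaults, vlan=10).name)                 # access-1
```

The last case is why priorities should be set explicitly on the distribution switches: left at defaults, an arbitrary switch (often the oldest, with the lowest MAC) can end up as root.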

Use LoopGuard everywhere to protect against BPDU loss.

There are several flavours of spanning-tree available, so it’s important to choose wisely.

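Avoid Classic Spanning-Tree (CST). This is the original 802.1D standard; it runs a single instance for all VLANs and is very slow to converge (30 to 50 seconds). Also avoid PVST, which is almost the same as CST but runs a separate instance per VLAN, so ports can block on a per-VLAN basis. It is just as slow to converge.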

Instead, consider using one of these options:

Rapid STP (RSTP, 802.1w) is an industry standard and converges much faster than CST. Like CST, however, it runs a single instance for all VLANs, so it cannot have a different root bridge per VLAN, and redundant links are blocked for every VLAN.

Per-VLAN Rapid STP (PVRSTP, called Rapid PVST+ in Cisco configuration) is a Cisco enhancement of RSTP. It runs a separate instance, with its own root bridge, for each VLAN, so different VLANs can use different links and every link carries some traffic. This is the recommended version in an all-Cisco network.

Multiple STP (MST, 802.1s) is also an industry standard. It creates separate spanning-tree instances for groups of VLANs, in contrast to per-VLAN versions like PVST, which create an instance for each VLAN. This uses fewer switch resources. Use this version in multi-vendor environments.
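The resource difference is easy to quantify. This Python sketch counts spanning-tree instances for a hypothetical block of 100 VLANs, using an illustrative MST mapping of two VLAN groups; it assumes MST's internal spanning tree (instance 0) always exists alongside the configured instances.

```python
from typing import Dict, Iterable


def pvst_instances(vlans: Iterable[int]) -> int:
    # Per-VLAN spanning-tree: one instance for every VLAN.
    return len(set(vlans))


def mst_instances(vlan_to_instance: Dict[int, int]) -> int:
    # MST: one instance per configured VLAN group, plus instance 0 (the IST).
    return len(set(vlan_to_instance.values()) | {0})


vlans = range(10, 110)  # 100 hypothetical user VLANs
# Illustrative mapping: VLANs 10-59 -> instance 1, VLANs 60-109 -> instance 2.
mapping = {v: (1 if v < 60 else 2) for v in vlans}

print(pvst_instances(vlans))   # 100
print(mst_instances(mapping))  # 3
```

With two instances, one distribution switch could be root for instance 1 and the other for instance 2, so both uplinks still carry traffic, much as per-VLAN root placement does, but with three instances instead of 100.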

Flex Links

Flex Links are a pair of layer-2 links that connect to different upstream switches. For example, an access switch may connect to two distribution switches using Flex Links.

In a Flex Links pair, one link is active and the other is on standby. Only the active link forwards traffic; the standby link takes over if the active link fails. By default, a restored primary link becomes the standby link. If preemption is configured, the primary becomes active again and the secondary returns to standby.
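The failover behaviour can be sketched as a small state machine. `FlexLinkPair` is a hypothetical Python class for illustration only, not a Cisco API; it models optional preemption of the primary link but ignores any preemption delay, and the interface names are made up.

```python
from typing import Dict, Optional


class FlexLinkPair:
    """Toy model of a Flex Links pair: only the active link forwards traffic."""

    def __init__(self, primary: str, backup: str, preempt: bool = False) -> None:
        self.primary = primary
        self.backup = backup
        self.preempt = preempt
        self.up: Dict[str, bool] = {primary: True, backup: True}
        self.active: Optional[str] = primary

    def _other(self, link: str) -> str:
        return self.backup if link == self.primary else self.primary

    def link_down(self, link: str) -> None:
        self.up[link] = False
        if self.active == link:
            other = self._other(link)
            # Fail over to the other link if it is up; otherwise nothing forwards.
            self.active = other if self.up[other] else None

    def link_up(self, link: str) -> None:
        self.up[link] = True
        if self.active is None:
            self.active = link          # the first link back becomes active
        elif self.preempt and link == self.primary:
            self.active = self.primary  # preemption: the primary takes over again
        # Otherwise the restored link simply becomes the standby link.

    @property
    def standby(self) -> Optional[str]:
        if self.active is None:
            return None
        other = self._other(self.active)
        return other if self.up[other] else None


pair = FlexLinkPair("Gi1/0/49", "Gi1/0/50")
pair.link_down("Gi1/0/49")
print(pair.active)                # Gi1/0/50
pair.link_up("Gi1/0/49")
print(pair.active, pair.standby)  # Gi1/0/50 Gi1/0/49
```

Because exactly one link forwards at any time, the pair never forms a loop, which is what lets Flex Links replace spanning-tree on these ports.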

This is an alternative to spanning-tree: it provides backup paths without creating loops. In fact, the switch disables spanning-tree on Flex Links ports. Flex Links must be layer-2 interfaces; VLAN SVIs and routed ports cannot be Flex Links.

If you only have a pair of uplinks to use, this may be a better solution than spanning-tree. There are no listening and learning states to transition through, and without a root bridge there is less complexity.

Unfortunately, Flex Links does not scale well. Each pair must be configured individually, which adds administrative complexity when there are many pairs of links.
