CCNA QoS Traffic Management

Chapter 1 – Introduction

From the last lesson, we know that QoS organizes traffic into classes. It then performs actions on traffic in those classes.

We’re now going to dig a little deeper and answer questions such as: How many classes do we use? How does rate-limiting work? What is a queue, and why does it need a scheduler?

Chapter 2 – Class Models

You know about classes, but you may be thinking, ‘Which traffic goes into which class? And how many classes should I configure?’

Well, let me help simplify this for you. You don’t need to use every combination of class and drop probability. You might keep it simple with a four- or five-class model like the one shown here. In fact, this is a very good place to start.

There is a real-time class for voice and interactive video. This is a high-priority class.

Next, a class for critical data. This is for your business applications, databases, and website traffic.

We could split this class into two. The point would be to give call signaling traffic like SIP a higher priority. If you’re not familiar with SIP, it is the traffic that sets up phone and video calls.

The next category is the best effort category. This would be the default category, so any packets that didn’t get marked would end up here. This is the class-default that we configured in the last video.

And finally, the scavenger class. This is traffic that is even less important than the default traffic. This might include something like online games. You know, things that usually don’t have any business relevance. Often we limit traffic in this class to a small amount of bandwidth.

So what markings might we apply to these categories? There are only five categories here, so once again it’s quite simple.

Real-time traffic gets marked with Expedited Forwarding. This indicates that it needs quick and uninterrupted handling.

Call signaling is also pretty important, as it sets up phone calls in the first place. So it would get a marking of either AF31 or CS3. Either would be fine.

Our other critical data would be a little lower than this, so it might use the CS2 marking.

Best Effort traffic would have the BE marking. This is setting the DSCP bits to all zero. That’s why this is the default class; all zero bits mean unmarked traffic.

The scavenger class is lower again, with the CS1 marking.


Remember that these are all examples, and you could use different markings if you want to. But, this makes it easier to add more classes later.
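As a sketch, markings like these could be applied under a policy-map, much like the one from the last video. The policy and class names here are hypothetical, and the class-maps that identify each type of traffic would need to be defined separately:

```
policy-map MARK-TRAFFIC
 class REALTIME
  set dscp ef
 class SIGNALING
  set dscp cs3
 class CRITICAL-DATA
  set dscp cs2
 class SCAVENGER
  set dscp cs1
 class class-default
  set dscp default
```

Setting class-default to default (all zeros) isn’t strictly necessary, but it clears any stray markings that arrive from untrusted devices.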

More complex networks with more applications may need more complex class models. Cisco has a few recommended models on its website. I’ll put a link in the description if you’re interested in seeing them.

Chapter 3 – Queuing and Scheduling

Applying actions to a class often involves a process called queuing. This is true for actions such as reserving bandwidth.

Each physical interface will have several queues, usually eight. We’re showing four here to keep it simple.

When a router is ready to send a packet out, it will add it to one of these queues, based on its class.

When the interface is ready to send, the packet moves from the queue and the router sends it out of the interface.

There will be many packets that a switch or router has to deal with. But a single interface can only put one packet ‘on the wire’ at a time.

So what happens when there are many packets spread among different queues? How does the router decide which packet to send next?

A component called the scheduler handles this. The scheduler looks at the priority of each queue and will decide which queue to service next.

And that gets to the heart of how QoS prioritizes traffic. The scheduler services a high priority queue more often than a low priority queue.

One very simple option is to use round-robin scheduling. This method cycles through each queue, servicing each one in order.

That is, it will take some packets from queue 1, then queue 2, and so on. After the last queue, it begins again back at queue 1.

The downside is that this gives all queues the same amount of attention. We don’t want that; we want the high-priority queues to have more attention than the low-priority ones.

Weighted Round-Robin is a refined version of this method. The scheduler still services these queues in order. But, it spends more time on higher-priority queues and less on lower-priority queues.

Each queue gets attention, but higher priority queues get more attention.

On a Cisco router, there is the Class-Based Weighted Fair Queuing scheduler. This uses the weighted round-robin algorithm.

In effect, this algorithm provides a guaranteed amount of bandwidth to each queue. But this is only when there’s congestion. When there’s no congestion, classes are free to use more than their allocation.

Another option is Strict Priority Queuing. In this method, one queue is THE priority queue. It is completely serviced before the scheduler even looks at the next queue.

The point of this is to give maximum attention to the highest priority traffic.

To find some balance, you could use Low Latency Queuing. This is a hybrid between Class-Based Weighted Fair Queuing and Strict Priority.

One queue is the priority queue. This is where you put your real-time traffic. The CBWFQ scheduler services the remaining queues as normal.

So, the priority queue bypasses the CBWFQ scheduler, and is always served first.

There are pros and cons to each method of course. The method you use will depend on your environment and the type of traffic you have.

Class-Based Weighted Fair Queuing is good for general traffic but hurts real-time traffic. That’s because packets in the real-time class can queue up while the router services other queues. This delays them and adds jitter.

Strict priority queuing is great for real-time traffic, but is bad for everything else. If there’s a lot going on in the real-time queue, the rest of the queues don’t get serviced as often.

This can lead to queue starvation, where lower-priority traffic is neglected even though it’s still important.

An advanced solution to this is to have a priority queue, but put a rate-limit on it. We’ll see how to configure rate limiting soon.

First, let’s look at configuring the four-class model. We’ll start by configuring the four queues that we’ve been looking at in the examples.

This time, we’re going to match based on existing DSCP markings.
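As a sketch, those class-maps might match DSCP like this. The queue names match the diagrams, but the exact class-to-queue mapping is illustrative:

```
class-map match-all Q1
 match dscp ef
class-map match-all Q2
 match dscp cs3 af31
class-map match-all Q3
 match dscp cs2
! Everything else falls through to class-default, our Q4
```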

Let’s continue by configuring the priority queue. We do this under a policy-map as normal.

Within the queue, we use the priority statement. This is very much like the bandwidth statement we used in the last video.

The priority statement allocates bandwidth. But it also identifies this queue as the priority queue. 
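A sketch of that configuration, with a hypothetical policy name and bandwidth value:

```
policy-map QOS-POLICY
 class Q1
  priority 20000
```

The value is in kbps, so this reserves 20 Mbps for Q1 and marks it as the strict priority queue.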

Now we can create the rest of the queues, in regular Class-Based Weighted Fair Queuing mode.

In the last video we used a percentage of bandwidth. We could also use a fixed value. For example, I can set 1 Mbps on Q2.

We could allocate a percentage of the remaining bandwidth if we want to. But as we see here, we can’t do that when we’ve already set a fixed value in another queue. So we’ll allocate 500Kbps here.

We don’t need to explicitly configure Q4. We can leave it to just get whatever bandwidth remains.
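A sketch of those queues, continuing the same hypothetical policy-map:

```
policy-map QOS-POLICY
 class Q2
  bandwidth 1000
 class Q3
  bandwidth 500
```

The bandwidth command takes a value in kbps, so Q2 gets 1 Mbps and Q3 gets 500 Kbps. Q4 (class-default) is left unconfigured, so it simply gets whatever bandwidth remains.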

As normal, the final step is to apply this policy to an interface with a service-policy. As this policy involves allocating bandwidth, we apply it in the out direction.
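For example, assuming a hypothetical interface and policy name:

```
interface GigabitEthernet0/1
 service-policy output QOS-POLICY
```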

What happens if a queue fills up? Packets need to be dropped.

By default, the router drops the newest packet. This is a tail drop, as the packet is dropped from the tail of the queue.

Of course, this could be a very important packet. Dropping it only because it’s the last packet to arrive is not ideal.

This is where drop probability can help. Remember, this is a bit like a subclass, within a class. The drop probability can be low, medium, or high.

When a queue fills, the router drops packets marked with a high drop probability first. This makes room for new packets to enter the queue.

If there are no packets with a high drop probability, the router drops medium packets.


Queuing means that a router can spend more time servicing important traffic.

Drop probability lets us be more granular. It allows us to set priorities within a queue.

Chapter 4 – Policing and Shaping

Two tools often used with QoS are policing and shaping. They’re very similar, as they are both used to rate-limit traffic. But there are still subtle differences. 

Rate-limiting is where we define a limit on how much bandwidth certain traffic can have. This could be a type of traffic, like FTP downloads. It could be traffic between two IP addresses, or some other combination of conditions.

First, think about the traffic we’ve identified. As shown here, it’s common for traffic to increase and decrease over time. The high peaks in the graph show this.

When we configure policing, we’re setting a hard limit. That is, a limit to the amount of bandwidth that this traffic can have, as shown by this line.

Any amount of traffic under this limit is conforming. Anything above this limit is non-conforming.

The whole point of the policer is to take action on any non-conforming packets.

The simplest, and most common action, is to drop all non-conforming packets. This strictly enforces the rate limit. It would change our traffic pattern to look more like this.

Another option is to re-mark non-conforming packets, rather than dropping them. We could raise the drop probability, or move them into a different class. This might be useful if we’d rather an upstream device decide which packets to drop. 

A policer is able to recognize a small, short burst of traffic that goes over the limit.

In a case like this, the policer will still consider this burst to be conforming. It will not take any additional action.

However, a large increase in traffic, or traffic over the limit for an extended duration, is not a burst. This is non-conforming.

The point of this is that the policer is a little bit friendly. It won’t penalize your traffic for small infractions. 

We’ll now try to configure a policer. Before we start, let’s check the service policy on this router.

Here, we have Q1 configured as the priority queue. Let’s rate-limit traffic in this queue to 100Mbps.

To start, we edit the policy-map, and then the Q1 class. This is where we’ll define an extra action.

We’ll use the police action. This is the policer or rate-limiter. I often type confirm instead of conform. I’ll fix that up…

… ok, that’s better. Let me explain what this command is doing.

Police is the action, the rate-limiter. I’ve set it to a limit of 100,000 Kbps, or 100Meg.

We then need to set two different actions: what happens to conforming traffic, and what happens to non-conforming traffic.

I’ve chosen some basic actions. Allow (or transmit) conforming traffic, and drop non-conforming traffic.

Now if we look at the policy-map again, we can see that the priority command is still there. We haven’t affected that. But now, we also have a policer.
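Putting that together, the Q1 class might now look something like this. The policy name and priority value are hypothetical, and note that on most IOS versions the police rate is entered in bits per second, so 100 Mbps is 100000000:

```
policy-map QOS-POLICY
 class Q1
  priority 100000
  police 100000000 conform-action transmit exceed-action drop
```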

There is a warning that comes with policing. Some traffic types don’t respond well to drops. Reclassifying can also be tricky, as packets may be out of order, which is bad for some types of traffic.

The point here is that it’s important to understand the effect of policing on your traffic before you apply it.

Shaping is a rate-limiter like policing, but in a way, it’s a bit gentler.

It still looks at non-conforming traffic. But instead of dropping or remarking, it adds non-conforming packets to a queue, or buffer.

When more bandwidth becomes available, the router sends out these buffered packets.

This works well when dealing with small bursts in traffic. Instead of cutting these peaks off, we’re smoothing them out. We’re changing the shape of this graph. We’re smoothing out bursty traffic. 

Let’s go back to our router and add shaping to the Q2 queue. This is an extra action under the policy-map, and under the class.

We can now use the shape average command. Here, I’ve used 10,000Kbps, or 10Meg. Nice and easy!
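A sketch of the result, with the same hypothetical policy name. As with the policer, the shape average rate is entered in bits per second on most IOS versions, so 10 Mbps is 10000000:

```
policy-map QOS-POLICY
 class Q2
  bandwidth 1000
  shape average 10000000
```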

Now for the warnings. If your traffic bursts are too large, the queue will fill up. A queue will not accept any new packets when it is full. When this happens, tail drops will occur.

Also, not all traffic likes shaping. Shaping delays packets. This is generally not a good thing for real-time traffic. Real-time traffic needs consistent and regular delivery of packets.


We’ve nearly finished our discussion on Quality of Service. Before we move on to IPv6, we have one more thing to say about congestion.

Chapter 5 – Congestion

It’s likely that you’ve noticed that a large part of QoS is managing congestion, especially how to manage traffic when congestion occurs.

Another aspect of QoS is avoiding congestion as much as possible. We can’t prevent it entirely, but there are a few things we can do to help reduce it.

To understand that, we need to think a little about how TCP works. In particular, acknowledgments and windowing.

When traffic uses TCP, the receiver will send back an acknowledgment. This confirms that the packet arrived. If a sender does not get an acknowledgment back, it assumes that the data has been lost, and will resend it.

It’s not efficient to send an acknowledgment for every packet, so TCP uses windowing.

The sender and receiver agree on a window size. This is the amount of data the sender can send before it needs to see an acknowledgment. The receiver then sends a single acknowledgment for all this data.

The sender will pause and wait for the acknowledgment before sending more data. If the acknowledgment is not received, all data in the window needs to be re-sent.

Usually, the window size starts small. Receiving acknowledgments increases confidence that the link is stable. Both parties will then start growing the window size.

If acknowledgments go missing and data needs to be resent, then the window size will start to shrink. This is to reduce the amount of data that needs to be resent over an unreliable link.

The key point is that smaller window sizes mean more acknowledgments. This means the sender spends more time waiting and less time sending.

Now back to queueing… Queues have a limit to how many packets they can store. This is the Queue Limit. The number of packets in the queue is the Queue Depth.

As the queue fills up, indicating congestion, the router drops packets from the queue.

So, before the queue fills up, QoS can drop a few random packets. That sounds weird, but there’s actually a good reason.

A few dropped packets before the queue fills trick the receiver into thinking there is data loss. The receiver won’t send an acknowledgment for that window of data, so the window size will shrink.

The smaller window size means the recipient will send acknowledgments more often. In turn, this means that the sender will slow down, as it needs to wait for acknowledgments more often.

The result is that the sender will not send traffic as quickly, which means less congestion.

That’s not to say that the router will start dropping packets at any time. There’s a threshold for when it is an acceptable time to drop traffic. And of course, we can drop less important packets first.

It is all based on the queue depth, and how full it is.

First, there is a configurable Minimum Threshold. If the number of packets in the queue stays at or below this point, there are no drops at all. This is a completely healthy link.

The next level is the Maximum Threshold. When the queue depth is between the minimum and maximum, the router drops a small number of packets.

When the queue depth is over the maximum threshold, the router drops all newly arriving packets, just as it would with a tail drop.

If you’re interested in knowing more about how this works, I recommend having a look at my QoS mini-series.


The process we’ve just described is Random Early Detection, or RED. There’s a more refined version called Weighted Random Early Detection, or WRED. You might see a bit of this in the lab.
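As a preview, WRED is enabled per class with the random-detect command. The per-DSCP thresholds shown here are illustrative: minimum threshold, maximum threshold, and the mark probability denominator (drop up to 1 in N packets at the maximum threshold):

```
policy-map QOS-POLICY
 class Q2
  bandwidth 1000
  random-detect dscp-based
  random-detect dscp af31 24 40 10
```

WRED is generally applied to data queues like this one; the priority queue is best left alone, since real-time traffic doesn’t respond well to random drops.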


In this lab, you’re going to configure the 8-class QoS model. First, you need to fill out the table shown here, with the requirements. This includes the DSCP marking to match, the bandwidth, and any extra requirements.

You’ll find some links in the description that will help you to find this information.

Once that’s done, you need to configure this on the two routers, as shown here.

End Scene

That brings us to the end of this section. Things are sure to be interesting in the next section as we begin looking at IPv6. This is the newer version of IP, which comes with an all-new address format.

I hope to see you there!