How MTU and MSS Affect Your Network

What is MTU?

The MTU, or ‘Maximum Transmission Unit’, is the largest block of data that can be handled at layer-3 of the OSI model. Layer-3 is usually IP, so in practice the MTU is the maximum size of an IP packet.

Where Does This Limit Come From?

The limit at layer-3 comes directly from layer-2. Layer-2 uses frames, and each frame has a maximum size limit.

The Ethernet standard, for example, sets a maximum frame size of 1518 bytes. The Ethernet header and trailer take up 18 bytes (a 14-byte header plus a 4-byte frame check sequence), leaving 1500 bytes for the packet to use. Therefore, the packet’s MTU is 1500 bytes.

We don’t always use Ethernet as our layer-2 protocol. Many WAN standards, like PPPoE, use different frame sizes. This results in a different MTU for the packet.

In this article, we’ll assume Ethernet unless specified otherwise.

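Some devices support Jumbo Frames, which carry a payload of up to around 9000 bytes. These are a vendor extension rather than part of the IEEE 802.3 standard, so the exact limit varies between devices. Jumbo Frames are more common in campus and data centre networks, but still rare in the WAN.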

For the rest of this article, we’ll talk about the non-Jumbo frame limit of 1500 bytes.

The entire packet needs to fit into the MTU limit. This includes the IP headers as well as the payload.

TCP has a limit called Maximum Segment Size, or MSS. This is the size of the layer-4 payload (without the IP and TCP headers).

IP and TCP headers usually add up to 40 bytes in total. So, if we started with an MTU of 1500 bytes, we now have an MSS of 1460 bytes.
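The arithmetic so far can be written out as a short Python sketch. The function and constant names are only illustrative, and the header sizes assume IPv4 and TCP without options.

```python
# Sizes in bytes. The IP and TCP figures assume headers without options.
ETHERNET_MAX_FRAME = 1518
ETHERNET_OVERHEAD = 18   # 14-byte header + 4-byte frame check sequence
IPV4_HEADER = 20
TCP_HEADER = 20


def mtu_from_frame(max_frame: int, l2_overhead: int) -> int:
    """Space left for the layer-3 packet inside one layer-2 frame."""
    return max_frame - l2_overhead


def mss_from_mtu(mtu: int, ip_header: int = IPV4_HEADER,
                 tcp_header: int = TCP_HEADER) -> int:
    """Space left for the TCP payload inside one IP packet."""
    return mtu - ip_header - tcp_header


mtu = mtu_from_frame(ETHERNET_MAX_FRAME, ETHERNET_OVERHEAD)
mss = mss_from_mtu(mtu)
print(mtu, mss)  # 1500 1460
```

If either header carries options, the header is longer and the space left for the payload shrinks by the same amount.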

The IP MTU is all usable space in the IP packet; the TCP MSS is the packet without the IP, TCP, or other headers.

Fragmentation

A host always knows the MTU of its own connection to the network. But it has no way of knowing the MTU of a link further up the path.

For example, a computer’s NIC may have an MTU of 1500. However, the connection to the WAN has an MTU of 1400.

A computer will know its own MTU, but not the MTU of a link further up the path

This could leave us in a sticky situation. The workstation may send a 1500-byte packet. The router connected to the WAN would be unable to forward it, as the packet is larger than the 1400-byte MTU.

The solution is fragmentation. When a packet is larger than the MTU, a device (often a router) will break the packet into smaller fragments.

Each of these fragments is still a packet, just smaller than the original. The packets move along the path to the destination just like normal. The destination device reassembles the fragments into the original packet and processes it as normal.
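As a rough illustration, the Python sketch below splits the 1500-byte packet from the example so that it fits a 1400-byte MTU. It assumes a 20-byte IPv4 header without options, and the names are hypothetical. It follows the IPv4 rule that every fragment except the last carries a multiple of 8 payload bytes, because the fragment offset field counts in 8-byte units.

```python
from typing import List, NamedTuple

IPV4_HEADER = 20  # bytes, assuming no IP options


class Fragment(NamedTuple):
    offset_units: int      # fragment offset field, in 8-byte units
    payload_len: int       # payload bytes carried by this fragment
    total_len: int         # payload plus its own IP header
    more_fragments: bool   # the 'More Fragments' flag


def fragment(packet_len: int, link_mtu: int) -> List[Fragment]:
    """Split a packet of packet_len bytes for a link with the given MTU."""
    payload = packet_len - IPV4_HEADER
    if packet_len <= link_mtu:
        # Small enough already: the packet is sent unchanged.
        return [Fragment(0, payload, packet_len, False)]
    # Largest payload per fragment, rounded down to a multiple of 8.
    max_payload = (link_mtu - IPV4_HEADER) // 8 * 8
    if max_payload <= 0:
        raise ValueError("link MTU is too small to carry any payload")
    fragments = []
    sent = 0
    while sent < payload:
        size = min(max_payload, payload - sent)
        is_last = sent + size == payload
        fragments.append(
            Fragment(sent // 8, size, size + IPV4_HEADER, not is_last))
        sent += size
    return fragments


for f in fragment(1500, 1400):
    print(f.offset_units, f.payload_len, f.total_len, f.more_fragments)
# 0 1376 1396 True
# 172 104 124 False
```

The two fragments add up to 1520 bytes, 20 more than the original packet, because the second fragment needs an IP header of its own.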

This might sound like a wonderful solution, but it comes with its drawbacks. For one, each fragment carries its own copy of the IP header, which results in more traffic being sent.

Also, this adds processing overhead on the router that fragments the packets, and on the destination that reassembles them.

And what if one fragment goes missing or becomes corrupted? The sender needs to resend the entire packet, not just the fragment.

Firewalls may also have problems with fragments. Some of them will drop fragments if they arrive out of order, seeing them as unsolicited traffic.

TIP: Where possible, avoid fragmentation

What Can Cause Fragmentation?

As we’ve already mentioned, some WAN links use a smaller MTU. But there are other causes as well.

Tunnels, such as a GRE tunnel, add extra headers to a packet. This may push the packet size over the frame limit and cause fragmentation.

Encryption, such as IPsec, may also add extra headers, with the same effect.

So what do we do about this? In the case of a tunnel (with or without encryption), we would manually lower the MTU on the tunnel’s interface.

We’re effectively restricting the size of the payload, so the payload plus the additional headers do not go over the frame limit.
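The value to configure is the MTU of the underlying link minus the overhead the tunnel adds. The Python sketch below works this out for a plain GRE tunnel over IPv4, assuming a 20-byte outer IP header and the basic 4-byte GRE header with no optional fields. IPsec overhead depends on the mode and ciphers in use, so it is not included here. The function name is hypothetical.

```python
PHYSICAL_MTU = 1500
OUTER_IPV4_HEADER = 20  # new IP header added by the tunnel
GRE_HEADER = 4          # basic GRE header, no key/checksum/sequence fields


def tunnel_mtu(physical_mtu: int, *overheads: int) -> int:
    """MTU left for the inner packet after the tunnel adds its headers."""
    return physical_mtu - sum(overheads)


gre_mtu = tunnel_mtu(PHYSICAL_MTU, OUTER_IPV4_HEADER, GRE_HEADER)
print(gre_mtu)  # 1476
```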

But that still leaves us with fragmentation. A workstation doesn’t know about the smaller MTU size of the tunnel, sends large packets, and fragmentation occurs. This is not ideal.

We now need to think about how to handle this. One option might be to manually set the MTU on each workstation. That’s a very big job.

Another option is to use Path MTU Discovery (PMTUD).

PMTUD

PMTUD is a method of dynamically discovering the lowest MTU along a path. This works by setting the ‘Don’t Fragment’ (DF) bit in the IP header of each packet.

Setting this bit tells devices that fragmentation is not allowed for this packet. If it’s too large, and this bit is set, delivery is not possible and the device drops the packet.

The device that dropped the packet will send an ICMP ‘Destination Unreachable (Fragmentation was Needed and DF was set)’ message back to the sender.

When the sender gets this message, it knows that the packets it is sending are too large for the path.

TIP: Don’t block ICMP type 3 code 4 messages! These are needed for PMTUD to work!

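The ICMP message normally includes the MTU of the next hop, as described in RFC 1191, so the sender can lower its MTU for this connection to that value and try again. Some older devices leave this field empty, in which case the sender has to guess a smaller size. Either way, it will repeat the process until the packet is small enough that these messages are no longer generated.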
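The retry loop can be sketched as a small Python simulation. It is a toy model, not a real network probe: the path is just a list of link MTUs, and the function names and the fallback list are assumptions made for illustration. It covers both a router that reports the next-hop MTU in its ICMP message, as RFC 1191 specifies, and one that does not, where the sender steps down through trial values.

```python
from typing import Optional, Sequence, Tuple

# Trial sizes used when the ICMP message carries no MTU (illustrative list).
FALLBACK_MTUS = (1500, 1492, 1400, 1280, 576)


def first_blocking_hop(packet_len: int, hop_mtus: Sequence[int]) -> Optional[int]:
    """Return the MTU of the first link too small for the packet, else None."""
    for mtu in hop_mtus:
        if packet_len > mtu:
            return mtu
    return None


def discover_path_mtu(local_mtu: int, hop_mtus: Sequence[int],
                      routers_report_mtu: bool = True) -> Tuple[int, int]:
    """Return (discovered path MTU, number of packets dropped on the way)."""
    current = local_mtu
    drops = 0
    while True:
        blocking = first_blocking_hop(current, hop_mtus)
        if blocking is None:
            return current, drops  # packet got through with DF set
        drops += 1  # router drops the packet and sends ICMP type 3 code 4
        if routers_report_mtu:
            current = blocking  # use the next-hop MTU from the ICMP message
        else:
            smaller = [m for m in FALLBACK_MTUS if m < current]
            if not smaller:
                raise RuntimeError("no smaller size left to try")
            current = smaller[0]


print(discover_path_mtu(1500, [1500, 1400, 1476]))         # (1400, 1)
print(discover_path_mtu(1500, [1500, 1400, 1476], False))  # (1400, 2)
```

When the sender has to guess, it may settle on a size below the true path MTU, since it can only pick from its own list of trial values.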

PMTUD is only supported by TCP and UDP

PMTUD is not just done when a connection is initially set up. It happens continually. This is because network paths may change, so MTUs may change. This allows devices to adapt to changes in the network.

It also works independently in both directions. Imagine a case where a client is requesting HTTP from a server. The client will usually send small request packets to the server, not triggering PMTUD. The server would likely send larger packets back, which may trigger PMTUD.

Another case where we see this is with asymmetric routing. Different paths are used in different directions, and these paths may have different MTUs.

TCP MSS

Another option is to tune the MSS. As mentioned earlier, the MSS is like the MTU, but used with TCP at layer 4.

Put simply, the MSS is the maximum size that the payload can be, after subtracting space for the IP, TCP, and other headers. So, if the MTU is 1500 bytes, and the IP and TCP headers are 20 bytes each, the MSS is 1460 bytes.

While establishing a new TCP connection, a three-way handshake is performed. Each device includes its MSS as an option in the TCP header of its SYN segment, so in this sense, it’s announcing its MSS to the remote device.

The remote device will see the MSS and, if necessary, reduce the payload size that it uses on this connection.

It’s important to note here that hosts are simply announcing their MSS. There is no negotiation of mutually acceptable values. They simply say ‘This is the largest TCP payload I can handle’. It’s up to the remote end to honour this.
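To make the announcement concrete, here is a minimal Python sketch of the MSS option as it appears in a TCP header: one byte for the option kind (2), one byte for the length (4), and a 16-bit value. The helper names are hypothetical, and send_segment_limit assumes 40 bytes of IP and TCP headers.

```python
import struct
from typing import Optional

TCP_OPT_END = 0   # end of option list
TCP_OPT_NOP = 1   # single-byte padding
TCP_OPT_MSS = 2   # maximum segment size


def build_mss_option(mss: int) -> bytes:
    """Encode the MSS option: kind 2, length 4, 16-bit value."""
    return struct.pack("!BBH", TCP_OPT_MSS, 4, mss)


def parse_mss_option(options: bytes) -> Optional[int]:
    """Return the MSS found in a TCP options field, or None if absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCP_OPT_END:
            break
        if kind == TCP_OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed option
        if kind == TCP_OPT_MSS and length == 4:
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length
    return None


def send_segment_limit(own_mtu: int, peer_mss: int) -> int:
    """Largest payload to send: limited by our own link and the peer's MSS."""
    return min(own_mtu - 40, peer_mss)


syn_options = bytes([TCP_OPT_NOP, TCP_OPT_NOP]) + build_mss_option(1460)
print(parse_mss_option(syn_options))   # 1460
print(send_segment_limit(1500, 1400))  # 1400
```

Each side runs this logic on its own. The two directions of a connection can therefore end up with different payload limits.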

Interim Devices

The MSS is based on the MTU, so we face a similar problem as before. A device will only know the MTU, and therefore the MSS, of its local link. It will not know about a lower MSS on a link somewhere along the path.

Think of a network with a GRE tunnel, for example. Each host will have an MSS of 1460. However, the tunnel adds 24 bytes of headers (a 20-byte outer IP header plus a 4-byte GRE header), lowering its MSS to 1436 bytes.

A GRE tunnel will have a lower MSS

How can we work with cases like this? We can configure the routers to rewrite the MSS.

A Cisco router, for example, will have this command configured on the tunnel:

ip tcp adjust-mss 1436

When the SYN segments of a three-way handshake pass across the tunnel, the router will see the MSS is set to 1460 in the TCP header.

Knowing that the MSS of the tunnel is lower than this, it will rewrite this to 1436.

Now, the host at the other end will see the MSS for the path, and adjust the maximum payload for this connection accordingly.
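The rewrite itself is a simple comparison, sketched below in Python. The function names are hypothetical, and the sketch assumes the router applies the limit to handshake segments in both directions. A real router also has to update the TCP checksum after changing the option, which is left out here.

```python
from typing import Tuple

TUNNEL_MSS = 1436  # value from the router configuration above


def clamp_mss(advertised_mss: int, limit: int = TUNNEL_MSS) -> int:
    """Lower an advertised MSS to the limit; leave smaller values untouched."""
    return min(advertised_mss, limit)


def handshake_through_tunnel(client_mss: int, server_mss: int) -> Tuple[int, int]:
    """Return the MSS each side receives once the router has rewritten it."""
    seen_by_server = clamp_mss(client_mss)  # rewritten in the SYN
    seen_by_client = clamp_mss(server_mss)  # rewritten in the SYN-ACK
    return seen_by_server, seen_by_client


print(handshake_through_tunnel(1460, 1460))  # (1436, 1436)
print(handshake_through_tunnel(1460, 1360))  # (1436, 1360)
```

An MSS that is already below the limit passes through unchanged, so the router only ever lowers the value.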

It’s important to realise that this is a feature of TCP. This won’t work with UDP or other traffic types.

UDP needs to fall back on other methods like PMTUD, manual MTU adjustments, or fragmentation.
