Protocol Overhead

How fast can you really go using a given medium and protocol stack? We examine how much bandwidth is left for applications.

Ethernet

Ethernet frame format:
Ethernet overhead bytes:
  12 gap + 8 preamble + 14 header + 4 trailer = 38 bytes/packet w/o 802.1q
  12 gap + 8 preamble + 18 header + 4 trailer = 42 bytes/packet with 802.1q

Ethernet Payload data rates are thus:
  1500/(38+1500) = 97.5293 %   w/o 802.1q tags
  1500/(42+1500) = 97.2763 %   with 802.1q tags
  9000/(38+9000) = 99.5796 %   jumbo w/o 802.1q tags
  9000/(42+9000) = 99.5355 %   jumbo with 802.1q tags

TCP over Ethernet:
 Assuming no header compression (e.g. not PPP)
 Add 20 IPv4 header or 40 IPv6 header (no options)
 Add 20 TCP header
 Add 12 bytes optional TCP timestamps
 Max TCP Payload data rates over ethernet are thus:
  (1500-40)/(38+1500) = 94.9285 %  IPv4, minimal headers
  (1500-52)/(38+1500) = 94.1482 %  IPv4, TCP timestamps
  (1500-52)/(42+1500) = 93.9040 %  802.1q, IPv4, TCP timestamps
  (1500-60)/(38+1500) = 93.6281 %  IPv6, minimal headers
  (1500-72)/(38+1500) = 92.8479 %  IPv6, TCP timestamps
  (1500-72)/(42+1500) = 92.6070 %  802.1q, IPv6, TCP timestamps
  (9000-40)/(38+9000) = 99.1370 %  Jumbo IPv4, minimal headers
  (9000-52)/(38+9000) = 99.0042 %  Jumbo IPv4, TCP timestamps
  (9000-52)/(42+9000) = 98.9604 %  Jumbo 802.1q, IPv4, TCP timestamps
  (9000-60)/(38+9000) = 98.9157 %  Jumbo IPv6, minimal headers
  (9000-72)/(38+9000) = 98.7829 %  Jumbo IPv6, TCP timestamps
  (9000-72)/(42+9000) = 98.7392 %  Jumbo 802.1q, IPv6, TCP timestamps

UDP over Ethernet:
 Add 20 IPv4 header or 40 IPv6 header (no options)
 Add 8 UDP header
 Max UDP Payload data rates over ethernet are thus:
  (1500-28)/(38+1500) = 95.7087 %  IPv4
  (1500-28)/(42+1500) = 95.4604 %  802.1q, IPv4
  (1500-48)/(38+1500) = 94.4083 %  IPv6
  (1500-48)/(42+1500) = 94.1634 %  802.1q, IPv6
  (9000-28)/(38+9000) = 99.2697 %  Jumbo IPv4
  (9000-28)/(42+9000) = 99.2258 %  Jumbo 802.1q, IPv4
  (9000-48)/(38+9000) = 99.0485 %  Jumbo IPv6
  (9000-48)/(42+9000) = 99.0046 %  Jumbo 802.1q, IPv6
An excellent source of ethernet information is Charles Spurgeon's Ethernet Web Site.
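
Every percentage in the Ethernet tables above comes from the same ratio: payload bytes divided by bytes on the wire. The following Python sketch reproduces them from the overhead byte counts listed above. The function and constant names are illustrative only and do not come from any library.

```python
# Per-frame Ethernet overhead in bytes, as listed in the table above.
GAP, PREAMBLE, HEADER, TRAILER, VLAN_TAG = 12, 8, 14, 4, 4

# Upper-layer header sizes in bytes (no IP options).
IPV4, IPV6, TCP, UDP, TCP_TIMESTAMPS = 20, 40, 20, 8, 12

def payload_percent(mtu=1500, upper_headers=0, vlan=False):
    """Percent of the line rate left after Ethernet and upper-layer headers."""
    wire = GAP + PREAMBLE + HEADER + TRAILER + (VLAN_TAG if vlan else 0) + mtu
    return 100.0 * (mtu - upper_headers) / wire

print(f"{payload_percent():.4f} %  Ethernet payload, w/o 802.1q")
print(f"{payload_percent(1500, IPV4 + TCP + TCP_TIMESTAMPS):.4f} %  IPv4, TCP timestamps")
print(f"{payload_percent(9000, IPV6 + UDP, vlan=True):.4f} %  Jumbo 802.1q, IPv6, UDP")
```

Passing a different mtu or header total gives the figure for any combination not listed in the tables.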

Notes:

  1. 48-bit (6 byte) ethernet addresses have a 24-bit "Organizationally Unique Identifier" (OUI) assigned by IEEE + a 24-bit number assigned by the vendor.
  2. The minimum ethernet payload (data field) is 46 bytes which makes a 64 byte ethernet packet including header and CRC.
  3. The maximum ethernet payload (data field) is 1500 bytes which makes a 1518 byte ethernet packet including header and CRC. When 802.1q added an optional 4-byte VLAN Tag Header, they extended the allowed maximum frame size to 1522 bytes (22 byte header+CRC).
  4. The bit speed of 100 Mbps ethernet on the wire/fiber is actually 125 Mbps due to 4B/5B encoding. Every four data bits get mapped to one of 16 5-bit symbols. This leaves 16 non-data symbols. This encoding came from FDDI.
  5. The original Ethernet II spec had a two byte type field which 802.3 changed to a length field, and later to a length/type field depending on use: values of 1536 and over are types, values of 1500 and under are lengths.
  6. The Interframe Gap (IFG) between ethernet frames is 96 bit times (= 12 bytes) for all speeds of ethernet (at least up to 10G). This gap is sometimes reduced by repeaters due to clock differences and could be as small as 5 bytes on reception. Also, some NICs allow it to be reduced, but this is not recommended.
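
Notes 2, 3 and 5 can be sketched in a few lines of Python. The helper names are hypothetical. Treating 1501 through 1535 as undefined is an assumption based on the usual reading of 802.3, where only values up to 1500 are valid lengths.

```python
HEADER, CRC = 14, 4            # untagged Ethernet header and CRC, in bytes
MIN_PAYLOAD, MAX_PAYLOAD = 46, 1500

def classify_length_type(value):
    """Interpret the 16-bit length/type field of an Ethernet frame."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("the field is 16 bits wide")
    if value >= 1536:
        return "type"          # e.g. 0x0800 for IPv4
    if value <= MAX_PAYLOAD:
        return "length"
    return "undefined"         # 1501-1535: assumption, see lead-in

def frame_bytes(payload_len):
    """Frame size including header and CRC, padding short payloads to 46 bytes."""
    if not 0 <= payload_len <= MAX_PAYLOAD:
        raise ValueError("payload must be 0-1500 bytes")
    return HEADER + max(payload_len, MIN_PAYLOAD) + CRC

print(classify_length_type(0x0800), frame_bytes(1), frame_bytes(1500))
```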

Gigabit Ethernet with Jumbo Frames

Gigabit ethernet is exactly 10 times faster than 100 Mbps ethernet, so for standard 1500 byte frames the numbers above all apply, multiplied by 10 (for 10GE, multiply by 100). Many GigE devices, however, allow "jumbo frames" larger than 1500 bytes, the most common size being 9000 bytes. For 9000 byte jumbo frames, potential GigE throughput becomes (from Bill Fink, the author of nuttcp):

Theoretical maximum TCP throughput on GigE using jumbo frames:

	(9000-20-20-12)/(9000+14+4+7+1+12)*1000000000/1000000 = 990.042 Mbps
	  |   |  |  |     |   |  | | | |       |         |
	 MTU  |  |  |    MTU  |  | | | |      GigE      Mbps
	      |  |  |         |  | | | |
	     IP  |  |  Ethernet  | | | |      InterFrame Gap (IFG), aka
	  Header |  |    Header  | | | |      InterPacket Gap (IPG), is
		 |  |            | | | |      a minimum of 96 bit times
	       TCP  |          FCS | | |      from the last bit of the
	    Header  |              | | |      FCS to the first bit of
		    |       Preamble | |      the preamble
		  TCP                | |
	      Options            Start |
	  (Timestamp)            Frame |
			     Delimiter |
				 (SFD) |
				       |
				   Inter
				   Frame
				     Gap
				   (IFG)

Theoretical maximum UDP throughput on GigE using jumbo frames:

	(9000-20-8)/(9000+14+4+7+1+12)*1000000000/1000000 = 992.697 Mbps

Theoretical maximum TCP throughput on GigE without using jumbo frames:

	(1500-20-20-12)/(1500+14+4+7+1+12)*1000000000/1000000 = 941.482 Mbps

Theoretical maximum UDP throughput on GigE without using jumbo frames:

	(1500-20-8)/(1500+14+4+7+1+12)*1000000000/1000000 = 957.087 Mbps
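
The four formulas above differ only in MTU and transport header size, so they can be folded into one function. This is a rough Python sketch with illustrative parameter names. The line rate is a parameter so the same arithmetic can be rerun for other Ethernet speeds.

```python
def throughput_mbps(mtu, ip=20, l4=20, options=12, line_rate_bps=1_000_000_000):
    """Theoretical maximum payload throughput in Mbps for back-to-back frames."""
    # Bytes on the wire per frame: MTU + Ethernet header + FCS + preamble + SFD + IFG.
    wire_bytes = mtu + 14 + 4 + 7 + 1 + 12
    payload_bytes = mtu - ip - l4 - options
    return payload_bytes / wire_bytes * line_rate_bps / 1_000_000

cases = [("TCP jumbo", 9000, 20, 12), ("UDP jumbo", 9000, 8, 0),
         ("TCP 1500", 1500, 20, 12), ("UDP 1500", 1500, 8, 0)]
for label, mtu, l4, opts in cases:
    print(f"{label:10s} {throughput_mbps(mtu, l4=l4, options=opts):.3f} Mbps")
```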

ATM

An excellent paper on ATM overhead was written by John Cavanaugh of MSC. A postscript copy can be found here. Based on that paper:
  -------------------------- DS3 ------------------------------
  Line Rate           44.736 Mbps
  PLCP Payload        40.704                       (avail to ATM)
  ATM Payload         36.864                       (avail to AAL)
                     MTU=576  MTU=9180 MTU=65527
  AAL5 Payload        34.501   36.752   36.845     (avail to LLC/SNAP)
  LLC/SNAP Payload    34.028   36.720   36.841     (avail to IP)
  IP Payload          32.847   36.640   36.830     (avail to transport)
    UDP Payload       32.374   36.608   36.825     (avail to application)
    TCP Payload       31.665   36.560   36.818     (avail to application)
  -------------------------- OC-3c ------------------------------
  Line Rate           155.520 Mbps
  SONET Payload       149.760                      (avail to ATM)
  ATM Payload         135.632                      (avail to AAL)
                     MTU=576  MTU=9180 MTU=65527
  AAL5 Payload        126.937  135.220  135.563    (avail to LLC/SNAP)
  LLC/SNAP Payload    125.198  135.102  135.547    (avail to IP)
  IP Payload          120.851  134.808  135.506    (avail to transport)
    UDP Payload       119.112  134.690  135.489    (avail to application)
    TCP Payload       116.504  134.513  135.464    (avail to application)
  -------------------------- OC-12c -----------------------------
  Line Rate           622.080 Mbps
  SONET Payload       600.768                      (avail to ATM)
  ATM Payload         544.092                      (avail to AAL)
                     MTU=576  MTU=9180 MTU=65527
  AAL5 Payload        509.214  542.439  543.818    (avail to LLC/SNAP)
  LLC/SNAP Payload    502.239  541.966  543.752    (avail to IP)
  IP Payload          484.800  540.786  543.586    (avail to transport)
    UDP Payload       477.824  540.313  543.519    (avail to application)
    TCP Payload       467.361  539.605  543.420    (avail to application)
Notes:
  1. DS3 and SONET frames are 125 usec long (8000/sec).
  2. PLCP packs 12 ATM cells per DS3 frame, for 96 kc/s (8000x12).
  3. An STS-3c frame (OC3c) is 2430 bytes long (270 bytes x 9 rows), 90 of which are consumed by SONET overhead (9 bytes x 9 rows section and line overhead and 1 byte x 9 rows path overhead), 2340 bytes are payload (260 bytes x 9 rows). The payload is called the Synchronous Payload Envelope (SPE).
  4. An STS-12c frame (OC12c) is 9720 bytes long, 333 of which are SONET overhead, 9387 bytes are payload (SPE). Note that this is slightly larger than four STS-3c SPE's (4x2340=9360), the advantage of "concatenated" OC12c vs. OC12.
  5. ATM cells are 53 bytes long: 5 header and 48 payload.
  6. AAL5 adds an 8 byte trailer in the last 8 bytes of the last cell, padding in front of the trailer if necessary. This results in 0-47 bytes of padding in an AAL5 frame. In the worst case, you have seven bytes of padding in one cell, and 40 bytes of padding plus the 8 byte AAL5 trailer in the following cell.
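  7. RFC1483 defines two types of protocol encapsulation in AAL5: LLC/SNAP encapsulation and VC based multiplexing. The LLC/SNAP rows in the tables above correspond to an 8 byte LLC/SNAP header per packet.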
  8. IPv4 usually adds 20 bytes. IPv6 would add 40 bytes. Plus any options but assumed zero here.
  9. UDP adds an 8 byte header. (ICMP is also an 8 byte header)
  10. TCP adds a 20 byte header plus any options. A common option on high performance flows is timestamps which consume an additional 12 bytes per packet.
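
The notes above are enough to rebuild the tables. The Python sketch below derives the ATM payload rate from the frame structure, counts the cells an AAL5 frame needs, and scales the rate down layer by layer. All names are illustrative. The 8 byte LLC/SNAP header is an assumption inferred from the difference between the AAL5 and LLC/SNAP rows.

```python
import math

FRAMES_PER_SEC = 8000            # DS3 and SONET frames are 125 usec long
CELL, CELL_PAYLOAD = 53, 48      # ATM cell: 5 byte header + 48 byte payload
AAL5_TRAILER, LLC_SNAP, IPV4, UDP, TCP = 8, 8, 20, 8, 20  # LLC_SNAP is assumed

def atm_payload_mbps(bytes_per_frame):
    """ATM payload rate given the bytes per 125 usec frame available to cells."""
    return bytes_per_frame * 8 * FRAMES_PER_SEC * CELL_PAYLOAD / CELL / 1e6

def aal5_cells(pdu_len):
    """Return (cells, padding bytes) needed for an AAL5 payload of pdu_len bytes."""
    cells = math.ceil((pdu_len + AAL5_TRAILER) / CELL_PAYLOAD)
    return cells, cells * CELL_PAYLOAD - pdu_len - AAL5_TRAILER

def layer_rates(atm_mbps, mtu):
    """Rate in Mbps available at each layer for IPv4 packets of mtu bytes."""
    cells, _ = aal5_cells(mtu + LLC_SNAP)
    per_byte = atm_mbps / (cells * CELL_PAYLOAD)
    return {
        "AAL5": per_byte * (mtu + LLC_SNAP),
        "LLC/SNAP": per_byte * mtu,
        "IP": per_byte * (mtu - IPV4),
        "UDP": per_byte * (mtu - IPV4 - UDP),
        "TCP": per_byte * (mtu - IPV4 - TCP),
    }

ds3 = atm_payload_mbps(12 * CELL)    # PLCP packs 12 cells per DS3 frame
for name, rate in layer_rates(ds3, 576).items():
    print(f"{name:9s} {rate:.3f}")
```

For OC-3c and OC-12c, pass the SPE sizes from notes 3 and 4 (2340 and 9387 bytes) to atm_payload_mbps. Calling layer_rates with mtu=1500 gives the 1500 byte case, under the same assumptions.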

On the physical layer (single pt-to-pt hop), one out of every 27 cells is an OAM cell. The above calculations don't take that into account, but that's another 3.7% reduction!

We should add calculations for ping packets and 1500 byte packets.

So what is the largest packet that we can fit in a single ATM cell? If you are using AAL5, you have a 40 byte payload to work with. For IPv4, you could have a 20 byte header + a 20 byte IP payload. A UDP or ICMP payload could be up to 12 bytes (both use 8 bytes after the IP header). So a "ping -s8" through "ping -s12" should fit in one ATM cell and still give you a round trip time.
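
The single-cell arithmetic can be checked with a short Python sketch. The function name is hypothetical, and as in the paragraph above it assumes IPv4 with no options and no LLC/SNAP header in front of the IP packet.

```python
CELL_PAYLOAD, AAL5_TRAILER = 48, 8
IPV4_HEADER, ICMP_HEADER = 20, 8

def ping_fits_one_cell(size):
    """True if the packet sent by `ping -s size` fits in one AAL5 cell."""
    packet = IPV4_HEADER + ICMP_HEADER + size
    return packet + AAL5_TRAILER <= CELL_PAYLOAD

print([s for s in range(20) if ping_fits_one_cell(s)])
```

Sizes 0 through 12 fit. The lower bound of 8 in the text comes from ping needing room for its timestamp, not from the cell size.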


Packet Over SONET (POS)

Packet over SONET (POS) uses PPP with HDLC to frame IP packets. These add a five byte header and a four byte trailer under normal circumstances. No padding is required, except for any possible idle time between packets. Byte stuffing is used (see notes below) which can expand the length of the POS frame.
       Flag Byte (0x7e)
       Address Byte (0xff = all stations)
       Control Byte (0x03 = Unnumbered Information)
          Protocol - 2 bytes, 1 byte if compressed      +
          Payload - 0-MRU bytes                         | PPP part
          Padding - 0+ bytes                            +
       Frame Check Sequence (FCS) - 4 bytes (2 in limited cases)
       Flag Byte (0x7e)
       [Interframe fill or next Address]
HDLC has no set frame size limit, nor does PPP specify the payload size; you just keep reading until you see a Flag byte. PPP however specifies that the Maximum Receive Unit (MRU) default is 1500 bytes and that other sizes can be negotiated using LCP. These LCP messages have a 16-bit length field, so a properly negotiated maximum payload would be 65535 bytes. [It would be possible to configure a sender/receiver pair to go beyond 65535 and simply not negotiate a size with LCP. No one does this however.]

Most POS hardware seems to have a 4470 or 9180 byte MRU. Some Cisco documentation says the MRU can only be set between 64 and 17940 (go figure), and recommends a setting of 1492. Juniper documentation says they support up to 65535. The RFC says you must be able to receive at least 1500 even if you set this lower.

So we get:

  -------------------------- OC-3c ------------------------------
  Line Rate           155.520 Mbps
  SONET Payload       149.760                      (avail to POS)
  POS Payload         *** to do ***                (avail to IP)
  etc.

  -------------------------- OC-12c -----------------------------
  Line Rate               622.080 Mbps
  SONET Payload           600.768                      (avail to POS)
                         MTU=1500   MTU=9000
  POS Payload (no stuff)  597.185    600.168           (avail to IP)  9 overhead
  POS Payload (rnd stuff) 592.583    595.520                          20.71875 overhead
  POS Payload (max stuff) 299.486    300.234                          1509 overhead

  ~TCP Payload w/ts rnd   572.040    592.079
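
The OC-12c rows above can be reproduced with the Python sketch below. It assumes, as the overhead column suggests, 9 bytes of framing per packet and stuffing applied to the payload bytes only. The function name and the stuffing labels are illustrative.

```python
def pos_payload_mbps(sonet_payload_mbps, mtu, stuffing="none"):
    """Rate in Mbps available to IP over PPP/HDLC for packets of mtu bytes."""
    framing = 5 + 4                      # header + trailer bytes per frame
    extra = {
        "none": 0.0,                     # no byte needs escaping
        "random": mtu / 128,             # one in every 128 bytes is stuffed
        "max": float(mtu),               # every payload byte is stuffed
    }[stuffing]
    return sonet_payload_mbps * mtu / (mtu + framing + extra)

OC12C = 600.768
for mode in ("none", "random", "max"):
    print(f"{mode:7s} {pos_payload_mbps(OC12C, 1500, mode):.3f} "
          f"{pos_payload_mbps(OC12C, 9000, mode):.3f}")

# TCP payload with timestamps: 52 bytes of IPv4 + TCP headers per packet.
tcp = pos_payload_mbps(OC12C, 1500, "random") * (1500 - 52) / 1500
print(f"TCP w/ts {tcp:.3f}")
```

Passing 149.760 instead of 600.768 gives the corresponding OC-3c figures under the same assumptions.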

Notes:

  1. Only one flag byte is required between frames, i.e. the flag byte that ends one frame can also begin the next.
  2. It is possible for the HDLC Address and Control fields to be "compressed", i.e. non-existent. This is negotiated by PPP's Link Control Protocol (LCP). The RFC's however recommend that they be present on high speed links and POS.
  3. The protocol field can be compressed to one byte (negotiated by LCP), but this is also discouraged on high speed links and POS.
  4. IP -> PPP -> FCS generation -> Byte stuffing -> Scrambling -> SONET/SDH framing
  5. The Frame Check Sequence (FCS) for POS should be 32-bits. RFC2615 allows for 16-bits (the PPP default) only when required for backward compatibility, and only on OC3c. Even on OC3c 32-bit is recommended. The FCS length is configured, not negotiated. The FCS-32 uses the exponents x**0, 1, 2, 4, 5, 7, 8, 10, 11, 12, 16, 22, 23, 26, 32.
  6. Byte stuffing escapes any Flag (0x7e) and Escape (0x7d) bytes by inserting an Escape byte and xoring the original byte with 0x20. [PPP can also escape negotiated control characters but this is not used in POS.] Byte stuffing can at worst double the payload size (e.g. data of all 0x7e). For uniform random data one in every 128 bytes would be stuffed, expanding the data by 1/128 (0.78%); the stuffing bytes then make up 0.775% of the stuffed stream.
  7. The stuffed data is then scrambled with 1+x**43 (the same used for ATM) to prevent certain data patterns from interfering with SONET.
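
The byte stuffing rule in note 6 is small enough to show in full. This is a minimal Python sketch with illustrative function names. It covers only the Flag and Escape bytes, and unstuff assumes well-formed input, so a trailing lone Escape byte is not handled.

```python
FLAG, ESC = 0x7E, 0x7D

def stuff(data: bytes) -> bytes:
    """Escape Flag and Escape bytes: insert ESC, then the byte xored with 0x20."""
    out = bytearray()
    for b in data:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)

def unstuff(data: bytes) -> bytes:
    """Reverse stuff(): drop each ESC and xor the byte after it with 0x20."""
    out = bytearray()
    it = iter(data)
    for b in it:
        out.append(next(it) ^ 0x20 if b == ESC else b)
    return bytes(out)

worst = bytes([FLAG]) * 100
print(len(stuff(worst)))                 # worst case: every byte is escaped
print(stuff(bytes([0x01, FLAG, ESC])).hex())
```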

POS with Frame Relay encapsulation

Frame Relay (FR) encapsulation can be used on POS instead of HDLC/PPP. There are no RFCs about Frame Relay over SONET, nor does the Multiprotocol over Frame Relay RFC1490 discuss SONET or POS, but Cisco started doing this and others have followed.


Generic Framing Procedure

A new way to do POS uses PPP over GFP-F (Generic Framing Procedure, Framed) instead of HDLC. In both the HDLC and GFP-F cases, SONET / SDH VCAT (Virtual Concatenation) is used. GFP-F also allows Ethernet frames (100, GE and 10GE) and Resilient Packet Ring (RPR) frames to be sent over SONET/SDH VCAT. GFP can also map to G.709 (part of the Optical Transport Network (OTN) series).

A GFP User Frame:

A PLI of 0-3 indicates a GFP control frame. cHEC is a CRC-16 that protects the core header only (single bit error correction, multi bit error detection).

Multi-Protocol Label Switching (MPLS)

Multi-Protocol Label Switching (MPLS) adds four bytes to every frame for each label in the stack. As described in RFC3032, each 32-bit label stack entry includes:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Label
|                Label                  | Exp |S|       TTL     | Stack
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Entry

     Label:  Label Value, 20 bits
     Exp:    Experimental Use, 3 bits
     S:      Bottom of Stack, 1 bit
     TTL:    Time to Live, 8 bits
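
The bit layout above maps directly onto shifts and masks. The Python sketch below packs and unpacks one label stack entry in network byte order. The function names and default values are illustrative only.

```python
import struct

def pack_entry(label, exp=0, bottom=True, ttl=64):
    """Pack one MPLS label stack entry into 4 bytes, network byte order."""
    if not (0 <= label < 1 << 20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)

def unpack_entry(data):
    """Return (label, exp, bottom_of_stack, ttl) from the first 4 bytes."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 1), word & 0xFF

entry = pack_entry(label=16, exp=0, bottom=True, ttl=64)
print(entry.hex(), unpack_entry(entry))
```

A stack of several labels is these 4 byte entries back to back, with the S bit set only on the last one.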

Serial Lines (T1,T3)

To do
P. Dykstra, phil@sd.wareonearth.com, March 2001, last update Aug 2013