TCP Header Analysis

[!NOTE] This module explores the core principles of TCP Header Analysis, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

1. The Problem: The Chaos of the Open Internet

Imagine you are a logistics manager at a shipping company. You need to send a 10,000-page manuscript to a publisher across the country, but you are only allowed to send it via postcards that fit exactly 500 characters each.

Furthermore, the postal service makes no guarantees:

  • Postcards might arrive out of order.
  • Postcards might get lost in a storm.
  • Postcards might arrive completely corrupted.

How do you guarantee the publisher receives the exact manuscript, in perfect order, without missing a single character?

The solution: You write a sequence number on every postcard. If the publisher receives #1 and #3 but misses #2, they send a message back saying, “I got up to #1, I’m waiting for #2.” You hold a copy of every postcard until the publisher explicitly acknowledges receiving it.

This is exactly what Transmission Control Protocol (TCP) does over the chaotic, unreliable Internet Protocol (IP) layer.
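The publisher's side of the postcard scheme can be sketched in a few lines. This is a toy model, not real TCP: the function name `next_expected` and the arrival order are assumptions made for illustration, and postcards are numbered individually rather than by byte.

```python
def next_expected(received, first=1):
    """Return the lowest postcard number not yet received (a cumulative ACK)."""
    expected = first
    while expected in received:
        expected += 1
    return expected


received = set()
acks = []
for card in [1, 3, 2, 5]:  # arrival order; postcard 4 is lost in the storm
    received.add(card)
    acks.append(next_expected(received))

print(acks)  # [2, 2, 4, 4]
```

The reply stays at 2 when postcard 3 arrives early and stays at 4 after postcard 5 arrives, which tells the sender exactly which postcard to send again.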

2. What is TCP?

Transmission Control Protocol (TCP) is a connection-oriented, reliable protocol that ensures data is delivered in order and without errors.

Unlike connectionless protocols such as UDP, TCP requires the OS kernel to maintain state for every active connection: a socket structure, send and receive buffers, sequence numbers, and retransmission timers. At scale, this per-connection state consumes significant RAM and CPU. The challenge of serving ten thousand concurrent connections on one machine is known as the C10K problem, and its ten-million-connection successor as C10M.

3. The TCP Header Anatomy (20-60 Bytes)

To achieve reliability, TCP prepends a header to every chunk of application data, forming a segment, before handing it down to the IP layer. The header carries the metadata for the connection.

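| Field | Bits | Purpose |
|---|---|---|
| Source Port | 16 | The port of the sending application (e.g., 54321). |
| Dest Port | 16 | The port of the receiving application (e.g., 443 for HTTPS). |
| Sequence # | 32 | The sequence number of the first data byte in this segment (on a SYN, the initial sequence number). Used to reassemble data in the correct order. |
| ACK # | 32 | The next byte the receiver expects. Tells the sender: “I received everything up to this minus 1.” Valid only when the ACK flag is set. |
| Data Offset | 4 | The size of the TCP header in 32-bit words (5 to 15, i.e., 20 to 60 bytes), indicating where the data begins. |
| Reserved | 4 | Reserved for future use; sent as zero. |
| Flags | 8 | Control bits: CWR, ECE, URG, ACK, PSH, RST, SYN, FIN. |
| Window Size | 16 | Flow control mechanism. The sender of this segment advertises: “I can accept X more bytes before my receive buffer is full.” |
| Checksum | 16 | Error-checking of the header, the data, and a pseudo-header derived from the IP header (addresses, protocol, length). |
| Urgent Pointer | 16 | Points to urgent data when URG is set (rarely used in modern applications). |
| Options | 0-320 | Optional fields such as maximum segment size and window scaling; these account for header sizes above 20 bytes. |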
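To make the layout concrete, here is a minimal sketch that unpacks the fixed 20-byte portion of a header with Python's standard `struct` module. The function name `parse_tcp_header`, the dictionary keys, and the hand-built sample segment are assumptions for illustration. The sketch decodes only the low eight control bits (the six classic flags plus the two ECN bits, ECE and CWR) and ignores any options.

```python
import struct

# Flag names indexed by bit position, least significant bit first.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # "!" selects network (big-endian) byte order; H = 16 bits, I = 32 bits.
    (src, dst, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = offset_flags >> 12   # top 4 bits, counted in 32-bit words
    flag_bits = offset_flags & 0xFF    # low 8 bits hold the control flags
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": data_offset * 4,  # convert words to bytes
        "flags": [n for i, n in enumerate(FLAG_NAMES) if flag_bits & (1 << i)],
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }


# Hand-built SYN segment: data offset 5 (20 bytes), only the SYN bit set.
sample = struct.pack("!HHIIHHHH", 54321, 443, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(sample)
print(header["src_port"], header["dst_port"], header["header_len"], header["flags"])
```

The payload of a real segment would start at `segment[header["header_len"]:]`, which is exactly what the Data Offset field exists to tell the receiver.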

Concrete Example: Sequence and ACK Numbers

If a server sends 1,000 bytes of data starting at Sequence # 5000:

  1. The packet contains: Seq = 5000, Length = 1000.
  2. The client successfully receives the packet.
  3. The client replies with an Acknowledgement: ACK = 6000. This implicitly means: “I have successfully received bytes up to 5999. Please send starting at 6000.”
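The arithmetic in this example is a single addition, with one subtlety: sequence numbers are 32-bit values, so they wrap around to zero. A minimal sketch, where the helper name `ack_for` is an assumption for illustration:

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits wide and wrap around


def ack_for(seq: int, length: int) -> int:
    """ACK number that acknowledges `length` bytes starting at `seq`."""
    return (seq + length) % SEQ_SPACE


print(ack_for(5000, 1000))           # 6000, as in the example above
print(ack_for(2 ** 32 - 500, 1000))  # 500, after wrapping past the 32-bit limit
```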

4. The 3-Way Handshake (Connection Setup)

Before sending data, TCP must establish a reliable session. This is known as the 3-Way Handshake. It ensures both parties are ready, agree on initial sequence numbers, and allocate necessary kernel buffers.

  1. SYN (Synchronize): Client says, “Let’s open a connection. My starting sequence number is x.”
  2. SYN-ACK: Server replies, “I acknowledge your x (I expect x+1 next). Let’s open a connection from my side too. My starting sequence number is y.”
  3. ACK (Acknowledge): Client replies, “I acknowledge your y (I expect y+1 next).”

Now the connection is ESTABLISHED. Both sides have allocated memory and agreed on the initial state.
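The sequence and ACK numbers exchanged in these three steps can be traced with a small simulation. This is a sketch of the bookkeeping only, with no real networking; the `Segment` class and `three_way_handshake` function are hypothetical names, and the values of x and y are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2 ** 32  # sequence numbers wrap at 32 bits


@dataclass
class Segment:
    flags: tuple
    seq: int
    ack: Optional[int] = None


def three_way_handshake(x: int, y: int) -> list:
    """Return the three segments exchanged, given initial sequence numbers x and y."""
    # Step 1: the client proposes its initial sequence number x.
    syn = Segment(("SYN",), seq=x)
    # Step 2: the server acknowledges x (expects x+1) and proposes y.
    syn_ack = Segment(("SYN", "ACK"), seq=y, ack=(syn.seq + 1) % SEQ_SPACE)
    # Step 3: the client acknowledges y (expects y+1); its own seq is now x+1.
    ack = Segment(("ACK",), seq=syn_ack.ack, ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for seg in three_way_handshake(x=100, y=300):
    print(seg)
```

Note that a SYN consumes one sequence number even though it carries no data, which is why each side acknowledges the other's initial number plus one.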


5. Interactive: The Handshake

[Interactive animation: a CLIENT on port 54321 sends a SYN to a SERVER on port 443.]

6. Termination (4-Way Wave)

Closing a connection is different from opening one because TCP is Full-Duplex. Data flows in both directions independently. One side might be done sending data, but the other side might still have a backlog to send. Thus, the connection must be closed in each direction independently.

  1. FIN: Client says, “I have no more data to send.” (Client enters FIN-WAIT-1).
  2. ACK: Server says, “Understood, I’ll stop expecting data from you.” (Server enters CLOSE-WAIT, Client enters FIN-WAIT-2).
    • At this point, the Server can still send data to the Client.
  3. FIN: Server eventually finishes sending its remaining data and says, “I’m also done sending.” (Server enters LAST-ACK).
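  4. ACK: Client says, “Understood. Connection closed.” (Client enters TIME-WAIT and lingers there, typically for twice the maximum segment lifetime, so that it can re-send the final ACK if the server retransmits its FIN because that ACK was lost.)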
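The state changes on both sides can be written down as a small transition table. This is a simplified sketch: the event labels and the `run` helper are assumptions for illustration, the final CLOSED state and the TIME-WAIT timeout are added to complete the picture, and cases such as a simultaneous close are left out.

```python
# (current state, event) -> next state, for an orderly close only.
TRANSITIONS = {
    # Active closer (the client in the steps above)
    ("ESTABLISHED", "send FIN"): "FIN-WAIT-1",
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-2", "recv FIN, send ACK"): "TIME-WAIT",
    ("TIME-WAIT", "timeout"): "CLOSED",
    # Passive closer (the server in the steps above)
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE-WAIT",
    ("CLOSE-WAIT", "send FIN"): "LAST-ACK",
    ("LAST-ACK", "recv ACK"): "CLOSED",
}


def run(state: str, events: list) -> list:
    """Apply events in order and return every state visited."""
    trace = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError means an illegal step
        trace.append(state)
    return trace


client = run("ESTABLISHED", ["send FIN", "recv ACK", "recv FIN, send ACK", "timeout"])
server = run("ESTABLISHED", ["recv FIN, send ACK", "send FIN", "recv ACK"])
print(" -> ".join(client))
print(" -> ".join(server))
```

The server can remain in CLOSE-WAIT for as long as it still has data to send, which is the half-closed period described in step 2.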