TCP vs UDP: Reliability vs Speed
Why does Discord use UDP instead of TCP for voice calls, even though UDP can lose packets? Why does Netflix stream video over TCP even though a single dropped packet can pause playback? And how does Google Stadia achieve sub-100ms gaming latency over the internet — the same internet that causes lag spikes in your home? The answer is a 40-year-old protocol debate that every senior engineer must understand.
The choice between TCP and UDP isn’t a trivia question — it’s an architectural decision that determines whether your product feels real-time and alive or reliable but sluggish.
[!TIP] Interview Tip: Never say “UDP is unreliable” and stop there. Say “UDP prioritizes Latency over Completeness, making it ideal for Gaming, VoIP, and DNS. And with QUIC (HTTP/3), we now get TCP-like reliability on top of UDP.”
1. The Core Difference: “Registered Mail” vs “Paper Airplanes”
Imagine you are sending a 100-page manuscript to a publisher.
TCP (Transmission Control Protocol)
The “Registered Mail” approach.
- Connection: You call the publisher first to make sure they are at their desk (Handshake).
- Ordering: You number every page (1/100, 2/100…).
- Reliability: You send Page 1. You wait for a receipt (ACK). If no receipt comes, you send Page 1 again.
- Flow Control: If the publisher says “I’m reading too slow!”, you stop sending (Window Size).
Result: 100% accuracy, ordered, but slower. Used for Web (HTTP), Email (SMTP), File Transfer (FTP). (Foundation for Message Queues).
UDP (User Datagram Protocol)
The “Paper Airplane” approach.
- Connectionless: You just start throwing pages out the window towards the publisher’s office.
- Fire and Forget: Did Page 50 land in a puddle? Too bad. You are already throwing Page 51.
- No Ordering: Page 99 might arrive before Page 2.
Result: Blazing fast, low overhead, but data might be lost or jumbled. Used for Video Streaming, Online Gaming, DNS, VoIP.
2. TCP Deep Dive: Under the Hood
TCP provides a “reliable byte stream” over an unreliable network (IP). How?
A. The 3-Way Handshake (Connection Establishment)
Before data moves, a “virtual pipe” is built. This ensures the Server is alive and has allocated resources for the connection.
- SYN (Synchronize): Client sends Seq=X. “Let’s talk.”
- SYN-ACK: Server receives X, sends Ack=X+1 and Seq=Y. “I hear you (ACK), let’s talk (SYN).”
- ACK: Client receives Y, sends Ack=Y+1. “Connection Established.”
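In application code the handshake is invisible: the kernel performs all three steps inside a single connect() call, and accept() returns only once the handshake is complete. A minimal loopback sketch in Python (the port is chosen by the OS; the message is arbitrary):

```python
import socket
import threading

# Server side: listen() prepares the socket; accept() returns a connection
# only after the SYN / SYN-ACK / ACK exchange has finished.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0 = let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def serve():
    conn, _addr = server.accept()    # handshake already done at this point
    conn.sendall(b"hello")
    conn.close()

t = threading.Thread(target=serve)
t.start()

# Client side: the entire 3-way handshake happens inside connect().
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
data = client.recv(5)
print(data)  # b'hello'
client.close()
t.join()
server.close()
```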
[!IMPORTANT] In this lesson, you will master:
- Fundamental Trade-offs: When to prioritize reliability (TCP) vs instant delivery (UDP).
- Modern Congestion Control: How Google’s BBR outperforms loss-based CUBIC on global links.
- TCP Optimizations: 0-RTT handshakes with TFO and security with SYN Cookies.
- Head-of-Line (HOL) Blocking: Identifying the transport bottleneck that defined HTTP/2’s limits.
F. Elite Deep Dive: TCP Fast Open (TFO)
The standard 3-way handshake costs 1 full RTT (Round Trip Time) before any data is sent. In a 200ms mobile network, that’s 200ms of “dead time”.
- The Cookie: On the first connection, the server sends a “Cookie” to the client.
- The Shortcut: On subsequent connections, the client sends SYN + Cookie + Data in the very first packet.
- The Result: 0-RTT data transfer. The server can start processing the request before the handshake is even finished.
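On Linux, TFO is exposed via the TCP_FASTOPEN socket option and the MSG_FASTOPEN flag on sendto(). The cookie mechanics themselves can be modeled in a few lines of plain Python; this is an illustrative model (the key name and cookie length are invented here), not the kernel’s actual cookie format:

```python
import hashlib
import hmac

SERVER_SECRET = b"rotate-me-periodically"   # hypothetical server-side key

def issue_cookie(client_ip: str) -> bytes:
    # First connection: server derives a cookie bound to the client's IP.
    return hmac.new(SERVER_SECRET, client_ip.encode(), hashlib.sha256).digest()[:8]

def accept_early_data(client_ip: str, cookie: bytes) -> bool:
    # Later connections: if the cookie verifies, the server can process
    # the request data carried in the very first packet (0-RTT).
    expected = hmac.new(SERVER_SECRET, client_ip.encode(), hashlib.sha256).digest()[:8]
    return hmac.compare_digest(cookie, expected)

cookie = issue_cookie("203.0.113.7")
print(accept_early_data("203.0.113.7", cookie))   # True: valid cookie
print(accept_early_data("198.51.100.9", cookie))  # False: cookie is IP-bound
```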
G. Security Deep Dive: SYN Cookies
In a SYN Flood DDoS, an attacker sends thousands of SYN packets from spoofed IPs. The server allocates RAM (TCB - Transmission Control Block) for each, eventually running out of memory.
- The Fix: Instead of storing the connection state in RAM, the server encodes the state into the ISN (Initial Sequence Number) of the SYN-ACK.
- Stateless: Only when the client responds with the final ACK (containing the cookie) does the server actually allocate memory.
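A simplified model of that encoding (real kernels also pack a timestamp and an MSS index into the ISN; this sketch keeps only the hash, and the secret is an invented placeholder):

```python
import hashlib

SECRET = b"per-boot-random-secret"  # hypothetical per-boot secret

def syn_cookie_isn(src_ip, src_port, dst_ip, dst_port):
    # Derive the ISN from the connection 4-tuple + secret
    # instead of allocating a TCB in memory.
    h = hashlib.sha256(f"{src_ip}:{src_port}:{dst_ip}:{dst_port}".encode() + SECRET)
    return int.from_bytes(h.digest()[:4], "big")

# Server sends SYN-ACK with Seq = cookie and stores nothing.
cookie = syn_cookie_isn("203.0.113.7", 54321, "192.0.2.1", 443)

# The legitimate final ACK carries Ack = cookie + 1; recompute and compare.
def validate_ack(src_ip, src_port, dst_ip, dst_port, ack):
    return ack == syn_cookie_isn(src_ip, src_port, dst_ip, dst_port) + 1

print(validate_ack("203.0.113.7", 54321, "192.0.2.1", 443, cookie + 1))  # True
print(validate_ack("203.0.113.7", 54321, "192.0.2.1", 443, 12345))       # False: bogus ACK
```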
3. Reliability & Ordering
- Sequence Numbers: Every byte is numbered. If packets arrive as [1, 3, 2], the receiver’s TCP stack buffers them and reassembles [1, 2, 3].
- ACKs & Retransmission: If the Sender sends Packet 5 and doesn’t get ACK 5 within the RTO (Retransmission Timeout), it resends.
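The receiver-side reassembly logic can be sketched as a buffer keyed by sequence number (a toy model with one “packet” per number, whereas real TCP numbers individual bytes):

```python
def reassemble(arrivals):
    """Deliver packets to the application in order, buffering any gaps."""
    buffer = {}
    next_expected = 1
    delivered = []
    for seq, payload in arrivals:
        buffer[seq] = payload
        # Drain everything that is now contiguous with what was delivered.
        while next_expected in buffer:
            delivered.append(buffer.pop(next_expected))
            next_expected += 1
    return delivered

# Packet 3 arrives early and waits in the buffer until 2 fills the gap.
print(reassemble([(1, "a"), (3, "c"), (2, "b")]))  # ['a', 'b', 'c']
```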
C. Flow Control (rwnd - The Receiver’s Limit)
If the Client pumps 10Gbps into a Server that can only write to disk at 1Gbps, the Server’s RAM will overflow.
- RWND (Receive Window): The variable that tracks the receiver’s available buffer.
- Zero Window: If the buffer is full, the sender stops until it receives a “Window Update”.
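A toy model of the zero-window interaction (sizes counted in whole segments rather than bytes, and the slow application read rate is an invented parameter):

```python
def simulate(total, capacity=4, ticks=20):
    """Sender respects the advertised window; slow reader causes stalls."""
    in_buffer, sent, stalls = 0, 0, 0
    for tick in range(ticks):
        rwnd = capacity - in_buffer          # advertised Receive Window
        if rwnd == 0:
            stalls += 1                      # Zero Window: sender must stop
        else:
            burst = min(rwnd, total - sent)  # send only what fits in rwnd
            sent += burst
            in_buffer += burst
        if tick % 2 == 0 and in_buffer:      # app reads only every 2nd tick
            in_buffer -= 1                   # freed space -> "Window Update"
    return sent, stalls

sent, stalls = simulate(total=10)
print(sent, stalls)  # all 10 segments delivered, but with zero-window stalls
```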
[!NOTE] Hardware-First Intuition: High-performance servers use a TCP Offload Engine (TOE). This is a dedicated processor on the NIC that handles the TCP handshake and checksums, freeing up the main CPU for business logic. Without TOE, a 10Gbps stream can consume 100% of a CPU core just for networking overhead.
D. Congestion Control (cwnd - The Network’s Limit)
While Flow Control protects the Receiver, Congestion Control protects the Internet. If the sender sees packet loss (No ACK), it assumes the network is congested and slows down.
1. CUBIC (Loss-Based)
The traditional standard (Linux default).
- Logic: “Go faster until I hit a wall (Packet Loss), then back off 50%.”
- Visual: A “Sawtooth” wave. It probes timidly, drops sharply.
- Problem: In modern high-speed networks, packet loss isn’t always congestion (e.g., WiFi interference). CUBIC slows down unnecessarily.
2. BBR (Bottleneck Bandwidth and RTT)
Google’s new algorithm (2016+).
- Logic: “I don’t care about packet loss. I care about Bandwidth and Latency.”
- Method: It periodically “probes” the network.
- ProbeBW: Ramps up speed to find the max bandwidth.
- ProbeRTT: Drops speed to measure the min latency.
- Result: Up to 100x higher throughput on lossy long-haul links (like Trans-Atlantic cables).
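The intuition can be shown with a deliberately crude simulation: a loss-based AIMD sender halves its window on every loss, while a rate-based sender keeps pacing at the measured bottleneck bandwidth. The link capacity and loss interval below are made-up numbers, and the “BBR” side is reduced to its end state (perfect pacing), not the real probing algorithm:

```python
BOTTLENECK = 100   # link capacity in packets per RTT (arbitrary)

def loss_based(rounds=1000):
    """AIMD-style sender: treats every loss as congestion and backs off."""
    cwnd, delivered = 1.0, 0.0
    for r in range(rounds):
        delivered += min(cwnd, BOTTLENECK)
        if r % 50 == 49:                       # stray loss, e.g. WiFi noise
            cwnd = max(1.0, cwnd / 2)          # multiplicative decrease
        else:
            cwnd = min(cwnd + 1, BOTTLENECK)   # additive increase
    return delivered / rounds                  # avg packets per RTT

def rate_based(rounds=1000):
    """BBR-style sender: paces at measured bottleneck bw, ignores stray loss."""
    return float(BOTTLENECK)

print(loss_based(), rate_based())  # sawtooth average vs full link rate
```

Even in this toy, the sawtooth sender averages well below link capacity because each non-congestion loss triggers an unnecessary back-off.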
E. The Silent Killer: Head-of-Line (HOL) Blocking
TCP’s greatest strength is its greatest weakness. Because it guarantees ordering, if Packet #1 is lost, Packets #2, #3, and #4 cannot be processed by the application, even if they arrived perfectly. They sit in the buffer waiting for #1 to be retransmitted.
This is Head-of-Line Blocking. It is the reason why a single lost packet can make a fast 100Mbps connection feel “stuck.”
F. The Performance Killer: Nagle’s Algorithm vs Delayed ACK
This is a classic “deadlock” that kills real-time app performance.
- Nagle’s Algorithm (Sender): “I have 1 byte to send. It’s too small. I’ll wait until I have enough data to fill a packet (MSS - Maximum Segment Size, usually 1460 bytes) OR until I get an ACK.”
- Delayed ACK (Receiver): “I received a packet. I won’t ACK immediately. I’ll wait 40ms to see if I can piggyback the ACK on a response.”
The Deadlock: Sender waits for ACK. Receiver waits for Data. Both wait 40ms.
Fix: TCP_NODELAY = 1 (Disables Nagle). Essential for Gaming/SSH.
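In Python this is one setsockopt() call using the standard socket-level constants:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle: small writes are sent immediately instead of waiting
# to coalesce into a full MSS or for an outstanding ACK to return.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
flag = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(flag)  # non-zero: Nagle is disabled on this socket
sock.close()
```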
4. UDP Deep Dive: The Speed Specialist
While TCP is the “Registered Mail” of the internet, UDP is the “Postcard.”
A. The “Postcard” Header (8 Bytes)
UDP is incredibly lightweight. While a TCP header is 20-60 bytes, a UDP header is always exactly 8 bytes.
- Source Port (16 bits)
- Destination Port (16 bits)
- Length (16 bits)
- Checksum (16 bits)
That’s it. It doesn’t track sequence numbers, window sizes, or congestion states.
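The entire header fits in a single struct.pack call, four unsigned 16-bit fields. The checksum is left as 0 here, which UDP over IPv4 permits (it means “not computed”):

```python
import struct

def udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)   # Length field covers header + data, in bytes
    checksum = 0                # 0 = "no checksum" (legal in IPv4)
    # "!HHHH" = network byte order, four 16-bit fields
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

hdr = udp_header(53, 33434, b"query")
print(len(hdr))  # 8 -- always, regardless of payload size
```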
B. No Congestion Control (Fire and Forget)
If you send 100 UDP packets and 20 are lost, UDP’s job is done. It does not retransmit.
[!IMPORTANT] Why use it? In real-time apps like Voice over IP (VoIP) or Zoom, if a syllable is lost, it’s better to stay “real-time” than to wait 200ms to re-send that old syllable.
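Fire-and-forget in code: sendto() returns as soon as the datagram is handed to the network layer, with no handshake and no one necessarily listening on the other end (the loopback address and port below are arbitrary):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# No connect(), no handshake -- just address the datagram and go.
n = sock.sendto(b"syllable", ("127.0.0.1", 50000))
print(n)  # 8: bytes handed off; whether anyone received them is unknown
sock.close()
```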
C. Modern UDP: The Foundation of QUIC
UDP’s lack of features is actually its greatest strength today. Because it’s so thin, engineers can build custom “Reliability Layers” in User Space (within the app code) instead of waiting for OS Kernel updates.
- HTTP/3 (QUIC): Effectively “TCP-like reliability” implemented on top of UDP.
D. Elite Deep Dive: UDP Hole Punching
How do two players in a game (or Zoom call) talk directly to each other if they are both behind NAT routers?
- The STUN Server: Both clients talk to a central STUN (Session Traversal Utilities for NAT) server to discover their Public IP and Port.
- The Punch: Client A sends a UDP packet to Client B’s public port. Client B does the same.
- The Result: NAT routers “witness” outgoing traffic and create a temporary “hole” (mapping) for the return traffic. Direct P2P communication is established without a server relay!
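The router’s behavior can be modeled as a mapping table: outbound packets create entries, and inbound packets are forwarded only if a matching entry (the “hole”) already exists. This is a toy model of a full-cone NAT with invented addresses; real NAT types vary and are stricter:

```python
class Nat:
    """Toy full-cone NAT: one public port per (private_ip, port) pair."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.mappings = {}          # (private_ip, private_port) -> public port
        self.next_port = 40000

    def outbound(self, private_addr):
        # Outgoing traffic "punches the hole": allocate a public port mapping.
        if private_addr not in self.mappings:
            self.mappings[private_addr] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.mappings[private_addr])

    def inbound(self, public_port):
        # Inbound traffic is forwarded only if some outbound packet opened it.
        for private_addr, port in self.mappings.items():
            if port == public_port:
                return private_addr
        return None                 # dropped: no hole exists

nat_a = Nat("198.51.100.1")
# Steps 1-2: Client A sends outbound (to STUN, then toward B) -> hole opens.
pub_a = nat_a.outbound(("10.0.0.5", 7000))
# Step 3: B's packet to A's public port is now forwarded instead of dropped.
print(nat_a.inbound(pub_a[1]))  # ('10.0.0.5', 7000)
print(nat_a.inbound(9999))      # None -- no mapping, packet dropped
```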
5. The Grand Comparison: TCP vs UDP
| Feature | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented (Handshake) | Connectionless (Fire & Forget) |
| Reliability | Guarantees delivery (ACK/Retransmit) | No guarantee (Loss possible) |
| Ordering | Guarantees order (Seq Numbers) | No order (Jumbled arrival) |
| Speed | Slower (Headers + Handshakes) | Blazing Fast (Minimal overhead) |
| Flow Control | Yes (Sliding Window) | No |
| Header Size | 20 - 60 Bytes | Fixed 8 Bytes |
| Use Cases | HTTP, SMTP, DB, SSH | VoIP, Games, DNS, Video |
6. Interactive Visualizer: TCP Sliding Window
Experience how the Sliding Window allows multiple packets to be “In Flight”.
- Action: Click “Start Sending”.
- Experiment: Click on a moving packet (Blue) to “Kill” it (Simulate Loss). Watch the window stall until timeout!
7. Summary
- TCP = Reliable, Ordered, Heavy. Use for: Web (HTTP), Email (SMTP), Databases.
- UDP = Fast, Lightweight, Lossy. Use for: Games, Video Streaming, DNS, VoIP.
- Flow Control (rwnd): Protects the Receiver from fast senders. (Sliding Window).
- Congestion Control (cwnd): Protects the Internet from all senders.
- CUBIC: Legacy sawtooth (Linux default).
- BBR: Modern bandwidth-based (Google default).
- HOL Blocking: TCP’s Achilles heel — one lost packet blocks everything behind it.
- QUIC (HTTP/3): “TCP-like reliability” on UDP. Solves HOL and enables Connection Migration.
Staff Engineer Tip: The “Deadlock” of Nagle + Delayed ACK. Always disable Nagle (TCP_NODELAY) for real-time services. If your API responses feel “jittery” (random 40ms delays), it’s likely this L4 mismatch. Google measured a significant reduction in P99 latency by simply disabling Nagle on their frontend proxies.