03. Chaining & Open Addressing

When multiple keys map to the same bucket index, a collision occurs. This chapter explores how hash tables resolve this core issue using two fundamentally different memory paradigms.

1. The Collision Reality

Imagine assigning parking spots in a crowded city garage using a predictable mathematical formula based on the license plate. It works perfectly until two cars arrive simultaneously with passes for the exact same spot #42. How do you resolve the conflict? Do you let them stack vertically (Chaining) or tell one to drive to the next available empty spot (Linear Probing)? In distributed systems and in-memory databases, this is known as a Hash Collision, and resolving it efficiently is critical to system performance.

There are two main strategies: Separate Chaining (used by Java HashMap) and Open Addressing (used by Python Dictionary and C++ unordered<sub>map</sub>).

Interactive Comparison

Watch how each strategy handles collisions differently when you add multiple keys that map to the same index.

Chaining vs. Linear Probing

Separate Chaining

Linear Probing

Strategy 1: Separate Chaining

Each bucket in the array holds a Linked List (or a dynamic array). When a collision occurs, simply append the new key to the list at that index.

  • Pros:
  • Simple to implement.
  • Gracefully handles high load factors (performance degrades gradually).
  • Deletion is easy (just remove from linked list).
  • Cons:
  • Requires extra memory for pointers.
  • Poor cache locality (linked list nodes are scattered in memory).

Strategy 2: Open Addressing (Linear Probing)

All elements are stored directly in the array. If the target bucket is full, we search for the next empty slot using a Probing Sequence.

  • Linear Probing: Check index, then index+1, then index+2
  • Pros:
  • Better memory usage (no pointers).
  • Excellent cache locality (sequential access).
  • Cons:
  • Clustering: Groups of occupied slots tend to merge, creating long “runs” that slow down insertions.
  • Performance degrades sharply as load factor approaches 1.
  • Deletion is tricky (requires “tombstones” or re-inserting elements).

War Story: The Primary Key Clustering Outage

In 2018, a major ad-tech company experienced random latency spikes across their distributed caching layer. The root cause? They were using Linear Probing with a poor hash function. As certain IDs hashed to the exact same starting index, massive “clusters” formed. A single lookup, which should be O(1), turned into an O(N) scan across thousands of array slots, saturating CPU usage and causing cascading timeouts. The fix was simply adopting a stronger hash function (MurmurHash3) to uniformly distribute the keys and prevent the clustering effect.


Hardware Truth: CPU Cache vs. Pointers

Modern software is bottlenecked by memory latency, not CPU cycles.

The Chaining Latency

In Separate Chaining, the table contains a pointer to a Node object.

  1. CPU reads bucket → Pointer address.
  2. CPU must fetch that address (Pointer Chase).
  3. Cache Miss Probability: High. Each jump in the linked list likely hits a different “cache line,” forcing a slow trip to main RAM.

The Open Addressing Advantage

In Linear Probing, the data is inline.

  1. CPU reads bucket i.
  2. CPU pre-fetches the next 64 bytes (the “Cache Line”) automatically.
  3. If a collision exists at i+1, the data is already in the L1/L2 Cache.

[!IMPORTANT] Because of this, Linear Probing is often 3x-10x faster than Chaining for small-to-medium datasets, even if it performs more comparisons. In modern systems, a comparison is nearly free; a memory fetch is expensive.


2. Load Factor & Rehashing

The Load Factor (α) is defined as: &alpha; = (Number of Elements) / (Table Size)

  • For Chaining, α can be > 1.
  • For Open Addressing, α must be ≤ 1.

When α exceeds a threshold (e.g., 0.75), the hash table must Rehash:

  1. Create a new array double the size.
  2. Iterate through all old elements.
  3. Re-insert them into the new array (their new index will be different!).

3. Implementation (Chaining)

Here is how you might build a simple Hash Map from scratch using chaining.

Java

class Node {
  int key;
  int value;
  Node next;

  public Node(int key, int value) {
    this.key = key;
    this.value = value;
  }
}

class MyHashMap {
  private final int size = 1000;
  private Node[] buckets;

  public MyHashMap() {
    buckets = new Node[size];
  }

  private int hash(int key) {
    return Integer.hashCode(key) % size;
  }

  public void put(int key, int value) {
    int index = hash(key);
    Node head = buckets[index];

    // Check if key exists (update)
    Node curr = head;
    while (curr != null) {
      if (curr.key == key) {
        curr.value = value;
        return;
      }
      curr = curr.next;
    }

    // Insert new node at head (collision)
    Node newNode = new Node(key, value);
    newNode.next = head;
    buckets[index] = newNode;
  }

  public int get(int key) {
    int index = hash(key);
    Node curr = buckets[index];
    while (curr != null) {
      if (curr.key == key) {
        return curr.value;
      }
      curr = curr.next;
    }
    return -1; // Not found
  }
}

Go

type Node struct {
  key   int
  value int
  next  *Node
}

type MyHashMap struct {
  size    int
  buckets []*Node
}

func Constructor() MyHashMap {
  return MyHashMap{
    size:    1000,
    buckets: make([]*Node, 1000),
  }
}

func (m *MyHashMap) hash(key int) int {
  return key % m.size
}

func (m *MyHashMap) Put(key int, value int) {
  index := m.hash(key)
  head := m.buckets[index]

  // Check if key exists (update)
  curr := head
  for curr != nil {
    if curr.key == key {
      curr.value = value
      return
    }
    curr = curr.next
  }

  // Insert new node at head (collision)
  newNode := &Node{key: key, value: value}
  newNode.next = head
  m.buckets[index] = newNode
}

func (m *MyHashMap) Get(key int) int {
  index := m.hash(key)
  curr := m.buckets[index]
  for curr != nil {
    if curr.key == key {
      return curr.value
    }
    curr = curr.next
  }
  return -1 // Not found
}