03. Chaining & Open Addressing

When multiple keys map to the same bucket index, a collision occurs. This chapter explores how hash tables resolve this core issue using two fundamentally different memory paradigms.

1. The Collision Reality

There are two main strategies: Separate Chaining (used by Java's HashMap and C++'s std::unordered_map) and Open Addressing (used by Python's dict).


Strategy 1: Separate Chaining

Each bucket in the array holds a Linked List (or a dynamic array). When a collision occurs, simply append the new key to the list at that index.

  • Pros:
    • Simple to implement.
    • Gracefully handles high load factors (performance degrades gradually).
    • Deletion is easy (just unlink the node from the list).
  • Cons:
    • Requires extra memory for node pointers.
    • Poor cache locality (linked-list nodes are scattered across the heap).

Strategy 2: Open Addressing (Linear Probing)

All elements are stored directly in the array. If the target bucket is full, we search for the next empty slot using a Probing Sequence.

  • Linear Probing: check index, then index+1, then index+2, and so on, wrapping around to the start of the array.
  • Pros:
    • Better memory usage (no per-node pointers).
    • Excellent cache locality (sequential access).
  • Cons:
    • Clustering: groups of occupied slots tend to merge, creating long “runs” that slow down insertions and lookups.
    • Performance degrades sharply as the load factor approaches 1.
    • Deletion is tricky (requires “tombstones” or re-inserting the elements that follow).
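The probing and tombstone behavior above can be sketched in Go. This is a minimal illustration, not the map built in Section 3: the ProbingMap type, its slot states, and the fixed capacity are all assumptions made for the example.

```go
// Slot states for open addressing with tombstone deletion (illustrative).
const (
	slotEmpty = iota
	slotOccupied
	slotTombstone
)

type probeSlot struct {
	state, key, value int
}

type ProbingMap struct {
	slots []probeSlot
}

func NewProbingMap(capacity int) *ProbingMap {
	return &ProbingMap{slots: make([]probeSlot, capacity)}
}

func (m *ProbingMap) hash(key int) int {
	h := key % len(m.slots)
	if h < 0 {
		h += len(m.slots) // keep the index valid for negative keys
	}
	return h
}

// Put walks the probe sequence index, index+1, ... until it finds the key
// (update) or an empty slot (insert). The first tombstone seen is reused.
func (m *ProbingMap) Put(key, value int) bool {
	i := m.hash(key)
	firstTombstone := -1
	for probes := 0; probes < len(m.slots); probes++ {
		s := &m.slots[i]
		switch s.state {
		case slotOccupied:
			if s.key == key {
				s.value = value
				return true
			}
		case slotTombstone:
			if firstTombstone == -1 {
				firstTombstone = i
			}
		case slotEmpty:
			if firstTombstone != -1 {
				i = firstTombstone
			}
			m.slots[i] = probeSlot{slotOccupied, key, value}
			return true
		}
		i = (i + 1) % len(m.slots)
	}
	if firstTombstone != -1 {
		m.slots[firstTombstone] = probeSlot{slotOccupied, key, value}
		return true
	}
	return false // table full; a real table would have rehashed long before this
}

// Get must keep probing past tombstones: only a truly empty slot ends the
// search, because the key may live further along the run.
func (m *ProbingMap) Get(key int) int {
	i := m.hash(key)
	for probes := 0; probes < len(m.slots); probes++ {
		s := m.slots[i]
		if s.state == slotEmpty {
			return -1
		}
		if s.state == slotOccupied && s.key == key {
			return s.value
		}
		i = (i + 1) % len(m.slots)
	}
	return -1
}

// Delete marks the slot as a tombstone instead of emptying it,
// so later Gets do not stop their probe sequence too early.
func (m *ProbingMap) Delete(key int) {
	i := m.hash(key)
	for probes := 0; probes < len(m.slots); probes++ {
		s := &m.slots[i]
		if s.state == slotEmpty {
			return
		}
		if s.state == slotOccupied && s.key == key {
			s.state = slotTombstone
			return
		}
		i = (i + 1) % len(m.slots)
	}
}
```

Note how Delete never sets a slot back to empty: doing so would break every key whose probe run passes through that slot.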

Hardware Truth: CPU Cache vs. Pointers

Modern software is bottlenecked by memory latency, not CPU cycles.

The Chaining Latency

In Separate Chaining, the table contains a pointer to a Node object.

  1. CPU reads bucket → Pointer address.
  2. CPU must fetch that address (Pointer Chase).
  3. Cache Miss Probability: High. Each jump in the linked list likely hits a different “cache line,” forcing a slow trip to main RAM.

The Open Addressing Advantage

In Linear Probing, the data is inline.

  1. CPU reads bucket i.
  2. CPU pre-fetches the next 64 bytes (the “Cache Line”) automatically.
  3. If a collision exists at i+1, the data is already in the L1/L2 Cache.

[!IMPORTANT] Because of this, Linear Probing is often 3x-10x faster than Chaining for small-to-medium datasets, even if it performs more comparisons. In modern systems, a comparison is nearly free; a memory fetch is expensive.


2. Load Factor & Rehashing

The Load Factor (α) is defined as: α = (Number of Elements) / (Table Size)

  • For Chaining, α can be > 1.
  • For Open Addressing, α must be ≤ 1.
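For concreteness, a table of size 1000 holding 750 entries has α = 750 / 1000 = 0.75. The helper below is a hypothetical one-liner for the example, not part of any standard library:

```go
// loadFactor returns α = numElements / tableSize.
func loadFactor(numElements, tableSize int) float64 {
	return float64(numElements) / float64(tableSize)
}
```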

When α exceeds a threshold (e.g., 0.75), the hash table must Rehash:

  1. Create a new array double the size.
  2. Iterate through all old elements.
  3. Re-insert them into the new array (their new index will be different!).
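The three steps can be sketched as a grow helper. The slice-of-slices chaining layout, the chainEntry type, and non-negative integer keys are simplifying assumptions for this sketch:

```go
type chainEntry struct{ key, value int }

// grow doubles the bucket array and re-inserts every entry:
// each key's bucket index depends on the table size, so it must be recomputed.
func grow(buckets [][]chainEntry) [][]chainEntry {
	newBuckets := make([][]chainEntry, 2*len(buckets))
	for _, bucket := range buckets {
		for _, e := range bucket {
			idx := e.key % len(newBuckets) // assumes non-negative keys
			newBuckets[idx] = append(newBuckets[idx], e)
		}
	}
	return newBuckets
}
```

For example, with 4 buckets the keys 1, 5, and 9 all share bucket 1; after growing to 8 buckets, key 5 moves to bucket 5 while 1 and 9 still collide.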

3. Implementation (Chaining)

Here is how you might build a simple Hash Map from scratch using chaining.

Java
```java
class Node {
    int key;
    int value;
    Node next;

    public Node(int key, int value) {
        this.key = key;
        this.value = value;
    }
}

class MyHashMap {
    private final int size = 1000;
    private Node[] buckets;

    public MyHashMap() {
        buckets = new Node[size];
    }

    // Math.floorMod keeps the index non-negative even for negative keys
    private int hash(int key) {
        return Math.floorMod(key, size);
    }

    public void put(int key, int value) {
        int index = hash(key);
        Node head = buckets[index];

        // Check if key exists (update)
        Node curr = head;
        while (curr != null) {
            if (curr.key == key) {
                curr.value = value;
                return;
            }
            curr = curr.next;
        }

        // Insert new node at head (collision)
        Node newNode = new Node(key, value);
        newNode.next = head;
        buckets[index] = newNode;
    }

    public int get(int key) {
        int index = hash(key);
        Node curr = buckets[index];
        while (curr != null) {
            if (curr.key == key) {
                return curr.value;
            }
            curr = curr.next;
        }
        return -1; // Not found
    }
}
```

Go
```go
type Node struct {
	key   int
	value int
	next  *Node
}

type MyHashMap struct {
	size    int
	buckets []*Node
}

func Constructor() MyHashMap {
	return MyHashMap{
		size:    1000,
		buckets: make([]*Node, 1000),
	}
}

// Normalize the remainder so negative keys still map to a valid index
// (Go's % operator can return a negative result).
func (m *MyHashMap) hash(key int) int {
	h := key % m.size
	if h < 0 {
		h += m.size
	}
	return h
}

func (m *MyHashMap) Put(key int, value int) {
	index := m.hash(key)
	head := m.buckets[index]

	// Check if key exists (update)
	curr := head
	for curr != nil {
		if curr.key == key {
			curr.value = value
			return
		}
		curr = curr.next
	}

	// Insert new node at head (collision)
	m.buckets[index] = &Node{key: key, value: value, next: head}
}

func (m *MyHashMap) Get(key int) int {
	index := m.hash(key)
	curr := m.buckets[index]
	for curr != nil {
		if curr.key == key {
			return curr.value
		}
		curr = curr.next
	}
	return -1 // Not found
}
```