B-Tree vs GIN vs GiST: Choosing the Right Index

Postgres is unique among relational databases for its extensible indexing system. While B-Tree covers 90% of use cases, understanding when to reach for GIN or GiST is what separates a database user from a database engineer.

This chapter breaks down the internal structures of these indexes and provides a decision framework for choosing the correct one.

1. B-Tree: The Default Workhorse

If you run CREATE INDEX without specifying a type, you get a B-Tree (Balanced Tree). It is designed to handle equality and range queries on data that can be sorted.

How it Works

A B-Tree index maintains an ordered tree structure. The leaf nodes contain pointers (TIDs) to the actual table rows (Heap).

  • Self-Balancing: The tree depth remains uniform, ensuring O(log n) lookup time.
  • Ordered: Leaf nodes are linked via a Doubly Linked List, allowing efficient bidirectional range scans (BETWEEN, >, <) and ordered retrieval (ORDER BY).

Rigor: Why O(log N)?

B-Trees are “fat” trees. Unlike a binary search tree (branching factor of 2), a B-Tree node can have hundreds of children.

  • Fan-out: If a node holds 300 keys, the tree height grows very slowly.
  • Math: Height h &approx; log_fanout(N). For 1 billion rows with a fan-out of 360, the height is only 3 or 4 (360^4 > 10^9).
  • Conclusion: Almost every lookup requires only 3-4 random I/Os (pages), regardless of table size.

Hardware Reality: The Page Size

Postgres stores data in 8KB Pages.

  • Node Locality: A B-Tree internal node is designed to fit exactly into a single 8KB page.
  • Efficiency: One disk read brings in hundreds of child pointers. This maximizes the utility of CPU cache lines and minimizes expensive disk seeks.

When to Use B-Tree

  • Primary Keys & Unique Constraints: B-Trees are the only index type that can enforce uniqueness.
  • Equality: WHERE id = 55
  • Ranges: WHERE created_at > '2023-01-01'
  • Sorting: ORDER BY created_at DESC
  • Prefix Matching: WHERE name LIKE 'Alice%' (only if the locale supports it)

SQL Example

-- Standard B-Tree creation
CREATE INDEX idx_users_email ON users (email);

-- Supports:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email > 'm';
SELECT * FROM users ORDER BY email;

3. GIN: Generalized Inverted Index

GIN indexes are “Inverted Indexes,” similar to those used by search engines (like Elasticsearch). They are designed for composite values where a single row might contain multiple keys.

How it Works

Instead of storing the composite value (like a JSON document or an array) in the tree, GIN extracts the elements (keys in JSON, words in text) and stores them as keys in a B-Tree-like structure. Each key points to a list of TIDs that contain that element.

  • Structure: Element -> List[TID1, TID2, TID3...]
  • Compression: The TID lists are compressed using posting lists, making GIN very space-efficient for repetitive data.
  • Write Overhead: Inserting a new row with a JSON document might require updating many different GIN keys. Postgres uses a “Pending List” buffer to optimize this, but writes are still slower than B-Tree.

When to Use GIN

  • JSONB Containment: WHERE data @> '{"role": "admin"}'
  • Array Intersection: WHERE tags && ARRAY['urgent', 'todo']
  • Full Text Search: WHERE doc_vector @@ to_tsquery('postgres')
  • Trigram Matching: WHERE name LIKE '%ostgre%' (with pg_trgm extension)

SQL Example

-- Create a table with JSONB data
CREATE TABLE logs (
    id SERIAL PRIMARY KEY,
    payload JSONB
);

-- Create a GIN index on the JSONB column
CREATE INDEX idx_logs_payload ON logs USING GIN (payload);

-- Supports containment queries:
-- Find logs where payload contains key "level" with value "error"
SELECT * FROM logs
WHERE payload @> '{"level": "error"}';

5. GiST: Generalized Search Tree

GiST is not a single specific data structure but a framework for building them. It allows indexing of data types that cannot be strictly ordered (like 2D geometric shapes).

How it Works

GiST is a balanced tree, but instead of strict separation (all values < X go left), it uses Lossy Signatures or Bounding Boxes.

  • Overlapping: In a spatial index (R-Tree), parent nodes contain a bounding box that covers all children. These boxes can overlap.
  • Search: The search might need to traverse multiple branches of the tree if the query rectangle overlaps with multiple node boundaries.
  • Lossy: GiST indexes might return “false positives” which the engine must filter out by checking the actual heap tuple.

When to Use GiST

  • Geometric Data (PostGIS): “Find all points within 5km of this location.”
  • Range Types: “Find all reservations that overlap with this time period.” (tsrange, daterange)
  • Nearest Neighbor: ORDER BY location <-> point (K-NN search).
  • Full Text Search: An alternative to GIN. Faster to build, smaller on disk, but slower reads.

SQL Example

-- Enable the btree_gist extension for mixing scalar and geometric types
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservations (
    id SERIAL PRIMARY KEY,
    room_id INT,
    span TSRANGE
);

-- Create a GiST index to prevent overlapping bookings (Exclusion Constraint)
ALTER TABLE reservations
ADD CONSTRAINT no_overlap
EXCLUDE USING GIST (
    room_id WITH =,
    span WITH &&
);

7. Comparison Summary

Feature B-Tree GIN GiST
Primary Use Scalars (Int, Text, Date) Composites (JSON, Array, Text) Geometric, Ranges, Custom
Query Types =, <, >, ORDER BY @>, ?, &&, @@ <<, &<, &&, <->
Ordering Yes (Can retrieve sorted) No No (except K-NN distance)
Uniqueness Yes No No (supports Exclusion)
Write Speed Fast Slow (High overhead) Moderate
Read Speed Very Fast Fast Moderate
Size Small Compact (Compressed) Moderate

Note

This module explores the core principles of B-Tree vs GIN vs GiST, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

9. Visualizing Structures

B-Tree Structure (Depth)
[ 50 ]
↙     ↘
[ 10 - 49 ]
[ 51 - 100 ]
↙ ↘   ↙ ↘
Data
Data
Data
Data

Balanced depth, direct pointers to Heap.

GIN Structure (Inverted)
Key
Posting List (TIDs)
"admin"
[1, 5, 8, 12, ...]
"editor"
[2, 5, 9, 20, ...]
"viewer"
[3, 4, 6, 7, ...]

Keys map to lists of IDs. Efficient for containment.

10. Interactive: Index Selector

Select your data type and query pattern below to see the recommended index strategy and implementation code.