B-Tree vs GIN vs GiST: Choosing the Right Index

Postgres is unique among relational databases for its extensible indexing system. While B-Tree covers 90% of use cases, understanding when to reach for GIN or GiST is what separates a database user from a database engineer.

This chapter breaks down the internal structures of these indexes and provides a decision framework for choosing the correct one.

1. B-Tree: The Default Workhorse

If you run CREATE INDEX without specifying a type, you get a B-Tree (Balanced Tree). It is designed to handle equality and range queries on data that can be sorted.

How it Works

A B-Tree index maintains an ordered tree structure. The leaf nodes contain pointers (TIDs) to the actual table rows (Heap).

Self-Balancing: The tree depth remains uniform, ensuring O(log n) lookup time.
Ordered: Leaf nodes are linked via a Doubly Linked List, allowing efficient bidirectional range scans (BETWEEN, >, <) and ordered retrieval (ORDER BY).

Rigor: Why O(log N)?

B-Trees are “fat” trees. Unlike a binary search tree (branching factor of 2), a B-Tree node can have hundreds of children.

Fan-out: If a node holds 300 keys, the tree height grows very slowly.
Math: Height h ≈ log_fanout(N). For 1 billion rows with a fan-out of 360, the height is only 3 or 4 (360^4 > 10^9).
Conclusion: Almost every lookup requires only 3-4 random I/Os (pages), regardless of table size.

Hardware Reality: The Page Size

Postgres stores data in 8KB Pages.

Node Locality: A B-Tree internal node is designed to fit exactly into a single 8KB page.
Efficiency: One disk read brings in hundreds of child pointers. This maximizes the utility of CPU cache lines and minimizes expensive disk seeks.

When to Use B-Tree

Primary Keys & Unique Constraints: B-Trees are the only index type that can enforce uniqueness.
Equality: WHERE id = 55
Ranges: WHERE created_at > '2023-01-01'
Sorting: ORDER BY created_at DESC
Prefix Matching: WHERE name LIKE 'Alice%' (only if the locale supports it)

SQL Example

-- Standard B-Tree creation
CREATE INDEX idx_users_email ON users (email);

-- Supports:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email > 'm';
SELECT * FROM users ORDER BY email;

2. B-Tree vs GIN vs GiST: Choosing the Right Index

3. GIN: Generalized Inverted Index

GIN indexes are “Inverted Indexes,” similar to those used by search engines (like Elasticsearch). They are designed for composite values where a single row might contain multiple keys.

How it Works

Instead of storing the composite value (like a JSON document or an array) in the tree, GIN extracts the elements (keys in JSON, words in text) and stores them as keys in a B-Tree-like structure. Each key points to a list of TIDs that contain that element.

Structure: Element -> List[TID1, TID2, TID3...]
Compression: The TID lists are compressed using posting lists, making GIN very space-efficient for repetitive data.
Write Overhead: Inserting a new row with a JSON document might require updating many different GIN keys. Postgres uses a “Pending List” buffer to optimize this, but writes are still slower than B-Tree.

When to Use GIN

JSONB Containment: WHERE data @> '{"role": "admin"}'
Array Intersection: WHERE tags && ARRAY['urgent', 'todo']
Full Text Search: WHERE doc_vector @@ to_tsquery('postgres')
Trigram Matching: WHERE name LIKE '%ostgre%' (with pg_trgm extension)

SQL Example

-- Create a table with JSONB data
CREATE TABLE logs (
    id SERIAL PRIMARY KEY,
    payload JSONB
);

-- Create a GIN index on the JSONB column
CREATE INDEX idx_logs_payload ON logs USING GIN (payload);

-- Supports containment queries:
-- Find logs where payload contains key "level" with value "error"
SELECT * FROM logs
WHERE payload @> '{"level": "error"}';

4. B-Tree vs GIN vs GiST: Choosing the Right Index

5. GiST: Generalized Search Tree

GiST is not a single specific data structure but a framework for building them. It allows indexing of data types that cannot be strictly ordered (like 2D geometric shapes).

How it Works

GiST is a balanced tree, but instead of strict separation (all values < X go left), it uses Lossy Signatures or Bounding Boxes.

Overlapping: In a spatial index (R-Tree), parent nodes contain a bounding box that covers all children. These boxes can overlap.
Search: The search might need to traverse multiple branches of the tree if the query rectangle overlaps with multiple node boundaries.
Lossy: GiST indexes might return “false positives” which the engine must filter out by checking the actual heap tuple.

When to Use GiST

Geometric Data (PostGIS): “Find all points within 5km of this location.”
Range Types: “Find all reservations that overlap with this time period.” (tsrange, daterange)
Nearest Neighbor: ORDER BY location <-> point (K-NN search).
Full Text Search: An alternative to GIN. Faster to build, smaller on disk, but slower reads.

SQL Example

-- Enable the btree_gist extension for mixing scalar and geometric types
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservations (
    id SERIAL PRIMARY KEY,
    room_id INT,
    span TSRANGE
);

-- Create a GiST index to prevent overlapping bookings (Exclusion Constraint)
ALTER TABLE reservations
ADD CONSTRAINT no_overlap
EXCLUDE USING GIST (
    room_id WITH =,
    span WITH &&
);

6. B-Tree vs GIN vs GiST: Choosing the Right Index

7. Comparison Summary

Feature	B-Tree	GIN	GiST
Primary Use	Scalars (Int, Text, Date)	Composites (JSON, Array, Text)	Geometric, Ranges, Custom
Query Types	`=`, `<`, `>`, `ORDER BY`	`@>`, `?`, `&&`, `@@`	`<<`, `&<`, `&&`, `<->`
Ordering	Yes (Can retrieve sorted)	No	No (except K-NN distance)
Uniqueness	Yes	No	No (supports Exclusion)
Write Speed	Fast	Slow (High overhead)	Moderate
Read Speed	Very Fast	Fast	Moderate
Size	Small	Compact (Compressed)	Moderate

8. B-Tree vs GIN vs GiST: Choosing the Right Index

[!NOTE] This module explores the core principles of B-Tree vs GIN vs GiST, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.

9. Visualizing Structures

B-Tree Structure (Depth)

[ 50 ]

↙ ↘

[ 10 - 49 ]

[ 51 - 100 ]

↙ ↘ ↙ ↘

Data

Balanced depth, direct pointers to Heap.

GIN Structure (Inverted)

Key

Posting List (TIDs)

"admin"

[1, 5, 8, 12, ...]

"editor"

[2, 5, 9, 20, ...]

"viewer"

[3, 4, 6, 7, ...]

Keys map to lists of IDs. Efficient for containment.

10. Interactive: Index Selector

Select your data type and query pattern below to see the recommended index strategy and implementation code.

1. Data Type