B-Tree vs GIN vs GiST: Choosing the Right Index
Postgres is unique among relational databases for its extensible indexing system. While B-Tree covers 90% of use cases, understanding when to reach for GIN or GiST is what separates a database user from a database engineer.
This chapter breaks down the internal structures of these indexes and provides a decision framework for choosing the correct one.
1. B-Tree: The Default Workhorse
If you run CREATE INDEX without specifying a type, you get a B-Tree (Balanced Tree). It is designed to handle equality and range queries on data that can be sorted.
How it Works
A B-Tree index maintains an ordered tree structure. The leaf nodes contain pointers (TIDs) to the actual table rows (Heap).
- Self-Balancing: The tree depth remains uniform, ensuring
O(log n)lookup time. - Ordered: Leaf nodes are linked via a Doubly Linked List, allowing efficient bidirectional range scans (
BETWEEN,>,<) and ordered retrieval (ORDER BY).
Rigor: Why O(log N)?
B-Trees are “fat” trees. Unlike a binary search tree (branching factor of 2), a B-Tree node can have hundreds of children.
- Fan-out: If a node holds 300 keys, the tree height grows very slowly.
- Math: Height
h≈log_fanout(N). For 1 billion rows with a fan-out of 360, the height is only 3 or 4 (360^4 > 10^9). - Conclusion: Almost every lookup requires only 3-4 random I/Os (pages), regardless of table size.
Hardware Reality: The Page Size
Postgres stores data in 8KB Pages.
- Node Locality: A B-Tree internal node is designed to fit exactly into a single 8KB page.
- Efficiency: One disk read brings in hundreds of child pointers. This maximizes the utility of CPU cache lines and minimizes expensive disk seeks.
When to Use B-Tree
- Primary Keys & Unique Constraints: B-Trees are the only index type that can enforce uniqueness.
- Equality:
WHERE id = 55 - Ranges:
WHERE created_at > '2023-01-01' - Sorting:
ORDER BY created_at DESC - Prefix Matching:
WHERE name LIKE 'Alice%'(only if the locale supports it)
SQL Example
-- Standard B-Tree creation
CREATE INDEX idx_users_email ON users (email);
-- Supports:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email > 'm';
SELECT * FROM users ORDER BY email;
2. B-Tree vs GIN vs GiST: Choosing the Right Index
3. GIN: Generalized Inverted Index
GIN indexes are “Inverted Indexes,” similar to those used by search engines (like Elasticsearch). They are designed for composite values where a single row might contain multiple keys.
How it Works
Instead of storing the composite value (like a JSON document or an array) in the tree, GIN extracts the elements (keys in JSON, words in text) and stores them as keys in a B-Tree-like structure. Each key points to a list of TIDs that contain that element.
- Structure:
Element -> List[TID1, TID2, TID3...] - Compression: The TID lists are compressed using posting lists, making GIN very space-efficient for repetitive data.
- Write Overhead: Inserting a new row with a JSON document might require updating many different GIN keys. Postgres uses a “Pending List” buffer to optimize this, but writes are still slower than B-Tree.
When to Use GIN
- JSONB Containment:
WHERE data @> '{"role": "admin"}' - Array Intersection:
WHERE tags && ARRAY['urgent', 'todo'] - Full Text Search:
WHERE doc_vector @@ to_tsquery('postgres') - Trigram Matching:
WHERE name LIKE '%ostgre%'(withpg_trgmextension)
SQL Example
-- Create a table with JSONB data
CREATE TABLE logs (
id SERIAL PRIMARY KEY,
payload JSONB
);
-- Create a GIN index on the JSONB column
CREATE INDEX idx_logs_payload ON logs USING GIN (payload);
-- Supports containment queries:
-- Find logs where payload contains key "level" with value "error"
SELECT * FROM logs
WHERE payload @> '{"level": "error"}';
4. B-Tree vs GIN vs GiST: Choosing the Right Index
5. GiST: Generalized Search Tree
GiST is not a single specific data structure but a framework for building them. It allows indexing of data types that cannot be strictly ordered (like 2D geometric shapes).
How it Works
GiST is a balanced tree, but instead of strict separation (all values < X go left), it uses Lossy Signatures or Bounding Boxes.
- Overlapping: In a spatial index (R-Tree), parent nodes contain a bounding box that covers all children. These boxes can overlap.
- Search: The search might need to traverse multiple branches of the tree if the query rectangle overlaps with multiple node boundaries.
- Lossy: GiST indexes might return “false positives” which the engine must filter out by checking the actual heap tuple.
When to Use GiST
- Geometric Data (PostGIS): “Find all points within 5km of this location.”
- Range Types: “Find all reservations that overlap with this time period.” (
tsrange,daterange) - Nearest Neighbor:
ORDER BY location <-> point(K-NN search). - Full Text Search: An alternative to GIN. Faster to build, smaller on disk, but slower reads.
SQL Example
-- Enable the btree_gist extension for mixing scalar and geometric types
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE reservations (
id SERIAL PRIMARY KEY,
room_id INT,
span TSRANGE
);
-- Create a GiST index to prevent overlapping bookings (Exclusion Constraint)
ALTER TABLE reservations
ADD CONSTRAINT no_overlap
EXCLUDE USING GIST (
room_id WITH =,
span WITH &&
);
6. B-Tree vs GIN vs GiST: Choosing the Right Index
7. Comparison Summary
| Feature | B-Tree | GIN | GiST |
|---|---|---|---|
| Primary Use | Scalars (Int, Text, Date) | Composites (JSON, Array, Text) | Geometric, Ranges, Custom |
| Query Types | =, <, >, ORDER BY |
@>, ?, &&, @@ |
<<, &<, &&, <-> |
| Ordering | Yes (Can retrieve sorted) | No | No (except K-NN distance) |
| Uniqueness | Yes | No | No (supports Exclusion) |
| Write Speed | Fast | Slow (High overhead) | Moderate |
| Read Speed | Very Fast | Fast | Moderate |
| Size | Small | Compact (Compressed) | Moderate |
8. B-Tree vs GIN vs GiST: Choosing the Right Index
[!NOTE] This module explores the core principles of B-Tree vs GIN vs GiST, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
9. Visualizing Structures
Balanced depth, direct pointers to Heap.
Keys map to lists of IDs. Efficient for containment.
10. Interactive: Index Selector
Select your data type and query pattern below to see the recommended index strategy and implementation code.