Query DSL: Speaking JSON
[!NOTE] This module explores the core principles of Query DSL: Speaking JSON, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Two Contexts: Score vs No-Score
Every clause in Elasticsearch runs in one of two contexts. Mixing them up is a common cause of slow queries.
The “Sommelier vs. Bouncer” Analogy:
- Query Context is like a Sommelier tasting wine: “How good is this wine on a scale of 1-100?” It requires careful evaluation, nuanced calculations (scoring algorithms), and is fundamentally a slower, comparative process.
- Filter Context is like a Bouncer checking IDs at a club: “Are you over 21? Yes or No.” It’s an exact, binary decision that is extremely fast and can be easily remembered (cached) for the rest of the night.
| Feature | Query Context ("query": ...) | Filter Context ("filter": ...) |
|---|---|---|
| Question | “How well does this match?” | “Does this match? (Yes/No)” |
| Output | _score (Float) | Boolean (True/False) |
| Performance | Slower (Calculates Relevance) | Fast (Cached in BitSet) |
| Use Case | Full-text search (“best pizza”) | Exact filtering (“status=active”) |
Golden Rule: If you don’t care about ranking (e.g., filtering by Date, Status, ID), ALWAYS use Filter Context.
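As a minimal sketch of the Golden Rule, the Python snippet below builds a search body whose conditions all sit in filter context, using the `filter` clause of the bool query introduced in the next section. The field names `status` and `created_at` and the helper `filter_only_query` are hypothetical, chosen only for illustration.

```python
import json


def filter_only_query(status, created_after):
    """Build a search body that only filters; no relevance ranking is requested."""
    return {
        "query": {
            "bool": {
                # Everything here is a yes/no check, so it goes under "filter".
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"created_at": {"gte": created_after}}},
                ]
            }
        }
    }


body = filter_only_query("active", "2024-01-01")
print(json.dumps(body, indent=2))
```

Because no scoring clause is present, every hit comes back with the same score, which is fine when the results are sorted by date or simply counted.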
2. The Compound bool Query
The bool query is the wrapper for combining logic. It has 4 clauses:
- must (AND): Must match. Contributes to score.
- filter (AND): Must match. Ignores score. Cached.
- should (OR): Nice to have. Boosts score if present.
- must_not (NOT): Must NOT match. Ignores score. Cached.
Pattern:
{
  "query": {
    "bool": {
      "must": [ { "match": { "title": "pizza" }} ],
      "filter": [ { "term": { "city": "NYC" }} ]
    }
  }
}
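To make the four clauses concrete, here is a toy Python model of how they combine for a single document. It is a sketch under simplifying assumptions, not Elasticsearch's implementation: each clause is a plain function returning a score (0.0 means "no match"), relevance is faked with a word count, and `should` is always optional (the real `minimum_should_match` behaviour is ignored). The names `bool_query`, `term`, `match`, and the `closed` field are hypothetical.

```python
def term(field, value):
    # Exact yes/no check: 1.0 for a match, 0.0 otherwise.
    return lambda doc: 1.0 if doc.get(field) == value else 0.0


def match(field, word):
    # Crude stand-in for relevance: how often the word occurs in the field.
    return lambda doc: float(doc.get(field, "").lower().split().count(word))


def bool_query(doc, must=(), filter_=(), should=(), must_not=()):
    """Return the combined score, or None if the document is excluded."""
    # filter and must_not are yes/no gates; their scores are thrown away.
    if any(clause(doc) == 0.0 for clause in filter_):
        return None
    if any(clause(doc) > 0.0 for clause in must_not):
        return None
    must_scores = [clause(doc) for clause in must]
    if any(score == 0.0 for score in must_scores):
        return None
    # should clauses never exclude in this toy model; they only add score.
    return sum(must_scores) + sum(clause(doc) for clause in should)


docs = [
    {"title": "best pizza pizza place", "city": "NYC", "closed": False},
    {"title": "pizza by the slice", "city": "NYC", "closed": True},
    {"title": "great pizza", "city": "LA", "closed": False},
]

results = [
    bool_query(
        doc,
        must=[match("title", "pizza")],
        filter_=[term("city", "NYC")],
        must_not=[term("closed", True)],
    )
    for doc in docs
]
print(results)  # [2.0, None, None]
```

Only the `must` clause influences the score of the surviving document; the `filter_` and `must_not` clauses decide membership and nothing else.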
3. Interactive: The BitSet Cache
Elasticsearch caches filter results as BitSets (arrays of 0s and 1s, one bit per document). Intersecting two filters then comes down to a bitwise AND of their BitSets.
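The intersection of two cached filters can be sketched in a few lines of Python, using an arbitrary-precision integer as a stand-in for the BitSet (bit `i` is set when document `i` matches). The document IDs and the helper names `to_bitset` and `doc_ids` are made up for illustration.

```python
def to_bitset(matching_doc_ids):
    """Pack a list of matching document IDs into one integer, one bit per doc."""
    bits = 0
    for doc_id in matching_doc_ids:
        bits |= 1 << doc_id
    return bits


def doc_ids(bits):
    """Unpack a bitset back into the sorted list of matching document IDs."""
    ids = []
    doc_id = 0
    while bits:
        if bits & 1:
            ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return ids


status_active = to_bitset([0, 2, 3, 5, 7])
category_tech = to_bitset([1, 2, 5, 6, 7])

# AND of the two filters: one bitwise operation, no per-document scoring.
both = status_active & category_tech
print(format(both, "08b"))  # 10100100 (rightmost bit is document 0)
print(doc_ids(both))        # [2, 5, 7]
```

Once each filter's BitSet is cached, combining them needs no access to the inverted index at all, which is why repeated filters are cheap.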
4. Hardware Reality: CPU Instructions & Roaring Bitmaps
Why are Filter Contexts so much faster? Under the hood, Elasticsearch (via Apache Lucene) caches filters using Roaring Bitmaps, a compressed data structure for sets of integers (like Document IDs).
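- Memory Efficiency: Instead of storing a raw array of millions of 0s and 1s, Roaring Bitmaps split the ID space into chunks and choose a representation per chunk: dense chunks (where many documents match) are stored as bitmaps, while sparse chunks are stored as short sorted lists of IDs.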
- SIMD Operations: When you combine multiple filters (e.g., status=active AND category=tech), the CPU doesn’t have to test documents one by one. Bitwise AND/OR operations work on whole machine words, and SIMD (Single Instruction, Multiple Data) instructions can process blocks of 256 bits with a single instruction.
- Math vs. Bits: Filter queries (BitSet intersections) use cheap integer bitwise operations. Query context requires calculating term frequencies and inverse document frequencies (TF-IDF/BM25) and executing floating-point operations for every single matched document, which is far more expensive.
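The memory trick behind Roaring Bitmaps can be sketched as follows. This is an illustration of the general idea only, not Lucene's code: document IDs are grouped by their upper 16 bits, and each group stores the lower 16 bits either as a sorted array (sparse) or as a bitmap (dense). The 4096 cutoff is the usual break-even point, since 4096 two-byte values take the same 8 KiB as a 65,536-bit bitmap. The function names are hypothetical.

```python
from bisect import bisect_left

CHUNK_BITS = 16
ARRAY_LIMIT = 4096  # 4096 two-byte values = 8 KiB = one 65,536-bit bitmap


def build_containers(doc_ids):
    """Group doc IDs by their high 16 bits and pick a container per chunk."""
    chunks = {}
    for doc_id in sorted(set(doc_ids)):
        chunks.setdefault(doc_id >> CHUNK_BITS, []).append(doc_id & 0xFFFF)
    containers = {}
    for key, lows in chunks.items():
        if len(lows) <= ARRAY_LIMIT:
            containers[key] = ("array", lows)       # sparse: sorted low bits
        else:
            bitmap = 0
            for low in lows:
                bitmap |= 1 << low
            containers[key] = ("bitmap", bitmap)    # dense: one bit per ID
    return containers


def contains(containers, doc_id):
    """Membership test that dispatches on the container type."""
    entry = containers.get(doc_id >> CHUNK_BITS)
    if entry is None:
        return False
    kind, data = entry
    low = doc_id & 0xFFFF
    if kind == "array":
        i = bisect_left(data, low)
        return i < len(data) and data[i] == low
    return (data >> low) & 1 == 1


# Three scattered IDs plus one dense run of 10,000 consecutive IDs.
ids = [5, 70_000, 200_000] + list(range(131_072, 131_072 + 10_000))
containers = build_containers(ids)
print({key: kind for key, (kind, _) in containers.items()})
# {0: 'array', 1: 'array', 2: 'bitmap', 3: 'array'}
```

Chunks with no matching documents cost nothing, sparse chunks cost two bytes per match, and dense chunks are capped at 8 KiB, so the structure never does much worse than either plain representation.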