Querying & Relevance Engineering — Review & Checklist

[!NOTE] This module reviews the core principles of querying and relevance engineering in Elasticsearch: filter versus query context, BM25 scoring, aggregations, and global ordinals.

Key Takeaways

Filter vs. Query Context: The fundamental dichotomy. Use filter (yes/no, unscored, cacheable) for hard constraints such as status, IDs, and date ranges. Use query (scored, more expensive) only when relevance ranking is required, as in full-text search.
BitSet Caching: In Filter Context, Elasticsearch can cache the results of frequently used filters as compact BitSets, so complex boolean logic reduces to fast bitwise AND/OR operations.
BM25 Scoring Fundamentals: _score is driven by Term Frequency (TF - saturates quickly), Inverse Document Frequency (IDF - rewards rarity), and Field Length Norm (rewards shorter fields).
Aggregations Architecture: Think of aggregations as SQL GROUP BY. They are divided into Buckets (grouping docs) and Metrics (calculating stats within buckets).
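Global Ordinals: For fast aggregations on keyword strings, Elasticsearch uses global ordinals (a mapping from each unique string to an integer). By default they are built lazily, so the first aggregation after the shard changes can be slow; set eager_global_ordinals on the field to build them at refresh time instead, moving the cost from search to indexing.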
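To make the filter versus query split concrete, here is a minimal sketch of a search request body built as a plain Python dict. The field names `title`, `status`, and `created_at` and the helper `build_search_body` are hypothetical, chosen only for illustration; the `bool`, `must`, `filter`, `match`, `term`, and `range` keys are standard Query DSL.

```python
import json


def build_search_body(text, status, created_after):
    """Build a bool query: relevance in `must`, hard constraints in `filter`."""
    return {
        "query": {
            "bool": {
                # Query context: these clauses contribute to _score.
                "must": [{"match": {"title": text}}],
                # Filter context: yes/no only, no scoring, cacheable.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"created_at": {"gte": created_after}}},
                ],
            }
        }
    }


body = build_search_body("wireless headphones", "published", "2024-01-01")
print(json.dumps(body, indent=2))
```

Only the `match` clause affects ranking here; the `term` and `range` clauses just narrow the candidate set.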

Flashcards

Test your understanding of the core concepts.

Query Context

What question does this context answer, and what is the output?

Answers "How well does this match?"

Outputs a calculated `_score` (a float). It is slower than filter context because every matching document has to be scored.

Filter Context

What question does this context answer, and how does it achieve high performance?

Answers "Does this match? (Yes/No)".

It skips scoring entirely and can cache the results in memory-efficient BitSets for rapid boolean operations.
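The idea of combining cached filter results with bitwise operations can be sketched in a few lines. This is a toy model that uses a Python integer as the bitset, not how Lucene stores its bitsets; the doc IDs and the helper names `to_bitset` and `from_bitset` are made up for illustration.

```python
def to_bitset(doc_ids):
    """Pack matching doc IDs into an int, one bit per document."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def from_bitset(bits):
    """Return the sorted doc IDs whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Two hypothetical cached filter results.
status_published = to_bitset([0, 2, 3, 5, 7])
category_audio = to_bitset([2, 3, 4, 7, 8])

# "status = published AND category = audio" is a single bitwise AND.
both = status_published & category_audio
print(from_bitset(both))  # [2, 3, 7]
```

Once each filter's matches are in bitset form, adding another constraint costs one more AND rather than another pass over the documents.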

BM25: TF Saturation

How does BM25 handle Term Frequency differently from Classic TF-IDF?

BM25 applies a non-linear saturation curve. Finding a term 100 times is only slightly better than finding it 10 times, preventing spammy documents from dominating.
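The saturation curve is easy to see numerically. The sketch below uses the textbook form of the BM25 term-frequency component with the default parameters k1 = 1.2 and b = 0.75; the function name `tf_component` and the document lengths are illustrative assumptions, and the IDF factor is left out so that only the TF behavior is visible.

```python
def tf_component(tf, doc_len=100, avg_len=100, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency factor (IDF omitted)."""
    length_norm = 1 - b + b * (doc_len / avg_len)
    return tf * (k1 + 1) / (tf + k1 * length_norm)


for tf in (1, 10, 100):
    print(tf, round(tf_component(tf), 3))

# A shorter field scores higher for the same term frequency.
print(tf_component(1, doc_len=50) > tf_component(1, doc_len=200))  # True
```

For a field of average length the factor comes out at roughly 1.0, 1.96, and 2.17 for 1, 10, and 100 occurrences, and it can never exceed k1 + 1 = 2.2, which is why term repetition stops paying off quickly.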

Global Ordinals

What are they, and why are they critical for Aggregations?

A mapping of unique strings to integer IDs. They allow ES to group by integers rather than comparing string bytes, drastically speeding up bucket aggregations on high-cardinality fields.
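As a rough illustration of why ordinals help, the sketch below counts values the way a `terms` aggregation conceptually could: strings are mapped to integers once, counting happens on an integer array, and the strings are resolved only at the end. It is a simplified model, not Elasticsearch's actual data structure, and the helper names are hypothetical.

```python
def build_ordinals(values):
    """Assign each distinct string an integer ordinal, in sorted order."""
    terms = sorted(set(values))
    return terms, {term: i for i, term in enumerate(terms)}


def terms_counts(values):
    """Count occurrences per term using ordinals instead of strings."""
    terms, ordinal_of = build_ordinals(values)
    doc_ordinals = [ordinal_of[v] for v in values]  # per-document ordinal column
    counts = [0] * len(terms)
    for ordinal in doc_ordinals:
        counts[ordinal] += 1  # integer index, no string comparison
    # Resolve ordinals back to strings only when building the response.
    return {terms[i]: count for i, count in enumerate(counts)}


colors = ["red", "blue", "red", "green", "blue", "red"]
print(terms_counts(colors))  # {'blue': 2, 'green': 1, 'red': 3}
```

Building the mapping is the expensive part, which is the cost that the first aggregation pays when ordinals are built lazily.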

Cheat Sheet

Concept	The “Why”	When to Use
`bool` query	Combines logic. `must` (score), `filter` (cache), `should` (boost), `must_not` (exclude).	The foundation of most complex Elasticsearch queries.
BitSets	Arrays of 1s and 0s, one bit per document, marking which documents matched. Combined with bitwise AND/OR/NOT over whole machine words.	Underpins the speed of `filter` context.
BM25	The math behind `_score`. Relies on TF, IDF, and Field Length.	The default scoring algorithm for full-text relevance.
Buckets	Bins documents (e.g., `terms`, `date_histogram`). Similar to SQL `GROUP BY`.	Creating faceted navigation or segmenting data.
Metrics	Calculates numbers (e.g., `avg`, `sum`) inside buckets. Similar to SQL `SELECT AVG()`.	Extracting statistics from grouped data.
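The bucket and metric rows combine naturally in one request. Here is a minimal sketch of a request body, as a Python dict, that nests an `avg` metric inside a `terms` bucket; the field names `title`, `category`, and `price`, the aggregation names, and the helper `build_facet_body` are assumptions for illustration.

```python
import json


def build_facet_body(text):
    """Search hits plus average price per category in one request body."""
    return {
        "size": 10,  # number of hits to return alongside the aggregation
        "query": {"match": {"title": text}},
        "aggs": {
            "by_category": {
                # Bucket: one bin per distinct category (like GROUP BY).
                "terms": {"field": "category", "size": 5},
                "aggs": {
                    # Metric: computed inside each bucket (like SELECT AVG()).
                    "avg_price": {"avg": {"field": "price"}},
                },
            }
        },
    }


facet_body = build_facet_body("headphones")
print(json.dumps(facet_body, indent=2))
```

Setting the top-level `size` to 0 would return only the aggregation summary, which is a common choice when the hits themselves are not needed.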

Quick Revision

Always prefer Filter Context unless you explicitly need documents ranked by relevance.
The bool query is your orchestrator: use filter for hard constraints and must/should for relevance.
BM25 rewards rarity and brevity: A rare word in a short field yields the highest score.
Aggregations are dual-purpose: A single search request can return both the matching hits and the analytical summary in one round-trip.
Beware the first aggregation penalty: If latency is critical, use eager_global_ordinals to pre-build the string-to-int mappings for aggregations.
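For the last point, the setting lives in the field mapping. The sketch below shows an index-creation body as a Python dict with `eager_global_ordinals` enabled on a keyword field; the field name `category` is hypothetical.

```python
import json

# Index-creation body: build global ordinals for `category` at refresh time
# instead of on the first aggregation that needs them.
index_body = {
    "mappings": {
        "properties": {
            "category": {
                "type": "keyword",
                "eager_global_ordinals": True,
            }
        }
    }
}

print(json.dumps(index_body, indent=2))
```

This trades slower refreshes for more predictable aggregation latency, so it is worth enabling only on fields that are aggregated often.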

Next Steps

Now that you understand how to query and rank documents efficiently, it’s time to learn how to scale the system that handles these requests.

→ Continue to Scaling & Operations

Glossary Link

Need a refresher on specific terminology? View the Elasticsearch Glossary
