Observability: Logs, Metrics, Traces
[!NOTE] This module explores the core principles of Observability: Logs, Metrics, Traces, deriving solutions from first principles and hardware constraints to build world-class, production-ready expertise.
1. The Three Pillars
Elasticsearch is the “E” in ELK (Elasticsearch, Logstash, Kibana), the stack best known for logging. It now stores all three signal types:
- Logs: Structured JSON logs. (Strength: Highest; this is the core use case.)
- Metrics: CPU usage, latency. (Strength: Good, but Prometheus is better for pure counters.)
- Traces: Request IDs propagated across services. (Strength: Great for analysis.)
2. ECS: Elastic Common Schema
If Team A logs {"user": "john"} and Team B logs {"username": "john"}, you cannot correlate the two services' logs on a single field.
ECS standardizes field names:
- user.name
- host.ip
- event.duration
- http.request.method
Key Benefit: A single Kibana dashboard works for ALL services.
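The renaming described above can be sketched in a few lines of Python. This is a minimal illustration, not an Elastic API: `FIELD_MAP` and `to_ecs` are hypothetical names, and the dotted ECS keys are kept flat for brevity (in real documents they are usually nested objects).

```python
# Hypothetical mapping from each team's ad-hoc key to the ECS field name.
FIELD_MAP = {
    "user": "user.name",      # Team A's key
    "username": "user.name",  # Team B's key
}


def to_ecs(event: dict) -> dict:
    """Return a copy of the event with known legacy keys renamed to ECS names."""
    return {FIELD_MAP.get(key, key): value for key, value in event.items()}


team_a = to_ecs({"user": "john"})
team_b = to_ecs({"username": "john"})

# Both events now share the same field, so one query or dashboard covers both.
print(team_a, team_b)
```

In practice this kind of renaming would more likely live in the shipper or an ingest step than in application code; the point is only that both teams end up writing the same field.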
3. Interactive: Log Correlation
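How do you debug a 500 error? By linking Logs to Traces: every log line written while handling a request carries that request's trace.id, so one failing response leads you to every related log line across services.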
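As a rough sketch of that correlation, the snippet below groups in-memory log events by `trace.id` and returns every log line belonging to a request that ended in a 500. The sample events and the function name `logs_for_failed_requests` are made up for illustration; the field names follow ECS.

```python
from collections import defaultdict

# Made-up sample events from two services, using ECS-style field names.
logs = [
    {"trace.id": "t1", "service.name": "gateway",
     "http.response.status_code": 500, "message": "upstream failed"},
    {"trace.id": "t1", "service.name": "orders", "message": "db timeout"},
    {"trace.id": "t2", "service.name": "gateway",
     "http.response.status_code": 200, "message": "ok"},
]


def logs_for_failed_requests(events: list) -> dict:
    """Map each trace.id that produced a 500 to all log events in that trace."""
    by_trace = defaultdict(list)
    for event in events:
        by_trace[event["trace.id"]].append(event)
    failed = {
        event["trace.id"]
        for event in events
        if event.get("http.response.status_code") == 500
    }
    return {trace_id: by_trace[trace_id] for trace_id in failed}


related = logs_for_failed_requests(logs)
for trace_id, events in related.items():
    for event in events:
        print(trace_id, event["service.name"], event["message"])
```

Against a real cluster the same idea is a filter on the status code followed by a second filter on the resulting trace.id.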
4. Hardware Reality: High Cardinality
Metrics Warning:
If you record metrics.response_time with a tag for user_id, and you have 100M users, you create 100M unique time series, one per user.
Every series needs its own bookkeeping, so this blows up the Cluster State and memory.
Rule: Put high-cardinality data in Logs (Index), not Metrics (Aggregations).
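The arithmetic behind the warning is a product: the number of possible time series is bounded by the number of distinct values of each tag multiplied together. The sketch below assumes hypothetical `service` and `status_code` tags alongside the `user_id` tag from the text; `series_count` is an illustrative helper, not part of any library.

```python
from math import prod


def series_count(tag_cardinalities: dict) -> int:
    """Upper bound on distinct time series: product of distinct values per tag."""
    return prod(tag_cardinalities.values())


# Tagging by user_id alone: one series per user.
per_user = series_count({"user_id": 100_000_000})

# Hypothetical low-cardinality tags only.
low = series_count({"service": 20, "status_code": 5})

# The same tags with user_id added multiply, they do not add.
high = series_count({"service": 20, "status_code": 5, "user_id": 100_000_000})

print(per_user, low, high)
```

Dropping `user_id` from the metric and keeping it as a field on the log document keeps the series count at the low figure while still letting you search by user.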