Observability: Logs, Metrics, Traces

[!NOTE] This module covers the core principles of observability (logs, metrics, and traces), working from first principles and hardware constraints toward production-ready practice.

1. The Three Pillars

Elasticsearch is the “E” in ELK (Elasticsearch, Logstash, Kibana), a stack originally built for logs. Today it handles all three pillars:

  1. Logs: Structured JSON logs. (Strength: Highest).
  2. Metrics: CPU usage, latency. (Strength: Good, but Prometheus is better for pure counters).
  3. Traces: Distributed Request IDs. (Strength: Great analysis).

2. ECS: Elastic Common Schema

If Team A logs {"user": "john"} and Team B logs {"username": "john"}, a single query cannot correlate the two, because the same value lives under different field names. ECS standardizes field names:

  • user.name
  • host.ip
  • event.duration
  • http.request.method

Key Benefit: A single Kibana dashboard works for ALL services.
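
To make the renaming concrete, here is a minimal Python sketch that maps team-specific keys onto the ECS names listed above. The alias table and the to_ecs helper are hypothetical, not part of ECS or any Elastic client, and the sketch uses flat dotted keys for brevity where real ECS documents usually nest objects.

```python
# Hypothetical per-team aliases mapped onto the ECS field names listed above.
FIELD_ALIASES = {
    "user": "user.name",
    "username": "user.name",
    "ip": "host.ip",
    "method": "http.request.method",
}


def to_ecs(event):
    """Return a copy of the event with known aliases renamed to ECS keys.

    Keys without an alias are passed through unchanged.
    """
    return {FIELD_ALIASES.get(key, key): value for key, value in event.items()}


team_a = to_ecs({"user": "john"})
team_b = to_ecs({"username": "john"})

# Both teams now emit the same field, so one query or dashboard covers both.
print(team_a)
print(team_a == team_b)
```

In practice this kind of renaming would more likely live in the logging library or an ingest step than in application code; the point is only that both events end up under user.name.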


3. Log Correlation

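How do you debug a 500 error? By linking Logs to Traces: every service stamps its log lines with the same request ID, so filtering on that ID (here id=89f4a1) reconstructs the request's path up to the failing line.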

2023-10-01 12:00:01 INFO [Frontend] Request Received id=89f4a1
2023-10-01 12:00:02 INFO [Backend] Processing... id=89f4a1
2023-10-01 12:00:02 INFO [DB] Query executed id=89f4a1
2023-10-01 12:00:03 ERROR [Backend] NullPointerEx id=89f4a1
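
The correlation step can be sketched in a few lines of Python. This is an illustration only: it assumes every line follows exactly the layout of the sample above (timestamp, level, [service], message, id=...), and the helper names group_by_request and failing_requests are made up for this example.

```python
import re
from collections import defaultdict

LOG_LINES = """\
2023-10-01 12:00:01 INFO [Frontend] Request Received id=89f4a1
2023-10-01 12:00:02 INFO [Backend] Processing... id=89f4a1
2023-10-01 12:00:02 INFO [DB] Query executed id=89f4a1
2023-10-01 12:00:03 ERROR [Backend] NullPointerEx id=89f4a1
""".splitlines()

# Assumed line layout: "<date> <time> <LEVEL> [<Service>] <message> id=<request id>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) (?P<level>\w+) \[(?P<service>\w+)\] "
    r"(?P<message>.*) id=(?P<id>\w+)$"
)


def group_by_request(lines):
    """Group parsed log events by request ID, preserving input order."""
    requests = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # lines that do not fit the assumed layout are skipped
            requests[match["id"]].append(match.groupdict())
    return requests


def failing_requests(requests):
    """Keep only the requests that contain at least one ERROR event."""
    return {
        request_id: events
        for request_id, events in requests.items()
        if any(event["level"] == "ERROR" for event in events)
    }


for request_id, events in failing_requests(group_by_request(LOG_LINES)).items():
    path = " -> ".join(event["service"] for event in events)
    print(request_id, path)
```

With structured ECS logs, the regular expression disappears: the request ID is already a field, and the same grouping becomes a filter on that field in Kibana.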

4. Hardware Reality: High Cardinality

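Metrics Warning: If you record metrics.response_time with a user_id tag and you have 100M users, you create up to 100M unique time series, one per distinct tag value. Aggregating across them can exhaust heap Memory, and if those IDs ever end up in field names rather than field values, they also bloat the mappings stored in the Cluster State.

Rule: Put high-cardinality data in Logs (Index), not Metrics (Aggregations).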
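
The arithmetic behind the warning is a simple product. The sketch below computes an upper bound on the number of time series from the distinct values per tag; the 20 services and 5 HTTP methods are invented figures for illustration, and only the 100M users comes from the text above.

```python
from math import prod


def series_count(tag_cardinalities):
    """Upper bound on unique time series: the product of distinct values per tag.

    The real count is the number of tag combinations actually observed,
    which can be lower than this bound.
    """
    return prod(tag_cardinalities.values())


# Hypothetical low-cardinality tags.
low = series_count({"service": 20, "http.request.method": 5})

# The same metric once user_id (100M users, from the text) is added as a tag.
high = series_count(
    {"service": 20, "http.request.method": 5, "user_id": 100_000_000}
)

print(f"without user_id: {low:,} series")
print(f"with user_id:    {high:,} series (upper bound)")
```

One high-cardinality tag multiplies every existing series, which is why user_id belongs in a log document rather than in a metric's tags.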