Monitoring with Prometheus
In a monolithic world, you could just SSH into a server and run top.
In Kubernetes, with 500 pods appearing and disappearing dynamically, this is impossible.
Prometheus is the standard for Kubernetes monitoring. Unlike traditional systems that wait for agents to Push data, Prometheus Pulls (scrapes) metrics from your applications.
1. The Architecture: Pull vs. Push
Traditional Push Model (e.g., Datadog, New Relic)
- Agent: Runs on the host or inside the app.
- Action: Sends data to a central server.
- Pros: Good for short-lived jobs; easy to set up behind firewalls.
- Cons: Thousands of agents pushing at once can overwhelm the central server (you DDoS yourself).
Prometheus Pull Model
- Application: Exposes an HTTP endpoint (usually /metrics).
- Prometheus: Scrapes this endpoint at a fixed interval (e.g., every 15s).
- Pros: Prometheus controls the load. If the app is down, the scrape fails (built-in up/down check).
- Service Discovery: Prometheus queries the Kubernetes API to find new Pods automatically.
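Service discovery is configured in prometheus.yml. A minimal sketch (assuming pod-role discovery and the conventional prometheus.io/scrape annotation) might look like:

```yaml
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod            # discover every Pod via the Kubernetes API
    relabel_configs:
      # Only keep Pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

As Pods come and go, Prometheus updates its target list automatically; no agent redeployment is needed.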
2. Dimensional Metrics & PromQL
Prometheus stores data as Time Series. Each series is identified by a metric name and a set of key-value pairs called Labels.
Example: http_requests_total{method="POST", handler="/api/checkout", status="200"}
This allows for powerful querying using PromQL (Prometheus Query Language).
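For instance, against the http_requests_total counter above (query sketches):

```promql
# Per-second request rate over the last 5 minutes, per series
rate(http_requests_total[5m])

# Total POST rate, aggregated by status code across all handlers
sum by (status) (rate(http_requests_total{method="POST"}[5m]))
```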
3. Instrumentation (Go & Java)
To make your application visible to Prometheus, you must instrument it.
Go (Using promhttp)
package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// promauto registers the counter with the default registry;
// a plain prometheus.NewCounter would never show up at /metrics
// unless you also called prometheus.MustRegister.
var opsProcessed = promauto.NewCounter(prometheus.CounterOpts{
    Name: "myapp_processed_ops_total",
    Help: "The total number of processed events",
})

func main() {
    // Record a metric
    opsProcessed.Inc()

    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":2112", nil)
}
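Once this is running, Prometheus (or a manual curl of http://localhost:2112/metrics) sees the counter in the text exposition format, roughly:

```text
# HELP myapp_processed_ops_total The total number of processed events
# TYPE myapp_processed_ops_total counter
myapp_processed_ops_total 1
```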
Java (Using Micrometer)
Micrometer is the “SLF4J for metrics”.
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service; // Spring stereotype providing @Service

@Service
public class MyService {
    private final Counter requestCounter;

    public MyService(MeterRegistry registry) {
        this.requestCounter = Counter.builder("myapp_requests_total")
            .description("Total requests")
            .tag("region", "us-east-1")
            .register(registry);
    }

    public void handleRequest() {
        requestCounter.increment();
    }
}
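Micrometer only collects the metric; something still has to expose it over HTTP. In a Spring Boot app (an assumption here), adding the micrometer-registry-prometheus dependency and enabling the Actuator endpoint does this:

```properties
# application.properties — exposes /actuator/prometheus for scraping
management.endpoints.web.exposure.include=health,prometheus
```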
4. Alerting with Alertmanager
Prometheus is not just for graphs; it’s for alerting.
You define rules in YAML:
groups:
- name: example
  rules:
  - alert: HighErrorRate
    expr: rate(http_requests_total{status="500"}[5m]) > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: "High error rate detected"
If the expression (expr) holds true for 10m, Prometheus fires the alert and hands it to Alertmanager, which routes it to Slack, PagerDuty, or email.
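The routing itself lives in Alertmanager's own configuration. A minimal sketch (the Slack webhook URL and channel are placeholders):

```yaml
route:
  receiver: slack-oncall
  group_by: [alertname]
receivers:
  - name: slack-oncall
    slack_configs:
      - api_url: "https://hooks.slack.com/services/..."  # placeholder webhook
        channel: "#oncall"
```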
5. Summary
- Pull Model: Prometheus scrapes /metrics endpoints.
- Time Series: Data is stored as Metric + Labels + Time + Value.
- PromQL: Powerful language to aggregate and analyze metrics (rate, sum, histograms).
- Instrumentation: Use standard libraries (Micrometer, Prometheus Client) to expose metrics.