Healthchecks: The Pulse of Your Application

Imagine a Night Watchman guarding a factory. If he only checks whether the building is standing (process is running), he might miss that all the machinery inside has caught fire.

A container can be Running (PID 1 exists) but Broken (deadlocked, stuck in an infinite loop, or cut off from its database). To Docker, everything looks fine because the process is still there. To your users, the site is down.

Docker’s HEALTHCHECK is the Night Watchman actually walking inside, checking the pressure gauges, and ensuring the machinery is fully operational.

1. The HEALTHCHECK Instruction

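The HEALTHCHECK instruction in a Dockerfile tells Docker which command to run inside the container to test your application. An exit code of 0 means healthy; an exit code of 1 means unhealthy.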

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost/healthz || exit 1
  • interval: How often to run the check.
  • timeout: If the check takes longer than this, it fails.
  • start-period: Grace period for the app to boot up. Failures during this window do not count toward retries, but a single success ends the startup phase immediately.
  • retries: How many consecutive failures before marking the container unhealthy.

Crucial Reality Check: HEALTHCHECK does not restart your container by itself.

If a container fails its healthcheck, Docker simply changes its status in docker ps from (healthy) to (unhealthy).

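To achieve Self-Healing (automatic restarts), you must pair HEALTHCHECK with something that acts on the status. Docker Swarm replaces tasks whose containers become unhealthy. On a single host, a watcher such as an Autoheal container can restart unhealthy containers. Note that Docker Compose's depends_on: condition: service_healthy only delays a dependent service's startup until its dependency is healthy; it does not restart anything. Kubernetes ignores the Dockerfile HEALTHCHECK entirely and relies on its own probes (covered in section 4).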
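To make the "something must act on the status" point concrete, here is a rough sketch of a one-shot watcher for a single host. It assumes the docker CLI is on the PATH and that the named container defines a HEALTHCHECK; the program name heal and the helper names needsRestart and healthStatus are made up for this example. It is an illustration of the idea, not a replacement for a real orchestrator.

```go
package main

import (
    "fmt"
    "os"
    "os/exec"
    "strings"
)

// needsRestart reports whether a health status string, as printed by
// `docker inspect`, calls for a restart.
func needsRestart(status string) bool {
    return strings.TrimSpace(status) == "unhealthy"
}

// healthStatus asks the Docker CLI for the container's current health status.
func healthStatus(container string) (string, error) {
    out, err := exec.Command("docker", "inspect",
        "--format", "{{.State.Health.Status}}", container).Output()
    return string(out), err
}

func main() {
    if len(os.Args) < 2 {
        fmt.Fprintln(os.Stderr, "usage: heal <container>")
        os.Exit(2)
    }
    name := os.Args[1]

    status, err := healthStatus(name)
    if err != nil {
        fmt.Fprintln(os.Stderr, "inspect failed:", err)
        os.Exit(1)
    }

    // Only an explicit "unhealthy" triggers a restart; "starting" and
    // "healthy" are left alone.
    if needsRestart(status) {
        fmt.Println("restarting", name)
        if err := exec.Command("docker", "restart", name).Run(); err != nil {
            fmt.Fprintln(os.Stderr, "restart failed:", err)
            os.Exit(1)
        }
    }
}
```

Run periodically (for example from cron), this would restart a container only after Docker itself has marked it unhealthy, so the interval and retries settings still decide how quickly a failure is acted on.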


2. Interactive: Health Probe Simulator

[Interactive simulator: break the app and watch the Docker daemon mark it unhealthy once the configured number of retries (3) has failed.]
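The detection logic the simulator illustrates can be approximated as a small state machine. The sketch below is a simplified model, not Docker's actual implementation: the Monitor type and its methods are invented for this example, and it deliberately ignores the start period and the initial "starting" status.

```go
package main

import "fmt"

// Monitor tracks consecutive probe failures: the status flips to
// "unhealthy" once the failure streak reaches the retry limit.
type Monitor struct {
    Retries  int    // consecutive failures allowed before "unhealthy"
    failures int    // current streak of failed probes
    Status   string // "healthy" or "unhealthy"
}

// NewMonitor returns a monitor that starts out healthy.
func NewMonitor(retries int) *Monitor {
    return &Monitor{Retries: retries, Status: "healthy"}
}

// Observe records one probe result and returns the resulting status.
func (m *Monitor) Observe(ok bool) string {
    if ok {
        // Any success resets the streak.
        m.failures = 0
        m.Status = "healthy"
        return m.Status
    }
    m.failures++
    if m.failures >= m.Retries {
        m.Status = "unhealthy"
    }
    return m.Status
}

func main() {
    m := NewMonitor(3)
    // One passing probe, then the app breaks.
    for i, ok := range []bool{true, false, false, false} {
        status := m.Observe(ok)
        fmt.Printf("probe %d ok=%v failures=%d/%d status=%s\n",
            i+1, ok, m.failures, m.Retries, status)
    }
}
```

With retries set to 3, the first two failed probes leave the status at healthy and only the third flips it to unhealthy. Combined with a 30-second interval, that means a broken app can go unnoticed for roughly a minute and a half.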

3. Code Example: Implementing /healthz

Your application needs to expose an endpoint that returns HTTP 200 only if the app is truly ready to accept traffic.

Spring Boot provides this out-of-the-box with Spring Boot Actuator.

  1. Add dependency: spring-boot-starter-actuator
  2. Configure application.properties:
management.endpoints.web.exposure.include=health
management.endpoint.health.probes.enabled=true

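Result: Docker can poll http://localhost:8080/actuator/health. With probes enabled, Actuator also exposes separate /actuator/health/liveness and /actuator/health/readiness endpoints.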

In Go, you implement a simple handler.

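package main

import (
    "context"
    "database/sql"
    "log"
    "net/http"
    "os"
    "time"
    // A database driver must also be imported for its side effects,
    // e.g. _ "github.com/lib/pq" (an assumption; use your app's driver).
)

var db *sql.DB

func healthHandler(w http.ResponseWriter, r *http.Request) {
    // 1. Check the DB connection, bounded so the probe itself cannot hang.
    ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
    defer cancel()
    if err := db.PingContext(ctx); err != nil {
        http.Error(w, "Unhealthy: DB Down", http.StatusServiceUnavailable)
        return
    }

    // 2. Success
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("OK"))
}

func main() {
    // The driver name and the DATABASE_URL variable are placeholders.
    var err error
    db, err = sql.Open("postgres", os.Getenv("DATABASE_URL"))
    if err != nil {
        log.Fatal(err)
    }

    http.HandleFunc("/healthz", healthHandler)
    // ListenAndServe only returns on failure, so surface the error.
    log.Fatal(http.ListenAndServe(":8080", nil))
}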

4. Advanced: Liveness vs. Readiness Probes

As you transition from basic Docker to orchestrators like Kubernetes, you must split your healthchecks into two distinct concepts:

  1. Liveness Probe: “Are you dead?”
    • Checks: Basic process health (e.g., no deadlocks).
    • Action if Failed: The orchestrator kills and restarts the container.
    • Endpoint Example: /ping (Just returns 200 OK immediately).
  2. Readiness Probe: “Are you ready to receive traffic?”
    • Checks: Dependencies, database connections, cache warmup.
    • Action if Failed: The orchestrator stops sending network traffic to this container, but leaves it running so it has time to recover.
    • Endpoint Example: /healthz (Checks DB connection like our example above).

If you use a Readiness check (DB connection) for a Liveness probe, a temporary database blip will cause your orchestrator to aggressively kill and restart all your app containers, making the outage much worse!
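The split between the two probes can be sketched in Go as two separate endpoints. This is a minimal illustration under stated assumptions: the ready flag and the helper names livenessStatus and readinessStatus are invented for this example, the paths /ping and /healthz follow the endpoint examples above, and atomic.Bool requires Go 1.19 or newer.

```go
package main

import (
    "log"
    "net/http"
    "sync/atomic"
)

// ready is flipped by the application once its dependencies are usable.
var ready atomic.Bool

// livenessStatus never consults dependencies: if the process can answer
// at all, it is alive.
func livenessStatus() int {
    return http.StatusOK
}

// readinessStatus reports 503 until the app has marked itself ready.
func readinessStatus() int {
    if ready.Load() {
        return http.StatusOK
    }
    return http.StatusServiceUnavailable
}

func main() {
    // Liveness: cheap and dependency-free.
    http.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(livenessStatus())
    })
    // Readiness: reflects whether the app can serve traffic right now.
    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(readinessStatus())
    })

    // In a real service this would follow a successful DB ping or cache warmup,
    // and would be set back to false when a dependency is lost.
    ready.Store(true)

    log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Because /ping never touches the database, a database blip only takes the container out of rotation through /healthz; it does not get the container killed.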