Immortality: Restart Policies & Healing
In a distributed system, failure is inevitable. Processes crash. Memory leaks happen. The network blips. A robust system isn’t one that never crashes; it’s one that recovers automatically.
The “Heart Monitor” Analogy
Think of the Docker daemon (dockerd) as a hospital’s central monitoring station. When a container (the patient) is running, the daemon watches its vital signs: the container’s main process, PID 1. When that process exits (containerd, the runtime underneath, reports the exit to dockerd), the daemon checks the container’s restart policy to decide whether to use the defibrillator (restart) or leave it stopped.
1. The 4 Policies
Docker provides four restart policies to control container resurrection. Choosing the wrong one can leave a crashed service silently down, or trap a broken container in an endless crash loop.
| Policy | Description | When to use it |
|---|---|---|
| `no` | Do not restart automatically (the default). | One-off scripts, local debugging, or when an orchestrator (like Kubernetes) handles restarts. |
| `on-failure[:max-retries]` | Restart only if the process exits with a non-zero exit code (indicating an error). The optional `:max-retries` suffix limits the number of restart attempts. | Batch jobs, data migrations, or background workers that might fail transiently but succeed on retry. |
| `always` | Always restart the container when it stops. If you stop it manually, it stays stopped until you start it again or the Docker daemon restarts, at which point it comes back. | Critical infrastructure components, web servers, databases. |
| `unless-stopped` | Like `always`, but if you stop it manually with `docker stop`, the daemon remembers this and won’t start it again, even after the daemon restarts (for example, after a system reboot). | Production services you might want to deliberately sideline for maintenance. |
A junior engineer once set a faulty database migration container to `always`. It crashed immediately, but Docker kept resurrecting it. Later, they manually stopped it to fix the issue. During a routine server patch that weekend, the server rebooted. The Docker daemon started back up, saw the `always` policy, and happily resurrected the broken migration script, corrupting the production database before anyone noticed. Use `unless-stopped` unless you have a specific reason to override manual stops!
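The difference between `always` and `unless-stopped` only appears when the daemon itself starts up, which is exactly what happened in the story above. The POSIX sh sketch below encodes the behaviour Docker's documentation describes for that moment. `on_daemon_start` is an illustrative helper, not a Docker command, and this is not Docker's source code. It also follows the documentation's statement that `on-failure` does not restart a container when the daemon restarts.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): when dockerd starts up, e.g. after
# a host reboot, does it start this stopped container again?
# Mirrors the documented behaviour of each restart policy.

# on_daemon_start POLICY MANUALLY_STOPPED(yes|no)
on_daemon_start() {
  case $1 in
    always)
      echo yes ;;                      # comes back even after a manual docker stop
    unless-stopped)
      if [ "$2" = yes ]; then
        echo no                        # the manual stop is remembered
      else
        echo yes
      fi ;;
    no | on-failure | on-failure:*)
      echo no ;;                       # neither policy acts on a daemon restart
    *)
      echo "unknown policy: $1" >&2
      return 2 ;;
  esac
}

# The migration container from the story: policy "always", stopped by hand.
echo "always, stopped by hand:         $(on_daemon_start always yes)"
echo "unless-stopped, stopped by hand: $(on_daemon_start unless-stopped yes)"
```

The script answers yes for the migration container's `always` policy and no for the same container under `unless-stopped`.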
2. The Technical Reality: Exit Codes
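To understand when Docker triggers an `on-failure` restart, you must understand exit codes. When a container’s main process (PID 1) terminates, it leaves behind an integer exit code indicating why it died.

- Exit 0: Graceful shutdown. The process finished successfully, or it handled a stop request cleanly (e.g., it caught the SIGTERM sent by `docker stop` and exited normally).
- Exit 1: Catch-all for application errors, such as an unhandled exception.
- Exit 137: Fatal error (128 + 9, SIGKILL). The process was killed outright. A common cause is the kernel’s OOM (Out of Memory) killer enforcing a memory limit, in which case `docker inspect` shows `"OOMKilled": true`. `docker kill`, and a `docker stop` whose grace period ran out, produce the same code.
- Exit 143: Termination (128 + 15, SIGTERM). The process was ended by the SIGTERM that `docker stop` sends, instead of catching it and exiting with 0. If the process doesn’t exit within the grace period (10 seconds by default), Docker follows up with SIGKILL and the code becomes 137.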
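Docker uses these codes to enforce the policy. `on-failure` ignores exit 0 but restarts on any non-zero code, such as 1 or an OOM-triggered 137. A container you stop yourself with `docker stop` is never restarted on the spot under any policy, whatever code it exits with.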
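You can reproduce these codes without Docker. The POSIX sh sketch below runs each scenario in a child shell and prints the status the parent sees. The shell uses the same 128 + N convention for processes killed by signal N. `exit_code_of` and `explain` are illustrative helper names, not part of any tool.

```shell
#!/bin/sh
# Reproduce the exit codes listed above with ordinary processes (no Docker).
# A process killed by signal N is reported as exit status 128 + N.

# exit_code_of CMD: run CMD in a child sh and print its exit status.
exit_code_of() {
  code=0
  sh -c "$1" || code=$?
  echo "$code"
}

# explain CODE: decode statuses above 128 back into the signal number.
explain() {
  if [ "$1" -gt 128 ]; then
    echo "exit $1 = 128 + signal $(($1 - 128))"
  else
    echo "exit $1"
  fi
}

explain "$(exit_code_of 'exit 0')"          # clean shutdown
explain "$(exit_code_of 'exit 1')"          # generic application error
explain "$(exit_code_of 'kill -KILL $$')"   # SIGKILL: OOM killer, docker kill, stop timeout
explain "$(exit_code_of 'kill -TERM $$')"   # SIGTERM the process did not handle
```

It prints `exit 0`, `exit 1`, `exit 137 = 128 + signal 9` and `exit 143 = 128 + signal 15`. Shells such as bash may also print a "Killed" or "Terminated" notice on stderr for the last two.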
3. Exponential Backoff
If your app crashes immediately on startup (a phenomenon known as CrashLoopBackOff in Kubernetes), Docker is smart enough not to restart it in a tight, CPU-burning loop.
Instead, Docker waits before each restart attempt and doubles the delay every time: 100 ms, 200 ms, 400 ms, 800 ms, and so on, up to a maximum of one minute. Once a restarted container stays up for at least 10 seconds, the delay resets to 100 ms.
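A minimal sketch of that schedule follows, written as a hypothetical sh function `next_delay` (an illustration, not Docker's code). It assumes a 100 ms starting delay that doubles after every consecutive crash and is capped at one minute. It also assumes a reset once the container has run for at least 10 seconds, the threshold Docker's documentation gives.

```shell
#!/bin/sh
# Simplified model of the restart delay: start at 100 ms, double on every
# consecutive crash, cap at 60000 ms, reset after a run of 10 s or more.

# next_delay PREV_MS RAN_FOR_SECONDS
# PREV_MS is the delay used before the previous restart (0 if none yet).
next_delay() {
  prev=$1
  if [ "$2" -ge 10 ]; then
    prev=0                      # ran long enough: forget the old backoff
  fi
  if [ "$prev" -eq 0 ]; then
    next=100                    # first delay after a (re)set
  else
    next=$((prev * 2))          # double the previous delay
  fi
  if [ "$next" -gt 60000 ]; then
    next=60000                  # never wait longer than one minute
  fi
  echo "$next"
}

# A container that dies instantly 12 times in a row.
delay=0
i=1
while [ "$i" -le 12 ]; do
  delay=$(next_delay "$delay" 0)
  echo "restart $i after ${delay} ms"
  i=$((i + 1))
done

# It then stays up for 30 seconds before crashing again.
echo "after a 30 s run: $(next_delay "$delay" 30) ms"
```

Run as-is, it prints delays from 100 ms up to 51200 ms for the first ten restarts, 60000 ms for restarts 11 and 12, and 100 ms after the 30-second run.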
4. Interactive: Resurrection Lab
Test how different policies react to different exit scenarios. Notice how on-failure treats a simulated crash differently than a graceful stop.
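The same experiment can be run as a plain sh script. `should_restart` is a hypothetical, simplified model of the decision dockerd makes when a container's main process exits while the daemon keeps running. It is not Docker's source. It ignores backoff timing and daemon restarts, which is why `always` and `unless-stopped` produce identical rows here. The `docker-stop` scenario assumes the process ends with 143. Under this model the code does not matter, because a manual stop suppresses every policy.

```shell
#!/bin/sh
# Hypothetical model (not Docker's source) of what dockerd decides when a
# container's main process exits while the daemon keeps running.

# should_restart POLICY EXIT_CODE MANUALLY_STOPPED(yes|no) RESTARTS_SO_FAR
should_restart() {
  policy=$1 code=$2 manual=$3 count=$4
  if [ "$manual" = yes ]; then
    echo no                      # docker stop suppresses every policy for now
    return
  fi
  case $policy in
    no)
      echo no ;;
    always | unless-stopped)
      echo yes ;;                # any exit code, even 0
    on-failure | on-failure:*)
      max=0                      # 0 means "no retry limit"
      case $policy in on-failure:*) max=${policy#on-failure:} ;; esac
      if [ "$code" -eq 0 ]; then
        echo no                  # a clean exit is not a failure
      elif [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
        echo no                  # retry budget used up
      else
        echo yes
      fi ;;
    *)
      echo "unknown policy: $policy" >&2
      return 2 ;;
  esac
}

# Grid of policy x scenario, where a scenario is name:exit code:manually stopped.
for policy in no on-failure:3 always unless-stopped; do
  for scenario in crash:1:no oom-kill:137:no clean-exit:0:no docker-stop:143:yes; do
    name=${scenario%%:*}
    rest=${scenario#*:}
    code=${rest%%:*}
    manual=${rest#*:}
    printf '%-16s %-12s restart=%s\n' "$policy" "$name" \
      "$(should_restart "$policy" "$code" "$manual" 0)"
  done
done
```

In the printed grid, `on-failure:3` restarts after `crash` and `oom-kill` but not after `clean-exit`. `always` and `unless-stopped` restart after all three, and every policy answers no for `docker-stop`.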
5. Code Example: Defining Policies
You can set restart policies imperatively with `docker run` (and change them later with `docker update`), or declaratively via Docker Compose.
```yaml
# Docker Compose v2 ignores the top-level "version" key (recent releases warn
# that it is obsolete); only the legacy docker-compose v1 tool needs it.
version: '3.8'
services:
  web-server:
    image: nginx
    restart: always         # Will always attempt to keep Nginx alive
    ports:
      - "80:80"
  worker:
    image: my-worker
    restart: on-failure:3   # Restart up to 3 times if it exits with a non-zero code
    command: ["./process-jobs"]
```

```bash
# Run a new container with a restart policy
docker run -d --restart unless-stopped redis

# Update an existing, already-running container's policy dynamically
docker update --restart always my-container
```