AOF (Append Only File)

The Problem: Imagine a high-traffic e-commerce checkout using Redis. You are relying on RDB snapshots configured to save every 5 minutes. If the server crashes 4 minutes and 59 seconds after the last snapshot, you lose all orders placed in that window. For a caching layer, this is an annoyance; for a primary database or critical session store, it is catastrophic.

While RDB is great for point-in-time backups, it lacks the durability required for zero-data-loss applications. For true durability, Redis provides AOF (Append Only File) Persistence.

1. How AOF Works

Instead of dumping the entire memory state, AOF records every write operation in a command log.

The Anatomy of an AOF Log (RESP)

Under the hood, AOF does not just write raw string commands like SET session:1 "active". It writes them in RESP (Redis Serialization Protocol) format. This ensures fast, deterministic parsing during recovery.

For example, the command SET key1 val1 is translated into the AOF file like this:

*3\r\n
$3\r\n
SET\r\n
$4\r\n
key1\r\n
$4\r\n
val1\r\n
  • *3: Indicates an array of 3 arguments (SET, key1, val1).
  • $3, $4: Indicate the byte length of the string that follows.
  • \r\n: The CRLF terminator used by RESP to parse boundaries.
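
To make the framing concrete, here is a minimal Python sketch of the encoding described above. The function name encode_resp is an illustrative assumption, not part of Redis; it only covers the array-of-bulk-strings shape that commands use.

```python
def encode_resp(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    # "*<count>" announces how many arguments follow.
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        # "$<length>" counts bytes, not characters.
        parts.append(f"${len(data)}\r\n".encode())
        parts.append(data + b"\r\n")
    return b"".join(parts)


print(encode_resp("SET", "key1", "val1"))
# prints b'*3\r\n$3\r\nSET\r\n$4\r\nkey1\r\n$4\r\nval1\r\n'
```

Because every argument carries an explicit byte length, a reader never has to scan the payload for delimiters, which is what keeps parsing unambiguous even when values contain spaces or newlines.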

This byte-level framing makes a damaged AOF easier to repair. If the file is truncated mid-command (e.g., by a sudden power loss), the redis-check-aof tool can use the protocol boundaries to find the last complete command and discard the incomplete tail.
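
The idea of trimming a truncated tail can be sketched in a few lines of Python. This is an illustration of the principle only, not the actual redis-check-aof implementation; parse_complete_commands is a hypothetical helper that walks the log and reports how many leading bytes form complete commands.

```python
def parse_complete_commands(data: bytes):
    """Return (commands, valid_length) for the well-formed prefix of a RESP log."""
    commands = []
    pos = 0
    while pos < len(data):
        cursor = pos
        try:
            # Array header: "*<argc>\r\n". index() raises ValueError if CRLF is missing.
            header_end = data.index(b"\r\n", cursor)
            if data[cursor:cursor + 1] != b"*":
                raise ValueError("expected array header")
            argc = int(data[cursor + 1:header_end])
            cursor = header_end + 2
            args = []
            for _ in range(argc):
                # Bulk string: "$<length>\r\n<payload>\r\n".
                length_end = data.index(b"\r\n", cursor)
                if data[cursor:cursor + 1] != b"$":
                    raise ValueError("expected bulk string")
                length = int(data[cursor + 1:length_end])
                start = length_end + 2
                end = start + length
                if data[end:end + 2] != b"\r\n":
                    raise ValueError("payload is truncated")
                args.append(data[start:end])
                cursor = end + 2
        except ValueError:
            break  # incomplete or malformed tail: stop at the last good command
        commands.append(args)
        pos = cursor
    return commands, pos


complete = b"*3\r\n$3\r\nSET\r\n$4\r\nkey1\r\n$4\r\nval1\r\n"
damaged = complete + b"*3\r\n$3\r\nSET\r\n$4\r\nke"  # power lost mid-write
commands, valid_length = parse_complete_commands(damaged)
repaired = damaged[:valid_length]  # keep only the complete commands
```

Here the second command is cut off inside its key, so the sketch keeps the first command and reports the byte offset at which the file could be truncated.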

The Bank Statement Analogy: Think of RDB as taking a photograph of your bank account balance once a day (e.g., “$100”). Think of AOF as the transaction ledger recording every single activity (e.g., “Deposited $50”, “Withdrew $20”, “Deposited $70”). If you lose your current balance, you can simply replay the transaction ledger from zero to reconstruct the exact amount.

  • Whenever a client runs a mutating command like SET key val, Redis executes it in memory and then appends that command to the AOF file on disk.
  • If Redis crashes, upon restart, it sequentially replays the entire log from the AOF file to perfectly reconstruct the state of the database.
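
The two bullets above can be sketched as a toy store in Python. MiniStore is a hypothetical class for illustration, not Redis itself: it supports only SET and INCR, and a plain list stands in for the AOF file on disk.

```python
class MiniStore:
    """Toy key-value store that logs every mutating command."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for the AOF file

    def _apply(self, command):
        name, key, *rest = command
        if name == "SET":
            self.data[key] = rest[0]
        elif name == "INCR":
            self.data[key] = str(int(self.data.get(key, "0")) + 1)
        else:
            raise ValueError(f"unsupported command: {name}")

    def execute(self, *command):
        self._apply(command)      # 1. mutate memory
        self.log.append(command)  # 2. append the command to the log

    @classmethod
    def replay(cls, log):
        """Rebuild a store by re-applying the logged commands in order."""
        store = cls()
        for command in log:
            store._apply(command)
        store.log = list(log)
        return store


store = MiniStore()
store.execute("SET", "session:1", "active")
store.execute("INCR", "page_views")
store.execute("SET", "user:99", "alice")

# Simulated crash: memory is gone, only the log survives.
recovered = MiniStore.replay(store.log)
```

After the replay, recovered.data holds the same three keys as the original store, because each logged command is deterministic and is applied in its original order.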

2. The fsync Strategy (Hardware Realities)

Writing to a physical disk on every single command introduces massive I/O overhead. To understand why, we must look at the hardware pipeline of a write operation:

  1. Application Buffer (Redis): Redis appends the RESP command to its internal AOF buffer.
  2. OS Page Cache: Redis calls the write() system call, which copies the data into the Linux kernel’s page cache. At this point the data survives a crash of the Redis process, because the OS will still flush it eventually. If the entire server loses power, however, anything in the page cache that has not yet reached the disk is lost.
  3. Disk Controller (fsync): To guarantee durability, Redis must issue an fsync() system call. This forces the OS to flush the page cache to the physical SSD/HDD controller, ensuring the data is magnetically or electronically committed.
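
The three stages can be reproduced with Python's os module, whose os.write() and os.fsync() functions wrap the system calls of the same names. This is a minimal sketch rather than Redis's own code; durable_append and the file name example.aof are hypothetical.

```python
import os
import tempfile


def durable_append(path: str, record: bytes) -> int:
    """Append record to path and return only after fsync() has completed."""
    # Stage 1: the record waits in an application-level buffer.
    app_buffer = bytearray(record)

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Stage 2: write() copies the bytes into the kernel page cache.
        # They now survive a process crash, but not a power failure.
        written = os.write(fd, app_buffer)
        # Stage 3: fsync() blocks until the OS reports the data as flushed
        # to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


path = os.path.join(tempfile.mkdtemp(), "example.aof")  # hypothetical file name
record = b"*3\r\n$3\r\nSET\r\n$4\r\nkey1\r\n$4\r\nval1\r\n"
print(durable_append(path, record))  # number of bytes handed to the kernel
```

The cost discussed in this section lives in the os.fsync() line: the call does not return until the device acknowledges the flush, so issuing it after every command ties throughput to disk latency.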

Because fsync() blocks the calling thread until the hardware acknowledges the write, you must configure a policy to balance durability and performance.

| Policy | Behavior | Durability | Performance | Use Case |
| --- | --- | --- | --- | --- |
| always | Fsyncs after every single write operation. | Maximum (zero data loss) | Slowest (disk I/O bottleneck) | Financial ledgers where absolute data integrity is non-negotiable. |
| everysec | Fsyncs once per second in a background thread. | High (about 1s of writes at risk) | Fast (near in-memory speed) | Recommended, and the Redis default. Balances safety and speed. |
| no | Lets the OS decide when to flush (typically every 30s on Linux). | Lowest (high risk on crash) | Fastest | When AOF is only used for slow, passive backups, not crash recovery. |
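
The trade-off between the three policies can be illustrated with a small simulation. The sketch below is a simplified model, not Redis's implementation: AofWriter and simulate are hypothetical names, the clock is injected so the run is deterministic, and the everysec check happens inline on each append rather than in a background thread.

```python
class AofWriter:
    """Counts fsync calls and unsynced writes under a given policy."""

    def __init__(self, policy, clock):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.clock = clock
        self.last_fsync = clock()
        self.fsync_count = 0
        self.unsynced = 0  # writes sitting in the page cache only

    def append(self, command):
        self.unsynced += 1  # models write() into the page cache
        if self.policy == "always":
            self._fsync()
        elif self.policy == "everysec" and self.clock() - self.last_fsync >= 1.0:
            self._fsync()
        # "no": never fsync here; the OS flushes on its own schedule

    def _fsync(self):
        self.fsync_count += 1
        self.unsynced = 0
        self.last_fsync = self.clock()


def simulate(policy, writes=10, interval=0.25):
    """Issue `writes` commands spaced `interval` seconds apart on a fake clock."""
    now = [0.0]
    writer = AofWriter(policy, clock=lambda: now[0])
    for i in range(1, writes + 1):
        now[0] = i * interval
        writer.append(("INCR", "page_views"))
    return writer.fsync_count, writer.unsynced


for policy in ("always", "everysec", "no"):
    print(policy, simulate(policy))
# prints:
# always (10, 0)
# everysec (2, 2)
# no (0, 10)
```

Each tuple is (fsync calls, writes that would be lost on power failure at the end of the run). With ten writes over 2.5 simulated seconds, always pays for ten fsyncs, everysec pays for two and leaves two writes at risk, and no leaves all ten in the page cache.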

3. AOF Rewrite (Compaction)

Because AOF logs every operation, the file grows without bound. If you increment a counter 100 times, you get 100 INCR commands in the file, which wastes disk space and slows down recovery, since every command must be replayed.

To solve this, Redis periodically performs an AOF Rewrite (BGREWRITEAOF).

How it works (The Consolidation):

  1. Redis forks a child process.
  2. The child process reads the current in-memory data (not the old AOF file).
  3. It writes a brand new, minimal AOF file using the shortest possible commands. (e.g., Instead of 100 INCR commands, it just writes SET counter 100).
  4. The Double Write: Any new writes that arrive during the rewrite are appended to the regular AOF buffer (so the old file stays complete) AND also held temporarily in an AOF Rewrite Buffer in the parent process.
  5. Once the child process finishes writing the new compact AOF, the parent appends the contents of the AOF Rewrite Buffer to the new file to catch it up.
  6. Redis atomically swaps the old file for the new one.
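
The consolidation can be sketched in Python as follows. This is a simplified model under stated assumptions: rewrite_aof and replay are hypothetical helpers, only SET and INCR are supported, a dict copy stands in for the forked child's view of memory, and the final atomic file swap is omitted.

```python
def replay(log):
    """Rebuild the key-value state by applying logged commands in order."""
    data = {}
    for name, key, *rest in log:
        if name == "SET":
            data[key] = rest[0]
        elif name == "INCR":
            data[key] = str(int(data.get(key, "0")) + 1)
    return data


def rewrite_aof(snapshot, rewrite_buffer):
    """Build a compact log from a memory snapshot plus buffered writes."""
    # Steps 2-3: one SET per key, derived from memory rather than the old log.
    new_log = [("SET", key, value) for key, value in sorted(snapshot.items())]
    # Step 5: append the writes that arrived while the rewrite was running.
    new_log.extend(rewrite_buffer)
    return new_log


old_log = [("INCR", "counter")] * 100
snapshot = replay(old_log)               # memory at fork time: counter = "100"
during_rewrite = [("INCR", "counter")]   # a write that arrives mid-rewrite
new_log = rewrite_aof(snapshot, during_rewrite)

print(len(old_log) + len(during_rewrite), "->", len(new_log))
# prints 101 -> 2
```

Replaying new_log yields the same state as replaying the old log plus the buffered write, which is the property that makes the swap in step 6 safe.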

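4. Log Replay Example

After a crash, Redis reconstructs memory by replaying the AOF log from top to bottom. Consider this AOF file, shown as plain commands for readability:

SET session:1 "active"
INCR page_views
SET user:99 "alice"

Replaying these three commands in order against an empty database restores the keys session:1, page_views, and user:99.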

5. RDB vs AOF: Which to choose?

| Feature | RDB (Snapshots) | AOF (Append Only) |
| --- | --- | --- |
| Data Safety | Low (loss of minutes of data) | High (zero to 1s data loss) |
| File Size | Small and compact | Large (even with rewrites) |
| Recovery Speed | Fast (directly load memory) | Slow (replaying log sequentially) |
| Performance Impact | Minimal (background fork) | Moderate (disk I/O from fsync) |

The Verdict (Use Both): Redis is designed to run both persistence methods at the same time. Use RDB for daily or hourly backups and for disaster recovery to a secondary location. Use AOF for high durability and immediate crash recovery on the primary node. On restart, if AOF is enabled, Redis loads the AOF file rather than the RDB snapshot, because the AOF holds the most complete state.