Backup & Restore Strategies

Data loss can happen due to hardware failure, human error (accidental truncation), or software bugs. A robust backup strategy is non-negotiable for production environments. Cassandra provides built-in mechanisms for backing up data without downtime.

1. Snapshots

A snapshot in Cassandra is a directory containing hard links to the SSTables that existed at the time the snapshot was taken.

First Principles: How Snapshots Work (Hard Links)

To understand why snapshots are nearly instant and space-efficient, you must understand the filesystem concept of an Inode.

Inode: The data structure on disk that stores file metadata (size, permissions) and pointers to the actual data blocks.
Filename: Just a pointer to an Inode.
Hard Link: Creating a second filename that points to the same Inode.

When you run nodetool snapshot:

Cassandra flushes all Memtables to disk (SSTables).
It creates a snapshots directory.
It creates hard links for every existing SSTable in that directory.

Result: You have two filenames pointing to the same data blocks.

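Cost: Creating the links is near-instant because only directory entries are written and no data is copied. The memtable flush that precedes it usually takes longer than the linking itself.
Space: Zero (initially), since the snapshot and the live SSTables share the same data blocks.
Growth: The snapshot starts to consume space of its own only when compaction replaces the original SSTables and deletes the old files. The hard link keeps the old data blocks alive on disk until the snapshot is deleted.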

Snapshot Storage Example

Storage held by a snapshot grows as compaction rewrites the data that existed when the snapshot was taken. With 100 GB of live data and 50% of it compacted since the snapshot, the snapshot pins 50 GB of old SSTables, for a total of 150 GB on disk. At 0% churn the snapshot costs nothing extra; at 100% churn it holds a full 100 GB of old data.

2. Incremental Backups

While snapshots are point-in-time, incremental backups capture changes between snapshots.

Mechanism

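Enabled via incremental_backups: true in cassandra.yaml, or at runtime with nodetool enablebackup.
Whenever a memtable is flushed to an SSTable, a hard link to that SSTable is also created in a backups directory under the table's data directory.
Pros: Captures every SSTable flushed between snapshots. Data still in memtables is not included until it is flushed.
Cons: Can generate many small files. Cassandra never deletes them, so they keep the disk space and inodes of already-compacted SSTables from being reclaimed. Requires an external script to move these files to backup storage (e.g., S3) and delete them locally.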

3. Restore Process

Restoring data in Cassandra is a manual process involving file manipulation.

Steps to Restore

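Truncate Table (optional): Run TRUNCATE keyspace.table if you want to discard the table's current contents first. With auto_snapshot enabled (the default), Cassandra takes a snapshot of the table before truncating it.
Copy SSTables: Copy the snapshot's SSTable files into the table's data directory.
Refresh: On a running node, run nodetool refresh <keyspace> <table> to make Cassandra load the new SSTables without restarting.
Offline Alternative: To restore with the node stopped instead, stop the node so no writes occur, remove the files in the commitlog directory to prevent replay of old data, copy the SSTables, and start the node again. nodetool refresh needs a running node, so it is not part of this path.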

[!CAUTION] If you restore data from a snapshot, ensure the schema (table structure) matches exactly what it was when the snapshot was taken.

4. Automating Backups (Code Examples)

Two variants follow: Java, invoking the StorageService MBean over JMX, and Go, shelling out to nodetool.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BackupManager {
    private static final String JMX_URL = "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi";
    private static final String SS_MBEAN = "org.apache.cassandra.db:type=StorageService";

    public static void takeSnapshot(String tagName, String[] keyspaces) throws Exception {
        JMXServiceURL url = new JMXServiceURL(JMX_URL);
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            ObjectName ssName = new ObjectName(SS_MBEAN);

            System.out.println("Taking snapshot: " + tagName);

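            // StorageServiceMBean exposes takeSnapshot(String tag, String... keyspaceNames).
            // The signature array below selects that overload; the varargs parameter is passed as a String[].
            // An overload taking options also exists:
            // takeSnapshot(String tag, Map<String, String> options, String... entities)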

            mbsc.invoke(ssName, "takeSnapshot",
                new Object[]{tagName, keyspaces},
                new String[]{"java.lang.String", "[Ljava.lang.String;"}
            );
            System.out.println("Snapshot complete.");
        }
    }

    public static void main(String[] args) throws Exception {
        takeSnapshot("daily_backup_20231027", new String[]{"my_keyspace"});
    }
}

package main

import (
    "fmt"
    "log"
    "os/exec"
)

// Since Cassandra doesn't have a native Go management API,
// using os/exec to call nodetool is a standard pattern for sidecars.

func takeSnapshot(tag string, keyspace string) error {
    // nodetool snapshot -t <tag> <keyspace>
    cmd := exec.Command("nodetool", "snapshot", "-t", tag, keyspace)

    output, err := cmd.CombinedOutput()
    if err != nil {
        // %w wraps the underlying error so callers can inspect it with errors.Is / errors.As.
        return fmt.Errorf("nodetool snapshot failed: %w\nOutput: %s", err, output)
    }

    fmt.Printf("Snapshot %s taken for keyspace %s\n", tag, keyspace)
    return nil
}

func main() {
    err := takeSnapshot("daily_backup_20231027", "my_keyspace")
    if err != nil {
        log.Fatalf("Backup failed: %v", err)
    }
}

5. Diagram: Snapshot Hard Links

Figure 2: Hard links act as aliases for the same physical data.