Backup & Restore Strategies
Data loss can happen due to hardware failure, human error (accidental truncation), or software bugs. A robust backup strategy is non-negotiable for production environments. Cassandra provides built-in mechanisms for backing up data without downtime.
1. Snapshots
A snapshot in Cassandra is a directory containing hard links to the SSTables that existed at the time the snapshot was taken.
First Principles: How Snapshots Work (Hard Links)
To understand why snapshots are nearly instant and space-efficient, you need the filesystem concept of an inode.
- Inode: The on-disk data structure that stores a file's metadata (size, permissions) and pointers to its data blocks.
- Filename: A directory entry that maps a name to an inode.
- Hard Link: A second filename (directory entry) that points to the same inode.
When you run nodetool snapshot:
- Cassandra flushes the memtables of the tables being snapshotted to disk as SSTables.
- It creates a `snapshots/<tag>` directory inside each table's data directory.
- It creates a hard link in that directory for every existing SSTable file.
Result: You have two filenames pointing to the same data blocks.
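- Cost: Each hard link is only a new directory entry, so apart from the initial flush the snapshot completes almost instantly.
- Space: Effectively zero at first, because no data blocks are copied.
- Growth: Space usage grows only as compaction rewrites SSTables and deletes the originals. The hard links keep the old data blocks allocated on disk until the snapshot is removed (for example with `nodetool clearsnapshot`).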
2. Incremental Backups
While snapshots are point-in-time, incremental backups capture changes between snapshots.
Mechanism
- Enabled via `incremental_backups: true` in `cassandra.yaml`.
- Whenever a memtable is flushed to an SSTable, a hard link to that SSTable is also created in the table's `backups` directory.
- Pros: Captures every SSTable flushed between snapshots.
- Cons: Can accumulate many small files. Cassandra does not delete them automatically, and because the hard links keep compacted SSTables alive, disk space and inodes are not reclaimed until the files are removed. An external script must move these files to backup storage (e.g., S3) and then delete them locally.
3. Restore Process
Restoring data in Cassandra is a manual process involving file manipulation.
Steps to Restore
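- Truncate Table (optional, if wiping clean): `TRUNCATE keyspace.table;`. Truncation removes the table's data on every node, and if `auto_snapshot` is enabled Cassandra takes a snapshot first.
- Stop Node: Run `nodetool drain`, then stop the node so no writes occur during the restore. Draining flushes memtables, so clearing the commit log next does not lose writes to other tables.
- Clear Commit Logs: Remove the files in the `commitlog` directory so that mutations written after the snapshot are not replayed over the restored data at startup.
- Copy SSTables: Copy the snapshot SSTables into the table's data directory (`<data_dir>/<keyspace>/<table>-<table_id>/`).
- Restart Node: Start Cassandra again; it loads the restored SSTables at startup.

Alternatively, to restore without taking the node down, skip the stop and commit-log steps: copy the SSTables into the table's data directory and run `nodetool refresh <keyspace> <table>` so Cassandra loads them without restarting.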
> [!CAUTION]
> If you restore data from a snapshot, make sure the table schema matches exactly what it was when the snapshot was taken.
4. Automating Backups (Code Examples)
```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BackupManager {
    // Cassandra's default JMX port is 7199; this assumes JMX authentication is disabled.
    private static final String JMX_URL = "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi";
    private static final String SS_MBEAN = "org.apache.cassandra.db:type=StorageService";

    public static void takeSnapshot(String tagName, String[] keyspaces) throws Exception {
        JMXServiceURL url = new JMXServiceURL(JMX_URL);
        // Pass a credentials map instead of null if JMX authentication is enabled.
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            ObjectName ssName = new ObjectName(SS_MBEAN);
            System.out.println("Taking snapshot: " + tagName);
            // Calls the varargs overload takeSnapshot(String tag, String... keyspaceNames);
            // over JMX the varargs parameter is passed as a String[].
            // An overload taking a Map<String, String> of options may be preferred,
            // depending on the Cassandra version.
            mbsc.invoke(ssName, "takeSnapshot",
                    new Object[]{tagName, keyspaces},
                    new String[]{"java.lang.String", "[Ljava.lang.String;"});
            System.out.println("Snapshot complete.");
        }
    }

    public static void main(String[] args) throws Exception {
        takeSnapshot("daily_backup_20231027", new String[]{"my_keyspace"});
    }
}
```
```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// Cassandra has no native Go management API, so calling nodetool via
// os/exec is a common pattern for sidecar processes.
func takeSnapshot(tag, keyspace string) error {
	// Equivalent to: nodetool snapshot -t <tag> <keyspace>
	cmd := exec.Command("nodetool", "snapshot", "-t", tag, keyspace)
	output, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("nodetool snapshot failed: %w\nOutput: %s", err, output)
	}
	fmt.Printf("Snapshot %s taken for keyspace %s\n", tag, keyspace)
	return nil
}

func main() {
	if err := takeSnapshot("daily_backup_20231027", "my_keyspace"); err != nil {
		log.Fatalf("Backup failed: %v", err)
	}
}
```
5. Diagram: Snapshot Hard Links
Figure 2: Hard links act as aliases for the same physical data.