Seccomp & AppArmor: Kernel Hardening

Imagine you are running a web application in a container. An attacker finds a Remote Code Execution (RCE) vulnerability in your code. They get a shell inside your container. Now what?

Because a container is just a process sharing the host’s kernel, the attacker’s next move is a container escape. They will try to reach the Linux kernel directly, using obscure system calls (syscalls) to exploit kernel bugs, mount host filesystems, or manipulate the kernel keyring.

To prevent this, we must build a defense-in-depth strategy around the kernel. We use Seccomp to filter what actions the container can perform, and AppArmor to restrict what resources it can access.

1. Seccomp: The Bouncer at the Kernel Door

Seccomp (Secure Computing Mode) is a Linux kernel feature that acts as a strict firewall for system calls.

The Analogy: Think of the Linux Kernel as a highly secure corporate building, and System Calls as the doors to different departments. Seccomp is the security bouncer at the main entrance. When a process (the container) tries to enter a specific door (make a syscall like ptrace or reboot), the bouncer checks a strict whitelist (the Seccomp profile). If the door isn’t on the list, the bouncer immediately kicks the process out (kills it) or denies entry (returns an error).

How Seccomp Works Internally

Under the hood, Docker's container runtime compiles your JSON Seccomp profile into BPF (Berkeley Packet Filter) bytecode and loads it into the kernel. Every time a process in the container makes a syscall, the kernel runs this very fast BPF program, which returns an action such as SCMP_ACT_ALLOW (let the call through) or SCMP_ACT_ERRNO (fail the call with an error code).

  • Default Profile: Docker applies a default profile that is itself an allowlist. Its net effect is to disable roughly 44 of the 300+ available syscalls, including dangerous or obscure ones such as reboot, acct, and keyctl.
  • Custom Profile: For high-security environments, you create a stricter JSON profile using the Principle of Least Privilege—allowing only the specific syscalls your application needs to function.
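
The decision logic described above can be sketched in a few lines of Python. This is only a simplified model: real filtering happens inside the kernel through compiled BPF, and the trimmed profile and the resolve_action helper below are hypothetical names chosen for illustration.

```python
# Simplified model of how an allowlist profile maps a syscall name to an action.
# The kernel evaluates compiled BPF instead; this only mirrors the decision.

# Hypothetical, heavily trimmed profile used for the demonstration.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"},
    ],
}


def resolve_action(profile: dict, syscall: str) -> str:
    """Return the action of the first rule naming `syscall`, else the default."""
    for rule in profile.get("syscalls", []):
        if syscall in rule["names"]:
            return rule["action"]
    return profile["defaultAction"]


if __name__ == "__main__":
    for name in ("read", "ptrace", "reboot"):
        print(f"{name:8} -> {resolve_action(PROFILE, name)}")
```

Anything not named in a rule falls through to defaultAction, which is why a least-privilege profile sets that default to a blocking action.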

2. Interactive: Syscall Firewall Simulator

Act as the Seccomp Bouncer. Decide which system calls to Allow or Block based on the security policy. Hint: Only allow safe, standard operations like reading and writing.

3. Implementing Seccomp Profiles

A Seccomp profile is defined as a JSON document. It specifies a default action (usually to block everything) and a list of specific syscalls to allow.

The Anatomy of a Seccomp Profile (seccomp.json)

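This strict profile whitelists a handful of networking and process-execution calls and denies everything else. The default action, SCMP_ACT_ERRNO, makes a denied call fail with an error (EPERM, reported as “Operation not permitted”) instead of killing the process. The list is deliberately short for illustration; a real workload also needs memory-management and file syscalls such as mmap, brk, and openat.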

{
    "defaultAction": "SCMP_ACT_ERRNO",
    "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
    ],
    "syscalls": [
        {
            "names": [
                "accept",
                "bind",
                "clone",
                "execve",
                "exit",
                "exit_group",
                "listen",
                "read",
                "socket",
                "write"
            ],
            "action": "SCMP_ACT_ALLOW"
        }
    ]
}
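
Before shipping a profile like this, it can help to lint it. The following is a minimal sketch, assuming the profile uses the "names"/"action" layout shown above. The RISKY set and the audit_profile function are hypothetical and not an official or complete list.

```python
import json

# Hypothetical watch list for illustration; not an official or complete set.
RISKY = {"ptrace", "reboot", "mount", "keyctl", "kexec_load"}


def audit_profile(profile_text: str) -> list:
    """Return human-readable findings for a seccomp JSON profile."""
    profile = json.loads(profile_text)
    findings = []
    # A permissive default means the profile is a denylist, not an allowlist.
    if profile.get("defaultAction") == "SCMP_ACT_ALLOW":
        findings.append("default action allows everything")
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        for name in sorted(RISKY.intersection(rule.get("names", []))):
            findings.append(f"risky syscall allowed: {name}")
    return findings


if __name__ == "__main__":
    sample = json.dumps({
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": ["read", "ptrace"], "action": "SCMP_ACT_ALLOW"}],
    })
    print(audit_profile(sample))
```

An empty result only means that none of the watched names are allowed; it does not prove the profile is safe.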

Applying the Profile

You attach the profile to a container at runtime using the --security-opt flag:

docker run --rm -it --security-opt seccomp=seccomp.json alpine sh

If the container runs a command that needs a blocked syscall (like mkdir, which relies on the mkdir or mkdirat syscall depending on the C library and architecture), the kernel makes the call fail:

# Inside the container:
/ $ mkdir test_dir
mkdir: can't create directory 'test_dir': Operation not permitted
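
To confirm from inside a process that a filter is actually attached, one option is to read the Seccomp field of /proc/self/status, which the kernel reports as 0 (disabled), 1 (strict), or 2 (filter). The helper names below are hypothetical; this is a sketch, not part of Docker's tooling.

```python
# Map the numeric Seccomp field in /proc/<pid>/status to a readable mode.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Extract the seccomp mode from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        # "Seccomp:" must match exactly; newer kernels also emit "Seccomp_filters:".
        if line.startswith("Seccomp:"):
            return SECCOMP_MODES.get(int(line.split(":", 1)[1]), "unknown")
    return "unknown"


def current_seccomp_mode(path: str = "/proc/self/status") -> str:
    """Read the status file of the current process (Linux only)."""
    with open(path) as handle:
        return seccomp_mode(handle.read())
```

Calling current_seccomp_mode() in a container started with a profile should report "filter"; a process running without any seccomp filter reports "disabled".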

4. AppArmor: The Filesystem VIP Badge

While Seccomp restricts actions (syscalls), AppArmor restricts resources (file paths, network access, and capabilities). It uses Mandatory Access Control (MAC) profiles loaded directly into the kernel.

The Analogy: If Seccomp is the bouncer at the building entrance checking your ID, AppArmor is the electronic VIP badge. Even if you made it inside the building (the container is running), the badge determines which specific rooms you can enter. If you try to open the door to the “Server Room” (like /etc/shadow) without the right permissions on your badge, the door stays locked.

AppArmor operates on file paths, making it generally easier to write profiles for than SELinux (which operates on inode labels).

Step-by-Step: Crafting an AppArmor Profile

Let’s look at the anatomy of an AppArmor profile designed to lock down an Nginx container.

Save this file as /etc/apparmor.d/docker-nginx:

#include <tunables/global>

# Define the profile name and its flags
profile docker-nginx flags=(attach_disconnected,mediate_deleted) {
  # Include the base AppArmor abstraction (common safe operations such as loading shared libraries)
  #include <abstractions/base>

  # 1. Network Access: Allow IPv4 TCP, UDP, and ICMP sockets
  network inet tcp,
  network inet udp,
  network inet icmp,

  # 2. Hard Deny: Explicitly block writes to sensitive system paths
  deny /etc/** w,
  deny /proc/** w,
  deny /sys/** w,

  # 3. Read Access: Allow reading the static web files
  /usr/share/nginx/html/ r,
  /usr/share/nginx/html/** r,

  # 4. Write Access: Allow writing strictly to log and PID files
  /var/log/nginx/* w,
  /run/nginx.pid w,
}
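
The file rules above rely on two wildcards and on deny rules taking precedence over allow rules. The sketch below is a simplified Python model of that matching, assuming `*` stays within one path component and `**` crosses directory separators. It ignores everything granted by abstractions/base, and the function names are hypothetical.

```python
import re


def glob_to_regex(glob: str) -> "re.Pattern":
    """Translate the two wildcards used above: `**` crosses `/`, `*` does not."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts))


# (kind, glob, permissions) mirroring the file rules in the profile above.
RULES = [
    ("deny", "/etc/**", "w"),
    ("deny", "/proc/**", "w"),
    ("deny", "/sys/**", "w"),
    ("allow", "/usr/share/nginx/html/", "r"),
    ("allow", "/usr/share/nginx/html/**", "r"),
    ("allow", "/var/log/nginx/*", "w"),
    ("allow", "/run/nginx.pid", "w"),
]


def is_permitted(path: str, perm: str) -> bool:
    """Deny rules win; otherwise access needs at least one matching allow rule."""
    kinds = [kind for kind, glob, perms in RULES
             if perm in perms and glob_to_regex(glob).fullmatch(path)]
    return "deny" not in kinds and "allow" in kinds
```

In this model a write to /var/log/nginx/access.log is permitted, while a write to /var/log/nginx/old/access.log is not, because `*` does not descend into subdirectories.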

Applying the AppArmor Profile

  1. Load the profile into the Kernel: AppArmor profiles must be compiled and loaded into the host kernel using apparmor_parser.
    sudo apparmor_parser -r -W /etc/apparmor.d/docker-nginx
    
  2. Run the container with the loaded profile:
    docker run --security-opt apparmor=docker-nginx nginx
    

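If an attacker gains RCE and tries to echo a malicious payload into a system file, AppArmor blocks the write. Denials that fall through to the profile’s implicit default are recorded in the host’s kernel log (visible via dmesg or /var/log/syslog, or in the audit log when auditd is running). Explicit deny rules like the ones above are silent unless written as audit deny.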
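
To check which profile a running process is confined by, the kernel exposes a label in /proc/<pid>/attr/current on hosts where AppArmor is the active security module. The label is typically either "unconfined" or of the form "profile-name (mode)". The parser below is a small sketch with a hypothetical function name.

```python
def parse_confinement(label: str):
    """Split a label such as 'docker-nginx (enforce)' into (profile, mode)."""
    label = label.strip("\x00 \n\t")
    if label.endswith(")") and " (" in label:
        profile, mode = label[:-1].rsplit(" (", 1)
        return profile, mode
    # Labels such as "unconfined" carry no mode.
    return label, None


def current_confinement(path: str = "/proc/self/attr/current"):
    """Read and parse the label of the current process (Linux with AppArmor)."""
    with open(path) as handle:
        return parse_confinement(handle.read())
```

Run inside the container started above, current_confinement() would be expected to return the docker-nginx profile name together with its mode.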


5. Defense in Depth: Seccomp vs. AppArmor

Why do we need both? Because they solve different problems at different layers of the kernel interface.

| Feature | Seccomp | AppArmor |
| --- | --- | --- |
| Primary Scope | Kernel system calls (actions) | File paths, capabilities, network (resources) |
| Granularity | Coarse (per syscall, optionally narrowed by argument values) | Fine (per file path, capability, or network family) |
| Underlying Mechanism | BPF (Berkeley Packet Filter) programs | Path-based MAC (Mandatory Access Control) |
| Complexity to Write | Hard (requires deep understanding of syscalls) | Moderate (requires knowing which files the app touches) |
| Example Scenario | “Block ptrace() to prevent process injection.” | “Block writes to /etc/shadow to prevent credential tampering.” |

Best Practice: Combine Them!

Defense in depth means assuming one layer will fail. Seccomp shrinks the kernel attack surface by limiting which syscalls an attacker can reach. AppArmor limits which files, capabilities, and network operations a compromised process can use. Together, they make a container escape substantially harder, because an attacker has to defeat both layers.
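
Both options can be passed to the same docker run invocation by repeating --security-opt. The sketch below assembles such a command in Python; hardened_run_command is a hypothetical helper, and the profile names are reused from the earlier sections purely as placeholders (the short seccomp.json shown earlier would be too strict for a real Nginx).

```python
import shlex


def hardened_run_command(image: str, seccomp_profile: str, apparmor_profile: str) -> list:
    """Build a docker run argv that applies a seccomp and an AppArmor profile."""
    return [
        "docker", "run", "--rm",
        "--security-opt", f"seccomp={seccomp_profile}",
        "--security-opt", f"apparmor={apparmor_profile}",
        image,
    ]


# Print the command in copy-pasteable shell form.
print(shlex.join(hardened_run_command("nginx", "seccomp.json", "docker-nginx")))
# -> docker run --rm --security-opt seccomp=seccomp.json --security-opt apparmor=docker-nginx nginx
```

Keeping the arguments as a list makes it possible to hand them to subprocess.run without shell quoting problems.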