
Reaper

Reaper is a lightweight Kubernetes container-less runtime that executes commands directly on cluster nodes without traditional container isolation.

Think of it as a way to run host-native processes through Kubernetes’ orchestration layer — standard Kubernetes API (Pods, kubectl logs, kubectl exec) with full host access.

What Reaper Provides

  • Standard Kubernetes API (Pods, kubectl logs, kubectl exec)
  • Process lifecycle management (start, stop, restart)
  • Shared overlay filesystem that keeps workload writes off the host filesystem
  • Kubernetes volumes (ConfigMap, Secret, hostPath, emptyDir)
  • Sensitive host file filtering (SSH keys, passwords, SSL keys)
  • Interactive sessions (PTY support)
  • UID/GID switching with securityContext
  • Per-pod configuration via Kubernetes annotations
  • Custom Resource Definitions: ReaperPod (simplified workloads), ReaperOverlay (overlay lifecycle management), ReaperDaemonJob (run jobs on every node with dependency ordering)
  • Helm chart for one-command installation and configuration

What Reaper Does NOT Provide

  • Container isolation (namespaces, cgroups)
  • Resource limits (CPU, memory)
  • Network isolation (uses host networking)
  • Container image pulling

Use Cases

  • HPC workloads: Slurm worker daemons that need direct CPU/GPU access
  • Cluster maintenance: Ansible playbooks and system configuration tasks
  • Privileged system utilities: Direct hardware access, device management
  • Node monitoring: Host-level metric exporters (node_exporter, etc.)
  • Legacy applications: Programs that require host-level access
  • Development and debugging: Interactive host access via kubectl

Disclaimer

Reaper is an experimental, personal project built to explore what’s possible with AI-assisted development. It is under continuous development with no stability guarantees. Use entirely at your own risk.

Source Code

The source code is available at github.com/miguelgila/reaper.

Installation

The recommended way to install Reaper on a Kubernetes cluster is via the Helm chart:

helm upgrade --install reaper deploy/helm/reaper/ \
  --namespace reaper-system --create-namespace \
  --wait --timeout 120s

This installs:

  • Node DaemonSet: Copies shim + runtime binaries to every node
  • CRD Controller: Watches ReaperPod resources and creates Pods
  • Agent DaemonSet: Health monitoring and Prometheus metrics
  • RuntimeClass: Registers reaper-v2 with Kubernetes
  • RBAC: Required roles and bindings

See Helm Chart Reference for configuration values.

Playground (Local Testing)

Spin up a 3-node Kind cluster with Reaper pre-installed. No local Rust toolchain needed — compilation happens inside Docker:

# Build from source
./scripts/setup-playground.sh

# Or use pre-built images from GHCR
./scripts/setup-playground.sh --release

# Use a specific release version
./scripts/setup-playground.sh --release v0.2.14

# Clean up
./scripts/setup-playground.sh --cleanup

Building from Source

Reaper requires Rust. The toolchain version is pinned in rust-toolchain.toml and installed automatically.

git clone https://github.com/miguelgila/reaper
cd reaper
cargo build --release

Binaries are output to target/release/.

Cross-Compilation (macOS to Linux)

Since Reaper runs on Linux Kubernetes nodes, cross-compile static musl binaries:

# For x86_64 nodes
docker run --rm -v "$(pwd)":/work -w /work \
  messense/rust-musl-cross:x86_64-musl \
  cargo build --release --target x86_64-unknown-linux-musl

# For aarch64 nodes
docker run --rm -v "$(pwd)":/work -w /work \
  messense/rust-musl-cross:aarch64-musl \
  cargo build --release --target aarch64-unknown-linux-musl

Requirements

Runtime (cluster nodes):

  • Linux kernel with overlayfs support (standard since 3.18)
  • Kubernetes cluster with containerd runtime
  • Root access on cluster nodes

Playground:

  • Docker (the playground compiles Reaper and runs the Kind nodes in containers)
  • Kind and kubectl

Building from source:

  • All of the above, plus Rust

Quick Start

This guide assumes you have a Reaper-enabled cluster (see Installation).

Run a Command on the Host

The simplest way to run a command on the host is with a ReaperPod:

apiVersion: reaper.io/v1alpha1
kind: ReaperPod
metadata:
  name: my-task
spec:
  command: ["/bin/sh", "-c", "echo Hello from $(hostname) && uname -a"]

Apply it and check the result:

kubectl apply -f my-task.yaml
kubectl logs my-task
kubectl get reaperpods

With Volumes

apiVersion: reaper.io/v1alpha1
kind: ReaperPod
metadata:
  name: config-reader
spec:
  command: ["/bin/sh", "-c", "cat /config/settings.yaml"]
  volumes:
    - name: config
      mountPath: /config
      configMap: "my-config"
      readOnly: true

With Node Selector

apiVersion: reaper.io/v1alpha1
kind: ReaperPod
metadata:
  name: compute-task
spec:
  command: ["/bin/sh", "-c", "echo Running on $(hostname)"]
  nodeSelector:
    workload-type: compute

See ReaperPod CRD Reference for the full spec.


Using Raw Pods

For use cases that need the full Kubernetes Pod API — interactive sessions, DaemonSets, Deployments, exec, etc. — you can use standard Pods with runtimeClassName: reaper-v2.

Note: The image field is required by Kubernetes but ignored by Reaper. Use a small image like busybox.

Run a Command

apiVersion: v1
kind: Pod
metadata:
  name: my-task
spec:
  runtimeClassName: reaper-v2
  restartPolicy: Never
  containers:
    - name: task
      image: busybox
      command: ["/bin/sh", "-c"]
      args: ["echo Hello from host && uname -a"]

Apply it and check the result:

kubectl apply -f my-task.yaml
kubectl logs my-task        # See output
kubectl get pod my-task     # Status: Completed

Interactive Shell

kubectl run -it debug --rm --image=busybox --restart=Never \
  --overrides='{"spec":{"runtimeClassName":"reaper-v2"}}' \
  -- /bin/bash

Exec into Running Containers

kubectl exec -it my-pod -- /bin/sh

Volumes

ConfigMaps, Secrets, hostPath, emptyDir, and projected volumes all work:

apiVersion: v1
kind: Pod
metadata:
  name: my-task
spec:
  runtimeClassName: reaper-v2
  restartPolicy: Never
  volumes:
    - name: config
      configMap:
        name: my-config
  containers:
    - name: task
      image: busybox
      command: ["/bin/sh", "-c", "cat /config/settings.yaml"]
      volumeMounts:
        - name: config
          mountPath: /config
          readOnly: true

See Pod Compatibility for the full list of supported and ignored fields.

What’s Next

Reaper provides Custom Resource Definitions for higher-level workflows:

  • ReaperPod — Simplified pod spec without container boilerplate
  • ReaperOverlay — PVC-like overlay lifecycle management
  • ReaperDaemonJob — Run jobs to completion on every matching node, with dependency ordering and shared overlays

See the CRD Reference for full documentation and the examples for runnable demos.

Architecture Overview

Reaper consists of three components arranged in a three-tier system:

Kubernetes/containerd
        ↓ (ttrpc)
containerd-shim-reaper-v2  (long-lived shim, implements Task trait)
        ↓ (exec: create/start/state/delete/kill)
reaper-runtime  (short-lived OCI runtime CLI)
        ↓ (fork FIRST, then spawn)
monitoring daemon → spawns workload → wait() → captures exit code

Components

containerd-shim-reaper-v2

The shim is a long-lived process (one per container) that communicates with containerd via ttrpc. It implements the containerd Task service interface and delegates OCI operations to the runtime binary.

reaper-runtime

The runtime is a short-lived CLI tool called by the shim for OCI operations (create, start, state, kill, delete). It implements the fork-first architecture for process monitoring.

Monitoring Daemon

The daemon is forked by the runtime during start. It spawns the workload as its child, calls wait() to capture the real exit code, and updates the state file.

Fork-First Architecture

This is the most critical design decision in Reaper:

  1. Runtime forks → creates monitoring daemon
  2. Parent (CLI) exits immediately (OCI spec requires this)
  3. Daemon calls setsid() to detach
  4. Daemon spawns workload (daemon becomes parent)
  5. Daemon calls wait() on workload → captures real exit code
  6. Daemon updates state file, then exits

Why fork-first? Only a process’s parent can call wait() on it. The daemon must be the workload’s parent to capture exit codes. Spawning first, then forking, would leave the Child handle invalid in the forked process.

State Management

Process lifecycle state is stored in /run/reaper/<container-id>/state.json:

{
  "id": "abc123...",
  "bundle": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/abc123...",
  "status": "stopped",
  "pid": 12345,
  "exit_code": 0
}

The shim polls this file to detect state changes and publishes containerd events (e.g., TaskExit).
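The polling side can be sketched as follows. This is an illustration, not the shim's actual code: `extract_field` is a naive stand-in for real JSON deserialization, and `wait_for_exit` is a hypothetical helper name.

```rust
use std::{fs, thread, time::Duration};

/// Naively pull a top-level field out of the state JSON.
/// (Sketch only -- the real shim deserializes the full state struct.)
fn extract_field(json: &str, key: &str) -> Option<String> {
    let pat = format!("\"{}\":", key);
    let start = json.find(&pat)? + pat.len();
    let rest = json[start..].trim_start();
    let end = rest.find(|c| c == ',' || c == '}')?;
    Some(rest[..end].trim().trim_matches('"').to_string())
}

/// Poll state.json until the daemon reports "stopped", then return the exit code.
fn wait_for_exit(state_path: &str, poll: Duration) -> Option<i32> {
    loop {
        if let Ok(json) = fs::read_to_string(state_path) {
            if extract_field(&json, "status").as_deref() == Some("stopped") {
                return extract_field(&json, "exit_code")?.parse().ok();
            }
        }
        thread::sleep(poll);
    }
}

fn main() {
    // Simulate a daemon that has already written a final state file.
    let path = std::env::temp_dir().join("reaper-state-demo.json");
    fs::write(&path, r#"{"id":"abc","status":"stopped","pid":12345,"exit_code":0}"#).unwrap();
    let code = wait_for_exit(path.to_str().unwrap(), Duration::from_millis(10));
    assert_eq!(code, Some(0));
}
```

In the real shim, a "stopped" status triggers publication of the TaskExit event described above.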

Further Reading

Shim v2 Protocol

Shim v2 Implementation Design

Overview

This document outlines the implementation plan for containerd Runtime v2 API (shim protocol) support in Reaper, enabling Kubernetes integration for command execution on the host system.

Important Clarification: Reaper does not create traditional containers. Instead, it executes commands directly on the Kubernetes cluster nodes, providing a lightweight alternative to full containerization for specific use cases.

Background

What is the Shim v2 Protocol?

The containerd Runtime v2 API is the interface between:

  • containerd (Kubernetes container runtime)
  • Container runtime shim (our code)
  • Command executor (reaper-runtime running commands on host)

Kubernetes → CRI → containerd → [Shim v2 API] → reaper-shim → host command execution

Why Do We Need It?

Without shim v2:

  • ❌ Kubernetes can’t execute commands via reaper
  • ❌ No process lifecycle management
  • ❌ No command output streaming

With shim v2:

  • ✅ Kubernetes can run/start/stop commands
  • ✅ Stream command output and exec into running processes
  • ✅ Monitor command execution status
  • ✅ Full process lifecycle support

Architecture

Three-Tier Design (Implemented)

containerd-shim-reaper-v2    ← Shim binary (ttrpc server, long-lived)
    ↓ (subprocess calls)
reaper-runtime               ← OCI runtime CLI (create/start/state/kill/delete)
    ↓ (fork)
monitoring daemon            ← Spawns and monitors workload
    ↓ (spawn)
workload process             ← The actual command being run

Key Points:

  • Shim is long-lived (one per container, communicates with containerd via ttrpc)
  • Runtime is short-lived CLI (called by shim for OCI operations)
  • Monitoring daemon is forked by runtime to watch workload
  • Workload is spawned BY the daemon (daemon is parent)

Why Fork-First Architecture?

The Problem:

  • OCI spec requires runtime CLI to exit immediately after start
  • Someone needs to wait() on the workload to capture exit code
  • Only a process’s parent can call wait() on it

Previous Bug (FIXED): We originally spawned the workload first, then forked. After fork(), the std::process::Child handle was invalid in the forked child because it was created by the parent process.

Solution: Fork FIRST, then spawn

  1. Runtime forks → creates monitoring daemon
  2. Parent (CLI) exits immediately
  3. Daemon spawns workload (daemon becomes parent)
  4. Daemon can now wait() on workload

Shim v2 API Implementation

Task Service Methods

service Task {
    rpc Create(CreateTaskRequest) returns (CreateTaskResponse);
    rpc Start(StartTaskRequest) returns (StartTaskResponse);
    rpc Delete(DeleteTaskRequest) returns (DeleteTaskResponse);
    rpc State(StateRequest) returns (StateResponse);
    rpc Pids(PidsRequest) returns (PidsResponse);
    rpc Pause(PauseRequest) returns (google.protobuf.Empty);
    rpc Resume(ResumeRequest) returns (google.protobuf.Empty);
    rpc Checkpoint(CheckpointTaskRequest) returns (google.protobuf.Empty);
    rpc Kill(KillRequest) returns (google.protobuf.Empty);
    rpc Exec(ExecProcessRequest) returns (google.protobuf.Empty);
    rpc ResizePty(ResizePtyRequest) returns (google.protobuf.Empty);
    rpc CloseIO(CloseIORequest) returns (google.protobuf.Empty);
    rpc Update(UpdateTaskRequest) returns (google.protobuf.Empty);
    rpc Wait(WaitRequest) returns (WaitResponse);
    rpc Stats(StatsRequest) returns (StatsResponse);
    rpc Connect(ConnectRequest) returns (ConnectResponse);
    rpc Shutdown(ShutdownRequest) returns (google.protobuf.Empty);
}

Implementation Status

| Method | Status | Notes |
|---|---|---|
| Create | ✅ | Calls reaper-runtime create, handles sandbox detection |
| Start | ✅ | Calls reaper-runtime start, fork-first architecture |
| Delete | ✅ | Calls reaper-runtime delete, cleans up state |
| Kill | ✅ | Calls reaper-runtime kill, handles ESRCH gracefully |
| Wait | ✅ | Polls state file, publishes TaskExit event |
| State | ✅ | Calls reaper-runtime state, returns proper protobuf status |
| Pids | ✅ | Returns workload PID from state |
| Stats | ✅ | Basic implementation (no cgroup metrics) |
| Connect | ✅ | Returns shim and workload PIDs |
| Shutdown | ✅ | Triggers shim exit |
| Pause/Resume | ⚠️ | Returns OK but no-op (no cgroup freezer) |
| Checkpoint | ⚠️ | Not implemented (no CRIU) |
| Exec | ✅ | Implemented with PTY support |
| ResizePty | ✅ | Shim writes dimensions to resize file, runtime daemon applies via TIOCSWINSZ |
| CloseIO | ⚠️ | Not implemented |
| Update | ⚠️ | Not implemented (no cgroups) |

Implementation Milestones

✅ Milestone 1: Project Setup - COMPLETED

  • Add dependencies: containerd-shim, containerd-shim-protos, tokio, async-trait
  • Generate protobuf code from containerd definitions (via containerd-shim-protos)
  • Create containerd-shim-reaper-v2 binary crate
  • Set up basic TTRPC server with Shim and Task traits

✅ Milestone 2: Core Task API - COMPLETED

  • Implement Create - parse bundle, call reaper-runtime create
  • Implement Start - call reaper-runtime start, capture PID
  • Implement Delete - call reaper-runtime delete, cleanup state
  • Implement Kill - call reaper-runtime kill with signal
  • Implement Wait - poll state file for completion
  • Implement State - return container status with proper protobuf enums
  • Implement Pids - list container processes

✅ Milestone 3: Process Monitoring - COMPLETED

  • Fork-first architecture in reaper-runtime
  • Monitoring daemon as parent of workload
  • Real exit code capture via child.wait()
  • State file updates from monitoring daemon
  • Zombie process prevention (proper reaping)
  • Shim polling of state file for completion detection

✅ Milestone 4: Containerd Integration - COMPLETED

  • TaskExit event publishing with timestamps
  • Proper exited_at timestamps in WaitResponse
  • Proper exited_at timestamps in StateResponse
  • ESRCH handling in kill (already-exited processes)
  • Sandbox container detection and faking
  • Timing delay for fast processes

✅ Milestone 5: Kubernetes Integration - COMPLETED

  • RuntimeClass configuration
  • End-to-end pod lifecycle testing
  • Pod status transitions to “Completed”
  • Exit code capture and reporting
  • No zombie processes
  • PTY support for interactive containers
  • Exec implementation with PTY support
  • File descriptor leak fix
  • Overlay namespace improvements

Critical Bug Fixes (January 2026)

1. Fork Order Bug

File: src/bin/reaper-runtime/main.rs:188-311

Problem: std::process::Child handle invalid after fork

Fix: Fork first, then spawn workload in the forked child

match unsafe { fork() }? {
    ForkResult::Parent { .. } => {
        // CLI exits; the daemon will update state
        thread::sleep(Duration::from_millis(100));
        exit(0);
    }
    ForkResult::Child => {
        setsid()?;  // Detach
        let mut child = Command::new(program).spawn()?;  // We're the parent!
        update_state("running", child.id());
        thread::sleep(Duration::from_millis(500));  // Let containerd observe "running"
        let status = child.wait()?;  // This works!
        update_state("stopped", status.code().unwrap_or(-1));
        exit(0);
    }
}

2. Fast Process Timing

File: src/bin/reaper-runtime/main.rs:264-270

Problem: Fast commands (echo) completed before containerd observed “running” state

Fix: Added 500ms delay after setting “running” state

3. Kill ESRCH Error

File: src/bin/reaper-runtime/main.rs:347-365

Problem: containerd’s kill() failed with ESRCH for already-dead processes

Fix: Treat ESRCH as success (process not running = goal achieved)
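The fix can be illustrated with a small sketch (assumptions: `normalize_kill_result` is a hypothetical helper name, and the raw errno is modeled as an `i32`; ESRCH is 3 on Linux):

```rust
/// errno value for "no such process" on Linux.
const ESRCH: i32 = 3;

/// Treat ESRCH from kill(2) as success: if the process is already gone,
/// the goal of the kill (process not running) is achieved.
/// (Sketch -- the real runtime inspects the errno of the actual syscall.)
fn normalize_kill_result(raw: Result<(), i32>) -> Result<(), i32> {
    match raw {
        Err(errno) if errno == ESRCH => Ok(()),
        other => other,
    }
}

fn main() {
    assert_eq!(normalize_kill_result(Ok(())), Ok(()));
    assert_eq!(normalize_kill_result(Err(ESRCH)), Ok(()));  // already dead: success
    assert_eq!(normalize_kill_result(Err(1)), Err(1));      // other errnos still fail
}
```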

4. TaskExit Event Publishing

File: src/bin/containerd-shim-reaper-v2/main.rs:162-199

Problem: containerd wasn’t recognizing container exits

Fix: Publish TaskExit event with proper exited_at timestamp

5. Response Timestamps

File: src/bin/containerd-shim-reaper-v2/main.rs:545-552, 615-625

Problem: Missing timestamps in WaitResponse and StateResponse

Fix: Include exited_at timestamp in all responses for stopped containers

Technical Details

ReaperShim Structure

#[derive(Clone)]
struct ReaperShim {
    exit: Arc<ExitSignal>,
    runtime_path: String,
    namespace: String,
}

ReaperTask Structure

#[derive(Clone)]
struct ReaperTask {
    runtime_path: String,
    sandbox_state: Arc<Mutex<HashMap<String, (bool, u32)>>>,
    publisher: Arc<RemotePublisher>,
    namespace: String,
}

State File Format

{
  "id": "abc123...",
  "bundle": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/abc123...",
  "status": "stopped",
  "pid": 12345,
  "exit_code": 0
}

Sandbox Container Detection

Sandbox (pause) containers are detected by checking:

  1. Image name contains “pause”
  2. Command is /pause
  3. Process args contain “pause”

Sandboxes return fake responses immediately (no actual process).
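The three checks above could be combined roughly like this (a sketch; `is_sandbox` and its parameters are illustrative, not the shim's actual signature):

```rust
/// Heuristic sandbox (pause container) detection, mirroring the three
/// documented checks: image name, command, and process args.
fn is_sandbox(image: &str, args: &[&str]) -> bool {
    image.contains("pause")                      // 1. image name contains "pause"
        || args.first() == Some(&"/pause")       // 2. command is /pause
        || args.iter().any(|a| a.contains("pause")) // 3. args contain "pause"
}

fn main() {
    assert!(is_sandbox("registry.k8s.io/pause:3.9", &[]));
    assert!(is_sandbox("some-image", &["/pause"]));
    assert!(!is_sandbox("busybox", &["/bin/sh", "-c", "echo hi"]));
}
```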

Dependencies

Cargo Dependencies

[dependencies]
containerd-shim = { version = "0.10", features = ["async", "tracing"] }
containerd-shim-protos = { version = "0.10", features = ["async"] }
tokio = { version = "1", features = ["full"] }
async-trait = "0.1"
tracing = "0.1"
tracing-subscriber = "0.3"

Testing

Run Integration Tests

./scripts/run-integration-tests.sh

This orchestrates all testing including Rust unit tests, Kubernetes infrastructure setup, and comprehensive integration tests (DNS, overlay, host protection, UID/GID switching, privilege dropping, zombies, exec, etc.).

For options and troubleshooting, see TESTING.md.

Security Features

UID/GID Switching and Privilege Dropping

Implemented: February 2026

The runtime supports OCI user specification for credential switching, allowing workloads to run as non-root users. This integrates with Kubernetes securityContext:

spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
  - name: app
    securityContext:
      runAsUser: 1001

Implementation

File: src/bin/reaper-runtime/main.rs

Privilege dropping follows the standard Unix sequence in pre_exec hooks:

// 1. Set supplementary groups (requires CAP_SETGID)
if !user.additional_gids.is_empty() {
    let gids: Vec<gid_t> = user.additional_gids.iter().copied().collect();
    safe_setgroups(&gids)?;
}

// 2. Set GID (requires CAP_SETGID)
if setgid(user.gid) != 0 {
    return Err(std::io::Error::last_os_error());
}

// 3. Set UID (irreversible privilege drop)
if setuid(user.uid) != 0 {
    return Err(std::io::Error::last_os_error());
}

// 4. Apply umask (if specified)
if let Some(mask) = user.umask {
    umask(mask as mode_t);
}

Platform Compatibility: The setgroups() syscall signature differs across platforms. We provide a platform-specific wrapper:

  • Linux: size_t (usize) for length parameter
  • macOS/BSD: c_int (i32) for length parameter

Execution Paths

User switching is implemented in all four execution paths:

  1. PTY mode (interactive containers): do_start() with terminal=true
  2. Non-PTY mode (batch containers): do_start() with terminal=false
  3. Exec with PTY (kubectl exec -it): do_exec() with terminal=true
  4. Exec without PTY (kubectl exec): do_exec() with terminal=false

Integration Tests

Unit Tests (tests/integration_user_management.rs):

  • test_run_with_current_user - Validates UID/GID from config
  • test_privilege_drop_root_to_user - Tests root → non-root transition
  • test_non_root_cannot_switch_user - Permission denial for non-root
  • test_supplementary_groups_validation - additionalGids support
  • test_umask_affects_file_permissions - umask application

Kubernetes Integration Tests (scripts/run-integration-tests.sh):

  • test_uid_gid_switching - securityContext UID/GID (runAsUser: 1000)
  • test_privilege_drop - Unprivileged execution (runAsUser: 1001)

All tests validate actual runtime credentials (not just config parsing) via id -u and id -g commands in the container.


Document Version: 2.1
Last Updated: February 2026
Status: Core Implementation Complete with Exec and PTY Support

Overlay Filesystem

Overlay Filesystem Design

Overview

Reaper uses a shared mount namespace with an overlayfs to protect the host filesystem while allowing cross-deployment file sharing. All workloads on a node share a single writable overlay layer; the host root is the read-only lower layer.

How It Works

Host Root (/) ─── read-only lower layer
                      │
              ┌───────┴────────┐
              │   OverlayFS    │
              │  merged view   │
              └───────┬────────┘
                      │
    /run/reaper/overlay/upper ─── shared writable layer

  • Reads fall through to the host root (lower layer)
  • Writes go to the upper layer (/run/reaper/overlay/upper)
  • All Reaper workloads see the same upper layer
  • The host filesystem is never modified

Architecture

Namespace Creation (First Workload)

The first workload to start creates the shared namespace:

reaper-runtime do_start()
  └─ fork() (daemon child)
       └─ setsid()
       └─ enter_overlay()
            └─ acquire_lock(/run/reaper/overlay.lock)
            └─ create_namespace()
                 └─ fork() (inner child - helper)
                 │    ├─ unshare(CLONE_NEWNS)
                 │    ├─ mount("", "/", MS_PRIVATE | MS_REC)
                 │    ├─ mount overlay on /run/reaper/merged
                 │    ├─ bind-mount /proc, /sys, /dev, /run
                 │    ├─ bind-mount /etc → /run/reaper/merged/etc
                 │    ├─ pivot_root(/run/reaper/merged, .../old_root)
                 │    ├─ umount(/old_root, MNT_DETACH)
                 │    └─ signal parent "ready", sleep forever (kept alive)
                 │
                 └─ inner parent (host ns):
                      ├─ wait for "ready"
                      ├─ bind-mount /proc/<child>/ns/mnt → /run/reaper/shared-mnt-ns
                      ├─ keep child alive (helper persists namespace)
                      └─ setns(shared-mnt-ns)  # join the namespace

Namespace Joining (Subsequent Workloads)

reaper-runtime do_start()
  └─ enter_overlay()
       └─ acquire_lock()
       └─ namespace_exists(/run/reaper/shared-mnt-ns) → true
       └─ join_namespace()
            └─ setns(fd, CLONE_NEWNS)

Why Inner Fork?

The bind-mount of /proc/<pid>/ns/mnt to a host path must be done from the HOST mount namespace. After unshare(CLONE_NEWNS), the process is in the new namespace and bind-mounts don’t propagate to the host. The inner parent stays in the host namespace to perform this operation.

Why Keep Helper Alive?

The helper process (inner child) is kept alive to persist the namespace. While the bind-mount of /proc/<pid>/ns/mnt keeps the namespace reference, keeping the helper alive ensures /etc files and other bind-mounts remain accessible. The helper sleeps indefinitely until explicitly terminated.

Why pivot_root?

Mounting overlay directly on / hides all existing submounts (/proc, /sys, /dev). With pivot_root, we mount overlay on a new point, bind-mount special filesystems into it, then switch root. This preserves real host /proc, /sys, and /dev.

Configuration

| Variable | Default | Description |
|---|---|---|
| REAPER_OVERLAY_BASE | /run/reaper/overlay | Base dir for upper/work layers |

Overlay is always enabled on Linux. There is no option to disable it — workloads must not modify the host filesystem.

Bind-Mounted Directories

Only kernel-backed special filesystems and /run are bind-mounted from the host into the overlay:

  • /proc — process information (kernel-backed)
  • /sys — kernel/device information (kernel-backed)
  • /dev — device nodes (kernel-backed)
  • /run — runtime state (needed for daemon↔shim communication via state files)

/tmp is NOT bind-mounted — writes to /tmp go through the overlay upper layer, protecting the host’s /tmp from modification.

Directory Structure

/run/reaper/
├── overlay/
│   ├── upper/        # shared writable layer
│   └── work/         # overlayfs internal
├── merged/           # pivot_root target (temporary during setup)
├── shared-mnt-ns     # bind-mounted namespace reference
├── overlay.lock      # file lock for namespace creation
└── <container-id>/   # per-container state (existing)

Lifecycle

  1. Boot: /run is tmpfs, starts empty (ephemeral by design)
  2. First workload: Creates overlay dirs, namespace, and overlay mount
  3. Subsequent workloads: Join existing namespace via setns()
  4. Reboot: Everything under /run is cleared; fresh start

Mandatory Isolation

Overlay is mandatory on Linux. If overlay setup fails (e.g., not running as root, kernel lacks overlay support), the workload is refused — it will not run on the host filesystem. The daemon exits with code 1 and updates the container state to stopped.

Requirements

  • Linux kernel with overlayfs support (standard since 3.18)
  • CAP_SYS_ADMIN (required for unshare, setns, mount, pivot_root)
  • Reaper runtime runs as root on the node (standard for container runtimes)
  • Not available on macOS (code gated with #[cfg(target_os = "linux")])

Sensitive File Filtering

Reaper automatically filters sensitive host files to prevent workloads from accessing credentials, SSH keys, and other sensitive data. Filtering is implemented by bind-mounting empty placeholders over sensitive paths after pivot_root.

Default Filtered Paths

  • /root/.ssh - root user SSH keys
  • /etc/shadow, /etc/gshadow - password hashes
  • /etc/ssh/ssh_host_*_key - SSH host private keys
  • /etc/ssl/private - SSL/TLS private keys
  • /etc/sudoers, /etc/sudoers.d - sudo configuration
  • /var/lib/docker - Docker internal state
  • /run/secrets - container secrets

Configuration

| Variable | Default | Description |
|---|---|---|
| REAPER_FILTER_ENABLED | true | Enable/disable filtering |
| REAPER_FILTER_PATHS | "" | Colon-separated custom paths |
| REAPER_FILTER_MODE | append | append or replace |
| REAPER_FILTER_ALLOWLIST | "" | Paths to exclude from filtering |
| REAPER_FILTER_DIR | /run/reaper/overlay-filters | Placeholder directory |

Example: Add custom paths while keeping defaults:

REAPER_FILTER_PATHS="/custom/secret:/home/user/.aws/credentials"

Example: Replace default list entirely:

REAPER_FILTER_MODE=replace
REAPER_FILTER_PATHS="/etc/shadow:/etc/gshadow"

Example: Disable a specific default filter:

REAPER_FILTER_ALLOWLIST="/etc/shadow"

Security Guarantees

  • Filters are immutable (workloads cannot unmount them)
  • Applied once during namespace creation
  • Inherited by all workloads joining the namespace
  • Non-existent paths are silently skipped
  • Individual filter failures are logged but non-fatal

How It Works

After pivot_root completes in the shared namespace:

  1. Read filter configuration from environment variables
  2. Build filter list (defaults + custom, minus allowlist)
  3. Create empty placeholder files/directories in /run/reaper/overlay-filters/
  4. For each sensitive path:
    • If path exists, create matching placeholder (file or directory)
    • Bind-mount placeholder over the sensitive path
    • Log success/failure

This makes sensitive files appear empty or missing to workloads, while the actual host files remain untouched.
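The list-building step (defaults plus custom, minus allowlist) can be sketched as follows. The `build_filter_list` helper and its exact splitting semantics are illustrative assumptions; the real implementation reads these values from the REAPER_FILTER_* environment variables.

```rust
/// Build the effective filter list: defaults plus (or replaced by) custom
/// paths, minus the allowlist. (Sketch of the documented behavior.)
fn build_filter_list(
    defaults: &[&str],
    custom: &str,    // colon-separated, from REAPER_FILTER_PATHS
    mode: &str,      // "append" or "replace", from REAPER_FILTER_MODE
    allowlist: &str, // colon-separated, from REAPER_FILTER_ALLOWLIST
) -> Vec<String> {
    let custom_paths = custom.split(':').filter(|p| !p.is_empty());
    let mut list: Vec<String> = if mode == "replace" {
        custom_paths.map(String::from).collect()
    } else {
        defaults
            .iter()
            .map(|s| s.to_string())
            .chain(custom_paths.map(String::from))
            .collect()
    };
    let allowed: Vec<&str> = allowlist.split(':').filter(|p| !p.is_empty()).collect();
    list.retain(|p| !allowed.contains(&p.as_str()));
    list
}

fn main() {
    let defaults = ["/etc/shadow", "/root/.ssh"];
    // append mode adds custom paths; the allowlist removes a default
    let list = build_filter_list(&defaults, "/custom/secret", "append", "/etc/shadow");
    assert_eq!(list, vec!["/root/.ssh", "/custom/secret"]);
    // replace mode ignores the defaults entirely
    let list = build_filter_list(&defaults, "/etc/shadow:/etc/gshadow", "replace", "");
    assert_eq!(list, vec!["/etc/shadow", "/etc/gshadow"]);
}
```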

Namespace Isolation

By default (REAPER_OVERLAY_ISOLATION=namespace), each Kubernetes namespace gets its own isolated overlay. This means workloads in production cannot see writes from workloads in dev, matching Kubernetes’ namespace-as-trust-boundary expectation.

How It Works

  1. The containerd shim reads io.kubernetes.pod.namespace from OCI annotations
  2. It passes --namespace <ns> to reaper-runtime create
  3. The runtime stores the namespace in ContainerState (state.json)
  4. On start and exec, the runtime reads the namespace from state and computes per-namespace paths for overlay dirs, mount namespace, and lock

Per-Namespace Path Layout

/run/reaper/
  overlay/
    default/upper/          # K8s "default" namespace
    default/work/
    kube-system/upper/      # K8s "kube-system" namespace
    kube-system/work/
  merged/
    default/                # pivot_root target per namespace
    kube-system/
  ns/
    default                 # persisted mount namespace bind-mount
    kube-system
  overlay-default.lock      # per-namespace flock
  overlay-kube-system.lock
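A sketch of how those per-namespace paths could be computed (the `OverlayPaths` struct and `overlay_paths` helper are illustrative names, not the runtime's actual types; path constants mirror the layout above):

```rust
use std::path::PathBuf;

/// Per-namespace overlay paths, following the documented layout.
struct OverlayPaths {
    upper: PathBuf,  // shared writable layer for this namespace
    work: PathBuf,   // overlayfs internal work dir
    merged: PathBuf, // pivot_root target
    ns_ref: PathBuf, // persisted mount namespace bind-mount
    lock: PathBuf,   // per-namespace flock
}

fn overlay_paths(base: &str, k8s_namespace: &str) -> OverlayPaths {
    let root = PathBuf::from(base);
    OverlayPaths {
        upper: root.join("overlay").join(k8s_namespace).join("upper"),
        work: root.join("overlay").join(k8s_namespace).join("work"),
        merged: root.join("merged").join(k8s_namespace),
        ns_ref: root.join("ns").join(k8s_namespace),
        lock: root.join(format!("overlay-{}.lock", k8s_namespace)),
    }
}

fn main() {
    let p = overlay_paths("/run/reaper", "default");
    assert_eq!(p.upper, PathBuf::from("/run/reaper/overlay/default/upper"));
    assert_eq!(p.lock, PathBuf::from("/run/reaper/overlay-default.lock"));
}
```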

Legacy Node-Wide Mode

Set REAPER_OVERLAY_ISOLATION=node to use the old flat layout where all workloads share a single overlay regardless of their K8s namespace. This is useful for cross-deployment file sharing or backward compatibility.

Upgrade Path

Existing containers created before the upgrade have namespace: None in their state files. With the default namespace isolation mode, their start will fail. Drain nodes before upgrading to ensure no in-flight containers are affected.

Limitations

  • /run is typically a small tmpfs; for write-heavy workloads, configure REAPER_OVERLAY_BASE to point to a larger filesystem
  • Within a single K8s namespace, workloads still share the same overlay (no per-pod isolation)
  • Overlay does not protect against processes that directly modify kernel state via /proc or /sys writes
  • Sensitive file filtering does not support glob patterns (use explicit paths)

Node Configuration

Configuration

Reaper is configured through a combination of node-level configuration files, environment variables, and per-pod Kubernetes annotations.

Node Configuration

Reaper reads configuration from /etc/reaper/reaper.conf on each node. The Helm chart creates this file automatically via the node DaemonSet init container.

Config File Format

# /etc/reaper/reaper.conf (KEY=VALUE, one per line)
REAPER_DNS_MODE=kubernetes
REAPER_RUNTIME_LOG=/run/reaper/runtime.log
REAPER_OVERLAY_BASE=/run/reaper/overlay
REAPER_OVERLAY_ISOLATION=namespace

Load Order

  1. Config file defaults (/etc/reaper/reaper.conf)
  2. Environment variables override file values
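The load order can be sketched like this (an illustration, not Reaper's actual parser; comment and blank-line handling are assumptions):

```rust
use std::collections::HashMap;

/// Parse /etc/reaper/reaper.conf-style KEY=VALUE lines, then let process
/// environment variables override file values (the documented load order).
fn load_config(file_contents: &str, env: &HashMap<String, String>) -> HashMap<String, String> {
    let mut cfg = HashMap::new();
    // Step 1: file defaults.
    for line in file_contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            cfg.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    // Step 2: environment variables win over file values.
    for (key, value) in env {
        cfg.insert(key.clone(), value.clone());
    }
    cfg
}

fn main() {
    let file = "# comment\nREAPER_DNS_MODE=host\nREAPER_OVERLAY_ISOLATION=namespace\n";
    let mut env = HashMap::new();
    env.insert("REAPER_DNS_MODE".to_string(), "kubernetes".to_string());
    let cfg = load_config(file, &env);
    assert_eq!(cfg["REAPER_DNS_MODE"], "kubernetes");         // env override
    assert_eq!(cfg["REAPER_OVERLAY_ISOLATION"], "namespace"); // file default
}
```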

Settings Reference

| Variable | Default | Description |
|---|---|---|
| REAPER_CONFIG | /etc/reaper/reaper.conf | Override config file path |
| REAPER_DNS_MODE | host | DNS resolution: host (node’s resolv.conf) or kubernetes/k8s (CoreDNS) |
| REAPER_OVERLAY_ISOLATION | namespace | Overlay isolation: namespace (per-K8s-namespace) or node (shared) |
| REAPER_OVERLAY_BASE | /run/reaper/overlay | Base directory for overlay upper/work layers |
| REAPER_RUNTIME_LOG | (none) | Runtime log file path |
| REAPER_SHIM_LOG | (none) | Shim log file path |
| REAPER_ANNOTATIONS_ENABLED | true | Master switch for pod annotation overrides |
| REAPER_FILTER_ENABLED | true | Enable sensitive file filtering in overlay |
| REAPER_FILTER_PATHS | (none) | Additional colon-separated paths to filter |
| REAPER_FILTER_MODE | append | Filter mode: append (add to defaults) or replace |
| REAPER_FILTER_ALLOWLIST | (none) | Paths to exclude from filtering |

Pod Annotations

Users can override certain Reaper configuration parameters per-pod using Kubernetes annotations with the reaper.runtime/ prefix.

Supported Annotations

| Annotation | Values | Default | Description |
|---|---|---|---|
| reaper.runtime/dns-mode | host, kubernetes, k8s | Node config (REAPER_DNS_MODE) | DNS resolution mode for this pod |
| reaper.runtime/overlay-name | DNS label (e.g., pippo) | (none; uses namespace overlay) | Named overlay group within the namespace |

Example

apiVersion: v1
kind: Pod
metadata:
  name: my-task
  annotations:
    reaper.runtime/dns-mode: "kubernetes"
    reaper.runtime/overlay-name: "my-group"
spec:
  runtimeClassName: reaper-v2
  restartPolicy: Never
  containers:
    - name: task
      image: busybox
      command: ["/bin/sh", "-c", "nslookup kubernetes.default"]

Security Model

  • Only annotations in the allowlist above are honored. Unknown annotation keys are silently ignored.
  • Administrator-controlled parameters (overlay paths, filter settings, isolation mode) cannot be overridden via annotations.
  • Administrators can disable all annotation processing: REAPER_ANNOTATIONS_ENABLED=false

How It Works

  1. The shim extracts reaper.runtime/* annotations from the OCI config (populated by kubelet from pod metadata).
  2. Annotations are stored in the container state during create.
  3. During start, annotations are validated against the allowlist and applied. Invalid values are logged and ignored.
  4. If no annotation is set, the node-level configuration is used as the default.
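Step 3 can be sketched as a simple allowlist check. This mirrors the documented behavior only; it is not Reaper's actual Rust implementation:

```shell
# Hedged sketch of the allowlist validation described above.
check_annotation() {
  key=$1; val=$2
  case "$key" in
    reaper.runtime/dns-mode)
      case "$val" in
        host|kubernetes|k8s) echo "apply $key=$val" ;;
        *) echo "invalid $key=$val (logged and ignored)" ;;
      esac ;;
    reaper.runtime/overlay-name)
      echo "apply $key=$val" ;;
    *) ;;   # unknown keys are silently ignored
  esac
}

check_annotation reaper.runtime/dns-mode kubernetes   # applied
check_annotation reaper.runtime/dns-mode bogus        # invalid, ignored
check_annotation reaper.runtime/unknown x             # no output
```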

Helm Chart Values

The Helm chart (deploy/helm/reaper/) configures most settings automatically. Key values:

# Node configuration written to /etc/reaper/reaper.conf
config:
  dnsMode: kubernetes
  runtimeLog: /run/reaper/runtime.log

# Image settings (tag defaults to Chart.AppVersion)
node:
  image:
    repository: ghcr.io/miguelgila/reaper-node
    tag: ""
controller:
  image:
    repository: ghcr.io/miguelgila/reaper-controller
    tag: ""
agent:
  enabled: true
  image:
    repository: ghcr.io/miguelgila/reaper-agent
    tag: ""

See deploy/helm/reaper/values.yaml for the full reference.
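As an illustration, an operator could override these values at install time with a small values file. The file below is hypothetical; the keys match the snippet above:

```yaml
# my-values.yaml (hypothetical) — install with:
#   helm install reaper deploy/helm/reaper -f my-values.yaml
config:
  dnsMode: host          # use the node's resolv.conf instead of CoreDNS
  runtimeLog: /run/reaper/runtime.log
agent:
  enabled: false         # skip deploying the agent
```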

Pod Field Compatibility

Reaper implements the Kubernetes Pod API but ignores or doesn’t support certain container-specific fields since it runs processes directly on the host without traditional container isolation.

Field Reference

| Pod Field | Behavior |
|---|---|
| spec.containers[].image | Ignored by Reaper — kubelet pulls the image before the runtime runs, so a valid image is required; use a lightweight image like busybox. |
| spec.containers[].resources.limits | Ignored — no cgroup enforcement; processes use host resources. |
| spec.containers[].resources.requests | Ignored — scheduling hints not used. |
| spec.containers[].volumeMounts | Supported — bind mounts for ConfigMap, Secret, hostPath, emptyDir. |
| spec.containers[].securityContext.capabilities | Ignored — processes run with host-level capabilities. |
| spec.containers[].livenessProbe | Ignored — no health checking. |
| spec.containers[].readinessProbe | Ignored — no readiness checks. |
| spec.containers[].command | Supported — program path on host (must exist). |
| spec.containers[].args | Supported — arguments to the command. |
| spec.containers[].env | Supported — environment variables. |
| spec.containers[].workingDir | Supported — working directory for the process. |
| spec.runtimeClassName | Required — must be set to reaper-v2. |

Best Practice

Use a small, valid image like busybox. Kubelet pulls the image before handing off to the runtime, so the image must exist in a registry. Reaper itself ignores the image entirely — it runs the command directly on the host.

Supported Features Summary

| Feature | Status |
|---|---|
| command / args | Supported |
| env / envFrom | Supported |
| volumeMounts (ConfigMap, Secret, hostPath, emptyDir) | Supported |
| workingDir | Supported |
| securityContext.runAsUser / runAsGroup | Supported |
| restartPolicy | Supported (by kubelet) |
| runtimeClassName | Required (reaper-v2) |
| Resource limits/requests | Ignored |
| Probes (liveness, readiness, startup) | Ignored |
| Capabilities | Ignored |
| Image pulling | Handled by kubelet, ignored by Reaper |

Development Guide

This document contains information for developers working on the Reaper project.

Development Setup

Prerequisites

  • Rust toolchain (we pin stable via rust-toolchain.toml)
  • Docker (optional, for Linux-specific testing on macOS)
  • Ansible (for deploying to clusters)

Clone and Build

git clone https://github.com/miguelgila/reaper
cd reaper
cargo build

The repository includes rust-toolchain.toml which automatically pins the Rust toolchain version and enables rustfmt and clippy components.
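For reference, a rust-toolchain.toml of this kind typically looks like the fragment below. The channel shown is a guess based on the "we pin stable" note above; check the file in the repository for the authoritative contents:

```toml
[toolchain]
channel = "stable"
components = ["rustfmt", "clippy"]
```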

Building

Local Build (Debug)

cargo build

Binaries are output to target/debug/.

Release Build

cargo build --release

Binaries are output to target/release/.

Static Musl Build (for Kubernetes deployment)

For deployment to Kubernetes clusters, we build static musl binaries:

# Install musl target (one-time setup)
rustup target add x86_64-unknown-linux-musl

# Build static binary
docker run --rm \
  -v "$(pwd)":/work \
  -w /work \
  messense/rust-musl-cross:x86_64-musl \
  cargo build --release --target x86_64-unknown-linux-musl

This produces binaries at target/x86_64-unknown-linux-musl/release/ that work in containerized environments (like Kind nodes).

For aarch64:

rustup target add aarch64-unknown-linux-musl

docker run --rm \
  -v "$(pwd)":/work \
  -w /work \
  messense/rust-musl-cross:aarch64-musl \
  cargo build --release --target aarch64-unknown-linux-musl

Testing

See TESTING.md for comprehensive testing documentation.

Quick Reference

# Unit tests (fast, recommended for local development)
cargo test

# Full integration tests (Kubernetes + unit tests)
./scripts/run-integration-tests.sh

# Integration tests (K8s only, skip cargo tests)
./scripts/run-integration-tests.sh --skip-cargo

# Coverage report (requires Docker)
./scripts/docker-coverage.sh

Test Modules

  • tests/integration_basic_binary.rs - Basic runtime functionality (create/start/state/delete)
  • tests/integration_user_management.rs - User/group ID handling, umask
  • tests/integration_shim.rs - Shim-specific tests
  • tests/integration_io.rs - FIFO stdout/stderr redirection
  • tests/integration_exec.rs - Exec into running containers
  • tests/integration_overlay.rs - Overlay filesystem tests

Run a specific test suite:

cargo test --test integration_basic_binary

Code Quality

Formatting

Format all code before committing:

cargo fmt --all

Check formatting without making changes:

cargo fmt --all -- --check

Linting

Run clippy to catch common mistakes and improve code quality:

# Quick check
cargo clippy --all-targets --all-features

# Match CI exactly (treats warnings as errors)
cargo clippy -- -D warnings

CI runs clippy with -D warnings, so any warning is a hard failure. The pre-push hook runs this automatically if you’ve installed hooks via ./scripts/install-hooks.sh.

Linux Cross-Check (macOS only)

The overlay module (src/bin/reaper-runtime/overlay.rs) is gated by #[cfg(target_os = "linux")] and doesn’t compile on macOS. To catch compilation errors in Linux-only code:

# One-time setup
rustup target add x86_64-unknown-linux-gnu

# Check compilation for Linux target
cargo clippy --target x86_64-unknown-linux-gnu --all-targets --all-features

Git Hooks

We provide git hooks in .githooks/ to catch issues before they reach CI.

Enable Hooks

./scripts/install-hooks.sh

This sets core.hooksPath to .githooks/ and marks the hooks executable. Since the hooks are checked into the repo, every contributor gets the same setup.
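Conceptually, installation amounts to the two git operations below. This is a throwaway-repo sketch, not the maintained script (which is scripts/install-hooks.sh):

```shell
# Point git at the in-repo hooks directory and make the hooks executable.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p .githooks
printf '#!/bin/sh\ncargo clippy -- -D warnings\n' > .githooks/pre-push
git config core.hooksPath .githooks
chmod +x .githooks/pre-push
git config core.hooksPath
# .githooks
```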

Available Hooks

| Hook | Runs | Purpose |
|---|---|---|
| pre-commit | cargo fmt --all | Auto-formats code and stages changes before each commit |
| pre-push | cargo clippy -- -D warnings | Catches lint issues before pushing (matches CI) |

The pre-push hook mirrors the exact clippy invocation used in CI, so pushes that pass locally will pass the CI clippy check too.

Customization

  • pre-commit: To fail on unformatted code instead of auto-fixing, change cargo fmt --all to cargo fmt --all -- --check and remove the re-staging logic.
  • pre-push: To skip clippy for a one-off push, use git push --no-verify.

Docker (Optional)

Docker is not required for local development on macOS. Prefer cargo test locally for speed.

Use Docker when you need:

  • Code coverage via cargo-tarpaulin (Linux-first tool)
  • CI failure reproduction specific to Linux
  • Static musl binary builds for Kubernetes

Run Coverage in Docker

./scripts/docker-coverage.sh

This runs cargo-tarpaulin in a Linux container with appropriate capabilities.

VS Code Setup

  • rust-analyzer — Main Rust language support
  • CodeLLDB (vadimcn.vscode-lldb) — Debug adapter for Rust
  • Test Explorer UI — Unified test UI

Configure rust-analyzer to run clippy on save and enable CodeLens for inline run/debug buttons.

CI/CD

GitHub Actions workflows run on pushes and pull requests to main:

CI Workflow (ci.yml)

A single unified pipeline that runs:

  • cargo fmt -- --check (formatting)
  • cargo clippy --workspace --all-targets -- -D warnings (linting)
  • cargo audit (dependency vulnerability scan)
  • cargo test --verbose (unit tests)
  • cargo tarpaulin → Codecov upload (coverage)
  • Cross-compile static musl binaries (all binaries)
  • Kind integration tests (run-integration-tests.sh --skip-cargo)
  • Example validation (test-examples.sh --skip-cluster)

Coverage

Local Coverage (Linux)

If running on Linux, you can use tarpaulin directly:

cargo install cargo-tarpaulin
cargo tarpaulin --out Xml --timeout 600

Coverage via Docker (macOS/Windows)

Run the included Docker script:

./scripts/docker-coverage.sh

Configuration lives in tarpaulin.toml. Functions requiring root + Linux namespaces (tested by kind-integration) are excluded via #[cfg(not(tarpaulin_include))] so coverage reflects what unit tests can actually reach.

Contributing

Before Opening a PR

  1. Format code:

    cargo fmt --all
    
  2. Run linting:

    cargo clippy --all-targets --all-features
    
  3. Run tests:

    cargo test
    
  4. Optional: Run integration tests:

    ./scripts/run-integration-tests.sh
    
  5. Install git hooks (auto-formats on commit, runs clippy before push):

    ./scripts/install-hooks.sh
    

Development Workflow

For fast feedback during development:

# Quick iteration cycle
cargo test              # Unit tests (seconds)
cargo clippy            # Linting

# Before pushing
cargo fmt --all         # Format code
cargo test              # All unit tests
./scripts/run-integration-tests.sh  # Full validation

Integration Test Iteration

If you’re iterating on overlay or shim logic:

# First run (build cluster, binaries, tests)
./scripts/run-integration-tests.sh --no-cleanup

# Make code changes...

# Rebuild and test (skip cargo, reuse cluster)
cargo build --release --bin containerd-shim-reaper-v2 --bin reaper-runtime
./scripts/run-integration-tests.sh --skip-cargo --no-cleanup

# Repeat until satisfied...

# Final cleanup run
./scripts/run-integration-tests.sh --skip-cargo

Project Structure

reaper/
├── src/
│   ├── bin/
│   │   ├── containerd-shim-reaper-v2/  # Shim binary
│   │   │   └── main.rs                 # Shim implementation
│   │   └── reaper-runtime/             # Runtime binary
│   │       ├── main.rs                 # OCI runtime CLI
│   │       ├── state.rs                # State persistence
│   │       └── overlay.rs              # Overlay filesystem (Linux)
├── tests/                              # Integration tests
├── scripts/                            # Installation and testing scripts
├── deploy/
│   ├── ansible/                        # Ansible playbooks for deployment
│   └── kubernetes/                     # Kubernetes manifests
├── docs/                               # Documentation
└── .githooks/                          # Git hooks (pre-commit, pre-push)

Common Tasks

Add a New Binary

  1. Create directory under src/bin/<binary-name>/
  2. Add main.rs in that directory
  3. Add entry to Cargo.toml:
    [[bin]]
    name = "binary-name"
    path = "src/bin/binary-name/main.rs"
    

Add a New Test Suite

  1. Create tests/integration_<name>.rs
  2. Use #[test] or #[tokio::test] for async tests
  3. Run with cargo test --test integration_<name>

Update Dependencies

# Check for outdated dependencies
cargo outdated

# Update to latest compatible versions
cargo update

# Update Cargo.lock and check tests still pass
cargo test

Debug a Test

Use VS Code’s debug launch configurations or run with logging:

RUST_LOG=debug cargo test <test-name> -- --nocapture

Troubleshooting

Clippy Errors on macOS for Linux-only Code

Run clippy with Linux target:

cargo clippy --target x86_64-unknown-linux-gnu --all-targets

Tests Fail with “Permission Denied”

Some tests require root for namespace operations. Run:

sudo cargo test

Or use integration tests which run in Kind (isolated environment):

./scripts/run-integration-tests.sh

Docker Build Fails

Ensure Docker is running:

docker ps

If Docker daemon is not accessible, start Docker Desktop or the Docker daemon.

Integration Tests Timeout

Increase timeout or check cluster resources:

kubectl get nodes
kubectl describe pod <pod-name>

Testing & Integration

This document consolidates all information about running tests, integration tests, and development workflows for the Reaper project.

Quick Reference

All common tasks are available via make. Run make help for the full list.

| Task | Command |
|---|---|
| Full CI check (recommended before push) | make ci |
| Unit tests | make test |
| Clippy (macOS) | make clippy |
| Clippy (Linux cross-check) | make check-linux |
| Coverage (Docker, CI-parity) | make coverage |
| Integration tests (full suite) | make integration |
| Integration tests (K8s only, skip cargo) | make integration-quick |

Unit Tests

Run Rust tests natively on your machine:

cargo test

Tests run in a few seconds and provide immediate feedback. Use this for development iteration.

Test Modules

  • integration_basic_binary - Basic runtime functionality
  • integration_user_management - User/group handling (UID/GID switching, privilege dropping, umask)
  • integration_shim - Shim-specific tests
  • integration_io - FIFO stdout/stderr redirection
  • integration_exec - Exec into running containers
  • integration_overlay - Overlay filesystem tests

Run a specific test:

cargo test --test integration_basic_binary

Integration Tests (Kubernetes)

The main integration test suite runs against a kind (Kubernetes in Docker) cluster. It validates:

  • ✓ DNS resolution in container
  • ✓ Basic command execution (echo)
  • ✓ Overlay filesystem sharing across pods
  • ✓ Host filesystem protection (no leakage to host)
  • ✓ UID/GID switching with securityContext
  • ✓ Privilege drop to non-root user
  • ✓ Shim cleanup after pod deletion
  • ✓ No defunct (zombie) processes
  • ✓ kubectl exec support

The script runs cargo tests, builds binaries, creates a kind cluster, and runs all integration tests:

./scripts/run-integration-tests.sh

Options:

  • --skip-cargo — Skip Rust unit tests (useful for rapid K8s-only reruns)
  • --no-cleanup — Keep the kind cluster running after tests (for debugging)
  • --verbose — Also print debug output to stdout (in addition to log file)
  • --agent-only — Only run agent tests (skip cargo, integration, and controller tests)
  • --crd-only — Only run CRD controller tests (skip cargo, integration, and agent tests)
  • --help — Show usage

Examples

Rerun K8s tests against an existing cluster:

./scripts/run-integration-tests.sh --skip-cargo --no-cleanup

Run only CRD controller tests (fast iteration on ReaperPod CRD):

./scripts/run-integration-tests.sh --crd-only --no-cleanup

Run only agent tests:

./scripts/run-integration-tests.sh --agent-only --no-cleanup

Keep cluster for interactive debugging:

./scripts/run-integration-tests.sh --no-cleanup

Then interact with the cluster:

kubectl get pods
kubectl logs <pod-name>
kubectl describe pod <pod-name>

Test Output & Logs

  • Console output: Test results with pass/fail badges
  • Log file: /tmp/reaper-integration-logs/integration-test.log (detailed diagnostics)
  • GitHub Actions: Results posted to job summary when run in CI

How It Works

The test harness orchestrates:

  1. Phase 1: Rust cargo tests (integration_* tests)
  2. Phase 2: Kubernetes infrastructure setup
    • Create or reuse kind cluster
    • Build static musl binaries (matches node architecture)
    • Deploy shim and runtime binaries to cluster node
    • Configure containerd with the Reaper runtime
  3. Phase 3: Kubernetes readiness
    • Wait for API server and nodes
    • Create RuntimeClass
    • Wait for default ServiceAccount
  4. Phase 4: Integration tests
    • DNS, echo, overlay, host protection, UID/GID switching, privilege drop, exec, zombie check
  5. Phase 4b: Controller tests (ReaperPod CRD)
    • CRD installation, controller deployment, ReaperPod lifecycle, status mirroring, exit code propagation, annotations, custom printer columns, garbage collection
  6. Phase 5: Summary & reporting

Coverage

Generate code coverage report using Docker:

./scripts/docker-coverage.sh

This runs cargo-tarpaulin (Linux-first tool) in a container with appropriate capabilities.

Containerd Configuration

Configure a containerd instance to use the Reaper shim runtime:

./scripts/configure-containerd.sh <context> <node-id>

  • <context>: kind or minikube (determines config locations)
  • <node-id>: Docker container ID (e.g., from docker ps)

This script is automatically run by run-integration-tests.sh.
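The exact stanza the script writes is not shown here, but registering a shim with containerd's CRI plugin generally looks roughly like the fragment below. The runtime_type value is an assumption inferred from the shim binary name containerd-shim-reaper-v2:

```toml
# Hypothetical containerd config.toml fragment; the real configuration
# is written by scripts/configure-containerd.sh.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.reaper-v2]
  runtime_type = "io.containerd.reaper.v2"
```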

Development Workflow

Before Pushing (CI-parity on macOS)

Run the full CI-equivalent check locally:

make ci

This runs, in order: fmt check, clippy (macOS), clippy (Linux cross-check), cargo test, and coverage (Docker + tarpaulin). If this passes, CI will pass.

Quick Iteration

For fast feedback during development:

make test              # Unit tests only (seconds)
make clippy            # macOS clippy
make check-linux       # Catches #[cfg(linux)] compilation issues

Linux Cross-Check

The overlay module (overlay.rs) is gated by #[cfg(target_os = "linux")] and doesn’t compile on macOS. make check-linux cross-checks clippy against the x86_64-unknown-linux-gnu target to catch compilation errors in Linux-only code without leaving macOS.

Requires the target (one-time setup):

rustup target add x86_64-unknown-linux-gnu

Coverage (CI-parity)

Coverage runs tarpaulin inside Docker to match CI exactly:

make coverage

Configuration lives in tarpaulin.toml. Functions requiring root + Linux namespaces (tested by kind-integration) are excluded via #[cfg(not(tarpaulin_include))] so coverage reflects what unit tests can actually reach.

Integration Test Iteration

If you’re iterating on overlay or shim logic:

# First run (build cluster, binaries, tests)
./scripts/run-integration-tests.sh --no-cleanup

# Make code changes...

# Rebuild and test (skip cargo, reuse cluster)
cargo build --release --bin containerd-shim-reaper-v2 --bin reaper-runtime --target x86_64-unknown-linux-musl
./scripts/run-integration-tests.sh --skip-cargo --no-cleanup

# Repeat until satisfied...

# Final cleanup run
./scripts/run-integration-tests.sh --skip-cargo

Troubleshooting

No kind cluster available

The test harness automatically creates one. If it fails, check:

  • Docker is running: docker ps
  • kind is installed: kind --version
  • Sufficient disk space: df -h

Pod stuck in Pending

Check containerd logs on the node:

docker exec <node-id> journalctl -u containerd -n 50 --no-pager

Check Kubelet logs:

docker exec <node-id> journalctl -u kubelet -n 50 --no-pager

Test times out

Increase timeout in test function or check node resources:

docker exec <node-id> top -b -n 1
docker exec <node-id> df -h

RuntimeClass not found

Wait a few seconds after applying the RuntimeClass, as it takes time to propagate.
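A RuntimeClass mapping the reaper-v2 handler might look like this (standard Kubernetes API; the handler name is assumed to match the runtime class name, as is conventional):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: reaper-v2
handler: reaper-v2   # must match the runtime name registered in containerd
```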

Directory Structure

reaper/
├── scripts/
│   ├── run-integration-tests.sh      [MAIN] Orchestrates all integration tests
│   ├── install-reaper.sh             Ansible-based installation (DEPRECATED)
│   ├── build-node-image.sh           Build reaper-node installer image for Kind
│   ├── build-controller-image.sh     Build reaper-controller image for Kind
│   ├── install-node.sh               Init container script for node DaemonSet
│   ├── generate-kind-inventory.sh    Auto-generate Kind inventory for Ansible (DEPRECATED)
│   ├── configure-containerd.sh       Helper to configure containerd
│   ├── install-hooks.sh              Setup git hooks (optional)
│   └── docker-coverage.sh            Run coverage in Docker
├── tests/
│   ├── integration_basic_binary.rs
│   ├── integration_user_management.rs
│   ├── integration_shim.rs
│   ├── integration_io.rs
│   ├── integration_exec.rs
│   └── integration_overlay.rs
├── deploy/
│   └── kubernetes/                   [K8s cluster config examples]
├── examples/                         [Runnable Kind-based demos]
└── docs/
    └── TESTING.md                    [This file]

CI Integration

The CI pipeline (.github/workflows/ci.yml) runs automatically on:

  • Push to main or fix/** branches
  • Pull requests targeting main

Changes to documentation (*.md, docs/**), LICENSE*, and .gitignore are excluded from triggering runs.

Jobs

The pipeline runs these jobs:

| Job | Description |
|---|---|
| Format | cargo fmt -- --check |
| Clippy | cargo clippy --workspace --all-targets -- -D warnings |
| Security Audit | cargo audit |
| Tests | cargo test --verbose |
| Coverage | cargo tarpaulin → Codecov upload |
| Build and Cache | Cross-compile static musl binaries (all 4 binaries) |
| kind-integration | Full integration test suite (run-integration-tests.sh --skip-cargo) |
| Example Validation | test-examples.sh --skip-cluster |

Results

Results are posted to the GitHub Actions job summary. If any test fails, the workflow reports the failure with diagnostics.

Archived / Deprecated Scripts

The following scripts have been consolidated into run-integration-tests.sh and are no longer maintained:

  • kind-integration.sh — Replaced by run-integration-tests.sh (more features, better test reporting)
  • minikube-setup-runtime.sh — Minikube support deprecated
  • minikube-test.sh — Minikube support deprecated
  • test-k8s-integration.sh — Replaced by run-integration-tests.sh
  • docker-test.sh — Optional helper; use cargo test for speed or docker-coverage.sh for coverage

Contributing

Contributing to Reaper

Thanks for your interest in contributing! Here are some guidelines:

Code Style

  • Run cargo fmt before committing
  • Run cargo clippy to check for common mistakes
  • Write tests for new functionality

Pull Request Process

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and commit them
  4. Push to your fork
  5. Open a Pull Request

Testing

Please ensure all tests pass:

cargo test

And check code quality:

cargo fmt
cargo clippy

License

By contributing, you agree that your contributions will be licensed under the MIT License.

Examples

Runnable examples demonstrating Reaper’s capabilities. Each example includes a setup.sh script that creates a Kind cluster with Reaper pre-installed.

Prerequisites

All examples require:

  • Docker
  • kind
  • kubectl
  • Helm (for examples using setup-playground.sh)

Note: Examples 01–08 use the legacy Ansible-based installer. Newer examples (09+) use the Helm-based setup-playground.sh pattern.

Run all scripts from the repository root.

Examples

01-scheduling/ — Node Scheduling Patterns

Demonstrates running workloads on all nodes vs. a labeled subset using DaemonSets with nodeSelector.

  • 3-node cluster (1 control-plane + 2 workers)
  • All-node DaemonSet (load/memory monitor on every node)
  • Subset DaemonSet (login-node monitor only on node-role=login nodes)
./examples/01-scheduling/setup.sh
kubectl apply -f examples/01-scheduling/all-nodes-daemonset.yaml
kubectl apply -f examples/01-scheduling/subset-nodes-daemonset.yaml

02-client-server/ — TCP Client-Server Communication

Demonstrates cross-node networking with a socat TCP server on one node and clients connecting from other nodes over host networking.

  • 4-node cluster (1 control-plane + 3 workers)
  • Server on role=server node, clients on role=client nodes
  • Clients discover the server IP via a ConfigMap
./examples/02-client-server/setup.sh
kubectl apply -f examples/02-client-server/server-daemonset.yaml
kubectl apply -f examples/02-client-server/client-daemonset.yaml
kubectl logs -l app=demo-client --all-containers --prefix -f

03-client-server-runas/ — Client-Server with Non-Root User

Same as client-server, but all workloads run as a shared non-root user (demo-svc, UID 1500 / GID 1500), demonstrating Reaper’s securityContext.runAsUser / runAsGroup support. The setup script creates the user on every node with identical IDs, mimicking an LDAP environment.

  • 4-node cluster (1 control-plane + 3 workers)
  • Shared demo-svc user created on all nodes (UID 1500, GID 1500)
  • All log output includes uid= to prove privilege drop
./examples/03-client-server-runas/setup.sh
kubectl apply -f examples/03-client-server-runas/server-daemonset.yaml
kubectl apply -f examples/03-client-server-runas/client-daemonset.yaml
kubectl logs -l app=demo-client-runas --all-containers --prefix -f

04-volumes/ — Kubernetes Volume Mounts

Demonstrates Reaper’s volume mount support across four volume types: ConfigMap, Secret, hostPath, and emptyDir. Showcases package installation (nginx) inside the overlay namespace without modifying the host.

  • 2-node cluster (1 control-plane + 1 worker)
  • ConfigMap-configured nginx, read-only Secrets, hostPath file serving, emptyDir scratch workspace
  • Software installed inside pod commands via overlay (host unmodified)
./examples/04-volumes/setup.sh
kubectl apply -f examples/04-volumes/configmap-nginx.yaml
kubectl logs configmap-nginx -f

05-kubemix/ — Kubernetes Workload Mix

Demonstrates running Jobs, DaemonSets, and Deployments simultaneously on a 10-node cluster. Each workload type targets a different set of labeled nodes, showcasing Reaper across diverse Kubernetes workload modes. All workloads read configuration from dedicated ConfigMap volumes.

  • 10-node cluster (1 control-plane + 9 workers)
  • Workers partitioned: 3 batch (Jobs), 3 daemon (DaemonSets), 3 service (Deployments)
  • Each workload reads config from its own ConfigMap volume
./examples/05-kubemix/setup.sh
kubectl apply -f examples/05-kubemix/
kubectl get pods -o wide

06-ansible-jobs/ — Ansible Jobs

Demonstrates overlay persistence by running sequential Jobs: the first installs Ansible via apt, the second runs an Ansible playbook (from a ConfigMap) to install and verify nginx. Packages installed by Job 1 persist in the shared overlay for Job 2.

  • 10-node cluster (1 control-plane + 9 workers)
  • Job 1: installs Ansible on all workers (persists in overlay)
  • Job 2: runs Ansible playbook from ConfigMap to install nginx
./examples/06-ansible-jobs/setup.sh
kubectl apply -f examples/06-ansible-jobs/install-ansible-job.yaml
kubectl wait --for=condition=Complete job/install-ansible --timeout=300s
kubectl apply -f examples/06-ansible-jobs/nginx-playbook-job.yaml

07-ansible-complex/ — Ansible Complex (Reboot-Resilient)

Fully reboot-resilient Ansible deployment using only DaemonSets. A bootstrap DaemonSet installs Ansible, then role-specific DaemonSets run playbooks (nginx on login nodes, htop on compute nodes). Init containers create implicit dependencies so a single kubectl apply -f deploys everything in the right order. All packages survive node reboots.

  • 10-node cluster (1 control-plane + 9 workers: 2 login, 7 compute)
  • 3 DaemonSets: Ansible bootstrap (all), nginx (login), htop (compute)
  • Init container dependencies — no manual ordering needed
./examples/07-ansible-complex/setup.sh
kubectl apply -f examples/07-ansible-complex/
kubectl rollout status daemonset/nginx-login --timeout=300s

08-mix-container-runtime-engines/ — Mixed Runtime Engines

Demonstrates mixed runtime engines in the same cluster: a standard containerized OpenLDAP server (default containerd/runc) alongside Reaper workloads that configure SSSD on every node. Reaper pods consume the LDAP service via a fixed ClusterIP, enabling getent passwd to resolve LDAP users on the host.

  • 4-node cluster (1 control-plane + 3 workers: 1 login, 2 compute)
  • OpenLDAP Deployment (default runtime) with 5 posixAccount users
  • Reaper DaemonSets: Ansible bootstrap + SSSD configuration (all workers)
  • Init containers handle dependency ordering (Ansible + LDAP readiness)
./examples/08-mix-container-runtime-engines/setup.sh
kubectl apply -f examples/08-mix-container-runtime-engines/
kubectl rollout status daemonset/base-config --timeout=300s

09-reaperpod/ — ReaperPod CRD

Demonstrates the ReaperPod Custom Resource Definition — a simplified, Reaper-native way to run workloads without container boilerplate. A reaper-controller watches ReaperPod resources and creates real Pods with runtimeClassName: reaper-v2 pre-configured.

  • No image: field needed (busybox placeholder handled automatically)
  • Reaper-specific fields: dnsMode, overlayName, simplified volumes
  • Status tracks phase, podName, nodeName, exitCode
# Prerequisites: install CRD and controller
kubectl create namespace reaper-system
kubectl apply -f deploy/kubernetes/crds/reaperpods.reaper.io.yaml
kubectl apply -f deploy/kubernetes/reaper-controller.yaml

# Run a simple task
kubectl apply -f examples/09-reaperpod/simple-task.yaml
kubectl get reaperpods
kubectl describe reaperpod hello-world

# With volumes (create ConfigMap first)
kubectl create configmap app-config --from-literal=greeting="Hello from ConfigMap"
kubectl apply -f examples/09-reaperpod/with-volumes.yaml

# With node selector (label a node first)
kubectl label node <name> workload-type=compute
kubectl apply -f examples/09-reaperpod/with-node-selector.yaml

10-slurm-hpc/ — Slurm HPC (Mixed Runtimes)

Demonstrates a Slurm HPC cluster using mixed Kubernetes runtimes: slurmctld (scheduler) runs as a standard container, while slurmd (worker daemons) run on compute nodes via Reaper with direct host access for CPU pinning and device management.

  • 4-node cluster (1 control-plane + 1 slurmctld + 2 compute)
  • slurmctld Deployment (default runtime) with munge authentication
  • slurmd DaemonSet (Reaper) on compute nodes with shared overlay
./examples/10-slurm-hpc/setup.sh
kubectl apply -f examples/10-slurm-hpc/
kubectl rollout status daemonset/slurmd --timeout=300s

11-node-monitoring/ — Node Monitoring (Prometheus + Reaper)

Demonstrates host-level node monitoring: Prometheus node_exporter runs as a Reaper DaemonSet for accurate host metrics, while a containerized Prometheus server (default runtime) scrapes them.

  • 3-node cluster (1 control-plane + 2 workers)
  • node_exporter DaemonSet (Reaper) — downloads and runs on host
  • Prometheus Deployment (default runtime) with Kubernetes service discovery
./examples/11-node-monitoring/setup.sh
kubectl apply -f examples/11-node-monitoring/
kubectl port-forward svc/prometheus 9090:9090

12-daemon-job/ — ReaperDaemonJob CRD (Node Configuration)

Demonstrates the ReaperDaemonJob Custom Resource Definition — a “DaemonSet for Jobs” that runs commands to completion on every matching node. Designed for node configuration tasks like Ansible playbooks that compose via shared overlays.

  • Dependency ordering via after field (second job waits for first)
  • Shared overlays via overlayName (composable node config)
  • Per-node status tracking with retry support
# Prerequisites: Reaper + controller running (via Helm or setup-playground.sh)
kubectl apply -f examples/12-daemon-job/simple-daemon-job.yaml
kubectl get reaperdaemonjobs
kubectl describe reaperdaemonjob node-info

# Composable example with dependencies
kubectl apply -f examples/12-daemon-job/composable-node-config.yaml
kubectl get rdjob -w   # watch until both jobs complete

Cleanup

Examples with setup.sh scripts can be cleaned up independently:

./examples/01-scheduling/setup.sh --cleanup
./examples/02-client-server/setup.sh --cleanup
./examples/03-client-server-runas/setup.sh --cleanup
./examples/04-volumes/setup.sh --cleanup
./examples/05-kubemix/setup.sh --cleanup
./examples/06-ansible-jobs/setup.sh --cleanup
./examples/07-ansible-complex/setup.sh --cleanup
./examples/08-mix-container-runtime-engines/setup.sh --cleanup
./examples/10-slurm-hpc/setup.sh --cleanup
./examples/11-node-monitoring/setup.sh --cleanup

For CRD-based examples (09, 12), delete the resources directly:

kubectl delete reaperpod --all
kubectl delete reaperdaemonjob --all

Custom Resource Definitions

Reaper provides three CRDs for managing workloads, overlay filesystems, and node-wide configuration tasks.

ReaperPod

A simplified, Reaper-native way to run workloads without standard container boilerplate.

  • Group: reaper.io
  • Version: v1alpha1
  • Kind: ReaperPod
  • Short name: rpod (kubectl get rpod)

Spec

| Field | Type | Required | Description |
|---|---|---|---|
| command | string[] | Yes | Command to execute on the host |
| args | string[] | No | Arguments to the command |
| env | EnvVar[] | No | Environment variables (simplified format) |
| volumes | Volume[] | No | Volume mounts (simplified format) |
| nodeSelector | map[string]string | No | Node selection constraints |
| dnsMode | string | No | DNS resolution mode (host or kubernetes) |
| overlayName | string | No | Named overlay group (requires matching ReaperOverlay) |

Status

| Field | Type | Description |
|---|---|---|
| phase | string | Current phase: Pending, Running, Succeeded, Failed |
| podName | string | Name of the backing Pod |
| nodeName | string | Node where the workload runs |
| exitCode | int | Process exit code (when completed) |
| startTime | string | When the workload started |
| completionTime | string | When the workload completed |

Simplified Volumes

ReaperPod volumes use a flat format instead of the nested Kubernetes volume spec:

volumes:
  - name: config
    mountPath: /etc/config
    configMap: "my-configmap"     # ConfigMap name (string)
    readOnly: true
  - name: secret
    mountPath: /etc/secret
    secret: "my-secret"           # Secret name (string)
  - name: host
    mountPath: /data
    hostPath: "/opt/data"         # Host path (string)
  - name: scratch
    mountPath: /tmp/work
    emptyDir: true                # EmptyDir (bool)
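For comparison, the first entry above corresponds to the standard nested Kubernetes volume spec below. This is a sketch for orientation only; ReaperPod accepts the flat form, not this one.

```yaml
# Standard (non-Reaper) Pod spec for the same ConfigMap mount, shown
# to contrast with the flat ReaperPod volume format above.
volumes:
  - name: config
    configMap:
      name: my-configmap
containers:
  - name: app
    volumeMounts:
      - name: config
        mountPath: /etc/config
        readOnly: true
```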

Examples

Simple Task

apiVersion: reaper.io/v1alpha1
kind: ReaperPod
metadata:
  name: hello-world
spec:
  command: ["/bin/sh", "-c", "echo Hello from $(hostname) at $(date)"]

With Volumes

apiVersion: reaper.io/v1alpha1
kind: ReaperPod
metadata:
  name: with-config
spec:
  command: ["/bin/sh", "-c", "cat /config/greeting"]
  volumes:
    - name: config
      mountPath: /config
      configMap: "app-config"
      readOnly: true

With Node Selector

apiVersion: reaper.io/v1alpha1
kind: ReaperPod
metadata:
  name: compute-task
spec:
  command: ["/bin/sh", "-c", "echo Running on $(hostname)"]
  nodeSelector:
    workload-type: compute

Controller

The reaper-controller watches ReaperPod resources and creates backing Pods with runtimeClassName: reaper-v2. It translates the simplified ReaperPod spec into a full Pod spec.

  • Pod name matches ReaperPod name (1:1 mapping)
  • Owner references enable automatic garbage collection
  • Status is mirrored from the backing Pod
  • If overlayName is set, the Pod stays Pending until a matching ReaperOverlay is Ready
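As a rough illustration, the backing Pod generated for the hello-world example above might look like the following. Only runtimeClassName: reaper-v2, the 1:1 name mapping, and the owner reference are documented behavior; every other field here is an assumption about how the controller fills in the Pod spec.

```yaml
# Hypothetical backing Pod produced by reaper-controller for the
# "hello-world" ReaperPod. Fields beyond runtimeClassName, the name
# mapping, and ownerReferences are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: hello-world              # matches the ReaperPod name (1:1)
  ownerReferences:               # enables automatic garbage collection
    - apiVersion: reaper.io/v1alpha1
      kind: ReaperPod
      name: hello-world
      uid: "<reaperpod-uid>"     # placeholder; set by the controller
spec:
  runtimeClassName: reaper-v2    # routes the Pod to the Reaper runtime
  containers:
    - name: workload             # container name/image handling is assumed
      command: ["/bin/sh", "-c", "echo Hello from $(hostname) at $(date)"]
```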

ReaperOverlay

A PVC-like resource that manages named overlay filesystem lifecycles independently from ReaperPod workloads. Enables Kubernetes-native overlay creation, reset, and deletion without requiring direct node access.

  • Group: reaper.io
  • Version: v1alpha1
  • Kind: ReaperOverlay
  • Short name: rovl (kubectl get rovl)

Spec

| Field | Type | Default | Description |
|---|---|---|---|
| resetPolicy | string | Manual | When to reset: Manual, OnFailure, OnDelete |
| resetGeneration | int | 0 | Increment to trigger a reset on all nodes |

Status

| Field | Type | Description |
|---|---|---|
| phase | string | Current phase: Pending, Ready, Resetting, Failed |
| observedResetGeneration | int | Last resetGeneration fully applied |
| nodes[] | array | Per-node overlay state |
| nodes[].nodeName | string | Node name |
| nodes[].ready | bool | Whether the overlay is available |
| nodes[].lastResetTime | string | ISO 8601 timestamp of last reset |
| message | string | Human-readable status message |

PVC-like Behavior

ReaperOverlay works like a PersistentVolumeClaim:

  • Blocking: ReaperPods with overlayName stay Pending until the matching ReaperOverlay exists and is Ready
  • Cleanup on delete: A finalizer ensures on-disk overlay data is cleaned up on all nodes when the ReaperOverlay is deleted
  • Reset: Increment spec.resetGeneration to trigger overlay teardown and recreation on all nodes

Examples

Create an Overlay

apiVersion: reaper.io/v1alpha1
kind: ReaperOverlay
metadata:
  name: slurm
spec:
  resetPolicy: Manual

Use with a ReaperPod

apiVersion: reaper.io/v1alpha1
kind: ReaperPod
metadata:
  name: install-slurm
spec:
  overlayName: slurm
  command: ["bash", "-c", "apt-get update && apt-get install -y slurm-wlm"]

Reset a Corrupt Overlay

kubectl patch rovl slurm --type merge -p '{"spec":{"resetGeneration":1}}'
kubectl get rovl slurm -w   # watch until phase returns to Ready

Delete an Overlay

kubectl delete rovl slurm   # finalizer cleans up on-disk data on all nodes

ReaperDaemonJob

A “DaemonSet for Jobs” that runs a command to completion on every matching node, with support for dependency ordering, retry policies, and shared overlays. Designed for node configuration tasks like Ansible playbooks that compose via shared overlays.

  • Group: reaper.io
  • Version: v1alpha1
  • Kind: ReaperDaemonJob
  • Short name: rdjob (kubectl get rdjob)

Spec

| Field | Type | Default | Description |
|---|---|---|---|
| command | string[] | (required) | Command to execute on each node |
| args | string[] | | Arguments to the command |
| env | EnvVar[] | | Environment variables (same format as ReaperPod) |
| workingDir | string | | Working directory for the command |
| overlayName | string | | Named overlay group for shared filesystem |
| nodeSelector | map[string]string | | Target specific nodes by labels (all nodes if empty) |
| dnsMode | string | | DNS resolution mode (host or kubernetes) |
| runAsUser | int | | UID for the process |
| runAsGroup | int | | GID for the process |
| volumes | Volume[] | | Volume mounts (same format as ReaperPod) |
| tolerations | Toleration[] | | Tolerations for the underlying Pods |
| triggerOn | string | NodeReady | Trigger events: NodeReady or Manual |
| after | string[] | | Dependency ordering — names of other ReaperDaemonJobs that must complete first |
| retryLimit | int | 0 | Maximum retries per node on failure |
| concurrencyPolicy | string | Skip | What to do on re-trigger while running: Skip or Replace |

Status

| Field | Type | Description |
|---|---|---|
| phase | string | Overall phase: Pending, Running, Completed, PartiallyFailed |
| readyNodes | int | Number of nodes that completed successfully |
| totalNodes | int | Total number of targeted nodes |
| observedGeneration | int | Last spec generation reconciled |
| nodeStatuses[] | array | Per-node execution status |
| nodeStatuses[].nodeName | string | Node name |
| nodeStatuses[].phase | string | Per-node phase: Pending, Running, Succeeded, Failed |
| nodeStatuses[].reaperPodName | string | Name of the ReaperPod created for this node |
| nodeStatuses[].exitCode | int | Exit code on this node |
| nodeStatuses[].retryCount | int | Number of retries so far |
| message | string | Human-readable status message |
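For instance, a successful two-node run of a job named node-info might report a status shaped like the sketch below. The values (node names, the ReaperPod naming scheme, the message text) are illustrative assumptions, not captured output.

```yaml
# Illustrative ReaperDaemonJob status for a two-node cluster; values
# are made up to show the shape of the fields above.
status:
  phase: Completed
  readyNodes: 2
  totalNodes: 2
  observedGeneration: 1
  nodeStatuses:
    - nodeName: worker-1
      phase: Succeeded
      reaperPodName: node-info-worker-1   # naming scheme is an assumption
      exitCode: 0
      retryCount: 0
    - nodeName: worker-2
      phase: Succeeded
      reaperPodName: node-info-worker-2
      exitCode: 0
      retryCount: 0
  message: "2/2 nodes completed"          # illustrative
```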

Controller Layering

ReaperDaemonJob → ReaperPod → Pod. The DaemonJob controller creates one ReaperPod per matching node, pinned via nodeName. The existing ReaperPod controller then creates the backing Pods. No changes to the runtime or shim.

Dependency Ordering

The after field lists other ReaperDaemonJobs that must reach Completed phase before this job starts on any node. This enables composable workflows where one job’s output is another’s input (via shared overlays).

Examples

Simple Node Info

apiVersion: reaper.io/v1alpha1
kind: ReaperDaemonJob
metadata:
  name: node-info
spec:
  command: ["/bin/sh", "-c"]
  args:
    - |
      echo "Node: $(hostname)"
      echo "Kernel: $(uname -r)"

Composable Node Config with Dependencies

apiVersion: reaper.io/v1alpha1
kind: ReaperDaemonJob
metadata:
  name: mount-filesystems
spec:
  command: ["/bin/sh", "-c"]
  args: ["mkdir -p /mnt/shared && mount -t nfs server:/export /mnt/shared"]
  overlayName: node-config
  nodeSelector:
    role: compute
---
apiVersion: reaper.io/v1alpha1
kind: ReaperDaemonJob
metadata:
  name: install-packages
spec:
  command: ["/bin/sh", "-c"]
  args: ["apt-get update && apt-get install -y htop"]
  overlayName: node-config
  after:
    - mount-filesystems
  nodeSelector:
    role: compute
  retryLimit: 2

Helm Chart Reference

The Reaper Helm chart is located at deploy/helm/reaper/.

Installation

helm upgrade --install reaper deploy/helm/reaper/ \
  --namespace reaper-system --create-namespace \
  --wait --timeout 120s

Values

Node Installer DaemonSet

| Value | Default | Description |
|---|---|---|
| node.image.repository | ghcr.io/miguelgila/reaper-node | Node installer image |
| node.image.tag | "" (uses appVersion) | Image tag |
| node.image.pullPolicy | IfNotPresent | Pull policy |
| node.installPath | /usr/local/bin | Binary install path on host |
| node.configureContainerd | false | Whether to configure and restart containerd |

CRD Controller Deployment

| Value | Default | Description |
|---|---|---|
| controller.image.repository | ghcr.io/miguelgila/reaper-controller | Controller image |
| controller.image.tag | "" (uses appVersion) | Image tag |
| controller.image.pullPolicy | IfNotPresent | Pull policy |
| controller.replicas | 1 | Number of controller replicas |
| controller.resources.requests.cpu | 10m | CPU request |
| controller.resources.requests.memory | 32Mi | Memory request |
| controller.resources.limits.cpu | 100m | CPU limit |
| controller.resources.limits.memory | 64Mi | Memory limit |

Agent DaemonSet

| Value | Default | Description |
|---|---|---|
| agent.enabled | true | Enable the agent DaemonSet |
| agent.image.repository | ghcr.io/miguelgila/reaper-agent | Agent image |
| agent.image.tag | "" (uses appVersion) | Image tag |
| agent.image.pullPolicy | IfNotPresent | Pull policy |
| agent.resources.requests.cpu | 10m | CPU request |
| agent.resources.requests.memory | 32Mi | Memory request |
| agent.resources.limits.cpu | 100m | CPU limit |
| agent.resources.limits.memory | 64Mi | Memory limit |

RuntimeClass

| Value | Default | Description |
|---|---|---|
| runtimeClass.name | reaper-v2 | RuntimeClass name |
| runtimeClass.handler | reaper-v2 | Containerd handler name |

Reaper Configuration

| Value | Default | Description |
|---|---|---|
| config.dnsMode | kubernetes | DNS resolution mode |
| config.runtimeLog | /run/reaper/runtime.log | Runtime log path |
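For example, defaults can be overridden with a custom values file. This is a sketch that only uses keys from the tables above; any other structure is an assumption.

```yaml
# my-values.yaml — example overrides for the chart values listed above.
node:
  installPath: /opt/reaper/bin    # install the binaries elsewhere on the host
  configureContainerd: true       # let the chart configure and restart containerd
controller:
  replicas: 1
config:
  dnsMode: host                   # switch from the default "kubernetes" mode
```

Pass it to the install command shown earlier with an extra flag: helm upgrade --install reaper deploy/helm/reaper/ -f my-values.yaml --namespace reaper-system --create-namespace.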

What Gets Installed

The chart installs:

  1. CRDs (deploy/helm/reaper/crds/) — ReaperPod CRD definition
  2. Namespace — reaper-system (created by --create-namespace)
  3. Node DaemonSet — Init container copies shim + runtime binaries to host
  4. Controller Deployment — Watches ReaperPod CRDs, creates Pods
  5. Agent DaemonSet — Health monitoring and Prometheus metrics
  6. RuntimeClass — Registers reaper-v2 with Kubernetes
  7. RBAC — ServiceAccount, ClusterRole, ClusterRoleBinding for controller and agent