rollout train + rollout snapshot CLI

Phase-4 user-facing entry points for supervised fine-tuning, reward-model training, and snapshot management. Mirrors the Phase-3 rollout infer batch clap derive shape, run_id lifecycle, and backend-selection precedence.

Subcommand overview

SubcommandPurpose
rollout train sftSupervised fine-tuning. Validates + runs an SftAlgo budget.
rollout train rmReward-model (Bradley-Terry) training via RmAlgo.
rollout snapshot listList snapshots (optionally filtered by run / kind).
rollout snapshot showPrint one snapshot's metadata by content-id.
rollout snapshot pruneDelete snapshots per a retention policy.

rollout train sft

rollout train sft \
    --config examples/sft-tiny.toml \
    [--resume <snapshot_id>] \
    [--dry-run]
FlagDefaultPurpose
--configrequiredPath to the run TOML (schema below).
--resumenoneSnapshot content-id to restore from before the algorithm runs.
--dry-runfalseValidate config + dataset path; never construct backend. Works with no backend feature.

SFT TOML schema

schema_version = 1

[storage]
backend = "embedded"
[storage.embedded]
path = "./rollout.db"

[algorithm]
kind = "sft"

[algorithm.sft]
minibatch_size       = 1
gradient_accumulation = 1

[algorithm.sft.base_model]
uri = "Qwen/Qwen2.5-0.5B-Instruct"

[algorithm.sft.optimizer]
kind = "sgd"
lr   = 0.01

[algorithm.sft.budget]
max_steps = 10

[algorithm.sft.dataset]
kind = "jsonl_path"
path = "./data/sft.jsonl"

[algorithm.sft.packing]
kind        = "off"
max_seq_len = 64

[algorithm.sft.loss_on]
kind = "full"

The schema is owned by rollout-core::config::RunConfig; the CLI is just a TOML-loader + clap surface (spec 11 single-source-of-truth). Unknown fields fail load with a deterministic locator.

rollout train rm

rollout train rm \
    --config examples/rm-tiny.toml \
    [--resume <snapshot_id>] \
    [--dry-run]

Same flags as sft. The TOML differs only in the algorithm block:

schema_version = 1
[storage]
backend = "embedded"
[storage.embedded]
path = "./rollout.db"

[algorithm]
kind = "rm"

[algorithm.rm]
minibatch_size = 1

[algorithm.rm.base_model]
uri = "Qwen/Qwen2.5-0.5B-Instruct"

[algorithm.rm.optimizer]
kind = "sgd"
lr   = 0.01

[algorithm.rm.budget]
max_steps = 10

[algorithm.rm.dataset]
kind = "jsonl_path"
path = "./data/pairs.jsonl"

[algorithm.rm.head]
kind = "bradley_terry"

JSONL input contract: each line { "prompt", "chosen", "rejected" }.

--dry-run semantics

In order, with the backend NEVER constructed:

  1. Read + parse TOML against RunConfig (deny_unknown_fields).
  2. Match [algorithm].kind against the subcommand (sft vs rm).
  3. Validate minibatch_size >= 1 and optimizer.lr > 0.
  4. Confirm dataset.path exists on disk (for jsonl_path form).
  5. Print dry-run OK: algorithm=<sft|rm> model=… minibatch=… dataset=… and exit 0.

Because --dry-run short-circuits before select_backend, it runs cleanly on builds with NEITHER --features train NOR --features test-mock-backend.

--resume <snapshot_id> lifecycle

If --resume <hex> is set, the CLI scans the local rollout.db for a Snapshot whose id matches the supplied content-id (32-byte blake3, 64-char hex). The matching row is fed into algorithm.snapshot_restore(snap) before algorithm.run(&ctx) begins. Determinism then follows the per-algorithm contract: SftAlgo proves byte-identical resume via tests/snapshot_resume.rs::bit_identical_resume_at_step_5; RmAlgo proves parity via tests/snapshot_resume.rs::rm_bit_identical_resume.

Missing snapshot → Fatal(ConfigInvalid) with the failed hex echoed back.

Backend selection

rollout-cli builds with at most one training backend at a time, by Cargo feature. Selection at runtime is precedence-ordered:

OrderBuild flagsEnvBackend
1--features test-mock-backendROLLOUT_TEST_MOCK_BACKEND=1rollout_runtime_batch::MockBackend (deterministic, no GPU/Python).
2--features vllm,train(any)rollout_backend_vllm::VllmBackend in train mode (live HF / accelerate).
3none(any)Fatal(ConfigInvalid) with a build-mode hint; --dry-run still works.

Build recipes:

# CI / tests (no Python, no GPU)
cargo build -p rollout-cli --features test-mock-backend

# Production (live HuggingFace + accelerate over Python)
cargo build -p rollout-cli --features vllm,train

# Both — vllm takes precedence at runtime unless ROLLOUT_TEST_MOCK_BACKEND=1.
cargo build -p rollout-cli --features test-mock-backend,vllm,train

The train feature implies vllm per Cargo.toml; the two are kept distinct so the existing Phase-3 infer batch users get vllm without pulling the training Python deps.

rollout snapshot list

rollout snapshot list \
    [--storage-path ./rollout.db] \
    [--object-path ./object-store] \
    [--run-id <ULID>] \
    [--kind train_state|buffer|process|episodic_memory] \
    [--limit N]

Output: pretty-printed JSON array of Snapshot rows from rollout-core. Sort order is newest-first by created_at.

FlagDefaultNotes
--storage-path./rollout.dbOpens an EmbeddedStorage read-write.
--object-path./object-storeOpens a FsObjectStore (read-only for list, but consistently surfaced).
--run-idnoneCrockford ULID; restrict to one run. Without it, every run's snapshots are scanned.
--kindnonesnake_case match against SnapshotKind variants.
--limitnoneCap result length.

rollout snapshot show <snapshot_id>

rollout snapshot show \
    [--storage-path ./rollout.db] \
    [--object-path ./object-store] \
    <SNAPSHOT_ID>

SNAPSHOT_ID is the 64-char hex blake3 digest from Snapshot.id. Prints the full Snapshot row as pretty-printed JSON. Missing id → exit 2 with snapshot not found: <id>.

rollout snapshot prune

rollout snapshot prune \
    --run-id <ULID> \
    [--storage-path ./rollout.db] \
    [--object-path ./object-store] \
    [--keep-last N=3] \
    [--keep-labeled]

Applies a RetentionPolicy scoped to a single run (--run-id is required to avoid cross-run accidents). --keep-last N retains the N newest snapshots regardless of label; --keep-labeled (default true) further retains every labeled snapshot. Metadata rows are deleted; blob bytes stay in the object store (the Phase-2 ObjectStore trait has no delete — pending Phase-5). Prints pruned <N> snapshots.

Storage path conventions

PathOwnerPurpose
./rollout.dbEmbeddedStorageredb on-disk DB (always-fsync).
./object-store/FsObjectStoreContent-addressed two-level sharded FS.
<config_dir>/rollout.dbtraining runs`train sft
<config_dir>/object-store/training runsSame sibling-of-config convention.

snapshot list|show|prune accept explicit --storage-path / --object-path so out-of-band tooling and tests can target any directory pair.

Exit codes

CodeMeaning
0Success (or successful --dry-run).
2Config-invalid, missing dataset / snapshot, substrate error, or algorithm error.

Observability

RUST_LOG=info rollout train sft … produces structured tracing events. Each SFT / RM run emits at minimum train_start, train_step, train_end; the exact fields are pinned by the per-algorithm chapters.