rollout train + rollout snapshot CLI
Phase-4 user-facing entry points for supervised fine-tuning, reward-model
training, and snapshot management. Mirrors the Phase-3 rollout infer batch
clap derive shape, run_id lifecycle, and backend-selection precedence.
Subcommand overview
| Subcommand | Purpose |
|---|---|
rollout train sft | Supervised fine-tuning. Validates + runs an SftAlgo budget. |
rollout train rm | Reward-model (Bradley-Terry) training via RmAlgo. |
rollout snapshot list | List snapshots (optionally filtered by run / kind). |
rollout snapshot show | Print one snapshot's metadata by content-id. |
rollout snapshot prune | Delete snapshots per a retention policy. |
rollout train sft
rollout train sft \
--config examples/sft-tiny.toml \
[--resume <snapshot_id>] \
[--dry-run]
| Flag | Default | Purpose |
|---|---|---|
--config | required | Path to the run TOML (schema below). |
--resume | none | Snapshot content-id to restore from before the algorithm runs. |
--dry-run | false | Validate config + dataset path; never construct backend. Works with no backend feature. |
SFT TOML schema
schema_version = 1
[storage]
backend = "embedded"
[storage.embedded]
path = "./rollout.db"
[algorithm]
kind = "sft"
[algorithm.sft]
minibatch_size = 1
gradient_accumulation = 1
[algorithm.sft.base_model]
uri = "Qwen/Qwen2.5-0.5B-Instruct"
[algorithm.sft.optimizer]
kind = "sgd"
lr = 0.01
[algorithm.sft.budget]
max_steps = 10
[algorithm.sft.dataset]
kind = "jsonl_path"
path = "./data/sft.jsonl"
[algorithm.sft.packing]
kind = "off"
max_seq_len = 64
[algorithm.sft.loss_on]
kind = "full"
The schema is owned by rollout-core::config::RunConfig; the CLI is just a
TOML-loader + clap surface (spec 11 single-source-of-truth). Unknown fields
fail load with a deterministic locator.
rollout train rm
rollout train rm \
--config examples/rm-tiny.toml \
[--resume <snapshot_id>] \
[--dry-run]
Same flags as sft. The TOML differs only in the algorithm block:
schema_version = 1
[storage]
backend = "embedded"
[storage.embedded]
path = "./rollout.db"
[algorithm]
kind = "rm"
[algorithm.rm]
minibatch_size = 1
[algorithm.rm.base_model]
uri = "Qwen/Qwen2.5-0.5B-Instruct"
[algorithm.rm.optimizer]
kind = "sgd"
lr = 0.01
[algorithm.rm.budget]
max_steps = 10
[algorithm.rm.dataset]
kind = "jsonl_path"
path = "./data/pairs.jsonl"
[algorithm.rm.head]
kind = "bradley_terry"
JSONL input contract: each line { "prompt", "chosen", "rejected" }.
--dry-run semantics
In order, with the backend NEVER constructed:
- Read + parse TOML against
RunConfig(deny_unknown_fields). - Match
[algorithm].kindagainst the subcommand (sftvsrm). - Validate
minibatch_size >= 1andoptimizer.lr > 0. - Confirm
dataset.pathexists on disk (forjsonl_pathform). - Print
dry-run OK: algorithm=<sft|rm> model=… minibatch=… dataset=…and exit0.
Because --dry-run short-circuits before select_backend, it runs cleanly on
builds with NEITHER --features train NOR --features test-mock-backend.
--resume <snapshot_id> lifecycle
If --resume <hex> is set, the CLI scans the local rollout.db for a
Snapshot whose id matches the supplied content-id (32-byte blake3, 64-char
hex). The matching row is fed into algorithm.snapshot_restore(snap) before
algorithm.run(&ctx) begins. Determinism then follows the per-algorithm
contract: SftAlgo proves byte-identical resume via
tests/snapshot_resume.rs::bit_identical_resume_at_step_5; RmAlgo proves
parity via tests/snapshot_resume.rs::rm_bit_identical_resume.
Missing snapshot → Fatal(ConfigInvalid) with the failed hex echoed back.
Backend selection
rollout-cli builds with at most one training backend at a time, by Cargo
feature. Selection at runtime is precedence-ordered:
| Order | Build flags | Env | Backend |
|---|---|---|---|
| 1 | --features test-mock-backend | ROLLOUT_TEST_MOCK_BACKEND=1 | rollout_runtime_batch::MockBackend (deterministic, no GPU/Python). |
| 2 | --features vllm,train | (any) | rollout_backend_vllm::VllmBackend in train mode (live HF / accelerate). |
| 3 | none | (any) | Fatal(ConfigInvalid) with a build-mode hint; --dry-run still works. |
Build recipes:
# CI / tests (no Python, no GPU)
cargo build -p rollout-cli --features test-mock-backend
# Production (live HuggingFace + accelerate over Python)
cargo build -p rollout-cli --features vllm,train
# Both — vllm takes precedence at runtime unless ROLLOUT_TEST_MOCK_BACKEND=1.
cargo build -p rollout-cli --features test-mock-backend,vllm,train
The train feature implies vllm per Cargo.toml; the two are kept distinct
so the existing Phase-3 infer batch users get vllm without pulling the
training Python deps.
rollout snapshot list
rollout snapshot list \
[--storage-path ./rollout.db] \
[--object-path ./object-store] \
[--run-id <ULID>] \
[--kind train_state|buffer|process|episodic_memory] \
[--limit N]
Output: pretty-printed JSON array of Snapshot rows from rollout-core. Sort
order is newest-first by created_at.
| Flag | Default | Notes |
|---|---|---|
--storage-path | ./rollout.db | Opens an EmbeddedStorage read-write. |
--object-path | ./object-store | Opens a FsObjectStore (read-only for list, but consistently surfaced). |
--run-id | none | Crockford ULID; restrict to one run. Without it, every run's snapshots are scanned. |
--kind | none | snake_case match against SnapshotKind variants. |
--limit | none | Cap result length. |
rollout snapshot show <snapshot_id>
rollout snapshot show \
[--storage-path ./rollout.db] \
[--object-path ./object-store] \
<SNAPSHOT_ID>
SNAPSHOT_ID is the 64-char hex blake3 digest from Snapshot.id. Prints the
full Snapshot row as pretty-printed JSON. Missing id → exit 2 with
snapshot not found: <id>.
rollout snapshot prune
rollout snapshot prune \
--run-id <ULID> \
[--storage-path ./rollout.db] \
[--object-path ./object-store] \
[--keep-last N=3] \
[--keep-labeled]
Applies a RetentionPolicy scoped to a single run (--run-id is required to
avoid cross-run accidents). --keep-last N retains the N newest snapshots
regardless of label; --keep-labeled (default true) further retains every
labeled snapshot. Metadata rows are deleted; blob bytes stay in the object
store (the Phase-2 ObjectStore trait has no delete — pending Phase-5).
Prints pruned <N> snapshots.
Storage path conventions
| Path | Owner | Purpose |
|---|---|---|
./rollout.db | EmbeddedStorage | redb on-disk DB (always-fsync). |
./object-store/ | FsObjectStore | Content-addressed two-level sharded FS. |
<config_dir>/rollout.db | training runs | `train sft |
<config_dir>/object-store/ | training runs | Same sibling-of-config convention. |
snapshot list|show|prune accept explicit --storage-path / --object-path
so out-of-band tooling and tests can target any directory pair.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success (or successful --dry-run). |
| 2 | Config-invalid, missing dataset / snapshot, substrate error, or algorithm error. |
Observability
RUST_LOG=info rollout train sft … produces structured tracing events. Each
SFT / RM run emits at minimum train_start, train_step, train_end; the
exact fields are pinned by the per-algorithm chapters.