Inference

Phase 3 delivers rollout's first end-user-visible building block: a batch-inference pipeline that loads a model into vLLM via PyO3, pulls content-addressed prompts off the Phase-2 in-memory queue, and writes JSONL completions to disk. The pipeline is resumablerollout infer batch --resume <run_id> scans the per-sample state in rollout-storage and only enqueues outstanding work, producing zero duplicates on restart.

Phase 3 ships two new crates plus a CLI subcommand:

CrateLayerPhase-3 responsibility
rollout-backend-vllm2InferenceBackend impl over vLLM's AsyncLLMEngine via PyO3 (in-process).
rollout-runtime-batch3CAS sample-state machine, JSONL I/O, plan-time validation, mock backend.
rollout-cli (extended)4rollout infer batch --config <toml> [--resume <run_id>] subcommand.

Per-component chapters land in plan 03-05 (smoke + docs + bench). For now, see the Wave-0 trait extension in spec 02 §2a and the WorkerRole addition in spec 01 §3a.

TODO (plan 03-05): vLLM backend chapter · batch-runtime chapter · CPU-mode chapter · macOS Docker dev workflow · resume semantics walkthrough.