Inference
Phase 3 delivers rollout's first end-user-visible building block: a batch-inference
pipeline that loads a model into vLLM via PyO3, pulls
content-addressed prompts off the Phase-2 in-memory queue, and writes JSONL
completions to disk. The pipeline is resumable — rollout infer batch --resume <run_id> scans the per-sample state in rollout-storage and only
enqueues outstanding work, producing zero duplicates on restart.
Phase 3 ships two new crates plus a CLI subcommand:
| Crate | Layer | Phase-3 responsibility |
|---|---|---|
rollout-backend-vllm | 2 | InferenceBackend impl over vLLM's AsyncLLMEngine via PyO3 (in-process). |
rollout-runtime-batch | 3 | CAS sample-state machine, JSONL I/O, plan-time validation, mock backend. |
rollout-cli (extended) | 4 | rollout infer batch --config <toml> [--resume <run_id>] subcommand. |
Per-component chapters land in plan 03-05 (smoke + docs + bench). For now, see
the Wave-0 trait extension in spec 02 §2a and
the WorkerRole addition in spec 01 §3a.
TODO (plan 03-05): vLLM backend chapter · batch-runtime chapter · CPU-mode chapter · macOS Docker dev workflow · resume semantics walkthrough.