- SFT+RL Trainer: Checkpoints the FSDP model shard (using DCP), the optimizer and scheduler state, and training progress (training step, total samples, total tokens)
- Orchestrator: Checkpoints orchestrator progress (training step, total tokens, total samples, total problems)
- Inference: Inference is stateless. Upon restart, the orchestrator will reload the correct weights into the inference engine. No checkpointing is required.
Checkpoints are written to the `checkpoints` directory, and each checkpoint step lives in a step subdirectory, i.e. `checkpoints/step_{step}`.
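To make the layout concrete, here is a minimal sketch of writing per-step progress into a `checkpoints/step_{step}` subdirectory. The helper name `save_progress` and the `progress.json` file name are hypothetical illustrations, not the framework's actual API; real checkpoints would also contain the DCP model shard and optimizer state.

```python
import json
from pathlib import Path

def save_progress(ckpt_root: str, step: int, total_samples: int, total_tokens: int) -> Path:
    """Write trainer progress into a per-step subdirectory, checkpoints/step_{step}.

    Hypothetical sketch: the real checkpoint also holds the FSDP model shard
    (via DCP) and optimizer/scheduler state alongside this progress record.
    """
    step_dir = Path(ckpt_root) / f"step_{step}"
    step_dir.mkdir(parents=True, exist_ok=True)
    progress = {"step": step, "total_samples": total_samples, "total_tokens": total_tokens}
    (step_dir / "progress.json").write_text(json.dumps(progress))
    return step_dir
```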
Checkpointing is configured with the `--ckpt` config key. You can specify the checkpointing interval (`--ckpt.interval`), whether to save checkpoints asynchronously (`--ckpt.save-async`), and how many recent step checkpoints to keep on disk (`--ckpt.keep`). By default, checkpointing is disabled to save disk space.
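Since the options above are config keys, the same settings could also live in a config file. The fragment below is a sketch assuming the file-based keys mirror the CLI flags (e.g. `save-async` becoming `save_async`); check the framework's config reference for the exact spelling.

```toml
# Hypothetical config-file equivalent of --ckpt.interval, --ckpt.save-async, --ckpt.keep
[ckpt]
interval = 10       # write a checkpoint every 10 steps
save_async = true   # save asynchronously to avoid blocking training
keep = 3            # keep only the 3 most recent step checkpoints on disk
```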
SFT
Let’s split the reverse text SFT example, which runs 40 steps by default, into two runs of 20 steps each. First, run the first 20 steps and append the `--ckpt` flag. This enables the default checkpoint configuration, which writes only the final checkpoint to disk and no intermediate checkpoints.