`rl` and `sft` have built-in SLURM support. Adding a `[slurm]` section to your config switches from local execution to SLURM job submission; no separate entrypoint is needed.
Quick Start
Add `[slurm]` and `[deployment]` sections to your config.
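
As a minimal sketch, a single-node RL config with both sections might look like the following. The field names come from the tables in the Configuration section. The output path, job name, and GPU split are illustrative assumptions, as is placing `output_dir` at the top level of the file.

```toml
# Assumed: output_dir is a top-level key; the path is a placeholder.
output_dir = "outputs/my-run"

[slurm]
job_name = "my-rl-job"   # illustrative; defaults to "prime-rl"

[deployment]
type = "single_node"     # the default deployment type
num_train_gpus = 4       # illustrative split of one 8-GPU node
num_infer_gpus = 4
```
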
How it works
When `[slurm]` is present, the entrypoint:
- Resolves the full config
- Renders a SLURM batch script from a Jinja2 template
- Writes the script and resolved config to `{output_dir}/`
- Submits via `sbatch` (or prints the script with `--slurm.dry-run`)
The generated script then runs `uv run rl @` or `uv run sft @` on the allocated node.
For multi-node jobs, sub-configs are written separately and srun dispatches processes across nodes.
Configuration
[slurm] — Job submission (shared between RL and SFT)
| Field | Description | Default |
|---|---|---|
| `job_name` | SLURM job name | `"prime-rl"` |
| `project_dir` | Path to the project root on the cluster | `"."` |
| `template_path` | Path to a custom Jinja2 template | auto-selected |
| `partition` | SLURM partition | `"cluster"` |
| `dry_run` | Generate script without submitting | `false` |
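
For reference, here is a sketch of a `[slurm]` section with every field written out at its documented default. It should behave the same as a section that leaves these fields unset, assuming the defaults in the table are applied as listed.

```toml
[slurm]
job_name = "prime-rl"    # SLURM job name
project_dir = "."        # project root on the cluster
partition = "cluster"    # SLURM partition
dry_run = false          # set to true to generate the script without submitting
# template_path is left unset so the template stays auto-selected
```
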
[deployment] — Node and GPU allocation
RL uses a discriminated union with type = "single_node" (default) or type = "multi_node":
| Field | single_node | multi_node |
|---|---|---|
| `gpus_per_node` | Number of GPUs per node (default: 8) | Same |
| `num_train_gpus` | Training GPUs | n/a |
| `num_infer_gpus` | Inference GPUs | n/a |
| `num_train_nodes` | n/a | Training nodes |
| `num_infer_nodes` | n/a | Inference nodes |
| `nodes_per_fsdp_group` | n/a | Nodes per FSDP island (optional) |
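
A multi-node RL deployment could be sketched as follows. The node counts and job name are illustrative assumptions; `[slurm]` is included because multi-node deployment requires it.

```toml
[slurm]
job_name = "rl-multi-node"   # illustrative name

[deployment]
type = "multi_node"
gpus_per_node = 8            # documented default, shown for clarity
num_train_nodes = 2          # illustrative
num_infer_nodes = 2          # illustrative
# nodes_per_fsdp_group is optional and omitted here
```
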

SFT uses the same `type` values with its own set of fields:

| Field | single_node | multi_node |
|---|---|---|
| `gpus_per_node` | Number of GPUs per node (default: 8) | Same |
| `num_gpus` | Number of GPUs (default: 1) | n/a |
| `num_nodes` | n/a | Training nodes (default: 2) |
| `nodes_per_fsdp_group` | n/a | Nodes per FSDP island (optional) |
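
For SFT, a multi-node deployment might be written as in the sketch below. The node count, FSDP group size, and job name are values chosen purely for illustration; any constraint between `num_nodes` and `nodes_per_fsdp_group` is not stated here, so treat the combination as an assumption.

```toml
[slurm]
job_name = "sft-multi-node"  # illustrative name

[deployment]
type = "multi_node"
num_nodes = 4                # illustrative; documented default is 2
nodes_per_fsdp_group = 2     # optional; illustrative value
```
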
The template is auto-selected based on `deployment.type`. You can override it with `slurm.template_path`.
Constraints
- `output_dir` should be explicitly set when using SLURM (defaults to `"outputs"`)
- Multi-node deployment requires `[slurm]` to be set
RL Examples
Single-node SLURM
The simplest case: run on a single allocated node. No `[deployment]` section is needed, since the deployment type defaults to `single_node`.
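
A sketch of such a config, with no `[deployment]` section at all, is shown below. The output path is a placeholder, and `output_dir` as a top-level key is an assumption.

```toml
# Assumed: output_dir is a top-level key; the path is a placeholder.
output_dir = "outputs/single-node-rl"

[slurm]
partition = "cluster"    # documented default, shown for clarity
```
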
Multi-node SLURM (Hendrycks Math)
See `examples/hendrycks_math/rl.toml` for the full example.
SFT Examples
Single-node SLURM
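
One way a single-node SFT submission could be configured is sketched here. The output path, job name, and GPU count are illustrative assumptions, and `output_dir` is assumed to be a top-level key.

```toml
# Assumed: output_dir is a top-level key; the path is a placeholder.
output_dir = "outputs/sft-single-node"

[slurm]
job_name = "my-sft-job"  # illustrative name

[deployment]
type = "single_node"
num_gpus = 8             # illustrative; documented default is 1
```
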
Multi-node SLURM (MoE SFT)
See `examples/hendrycks_math/sft.toml` for the full example.
Custom SLURM Templates
The default templates handle standard setups with InfiniBand detection, environment setup, and `srun`-based process dispatch. For advanced use cases (custom partitions, account settings, module loads, etc.), provide your own Jinja2 template via `slurm.template_path`.
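
Pointing the config at a custom template might look like the sketch below. The template path and file name are hypothetical; only the `template_path` field itself comes from the `[slurm]` table.

```toml
[slurm]
job_name = "custom-template-job"                  # illustrative name
template_path = "templates/my_cluster.sbatch.j2"  # hypothetical path to your template
```
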
See `src/prime_rl/templates/` for the default templates as a starting point.