Both rl and sft have built-in SLURM support. Adding a [slurm] section to your config switches from local execution to SLURM job submission — no separate entrypoint needed.

Quick Start

# Local run
uv run rl @ examples/reverse_text/rl.toml

# SLURM run (same entrypoint, just add [slurm] to the config)
uv run rl @ examples/reverse_text/slurm_rl.toml
The SLURM config is a thin overlay that inherits from a base config and adds [slurm] + [deployment] sections:
# examples/reverse_text/slurm_rl.toml
toml_files = ["rl.toml"]

output_dir = "outputs/reverse-text-rl"

[slurm]
job_name = "reverse-text-rl"

How it works

When [slurm] is present, the entrypoint:
  1. Resolves the full config
  2. Renders a SLURM batch script from a Jinja2 template
  3. Writes the script and resolved config to {output_dir}/
  4. Submits via sbatch (or prints the script with --slurm.dry-run)
For single-node jobs, the entire config is dumped to a TOML file and the template simply runs uv run rl @ or uv run sft @ on the allocated node. For multi-node jobs, sub-configs are written separately and srun dispatches processes across nodes.
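For example, to inspect the rendered batch script before anything is submitted, the dry_run flag of the [slurm] section can be set directly in the overlay. A minimal sketch, modeled on the reverse_text example above:

```toml
# slurm_rl.toml overlay with dry-run enabled
toml_files = ["rl.toml"]
output_dir = "outputs/reverse-text-rl"

[slurm]
job_name = "reverse-text-rl"
dry_run = true   # render and print the sbatch script instead of submitting it
```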

Configuration

[slurm] — Job submission (shared between RL and SFT)

Field          Description                                    Default
job_name       SLURM job name                                 "prime-rl"
project_dir    Path to the project root on the cluster        "."
template_path  Path to a custom Jinja2 template               auto-selected
partition      SLURM partition                                "cluster"
dry_run        Generate the batch script without submitting   false
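All of these fields can be overridden together in one section. A sketch with placeholder values (the partition name and paths are hypothetical and depend on your cluster):

```toml
[slurm]
job_name = "my-rl-run"
project_dir = "/shared/prime-rl"           # hypothetical project root on the cluster
partition = "gpu"                          # hypothetical partition name
template_path = "templates/my.sbatch.j2"   # optional; auto-selected when omitted
dry_run = false
```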

[deployment] — Node and GPU allocation

RL uses a discriminated union with type = "single_node" (default) or type = "multi_node":
Field                 single_node                            multi_node
gpus_per_node         Number of GPUs per node (default: 8)   same as single_node
num_train_gpus        Training GPUs                          n/a
num_infer_gpus        Inference GPUs                         n/a
num_train_nodes       n/a                                    Training nodes
num_infer_nodes       n/a                                    Inference nodes
nodes_per_fsdp_group  n/a                                    Nodes per FSDP island (optional)
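As an illustration, a single-node RL deployment that splits one node between training and inference could look like this (a sketch; the 4/4 split is illustrative, not a recommendation):

```toml
[deployment]
type = "single_node"   # the default; may be omitted
gpus_per_node = 8
num_train_gpus = 4     # GPUs allocated to the trainer
num_infer_gpus = 4     # GPUs allocated to inference
```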
SFT follows the same pattern but only has training nodes:
Field                 single_node                            multi_node
gpus_per_node         Number of GPUs per node (default: 8)   same as single_node
num_gpus              Number of GPUs (default: 1)            n/a
num_nodes             n/a                                    Training nodes (default: 2)
nodes_per_fsdp_group  n/a                                    Nodes per FSDP island (optional)
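A single-node SFT counterpart, again as a sketch: single-node runs are sized with num_gpus, multi-node runs with num_nodes:

```toml
# Single-node SFT: train on all 8 GPUs of the allocated node
[deployment]
type = "single_node"
num_gpus = 8
```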
The SLURM template is auto-selected based on deployment.type. You can override it with slurm.template_path.

Constraints

  • output_dir should be explicitly set when using SLURM (defaults to "outputs")
  • Multi-node deployment requires [slurm] to be set

RL Examples

Single-node SLURM

The simplest case: run on a single allocated node. No [deployment] needed — defaults to single_node.
output_dir = "/shared/outputs/my-rl-run"

[slurm]
job_name = "my-rl-run"

Multi-node SLURM (Hendrycks Math)

output_dir = "outputs/rl-math-moe"
max_steps = 500
seq_len = 2048

[slurm]
job_name = "hendrycks-math-rl-moe"

[deployment]
type = "multi_node"
num_train_nodes = 1
num_infer_nodes = 1

[weight_broadcast]
type = "nccl"

[model]
name = "Qwen/Qwen3-30B-A3B-Thinking-2507"

[trainer.model]
impl = "custom"
attn = "flash_attention_3"
optim_cpu_offload = true

[trainer.model.ac_offloading]
max_inflight_activations = 5

[trainer.model.ac]
freq = 1

[orchestrator]
batch_size = 512
rollouts_per_example = 16

[orchestrator.sampling]
max_tokens = 2048

[[orchestrator.env]]
id = "math-env"
name = "hendrycks-math"
args = { dataset_name = "PrimeIntellect/Hendrycks-Math", dataset_subset = "default" }

[inference.parallel]
tp = 4
dp = 2
See examples/hendrycks_math/rl.toml for the full example.
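As a sanity check on the inference layout in this example: the tensor-parallel and data-parallel degrees should exactly fill the inference GPUs (tp × dp = 4 × 2 = 8 GPUs = 1 node × 8 GPUs per node). A small standalone helper (hypothetical, not part of the prime-rl codebase) that expresses this arithmetic:

```python
def inference_layout_ok(num_infer_nodes: int, gpus_per_node: int,
                        tp: int, dp: int) -> bool:
    """True when tensor-parallel x data-parallel ranks exactly cover the inference GPUs."""
    return tp * dp == num_infer_nodes * gpus_per_node

# Values from the multi-node Hendrycks Math example above
print(inference_layout_ok(num_infer_nodes=1, gpus_per_node=8, tp=4, dp=2))  # True
```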

SFT Examples

Single-node SLURM

output_dir = "/shared/outputs/my-sft-run"

[slurm]
job_name = "my-sft-run"

Multi-node SLURM (MoE SFT)

output_dir = "outputs/sft-moe-math"
max_steps = 500

[slurm]
job_name = "sft-moe-math"

[deployment]
type = "multi_node"
num_nodes = 2

[model]
name = "Qwen/Qwen3-30B-A3B-Thinking-2507"
impl = "custom"
attn = "flash_attention_3"
optim_cpu_offload = true

[model.ac_offloading]
max_inflight_activations = 5

[model.ac]
freq = 1

[data]
type = "sft"
name = "PrimeIntellect/INTELLECT-3-SFT-10K"
subsets = ["default"]
splits = ["math"]
batch_size = 128
seq_len = 8192
See examples/hendrycks_math/sft.toml for the full example.

Custom SLURM Templates

The default templates handle standard setups with InfiniBand detection, environment setup, and srun-based process dispatch. For advanced use cases (custom partitions, account settings, module loads, etc.), provide your own Jinja2 template:
uv run rl @ my_config.toml --slurm.template-path path/to/my_template.sbatch.j2
See src/prime_rl/templates/ for the default templates as a starting point.

Monitoring

After submission, logs are available at:
# Single-node
tail -F {output_dir}/logs/trainer/rank_0.log

# Multi-node RL
tail -F {output_dir}/slurm/latest_train_node_rank_0.log
tail -F {output_dir}/slurm/latest_infer_node_rank_0.log
tail -F {output_dir}/slurm/latest_orchestrator.log
For convenience, a tmux launcher sets up a session with all log streams:
bash scripts/slurm_tmux.sh my-rl-job /shared/outputs/my-rl-job