The `rl`, `sft`, and `inference` entrypoints all have built-in SLURM support. Adding a `[slurm]` section to your config switches from local execution to SLURM job submission — no separate entrypoint needed.
## Quick Start

```bash
# Local run
uv run rl @ examples/reverse_text/rl.toml

# SLURM run (same entrypoint, just add [slurm] to the config)
uv run rl @ examples/reverse_text/slurm_rl.toml
```
The SLURM config is a thin overlay that inherits from a base config and adds `[slurm]` + `[deployment]` sections:

```toml
# examples/reverse_text/slurm_rl.toml
toml_files = ["rl.toml"]

output_dir = "outputs/reverse-text-rl"

[slurm]
job_name = "reverse-text-rl"
```
## How it works

When `[slurm]` is present, the entrypoint:

- Resolves the full config
- Renders a SLURM batch script from a Jinja2 template
- Writes the script and resolved config to `{output_dir}/`
- Submits via `sbatch` (or prints the script with `--slurm.dry-run`)

For single-node jobs, the entire config is dumped to a TOML file and the template simply runs `uv run rl @` or `uv run sft @` on the allocated node.

For multi-node jobs, sub-configs are written separately and `srun` dispatches processes across nodes.
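For example, to preview what would be submitted for the quick-start config, combine it with the dry-run flag mentioned above:

```bash
# Prints the rendered sbatch script instead of submitting it
uv run rl @ examples/reverse_text/slurm_rl.toml --slurm.dry-run
```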
## Configuration

### `[slurm]` — Job submission (shared between RL and SFT)

| Field | Description | Default |
|---|---|---|
| `job_name` | SLURM job name | `"prime-rl"` |
| `project_dir` | Path to the project root on the cluster | `"."` |
| `template_path` | Path to a custom Jinja2 template | auto-selected |
| `partition` | SLURM partition | `"cluster"` |
| `nodelist` | Comma-separated list of specific nodes to run on (`--nodelist`) | None |
| `exclude` | Comma-separated list of nodes to exclude (`--exclude`) | None |
| `account` | SLURM account to charge (`--account`) | None |
| `time` | Maximum wall time, e.g. `"24:00:00"` (`--time`) | None |
| `pre_run_command` | Shell command to run on the head node after environment setup, before starting the job (e.g. cleanup) | None |
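As an illustration, a more fully specified `[slurm]` section built from the fields above might look like the following (all values are placeholders for your cluster):

```toml
[slurm]
job_name = "my-rl-run"
project_dir = "/shared/prime-rl"          # project root on the cluster
partition = "cluster"
account = "my-account"                    # placeholder SLURM account
time = "24:00:00"                         # maximum wall time
exclude = "node-03,node-07"               # skip known-bad nodes
pre_run_command = "rm -rf /tmp/prime-rl"  # example cleanup on the head node
```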
### `[deployment]` — Node and GPU allocation

RL uses a discriminated union with `type = "single_node"` (default) or `type = "multi_node"` (illustrative snippets follow the table):

| Field | `single_node` | `multi_node` |
|---|---|---|
| `gpus_per_node` | Number of GPUs per node (default: 8) | Same |
| `num_train_gpus` | Training GPUs | — |
| `num_infer_gpus` | Inference GPUs | — |
| `num_train_nodes` | — | Training nodes |
| `num_infer_nodes` | — | Inference nodes |
| `nodes_per_fsdp_group` | — | Nodes per FSDP island (optional) |
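For example, a single-node RL deployment that splits an 8-GPU node between training and inference could be written as (values are illustrative, not recommendations):

```toml
[deployment]
type = "single_node"
gpus_per_node = 8
num_train_gpus = 6   # GPUs used by the trainer
num_infer_gpus = 2   # GPUs used by the inference server
```

and a multi-node variant that dedicates whole nodes to each role:

```toml
[deployment]
type = "multi_node"
gpus_per_node = 8
num_train_nodes = 2
num_infer_nodes = 1
nodes_per_fsdp_group = 1   # optional: size of each FSDP island
```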
SFT follows the same pattern but only has training nodes:

| Field | `single_node` | `multi_node` |
|---|---|---|
| `gpus_per_node` | Number of GPUs per node (default: 8) | Same |
| `num_gpus` | Number of GPUs (default: 1) | — |
| `num_nodes` | — | Training nodes (default: 2) |
| `nodes_per_fsdp_group` | — | Nodes per FSDP island (optional) |
Inference runs independent vLLM replicas per node:

| Field | `single_node` | `multi_node` |
|---|---|---|
| `gpus_per_node` | Number of GPUs per node (default: 8) | Same |
| `num_nodes` | — | Number of inference nodes (default: 1) |
The SLURM template is auto-selected based on `deployment.type`. You can override it with `slurm.template_path`.
## Constraints

- `output_dir` should be explicitly set when using SLURM (defaults to `"outputs"`)
- Multi-node deployment requires `[slurm]` to be set
## RL Examples

### Single-node SLURM

The simplest case: run on a single allocated node. No `[deployment]` needed — it defaults to `single_node`.

```toml
output_dir = "/shared/outputs/my-rl-run"

[slurm]
job_name = "my-rl-run"
```
### Multi-node SLURM (Hendrycks Math)

```toml
output_dir = "outputs/rl-math-moe"
max_steps = 500
seq_len = 2048

[slurm]
job_name = "hendrycks-math-rl-moe"

[deployment]
type = "multi_node"
num_train_nodes = 1
num_infer_nodes = 1

[weight_broadcast]
type = "nccl"

[model]
name = "Qwen/Qwen3-30B-A3B-Thinking-2507"

[trainer.model]
impl = "custom"
attn = "flash_attention_3"
optim_cpu_offload = true

[trainer.model.ac_offloading]
max_inflight_activations = 5

[trainer.model.ac]
freq = 1

[orchestrator]
batch_size = 512
rollouts_per_example = 16

[orchestrator.sampling]
max_tokens = 2048

[[orchestrator.env]]
id = "math-env"
name = "hendrycks-math"
args = { dataset_name = "PrimeIntellect/Hendrycks-Math", dataset_subset = "default" }

[inference.parallel]
tp = 4
dp = 2
```

See `examples/hendrycks_math/rl.toml` for the full example.
## SFT Examples

### Single-node SLURM

```toml
output_dir = "/shared/outputs/my-sft-run"

[slurm]
job_name = "my-sft-run"
```

### Multi-node SLURM (MoE SFT)

```toml
output_dir = "outputs/sft-moe-math"
max_steps = 500

[slurm]
job_name = "sft-moe-math"

[deployment]
type = "multi_node"
num_nodes = 2

[model]
name = "Qwen/Qwen3-30B-A3B-Thinking-2507"
impl = "custom"
attn = "flash_attention_3"
optim_cpu_offload = true

[model.ac_offloading]
max_inflight_activations = 5

[model.ac]
freq = 1

[data]
type = "sft"
name = "PrimeIntellect/INTELLECT-3-SFT-10K"
subsets = ["default"]
splits = ["math"]
batch_size = 128
seq_len = 8192
```

See `examples/hendrycks_math/sft.toml` for the full example.
## Inference Examples

### Single-node SLURM

Run a vLLM server on a single allocated node:

```toml
output_dir = "/shared/outputs/my-inference"

[model]
name = "Qwen/Qwen3-8B"

[parallel]
tp = 8

[slurm]
job_name = "my-inference"
```

```bash
uv run inference @ inference_slurm.toml
```
### Multi-node SLURM

Each node runs an independent vLLM replica. TP and DP must fit within a single node — there is no cross-node parallelism.

```toml
output_dir = "/shared/outputs/my-inference"

[model]
name = "PrimeIntellect/INTELLECT-3-RL-600"

[parallel]
tp = 4
dp = 2

[deployment]
type = "multi_node"
num_nodes = 4

[slurm]
job_name = "my-inference"
```

After submission, the SLURM template prints the inference URLs for all nodes (one per node).
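Since each replica is a standard vLLM server exposing an OpenAI-compatible API, you can sanity-check a node once it is up (the host and port below are placeholders; use the URL printed by the template):

```bash
# List the models served by one replica
curl http://node-01:8000/v1/models
```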
## Dry run

Use `dry_run = true` to generate the sbatch script without submitting:

```bash
uv run inference @ config.toml --dry-run true
```
## Custom SLURM Templates

The default templates handle standard setups with InfiniBand detection, environment setup, and srun-based process dispatch. For advanced use cases (custom partitions, account settings, module loads, etc.), provide your own Jinja2 template:

```bash
uv run rl @ my_config.toml --slurm.template-path path/to/my_template.sbatch.j2
```

See `src/prime_rl/templates/` for the default templates as a starting point.
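As a rough illustration only, a custom template is an sbatch script with Jinja2 placeholders. The variable names below are hypothetical; copy one of the default templates from `src/prime_rl/templates/` to see the actual render context rather than starting from this sketch:

```bash
#!/bin/bash
#SBATCH --job-name={{ job_name }}      # hypothetical variable names; the real
#SBATCH --partition={{ partition }}    # context comes from the default templates
#SBATCH --output={{ output_dir }}/logs/slurm-%j.out

module load cuda                       # example of a site-specific module load
cd {{ project_dir }}
uv run rl @ {{ output_dir }}/config.toml   # hypothetical path to the dumped config
```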
## Monitoring

After submission, logs are available at:

```bash
# All deployment types (trainer.log and inference.log are symlinks for multi-node)
tail -F {output_dir}/logs/trainer.log
tail -F {output_dir}/logs/orchestrator.log
tail -F {output_dir}/logs/inference.log

# Multi-node: per-node logs
tail -F {output_dir}/logs/trainer/node_*.log
tail -F {output_dir}/logs/inference/node_*.log

# Multi-node inference: per-replica router logs
tail -F {output_dir}/logs/inference/router_*.log
```

For convenience, a tmux launcher sets up a session with all log streams:

```bash
bash scripts/tmux.sh my-rl-job /shared/outputs/my-rl-job
```