This guide is for more technically proficient users who are comfortable working directly with command line tools and have some experience training or fine-tuning models.
If you want to join our private beta for early access to our RFT platform, with deep integration with our Environments Hub and hands-on support from our research team, send us your details via this form.

Overview

Reinforcement Fine-Tuning (RFT) optimizes a language model with feedback-driven rewards from RL environments. On Prime Intellect, you can run scalable RFT jobs using our compute, our prime-rl trainer, and hundreds of pre-built environments from our Environments Hub, all of which are built with verifiers.
  • prime-rl (GitHub) - Trainer for large-scale FSDP training
  • verifiers (GitHub) - Library of modular components for building RL environments
  • Environments Hub - Community hub hosting hundreds of RL environments that can be used, evaluated and forked for your RL run
This guide walks you through end-to-end:
  1. Deploy compute using our RFT image with pre-installed dependencies
  2. Validate machine setup
  3. Configure RFT run by setting up RL environment(s) and config files
  4. Launch RFT run

1) Deploy compute

Go to the Deploy GPU page on Prime Intellect, and select the “Prime RL (RFT)” image. This image is preloaded with CUDA, Python/UV, prime-rl, verifiers, and the Prime CLI to minimize setup.
Deploy Prime RL Image
prime-rl requires at least 2 GPUs for training.
Select your desired GPU configuration (at least 2 GPUs) and click deploy. Once the instance is running, use the SSH command shown on the instance page to connect. Once connected, you will find prime-rl pre-installed in /workspace/prime-rl:
cd /workspace/prime-rl
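End to end, connecting and landing in the repo looks roughly like this; the user, host, and port below are hypothetical placeholders, so copy the exact SSH command from your instance page:
ssh -p <port> root@<instance-ip>  # hypothetical placeholders; use the command shown for your instance
cd /workspace/prime-rl            # the pre-installed prime-rl checkout
ls                                # should list configs/ and pyproject.toml, among other files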

2) Validate machine setup

  1. Check that the environment uses Python 3.12
uv run python -V
  2. Check that flash-attn is installed
uv run python -c "import flash_attn"
  3. Check that you can run the SFT trainer in debug mode (this requires 1 GPU)
uv run sft @ configs/debug/sft.toml
  4. Check that you can run the RL trainer in debug mode (this requires 1 GPU)
uv run trainer @ configs/debug/train.toml
  5. Check that you can run the orchestrator against an inference server (this requires 1 GPU). Start the inference server first, then run the orchestrator against it from a separate terminal (see the note after this list)
uv run inference @ configs/debug/infer.toml
uv run orchestrator @ configs/debug/orch.toml
  6. Check that you can run a simple SFT warmup (this requires 1 GPU)
uv run sft @ configs/reverse_text/sft.toml
  7. Check that you can run a toy RL run (this requires 2 GPUs)
uv run rl \
  --trainer @ configs/reverse_text/train.toml \
  --orchestrator @ configs/reverse_text/orch.toml \
  --inference @ configs/reverse_text/infer.toml
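For step 5, the inference server must be up before the orchestrator starts. As a minimal sketch, you can run both from a single shell by backgrounding the server (in practice, separate terminals or the tmux script described later are more convenient); the sleep duration is an arbitrary placeholder to give the server time to load the model:
uv run inference @ configs/debug/infer.toml &   # start the inference server in the background
sleep 60                                        # placeholder wait for the server to come up; adjust as needed
uv run orchestrator @ configs/debug/orch.toml   # then run the orchestrator against it
kill %1                                         # stop the background inference server when done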

3) Configure RFT run

We support RL on any environment built with verifiers (GitHub). You can find hundreds of community-contributed environments on the Environments Hub.
Not all environments on the Hub are vetted or tested by Prime Intellect. For a set of verified, training-ready RL environments, visit the prime-environments repository.
To install an environment temporarily within prime-rl (without touching pyproject.toml), do:
# optional: for pushing environments to the hub or accessing private environments
uv run prime login

# install the environment
uv run prime env install custom-environment
To persist your environment installation in prime-rl's top-level pyproject.toml, do:
uv add custom-environment --index https://hub.primeintellect.ai/username/custom-environment
For quick API-based evaluation post-installation, do:
uv run vf-eval custom-environment # -h for config options; defaults to gpt-4.1-mini, 5 prompts, 3 rollouts each
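If you want to override the defaults, vf-eval exposes CLI flags for the model and the number of prompts and rollouts; the flag names below are assumptions based on common verifiers usage, so confirm them with vf-eval -h:
uv run vf-eval custom-environment -m gpt-4.1-mini -n 10 -r 3  # assumed flags: -m model, -n prompts, -r rollouts per prompt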
For training:
  1. Create trainer/inference/orchestrator config files following the aforementioned examples
  2. Then set id = "custom-environment" in the [environment] section of your orchestrator config (along with any desired Environment-level args in [environment.args]).
There is an example of this in the configs/wordle directory, and a minimal sketch follows below.
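As a minimal sketch, assuming you have already created configs/custom-environment/orch.toml (for example by copying one of the example configs) and that your environment accepts a hypothetical difficulty argument, the environment section can be appended like this:
# append the [environment] section to your orchestrator config (hypothetical path and args)
cat >> configs/custom-environment/orch.toml <<'EOF'
[environment]
id = "custom-environment"

[environment.args]
difficulty = "easy"  # hypothetical Environment-level arg; use your environment's actual args
EOF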

Additional setup

  1. If you want to log your runs to W&B (wandb), log in
uv run wandb login
# Or set `export WANDB_API_KEY=...`
  2. If you require gated/private models or datasets from HuggingFace, log in
uv run huggingface-cli login
# Or set `export HF_TOKEN=...`
  3. We provide a convenient tmux layout script to start a run and view the logs for the trainer, orchestrator, and inference. To start the session, simply run
bash scripts/tmux.sh

4) Launch RFT run

The prime-rl stack has three cooperating components: inference, orchestrator, and trainer. For convenience, you can launch all via a single entrypoint. Once you have your config files set up, you can launch the run with:
uv run rl \
  --trainer @ configs/<your-environment>/train.toml \
  --orchestrator @ configs/<your-environment>/orch.toml \
  --inference @ configs/<your-environment>/infer.toml \
  --wandb.project <your-project> \
  --wandb.name <your-run-name>
This will launch the run and log to W&B. You can find more details and training examples in the prime-rl README (GitHub), including:
  • Multi-Node training
  • Detailed W&B logging
  • Detailed checkpointing
  • Benchmarking

Checkpointing and Resuming

Enable periodic checkpoints:
uv run trainer @ /workspace/configs/<your-environment>/train.toml --ckpt.interval 10
Resume from step 10:
uv run rl \
  --trainer @ /workspace/configs/<your-environment>/train.toml \
  --orchestrator @ /workspace/configs/<your-environment>/orch.toml \
  --inference @ /workspace/configs/<your-environment>/infer.toml \
  --ckpt.resume-step 10 \
  --trainer.monitor.wandb.id <your-wandb-run-id> \
  --orchestrator.monitor.wandb.id <your-wandb-run-id>
See the prime-rl README (GitHub) for details on checkpoint layout and async semantics.

References

  • prime-rl: Trainer for large-scale FSDP training — GitHub
  • verifiers: Library of modular components for building RL environments — GitHub
  • Environments Hub: Community hub hosting hundreds of RL environments that can be used, evaluated and forked for your RL run