Reinforcement Fine-Tuning (RFT) on Prime Intellect using prime-rl, verifiers, and Environments Hub
Train models with RL using the prime-rl trainer, and access hundreds of pre-built environments from our Environments Hub, which all use verifiers.
- prime-rl (GitHub) - Trainer for large-scale FSDP training
- verifiers (GitHub) - Library of modular components for building RL environments

The recommended pod comes pre-configured with prime-rl, verifiers, and the Prime CLI to minimize setup.
prime-rl requires at least 2 GPUs for training.

prime-rl is installed in /workspace/prime-rl:
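For example, to work from the pre-installed checkout (assuming the repo is uv-managed, as described in its README):

```bash
cd /workspace/prime-rl
uv sync  # ensure dependencies are resolved and installed
```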
flash-attn is installed.

Environments are built with verifiers (repo).
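As a rough illustration of the verifiers API (based on the library's README at the time of writing; exact signatures may differ), an environment module exposes a `load_environment()` function that returns a configured environment. The module name, dataset, and reward logic below are hypothetical:

```python
# custom_environment.py - minimal sketch of a verifiers environment module
import verifiers as vf
from datasets import load_dataset

def exact_match(completion, answer, **kwargs) -> float:
    # Reward 1.0 when the model's answer matches the reference
    # (assumes plain-text completions; chat-format completions need parsing)
    return 1.0 if str(completion).strip() == str(answer).strip() else 0.0

def load_environment(**kwargs) -> vf.Environment:
    dataset = load_dataset("my-org/my-task", split="train")  # hypothetical dataset
    rubric = vf.Rubric(funcs=[exact_match], weights=[1.0])
    return vf.SingleTurnEnv(dataset=dataset, rubric=rubric, **kwargs)
```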
You can find hundreds of community-contributed environments on the Environments Hub.

To install an environment from the Hub into prime-rl, do:
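For example, using the Prime CLI (the environment ID below is a placeholder, and subcommand details may vary with your CLI version):

```bash
# Install an environment package from the Environments Hub
prime env install my-org/custom-environment
```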
If you instead declare the environment as a dependency in prime-rl's pyproject.toml, do:
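For example, with the uv-managed project:

```bash
# Re-resolve and install dependencies after editing pyproject.toml
uv sync
```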
Create trainer/inference/orchestrator config files following the aforementioned examples. Set `id = custom-environment` in the `[environment]` section of your orchestrator config (along with any desired Environment-level args in `[environment.args]`), as sketched below.
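A minimal sketch of that orchestrator snippet (the keys under `[environment.args]` are hypothetical placeholders; pass whatever your environment accepts):

```toml
[environment]
id = "custom-environment"

[environment.args]
# Hypothetical Environment-level args
max_turns = 1
```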
To log metrics to Weights & Biases (wandb), log in:
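For example:

```bash
wandb login  # prompts for your API key
```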
The prime-rl stack has three cooperating components: inference, orchestrator, and trainer. For convenience, you can launch all via a single entrypoint.
Once you have your config files set up, you can launch the run with:
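A sketch using prime-rl's combined entrypoint (the config paths are placeholders; check the prime-rl README for the exact flags in your version):

```bash
# Launch inference, orchestrator, and trainer together
uv run rl \
  --trainer @ configs/custom/train.toml \
  --orchestrator @ configs/custom/orch.toml \
  --inference @ configs/custom/infer.toml
```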
More advanced usage is covered in the prime-rl README (GitHub), including details on checkpoint layout and async semantics.
- prime-rl (GitHub) - Trainer for large-scale FSDP training
- verifiers (GitHub) - Library of modular components for building RL environments