Walk through a complete hosted RL training run from environment setup to results
This guide walks you through a complete hosted training run — from setting up your workspace and choosing an environment to launching a run, monitoring progress, and reviewing results.
You can use an existing environment from the Environments Hub or create your own. For this walkthrough, we’ll use the alphabet-sort environment — a multi-turn game where the model must sort letters into alphabetical order. If you’re new to how environments work, see The Environment Model.

Install it:
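Assuming the `prime` CLI is installed and authenticated, installing an environment from the Hub looks roughly like this (a sketch — the exact subcommand may differ across CLI versions, so check `prime env --help`):

```shell
# Install the alphabet-sort environment from the Environments Hub
# (assumes the `prime` CLI is installed and you are logged in)
prime env install primeintellect/alphabet-sort
```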
Before training, evaluate the base model to establish a baseline. This helps you confirm the environment works and understand where the model starts:
```bash
prime eval run primeintellect/alphabet-sort \
  -m Qwen/Qwen3-4B-Instruct-2507 \
  -n 20 -r 1
```
A good training environment should yield a baseline reward roughly between 10% and 80%. If the model scores 0% after many attempts, the task is too hard. If it already scores 80% or higher, consider harder examples or a different environment.
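The rule of thumb above can be encoded as a quick sanity check. This is just an illustrative sketch — `baseline_verdict` is a hypothetical helper, and rewards are assumed to be normalized to the 0–1 range:

```python
def baseline_verdict(mean_reward: float) -> str:
    """Classify a baseline eval score (0.0-1.0) against the rough 10-80% band."""
    if mean_reward < 0.10:
        return "too hard: simplify the task or pick another environment"
    if mean_reward > 0.80:
        return "too easy: use harder examples or a different environment"
    return "good starting point for RL training"

# A mid-band score leaves the model headroom to improve during training
print(baseline_verdict(0.35))
```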
After you launch the training run, you’ll see output confirming the configuration and a link to the dashboard:
```
Loading config from configs/lab/alphabet-sort.toml
Creating RL training run...

Configuration:
  Model: Qwen/Qwen3-4B-Instruct-2507
  Environments: primeintellect/alphabet-sort
  Max Steps: 50
  Batch Size: 128
  Rollouts per Example: 8
  Max Tokens: 512

✓ Run created successfully!
Monitor run at: https://app.primeintellect.ai/dashboard/training/<run-id>
```
You can monitor your run in two ways.

In the terminal — stream logs in real time:
```bash
prime rl logs <run-id> -f
```
On the dashboard — open the URL printed when the run started. The dashboard shows reward curves, rubric scores, reward distributions, and individual rollouts.

Key metrics to watch:
- Reward — The overall reward curve should trend upward over time
- Rubric — Individual rubric component scores
- Reward Distribution — Should shift from lower to higher values as training progresses
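To make "the distribution should shift" concrete, here is a small sketch comparing per-rollout rewards pulled from two points in training. The numbers are invented for illustration, not real run data:

```python
# Hypothetical per-rollout rewards at an early and a late training step
early_step_rewards = [0.0, 0.1, 0.2, 0.1, 0.3, 0.0, 0.2, 0.1]
late_step_rewards  = [0.5, 0.8, 0.6, 0.9, 0.7, 0.6, 0.8, 0.5]

def mean(xs):
    return sum(xs) / len(xs)

# A positive shift means the reward distribution moved toward higher values
shift = mean(late_step_rewards) - mean(early_step_rewards)
print(f"mean reward shifted by {shift:+.2f}")  # prints "mean reward shifted by +0.55"
```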
Once the run completes, you can review the trained model’s performance by running an evaluation with the trained adapter. Trained LoRA adapters can be downloaded from the dashboard.

You can also deploy your trained LoRA adapter for live inference — see Deploying LoRA Adapters for Inference for a step-by-step guide.

To compare against the baseline, re-run the same evaluation you ran in Step 2 and compare scores.
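The before/after comparison boils down to a single delta between the two eval runs. A minimal sketch, with placeholder scores rather than real results:

```python
# Hypothetical mean rewards from the baseline (Step 2) and post-training evals
baseline_reward = 0.32
trained_reward = 0.71

# Positive delta means the trained adapter outperforms the base model
delta = trained_reward - baseline_reward
print(f"baseline {baseline_reward:.0%} -> trained {trained_reward:.0%} ({delta:+.0%})")
```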