This guide walks you through a complete hosted training run — from setting up your workspace and choosing an environment to launching a run, monitoring progress, and reviewing results.
Prerequisites
Make sure you’ve completed the initial setup:
# Install and authenticate the CLI
uv tool install prime
prime login
# Set up a workspace
mkdir ~/dev/my-lab && cd ~/dev/my-lab
prime lab setup
See Getting Started if you need help with any of these steps.
Step 1: Choose an Environment
You can use an existing environment from the Environments Hub or create your own. For this walkthrough, we’ll use the alphabet-sort environment — a multi-turn game where the model must sort letters into alphabetical order.
Install it:
prime env install primeintellect/alphabet-sort
Step 2: Run a Baseline Evaluation
Before training, evaluate the base model to establish a baseline. This helps you confirm the environment works and understand where the model starts:
prime eval run primeintellect/alphabet-sort \
-m Qwen/Qwen3-4B-Instruct-2507 \
-n 20 -r 1
A good training environment should produce a baseline reward roughly between 10% and 80%. If the model scores 0% even after many attempts, the task is too hard. If it already scores above 80%, consider harder examples or a different environment.
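The 10–80% rule of thumb above can be applied mechanically once you have the baseline score. A minimal sketch (the thresholds come from the text; the function name and messages are illustrative, not part of the CLI):

```python
def baseline_verdict(mean_reward: float, low: float = 0.10, high: float = 0.80) -> str:
    """Classify a baseline eval score for training suitability.

    Rewards inside the (low, high) band confirm the task is learnable
    while leaving headroom for RL to improve the model.
    """
    if mean_reward <= low:
        return "too hard: simplify the task or pick an easier environment"
    if mean_reward >= high:
        return "too easy: use harder examples or a different environment"
    return "good training signal: proceed to training"

print(baseline_verdict(0.45))  # falls inside the 10-80% band
```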
Review the average reward reported when the evaluation completes; this is the baseline you'll compare against after training.
Step 3: Choose a Model
Next, pick a model that Hosted Training supports. For a first run, we recommend starting with a smaller model to validate your setup quickly:
| Use Case | Recommended Model |
| --- | --- |
| Quick validation | Qwen/Qwen3-4B-Instruct-2507 |
| Experimentation | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| Production scale | Qwen/Qwen3-235B-A22B-Instruct-2507 or PrimeIntellect/INTELLECT-3 |
See Models & Pricing for the full list.
Step 4: Create a Training Config
Training runs are configured via a .toml file. Create one in your configs/lab/ directory:
# configs/lab/alphabet-sort.toml
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8
[sampling]
max_tokens = 512
[[env]]
id = "primeintellect/alphabet-sort"
This is a minimal config suitable for a validation run. The key fields are:
- model — The Hugging Face model ID (must be a supported model)
- max_steps — Total number of training steps
- batch_size — Number of rollouts per training batch
- rollouts_per_example — How many rollouts to generate per dataset example
- [sampling].max_tokens — Maximum tokens the model can generate per response
- [[env]].id — The environment ID on the Environments Hub
Step 5: Launch the Training Run
Start the run:
prime rl run configs/lab/alphabet-sort.toml
You’ll see output confirming the configuration and a link to the dashboard:
Loading config from configs/lab/alphabet-sort.toml
Creating RL training run...
Configuration:
Model: Qwen/Qwen3-4B-Instruct-2507
Environments: primeintellect/alphabet-sort
Max Steps: 50
Batch Size: 128
Rollouts per Example: 8
Max Tokens: 512
✓ Run created successfully!
Monitor run at:
https://app.primeintellect.ai/dashboard/training/<run-id>
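The `<run-id>` needed for log streaming in the next step is the last path segment of that dashboard URL. A small sketch to pull it out, assuming the URL format shown above (the example id is a placeholder):

```python
from urllib.parse import urlparse

def run_id_from_url(url: str) -> str:
    """Extract the trailing run id from a dashboard training URL."""
    return urlparse(url).path.rstrip("/").split("/")[-1]

url = "https://app.primeintellect.ai/dashboard/training/abc123"
print(run_id_from_url(url))  # abc123
```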
Step 6: Monitor the Run
You can monitor your run in two ways:
In the terminal — stream logs in real time:
prime rl logs <run-id> -f
On the dashboard — open the URL printed when the run started. The dashboard shows reward curves, rubric scores, reward distributions, and individual rollouts.
Key metrics to watch:
- Reward — The overall reward curve should trend upward over time
- Rubric — Individual rubric component scores
- Reward Distribution — Should shift from lower to higher values as training progresses
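Raw per-step rewards are noisy, so a smoothed view makes the upward trend easier to judge. A minimal sketch, assuming you have exported the reward series (e.g. from the dashboard or W&B) as a list of floats:

```python
def moving_average(rewards: list[float], window: int = 10) -> list[float]:
    """Smooth a reward curve with a trailing-window mean."""
    out = []
    for i in range(len(rewards)):
        chunk = rewards[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# A healthy run: the smoothed curve trends upward across training steps.
noisy = [0.2, 0.5, 0.1, 0.6, 0.4, 0.7, 0.5, 0.8, 0.6, 0.9]
smooth = moving_average(noisy, window=3)
print(smooth[0], smooth[-1])  # early average vs late average
```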
Step 7: Review Results
Once the run completes, you can review the trained model’s performance by running an evaluation with the trained adapter. Trained LoRA adapters can be downloaded from the dashboard.
You can also deploy your trained LoRA adapter for live inference — see Deploying LoRA Adapters for Inference for a step-by-step guide.
To compare against the baseline, re-run the same evaluation you ran in Step 2 and compare scores.
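The comparison itself is just a delta between the two average rewards. A sketch with placeholder numbers (substitute the averages from your own two eval runs; these values are not real results):

```python
baseline_score = 0.35   # placeholder: average reward from the Step 2 eval
trained_score = 0.70    # placeholder: average reward with the trained adapter

absolute_gain = trained_score - baseline_score
relative_gain = absolute_gain / baseline_score

print(f"absolute: {absolute_gain:+.2f}")   # +0.35
print(f"relative: {relative_gain:+.0%}")   # +100%
```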
Putting It All Together
Here’s the complete workflow as a single script:
# Setup
uv tool install prime
prime login
mkdir ~/dev/my-lab && cd ~/dev/my-lab
prime lab setup
# Install and evaluate baseline
prime env install primeintellect/alphabet-sort
prime eval run primeintellect/alphabet-sort \
-m Qwen/Qwen3-4B-Instruct-2507 -n 20 -r 1
# Launch training
prime rl run configs/lab/alphabet-sort.toml
# Monitor
prime rl logs <run-id> -f
Run Size Guidelines
Depending on your goals, here are some recommended configurations:
Small Run (Validation)
Use this to verify your environment and config work correctly before committing to a longer run.
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8
[sampling]
max_tokens = 512
Medium Run (Experimentation)
Good for iterating on environment design and hyperparameters.
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 200
batch_size = 256
rollouts_per_example = 16
[sampling]
max_tokens = 512
[wandb]
project = "my-experiment"
name = "alphabet-sort-30b"
[eval]
interval = 50
Large Run (Production)
For serious training with full monitoring and evaluation.
model = "Qwen/Qwen3-235B-A22B-Instruct-2507"
max_steps = 500
batch_size = 512
rollouts_per_example = 16
[sampling]
max_tokens = 1024
[wandb]
project = "production"
name = "alphabet-sort-235b"
[eval]
interval = 100
num_examples = -1
rollouts_per_example = 1
eval_base_model = true
[val]
num_examples = 64
rollouts_per_example = 1
interval = 5
[buffer]
online_difficulty_filtering = true
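The online_difficulty_filtering flag appears here only by name, but the general technique it refers to is to skip examples whose rollouts are all failures or all successes, since a group with identical outcomes carries no useful learning signal for RL. A hedged sketch of the idea, not Prime Intellect's actual implementation:

```python
def keep_example(rollout_rewards: list[float],
                 low: float = 0.0, high: float = 1.0) -> bool:
    """Keep an example only if its rollouts show mixed outcomes.

    If every rollout fails (mean <= low) or every rollout succeeds
    (mean >= high), the within-group advantage is zero and the example
    contributes nothing to the policy update.
    """
    mean = sum(rollout_rewards) / len(rollout_rewards)
    return low < mean < high

print(keep_example([0.0, 0.0, 1.0, 1.0]))  # True: mixed outcomes
print(keep_example([1.0, 1.0, 1.0, 1.0]))  # False: already solved
```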