This guide walks you through a complete hosted training run — from setting up your workspace and choosing an environment to launching a run, monitoring progress, and reviewing results.

Prerequisites

Make sure you’ve completed the initial setup:
# Install and authenticate the CLI
uv tool install prime
prime login

# Set up a workspace
mkdir ~/dev/my-lab && cd ~/dev/my-lab
prime lab setup
See Getting Started if you need help with any of these steps.

Step 1: Choose an Environment

You can use an existing environment from the Environments Hub or create your own. For this walkthrough, we’ll use the alphabet-sort environment — a multi-turn game where the model must sort letters into alphabetical order. Install it:
prime env install primeintellect/alphabet-sort

Step 2: Run a Baseline Evaluation

Before training, evaluate the base model to establish a baseline. This confirms the environment works and shows where the model starts. The flags keep the run small: 20 examples (-n) with a single rollout each (-r):
prime eval run primeintellect/alphabet-sort \
  -m Qwen/Qwen3-4B-Instruct-2507 \
  -n 20 -r 1
A good training environment should have a baseline reward roughly between 10% and 80%. If the model scores 0% after many attempts, the task is too hard; if it already scores 80% or higher, consider harder examples or a different environment.
View the results:
prime eval tui

Step 3: Choose a Model

Check which models are available for Hosted Training:
prime rl models
For a first run, we recommend starting with a smaller model to validate your setup quickly:
Use Case            Recommended Model
Quick validation    Qwen/Qwen3-4B-Instruct-2507
Experimentation     Qwen/Qwen3-30B-A3B-Instruct-2507
Production scale    Qwen/Qwen3-235B-A22B-Instruct-2507 or PrimeIntellect/INTELLECT-3
See Models & Pricing for the full list.

Step 4: Create a Training Config

Training runs are configured via a .toml file. Create one in your configs/lab/ directory:
# configs/lab/alphabet-sort.toml
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 512

[[env]]
id = "primeintellect/alphabet-sort"
This is a minimal config suitable for a validation run; a quick sizing check follows the field list below. The key fields are:
  • model — The Hugging Face model ID (must be a supported model)
  • max_steps — Total number of training steps
  • batch_size — Number of rollouts per training batch
  • rollouts_per_example — How many rollouts to generate per dataset example
  • [sampling].max_tokens — Maximum tokens the model can generate per response
  • [[env]].id — The environment ID on the Environments Hub
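These fields also determine the size of the run. A quick sanity check, assuming batch_size counts rollouts as described above:
# 128 rollouts per batch / 8 rollouts per example = 16 unique examples per batch
echo $((128 / 8))
# 50 steps x 128 rollouts per batch = 6400 rollouts generated over the whole run
echo $((50 * 128))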

Step 5: Launch the Training Run

Start the run:
prime rl run configs/lab/alphabet-sort.toml
You’ll see output confirming the configuration and a link to the dashboard:
Loading config from configs/lab/alphabet-sort.toml

Creating RL training run...

Configuration:
  Model: Qwen/Qwen3-4B-Instruct-2507
  Environments: primeintellect/alphabet-sort
  Max Steps: 50
  Batch Size: 128
  Rollouts per Example: 8
  Max Tokens: 512

✓ Run created successfully!

Monitor run at:
  https://app.primeintellect.ai/dashboard/training/<run-id>

Step 6: Monitor the Run

You can monitor your run in two ways: in the terminal or on the dashboard. To stream logs in real time from the terminal:
prime rl logs <run-id> -f
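To focus on a particular signal, you can filter the stream with standard shell tools. A sketch; the exact log format depends on what your run prints, so adjust the pattern as needed:
# Follow the run's logs, keeping only lines that mention reward
prime rl logs <run-id> -f | grep -i reward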
To monitor from the dashboard, open the URL printed when the run started. The dashboard shows reward curves, rubric scores, reward distributions, and individual rollouts. Key metrics to watch:
  • Reward — The overall reward curve should trend upward over time
  • Rubric — Individual rubric component scores
  • Reward Distribution — Should shift from lower to higher values as training progresses

Step 7: Review Results

Once the run completes, review the trained model's performance by running an evaluation with the trained adapter. LoRA adapters can be downloaded from the dashboard, and you can also deploy them for live inference; see Deploying LoRA Adapters for Inference for a step-by-step guide. To compare against the baseline, re-run the same evaluation you ran in Step 2 and compare scores.
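A sketch of that comparison, reusing the Step 2 settings; the model ID for the trained adapter is a placeholder here, since how you address it depends on how you deploy it (see the guide above):
# Evaluate the trained adapter with the same settings as the Step 2 baseline
prime eval run primeintellect/alphabet-sort \
  -m <your-deployed-adapter-id> \
  -n 20 -r 1

# Inspect both runs side by side
prime eval tui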

Putting It All Together

Here’s the complete workflow as a single script:
# Setup
uv tool install prime
prime login
mkdir ~/dev/my-lab && cd ~/dev/my-lab
prime lab setup

# Install and evaluate baseline
prime env install primeintellect/alphabet-sort
prime eval run primeintellect/alphabet-sort \
  -m Qwen/Qwen3-4B-Instruct-2507 -n 20 -r 1

# Launch training (using the config created in Step 4)
prime rl run configs/lab/alphabet-sort.toml

# Monitor
prime rl logs <run-id> -f

Run Size Guidelines

Depending on your goals, here are some recommended configurations:

Small Run (Validation)

Use this to verify your environment and config work correctly before committing to a longer run.
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 512

Medium Run (Experimentation)

Good for iterating on environment design and hyperparameters. This configuration also logs metrics to Weights & Biases and schedules periodic evaluations.
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 200
batch_size = 256
rollouts_per_example = 16

[sampling]
max_tokens = 512

[wandb]
project = "my-experiment"
name = "alphabet-sort-30b"

[eval]
interval = 50

Large Run (Production)

For serious training with full monitoring and evaluation. In addition to the medium-run settings, this configuration adds a periodic validation pass ([val]) and enables online difficulty filtering ([buffer]).
model = "Qwen/Qwen3-235B-A22B-Instruct-2507"
max_steps = 500
batch_size = 512
rollouts_per_example = 16

[sampling]
max_tokens = 1024

[wandb]
project = "production"
name = "alphabet-sort-235b"

[eval]
interval = 100
num_examples = -1
rollouts_per_example = 1
eval_base_model = true

[val]
num_examples = 64
rollouts_per_example = 1
interval = 5

[buffer]
online_difficulty_filtering = true
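
Whichever size you choose, launching works the same way as in Step 5 (the filename below is illustrative):
# Launch with the config that matches your goal
prime rl run configs/lab/<your-config>.toml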