This guide walks you through a complete hosted training run — from setting up your workspace and choosing an environment to launching a run, monitoring progress, and reviewing results.
## Prerequisites
Make sure you’ve completed the initial setup first.

## Step 1: Choose an Environment
You can use an existing environment from the Environments Hub or create your own. For this walkthrough, we’ll use the `alphabet-sort` environment — a multi-turn game where the model must sort letters into alphabetical order. If you’re new to how environments work, see The Environment Model.
Install it:
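A minimal sketch, assuming the Prime CLI's environment install command; copy the exact snippet (and fully qualified environment ID) from the environment's page on the Environments Hub:

```bash
# Assumed install command; the Hub page for alphabet-sort shows the
# exact invocation and full environment ID.
prime env install alphabet-sort
```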
## Step 2: Run a Baseline Evaluation
Before training, evaluate the base model to establish a baseline. This helps you confirm the environment works and understand where the model starts:
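For example, with the verifiers `vf-eval` tool (a sketch; the flag names follow recent verifiers releases, so confirm with `vf-eval --help`, and note that you need an inference endpoint or API key configured):

```bash
# Baseline evaluation of the untrained model on alphabet-sort:
# -m picks the model, -n the number of examples, -r rollouts per example.
vf-eval alphabet-sort \
  -m Qwen/Qwen3-4B-Instruct-2507 \
  -n 20 \
  -r 3
```

Keep the scores from this run; Step 7 compares against them.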
## Step 3: Choose a Model

Check which models are available for Hosted Training:

| Use Case | Recommended Model |
|---|---|
| Quick validation | Qwen/Qwen3-4B-Instruct-2507 |
| Experimentation | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| Production scale | Qwen/Qwen3-235B-A22B-Instruct-2507 or PrimeIntellect/INTELLECT-3 |
## Step 4: Create a Training Config
Training runs are configured via a `.toml` file. Create one in your `configs/lab/` directory:
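As a sketch, here is what such a config might look like. The field names follow the options listed below; the values are illustrative placeholders, so check the configuration reference for the full schema:

```toml
# configs/lab/alphabet-sort.toml
# Illustrative starter values; scale them to your run.
model = "Qwen/Qwen3-4B-Instruct-2507"

max_steps = 100            # total training steps
batch_size = 8             # rollouts per training batch
rollouts_per_example = 4   # rollouts generated per dataset example

[sampling]
max_tokens = 1024          # max tokens per model response

[[env]]
id = "alphabet-sort"       # environment ID on the Environments Hub
```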
- `model` — The Hugging Face model ID (must be a supported model)
- `max_steps` — Total number of training steps
- `batch_size` — Number of rollouts per training batch
- `rollouts_per_example` — How many rollouts to generate per dataset example
- `[sampling].max_tokens` — Maximum tokens the model can generate per response
- `[[env]].id` — The environment ID on the Environments Hub
## Step 5: Launch the Training Run
Start the run:
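A sketch only: the subcommand below is an assumption, so substitute the launch command from the Hosted Training docs. The config path is the file created in Step 4:

```bash
# Hypothetical launch command; replace with the actual Hosted Training
# CLI invocation for your setup.
prime train create --config configs/lab/alphabet-sort.toml
```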
## Step 6: Monitor the Run

You can monitor your run in two ways: in the terminal, by streaming logs in real time (see the sketch after this list), and in the dashboard, where you can watch the key metrics:

- Reward — The overall reward curve should trend upward over time
- Rubric — Individual rubric component scores
- Reward Distribution — Should shift from lower to higher values as training progresses
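For the terminal option, a sketch (the subcommand and flag are assumptions; use the run ID printed at launch and check the CLI help for the real command):

```bash
# Hypothetical log-streaming command; <run-id> is printed when the
# run is created.
prime train logs <run-id> --follow
```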
## Step 7: Review Results
Once the run completes, you can review the trained model’s performance by running an evaluation with the trained adapter. Trained LoRA adapters can be downloaded from the dashboard. You can also deploy your trained LoRA adapter for live inference — see Deploying LoRA Adapters for Inference for a step-by-step guide. To compare against the baseline, re-run the same evaluation you ran in Step 2 and compare scores.
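One way to evaluate the downloaded adapter, as a sketch: serve it behind an OpenAI-compatible endpoint with vLLM (`--enable-lora` and `--lora-modules` are real vLLM flags; the adapter path and served model name here are placeholders), then point the Step 2 `vf-eval` command at that endpoint:

```bash
# Serve the base model with the trained LoRA adapter attached.
# ./adapter is wherever you saved the download from the dashboard.
vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --enable-lora \
  --lora-modules trained=./adapter

# Then re-run the Step 2 evaluation against this endpoint (see the
# verifiers docs for pointing vf-eval at a custom OpenAI-compatible URL).
```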
## Putting It All Together

Here’s the complete workflow as a single script:
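A sketch of that script, with the same caveats as above (the `prime` subcommands are assumptions; the `vf-eval` flags follow recent verifiers releases):

```bash
#!/usr/bin/env bash
set -euo pipefail

# 1. Install the environment (copy the exact snippet from the Hub page).
prime env install alphabet-sort

# 2. Baseline evaluation of the untrained model.
vf-eval alphabet-sort -m Qwen/Qwen3-4B-Instruct-2507 -n 20 -r 3

# 3. Launch the hosted training run (hypothetical subcommand).
prime train create --config configs/lab/alphabet-sort.toml

# 4. Stream logs while it runs (hypothetical subcommand; use the run ID
#    printed at launch).
prime train logs <run-id> --follow

# 5. When the run completes, download the adapter from the dashboard and
#    re-run step 2's evaluation to compare against the baseline.
```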
## Run Size Guidelines

Depending on your goals, here are some recommended configurations.

### Small Run (Validation)
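An illustrative config sketch; the model choice follows the Step 3 table, and all other values are assumptions to adapt to your budget:

```toml
model = "Qwen/Qwen3-4B-Instruct-2507"  # small model, quick to validate
max_steps = 20
batch_size = 8
rollouts_per_example = 2
```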
Use this to verify your environment and config work correctly before committing to a longer run.

### Medium Run (Experimentation)
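An illustrative config sketch (assumed values; model from the Step 3 table):

```toml
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"  # mid-size model for iteration
max_steps = 200
batch_size = 16
rollouts_per_example = 4
```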
Good for iterating on environment design and hyperparameters.

### Large Run (Production)
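An illustrative config sketch (assumed values; model from the Step 3 table):

```toml
model = "Qwen/Qwen3-235B-A22B-Instruct-2507"  # or PrimeIntellect/INTELLECT-3
max_steps = 1000
batch_size = 32
rollouts_per_example = 8
```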
For serious training with full monitoring and evaluation.

## Advanced Configs
Explore all configuration options including multi-env training, evaluation, and difficulty filtering.
## Troubleshooting
Solutions for common issues with hosted training runs.