# End-to-End Training Run

> Walk through a complete Hosted Training run from environment setup to results

This guide walks you through a complete Hosted Training run — from setting up your workspace and choosing an environment to launching a run, monitoring progress, and reviewing results.

## Prerequisites

Make sure you've completed the initial setup:

```bash theme={null}
# Install and authenticate the CLI
uv tool install prime
prime login

# Set up a workspace
mkdir -p ~/dev/my-lab && cd ~/dev/my-lab
prime lab setup
```

See [Getting Started](/hosted-training/getting-started) if you need help with any of these steps.

## Step 1: Choose an Environment

You can use an existing environment from the [Environments Hub](https://app.primeintellect.ai/dashboard/environments) or create your own. For this walkthrough, we'll use the `alphabet-sort` environment — a multi-turn game where the model must sort letters into alphabetical order. If you're new to how environments work, see [The Environment Model](/hosted-training/environment-model).

Install it:

```bash theme={null}
prime env install primeintellect/alphabet-sort
```

## Step 2: Run a Baseline Evaluation

Before training, evaluate the base model to establish a baseline. This helps you confirm the environment works and understand where the model starts:

```bash theme={null}
prime eval run primeintellect/alphabet-sort \
  -m Qwen/Qwen3-4B-Instruct-2507 \
  -n 20 -r 1
```

<Tip>
  A good training environment typically has a baseline reward of roughly 10–80%. If the model still scores 0% after many attempts, the task is too hard for it to learn from. If it already scores 80% or more, there is little headroom left, so consider harder examples or a different environment.
</Tip>
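
The rule of thumb in the tip can be written down as a small shell helper. This is only a sketch: `classify_baseline` and its labels are hypothetical names, not part of the `prime` CLI, and the 10% and 80% cutoffs are the rough bounds from the tip rather than hard limits.

```bash
# Classify a baseline reward, given as a fraction between 0 and 1,
# against the rough 10-80% band. Labels are illustrative only.
classify_baseline() {
  awk -v r="$1" 'BEGIN {
    if (r < 0.10)       print "too-hard"
    else if (r >= 0.80) print "too-easy"
    else                print "trainable"
  }'
}

classify_baseline 0.45   # prints: trainable
```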

View the results:

```bash theme={null}
prime eval tui
```

## Step 3: Choose a Model

Check which models are available for Hosted Training:

```bash theme={null}
prime train models
```

For a first run, we recommend starting with a smaller model to validate your setup quickly:

| Use Case         | Recommended Model                                                    |
| ---------------- | -------------------------------------------------------------------- |
| Quick validation | `Qwen/Qwen3-4B-Instruct-2507`                                        |
| Experimentation  | `Qwen/Qwen3-30B-A3B-Instruct-2507`                                   |
| Production scale | `Qwen/Qwen3-235B-A22B-Instruct-2507` or `PrimeIntellect/INTELLECT-3` |

See [Models & Pricing](/hosted-training/models-and-pricing) for the full list.

## Step 4: Create a Training Config

Training runs are configured via a `.toml` file. Create one in your `configs/rl/` directory:

```toml theme={null}
# configs/rl/alphabet-sort.toml
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 512

[[env]]
id = "primeintellect/alphabet-sort"
```

This is a minimal config suitable for a validation run. The key fields are:

* `model` — The Hugging Face model ID (must be a supported model)
* `max_steps` — Total number of training steps
* `batch_size` — Number of rollouts per training batch
* `rollouts_per_example` — How many rollouts to generate per dataset example
* `[sampling].max_tokens` — Maximum tokens the model can generate per response
* `[[env]].id` — The environment ID on the Environments Hub
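
Since `batch_size` counts rollouts and `rollouts_per_example` counts rollouts per example, dividing one by the other gives the number of distinct examples in a batch. The sketch below works that out for the config above. It assumes that a batch holds complete groups of rollouts, which these docs do not state explicitly, and `examples_per_batch` is a hypothetical helper name.

```bash
# Print how many distinct examples fit in one batch.
# Assumption: a batch is made of complete groups of rollouts_per_example
# rollouts, so batch_size should divide evenly.
examples_per_batch() {
  batch_size=$1
  rollouts_per_example=$2
  if [ $(( batch_size % rollouts_per_example )) -ne 0 ]; then
    echo "batch_size is not a multiple of rollouts_per_example" >&2
    return 1
  fi
  echo $(( batch_size / rollouts_per_example ))
}

examples_per_batch 128 8   # values from the config above; prints 16
```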

## Step 5: Launch the Training Run

Start the run:

```bash theme={null}
prime train run configs/rl/alphabet-sort.toml
```

You'll see output confirming the configuration and a link to the dashboard:

```
Loading config from configs/rl/alphabet-sort.toml

Creating RL training run...

Configuration:
  Model: Qwen/Qwen3-4B-Instruct-2507
  Environments: primeintellect/alphabet-sort
  Max Steps: 50
  Batch Size: 128
  Rollouts per Example: 8
  Max Tokens: 512

✓ Run created successfully!

Monitor run at:
  https://app.primeintellect.ai/dashboard/training/<run-id>
```
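
The run ID is the last path segment of the dashboard URL in that output. If you want it in a shell variable, something like the following could work. It is a sketch that assumes the output keeps the format shown above; `extract_run_id` is a hypothetical helper, and `abc123` is a made-up ID.

```bash
# Read saved `prime train run` output on stdin and print the run ID,
# assuming a URL of the form .../dashboard/training/<run-id>.
extract_run_id() {
  sed -n 's|.*/dashboard/training/\([^[:space:]/]*\).*|\1|p' | head -n 1
}

# Example with a made-up ID; real IDs come from the CLI output.
printf 'Monitor run at:\n  https://app.primeintellect.ai/dashboard/training/abc123\n' \
  | extract_run_id   # prints: abc123
```

In practice you might save the launch output with `prime train run configs/rl/alphabet-sort.toml | tee run.log` and then run `extract_run_id < run.log`. Whether the CLI prints the same text when piped is an assumption worth checking.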

## Step 6: Monitor the Run

You can monitor your run in two ways:

**In the terminal**, stream logs in real time:

```bash theme={null}
prime train logs <run-id> -f
```

**On the dashboard**, open the URL printed when the run started. The dashboard shows reward curves, rubric scores, reward distributions, and individual rollouts.

Key metrics to watch:

* **Reward** — The overall reward curve should trend upward over time
* **Rubric** — Individual rubric component scores
* **Reward Distribution** — Should shift from lower to higher values as training progresses
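
One rough way to check "trending upward" outside the dashboard is to compare the first half of a reward series with the second half. The sketch below assumes you have already collected per-step rewards as plain numbers, one per line. How you obtain them is left open, `reward_trend` is a hypothetical helper, and the sample values are made up.

```bash
# Compare the sum of the first half of a reward series with the sum of
# the second half (equal-sized halves; a middle value is ignored).
# Input: one reward per line on stdin.
reward_trend() {
  awk '{ r[NR] = $1 }
       END {
         if (NR < 2) { print "not-enough-data"; exit }
         half = int(NR / 2)
         for (i = 1; i <= half; i++)           first  += r[i]
         for (i = NR - half + 1; i <= NR; i++) second += r[i]
         if (second > first)      print "up"
         else if (second < first) print "down"
         else                     print "flat"
       }'
}

# Made-up values for illustration only.
printf '%s\n' 0.20 0.25 0.31 0.40 0.52 0.61 | reward_trend   # prints: up
```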

## Step 7: Review Results

Once the run completes, you can review the trained model's performance by running an evaluation with the trained adapter. Trained LoRA adapters can be downloaded from the dashboard.

You can also deploy your trained LoRA adapter for live inference — see [Deploying LoRA Adapters for Inference](/inference/adapter-deployments) for a step-by-step guide.

To compare against the baseline, run the Step 2 evaluation again with the trained model, keeping the same environment and the same `-n` and `-r` values, and compare the scores.
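
Once you have both numbers, the comparison is a simple difference. A minimal sketch, with `reward_delta` as a hypothetical helper and both scores made up for illustration:

```bash
# Print the signed change from the baseline score to the trained score.
reward_delta() {
  awk -v b="$1" -v t="$2" 'BEGIN { printf "%+.3f\n", t - b }'
}

# Made-up scores: baseline 0.35, trained 0.62.
reward_delta 0.35 0.62   # prints: +0.270
```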

## Putting It All Together

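Here's the complete workflow in one place. It assumes the config from Step 4 already exists at `configs/rl/alphabet-sort.toml`, and `<run-id>` stands for the ID printed by `prime train run`: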

```bash theme={null}
# Setup
uv tool install prime
prime login
mkdir -p ~/dev/my-lab && cd ~/dev/my-lab
prime lab setup

# Install and evaluate baseline
prime env install primeintellect/alphabet-sort
prime eval run primeintellect/alphabet-sort \
  -m Qwen/Qwen3-4B-Instruct-2507 -n 20 -r 1

# Launch training
prime train run configs/rl/alphabet-sort.toml

# Monitor
prime train logs <run-id> -f
```
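
The script above does not create the config file. One way to write it from the shell, before the launch step, is a heredoc like the one below. The contents are the Step 4 config verbatim; the `mkdir -p` is a precaution in case `configs/rl/` does not exist yet.

```bash
# Write the Step 4 config to the path that `prime train run` is given.
mkdir -p configs/rl
cat > configs/rl/alphabet-sort.toml <<'EOF'
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 512

[[env]]
id = "primeintellect/alphabet-sort"
EOF
```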

## Run Size Guidelines

Here are recommended configurations for three run sizes. Each snippet omits the `[[env]]` block, so add it as shown in Step 4:

### Small Run (Validation)

Use this to verify your environment and config work correctly before committing to a longer run.

```toml theme={null}
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 512
```

### Medium Run (Experimentation)

Good for iterating on environment design and hyperparameters.

```toml theme={null}
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 200
batch_size = 256
rollouts_per_example = 16

[sampling]
max_tokens = 512

[wandb]
project = "my-experiment"
name = "alphabet-sort-30b"

[eval]
interval = 50
```

### Large Run (Production)

For long runs at production scale, with W&B logging, periodic evaluation, a validation set, and online difficulty filtering.

```toml theme={null}
model = "Qwen/Qwen3-235B-A22B-Instruct-2507"
max_steps = 500
batch_size = 512
rollouts_per_example = 16

[sampling]
max_tokens = 1024

[wandb]
project = "production"
name = "alphabet-sort-235b"

[eval]
interval = 100
num_examples = -1
rollouts_per_example = 1
eval_base_model = true

[val]
num_examples = 64
rollouts_per_example = 1
interval = 5

[buffer]
online_difficulty_filtering = true
```
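
To get a feel for how the three sizes compare, you can multiply `max_steps` by `batch_size` for each. The sketch below does that with the values from the configs above. It assumes one batch of `batch_size` rollouts per step, and `run_totals` is a hypothetical helper.

```bash
# Total rollouts for the small run, used as the reference point.
small_total=$(( 50 * 128 ))

# Each input line is: name max_steps batch_size (copied from the configs above).
# Assumption: one batch of batch_size rollouts is consumed per training step.
run_totals() {
  printf '%s\n' 'small 50 128' 'medium 200 256' 'large 500 512' |
  while read -r name steps batch; do
    total=$(( steps * batch ))
    echo "$name: $total rollouts ($(( total / small_total ))x small)"
  done
}

run_totals
# small: 6400 rollouts (1x small)
# medium: 51200 rollouts (8x small)
# large: 256000 rollouts (40x small)
```

Rollout count alone does not capture cost, since the three configs also use models of different sizes and the large one doubles `max_tokens`.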

<CardGroup cols={2}>
  <Card title="Advanced Configs" icon="sliders" href="/hosted-training/advanced-configs">
    Explore all configuration options including multi-env training, evaluation, and difficulty filtering.
  </Card>

  <Card title="Troubleshooting" icon="wrench" href="/hosted-training/troubleshooting">
    Solutions for common issues with Hosted Training runs.
  </Card>
</CardGroup>
