# Advanced Configurations

> Full configuration reference for Hosted Training runs

Hosted Training runs are configured via a `.toml` file. This page covers all available configuration fields, from basic setup to advanced features like multi-environment training, online evaluation, difficulty filtering, and W&B integration.

## Full Config Reference

Below is a complete annotated config showing all available fields. Required fields are uncommented; optional fields are shown as comments with their defaults or, where no default applies, example values.

```toml theme={null}
# ============================================================
# Core Configuration (required)
# ============================================================
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"   # HuggingFace model ID
max_steps = 100                                # Total training steps
batch_size = 256                               # Rollouts per training batch
rollouts_per_example = 8                       # Rollouts generated per dataset example

# ============================================================
# Training Hyperparameters (optional)
# ============================================================
# learning_rate = 1e-4                         # Learning rate for LoRA
# lora_alpha = 16                              # LoRA alpha scaling factor
# oversampling_factor = 2.0                    # Oversample factor for rollout generation
# max_async_level = 2                          # Maximum async generation level
# trajectory_strategy = "interleaved"          # "interleaved" or "branching"

# ============================================================
# Secrets (optional)
# ============================================================
# env_file = ["secrets.env"]                   # File(s) containing environment secrets

# ============================================================
# Sampling Configuration (required)
# ============================================================
[sampling]
max_tokens = 512                               # Max tokens per model response
# enable_thinking = false                      # Toggle thinking mode (Qwen3.5, Nemotron)
# reasoning_effort = "high"                    # Reasoning effort: "low" | "medium" | "high" (GPT-OSS)

# ============================================================
# Environment(s) (at least one required)
# ============================================================
[[env]]
id = "primeintellect/alphabet-sort"            # Environments Hub ID (owner/name)
# args = { min_turns = 3, max_turns = 5 }      # Arguments passed to load_environment()

# Add multiple [[env]] sections for multi-environment training:
# [[env]]
# id = "primeintellect/another-env"
# args = { split = "train", max_examples = 1000 }

# ============================================================
# Weights & Biases Logging (optional)
# ============================================================
# [wandb]
# project = "my-project"                       # W&B project name
# name = "my-run-name"                         # W&B run name
# entity = "my-team"                           # W&B team/entity

# ============================================================
# Online Evaluation (optional)
# ============================================================
# [eval]
# interval = 100                               # Run eval every N training steps
# num_examples = -1                            # Number of eval examples (-1 = all)
# rollouts_per_example = 1                     # Rollouts per eval example
# eval_base_model = true                       # Also evaluate the base (untrained) model
#
# [[eval.env]]                                 # Environment-specific eval overrides
# id = "primeintellect/eval-env"
# args = { split = "test" }
# num_examples = 30
# rollouts_per_example = 4

# ============================================================
# Validation During Training (optional)
# ============================================================
# [val]
# num_examples = 64                            # Validation examples per check
# rollouts_per_example = 1                     # Rollouts per validation example
# interval = 5                                 # Validate every N steps

# ============================================================
# Buffer / Difficulty Filtering (optional)
# ============================================================
# [buffer]
# online_difficulty_filtering = false          # Enable difficulty-based sampling
# easy_threshold = 0.8                         # Reward above this = "easy"
# hard_threshold = 0.2                         # Reward below this = "hard"
# easy_fraction = 0.0                          # Fraction of easy examples to include
# hard_fraction = 0.0                          # Fraction of hard examples to include
# env_ratios = [0.5, 0.5]                      # Ratio between envs (multi-env only)
# seed = 42                                    # Random seed

# ============================================================
# Warm-Start from Checkpoint (optional)
# ============================================================
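# NOTE: checkpoint_id is a top-level key. If you uncomment it, move it above
# the first [table] header (e.g. next to max_steps); left here, TOML would
# assign it to the preceding table.
# checkpoint_id = "..."                        # Resume training from an existing checkpoint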

# ============================================================
# Checkpoints (optional)
# ============================================================
# [checkpoints]
# interval = 100                               # Save checkpoint every N steps
# keep_cloud = 5                               # Keep N checkpoints in cloud (-1 = keep all)

# ============================================================
# Adapters (optional)
# ============================================================
# [adapters]
# interval = 0                                 # Upload adapter every N steps (0 = only at run end)
# keep_last = 3                                # Keep N adapters in cloud (-1 = keep all)

# ============================================================
# Infrastructure (optional)
# ============================================================
# [infrastructure]
# compute_size = "M"                           # CPU allocation: S, M (default), or L
```

## Field Reference

### Core Fields

| Field                  | Type    | Required | Description                                                                                                                                                                                                |
| ---------------------- | ------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`                | string  | ✓        | HuggingFace model ID. Must be a [supported model](/hosted-training/models-and-pricing). Run `prime train models` to see available options.                                                                 |
| `max_steps`            | integer | ✓        | Total number of training steps.                                                                                                                                                                            |
| `batch_size`           | integer | ✓        | Number of rollouts consumed per training batch. Larger values improve stability.                                                                                                                           |
| `rollouts_per_example` | integer | ✓        | Number of rollouts generated per dataset example. Higher values give more reward signal diversity.                                                                                                         |
| `checkpoint_id`        | string  | —        | Checkpoint ID to warm-start from. The checkpoint must be in READY status, accessible to you, and from a run using the same model. See [Warm-Starting from a Checkpoint](#warm-starting-from-a-checkpoint). |
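
The relationship between `batch_size` and `rollouts_per_example` can be made concrete with a little arithmetic. The sketch below assumes each batch is assembled from complete groups of `rollouts_per_example` rollouts; the table does not state this, so treat the divisibility check as an assumption rather than a platform rule. `batch_budget` is a hypothetical helper.

```python
def batch_budget(max_steps: int, batch_size: int, rollouts_per_example: int) -> dict:
    """Derive per-batch and per-run rollout counts from the core fields."""
    # Assumption: a batch holds whole groups of rollouts for each example.
    if batch_size % rollouts_per_example != 0:
        raise ValueError("batch_size is not a multiple of rollouts_per_example")
    return {
        "examples_per_batch": batch_size // rollouts_per_example,
        "rollouts_consumed_per_run": max_steps * batch_size,
    }


budget = batch_budget(max_steps=100, batch_size=256, rollouts_per_example=8)
print(budget)  # {'examples_per_batch': 32, 'rollouts_consumed_per_run': 25600}
```

With the values from the reference config, each batch would cover 32 distinct examples, and the run would consume 25,600 rollouts in total.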

### Training Hyperparameters

| Field                 | Type             | Default         | Description                                                                                                                                                                 |
| --------------------- | ---------------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `learning_rate`       | float            | `1e-4`          | Learning rate for the LoRA adapter.                                                                                                                                         |
| `lora_alpha`          | integer          | `16`            | LoRA alpha scaling factor. Controls the magnitude of LoRA updates.                                                                                                          |
| `oversampling_factor` | float            | `2.0`           | Multiplier on the number of rollouts generated per batch. With `2.0`, twice as many rollouts are generated as the batch needs, to ensure sufficient data.                   |
| `max_async_level`     | integer          | `2`             | Maximum level of asynchronous generation. Higher values increase throughput but use more memory.                                                                            |
| `trajectory_strategy` | string           | `"interleaved"` | How multi-turn trajectories are generated. `"interleaved"` runs turns across examples concurrently. `"branching"` generates full trajectories per example before moving on. |
| `env_file`            | array of strings | `[]`            | Path(s) to `.env` files containing secrets (e.g., API keys). See [Secrets Management](#secrets-management).                                                                 |

### Sampling

| Field                         | Type    | Required | Description                                                                                                           |
| ----------------------------- | ------- | -------- | --------------------------------------------------------------------------------------------------------------------- |
| `[sampling].max_tokens`       | integer | ✓        | Maximum number of tokens the model can generate per response turn.                                                    |
| `[sampling].enable_thinking`  | boolean | —        | Toggle thinking mode for supported models (Qwen3.5, Nemotron). Mutually exclusive with `reasoning_effort`.            |
| `[sampling].reasoning_effort` | string  | —        | Reasoning effort for GPT-OSS models. One of `"low"`, `"medium"`, `"high"`. Mutually exclusive with `enable_thinking`. |
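
The constraints in this table are easy to check before submitting a run. Below is a minimal sketch of such a check; `check_sampling` is a hypothetical helper operating on the parsed `[sampling]` table, not a function provided by the platform.

```python
VALID_EFFORTS = {"low", "medium", "high"}


def check_sampling(sampling: dict) -> list[str]:
    """Return a list of problems found in a parsed [sampling] table."""
    problems = []
    if "max_tokens" not in sampling:
        problems.append("max_tokens is required")
    if "enable_thinking" in sampling and "reasoning_effort" in sampling:
        problems.append("enable_thinking and reasoning_effort are mutually exclusive")
    effort = sampling.get("reasoning_effort")
    if effort is not None and effort not in VALID_EFFORTS:
        problems.append(f"reasoning_effort must be one of {sorted(VALID_EFFORTS)}")
    return problems


print(check_sampling({"max_tokens": 512, "enable_thinking": False}))  # []
print(check_sampling({"max_tokens": 512, "enable_thinking": True, "reasoning_effort": "high"}))
```

The second call reports the mutual-exclusivity violation, since both keys are set at once.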

### Environment

| Field          | Type   | Required | Description                                                          |
| -------------- | ------ | -------- | -------------------------------------------------------------------- |
| `[[env]].id`   | string | ✓        | Environment ID on the Environments Hub, in `owner/name` format.      |
| `[[env]].args` | table  | —        | Arguments passed to the environment's `load_environment()` function. |

## Multi-Environment Training

You can train on multiple environments simultaneously by adding multiple `[[env]]` sections:

```toml theme={null}
[[env]]
id = "primeintellect/alphabet-sort"
args = { min_turns = 3, max_turns = 5 }

[[env]]
id = "primeintellect/gsm8k"
args = { split = "train" }
```

Control the ratio of examples from each environment using the `[buffer]` section:

```toml theme={null}
[buffer]
env_ratios = [0.6, 0.4]   # 60% alphabet-sort, 40% gsm8k
```
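
To see what a given `env_ratios` list implies, the sketch below pairs each ratio with its environment and converts the list to shares. It assumes ratios are matched to `[[env]]` sections in file order (as the comment in the example suggests) and normalizes them so they sum to 1; whether the platform normalizes or requires an exact sum is not stated here. `env_shares` is a hypothetical helper.

```python
def env_shares(env_ids: list[str], env_ratios: list[float]) -> dict[str, float]:
    """Map each environment ID to its normalized share of sampled examples."""
    if len(env_ids) != len(env_ratios):
        raise ValueError("env_ratios needs exactly one entry per [[env]] section")
    total = sum(env_ratios)
    if total <= 0 or any(ratio < 0 for ratio in env_ratios):
        raise ValueError("env_ratios must be non-negative with a positive sum")
    # Assumption: the i-th ratio belongs to the i-th [[env]] section.
    return {env_id: ratio / total for env_id, ratio in zip(env_ids, env_ratios)}


shares = env_shares(
    ["primeintellect/alphabet-sort", "primeintellect/gsm8k"],
    [0.6, 0.4],
)
for env_id, share in shares.items():
    print(f"{env_id}: {share:.0%}")
```

A length mismatch raises immediately, which is the most likely mistake when adding or removing an `[[env]]` section.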

## Online Evaluation

Enable periodic evaluation during training to track progress without interrupting the run:

```toml theme={null}
[eval]
interval = 100                    # Evaluate every 100 steps
num_examples = -1                 # Use all eval examples
rollouts_per_example = 1
eval_base_model = true            # Include base model comparison

[[eval.env]]
id = "primeintellect/alphabet-sort"
args = { split = "test" }
num_examples = 50
rollouts_per_example = 4
```

The `[eval]` section sets global defaults, and `[[eval.env]]` sections can override settings per environment.
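
The defaults-plus-overrides behavior can be pictured as a dictionary merge. The sketch below is an illustration of that idea, not the platform's implementation; it assumes `num_examples` and `rollouts_per_example` are the settings that cascade from `[eval]` to each `[[eval.env]]`, and `resolve_eval_envs` is a hypothetical helper.

```python
# Assumption: these are the [eval] keys that act as per-environment defaults.
EVAL_DEFAULT_KEYS = ("num_examples", "rollouts_per_example")


def resolve_eval_envs(eval_section: dict) -> list[dict]:
    """Apply [eval] defaults to each [[eval.env]] entry; entry values win."""
    defaults = {key: eval_section[key] for key in EVAL_DEFAULT_KEYS if key in eval_section}
    return [{**defaults, **env} for env in eval_section.get("env", [])]


eval_section = {
    "interval": 100,
    "num_examples": -1,
    "rollouts_per_example": 1,
    "eval_base_model": True,
    "env": [
        {
            "id": "primeintellect/alphabet-sort",
            "args": {"split": "test"},
            "num_examples": 50,
            "rollouts_per_example": 4,
        },
        {"id": "primeintellect/gsm8k"},  # no overrides: inherits the defaults
    ],
}

for env in resolve_eval_envs(eval_section):
    print(env["id"], env["num_examples"], env["rollouts_per_example"])
```

The first environment keeps its own values (50 examples, 4 rollouts each), while the second falls back to the global settings.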

## Validation

Validation is a lightweight check that runs more frequently than full evaluation:

```toml theme={null}
[val]
num_examples = 64
rollouts_per_example = 1
interval = 5                      # Validate every 5 steps
```

This uses the training environment's validation split (if available) and reports metrics to W&B and the dashboard.
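
To compare how often validation and evaluation fire, the sketch below lists the steps at which an `interval` setting would trigger. It assumes steps are counted from 1 and that a check runs on every step that is a multiple of `interval`; the exact off-by-one behavior (for example, whether a check also runs before the first step) is not specified here. `trigger_steps` is a hypothetical helper.

```python
def trigger_steps(max_steps: int, interval: int) -> list[int]:
    """Steps in 1..max_steps that are multiples of interval."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return [step for step in range(1, max_steps + 1) if step % interval == 0]


val_steps = trigger_steps(max_steps=100, interval=5)     # [val] interval = 5
eval_steps = trigger_steps(max_steps=100, interval=100)  # [eval] interval = 100
print(len(val_steps), len(eval_steps))  # 20 1
```

Under these assumptions, a 100-step run with the example settings would validate 20 times and run a full evaluation once.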

## Difficulty Filtering

The difficulty buffer helps focus training on examples at the right difficulty level for the current model:

```toml theme={null}
[buffer]
online_difficulty_filtering = true
easy_threshold = 0.8              # Examples scored above 0.8 are "easy"
hard_threshold = 0.2              # Examples scored below 0.2 are "hard"
easy_fraction = 0.0               # Exclude easy examples (0.0 = drop all easy)
hard_fraction = 0.0               # Exclude hard examples
```

This is especially useful for large datasets with a wide difficulty range. By filtering out examples that are too easy (model already solves them) or too hard (model gets no reward signal), you focus compute on examples where the model can meaningfully improve.
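
The thresholds and fractions can be read as a two-step rule: classify each example by its reward, then keep easy and hard examples only with the configured probability. The sketch below illustrates that reading; it assumes the score is the mean reward across an example's rollouts and that the fractions act as keep probabilities. It is not the platform's buffer implementation, and `classify` and `filter_examples` are hypothetical names.

```python
import random


def classify(mean_reward: float, easy_threshold: float = 0.8, hard_threshold: float = 0.2) -> str:
    """Label an example as easy, hard, or normal from its mean reward."""
    if mean_reward > easy_threshold:
        return "easy"
    if mean_reward < hard_threshold:
        return "hard"
    return "normal"


def filter_examples(
    mean_rewards: dict[str, float],
    easy_fraction: float = 0.0,
    hard_fraction: float = 0.0,
    seed: int = 42,
) -> list[str]:
    """Keep all normal examples; keep easy/hard ones with the given probability."""
    rng = random.Random(seed)
    keep_probability = {"easy": easy_fraction, "hard": hard_fraction, "normal": 1.0}
    kept = []
    for example_id, reward in mean_rewards.items():
        # rng.random() is in [0, 1), so 0.0 drops everything and 1.0 keeps everything.
        if rng.random() < keep_probability[classify(reward)]:
            kept.append(example_id)
    return kept


rewards = {"a": 0.95, "b": 0.5, "c": 0.05, "d": 0.8, "e": 0.2}
print(filter_examples(rewards))  # ['b', 'd', 'e']
```

Examples sitting exactly on a threshold (`d` and `e`) count as normal here because the comparisons are strict, matching the "above" and "below" wording of the config comments.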

## Checkpoints

Control how often checkpoints are saved and how many are retained in cloud storage:

```toml theme={null}
[checkpoints]
interval = 100    # Save checkpoint every 100 steps
keep_cloud = 5    # Keep last 5 checkpoints in cloud
```

| Field        | Type    | Default         | Description                                                                            |
| ------------ | ------- | --------------- | -------------------------------------------------------------------------------------- |
| `interval`   | integer | cluster default | Save a checkpoint every N training steps.                                              |
| `keep_cloud` | integer | `5`             | Number of checkpoints to retain in cloud storage. Set to `-1` to keep all checkpoints. |

Checkpoints enable resuming training from a specific step if a run is interrupted. They're automatically uploaded to cloud storage and can be used to create new runs from a saved state.
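
As a rough illustration of how `interval` and `keep_cloud` interact, the sketch below computes which checkpoint steps would remain in cloud storage at the end of a run. It assumes checkpoints are written at every multiple of `interval` and that retention keeps the most recent ones; whether an extra checkpoint is written at the final step when it is not a multiple of `interval` is not covered. `retained_checkpoints` is a hypothetical helper.

```python
def retained_checkpoints(max_steps: int, interval: int, keep_cloud: int = 5) -> list[int]:
    """Checkpoint steps still in cloud storage once the run finishes."""
    saved = list(range(interval, max_steps + 1, interval))
    if keep_cloud == -1:  # -1 means keep everything
        return saved
    # Guard against keep_cloud == 0, since saved[-0:] would return the whole list.
    return saved[-keep_cloud:] if keep_cloud > 0 else []


print(retained_checkpoints(max_steps=1000, interval=100, keep_cloud=5))
# [600, 700, 800, 900, 1000]
```

If you expect to warm-start from an early checkpoint later, raise `keep_cloud` or set it to `-1` so that checkpoint is not rotated out.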

## Warm-Starting from a Checkpoint

Start a new run from an existing checkpoint by setting `checkpoint_id` at the top level of your config. The checkpoint must be in READY status and come from a run that used the same model, and you must have access to the original run.

```toml theme={null}
checkpoint_id = "cp_abc123"
```

List available checkpoints with `prime train checkpoints <run-id>`.

## Adapters

Configure periodic adapter uploads during training. Adapters are LoRA weights that can be deployed for inference.

```toml theme={null}
[adapters]
interval = 100    # Upload adapter every 100 steps
keep_last = 3     # Keep last 3 adapters in cloud
```

| Field       | Type    | Default | Description                                                                                    |
| ----------- | ------- | ------- | ---------------------------------------------------------------------------------------------- |
| `interval`  | integer | `0`     | Upload adapter every N training steps. Set to `0` to only upload the final adapter at run end. |
| `keep_last` | integer | `3`     | Number of adapters to retain in cloud storage. Set to `-1` to keep all adapters.               |

<Note>
  Deployed adapters are protected from automatic cleanup. If you deploy an adapter for inference, it will not be deleted even if it exceeds the `keep_last` limit.
</Note>
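
The interaction between `keep_last` and deployed adapters can be sketched as a small retention rule: keep the most recent `keep_last` uploads, and never remove an adapter that is deployed. The code below illustrates that reading under the assumption that "last" means highest step; it is not the platform's cleanup logic, and `adapters_to_delete` is a hypothetical helper.

```python
def adapters_to_delete(
    uploaded_steps: list[int],
    keep_last: int = 3,
    deployed: frozenset[int] = frozenset(),
) -> list[int]:
    """Steps whose adapters would be cleaned up under the retention rule."""
    if keep_last == -1:  # -1 means keep everything
        return []
    ordered = sorted(uploaded_steps)
    kept = set(ordered[-keep_last:]) if keep_last > 0 else set()
    # Deployed adapters are protected even when they fall outside the window.
    return [step for step in ordered if step not in kept and step not in deployed]


print(adapters_to_delete([100, 200, 300, 400, 500], keep_last=3, deployed=frozenset({100})))
# [200]
```

Here the adapter from step 100 survives because it is deployed, even though it is older than the three most recent uploads.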

## Infrastructure

Control the CPU and memory resources allocated to your environment containers. This only affects the environments you provide — trainer and inference infrastructure is fully managed by us.

```toml theme={null}
[infrastructure]
compute_size = "L"
```

| Size | Description                                                                                                            |
| ---- | ---------------------------------------------------------------------------------------------------------------------- |
| `S`  | Lower CPU allocation. Suitable for lightweight environments.                                                           |
| `M`  | Default. Balanced allocation for most workloads.                                                                       |
| `L`  | High CPU allocation. Use for environments that compile code or for vision-language models with heavy image processing. |

<Note>
  If not specified, runs default to `M`. Most users won't need to change this — use `L` if you notice slow CPU-bound operations during training.
</Note>

## Tailscale Networking

<Note>
  Tailscale networking is an **enterprise-only** feature. Contact your account team to enable it on your organization.
</Note>

When enabled, every env-server (training and eval) for the run joins your Tailscale tailnet via a sidecar. From inside your environment code you can then reach private services — internal APIs, MCP servers, datasets behind a VPN — by their Tailscale IP, MagicDNS hostname, or by native LAN IP if a [subnet router](https://tailscale.com/kb/1019/subnets) advertises it.

```toml theme={null}
[tailscale]
enabled = true
# auth_key = "tskey-auth-..."        # preferably via TAILSCALE_AUTH_KEY env var
# hostname_prefix = "prime-hosted-training"
```

| Field                         | Type    | Default                   | Description                                                                                                                                                                                                                                           |
| ----------------------------- | ------- | ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `[tailscale].enabled`         | boolean | `false`                   | Toggle the per-run sidecar.                                                                                                                                                                                                                           |
| `[tailscale].auth_key`        | string  | —                         | Tailscale [pre-authenticated key](https://tailscale.com/kb/1085/auth-keys) (must start with `tskey-auth-`). OAuth client secrets are not supported. Prefer the `TAILSCALE_AUTH_KEY` environment variable so the secret is not committed with your config file (e.g., `rl.toml`). |
| `[tailscale].hostname_prefix` | string  | `"prime-hosted-training"` | Prefix for the Tailscale node name. The full name is derived as `{prefix}-env-{idx}-{run_id}`. 1–30 lowercase alphanumeric chars or hyphens, must start with a letter.                                                                                |

<Tip>
  Use a **tagged**, ephemeral, reusable auth key. Tagged keys let you scope the env-servers in your tailnet ACL without granting them the same access as a user-owned device.
</Tip>
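
The naming and key-format rules from the table can be checked locally before a run. The sketch below encodes them as stated: a prefix of 1 to 30 lowercase alphanumeric characters or hyphens that starts with a letter, a key that starts with `tskey-auth-`, and a node name of the form `{prefix}-env-{idx}-{run_id}`. The helper names and the sample `run_id` are hypothetical.

```python
import re

# One leading lowercase letter, then up to 29 lowercase letters, digits, or hyphens.
HOSTNAME_PREFIX_RE = re.compile(r"[a-z][a-z0-9-]{0,29}")


def is_valid_prefix(prefix: str) -> bool:
    """Check hostname_prefix against the documented character and length rules."""
    return HOSTNAME_PREFIX_RE.fullmatch(prefix) is not None


def looks_like_auth_key(value: str) -> bool:
    """Pre-authenticated keys start with 'tskey-auth-'; other key types do not."""
    return value.startswith("tskey-auth-")


def node_name(prefix: str, idx: int, run_id: str) -> str:
    """Build the Tailscale node name from the documented pattern."""
    return f"{prefix}-env-{idx}-{run_id}"


print(is_valid_prefix("prime-hosted-training"))  # True
print(is_valid_prefix("1-bad-prefix"))           # False
print(node_name("prime-hosted-training", 0, "example-run"))
# prime-hosted-training-env-0-example-run
```

Because the run ID and index are appended to the prefix, a short prefix keeps the resulting node names readable in the Tailscale admin console.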

## Weights & Biases Integration

Log training metrics, reward curves, and rollout samples to W&B:

```toml theme={null}
[wandb]
project = "my-rl-experiments"
name = "qwen3-30b-alphabet-sort"
entity = "my-team"
```

When W&B is configured, all training metrics, evaluation results, and sample rollouts are logged automatically.

## Secrets Management

<Tip>
  The recommended way to supply secrets to Hosted Training is via [environment secrets](/tutorials-environments/secrets). Secrets linked or added to your environment are automatically injected at runtime — no config changes needed.
</Tip>

If you prefer to supply secrets via a file, you can use `env_file` in your training config instead:

```toml theme={null}
env_file = ["secrets.env"]
```

The `secrets.env` file should contain key-value pairs:

```
OPENAI_API_KEY=sk-...
CUSTOM_API_KEY=...
```
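
To catch a missing or malformed entry before launching a run, you could parse the file locally. The sketch below handles plain `KEY=VALUE` lines; skipping blank lines and `#` comments and trimming whitespace are assumptions about the format, and the platform's own parser may treat quoting or `export` prefixes differently. Both helpers are hypothetical, and the values shown are placeholders.

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    secrets = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, separator, value = line.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        secrets[key.strip()] = value.strip()
    return secrets


def missing_keys(secrets: dict[str, str], required: list[str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not secrets.get(key)]


EXAMPLE = "OPENAI_API_KEY=placeholder-value\nCUSTOM_API_KEY=another-placeholder\n"
secrets = parse_env_file(EXAMPLE)
print(missing_keys(secrets, ["OPENAI_API_KEY", "CUSTOM_API_KEY"]))  # []
```

Avoid printing the parsed values themselves; reporting only the names of missing keys keeps secrets out of logs.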

You can also manage secrets via the CLI:

```bash theme={null}
prime secret list              # list global secrets
prime env secret list my-env   # list secrets for an environment
```

In your environment code, validate required keys early using `vf.ensure_keys()`:

```python theme={null}
import verifiers as vf


def load_environment(api_key_var: str = "OPENAI_API_KEY") -> vf.Environment:
    vf.ensure_keys([api_key_var])
    # ...
```

<CardGroup cols={2}>
  <Card title="End-to-End Run" icon="rocket" href="/hosted-training/end-to-end-run">
    Walk through a complete training run step by step.
  </Card>

  <Card title="Troubleshooting" icon="wrench" href="/hosted-training/troubleshooting">
    Solutions for common issues with Hosted Training.
  </Card>
</CardGroup>
