Hosted Training supports a range of open-weight models. This page lists the currently available models and their pricing, and offers guidance on choosing the right model for your use case.

Available Models

Prices are per million tokens, billed separately for input, output, and training. Prefix cache hits are discounted automatically.
| Model | Input ($ / 1M) | Output ($ / 1M) | Train ($ / 1M) |
| --- | --- | --- | --- |
| Qwen/Qwen3.5-0.8B | 0.02 | 0.06 | 0.06 |
| Qwen/Qwen3.5-2B | 0.05 | 0.15 | 0.15 |
| Qwen/Qwen3.5-4B | 0.10 | 0.30 | 0.30 |
| Qwen/Qwen3.5-9B | 0.20 | 0.60 | 0.60 |
| Qwen/Qwen3.5-35B-A3B | 0.25 | 0.75 | 1.00 |
| Qwen/Qwen3.5-122B-A10B | 0.50 | 1.50 | 2.00 |
| Qwen/Qwen3.5-397B-A17B | 1.00 | 3.00 | 4.00 |
| meta-llama/Llama-3.2-1B-Instruct | 0.02 | 0.06 | 0.06 |
| meta-llama/Llama-3.2-3B-Instruct | 0.05 | 0.15 | 0.15 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | 0.15 | 0.45 | 0.60 |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | 0.30 | 0.90 | 1.20 |
| openai/gpt-oss-20b | 0.10 | 0.30 | 0.40 |
| openai/gpt-oss-120b | 0.25 | 0.75 | 1.00 |
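
As a worked example of the billing above, a hypothetical run on Qwen/Qwen3.5-35B-A3B that consumes 10M input tokens, 2M output tokens, and 2M training tokens would cost 10 × $0.25 + 2 × $0.75 + 2 × $1.00 = $6.00, before any prefix cache discounts. The token counts here are illustrative only.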

Choosing a Model

For Validation and Debugging

Start with a small, fast model to verify that your environment and config work correctly before committing compute to a larger run.

Recommended: Qwen/Qwen3.5-0.8B or meta-llama/Llama-3.2-1B-Instruct

model = "Qwen/Qwen3.5-0.8B"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

For Experimentation

MoE models with small active parameter counts give strong performance per token.

Recommended: Qwen/Qwen3.5-35B-A3B

model = "Qwen/Qwen3.5-35B-A3B"
max_steps = 200
batch_size = 256
rollouts_per_example = 16

For Production Training

For serious training runs where you want the strongest possible results, use the larger MoE models.

Recommended: Qwen/Qwen3.5-397B-A17B or nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

model = "Qwen/Qwen3.5-397B-A17B"
max_steps = 500
batch_size = 512
rollouts_per_example = 16

Thinking Mode

Qwen3.5 and Nemotron models support a thinking mode that produces extended chain-of-thought reasoning before the final answer. Toggle it via [sampling].enable_thinking in your config. Thinking mode tends to help on tasks that benefit from multi-step reasoning (math, code, logic) at the cost of longer outputs.
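
For example, thinking mode can be enabled by adding a [sampling] table to your config. The snippet below is a minimal sketch: enable_thinking is the toggle described above, and the remaining keys simply mirror the experimentation config earlier on this page.
model = "Qwen/Qwen3.5-35B-A3B"
max_steps = 200
batch_size = 256
rollouts_per_example = 16

[sampling]
enable_thinking = true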

Checking Available Models

Always use the CLI to check the current list of supported models:
prime train models
Add --output json to get live pricing alongside the model list. The CLI output may differ from this page, since new models are added regularly.
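
For example, to fetch the machine-readable model list with live pricing in one call:
prime train models --output json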