Hosted Training supports a range of open-weights models. This page lists the currently available models, their pricing, and guidance on choosing the right model for your use case.
## Available Models
Prices are per million tokens, billed separately for input, output, and training. Prefix cache hits receive automatically applied discounts.

| Model | Input ($ / 1M) | Output ($ / 1M) | Train ($ / 1M) |
|---|---|---|---|
| Qwen/Qwen3.5-0.8B | 0.02 | 0.06 | 0.06 |
| Qwen/Qwen3.5-2B | 0.05 | 0.15 | 0.15 |
| Qwen/Qwen3.5-4B | 0.10 | 0.30 | 0.30 |
| Qwen/Qwen3.5-9B | 0.20 | 0.60 | 0.60 |
| Qwen/Qwen3.5-35B-A3B | 0.25 | 0.75 | 1.00 |
| Qwen/Qwen3.5-122B-A10B | 0.50 | 1.50 | 2.00 |
| Qwen/Qwen3.5-397B-A17B | 1.00 | 3.00 | 4.00 |
| meta-llama/Llama-3.2-1B-Instruct | 0.02 | 0.06 | 0.06 |
| meta-llama/Llama-3.2-3B-Instruct | 0.05 | 0.15 | 0.15 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | 0.15 | 0.45 | 0.60 |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | 0.30 | 0.90 | 1.20 |
| openai/gpt-oss-20b | 0.10 | 0.30 | 0.40 |
| openai/gpt-oss-120b | 0.25 | 0.75 | 1.00 |
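As a quick budget sanity check, the per-million-token prices can be combined into a total cost estimate. The sketch below is illustrative: the `run_cost` helper is hypothetical, and the prices are taken from the Qwen/Qwen3.5-35B-A3B row as an example.

```python
# Estimate the dollar cost of a run from per-million-token prices.
# Prices below are the Qwen/Qwen3.5-35B-A3B rates from the table above.
PRICE = {"input": 0.25, "output": 0.75, "train": 1.00}  # $ per 1M tokens

def run_cost(input_tokens: int, output_tokens: int, train_tokens: int) -> float:
    """Total cost in dollars; each component is billed per million tokens."""
    return (
        input_tokens / 1e6 * PRICE["input"]
        + output_tokens / 1e6 * PRICE["output"]
        + train_tokens / 1e6 * PRICE["train"]
    )

# Example: 100M input, 20M output, 20M trained tokens.
print(run_cost(100_000_000, 20_000_000, 20_000_000))  # → 60.0
```

Note that prefix cache discounts (applied automatically) would lower the effective input cost below this estimate.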
## Choosing a Model
### For Validation and Debugging

Start with a small, fast model to verify your environment and config work correctly before committing compute to a larger run. Recommended: Qwen/Qwen3.5-0.8B or meta-llama/Llama-3.2-1B-Instruct.
### For Experimentation

MoE models with small active parameter counts give strong performance per token. Recommended: Qwen/Qwen3.5-35B-A3B.
### For Production Training

For serious training runs where you want the strongest possible results, use the larger MoE models. Recommended: Qwen/Qwen3.5-397B-A17B or nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16.
## Thinking Mode

Qwen3.5 and Nemotron models support a thinking mode that produces extended chain-of-thought reasoning before the final answer. Toggle it via `[sampling].enable_thinking` in your config. Thinking mode tends to help on tasks that benefit from multi-step reasoning (math, code, logic) at the cost of longer outputs.
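For illustration, the toggle might look like this in a config file. This is a minimal sketch assuming the config is TOML; only the `[sampling].enable_thinking` key comes from the documentation above, and any other keys your config needs are unchanged.

```toml
[sampling]
# Enable extended chain-of-thought reasoning before the final answer.
# Expect longer (and therefore more expensive) outputs when this is on.
enable_thinking = true
```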
## Checking Available Models

Models are added regularly, so the list on this page may be out of date. Always use the CLI to check the current list of supported models; pass `--output json` to get live pricing alongside the model list.