Hosted Training supports a range of open-weights models. This page lists the currently available models, their pricing, and guidance on choosing the right model for your use case.
The model list is subject to change during the Private Beta as we adjust and expand our infrastructure. Run prime rl models for the most up-to-date list.

Available Models

| Model | Parameters | Architecture | Notes |
|---|---|---|---|
| HuggingFaceTB/SmolLM3-3B | 3B | Dense | Compact model for lightweight tasks |
| PrimeIntellect/INTELLECT-3.1 | | | |
| PrimeIntellect/MiniMax-M2.5-bf16 | 230B (10B active) | MoE | MiniMax coding and agentic model |
| PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT | 0.6B | Dense | Example SFT model for testing |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | 30B (3B active) | MoE | Recommended for experimentation |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | 30B (3B active) | MoE | Thinking variant |
| Qwen/Qwen3-4B-Instruct-2507 | 4B | Dense | Recommended for validation runs |
| Qwen/Qwen3-4B-Thinking-2507 | 4B | Dense | Thinking variant with chain-of-thought |
| Qwen/Qwen3-VL-4B-Instruct | 4B | Dense | Vision-language model |
| Qwen/Qwen3-VL-8B-Instruct | 8B | Dense | Vision-language model |
| Qwen/Qwen3.5-35B-A3B | | | |
| Qwen/Qwen3.5-4B | | | |
| Qwen/Qwen3.5-9B | | | |
| allenai/OLMo-3-7B-Instruct | 7B | Dense | Ai2 open language model |
| arcee-ai/Trinity-Mini | | | Arcee's Trinity model |
| arcee-ai/Trinity-Nano-Preview | | | |
| meta-llama/Llama-3.2-1B-Instruct | 1B | Dense | Smallest option, good for rapid prototyping |
| meta-llama/Llama-3.2-3B-Instruct | 3B | Dense | |
| nvidia/OpenReasoning-Nemotron-7B | 7B | Dense | Reasoning model based on Qwen2.5-7B |
| zai-org/GLM-4.7 | | | |

Pricing

Models are currently free to use with rate limits during the Private Beta. Paid usage will be enabled in the near future. Pricing will be per million tokens, billed separately for input, output, and training tokens. Discounts for prefix cache hits are applied automatically.
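To make the billing model concrete, here is a minimal cost-estimate sketch. The rates and the 50% cache discount below are illustrative placeholders, not published Hosted Training prices; only the structure (separate input, output, and training rates per million tokens, with a discount on prefix cache hits) reflects the description above.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    training_tokens: int,
    input_rate: float,            # $ per million input tokens (placeholder)
    output_rate: float,           # $ per million output tokens (placeholder)
    training_rate: float,         # $ per million training tokens (placeholder)
    cached_input_tokens: int = 0, # input tokens served from the prefix cache
    cache_discount: float = 0.5,  # ASSUMED discount on cache hits
) -> float:
    """Return an estimated run cost in dollars under placeholder rates."""
    uncached = input_tokens - cached_input_tokens
    cost = uncached * input_rate / 1e6
    cost += cached_input_tokens * input_rate * (1 - cache_discount) / 1e6
    cost += output_tokens * output_rate / 1e6
    cost += training_tokens * training_rate / 1e6
    return cost

# Example: 10M input (4M cached), 2M output, 2M training tokens
# at placeholder rates of $0.20 / $0.60 / $1.00 per million tokens.
print(estimate_cost(10_000_000, 2_000_000, 2_000_000,
                    0.20, 0.60, 1.00,
                    cached_input_tokens=4_000_000))  # -> 4.8
```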

Choosing a Model

For Validation and Debugging

Start with a small, fast model to verify your environment and config work correctly before committing compute to a larger run. Recommended: Qwen/Qwen3-4B-Instruct-2507 or meta-llama/Llama-3.2-1B-Instruct.
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

For Experimentation

MoE models with small active parameter counts give strong performance per token. Recommended: Qwen/Qwen3-30B-A3B-Instruct-2507.
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 200
batch_size = 256
rollouts_per_example = 16
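A quick budgeting sketch for the config above. It assumes batch_size counts individual rollouts, so the number of distinct examples per step is batch_size divided by rollouts_per_example; that interpretation is an assumption, so check the prime-rl documentation for the exact semantics.

```python
# ASSUMPTION: batch_size counts individual rollouts, not distinct prompts.
batch_size = 256
rollouts_per_example = 16
max_steps = 200

examples_per_step = batch_size // rollouts_per_example
total_rollouts = batch_size * max_steps

print(examples_per_step)  # -> 16 distinct examples per step (under the assumption)
print(total_rollouts)     # -> 51200 rollouts over the whole run
```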

For Production Training

For serious training runs where you want the strongest possible results, use the larger MoE models. Recommended: Qwen/Qwen3-235B-A22B-Instruct-2507 or PrimeIntellect/INTELLECT-3.
model = "Qwen/Qwen3-235B-A22B-Instruct-2507"
max_steps = 500
batch_size = 512
rollouts_per_example = 16

Instruct vs. Thinking Variants

Models with -Thinking in the name are trained for extended chain-of-thought reasoning. They tend to produce longer outputs with explicit reasoning steps, which can be beneficial for tasks that require multi-step problem solving (math, code, logic). However, they also consume more tokens per response. Use Instruct variants for tasks where concise responses are preferred or where the task is straightforward. Use Thinking variants for tasks that benefit from extended reasoning, such as math questions or complex coding problems.
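The rule of thumb above can be sketched as a small helper that swaps the variant segment of a model name based on task type. The task categories and the pick_variant helper are illustrative inventions for this example, not part of any official API or taxonomy.

```python
# Illustrative only: tasks that tend to benefit from extended reasoning.
REASONING_TASKS = {"math", "code", "logic"}

def pick_variant(base: str, task: str) -> str:
    """Fill in 'Thinking' for multi-step reasoning tasks, 'Instruct'
    otherwise. `base` is a template like
    'Qwen/Qwen3-4B-{variant}-2507' with a '{variant}' placeholder."""
    variant = "Thinking" if task in REASONING_TASKS else "Instruct"
    return base.format(variant=variant)

print(pick_variant("Qwen/Qwen3-4B-{variant}-2507", "math"))
# -> Qwen/Qwen3-4B-Thinking-2507
print(pick_variant("Qwen/Qwen3-4B-{variant}-2507", "summarization"))
# -> Qwen/Qwen3-4B-Instruct-2507
```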

Checking Available Models

Always use the CLI to check the current list of supported models:
prime rl models
This returns the live list of models available for Hosted Training, which may differ from this page since models are added regularly.