Hosted Training supports a range of open-weights models. This page lists the currently available models and their pricing, and offers guidance on choosing the right model for your use case.
The model list is subject to change during the Private Beta as we adjust and expand our infrastructure. Run prime rl models for the most up-to-date list.

Available Models

| Model | Parameters | Architecture | Notes |
| --- | --- | --- | --- |
| meta-llama/Llama-3.2-1B-Instruct | 1B | Dense | Smallest option, good for rapid prototyping |
| HuggingFaceTB/SmolLM3-3B | 3B | Dense | Compact model for lightweight tasks |
| PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT | 0.6B | Dense | Example SFT model for testing |
| Qwen/Qwen3-4B-Instruct-2507 | 4B | Dense | Recommended for validation runs |
| Qwen/Qwen3-4B-Thinking-2507 | 4B | Dense | Thinking variant with chain-of-thought |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | 30B (3B active) | MoE | Recommended for experimentation |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | 30B (3B active) | MoE | Thinking variant |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | 235B (22B active) | MoE | Production scale |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | 235B (22B active) | MoE | Thinking variant |
| PrimeIntellect/INTELLECT-3 | 106B (12B active) | MoE | Prime Intellect's own open-source model |
| arcee-ai/Trinity-Mini | — | — | Arcee's Trinity model |

Pricing

During the Private Beta, models are free to use, subject to rate limits. Paid usage will be enabled in the near future: pricing will be per million tokens, billed separately for input, output, and training tokens, with discounts applied automatically for prefix cache hits.

Choosing a Model

For Validation and Debugging

Start with a small, fast model to verify your environment and config work correctly before committing compute to a larger run. Recommended: Qwen/Qwen3-4B-Instruct-2507 or meta-llama/Llama-3.2-1B-Instruct.
model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

For Experimentation

MoE models with small active parameter counts give strong performance per token. Recommended: Qwen/Qwen3-30B-A3B-Instruct-2507.
model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
max_steps = 200
batch_size = 256
rollouts_per_example = 16

For Production Training

For serious training runs where you want the strongest possible results, use the larger MoE models. Recommended: Qwen/Qwen3-235B-A22B-Instruct-2507 or PrimeIntellect/INTELLECT-3.
model = "Qwen/Qwen3-235B-A22B-Instruct-2507"
max_steps = 500
batch_size = 512
rollouts_per_example = 16
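
If you prefer PrimeIntellect/INTELLECT-3, the other recommended production model, the same settings can be reused with only the model name swapped. A minimal sketch; the step, batch, and rollout values simply mirror the example above and are illustrative rather than tuned recommendations:
# same production-scale settings as above; only the model differs
model = "PrimeIntellect/INTELLECT-3"
max_steps = 500
batch_size = 512
rollouts_per_example = 16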

Instruct vs. Thinking Variants

Models with -Thinking in the name are trained for extended chain-of-thought reasoning. They tend to produce longer outputs with explicit reasoning steps, which can be beneficial for tasks that require multi-step problem solving (math, code, logic). However, they also consume more tokens per response. Use Instruct variants for tasks where concise responses are preferred or where the task is straightforward. Use Thinking variants for tasks that benefit from extended reasoning, such as math questions or complex coding problems.
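
For example, switching the experimentation config above to the Thinking variant only requires changing the model name. A minimal sketch; the remaining values mirror that example and are not tuned recommendations:
# Thinking variant of the experimentation model; other settings unchanged
model = "Qwen/Qwen3-30B-A3B-Thinking-2507"
max_steps = 200
batch_size = 256
rollouts_per_example = 16
Because Thinking variants produce longer, reasoning-heavy responses, expect higher token usage per rollout with otherwise identical settings.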

Checking Available Models

Always use the CLI to check the current list of supported models:
prime rl models
This returns the live list of models available for Hosted Training, which may differ from this page since models are added regularly.