Hosted Training supports a range of open-weight models. This page lists the currently available models and their pricing, and offers guidance on choosing the right model for your use case.

Available Models

Prices are per million tokens, billed separately for input, output, and training. Prefix cache hits are discounted automatically.
| Model | Input ($ / 1M) | Output ($ / 1M) | Train ($ / 1M) |
| --- | --- | --- | --- |
| Qwen/Qwen3.5-0.8B | 0.02 | 0.06 | 0.06 |
| Qwen/Qwen3.5-2B | 0.05 | 0.15 | 0.15 |
| Qwen/Qwen3.5-4B | 0.10 | 0.30 | 0.30 |
| Qwen/Qwen3.5-9B | 0.20 | 0.60 | 0.60 |
| Qwen/Qwen3.5-35B-A3B | 0.25 | 0.75 | 1.00 |
| Qwen/Qwen3.5-122B-A10B | 0.50 | 1.50 | 2.00 |
| Qwen/Qwen3.5-397B-A17B | 1.00 | 3.00 | 4.00 |
| meta-llama/Llama-3.2-1B-Instruct | 0.02 | 0.06 | 0.06 |
| meta-llama/Llama-3.2-3B-Instruct | 0.05 | 0.15 | 0.15 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | 0.15 | 0.45 | 0.60 |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | 0.30 | 0.90 | 1.20 |
| openai/gpt-oss-20b | 0.10 | 0.30 | 0.40 |
| openai/gpt-oss-120b | 0.25 | 0.75 | 1.00 |
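
As a worked example of the billing above, a hypothetical run on Qwen/Qwen3.5-35B-A3B that consumes 10M input tokens, 2M output tokens, and 2M training tokens would cost 10 × $0.25 + 2 × $0.75 + 2 × $1.00 = $6.00, before any prefix cache discounts. The token counts here are illustrative only.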

Choosing a Model

For Validation and Debugging

Start with a small, fast model to verify that your environment and config work correctly before committing compute to a larger run.

Recommended: Qwen/Qwen3.5-0.8B or meta-llama/Llama-3.2-1B-Instruct

model = "Qwen/Qwen3.5-0.8B"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

For Experimentation

MoE models with small active parameter counts give strong performance per token.

Recommended: Qwen/Qwen3.5-35B-A3B

model = "Qwen/Qwen3.5-35B-A3B"
max_steps = 200
batch_size = 256
rollouts_per_example = 16

For Production Training

For serious training runs where you want the strongest possible results, use the larger MoE models.

Recommended: Qwen/Qwen3.5-397B-A17B or nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

model = "Qwen/Qwen3.5-397B-A17B"
max_steps = 500
batch_size = 512
rollouts_per_example = 16

Thinking Mode

Qwen3.5 and Nemotron models support a thinking mode that produces extended chain-of-thought reasoning before the final answer. Toggle it via [sampling].enable_thinking in your config. Thinking mode tends to help on tasks that benefit from multi-step reasoning (math, code, logic) at the cost of longer outputs.
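
For example, thinking mode can be enabled by adding a [sampling] table to your config. The snippet below is a minimal sketch: enable_thinking is the toggle described above, and the remaining keys simply mirror the experimentation config earlier on this page.
model = "Qwen/Qwen3.5-35B-A3B"
max_steps = 200
batch_size = 256
rollouts_per_example = 16

[sampling]
enable_thinking = true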

Checking Available Models

Always use the CLI to check the current list of supported models:
prime train models
Add --output json to get live pricing alongside the model list. The CLI output may differ from this page, since new models are added regularly.
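
For example, to fetch the machine-readable model list with live pricing in one call:
prime train models --output json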