Full Fine-Tuning (Beta)

Full fine-tuning is in closed beta. Access is granted per team; contact us to have it enabled for yours.

Full fine-tuning updates every parameter of the model on a dedicated cluster reserved for your run, instead of training a LoRA adapter on top of a shared deployment.

Config

Full-FT runs use the native prime-rl config schema. Set type = "full_finetune" at the top of your TOML and size the run in [deployment]: use num_train_gpus / num_infer_gpus for a single-node run, or num_train_nodes / num_infer_nodes for a multi-node run. Minimal single-node example (1 trainer GPU + 1 inference GPU):

type = "full_finetune"
name = "reverse-text-full-ft"
max_steps = 100
seq_len = 2048

[model]
name = "PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT"

[deployment]
num_train_gpus = 1
num_infer_gpus = 1

[trainer.optim]
lr = 3e-6

[orchestrator]
batch_size = 64
rollouts_per_example = 8

[orchestrator.train.sampling]
max_completion_tokens = 512

[[orchestrator.train.env]]
id = "primeintellect/reverse-text"
name = "reverse-text"

[orchestrator.renderer]
name = "default"

[inference]

Multi-node example (2 train nodes + 2 inference nodes, each a full 8-GPU node):

type = "full_finetune"
name = "qwen30b-math"
seq_len = 32768

[model]
name = "Qwen/Qwen3-30B-A3B-Thinking-2507"

[deployment]
num_train_nodes = 2
num_infer_nodes = 2

[trainer.model]
impl = "custom"
attn = "flash_attention_3"
ep = 8                            # expert parallel (MoE)

[trainer.optim]
type = "adamw"
lr = 1e-6

[orchestrator]
batch_size = 512
oversampling_factor = 2
max_off_policy_steps = 8

[orchestrator.train.sampling]
max_completion_tokens = 32768

[[orchestrator.train.env]]
id = "primeintellect/math-env"
name = "math"

[inference.parallel]
tp = 8                            # tensor parallel inside each inference replica

Multi-node runs broadcast weights over NCCL by default and auto-discover the cluster’s RDMA devices — no extra config needed.
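
To make the sizing of the multi-node example concrete, the arithmetic can be sketched as below. The 8 GPUs per node figure comes from the example itself. The replica count (inference GPUs divided by tp) is an assumption about how the remaining inference GPUs are used, and cluster_shape is a hypothetical helper, not a prime-rl function.

```python
GPUS_PER_NODE = 8  # from the example: each node is a full 8-GPU node


def cluster_shape(num_train_nodes: int, num_infer_nodes: int, tp: int,
                  gpus_per_node: int = GPUS_PER_NODE) -> dict[str, int]:
    """Derive GPU totals and inference replica count from the [deployment] sizes."""
    train_gpus = num_train_nodes * gpus_per_node
    infer_gpus = num_infer_nodes * gpus_per_node
    if infer_gpus % tp != 0:
        raise ValueError("tp must evenly divide the number of inference GPUs")
    return {
        "train_gpus": train_gpus,
        "infer_gpus": infer_gpus,
        # Assumption: each inference replica spans `tp` GPUs.
        "infer_replicas": infer_gpus // tp,
    }


print(cluster_shape(num_train_nodes=2, num_infer_nodes=2, tp=8))
# {'train_gpus': 16, 'infer_gpus': 16, 'infer_replicas': 2}
```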

See the prime-rl docs and config examples for the full schema.

Launching a run

Full-FT runs use the same CLI as LoRA runs; prime train auto-detects the config shape:

prime train run configs/full-ft.toml

On dispatch you get a run ID:

Dispatched hosted run wn2cjdrzdo6bmfqajoeuu30p

Runs use the main tag of the prime-rl image by default. Pin a specific build with --image-tag v0.5.1 on the CLI or image_tag = "v0.5.1" in the TOML. If both are set, the CLI flag wins.

Monitoring

A full-FT run has several distinct components. Pick which one's logs to read with -c / --component, or with --env for the env-server of a specific env:

prime train logs <run-id>                  # orchestrator (default)
prime train logs <run-id> -c trainer       # trainer (FSDP / torchrun)
prime train logs <run-id> -c inference     # vLLM inference server
prime train logs <run-id> --env <env-name> # env-server for a specific env

List the orchestrator and env-server components for a run:

prime train components <run-id>

Follow and filter logs the same way as for LoRA runs, using -f, --search, --regex, --level, and --since. See Monitoring for details. The dashboard works as it does for LoRA runs: reward curves, rubric scores, and individual rollouts at https://app.primeintellect.ai/dashboard/training/<run-id>.

End-to-End Run

LoRA walkthrough — most workflow steps apply identically.

prime-rl Configuration

Full reference for the underlying training framework config schema.
