Dedicated full-parameter RL training on Hosted Training
Full fine-tuning is in closed beta. Access is gated per-team — reach out to us to get enabled.
Full fine-tuning updates every parameter of the model on a dedicated cluster reserved for your run, instead of training a LoRA adapter on top of a shared deployment.
Full-FT runs use the native prime-rl config schema. Set type = "full_finetune" at the top of your TOML and size the run with [deployment]: num_train_gpus / num_infer_gpus for single-node, num_train_nodes / num_infer_nodes for multi-node.
Minimal single-node example (1 trainer GPU + 1 inference GPU):
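A sketch of the single-node shape, using only the keys named above. Everything else a real run needs (model, environments, training settings) comes from the prime-rl schema and is left out here rather than guessed:

```toml
# Sketch only: a working config needs further prime-rl sections not shown here.
type = "full_finetune"   # selects full fine-tuning; must sit at the top of the file

[deployment]
num_train_gpus = 1       # GPUs for the trainer
num_infer_gpus = 1       # GPUs for the inference server
```

For a multi-node run, the [deployment] table would carry num_train_nodes and num_infer_nodes instead of the per-GPU keys.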
Same CLI as LoRA — prime train auto-detects the config shape:
prime train run configs/full-ft.toml
On dispatch you get a run ID:
Dispatched hosted run wn2cjdrzdo6bmfqajoeuu30p
Runs use the main tag of the prime-rl image by default. Pin a specific build with --image-tag v0.5.1 on the CLI or image_tag = "v0.5.1" in the TOML. If both are set, the CLI flag wins.
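As an illustration of the TOML form, the pin might sit next to type at the top of the file. The top-level placement is an assumption; check the prime-rl configuration reference for where image_tag actually belongs:

```toml
type = "full_finetune"
image_tag = "v0.5.1"     # assumed top-level key; overridden by --image-tag on the CLI
```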
A full-FT run has several distinct components. Pick which one to read with -c / --component:
prime train logs <run-id>                    # orchestrator (default)
prime train logs <run-id> -c trainer         # trainer (FSDP / torchrun)
prime train logs <run-id> -c inference       # vLLM inference server
prime train logs <run-id> --env <env-name>   # env-server for a specific env
List the orchestrator and env-server components for a run:
prime train components <run-id>
Follow and filter the same way as LoRA: -f, --search, --regex, --level, --since. See Monitoring for details.
The dashboard works as it does for LoRA runs: reward curves, rubric scores, and individual rollouts at https://app.primeintellect.ai/dashboard/training/<run-id>.
End-to-End Run
LoRA walkthrough — most workflow steps apply identically.
prime-rl Configuration
Full reference for the underlying training framework config schema.