This page covers workflows for developing on prime-rl itself: running the test suite, contributing changes, and adding new model architectures with the small-scale tooling we use to iterate on MoE families without booting up a 100B+ run.
Test Suite
The test suite is split into three tiers, each with its own CI workflow.

Layout
- `tests/unit/`: fast-running, hermetic tests for isolated logic: config parsing and validation, advantage / loss / scheduler / packer math, individual dataset paths, model-conversion roundtrips, etc. Tests that need a GPU are tagged with the `gpu` marker.
- `tests/integration/`: full-stack RL/SFT runs on a tiny model end-to-end through inference + orchestrator + trainer.
- `tests/nightly/`: runs the configs in `examples/` every night to catch regressions in the shipped examples.
Running Tests Locally
CI Workflows
| Workflow | Trigger | What runs | Where |
|---|---|---|---|
| `cpu_tests.yaml` | every PR + push to main | `pytest tests/unit -m "not gpu"`, plus a slim-wheel install check that `prime-rl-configs` imports cleanly without heavy deps (no torch / vllm / transformers / wandb / verifiers / datasets / liger / loguru in `sys.modules`) | `ubuntu-latest` |
| `gpu_tests.yaml` | every non-draft PR + push to main | `pytest tests/unit -m gpu`, plus a matrix of named integration scenarios (`reverse_text`, `reverse_text_sft`, `reverse_text_lora`, `reverse_text_moe`, `reverse_text_multi_run`, `reverse_text_rl_opd`, `reverse_text_rl_sft`, `reverse_text_sft_lora`, `alphabet_sort`, `benchmark_regression`) | self-hosted GPU runners (`vm`, `4xa6000`) |
| `nightly_tests.yaml` | 03:00 PST daily + manual `workflow_dispatch` (single-file filter optional) | every file in `tests/nightly/`, one matrix job per file | `research-cluster` |
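
The slim-wheel check in `cpu_tests.yaml` comes down to importing the config package and confirming that none of the heavy dependencies ended up in `sys.modules`. Below is a minimal sketch of that idea, not the project's actual CI script. The helper name `heavy_modules_loaded` is hypothetical, the top-level module names are assumed from the dependency list in the table, and `json` stands in for the config package's real import name, which this page does not give.

```python
import importlib
import sys

# Top-level import names assumed from the dependency list in the CI table.
HEAVY_DEPS = ("torch", "vllm", "transformers", "wandb",
              "verifiers", "datasets", "liger", "loguru")


def heavy_modules_loaded(package: str, heavy=HEAVY_DEPS) -> list[str]:
    """Import `package`, then return the heavy deps present in sys.modules."""
    importlib.import_module(package)
    loaded = []
    for name in heavy:
        # Match the top-level module itself or any of its submodules.
        if any(mod == name or mod.startswith(name + ".") for mod in list(sys.modules)):
            loaded.append(name)
    return loaded


if __name__ == "__main__":
    # `json` is a stand-in; substitute the config package's real import name.
    leaked = heavy_modules_loaded("json")
    print("leaked heavy modules:", leaked)
```

A check like this is only meaningful in a fresh interpreter, since any heavy module imported earlier in the same process would also show up in `sys.modules`.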
Markers
Two pytest markers are declared in `pyproject.toml`, which also sets `addopts = "--strict-markers"` so that any undeclared marker is reported as an error:
- `gpu`: gates a test that needs CUDA. CPU CI uses `-m "not gpu"`; the GPU unit job uses `-m gpu`.
- `slow`: gates a test that's expensive enough that you'd usually skip it locally. Deselect with `-m "not slow"`.
Pre-Commit Hooks
Install the pre-commit hooks before your first commit so that `ruff check` and `ruff format` run automatically on staged Python files (`pre-commit install` is the standard pre-commit command for this).

Adding a New Model
Bringing up a new model family takes three steps: implement the modeling code, register a mini preset, and run the smoke test. The preset and smoke test let you iterate on the modeling code at ~0.5B scale on 1–2 GPUs instead of paying the cost of the full-size model, which is useful for catching bugs in modeling code, state-dict conversions, and pipeline integration before scaling.

Implement the Modeling Code
Drop the modeling code under `src/prime_rl/trainer/models/<arch>/` (HF-compatible config, modeling, and weight conversion). Mirror the layout of an existing family; `glm4_moe/` and `qwen3_moe/` are good starting points.
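
The weight-conversion piece is what the roundtrip checks exercise: converting an HF state dict to the PrimeRL layout and back should return the original weights unchanged. The toy sketch below shows that property with plain Python lists in place of tensors. The key names, the per-expert-to-stacked layout, and both function names are hypothetical and not taken from any prime-rl model.

```python
def hf_to_prime(hf_state: dict) -> dict:
    """Stack per-expert weights under one key (hypothetical target layout)."""
    prime, experts = {}, {}
    for key, value in hf_state.items():
        parts = key.split(".")
        if len(parts) == 3 and parts[0] == "experts" and parts[2] == "weight":
            experts[int(parts[1])] = value
        else:
            prime[key] = value
    if experts:
        # Assumes expert indices are contiguous and start at 0.
        prime["experts.weight"] = [experts[i] for i in sorted(experts)]
    return prime


def prime_to_hf(prime_state: dict) -> dict:
    """Inverse of hf_to_prime: unstack experts back into per-expert keys."""
    hf = {}
    for key, value in prime_state.items():
        if key == "experts.weight":
            for i, weight in enumerate(value):
                hf[f"experts.{i}.weight"] = weight
        else:
            hf[key] = value
    return hf


hf_state = {
    "embed.weight": [[0.1, 0.2], [0.3, 0.4]],
    "experts.0.weight": [[1.0, 2.0]],
    "experts.1.weight": [[3.0, 4.0]],
}
roundtrip = prime_to_hf(hf_to_prime(hf_state))
assert roundtrip == hf_state  # lossless: same keys, same values
```

With real tensors the comparison would be an exact element-wise equality per key rather than `==` on nested lists, but the shape of the check is the same.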
Register a Mini Preset
Add an entry to `scripts/mini_moe.py` so the smoke-test workflow can build a ~0.5B test model in your architecture. The preset names the config class, picks small dimensions, and wires up the HF + PrimeRL model classes plus a tokenizer source.
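
The actual preset format lives in `scripts/mini_moe.py` and is not reproduced on this page. Purely as an illustration of the four things a preset carries, here is a hypothetical dataclass: `MiniPreset`, its field names, the dimension keys, the dotted class paths, and the tokenizer placeholder are all assumptions. The dimension values are the ones this page quotes for the GLM-4 MoE mini model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MiniPreset:
    """Hypothetical preset shape; the real entry format may differ."""

    config_class: str       # dotted path to the HF-compatible config class
    hf_model_class: str     # dotted path to the HF modeling class
    prime_model_class: str  # dotted path to the PrimeRL modeling class
    tokenizer_source: str   # model to copy the tokenizer from
    dims: dict = field(default_factory=dict)  # small-scale config overrides


PRESETS = {
    "glm4_moe": MiniPreset(
        config_class="<module>.<ConfigClass>",          # placeholder path
        hf_model_class="<module>.<HFModelClass>",       # placeholder path
        prime_model_class="<module>.<PrimeModelClass>", # placeholder path
        tokenizer_source="<original GLM-4 model>",      # placeholder, not a repo id
        # Key names are illustrative; values match the mini model on this page.
        dims={"hidden_size": 1024, "num_layers": 24, "num_experts": 8},
    ),
}

if __name__ == "__main__":
    preset = PRESETS["glm4_moe"]
    print(preset.dims)
```

Keeping presets as plain data makes it cheap to add a new family: the builder only needs to resolve the three classes and apply the dimension overrides.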
Run the Smoke Test
Build the mini model. This creates a ~543M-parameter GLM-4 MoE (1024 hidden, 24 layers, 8 experts) with random weights, copies the tokenizer from the original GLM-4 model, and verifies that the HF↔PrimeRL roundtrip is lossless.

When the smoke run finishes, look for:
- No crashes. Validates the full inference + orchestrator + trainer pipeline end-to-end.
- Finite, non-zero KL. Confirms the reference distribution is meaningful.
- Loss reasonable. Not NaN, not stuck.
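
The checks above can be turned into a small post-run assertion. This is a sketch under assumptions: `check_smoke_metrics` is a hypothetical helper, the per-step `kl` and `loss` lists are whatever your logging exposes, and the "stuck" heuristic (every logged loss identical) is an illustrative choice, not a threshold prime-rl defines.

```python
import math


def check_smoke_metrics(kl: list[float], loss: list[float]) -> list[str]:
    """Return human-readable problems; an empty list means the run looks sane."""
    if not kl or not loss:
        return ["no metrics logged (did the run crash?)"]
    problems = []
    if not all(math.isfinite(v) for v in kl):
        problems.append("KL contains NaN or inf")
    elif all(v == 0.0 for v in kl):
        problems.append("KL is exactly zero at every step")
    if not all(math.isfinite(v) for v in loss):
        problems.append("loss contains NaN or inf")
    elif len(loss) > 1 and max(loss) == min(loss):
        problems.append("loss never changes (possibly stuck)")
    return problems


if __name__ == "__main__":
    # Made-up numbers, only to show the call shape.
    print(check_smoke_metrics(kl=[0.0, 0.012, 0.02], loss=[2.3, 2.1, 1.9]))
```

Returning a list of problems instead of raising on the first one lets a single smoke run report every failed check at once.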