## Getting Started

### How do I quickly test my environment?
Use `prime eval run` with a small sample:
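For example (a sketch; the environment id is a placeholder and the exact arguments may vary, so check `prime eval run --help`):

```bash
# run a quick evaluation and print sample outputs (-s)
prime eval run my-env -s
```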
The `-s` flag prints sample outputs so you can see what's happening.
### How do I see what the model is outputting?
If using `prime eval run`: results are saved automatically, and you can browse them interactively in the results viewer, drilling down environment -> model -> run. Press Enter on a run to open rollout details, `b` to go back, `tab` to cycle panes, `e` and `x` to expand or collapse history, `pageup` and `pagedown` to scroll history, and `c` for Copy Mode.
If using the Python API (`env.generate()` / `env.evaluate()`):
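A minimal sketch, assuming the `verifiers` API and an OpenAI-compatible client; the environment id, model name, and result-field access are illustrative:

```python
import verifiers as vf
from openai import OpenAI

env = vf.load_environment("my-env")  # placeholder environment id
client = OpenAI()

# run a handful of examples and print each completion
results = env.evaluate(client, "gpt-4.1-mini", num_examples=5)
for completion in results.completion:
    print(completion)
```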
### How do I enable debug logging?

Set the `VF_LOG_LEVEL` environment variable:
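For example (assuming standard Python log-level names):

```bash
# show debug-level logs from verifiers
export VF_LOG_LEVEL=DEBUG
```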
## Environments

### Which environment class should I use?
- `SingleTurnEnv`: One prompt, one response (Q&A, classification)
- `MultiTurnEnv`: Custom back-and-forth interaction (games, simulations)
- `ToolEnv`: Model calls Python functions (search, calculator)
- `StatefulToolEnv`: Tools that need per-rollout state (sandbox IDs, sessions)
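For example, a minimal `SingleTurnEnv` (a sketch; the dataset columns and reward logic are illustrative, and completions are assumed to be in chat format):

```python
import verifiers as vf
from datasets import Dataset

# tiny illustrative dataset with "question" and "answer" columns
dataset = Dataset.from_dict({
    "question": ["What is 2 + 2?"],
    "answer": ["4"],
})

def exact_match(completion, answer, **kwargs):
    # reward 1.0 when the final message contains the ground-truth answer
    return 1.0 if answer in completion[-1]["content"] else 0.0

env = vf.SingleTurnEnv(
    dataset=dataset,
    rubric=vf.Rubric(funcs=[exact_match]),
)
```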
### What does `max_turns=-1` mean?
Unlimited turns. The rollout continues until a stop condition is triggered (e.g., the model stops calling tools, or a custom condition you define).
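For instance (a sketch; `echo` is a stand-in tool and the constructor arguments are abbreviated):

```python
import verifiers as vf

def echo(text: str) -> str:
    """Trivial stand-in tool."""
    return text

# no fixed turn cap; the rollout ends when a stop condition fires
env = vf.ToolEnv(tools=[echo], max_turns=-1)
```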
### How do I add a custom stop condition?

Use the `@vf.stop` decorator on a method that returns `True` to end the rollout:
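A minimal sketch (the `@vf.stop` decorator is the documented hook; the class, method name, and `state`-based signature are assumptions):

```python
import verifiers as vf

class GameEnv(vf.MultiTurnEnv):
    @vf.stop
    async def game_over(self, state) -> bool:
        # end the rollout once the game reaches a terminal state
        return state.get("finished", False)
```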
### How do I handle tool call errors gracefully?

In `ToolEnv`, customize error handling:
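One common pattern (a sketch, not necessarily a built-in hook): catch exceptions inside the tool and return the error text as the tool result, so the model can see what went wrong and retry:

```python
import verifiers as vf

def divide(a: float, b: float) -> str:
    """Divide a by b (illustrative tool)."""
    try:
        return str(a / b)
    except ZeroDivisionError as e:
        # surface the error as the tool result instead of crashing the rollout
        return f"Error: {e}"

env = vf.ToolEnv(tools=[divide])
```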
## Reward Functions

### What arguments can my reward function receive?

Reward functions receive any of these via `**kwargs`:
- `completion`: the model's response
- `answer`: ground truth from the dataset
- `prompt`: the input prompt
- `state`: full rollout state
- `parser`: the rubric's parser (if set)
- `task`: `vf.Task` object for taskset-backed environments
- `info`: metadata dict from the dataset
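For example, a sketch using a few of these (assumes chat-format completions; the fallback when no parser is set is illustrative):

```python
def parsed_match(completion, answer, parser=None, **kwargs):
    # use the rubric's parser when available, else the raw final message
    text = parser.parse_answer(completion) if parser else completion[-1]["content"]
    return 1.0 if text.strip() == answer.strip() else 0.0
```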
### How do group reward functions work?
Group reward functions receive plural arguments (`completions`, `answers`, `states`) and return a list of floats. They're detected automatically by parameter names:
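A sketch (the plural-name detection is from the docs above; the scoring itself is illustrative):

```python
def diversity_bonus(completions, answers, **kwargs):
    # plural parameter names mark this as a group reward function
    finals = [c[-1]["content"] for c in completions]
    share_unique = len(set(finals)) / len(finals)
    # return one score per rollout in the group
    return [share_unique] * len(completions)
```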
## Training

### How do I use a local vLLM server?
Point the client to your local server. vLLM exposes an OpenAI-compatible API; a minimal example using vLLM's default port:
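```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; any placeholder key works locally
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
```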
### Which `client_type` should I use for RL training?

Three options trade off control vs. simplicity:
- `openai_chat_completions` (MITO): server-side templating, text only. The standard OpenAI path. The trainer re-tokenizes for training, which can drift across multi-turn rollouts and fragment them into multiple samples.
- `openai_chat_completions_token` (TITO): server-side templating, returns token IDs alongside text. The trainer doesn't re-tokenize. Use when the server's chat template is stable across turns.
- `renderer` (experimental): client-side tokenization via a per-model renderer in the `renderers` package. Install it with `uv add "verifiers[renderers]"` before using `client_type="renderer"`. Stronger token preservation in theory: `bridge_to_next_turn` keeps multi-turn rollouts merged into one sample and survives mid-completion truncation cleanly. Hand-coded renderers exist only for a subset of models, and corner cases are still being shaken out.
Default to `openai_chat_completions_token`; it's the tried-and-tested path. Try `renderer` if you want the stronger guarantees and your model has a hand-coded renderer. See Inference Client Types for the full breakdown.