Documentation Index

Fetch the complete documentation index at: https://docs.primeintellect.ai/llms.txt

Use this file to discover all available pages before exploring further.

Getting Started

How do I quickly test my environment?

Use prime eval run with a small sample:
prime eval run my-environment -m openai/gpt-4.1-mini -n 5 -s
The -n flag limits the run to a few examples, and the -s flag prints sample outputs so you can see what’s happening.

How do I see what the model is outputting?

If using prime eval run: Results are saved automatically. Browse them interactively with:
prime eval tui
The TUI opens a single run browser (environment -> model -> run). Keybindings:
  • Enter - open rollout details for a run
  • b - go back
  • tab - cycle panes
  • e / x - expand or collapse history
  • pageup / pagedown - scroll history
  • c - Copy Mode
If using the Python API (env.generate() / env.evaluate()):
vf.print_prompt_completions_sample(outputs, n=3)

How do I enable debug logging?

Set the VF_LOG_LEVEL environment variable:
VF_LOG_LEVEL=DEBUG prime eval run my-environment -m openai/gpt-4.1-mini -n 5
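A variable like this is typically consumed by mapping its value onto Python’s standard logging levels. A minimal sketch of that pattern using only the standard library (the "verifiers" logger name here is an assumption for illustration, not confirmed API):

```python
import logging
import os

# Read the level name from the environment, defaulting to INFO.
# VF_LOG_LEVEL is the variable from the docs; the fallback is an assumption.
level_name = os.environ.get("VF_LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)

# "verifiers" as the logger name is illustrative only.
logger = logging.getLogger("verifiers")
logging.basicConfig(level=level)
logger.debug("debug logging is enabled")
```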

Environments

Which environment class should I use?

  • SingleTurnEnv: One prompt, one response (Q&A, classification)
  • MultiTurnEnv: Custom back-and-forth interaction (games, simulations)
  • ToolEnv: Model calls Python functions (search, calculator)
  • StatefulToolEnv: Tools that need per-rollout state (sandbox IDs, sessions)

What does max_turns=-1 mean?

Unlimited turns. The rollout continues until a stop condition is triggered (e.g., model stops calling tools, or a custom condition you define).

How do I add a custom stop condition?

Use the @vf.stop decorator on a method that returns True to end the rollout:
@vf.stop
async def task_completed(self, state: State) -> bool:
    return "DONE" in state["completion"][-1]["content"]
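The predicate itself is ordinary Python over the rollout state. A self-contained sketch of the same check against hand-built state dicts (the state layout, a "completion" list of messages with a "content" key, follows the example above):

```python
def task_completed(state: dict) -> bool:
    # True once the last completion message contains the sentinel "DONE".
    return "DONE" in state["completion"][-1]["content"]

ongoing = {"completion": [{"role": "assistant", "content": "Still working..."}]}
finished = {"completion": [{"role": "assistant", "content": "All steps DONE."}]}

print(task_completed(ongoing))   # → False
print(task_completed(finished))  # → True
```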

How do I handle tool call errors gracefully?

In ToolEnv, customize error handling:
env = ToolEnv(
    tools=[my_tool],
    error_formatter=lambda e: f"Error: {type(e).__name__}: {e}",
    stop_errors=[CriticalError],  # These errors end the rollout
)
Non-critical errors are returned to the model as tool responses so it can retry.
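The error_formatter callable simply maps an exception to the string the model sees as the tool response. A quick standalone check of the formatter from the example above:

```python
# Same formatter as in the ToolEnv example: exception type plus message.
error_formatter = lambda e: f"Error: {type(e).__name__}: {e}"

msg = error_formatter(ValueError("tool expects an integer"))
print(msg)  # → Error: ValueError: tool expects an integer
```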

Reward Functions

What arguments can my reward function receive?

Reward functions receive any of these via **kwargs:
  • completion - the model’s response
  • answer - ground truth from dataset
  • prompt - the input prompt
  • state - full rollout state
  • parser - the rubric’s parser (if set)
  • task - vf.Task object for taskset-backed environments
  • info - metadata dict from dataset
Just include the ones you need in your function signature.
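Because arguments are matched by name, a reward function can declare only the subset it uses and let **kwargs absorb the rest. A plain-Python sketch of that mechanism (the exact_match function and sample data are illustrative, not library code):

```python
def exact_match(completion: str, answer: str, **kwargs) -> float:
    # Only completion and answer are declared; extras land in **kwargs.
    return 1.0 if completion.strip() == answer.strip() else 0.0

# Everything available for a rollout; the function picks what it needs
# via its signature.
available = {
    "completion": "Paris",
    "answer": "Paris",
    "prompt": "Capital of France?",
    "state": {},
}
print(exact_match(**available))  # → 1.0
```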

How do group reward functions work?

Group reward functions receive plural arguments (completions, answers, states) and return a list of floats. They’re detected automatically by parameter names:
def relative_reward(completions: list, answers: list, **kwargs) -> list[float]:
    # Score all completions for an example together
    scores = [compute_score(c, a) for c, a in zip(completions, answers)]
    # Normalize relative to group
    max_score = max(scores) if scores else 1.0
    return [s / max_score for s in scores]
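The compute_score helper above is a placeholder. A fully runnable version with a toy per-example scorer (the token-overlap metric here is purely illustrative):

```python
def compute_score(completion: str, answer: str) -> float:
    # Toy scorer: fraction of answer tokens that appear in the completion.
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    hits = sum(tok in completion.lower() for tok in answer_tokens)
    return hits / len(answer_tokens)

def relative_reward(completions: list, answers: list, **kwargs) -> list[float]:
    # Score all completions for an example together, then normalize
    # relative to the best score in the group.
    scores = [compute_score(c, a) for c, a in zip(completions, answers)]
    max_score = max(scores) if scores else 1.0
    max_score = max_score or 1.0  # avoid division by zero if all scores are 0
    return [s / max_score for s in scores]

print(relative_reward(
    ["the capital is paris", "maybe london"],
    ["paris", "paris"],
))  # → [1.0, 0.0]
```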

Training

How do I use a local vLLM server?

Point the client to your local server:
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

outputs = await env.evaluate(client, model="your-model-name", ...)

Which client_type should I use for RL training?

Three options trade off control vs simplicity:
  • openai_chat_completions (MITO) — server-side templating, text only. Standard OpenAI path. The trainer re-tokenizes for training, which can drift across multi-turn rollouts and fragment them into multiple samples.
  • openai_chat_completions_token (TITO) — server-side templating, returns token IDs alongside text. The trainer doesn’t re-tokenize. Use when the server’s chat template is stable across turns.
  • renderer (experimental) — client-side tokenization via a per-model renderer in the renderers package. Install it with uv add "verifiers[renderers]" before using client_type="renderer". Stronger token-preservation in theory: bridge_to_next_turn keeps multi-turn rollouts merged into one sample and survives mid-completion truncation cleanly. Hand-coded renderers exist only for a subset of models and corner cases are still being shaken out.
For production training, use openai_chat_completions_token — it’s the tried-and-tested path. Try renderer if you want the stronger guarantees and your model has a hand-coded renderer. See Inference Client Types for the full breakdown.