Table of Contents
- Type Aliases
- Data Types
- Classes
- Client Classes
- Configuration Types
- Prime CLI Plugin
- Decorators
- Utility Functions
Type Aliases
Messages
ChatMessage
role, content, and optional tool_calls / tool_call_id fields.
Info
SamplingArgs
temperature, top_p, max_tokens).
RewardFunc
ClientType
Client implementation to use. Set via ClientConfig.client_type.
Data Types
State
dict subclass that tracks rollout information. Accessing keys in INPUT_FIELDS automatically forwards to the nested input object.
Fields set during initialization:
| Field | Type | Description |
|---|---|---|
input | RolloutInput | Nested input data |
client | Client | Client instance |
model | str | Model name |
sampling_args | SamplingArgs | None | Generation parameters |
is_completed | bool | Whether rollout has ended |
is_truncated | bool | Whether generation was truncated |
tool_defs | list[Tool] | None | Available tool definitions |
trajectory | list[TrajectoryStep] | Multi-turn trajectory |
trajectory_id | str | UUID for this rollout |
timing | RolloutTiming | Timing information |
| Field | Type | Description |
|---|---|---|
completion | Messages | None | Final completion |
reward | float | None | Final reward |
advantage | float | None | Advantage over group mean |
metrics | dict[str, float] | None | Per-function metrics |
stop_condition | str | None | Name of triggered stop condition |
error | Error | None | Error if rollout failed |
RolloutInput
RolloutOutput
dict subclass that provides typed access to known fields while supporting arbitrary additional fields from state_columns. All values must be JSON-serializable. Used in GenerateOutputs and for saving results to disk.
TrajectoryStep
TrajectoryStepTokens
RolloutTiming
GenerateOutputs
Environment.generate(). Contains a list of RolloutOutput objects (one per rollout) and generation metadata. Each RolloutOutput is a serialized, JSON-compatible dict containing the rollout’s prompt, completion, answer, reward, metrics, timing, and other per-rollout data.
GenerateMetadata
base_url is always serialized as a string. For multi-endpoint runs (e.g., using ClientConfig.endpoint_configs), it is stored as a comma-separated list of URLs.
version_info captures the verifiers framework version/commit and the environment package version/commit at generation time. Populated automatically by GenerateOutputsBuilder.
RolloutScore / RolloutScores
Classes
Environment Classes
Environment
| Method | Returns | Description |
|---|---|---|
generate(inputs, client, model, ...) | GenerateOutputs | Run rollouts asynchronously. client accepts Client | ClientConfig. |
generate_sync(inputs, client, ...) | GenerateOutputs | Synchronous wrapper |
evaluate(client, model, ...) | GenerateOutputs | Evaluate on eval_dataset |
evaluate_sync(client, model, ...) | GenerateOutputs | Synchronous evaluation |
| Method | Returns | Description |
|---|---|---|
get_dataset(n=-1, seed=None) | Dataset | Get training dataset (optionally first n, shuffled) |
get_eval_dataset(n=-1, seed=None) | Dataset | Get evaluation dataset |
make_dataset(...) | Dataset | Static method to create dataset from inputs |
| Method | Returns | Description |
|---|---|---|
rollout(input, client, model, sampling_args) | State | Abstract: run single rollout |
init_state(input, client, model, sampling_args) | State | Create initial state from input |
get_model_response(state, prompt, ...) | Response | Get model response for prompt |
is_completed(state) | bool | Check all stop conditions |
run_rollout(sem, input, client, model, sampling_args) | State | Run rollout with semaphore |
run_group(group_inputs, client, model, ...) | list[State] | Generate and score one group |
| Method | Description |
|---|---|
set_kwargs(**kwargs) | Set attributes using setter methods when available |
add_rubric(rubric) | Add or merge rubric |
set_max_seq_len(max_seq_len) | Set maximum sequence length |
set_score_rollouts(bool) | Enable/disable scoring |
SingleTurnEnv
Single-response Q&A tasks. Inherits fromEnvironment.
MultiTurnEnv
env_response.
Abstract method:
has_error, prompt_too_long, max_turns_reached, has_final_env_response
Hooks:
| Method | Description |
|---|---|
setup_state(state) | Initialize per-rollout state |
get_prompt_messages(state) | Customize prompt construction |
render_completion(state) | Customize completion rendering |
add_trajectory_step(state, step) | Customize trajectory handling |
ToolEnv
no_tools_called (ends when model responds without tool calls)
Methods:
| Method | Description |
|---|---|
add_tool(tool) | Add a tool at runtime |
remove_tool(tool) | Remove a tool at runtime |
call_tool(name, args, id) | Override to customize tool execution |
StatefulToolEnv
Tools requiring per-rollout state. Overridesetup_state and update_tool_args to inject state.
SandboxEnv
prime sandboxes.
Key parameters:
| Parameter | Type | Description |
|---|---|---|
sandbox_name | str | Name prefix for sandbox instances |
docker_image | str | Docker image to use for the sandbox |
cpu_cores | int | Number of CPU cores |
memory_gb | int | Memory allocation in GB |
disk_size_gb | int | Disk size in GB |
gpu_count | int | Number of GPUs |
timeout_minutes | int | Sandbox timeout in minutes |
timeout_per_command_seconds | int | Per-command execution timeout |
environment_vars | dict[str, str] | None | Environment variables to set in sandbox |
labels | list[str] | None | Labels for sandbox categorization and filtering |
PythonEnv
Persistent Python REPL in sandbox. ExtendsSandboxEnv.
OpenEnvEnv
.build.json), supports both gym and MCP contracts, and requires a prompt_renderer to convert observations into chat messages.
EnvGroup
Parser Classes
Parser
XMLParser
| Method | Returns | Description |
|---|---|---|
parse(text) | SimpleNamespace | Parse XML into object with field attributes |
parse_answer(completion) | str | None | Extract answer field from completion |
get_format_str() | str | Get format description string |
get_fields() | list[str] | Get canonical field names |
format(**kwargs) | str | Format kwargs into XML string |
ThinkParser
</think> tag. For models that always include <think> tags but don’t parse them automatically.
MaybeThinkParser
Handles optional<think> tags (for models that may or may not think).
Rubric Classes
Rubric
1.0. Functions with weight=0.0 are tracked as metrics only.
Methods:
| Method | Description |
|---|---|
add_reward_func(func, weight=1.0) | Add a reward function |
add_metric(func, weight=0.0) | Add a metric (no reward contribution) |
add_class_object(name, obj) | Add object accessible in reward functions |
JudgeRubric
LLM-as-judge evaluation.MathRubric
Math-specific evaluation usingmath-verify.
RubricGroup
Combines rubrics forEnvGroup.
Client Classes
Client
vf types (Messages, Tool, Response) and provider-native formats. The client property exposes the underlying SDK client (e.g., AsyncOpenAI, AsyncAnthropic).
get_response() is the main public method — it converts the prompt and tools to the native format, calls the provider API, validates the response, and converts it back to a vf.Response. Errors are wrapped in vf.ModelError unless they are already vf.Error or authentication errors.
Abstract methods (for subclass implementors):
| Method | Description |
|---|---|
setup_client(config) | Create the native SDK client from ClientConfig |
to_native_prompt(messages) | Convert Messages → native prompt format + extra kwargs |
to_native_tool(tool) | Convert Tool → native tool format |
get_native_response(prompt, model, ...) | Call the provider API |
raise_from_native_response(response) | Raise ModelError for invalid responses |
from_native_response(response) | Convert native response → vf.Response |
close() | Close the underlying SDK client |
Built-in Client Implementations
| Class | client_type | SDK Client | Description |
|---|---|---|---|
OpenAIChatCompletionsClient | "openai_chat_completions" | AsyncOpenAI | Chat Completions API (default) |
OpenAICompletionsClient | "openai_completions" | AsyncOpenAI | Legacy Completions API |
OpenAIChatCompletionsTokenClient | "openai_chat_completions_token" | AsyncOpenAI | Custom vLLM token route |
AnthropicMessagesClient | "anthropic_messages" | AsyncAnthropic | Anthropic Messages API |
vf.OpenAIChatCompletionsClient, vf.AnthropicMessagesClient, etc.
Response
Client implementations return Response from get_response().
Tool
Client converts them to its native format via to_native_tool().
Configuration Types
ClientConfig
client_type selects which Client implementation to instantiate (see Client Classes). Use endpoint_configs for multi-endpoint round-robin. In grouped scoring mode, groups are distributed round-robin across endpoint configs.
When api_key_var is "PRIME_API_KEY" (the default), credentials are loaded with the following precedence:
- API key:
PRIME_API_KEYenv var >~/.prime/config.json>"EMPTY" - Team ID:
PRIME_TEAM_IDenv var >~/.prime/config.json> not set
prime login.
EndpointClientConfig
ClientConfig.endpoint_configs. Has the same fields as ClientConfig except endpoint_configs itself, preventing recursive nesting.
EvalConfig
Endpoint
Endpoints maps an endpoint id to one or more endpoint variants. A single variant is represented as a one-item list.
Prime CLI Plugin
Verifiers exposes a plugin contract consumed byprime for command execution.
PRIME_PLUGIN_API_VERSION
prime and verifiers.
PrimeCLIPlugin
build_module_command returns a subprocess command list for python -m <module> ....
get_plugin
prime.
Decorators
@vf.stop
is_completed().
@vf.cleanup
@vf.teardown
Utility Functions
Data Utilities
\boxed{} format.
#### marker (GSM8K format).
Environment Utilities
"primeintellect/gsm8k").
Configuration Utilities
MissingKeyError (a ValueError subclass) with a clear message listing all missing keys and instructions for setting them.
Logging Utilities
VF_LOG_LEVEL env var to change default.
vf.log_level("WARNING").