Table of Contents
Type Aliases
Messages
ChatMessage
A message dict with role, content, and optional tool_calls / tool_call_id fields.
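For illustration, messages in this shape might look as follows; the concrete values and the tool-call structure follow the OpenAI chat format and are examples only:

```python
# Example ChatMessage dicts (values are illustrative only).
user_msg = {"role": "user", "content": "What is 2 + 2?"}

assistant_msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [  # present only when the model calls a tool
        {
            "id": "call_0",
            "type": "function",
            "function": {"name": "add", "arguments": '{"a": 2, "b": 2}'},
        }
    ],
}

tool_msg = {"role": "tool", "content": "4", "tool_call_id": "call_0"}
```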
Info
SamplingArgs
Generation parameters passed to the model (e.g. temperature, top_p, max_tokens).
RewardFunc
ModelResponse
Data Types
State
A dict subclass that tracks rollout information. Accessing keys in INPUT_FIELDS automatically forwards to the nested input object.
Fields set during initialization:
| Field | Type | Description |
|---|---|---|
| input | RolloutInput | Nested input data |
| client | AsyncOpenAI | OpenAI client |
| model | str | Model name |
| sampling_args | SamplingArgs \| None | Generation parameters |
| is_completed | bool | Whether rollout has ended |
| is_truncated | bool | Whether generation was truncated |
| oai_tools | list[ChatCompletionToolParam] | Available tools |
| trajectory | list[TrajectoryStep] | Multi-turn trajectory |
| trajectory_id | str | UUID for this rollout |
| timing | RolloutTiming | Timing information |
Fields set during rollout and scoring:
| Field | Type | Description |
|---|---|---|
| completion | Messages \| None | Final completion |
| reward | float \| None | Final reward |
| advantage | float \| None | Advantage over group mean |
| metrics | dict[str, float] \| None | Per-function metrics |
| stop_condition | str \| None | Name of triggered stop condition |
| error | Error \| None | Error if rollout failed |
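As a rough sketch, a finished rollout's State can be inspected with the keys listed in the tables above; State is documented as a dict subclass, so the dict-style access shown here is an assumption about the access pattern, not the library's API:

```python
# Hypothetical helper that summarizes a scored rollout, using only keys
# documented in the tables above.
def summarize(state) -> str:
    reward = state["reward"]
    stop = state["stop_condition"]
    turns = len(state["trajectory"])
    return f"reward={reward} stop={stop} turns={turns}"
```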
RolloutInput
TrajectoryStep
TrajectoryStepTokens
RolloutTiming
GenerateOutputs
Returned by Environment.generate().
GenerateMetadata
RolloutScore / RolloutScores
ProcessedOutputs
Classes
Environment Classes
Environment
| Method | Returns | Description |
|---|---|---|
| generate(inputs, client, model, ...) | GenerateOutputs | Run rollouts asynchronously |
| generate_sync(inputs, client, ...) | GenerateOutputs | Synchronous wrapper |
| evaluate(client, model, ...) | GenerateOutputs | Evaluate on eval_dataset |
| evaluate_sync(client, model, ...) | GenerateOutputs | Synchronous evaluation |
| Method | Returns | Description |
|---|---|---|
| get_dataset(n=-1, seed=None) | Dataset | Get training dataset (optionally first n, shuffled) |
| get_eval_dataset(n=-1, seed=None) | Dataset | Get evaluation dataset |
| make_dataset(...) | Dataset | Static method to create dataset from inputs |
| Method | Returns | Description |
|---|---|---|
| rollout(input, client, model, sampling_args) | State | Abstract: run single rollout |
| init_state(input, client, model, sampling_args) | State | Create initial state from input |
| get_model_response(state, prompt, ...) | ModelResponse | Get model response for prompt |
| is_completed(state) | bool | Check all stop conditions |
| run_rollout(sem, input, client, model, sampling_args) | State | Run rollout with semaphore |
| run_group(group_inputs, client, model, ...) | list[State] | Generate and score one group |
| Method | Description |
|---|---|
| set_kwargs(**kwargs) | Set attributes using setter methods when available |
| add_rubric(rubric) | Add or merge rubric |
| set_max_seq_len(max_seq_len) | Set maximum sequence length |
| set_interleaved_rollouts(bool) | Enable/disable interleaved rollouts |
| set_score_rollouts(bool) | Enable/disable scoring |
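A rough sketch of the evaluation entry points listed above, assuming an OpenAI-compatible endpoint; the num_examples keyword and the endpoint URL are assumptions, not part of the signatures in the tables:

```python
from openai import AsyncOpenAI
import verifiers as vf

def run_eval(env: vf.Environment) -> None:
    # Assumes an OpenAI-compatible server serving "my-model";
    # num_examples is an assumed keyword argument.
    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    results = env.evaluate_sync(client, "my-model", num_examples=16)
    print(results)
```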
SingleTurnEnv
Single-response Q&A tasks. Inherits from Environment.
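A minimal sketch of a single-turn environment. The constructor keywords (dataset, system_prompt, parser, rubric), the dataset column names, the reward-function arguments, and parse_answer on the base Parser reflect common verifiers usage and are assumptions to check against the installed version:

```python
from datasets import Dataset
import verifiers as vf

# Tiny toy dataset; real tasks would load something like GSM8K.
dataset = Dataset.from_dict({
    "question": ["What is 2 + 2?"],
    "answer": ["4"],
})

parser = vf.Parser()

def exact_match(completion, answer, **kwargs) -> float:
    # 1.0 when the parsed answer equals the reference, else 0.0.
    return 1.0 if str(parser.parse_answer(completion)).strip() == answer else 0.0

rubric = vf.Rubric()
rubric.add_reward_func(exact_match, weight=1.0)

env = vf.SingleTurnEnv(
    dataset=dataset,
    system_prompt="Answer with just the number.",
    parser=parser,
    rubric=rubric,
)
```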
MultiTurnEnv
Multi-turn interaction; the environment replies between model turns via env_response (a minimal subclass sketch follows the hooks table below).
Abstract method: env_response
Built-in stop conditions: has_error, prompt_too_long, max_turns_reached, has_final_env_response
Hooks:
| Method | Description |
|---|---|
| setup_state(state) | Initialize per-rollout state |
| get_prompt_messages(state) | Customize prompt construction |
| render_completion(state) | Customize completion rendering |
| add_trajectory_step(state, step) | Customize trajectory handling |
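A rough subclass sketch. The exact env_response signature differs between versions of the library; here it is assumed to receive the message list and state and to return the environment's reply messages, so treat this as illustrative only:

```python
import verifiers as vf

class GuessNumberEnv(vf.MultiTurnEnv):
    """Toy game: the model keeps guessing until it names the secret number."""

    async def setup_state(self, state):
        # Hook: seed per-rollout data before the first model turn
        # (whether this hook is async is also an assumption).
        state["secret"] = "42"
        return state

    async def env_response(self, messages, state):
        # Assumed signature; check the installed version for the exact form.
        last = messages[-1]["content"]
        hint = "correct" if state["secret"] in last else "try again"
        return [{"role": "user", "content": hint}]
```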
ToolEnv
Tool-calling environment; ends via the no_tools_called stop condition (when the model responds without tool calls). A minimal sketch follows the methods table below.
Methods:
| Method | Description |
|---|---|
| add_tool(tool) | Add a tool at runtime |
| remove_tool(tool) | Remove a tool at runtime |
| call_tool(name, args, id) | Override to customize tool execution |
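A minimal sketch, assuming plain Python functions can be passed as tools and that the constructor accepts dataset, tools, and max_turns keywords (the dataset columns are the same assumption as in the SingleTurnEnv sketch above):

```python
from datasets import Dataset
import verifiers as vf

def add(a: int, b: int) -> int:
    """Add two integers."""  # name, signature, and docstring describe the tool
    return a + b

def subtract(a: int, b: int) -> int:
    """Subtract b from a."""
    return a - b

# Toy dataset; column names are an assumption.
dataset = Dataset.from_dict({"question": ["What is 2 + 2?"], "answer": ["4"]})

# The rollout ends when the model answers without calling a tool
# (the no_tools_called stop condition).
env = vf.ToolEnv(dataset=dataset, tools=[add], max_turns=4)

# Tools can also be added or removed at runtime.
env.add_tool(subtract)
env.remove_tool(subtract)
```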
StatefulToolEnv
Tools requiring per-rollout state. Override setup_state and update_tool_args to inject state.
SandboxEnv
Sandboxed container execution using prime sandboxes.
PythonEnv
Persistent Python REPL in a sandbox. Extends SandboxEnv.
EnvGroup
Parser Classes
Parser
XMLParser
| Method | Returns | Description |
|---|---|---|
| parse(text) | SimpleNamespace | Parse XML into object with field attributes |
| parse_answer(completion) | str \| None | Extract answer field from completion |
| get_format_str() | str | Get format description string |
| get_fields() | list[str] | Get canonical field names |
| format(**kwargs) | str | Format kwargs into XML string |
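A short sketch of the methods above; the field names and the answer_field keyword are assumptions for illustration:

```python
import verifiers as vf

parser = vf.XMLParser(fields=["think", "answer"], answer_field="answer")

text = "<think>2 + 2 = 4</think>\n<answer>4</answer>"
parsed = parser.parse(text)     # SimpleNamespace with .think and .answer
print(parsed.answer)            # "4"
print(parser.get_format_str())  # human-readable description of the format
```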
ThinkParser
Extracts the answer from the content after the closing </think> tag. For models that always include <think> tags but don't have them stripped automatically.
MaybeThinkParser
Handles optional <think> tags (for models that may or may not think).
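A sketch for ThinkParser above (MaybeThinkParser would be used the same way when <think> tags are optional). It assumes ThinkParser accepts an extract_fn applied to the text after </think> and that extract_boxed_answer (see Data Utilities below) is exported at the package top level; both are assumptions to verify:

```python
import verifiers as vf

# Assumed: extract_fn post-processes whatever follows the </think> tag.
parser = vf.ThinkParser(extract_fn=vf.extract_boxed_answer)

completion = [{
    "role": "assistant",
    "content": "<think>3 * 4 = 12</think>The answer is \\boxed{12}.",
}]
print(parser.parse_answer(completion))  # "12"
```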
Rubric Classes
Rubric
Combines weighted reward functions into a single reward; weights default to 1.0. Functions with weight=0.0 are tracked as metrics only (see the sketch after the methods table below).
Methods:
| Method | Description |
|---|---|
| add_reward_func(func, weight=1.0) | Add a reward function |
| add_metric(func, weight=0.0) | Add a metric (no reward contribution) |
| add_class_object(name, obj) | Add object accessible in reward functions |
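A minimal sketch of the methods above; the reward-function arguments (completion, answer, and catch-all kwargs) are an assumed convention, not a documented signature:

```python
import verifiers as vf

def correct_answer(completion, answer, **kwargs) -> float:
    # Primary reward: 1.0 when the reference answer appears in the reply.
    text = completion[-1]["content"] if isinstance(completion, list) else str(completion)
    return 1.0 if answer.strip() in text else 0.0

def response_length(completion, **kwargs) -> float:
    # Tracked as a metric only (weight 0.0): characters in the final message.
    text = completion[-1]["content"] if isinstance(completion, list) else str(completion)
    return float(len(text))

rubric = vf.Rubric()
rubric.add_reward_func(correct_answer, weight=1.0)
rubric.add_metric(response_length)  # weight defaults to 0.0
```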
JudgeRubric
LLM-as-judge evaluation.
MathRubric
Math-specific evaluation using math-verify.
RubricGroup
Combines rubrics for EnvGroup.
Configuration Types
ClientConfig
EvalConfig
Endpoint
Decorators
@vf.stop
Registers a custom stop condition checked by is_completed().
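A rough sketch of how this might be used; the decorated method's signature (taking the rollout State and returning a bool) is an assumption:

```python
import verifiers as vf

class BudgetedToolEnv(vf.ToolEnv):
    # Assumed usage: a @vf.stop-decorated method ends the rollout when it
    # returns True; is_completed() checks it alongside the built-in
    # stop conditions.
    @vf.stop
    def too_many_turns(self, state) -> bool:
        return len(state["trajectory"]) >= 8
```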
@vf.cleanup
@vf.teardown
Utility Functions
Data Utilities
Extracts answers given in \boxed{} format.
Extracts the answer following the #### marker (GSM8K format).
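For illustration, assuming these utilities are exposed as extract_boxed_answer and extract_hash_answer at the package top level (names and export location assumed):

```python
import verifiers as vf

print(vf.extract_boxed_answer(r"The answer is \boxed{72}."))   # "72"
print(vf.extract_hash_answer("... so the total is\n#### 72"))  # "72"
```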
Environment Utilities
Loads an installed environment by its identifier (e.g. "primeintellect/gsm8k").
Logging Utilities
Set the VF_LOG_LEVEL environment variable to change the default log level.
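For example (assuming the variable is read when the library configures its loggers, so it should be set beforehand):

```python
import os

# Raise verbosity for the library's loggers (assumption: the variable is
# read when logging is set up, so set it before importing the package).
os.environ["VF_LOG_LEVEL"] = "DEBUG"

import verifiers as vf  # noqa: E402
```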