- A dataset of task inputs
- A harness for the model (tools, sandboxes, context management, etc.)
- A reward function or rubric to score the model’s performance
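To make the three pieces listed above concrete, here is a minimal plain-Python sketch of how task inputs, a harness, and a scoring function fit together. It does not use the verifiers API. `Task`, `run_rollout`, `exact_match`, `evaluate`, and `toy_model` are hypothetical names invented for illustration, and the "model" is a stub function rather than an LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """One task input plus its reference answer (hypothetical structure)."""
    prompt: str
    answer: str


# 1. A dataset of task inputs (toy arithmetic questions).
DATASET = [
    Task(prompt="2 + 3", answer="5"),
    Task(prompt="10 - 4", answer="6"),
    Task(prompt="7 * 6", answer="42"),
]


# 2. A harness: controls how the model is called for a single task.
def run_rollout(model: Callable[[str], str], task: Task) -> str:
    return model(f"Compute: {task.prompt}").strip()


# 3. A reward function: scores one completion against the reference answer.
def exact_match(completion: str, task: Task) -> float:
    return 1.0 if completion == task.answer else 0.0


def evaluate(model: Callable[[str], str], dataset: list[Task]) -> float:
    """Run every task through the harness and return the mean reward."""
    rewards = [exact_match(run_rollout(model, task), task) for task in dataset]
    return sum(rewards) / len(rewards)


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call: handles + and - only, so it misses '*'."""
    left, op, right = prompt.removeprefix("Compute: ").split()
    if op == "+":
        return str(int(left) + int(right))
    if op == "-":
        return str(int(left) - int(right))
    return "unknown"


mean_reward = evaluate(toy_model, DATASET)
```

Keeping the three pieces separate means the same dataset can be scored with a different reward function, or run through a different harness, without touching the other two.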
Getting Started
Ensure you have uv installed, as well as the prime CLI tool:
uv init), installs verifiers (with uv add verifiers), creates the recommended workspace structure, and downloads useful starter files:
verifiers to an existing project:
my_env with a runnable environment
template. For v1 templates, start by editing the generated TasksetConfig,
Taskset.load_tasks(), and @vf.reward methods. Use --with-harness when the
environment also owns reusable execution behavior.
load_environment function which returns an
environment object. For simple legacy environments, this can still be a direct
constructor:
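As a rough illustration of that entry-point shape, the sketch below defines a module-level `load_environment` that accepts keyword arguments and returns a fully constructed object. `ToyEnvironment` is a hypothetical stand-in, not a verifiers class, and the arguments `num_examples` and `case_sensitive` are assumptions chosen for the example. A real environment would return one of the library's environment types instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyEnvironment:
    """Hypothetical stand-in for an environment object (not a verifiers class)."""
    dataset: list[dict]
    reward_func: Callable[[str, str], float]


def load_environment(num_examples: int = 3, case_sensitive: bool = False) -> ToyEnvironment:
    """Build and return the environment; keyword arguments act as its settings."""
    rows = [
        {"question": "Capital of France?", "answer": "Paris"},
        {"question": "Capital of Japan?", "answer": "Tokyo"},
        {"question": "Capital of Kenya?", "answer": "Nairobi"},
    ]

    def correct_answer(completion: str, answer: str) -> float:
        # Exact match after trimming whitespace; optionally ignore case.
        if not case_sensitive:
            completion, answer = completion.lower(), answer.lower()
        return 1.0 if completion.strip() == answer.strip() else 0.0

    # The "direct constructor" step: assemble the object and hand it back.
    return ToyEnvironment(dataset=rows[:num_examples], reward_func=correct_answer)


env = load_environment(num_examples=2)
```

Because every setting arrives through the function's arguments, a caller can build differently configured environments from the same module without editing it.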
load_environment typed as vf.EnvConfig, put task settings on
TasksetConfig, and add load_harness(config: MyHarnessConfig) only when the
environment owns a reusable harness.
Reusable v1 taskset and harness packages live in tasksets and harnesses.
Install them with uv add "verifiers[packages]", or with the narrower
verifiers[tasksets], verifiers[harnesses], and backend-specific extras such
as verifiers[nemogym]. For example, Harbor task directories can run through
the bundled OpenCode CLI harness with:
Taskset and Harness classes can also be constructed directly in
ordinary Python code; loaders are the environment/config composition path.
NeMoGymTaskset and NeMoGymHarness expose packaged NeMo Gym rows and rollout
collection through the same taskset/harness composition boundary.
To run a local evaluation with any OpenAI-compatible model, do:
./configs/endpoints.toml.
View local evaluation results in the terminal UI:
environment -> model -> run). Press Enter on a run to open rollout details, b to go back, tab to cycle panes, e and x to expand or collapse history, pageup and pagedown to scroll history, and c for Copy Mode.
To publish the environment to the Environments Hub, do: