This guide covers setup, testing, and contributing to the verifiers package.

Setup

Prerequisites

  • Python 3.13 (recommended for parity with the CI ty type checks)
  • uv package manager

Installation

# Clone and install for development
git clone https://github.com/PrimeIntellect-ai/verifiers.git
cd verifiers

# CPU-only development:
uv sync

# GPU-based trainer development:
uv sync --all-extras

# Install pre-commit hooks (including pre-push Ty gate):
uv run pre-commit install --hook-type pre-commit --hook-type pre-push

Project Structure

verifiers/
├── verifiers/          # Main package
│   ├── envs/           # Environment classes
│   │   ├── integrations/   # Third-party wrappers (TextArena, ReasoningGym)
│   │   └── experimental/   # Newer environments (MCP, Harbor, etc.)
│   ├── parsers/        # Parser classes  
│   ├── rubrics/        # Rubric classes
│   ├── rl/             # Training infrastructure
│   │   ├── inference/  # vLLM server utilities
│   │   └── trainer/    # Trainer implementation
│   ├── cli/            # Prime-facing CLI modules and plugin exports
│   ├── scripts/        # Compatibility wrappers around verifiers/cli commands
│   └── utils/          # Utilities
├── environments/       # Installable environment modules
├── configs/            # Example training configurations
├── tests/              # Test suite
└── docs/               # Documentation

Prime CLI Plugin Export

Verifiers exports a plugin that the prime CLI consumes, so command behavior is sourced from verifiers modules. Entry point:
from verifiers.cli.plugins.prime import get_plugin

plugin = get_plugin()

The plugin exposes:
  • api_version (current: 1)
  • command modules:
    • eval_module (verifiers.cli.commands.eval)
    • gepa_module (verifiers.cli.commands.gepa)
    • install_module (verifiers.cli.commands.install)
    • init_module (verifiers.cli.commands.init)
    • setup_module (verifiers.cli.commands.setup)
    • build_module (verifiers.cli.commands.build)
  • build_module_command(module_name, args) to construct subprocess invocation for a command module
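
A minimal sketch of inspecting the plugin from Python, using only the attributes listed above; the module_name value accepted by build_module_command and its return type are assumptions, not documented here:

# Relies only on the attributes listed above; adjust names against the real
# PrimeCLIPlugin in verifiers/cli/plugins/prime.py.
from verifiers.cli.plugins.prime import get_plugin

plugin = get_plugin()
assert plugin.api_version == 1

# Command modules are exposed as plain module attributes:
print(plugin.eval_module.__name__)  # verifiers.cli.commands.eval

# Build the subprocess invocation for a command module
# ("eval_module" and the args here are illustrative guesses):
cmd = plugin.build_module_command("eval_module", ["my-environment", "-n", "5"])
print(cmd)
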
Contributor guidance:
  • Add new prime-facing command logic under verifiers/cli/commands/.
  • Export new command modules through PrimeCLIPlugin in verifiers/cli/plugins/prime.py.
  • Keep verifiers/scripts/* as thin compatibility wrappers that call into verifiers/cli.
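
As an illustration of the last point, a compatibility wrapper might look like the sketch below; the main() entry point on the command module is an assumption, so check the actual callables under verifiers/cli/commands/ before copying it.

# Hedged sketch of a thin wrapper in verifiers/scripts/ that delegates to the
# CLI package; the command module's main() entry point is assumed here.
from verifiers.cli.commands import eval as eval_command


def main() -> None:
    eval_command.main()


if __name__ == "__main__":
    main()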

Running Tests

# Run all tests
uv run pytest tests/

# Run with coverage
uv run pytest tests/ --cov=verifiers --cov-report=html

# Run specific test file
uv run pytest tests/test_parser.py

# Stop on first failure with verbose output
uv run pytest tests/ -xvs

# Run tests matching a pattern
uv run pytest tests/ -k "xml_parser"

# Run environment tests
uv run pytest tests/test_envs.py -vv

# Run environment tests across all CPU cores
uv run pytest -n auto tests/test_envs.py -vv

# Run specific environment tests
uv run pytest tests/test_envs.py -k math_python

The test suite includes 380+ tests covering parsers, rubrics, environments, and utilities.

Writing Tests

Test Structure

import pytest

class TestFeature:
    """Test the feature functionality."""
    
    def test_basic_functionality(self):
        """Test normal operation."""
        # Arrange
        feature = Feature()
        
        # Act
        result = feature.process("input")
        
        # Assert
        assert result == "expected"
    
    def test_error_handling(self):
        """Test error cases."""
        with pytest.raises(ValueError):
            Feature().process(invalid_input)

Using Mocks

The test suite provides a MockClient in conftest.py (available as the mock_client fixture) that implements the Client interface:
def test_with_mock(mock_client):
    mock_client.set_default_responses(chat_response="test answer")
    env = vf.SingleTurnEnv(client=mock_client, model="test", ...)
    # Test without real API calls
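
A somewhat fuller sketch combines the mock client with a tiny in-memory dataset; the question/answer column names and the exact SingleTurnEnv keyword arguments are assumptions based on the examples elsewhere in this guide:

# Hedged sketch: mock_client fixture plus an in-memory dataset. Column names and
# constructor arguments follow this guide's examples and may need adjusting.
import verifiers as vf
from datasets import Dataset

def test_single_turn_with_mock(mock_client):
    mock_client.set_default_responses(
        chat_response="<reasoning>2 + 2 = 4</reasoning>\n<answer>4</answer>"
    )
    dataset = Dataset.from_dict({"question": ["What is 2 + 2?"], "answer": ["4"]})
    parser = vf.XMLParser(fields=["reasoning", "answer"])

    def exact_match(parser, completion, answer, **kwargs):
        return 1.0 if parser.parse_answer(completion) == answer else 0.0

    rubric = vf.Rubric(funcs=[exact_match], weights=[1.0], parser=parser)
    env = vf.SingleTurnEnv(
        client=mock_client,
        model="test",
        dataset=dataset,
        parser=parser,
        rubric=rubric,
    )
    assert env is not None  # placeholder; replace with rollout/reward assertions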

Guidelines

  1. Test both success and failure cases
  2. Use descriptive test names that explain what’s being tested
  3. Leverage existing fixtures from conftest.py
  4. Group related tests in test classes
  5. Keep tests fast - use mocks instead of real API calls
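
The sketch below applies these guidelines with a stub in place of a real parser, so it runs with no fixtures or API calls; the reward signature mirrors the environment example later in this guide:

# Self-contained sketch: grouped tests, descriptive names, and a stub parser.
class StubParser:
    def parse_answer(self, completion):
        return completion.strip()


def exact_match_reward(parser, completion, answer, **kwargs):
    return 1.0 if parser.parse_answer(completion) == answer else 0.0


class TestExactMatchReward:
    """Cover both the success and the failure case."""

    def test_matching_answer_scores_one(self):
        assert exact_match_reward(StubParser(), " 4 ", "4") == 1.0

    def test_wrong_answer_scores_zero(self):
        assert exact_match_reward(StubParser(), "5", "4") == 0.0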

Contributing

Workflow

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make changes following existing patterns
  4. Add tests for new functionality
  5. Run tests: uv run pytest tests/
  6. Run linting/format checks: uv run ruff check --fix . && uv run ruff format --check verifiers tests
  7. Run CI-parity type checks: uv run ty check verifiers
  8. Update docs if adding/changing public APIs
  9. Submit PR with clear description

Code Style

  • Strict ruff enforcement - all PRs must pass ruff check --fix . and ruff format --check verifiers tests
  • ty type checks must pass via uv run ty check verifiers, mirroring the CI setup (Python 3.13 target)
  • Use type hints for function parameters and returns
  • Write docstrings for public functions/classes
  • Keep functions focused and modular
  • Fail fast, fail loud - no defensive programming or silent fallbacks
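
A short, hypothetical helper illustrating the style points above (type hints, docstring, and an explicit error instead of a silent fallback):

def extract_answer(completion: str, tag: str = "answer") -> str:
    """Return the text inside <tag>...</tag>, failing loudly if it is missing."""
    start, end = f"<{tag}>", f"</{tag}>"
    if start not in completion or end not in completion:
        raise ValueError(f"completion does not contain a <{tag}> block")
    return completion.split(start, 1)[1].split(end, 1)[0].strip()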

PR Checklist

  • Tests pass locally (uv run pytest tests/)
  • Linting/format checks pass (uv run ruff check --fix . && uv run ruff format --check verifiers tests)
  • Type checks pass (uv run ty check verifiers)
  • Pre-commit hooks pass (uv run pre-commit run --all-files)
  • Added tests for new functionality
  • Updated documentation if needed

Common Issues

Import Errors

# Ensure package is installed in development mode
uv sync

Integration Tests

# Install optional dependencies for specific integrations
uv sync --extra ta   # for TextArenaEnv
uv sync --extra rg   # for ReasoningGymEnv

Test Failures

# Debug specific test
uv run pytest tests/test_file.py::test_name -vvs --pdb

Environment Development

Creating a New Environment Module

# Initialize template
prime env init my-environment

# Install locally for testing
prime env install my-environment

# Test your environment
prime eval run my-environment -m gpt-4.1-mini -n 5

Environment Module Structure

# my_environment.py
import verifiers as vf

def load_environment(**kwargs):
    """Load the environment."""
    dataset = vf.load_example_dataset("dataset_name")
    parser = vf.XMLParser(fields=["reasoning", "answer"])
    
    def reward_func(parser, completion, answer, **kwargs):
        return 1.0 if parser.parse_answer(completion) == answer else 0.0
    
    rubric = vf.Rubric(
        funcs=[reward_func, parser.get_format_reward_func()],
        weights=[1.0, 0.2],
        parser=parser
    )
    
    return vf.SingleTurnEnv(
        dataset=dataset,
        parser=parser,
        rubric=rubric,
        **kwargs
    )
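
A quick local smoke test of the module above (assuming my_environment.py is importable, e.g. after prime env install my-environment, and that its dataset is available locally):

# Hedged smoke test: only checks that the module builds an environment object.
from my_environment import load_environment

env = load_environment()
print(type(env))  # expect a vf.SingleTurnEnv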

Quick Reference

Essential Commands

# Development setup
uv sync                               # CPU-only
uv sync --all-extras                  # With RL/training extras

# Run tests
uv run pytest tests/                  # All tests
uv run pytest tests/ -xvs             # Debug mode
uv run pytest tests/ --cov=verifiers  # With coverage

# Run environment tests
uv run pytest tests/test_envs.py -vv              # All environments
uv run pytest tests/test_envs.py -k math_python   # Specific environment

# Linting
uv run ruff check --fix .             # Fix lint errors
uv run ruff format --check verifiers tests  # Verify Python formatting
uv run ty check verifiers             # Type check (matches CI ty target)
uv run pre-commit run --all-files     # Run all pre-commit hooks

# Environment tools
prime env init new-env                       # Create environment
prime env install new-env                    # Install environment
prime eval run new-env -m gpt-4.1-mini -n 5  # Test environment
prime eval tui                               # Browse eval results

CLI Tools

Command              Description
prime eval run       Run evaluations on environments
prime env init       Initialize new environment from template
prime env install    Install environment module
prime lab setup      Set up training workspace
prime eval tui       Terminal UI for browsing eval results
prime rl run         Launch Hosted Training
uv run prime-rl      Launch prime-rl training

Project Guidelines

  • Environments: Installable modules with load_environment() function
  • Parsers: Extract structured data from model outputs
  • Rubrics: Define multi-criteria evaluation functions
  • Tests: Comprehensive coverage with mocks for external dependencies