Table of Contents
- Setup
- Project Structure
- Running Tests
- Writing Tests
- Contributing
- Common Issues
- Environment Development
- Quick Reference
Setup
Prerequisites
- Python 3.10, 3.11, 3.12, or 3.13
- uv package manager
Installation
Project Structure
Running Tests
Writing Tests
Test Structure
Using Mocks
The test suite provides mock OpenAI clients:Guidelines
- Test both success and failure cases
- Use descriptive test names that explain what’s being tested
- Leverage existing fixtures from
conftest.py - Group related tests in test classes
- Keep tests fast - use mocks instead of real API calls
Contributing
Workflow
- Fork the repository
- Create a feature branch:
git checkout -b feature-name - Make changes following existing patterns
- Add tests for new functionality
- Run tests:
uv run pytest tests/ - Run linting:
uv run ruff check --fix . - Update docs if adding/changing public APIs
- Submit PR with clear description
Code Style
- Strict
ruffenforcement - all PRs must passruff check --fix . - Use type hints for function parameters and returns
- Write docstrings for public functions/classes
- Keep functions focused and modular
- Fail fast, fail loud - no defensive programming or silent fallbacks
PR Checklist
- Tests pass locally (
uv run pytest tests/) - Linting passes (
uv run ruff check --fix .) - Pre-commit hooks pass (
uv run pre-commit run --all-files) - Added tests for new functionality
- Updated documentation if needed
Common Issues
Import Errors
Integration Tests
Test Failures
Environment Development
Creating a New Environment Module
Environment Module Structure
Quick Reference
Essential Commands
CLI Tools
| Command | Description |
|---|---|
vf-eval | Run evaluations on environments |
vf-init | Initialize new environment from template |
vf-install | Install environment module |
vf-setup | Set up training workspace |
vf-rl | Run vf.RLTrainer |
vf-train | Run SFT training |
vf-tui | Terminal UI for browsing eval results |
prime-rl | Launch prime-rl training |
Project Guidelines
- Environments: Installable modules with
load_environment()function - Parsers: Extract structured data from model outputs
- Rubrics: Define multi-criteria evaluation functions
- Tests: Comprehensive coverage with mocks for external dependencies