Setup
Prerequisites
- Python 3.11 or 3.12
- uv package manager
Installation
Project Structure
Running Tests
Writing Tests
Test Structure
Using Mocks
The test suite provides mock OpenAI clients:Guidelines
- Test both success and failure cases
- Use descriptive test names that explain what’s being tested
- Leverage existing fixtures from
conftest.py - Group related tests in test classes
- Keep tests fast - use mocks instead of real API calls
Tip: When subclassingMultiTurnEnv, always callawait super().is_completed(...)(orawait self.max_turns_reached(state)) so shared guards—especially max turn limits—remain effective.
Contributing
Workflow
- Fork the repository
- Create a feature branch:
git checkout -b feature-name - Make changes following existing patterns
- Add tests for new functionality
- Run tests:
uv run pytest tests/ - Update docs if adding/changing public APIs
- Submit PR with clear description
Code Style
- Follow existing conventions in the codebase
- Use type hints for function parameters and returns
- Write docstrings for public functions/classes
- Keep functions focused and modular
PR Checklist
- Tests pass locally
- Added tests for new functionality
- Updated documentation if needed
- No breaking changes (or clearly documented)
Common Issues
Import Errors
Async Test Issues
Test Failures
Environment Development
Creating a New Environment Module
Environment Module Structure
Quick Reference
Essential Commands
Project Guidelines
- Environments: Installable modules with
load_environment()function - Parsers: Extract structured data from model outputs
- Rubrics: Define multi-criteria evaluation functions
- Tests: Comprehensive coverage with mocks for external dependencies