Advanced Rubrics
Beyond basic reward functions, Verifiers provides specialized rubric types for complex evaluation scenarios.

JudgeRubric: LLM-Based Evaluation
Use language models to evaluate responses when rule-based scoring is insufficient. JudgeRubric avoids making redundant API requests by caching the judge’s response within the state dictionary for each rollout.
To make this caching effective, JudgeRubric defaults to parallelize_scoring=False. This forces its reward functions to run sequentially, ensuring that the first function makes the API call and populates the cache, while subsequent functions get an instant cache hit.
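A minimal sketch of this pattern (the judge_client, judge_model, and judge_prompt parameters, the add_reward_func method, and the injected judge callable follow the library’s documented conventions; verify against your installed version):

```python
import verifiers as vf
from openai import OpenAI

judge_rubric = vf.JudgeRubric(
    judge_client=OpenAI(),
    judge_model="gpt-4.1-mini",
    judge_prompt="Does the response answer the question correctly? Reply yes or no.",
)

async def judge_reward(judge, prompt, completion, answer, state, **kwargs):
    # The first call hits the judge API and caches the response in `state`;
    # later reward functions reusing `judge` get an instant cache hit.
    verdict = await judge(prompt, completion, answer, state)
    return 1.0 if "yes" in verdict.lower() else 0.0

judge_rubric.add_reward_func(judge_reward)
```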
Example: Multi-Step Math with Judge Evaluation
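A sketch of a math setup that awards full credit for exact answers and judge-based partial credit for sound reasoning; the dataset variable and model name are placeholders:

```python
import verifiers as vf
from openai import OpenAI

parser = vf.ThinkParser()

def correct_answer(parser, completion, answer, **kwargs):
    # Full credit for an exact match on the parsed final answer
    return 1.0 if parser.parse_answer(completion) == answer else 0.0

judge_rubric = vf.JudgeRubric(
    judge_client=OpenAI(),
    judge_model="gpt-4.1-mini",
    judge_prompt="Is the reasoning mathematically sound? Reply yes or no.",
    parser=parser,
)
judge_rubric.add_reward_func(correct_answer)

async def sound_reasoning(judge, prompt, completion, answer, state, **kwargs):
    # Half credit when the steps are sound even if the final answer is off
    verdict = await judge(prompt, completion, answer, state)
    return 0.5 if "yes" in verdict.lower() else 0.0

judge_rubric.add_reward_func(sound_reasoning)

# `dataset` is assumed to be a HF dataset with question/answer columns
env = vf.SingleTurnEnv(dataset=dataset, parser=parser, rubric=judge_rubric)
```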
RubricGroup: Combining Multiple Rubrics
Aggregate scores from different rubrics:
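A sketch, assuming the Rubric(funcs=..., weights=...) constructor shape used elsewhere in these docs and XMLParser’s built-in get_format_reward_func helper:

```python
import verifiers as vf

parser = vf.XMLParser(fields=["think", "answer"])

def exact_match(parser, completion, answer, **kwargs):
    return 1.0 if parser.parse_answer(completion) == answer else 0.0

correctness = vf.Rubric(funcs=[exact_match], weights=[1.0], parser=parser)
formatting = vf.Rubric(funcs=[parser.get_format_reward_func()], weights=[0.2], parser=parser)

# RubricGroup concatenates the reward functions of its member rubrics
# and aggregates their scores into a single reward
combined = vf.RubricGroup(rubrics=[correctness, formatting])
```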
ToolRubric: Tracking Tool Usage

Count total and per-tool calls during a rollout. Pass your tool functions to enable per-tool counters. By default, counts are added as metrics with zero weight; adjust reward_weights if you want the counts to affect reward.
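For example (calculate is a toy tool for illustration):

```python
import verifiers as vf

def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"error: {e}"

# Adds a total-calls counter plus one counter per tool (e.g. calculate_calls),
# all weighted 0.0 by default so they report as metrics only
tool_rubric = vf.ToolRubric(tools=[calculate])
```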
Rubric.class_objects: Passing Objects to Reward Functions
The class_objects pattern allows you to pass class instances or other objects directly to your reward functions. This is especially useful when your reward functions need access to parsers, clients, or other stateful objects.
How it works:
When a rubric calls a reward function, it automatically merges self.class_objects with the standard arguments (prompt, completion, answer, state, etc.). Your reward functions can then accept these objects as parameters.
Basic Example:
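A minimal sketch (check_answer is a hypothetical reward function; class_objects is assumed to be a plain dict attribute, as described above):

```python
import verifiers as vf

parser = vf.XMLParser(fields=["answer"])

def check_answer(parser, completion, answer, **kwargs):
    # `parser` arrives via class_objects, merged with the standard arguments
    return 1.0 if parser.parse_answer(completion) == answer else 0.0

rubric = vf.Rubric(funcs=[check_answer], weights=[1.0], parser=parser)

# Any extra object can be exposed the same way
rubric.class_objects["normalizer"] = str.lower
```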
JudgeRubric demonstrates this pattern:
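JudgeRubric registers its judge callable in class_objects, so a reward function can request it by name (a sketch; the exact judge signature may differ in your version):

```python
async def verdict_reward(judge, prompt, completion, answer, state, **kwargs):
    # `judge` is injected from JudgeRubric's class_objects
    response = await judge(prompt, completion, answer, state)
    return 1.0 if response.strip().lower().startswith("yes") else 0.0
```

When setting up class_objects in your own rubrics: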
- Include all necessary objects: Add any parsers, clients, or helpers your reward functions need
- Use descriptive names: Choose clear names for your class_objects keys
- Document dependencies: Make it clear which reward functions use which objects
- Handle missing objects gracefully: Use **kwargs and check for object availability
Tools
Verifiers provides native support for tool calling, leveraging models’ built-in function calling capabilities.

Defining Tools
Tools are simple Python functions with type hints, and can be either sync or async:
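For example (calculate and lookup are toy tools for illustration):

```python
import asyncio

def calculate(expression: str) -> str:
    """Evaluate an arithmetic expression and return the result.

    Args:
        expression: A Python arithmetic expression, e.g. "2 * (3 + 4)".
    """
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"error: {e}"

async def lookup(word: str) -> str:
    """Fetch a short definition (async tools are awaited automatically).

    Args:
        word: The word to look up.
    """
    await asyncio.sleep(0)  # stand-in for a real network call
    return f"a definition of {word!r}"
```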
Using ToolEnv

ToolEnv automatically converts Python functions to tool schemas and handles the tool-calling loop, returning role: tool messages as tool outputs. It does NOT impose any XML structure or require hardcoded patterns.
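A sketch reusing the toy tools above (the parameter names follow ToolEnv’s documented interface; the dataset variable is a placeholder):

```python
import verifiers as vf

env = vf.ToolEnv(
    dataset=dataset,            # HF dataset with question/answer columns
    tools=[calculate, lookup],  # plain Python functions from above
    max_turns=5,
    rubric=vf.ToolRubric(tools=[calculate, lookup]),
)
```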
Tool Design Best Practices
- Clear Signatures: Use descriptive names and type hints
- Comprehensive Docstrings: Models use these to understand tool purpose
- Error Handling: Return helpful error messages, don’t raise exceptions
- Timeouts: Add timeouts for long-running operations
- Input Validation: Validate and sanitize inputs
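A sketch combining several of these practices (safe_lookup and _remote_lookup are hypothetical):

```python
import concurrent.futures

def _remote_lookup(query: str) -> str:
    # Stand-in for a real network or database call
    return f"results for {query!r}"

def safe_lookup(query: str) -> str:
    """Query an external service with validation and a 5-second timeout.

    Args:
        query: Search terms to send to the service.
    """
    if not query.strip():
        return "error: empty query"  # validate input, don't raise
    try:
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(_remote_lookup, query).result(timeout=5.0)
    except concurrent.futures.TimeoutError:
        return "error: lookup timed out after 5 seconds"
    except Exception as e:
        return f"error: {e}"
```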
Complex Tool Examples
For more sophisticated tool setups, see the wiki_search environment in the repository, which demonstrates:
- Multiple interdependent tools
- State management across tool calls
- Sophisticated error handling
- Tool usage optimization
Parsers
Parsers extract structured information from model outputs. While many tasks work with raw text, parsers help when you need specific formats.

Built-in Parsers
XMLParser
Extract XML-tagged content:
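For example (assuming XMLParser exposes parsed fields as attributes):

```python
import verifiers as vf

parser = vf.XMLParser(fields=["think", "answer"])

completion = "<think>2 + 2 = 4</think>\n<answer>4</answer>"
parsed = parser.parse(completion)
print(parsed.think)   # "2 + 2 = 4"
print(parsed.answer)  # "4"
```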
ThinkParser

Separate reasoning from final answers:
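A sketch, assuming ThinkParser strips the think block when extracting the final answer:

```python
import verifiers as vf

parser = vf.ThinkParser()

completion = "<think>Carry the one...</think>The answer is 42."
print(parser.parse_answer(completion))  # "The answer is 42."
```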
Custom Parser Patterns

Create domain-specific parsers by extending the base class.

Example: Code Block Parser
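A sketch of a custom parser (the fence constant is built dynamically only to keep this snippet renderable):

```python
import re
import verifiers as vf

FENCE = "`" * 3  # a literal triple backtick, assembled to avoid breaking this block

class CodeBlockParser(vf.Parser):
    """Extract the last fenced code block from a completion."""

    _pattern = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)

    def parse(self, text: str):
        blocks = self._pattern.findall(text)
        return blocks[-1].strip() if blocks else None
```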
Parser Integration

Parsers integrate seamlessly with environments and rubrics:
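For example, sharing one parser between the environment and its rubric (the dataset variable is a placeholder):

```python
import verifiers as vf

parser = vf.XMLParser(fields=["think", "answer"])

def exact_match(parser, completion, answer, **kwargs):
    return 1.0 if parser.parse_answer(completion) == answer else 0.0

env = vf.SingleTurnEnv(
    dataset=dataset,  # question/answer pairs
    parser=parser,    # the same parser is shared with the rubric
    rubric=vf.Rubric(
        funcs=[exact_match, parser.get_format_reward_func()],
        weights=[1.0, 0.2],
        parser=parser,
    ),
)
```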
Practical Examples

Interactive Game Environment
Build a Wordle-like game with multi-turn interaction:
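A condensed sketch (assuming MultiTurnEnv subclasses override is_completed and env_response, and that the rollout state carries the target word under state["answer"]; details may differ in your version):

```python
import verifiers as vf

class WordleEnv(vf.MultiTurnEnv):
    """Toy word-guessing environment (illustrative only)."""

    async def is_completed(self, messages, state, **kwargs) -> bool:
        # Stop after a correct guess or six attempts
        return state.get("solved", False) or state.get("guesses", 0) >= 6

    async def env_response(self, messages, state, **kwargs):
        guess = messages[-1]["content"].strip().lower()
        state["guesses"] = state.get("guesses", 0) + 1
        state["solved"] = guess == state["answer"]
        feedback = (
            "Correct!" if state["solved"]
            else f"Wrong, {6 - state['guesses']} guesses left."
        )
        return [{"role": "user", "content": feedback}], state
```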
Training Data Generation

Generate training data using environment rollouts:
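A sketch using the evaluation API (the evaluate and make_dataset method names follow the library’s documented interface; the environment id is hypothetical):

```python
import verifiers as vf
from openai import OpenAI

env = vf.load_environment("my-environment")  # hypothetical environment id
results = env.evaluate(
    client=OpenAI(),
    model="gpt-4.1-mini",
    num_examples=100,
    rollouts_per_example=4,
)

# Keep only high-reward rollouts as supervised training data
dataset = env.make_dataset(results)
dataset = dataset.filter(lambda row: row["reward"] > 0.8)
dataset.save_to_disk("filtered_rollouts")
```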
Environment Composition

Build complex environments from simpler ones:
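A sketch using EnvGroup (the environment ids are hypothetical):

```python
import verifiers as vf

math_env = vf.load_environment("math-env")
code_env = vf.load_environment("code-env")

# EnvGroup dispatches each example to its sub-environment by task name
combined = vf.EnvGroup(envs=[math_env, code_env], env_names=["math", "code"])
```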
Best Practices

For Rubrics
- Start simple with basic reward functions
- Use JudgeRubric when rule-based evaluation is insufficient
- Combine rubrics with RubricGroup for multi-faceted evaluation
- Test reward functions thoroughly with edge cases
For Tools
- Keep tool functions simple and focused
- Use clear names and comprehensive docstrings
- Handle errors gracefully - return messages, don’t raise
- Add timeouts for external operations
- Let the model’s chat template handle tool calling format
For Parsers
- Use built-in parsers when they fit your needs
- Create custom parsers for domain-specific formats
- Always handle parsing failures gracefully
- Consider providing format rewards to guide model output
Next Steps
- Build your own environments using these components in Environments
- Train models with your environments in Training
- Understand the type system in Type Reference