Getting Started
How do I quickly test my environment?
Use `vf-eval` with a small sample; the `-s` flag prints sample outputs so you can see what's happening.
How do I see what the model is outputting?
If using `vf-eval`: results are saved automatically, and you can browse them interactively with the bundled results viewer.
If calling the API directly (`env.generate()` / `env.evaluate()`): the results are returned to you, so you can print or inspect the completions yourself.
How do I enable debug logging?
Set the `VF_LOG_LEVEL` environment variable (e.g. `VF_LOG_LEVEL=DEBUG`).
Environments
Which environment class should I use?
- `SingleTurnEnv`: One prompt, one response (Q&A, classification)
- `MultiTurnEnv`: Custom back-and-forth interaction (games, simulations)
- `ToolEnv`: Model calls Python functions (search, calculator)
- `StatefulToolEnv`: Tools that need per-rollout state (sandbox IDs, sessions)
What does `max_turns=-1` mean?
Unlimited turns. The rollout continues until a stop condition is triggered (e.g., model stops calling tools, or a custom condition you define).
How do I add a custom stop condition?
Use the `@vf.stop` decorator on a method that returns `True` to end the rollout.
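The exact `@vf.stop` signature may differ across verifiers versions; as a self-contained illustration of the pattern only (none of the names below are the real API), here is a toy decorator that tags predicate methods, plus a check that ends the loop once any tagged predicate returns `True`:

```python
# Toy sketch of the stop-condition pattern (NOT the real vf.stop API):
# the decorator tags a predicate method; the environment stops the
# rollout as soon as any tagged predicate evaluates to True.
def stop(fn):
    fn._is_stop_condition = True
    return fn

class ToyEnv:
    def __init__(self):
        self.state = {"turns": 0, "solved": False}

    @stop
    def solved(self, state) -> bool:
        return state["solved"]

    @stop
    def out_of_turns(self, state) -> bool:
        return state["turns"] >= 3

    def should_stop(self) -> bool:
        # Evaluate every method tagged by the decorator against current state.
        return any(
            getattr(method, "_is_stop_condition", False) and method(self.state)
            for method in (getattr(self, name) for name in dir(self))
            if callable(method)
        )
```

Multiple stop conditions compose naturally this way: the rollout ends when the first one fires.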
How do I handle tool call errors gracefully?
In `ToolEnv`, you can customize how tool call errors are handled.
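The actual `ToolEnv` hook isn't shown here, so as a general sketch of the idea (function names below are illustrative, not the library API): catch exceptions from tool calls and feed the error text back to the model as a tool message, rather than letting the rollout crash.

```python
# Illustrative sketch (not the actual ToolEnv API): a failing tool call
# is converted into an error message the model can see and recover from,
# instead of raising and ending the rollout.
def safe_tool_call(tool, **args) -> dict:
    try:
        result = tool(**args)
        return {"role": "tool", "content": str(result)}
    except Exception as e:
        return {"role": "tool", "content": f"Error: {type(e).__name__}: {e}"}

# Example tool: division, which fails on b=0.
def divide(a: float, b: float) -> float:
    return a / b
```

Returning the error as content lets the model retry with corrected arguments on the next turn.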
Reward Functions
What arguments can my reward function receive?
Reward functions can receive any of these via `**kwargs`:
- `completion`: the model's response
- `answer`: ground truth from the dataset
- `prompt`: the input prompt
- `state`: full rollout state
- `parser`: the rubric's parser (if set)
- `task`: task identifier
- `info`: metadata dict from the dataset
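A reward function only needs to name the arguments it uses. The sketch below assumes a chat-style completion (a list of message dicts) with a plain-string fallback; adjust to your environment's message format:

```python
# Sketch: a reward function using only `completion` and `answer`.
# Assumes chat-format completions (list of {"role", "content"} dicts);
# plain strings are handled as a fallback.
def exact_match_reward(completion, answer, **kwargs) -> float:
    if isinstance(completion, list):
        text = completion[-1]["content"]  # final assistant message
    else:
        text = completion
    return 1.0 if text.strip() == answer.strip() else 0.0
```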
How do group reward functions work?
Group reward functions receive plural arguments (`completions`, `answers`, `states`) and return a list of floats, one per rollout. They are detected automatically by their parameter names.
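As an illustration of why group functions exist, the sketch below scores each rollout relative to the rest of its group, which a per-rollout reward function cannot do (chat-style completions are assumed, as above):

```python
# Sketch of a group reward function: the plural parameter names
# (completions, answers) signal that it receives all rollouts for a
# group at once. Here, raw correctness scores are mean-centered within
# the group, so each rollout is rewarded relative to its peers.
def group_centered_match(completions, answers, **kwargs) -> list[float]:
    texts = [
        c[-1]["content"] if isinstance(c, list) else c
        for c in completions
    ]
    raw = [1.0 if t.strip() == a.strip() else 0.0 for t, a in zip(texts, answers)]
    mean = sum(raw) / len(raw)
    return [r - mean for r in raw]
```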
Training
What’s the difference between prime-rl and vf-rl?
- `prime-rl`: Production-ready, multi-node, MoE support, advanced features. Use it for serious training.
- `vf-rl`: Minimal (~1000 LOC), single-node, hackable. Use it for small-scale testing or as a starting point for your own training loop.