Environment Issues
ModuleNotFoundError: No module named 'verifiers'
The `verifiers` library is not installed in your project. Install it directly, or, if you’re installing a specific environment, install that environment instead (a sketch of both follows).
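A minimal sketch of both install paths, assuming a `uv`-managed project; the environment ID is a placeholder:

```bash
# Install the verifiers library into the current project
uv add verifiers

# Or install a specific published environment by ID
prime env install owner/my-env
```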
load_environment not found
This usually means a module name collision: your environment name conflicts with an existing Python package, or the environment wasn’t installed correctly.

Solutions:
- Rename your environment to avoid conflicts
- Reinstall: `prime env install my-env`
- Check that your environment module exposes a `load_environment` function at the top level (see the sketch after this list)
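If the function is missing entirely, this sketch shows the expected layout, assuming a `verifiers`-style single-turn environment; the dataset and reward logic are placeholders:

```python
# my_env/my_env.py: the module top level must expose load_environment
import verifiers as vf
from datasets import load_dataset

def load_environment(**kwargs) -> vf.Environment:
    # Placeholder dataset with "question" / "answer" columns
    dataset = load_dataset("gsm8k", "main", split="train")

    def reward(completion, answer, **_) -> float:
        # Placeholder scoring; chat-format environments pass completion as a
        # list of messages rather than a plain string
        text = completion if isinstance(completion, str) else str(completion)
        return 1.0 if str(answer) in text else 0.0

    rubric = vf.Rubric(funcs=[reward])
    return vf.SingleTurnEnv(dataset=dataset, rubric=rubric, **kwargs)
```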
Environment not found on the Hub
The environment ID doesn’t match any published environment. Make sure you’re using the correct `owner/name` format and that you have access if it’s a private environment.
MissingKeyError when loading environment
The environment requires API keys that aren’t set. The error message will list which keys are missing.

For local evaluation, export the keys in your shell. For Hosted Training, add secrets to your config, or manage them with the CLI; sketches of all three follow.
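Hedged sketches of all three options. The key name is an example; `env_file` and `prime secrets` are referenced later on this page, but their exact shapes here are assumptions, so check the config reference and `prime secrets --help`:

```bash
# Local evaluation: export the missing keys in your shell
export OPENAI_API_KEY="your-key-here"
```

```toml
# Hosted Training config: point the run at a local env file
# (placement of env_file in the config is an assumption)
env_file = ".env"
```

```bash
# Or manage secrets with the CLI (subcommand shape is an assumption)
prime secrets set OPENAI_API_KEY
```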
Training Issues
Reward is always 0.0
The task is too hard for the model at its current capability level. The model can’t solve any examples, so there’s no reward signal to learn from.

Solutions:
- Try a larger or more capable model
- Use easier examples (filter your dataset or adjust environment args)
- Increase `max_tokens` in `[sampling]`: the model may need more space to reason (see the snippet after this list)
- Check your rubric implementation for bugs that might always return 0
- Run a baseline evaluation first: `prime eval run my-env -m <model> -n 20 -r 1`
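For the `max_tokens` fix, the relevant config section; the value is illustrative:

```toml
[sampling]
max_tokens = 4096  # raise this if completions are cut off mid-reasoning
```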
Reward is always 1.0
The task is too easy: the model already solves everything.

Solutions:
- Use harder examples or a more challenging dataset split
- Add more demanding rubric criteria
- Use a smaller model that has more room to improve
Reward is not changing / training seems stuck
There could be several causes:
- Low reward diversity: If all rollouts for an example get the same reward, there’s no contrast for the model to learn from. Increase `rollouts_per_example` (16–32) to get more variation.
- Learning rate too low: Try increasing `learning_rate` (e.g., from `1e-4` to `3e-4`).
- Batch size too small: Larger batches provide more stable gradient estimates. Try `batch_size = 512`.
- Task mismatch: The task may not be suitable for RL training. Ensure the reward function produces a meaningful gradient of scores, not just binary 0/1 (see the sketch after this list).
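For the task-mismatch point, a hedged sketch of a reward function that returns graded partial credit rather than binary 0/1; the criteria and weights are illustrative:

```python
def graded_reward(completion: str, answer: str, **kwargs) -> float:
    """Illustrative: combine partial-credit signals instead of one 0/1 check."""
    score = 0.0
    if str(answer) in completion:
        score += 0.7   # correct final answer
    if "because" in completion.lower():
        score += 0.2   # some visible reasoning (crude proxy)
    if len(completion) < 4000:
        score += 0.1   # brevity bonus
    return min(score, 1.0)
```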
Pydantic validation error in config
A field in your TOML config has the wrong type or an invalid value.

Common causes (a minimal valid sketch follows the list):
- String values not quoted: `model = Qwen/Qwen3-4B` → `model = "Qwen/Qwen3-4B"`
- Integer where float expected or vice versa
- Missing required sections like `[sampling]` or `[[env]]`
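A minimal well-typed sketch; `model`, `[sampling]`, and `[[env]]` appear elsewhere on this page, while the `id` field name and the placement of `model` are assumptions:

```toml
model = "Qwen/Qwen3-4B"    # strings must be quoted

[sampling]
max_tokens = 4096          # integer where an integer is expected

[[env]]
id = "owner/my-env"        # field name is an assumption; see the config reference
```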
Model not available
The model you specified isn’t currently supported for Hosted Training. Run `prime rl models` to list supported models. The model list is subject to change during the beta period; see Models & Pricing for the current list.
CLI Issues
prime: command not found
The CLI isn’t installed or isn’t on your PATH (see the sketch below). If you installed with `uv tool install` and it’s still not found, make sure `~/.local/bin` is in your PATH.
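A sketch of the install and PATH fix, assuming the `uv tool install` route:

```bash
# Install (or reinstall) the CLI as a uv tool
uv tool install prime

# If the shell still can't find it, put uv's bin directory on PATH
export PATH="$HOME/.local/bin:$PATH"
```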
Authentication failed / not logged in
Your CLI session may have expired. Log in again; this opens a browser window to re-authenticate:
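```bash
prime login
```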
prime rl run fails immediately
Check the following:
- Your config file is valid TOML (no syntax errors)
- The model is available: `prime rl models`
- The environment ID is correct and accessible
- You’re authenticated: `prime login`
- Your CLI is up to date: `uv tool install -U prime`
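A quick pre-flight sketch combining these checks; the config filename is illustrative, and the TOML check assumes Python 3.11+ for `tomllib`:

```bash
prime login                # re-authenticate if needed
uv tool install -U prime   # update the CLI
prime rl models            # confirm the model is available

# Validate TOML syntax (assumes Python 3.11+)
python -c "import tomllib; tomllib.load(open('config.toml', 'rb'))"
```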
Evaluation Issues
Rate limit exceeded (429 errors)
Reduce concurrency when running evaluations (sketch below). The `-c` flag controls maximum concurrent requests; lower it if you’re hitting rate limits.
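A sketch reusing the `prime eval run` invocation shown earlier; the concurrency value is illustrative:

```bash
# Cap concurrent requests to stay under the rate limit
prime eval run my-env -m <model> -n 20 -r 3 -c 4
```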
ThinkParser failures with Qwen3 models
Qwen3 and DeepSeek-R1 models have chat templates that automatically remove `<think>` tags from message history. This conflicts with `ThinkParser`.

Solution: Use `MaybeThinkParser` or `Parser` instead of `ThinkParser` in your environment (see the sketch below).
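A hedged sketch of the swap, assuming `MaybeThinkParser` is exported at the top level of `verifiers` like `Parser` and `ThinkParser`; the dataset and rubric are placeholders:

```python
import verifiers as vf
from datasets import load_dataset

def load_environment(**kwargs) -> vf.Environment:
    # vf.ThinkParser() would break here: the chat template strips <think> tags
    parser = vf.MaybeThinkParser()  # or: parser = vf.Parser()

    dataset = load_dataset("gsm8k", "main", split="train")  # placeholder
    rubric = vf.Rubric(funcs=[], parser=parser)             # placeholder
    return vf.SingleTurnEnv(dataset=dataset, parser=parser, rubric=rubric, **kwargs)
```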
Evaluation results look wrong or inconsistent
- Check `rollouts_per_example`: Low values (1–2) produce noisy results. Use at least 3–5 for reliable metrics.
- Check `num_examples`: Very small sample sizes can be misleading.
- Check sampling temperature: High temperatures produce more variation between runs.
- Check your rubric: Make sure reward functions handle edge cases such as empty responses and malformed outputs (see the sketch after this list).
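For the rubric point, an illustrative set of defensive guards; the `Answer:` marker is an assumption about the task format:

```python
def robust_reward(completion: str, answer: str, **kwargs) -> float:
    """Illustrative guards for empty and malformed outputs."""
    # Empty or whitespace-only response
    if not completion or not completion.strip():
        return 0.0
    # Malformed output: no recognizable answer marker
    if "Answer:" not in completion:
        return 0.0
    parsed = completion.split("Answer:")[-1].strip()
    return 1.0 if parsed == str(answer).strip() else 0.0
```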
Environment Development Issues
Environment works locally but fails in Hosted Training
Common causes:
- Missing dependencies: Make sure all required packages are listed in your environment’s `pyproject.toml` (see the sketch after this list)
- Missing secrets: API keys available locally may not be set for hosted runs. Use `env_file` or `prime secrets`.
- Hardcoded paths: Avoid absolute file paths in your environment code
- Network access: Some external APIs may not be reachable from the hosted environment
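For the dependencies point, a sketch of the environment’s `pyproject.toml`; the package names are illustrative:

```toml
[project]
name = "my-env"
version = "0.1.0"
dependencies = [
    "verifiers",
    "datasets",   # everything your environment imports must be listed
    "requests",
]
```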
How do I debug my environment?
Start with a local evaluation using verbose output; the `-v` flag enables verbose logging. You can also test your environment directly in Python. Sketches of both follow.
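Two hedged sketches: the CLI flags appear elsewhere on this page, and the Python snippet assumes the top-level `verifiers.load_environment` loader and a `dataset` attribute on the environment:

```bash
# Local evaluation with verbose logging
prime eval run my-env -m <model> -n 5 -r 1 -v
```

```python
import verifiers as vf

# Load the environment and inspect a sample directly
env = vf.load_environment("my-env")
print(env.dataset[0])  # assumes the environment exposes its dataset
```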
Getting Help

If your issue isn’t covered here:

- Discord: Join the Prime Intellect Discord for community support and Q&A
- Research Support: Fill out the research support form for hands-on assistance
- Feedback: Use the thumbs-up/down on any docs page to let us know what’s helpful or missing