Prerequisites
Before running hosted evaluations, ensure you have:
- Published Environment: Your environment must be pushed to the Environments Hub
- Environment Access: You must be the owner or have write permissions on the environment
- Account Balance: Sufficient credits to cover inference costs (costs vary by model and usage)
Running a Hosted Evaluation
Step 1: Navigate to Your Environment
- Go to the Environments Hub
- Find your environment (either in “My Environments” or via search)
- Click on the environment to open its detail page
- Navigate to the “Evaluations” tab
Step 2: Start New Evaluation
Click the “Run Hosted Evaluation” button to begin the evaluation wizard. If you don’t see this button, verify that you have write permissions on the environment.
Step 3: Select Model
The model selection page displays all available inference models.
Step 4: Configure Evaluation
Configure how your evaluation will run:
- Number of test cases to evaluate from your environment’s dataset
- Number of times to run inference on each example for statistical aggregation
- Optional key-value pairs passed to your environment during evaluation
Environment Secrets: Any secrets configured in your environment settings will be automatically exposed during evaluation. You don’t need to pass them as arguments. For global secret linking and precedence rules, see Secrets.
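The evaluation arguments above are plain key-value pairs passed through to your environment. As an illustrative sketch only (the argument names below are hypothetical, not a documented schema), they might be assembled and serialized like this:

```python
import json

# Hypothetical environment arguments; the keys below are illustrative
# examples, not a documented schema.
env_args = {
    "difficulty": "hard",
    "max_turns": "10",
}

# Key-value pairs are typically serialized (e.g. as JSON) before being
# handed to the environment during evaluation.
payload = json.dumps(env_args, sort_keys=True)
print(payload)
```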

Step 5: Monitor Progress
After submitting, you’ll be redirected to the Evaluations list where you can monitor progress.
Step 6: View Results
Click on a completed evaluation to view detailed results.
Metrics Tab:
- Average reward/score across all examples
- Total samples evaluated
- Statistical aggregations (if multiple rollouts)
- Model information and parameters used
- Individual test case results
- Model inputs and outputs
- Per-example scores
- Grouped view for multiple rollouts (shows variance)
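When you run multiple rollouts per example, the grouped view reports per-example statistics such as the mean score and its spread. A minimal sketch of that aggregation, using made-up sample scores:

```python
import statistics

# Made-up rewards for one example across 3 rollouts.
rollout_scores = [0.8, 0.6, 1.0]

mean_score = statistics.mean(rollout_scores)   # average reward for the example
spread = statistics.stdev(rollout_scores)      # variance/spread shown in the grouped view

print(f"mean={mean_score:.2f} stdev={spread:.2f}")  # → mean=0.80 stdev=0.20
```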

Failed Evaluations
When an evaluation fails, detailed debugging information is provided.
Error Information:
- Error message describing what went wrong
- Full evaluation logs in a scrollable terminal view
Environment Errors:
- Missing secrets
- Bug in environment code
- Missing dependencies
- Incorrect verifier implementation
Timeout:
- Evaluation took longer than maximum allowed time (60 min)
- Consider reducing examples or optimizing environment
Insufficient Balance:
- Not enough credits to complete evaluation
- Add funds and re-run
Model API Errors:
- Temporary issues with inference service
- Rate limiting
- Try again or contact support
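Transient model API errors, including rate limiting, are often resolved simply by retrying. A generic retry-with-exponential-backoff sketch (not a documented Prime Inference client; the flaky call below is simulated):

```python
import time

def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Retry a flaky call with exponential backoff.

    Generic pattern for transient API errors such as rate limits;
    not a documented Prime Inference API.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** attempt)

# Simulated inference call that fails twice, then succeeds.
attempts = {"n": 0}
def flaky_inference():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = call_with_retries(flaky_inference, base_delay=0.01)
print(result)  # → ok
```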
Pricing
Hosted evaluations use the Prime Inference API under the hood. Costs are calculated based on:
- Model Pricing: Each model has different rates (shown on the model selection page)
- Token Usage: Both prompt and completion tokens are counted
- Number of Examples × Rollouts: Multiplies the total inference calls
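Putting the three factors above together, a back-of-the-envelope cost estimate looks like this. The rates and token counts below are made-up placeholders; actual per-token rates are shown on the model selection page:

```python
# Placeholder per-token rates; real rates appear on the model selection page.
prompt_rate = 0.50 / 1_000_000      # $ per prompt token (made-up)
completion_rate = 1.50 / 1_000_000  # $ per completion token (made-up)

num_examples = 100          # test cases evaluated
rollouts = 3                # inference runs per example
avg_prompt_tokens = 2_000   # assumed average prompt length
avg_completion_tokens = 500 # assumed average completion length

# Examples × rollouts multiplies the total number of inference calls.
calls = num_examples * rollouts
cost = calls * (avg_prompt_tokens * prompt_rate
                + avg_completion_tokens * completion_rate)
print(f"estimated cost: ${cost:.2f}")
```

Doubling either the number of examples or the rollouts doubles the total inference calls, and therefore the cost.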