Prerequisites
Before running hosted evaluations, ensure you have:
- Published Environment: Your environment must be pushed to the Environment Hub
- Environment Access: You must be the owner or have write permissions on the environment
- Account Balance: Sufficient credits to cover inference costs (costs vary by model and usage)
 
Running a Hosted Evaluation
Step 1: Navigate to Your Environment
- Go to the Environment Hub
- Find your environment (either in “My Environments” or via search)
- Click on the environment to open its detail page
- Navigate to the “Evaluations” tab
 
Step 2: Start New Evaluation
Click the “Run Hosted Evaluation” button to begin the evaluation wizard. If you don’t see this button, verify that you have write permissions on the environment.
Step 3: Select Model
The model selection page displays all available inference models.
Step 4: Configure Evaluation
Configure how your evaluation will run:
- Number of test cases to evaluate from your environment’s dataset
- Number of times to run inference on each example (rollouts), for statistical aggregation
- Optional key-value pairs passed to your environment during evaluation
Environment Secrets: Any secrets configured in your environment settings will be automatically exposed during evaluation. You don’t need to pass them as arguments.
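As an illustration, a defensive check like the one below can catch missing secrets early. It assumes secrets surface as environment variables inside the hosted evaluation, which is an assumption rather than documented behavior, and `WEATHER_API_KEY` is a hypothetical secret name.

```python
import os

# Hypothetical secret names -- replace with whatever your environment actually needs.
REQUIRED_SECRETS = ["WEATHER_API_KEY"]


def check_secrets() -> None:
    """Fail fast with a clear message if an expected secret is not available.

    Assumes secrets are exposed as environment variables during the hosted
    evaluation; adjust if your environment receives them differently.
    """
    missing = [name for name in REQUIRED_SECRETS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")


if __name__ == "__main__":
    check_secrets()
    print("All required secrets are present.")
```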

Step 5: Monitor Progress
After submitting, you’ll be redirected to the Evaluations list where you can monitor progress.
Step 6: View Results
Click on a completed evaluation to view detailed results.

Metrics Tab:
- Average reward/score across all examples
- Total samples evaluated
- Statistical aggregations (if multiple rollouts)
- Model information and parameters used

Per-example details:
- Individual test case results
- Model inputs and outputs
- Per-example scores
- Grouped view for multiple rollouts (shows variance; see the sketch below)
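For reference, the grouped view’s aggregations amount to simple per-example statistics over rollouts. The sketch below shows the idea on made-up reward data; the dictionary shape is illustrative, not an export format of the platform.

```python
from statistics import mean, pstdev

# Hypothetical per-example rewards, one list entry per rollout (made-up numbers).
rewards_by_example = {
    "example_0": [1.0, 1.0, 0.0],
    "example_1": [0.5, 0.75, 0.5],
}

# Per-example mean and spread across rollouts, plus the overall average reward.
for example_id, rewards in rewards_by_example.items():
    print(f"{example_id}: mean={mean(rewards):.3f} stdev={pstdev(rewards):.3f}")

all_rewards = [r for rewards in rewards_by_example.values() for r in rewards]
print(f"overall average reward: {mean(all_rewards):.3f}")
```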
 

Failed Evaluations
When an evaluation fails, detailed debugging information is provided.

Error Information:
- Error message describing what went wrong
- Full evaluation logs in a scrollable terminal view

Common causes:

Environment Errors:
- Missing secrets
- Bug in environment code
- Missing dependencies
- Incorrect verifier implementation (see the local sanity-check sketch after this section)

Timeout:
- Evaluation took longer than the maximum allowed time (60 minutes)
- Consider reducing the number of examples or optimizing your environment

Insufficient Balance:
- Not enough credits to complete the evaluation
- Add funds and re-run

Model API Errors:
- Temporary issues with the inference service
- Rate limiting
- Try again or contact support
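Many environment-side failures (a buggy verifier, a reward outside the expected range) can be caught locally before re-running a hosted evaluation. The snippet below is a generic sketch: `score_completion` stands in for whatever reward/verifier function your environment defines, and the sample input is invented.

```python
# Hypothetical stand-in for your environment's verifier/reward function.
def score_completion(prompt: str, completion: str) -> float:
    # Toy rule for illustration: reward an exact match of the expected answer.
    return 1.0 if completion.strip() == "4" else 0.0


def sanity_check() -> None:
    """Run the verifier on a known case and confirm the reward is a float in [0, 1]."""
    reward = score_completion("What is 2 + 2?", "4")
    assert isinstance(reward, float), f"expected float, got {type(reward).__name__}"
    assert 0.0 <= reward <= 1.0, f"reward out of range: {reward}"
    print(f"sanity check passed, reward={reward}")


if __name__ == "__main__":
    sanity_check()
```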
 
 
Pricing
Hosted evaluations use the Prime Inference API under the hood. Costs are calculated based on:
- Model Pricing: Each model has different rates (shown on the model selection page)
- Token Usage: Both prompt and completion tokens are counted
- Number of Examples × Rollouts: Multiplies the total inference calls (a rough cost sketch follows this list)
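As a back-of-the-envelope check before launching a run, you can multiply these factors yourself. Every number below (token counts and per-token rates) is a placeholder, not actual Prime Inference pricing; check the model selection page for real rates.

```python
# All values are illustrative placeholders -- substitute your model's real rates.
num_examples = 10
rollouts_per_example = 3
avg_prompt_tokens = 800
avg_completion_tokens = 400
price_per_million_prompt_tokens = 1.00      # USD, hypothetical
price_per_million_completion_tokens = 3.00  # USD, hypothetical

total_calls = num_examples * rollouts_per_example
prompt_cost = total_calls * avg_prompt_tokens * price_per_million_prompt_tokens / 1_000_000
completion_cost = total_calls * avg_completion_tokens * price_per_million_completion_tokens / 1_000_000

print(f"{total_calls} inference calls, estimated cost ~= ${prompt_cost + completion_cost:.2f}")
```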
 
Start with a small number of examples (5-10) to test your environment before running large-scale evaluations.