Hosted Evaluations run your environment on Prime-managed infrastructure and store the run in Prime Evals. You can launch them either from the Environments Hub UI or directly from the CLI with prime eval run --hosted.

What hosted evaluations are for

Use hosted evaluations when you want Prime to handle the execution environment for you:
  • Run a published environment without setting up local Python dependencies
  • Evaluate large jobs against a Hub environment slug
  • Monitor logs remotely and share runs through the platform
  • Grant temporary sandbox or instance permissions for tool-using environments
Hosted evaluations require an environment that is already published to the Environments Hub. If you only have a local environment, push it first with prime env push.

Prerequisites

Before running a hosted evaluation, make sure you have:
  1. A published environment on the Environments Hub
    prime env push
    
  2. Write access to that environment
  3. Prime CLI installed and authenticated if you plan to use the CLI flow
  4. Sufficient account balance to cover inference for the model you choose

Quick start with the CLI

The new hosted eval flow is built into prime eval run.
prime eval run primeintellect/gsm8k --hosted
This creates a hosted run on the platform instead of executing the evaluation locally.

Follow logs until completion

prime eval run primeintellect/gsm8k --hosted --follow
With --follow, the CLI keeps polling the run, streams hosted logs, and exits when the evaluation reaches a terminal state.
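If you invoke this from scripts or CI, it can help to assemble the command programmatically. A minimal sketch in Python; the helper function is illustrative and not part of the Prime CLI, and it only uses the flags documented on this page:

```python
def build_hosted_eval_cmd(env_slug, model=None, follow=True, timeout_minutes=None):
    """Assemble a `prime eval run --hosted` invocation (hypothetical helper)."""
    cmd = ["prime", "eval", "run", env_slug, "--hosted"]
    if model:
        cmd += ["-m", model]
    if follow:
        # With --follow, the CLI blocks until the run reaches a terminal state.
        cmd.append("--follow")
    if timeout_minutes is not None:
        cmd += ["--timeout-minutes", str(timeout_minutes)]
    return cmd

# Pass the result to subprocess.run(cmd, check=True) to launch the run.
print(build_hosted_eval_cmd("primeintellect/gsm8k"))
```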

Run from a TOML config

Hosted evals also support TOML configs.
model = "openai/gpt-4.1-mini"
num_examples = 20
rollouts_per_example = 2

[[eval]]
env_id = "primeintellect/gsm8k"
env_args = { split = "test" }

[[eval]]
env_id = "primeintellect/alphabet-sort"
Run it with:
prime eval run configs/eval/benchmark-hosted.toml --hosted

Hosted-only CLI options

These flags only apply when you pass --hosted:
  • --follow: Stream hosted logs and wait for completion
  • --poll-interval: Polling interval for hosted status/log streaming
  • --timeout-minutes: Optional timeout in minutes for the hosted run (default: 120, max: 1440, i.e. 24 hours)
  • --allow-sandbox-access: Allow sandbox read/write access
  • --allow-instances-access: Allow instance creation and management
  • --custom-secrets: JSON object of secrets injected for the hosted run
  • --eval-name: Custom display name for the hosted evaluation

Example: environment args and custom secrets

prime eval run my-team/browser-agent \
  --hosted \
  -m anthropic/claude-sonnet-4.5 \
  -a '{"task":"checkout"}' \
  --custom-secrets '{"SHOP_API_KEY":"..."}' \
  --allow-sandbox-access \
  --timeout-minutes 45
Use --custom-secrets for run-specific values. Secrets already configured on the environment continue to work as usual.
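Hand-writing the JSON for `--custom-secrets` on the command line is an easy place to introduce quoting mistakes. One way to avoid that is to render it with `json.dumps` and shell-quote the result; the helper below is illustrative, not part of the Prime CLI:

```python
import json
import shlex

def custom_secrets_flag(secrets: dict) -> str:
    """Render a --custom-secrets argument from a dict (illustrative helper)."""
    return f"--custom-secrets {shlex.quote(json.dumps(secrets))}"

# The values here are placeholders, not real credentials.
flag = custom_secrets_flag({"SHOP_API_KEY": "sk-test", "REGION": "us-east-1"})
print(flag)
```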
Hosted eval env_args are passed to load_environment(), similar to training [[env]].args. Use them for custom environment settings such as split, difficulty, tool configuration, or other environment-specific overrides supported by your environment.

Monitoring and managing hosted runs

After starting a hosted evaluation, the CLI prints the evaluation id and platform URL.
--follow only streams logs and waits for completion. Hosted evaluations do not yet have a training-style checkpoint or restart workflow.

List evaluations

prime eval list
The list output includes a Type column so you can distinguish HOSTED and LOCAL evaluations.

Inspect one run

prime eval get <eval-id>
prime eval samples <eval-id>

Stream logs for an existing hosted run

prime eval logs <eval-id> -f

Stop a running hosted evaluation

prime eval stop <eval-id>

Running a hosted evaluation from the dashboard

You can still launch the same workflow from the Environments Hub UI.

Step 1: Open the environment

  1. Go to the Environments Hub
  2. Open your environment
  3. Go to the Evaluations tab
  4. Click Run Hosted Evaluation

Step 2: Choose a model

Select an inference model for the run.

Step 3: Configure the run

Set the number of examples, rollouts per example, and any environment arguments.
Environment secrets linked in the Hub are exposed automatically during hosted evaluation runs. You only need --custom-secrets when launching a CLI run with additional per-run secrets.

Step 4: Monitor progress

You will be redirected to the evaluations list where you can watch the run status.

Step 5: Review results

Completed runs show aggregate metrics and per-sample outputs in Prime Evals.

Failure modes

When a hosted evaluation fails, the platform surfaces the error message and logs. Common causes:
  1. Environment code errors — import failures, dependency issues, invalid verifier logic
  2. Missing permissions — the run needs sandbox or instance access but those flags were not enabled
  3. Missing secrets — environment-linked or custom secrets were not available
  4. Timeouts — the run exceeded the configured or platform timeout
  5. Inference issues — temporary provider or model errors
Start with a small hosted run first, such as -n 5 -r 1, then scale up once logs and scores look correct.

Pricing

Hosted evaluations use Prime Inference under the hood. Cost depends on:
  • The selected model
  • Prompt and completion token usage
  • num_examples × rollouts_per_example
  • Any extra tool usage triggered by the environment
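A back-of-the-envelope estimate multiplies those factors. In the sketch below, the token counts and per-million-token prices are placeholder assumptions for illustration, not actual Prime Inference rates:

```python
def estimate_cost(num_examples, rollouts_per_example,
                  avg_prompt_tokens, avg_completion_tokens,
                  price_per_1m_prompt, price_per_1m_completion):
    """Rough hosted-eval cost estimate (illustrative only; excludes tool usage)."""
    rollouts = num_examples * rollouts_per_example
    prompt_cost = rollouts * avg_prompt_tokens * price_per_1m_prompt / 1_000_000
    completion_cost = rollouts * avg_completion_tokens * price_per_1m_completion / 1_000_000
    return prompt_cost + completion_cost

# 20 examples x 2 rollouts, with made-up token counts and prices:
cost = estimate_cost(20, 2, avg_prompt_tokens=500, avg_completion_tokens=300,
                     price_per_1m_prompt=0.40, price_per_1m_completion=1.60)
print(f"${cost:.2f}")  # $0.03
```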

When to use dashboard vs CLI

  • Use the dashboard when you want the simplest point-and-click flow
  • Use the CLI when you want reproducible commands, TOML configs, log following, or automation in scripts/CI