Create Evaluation

curl --request POST \
  --url https://api.primeintellect.ai/api/v1/evaluations/ \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "name": "<string>",
  "team_id": "<string>",
  "environments": [
    {
      "id": "<string>",
      "version_id": "<string>"
    }
  ],
  "suite_id": "<string>",
  "run_id": "<string>",
  "is_hosted": false,
  "inference_model": "<string>",
  "eval_config": {
    "num_examples": 4503599627370495,
    "rollouts_per_example": 1024,
    "env_args": {},
    "allow_sandbox_access": false,
    "allow_instances_access": false,
    "timeout_minutes": 780,
    "custom_secrets": {}
  },
  "model_name": "<string>",
  "dataset": "<string>",
  "framework": "<string>",
  "task_type": "<string>",
  "description": "<string>",
  "tags": [],
  "metadata": {},
  "metrics": {},
  "is_public": false
}
'

{
  "evaluation_id": "<string>",
  "name": "<string>",
  "status": "PENDING",
  "eval_type": "suite",
  "created_at": "2023-11-07T05:31:56Z",
  "environment_ids": [
    "<string>"
  ],
  "suite_id": "<string>",
  "run_id": "<string>",
  "version_id": "<string>"
}

POST

api

evaluations

Create Evaluation

curl --request POST \
  --url https://api.primeintellect.ai/api/v1/evaluations/ \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "name": "<string>",
  "team_id": "<string>",
  "environments": [
    {
      "id": "<string>",
      "version_id": "<string>"
    }
  ],
  "suite_id": "<string>",
  "run_id": "<string>",
  "is_hosted": false,
  "inference_model": "<string>",
  "eval_config": {
    "num_examples": 4503599627370495,
    "rollouts_per_example": 1024,
    "env_args": {},
    "allow_sandbox_access": false,
    "allow_instances_access": false,
    "timeout_minutes": 780,
    "custom_secrets": {}
  },
  "model_name": "<string>",
  "dataset": "<string>",
  "framework": "<string>",
  "task_type": "<string>",
  "description": "<string>",
  "tags": [],
  "metadata": {},
  "metrics": {},
  "is_public": false
}
'

{
  "evaluation_id": "<string>",
  "name": "<string>",
  "status": "PENDING",
  "eval_type": "suite",
  "created_at": "2023-11-07T05:31:56Z",
  "environment_ids": [
    "<string>"
  ],
  "suite_id": "<string>",
  "run_id": "<string>",
  "version_id": "<string>"
}

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Request to create a new evaluation

name

string

required

Name of the evaluation

team_id

string | null

Team ID if creating evaluation for a team

environments

EnvironmentReference · object[] | null

List of environment references with optional version IDs

Show child attributes

suite_id

string | null

Suite ID if this evaluation is part of a suite

run_id

string | null

Run ID for Prime RL runs (optional)

is_hosted

boolean

default:false

Whether this is a hosted evaluation

inference_model

string | null

Prime Inference model ID

eval_config

HostedEvalConfig · object

Hosted evaluation configuration

Show child attributes

model_name

string | null

Model name

dataset

string | null

Dataset name

framework

string | null

Framework used (e.g., 'prime-rl', 'openai/evals')

task_type

string | null

Type of task (e.g., 'classification', 'generation')

description

string | null

Description of the evaluation

Response

Successful Response

Response after creating an evaluation

evaluation_id

string

required

ID of the created evaluation

name

string

required

status

enum<string>

required

Evaluation status enum

Available options:

PENDING,

RUNNING,

PROCESSING,

COMPLETED,

FAILED,

TIMEOUT,

CANCELLED

eval_type

enum<string>

required

Type: 'prime_rl', 'environment', or 'suite'

Available options:

suite,

training,

environment

created_at

string<date-time>

required

environment_ids

string[] | null

suite_id

string | null

run_id

string | null

version_id

string | null

List Evaluations Bulk Delete Evaluations

⌘I

API Documentation

Compute API

Inference API

Availability

Disks

evals

FRP Plugin

Images

Pods

Sandbox

Secrets

SSH Keys

Template

Tunnel

user

Create Evaluation

Authorizations

Body

Response