
# Create Evaluation

> Create a new evaluation

This endpoint supports:
- Environment evaluations: provide `environments`
- Prime RL evaluations: provide `run_id`
- Suite evaluations: provide `suite_id`

Ownership:
- If `team_id` is provided in the request, the team owns the evaluation
- Otherwise, the authenticated user owns the evaluation
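
The request and response shapes described on this page can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_create_evaluation_request` and `describe_response` are hypothetical, and the sketch assumes the bearer token is an API key you already hold. It does not enforce any rule about combining `environments`, `run_id`, and `suite_id`, because this page does not say whether they are mutually exclusive.

```python
import json
import urllib.request

API_URL = "https://api.primeintellect.ai/api/v1/evaluations/"


def build_create_evaluation_request(api_key, name, *, environments=None,
                                    run_id=None, suite_id=None, team_id=None):
    """Build (but do not send) a POST request for the Create Evaluation endpoint."""
    payload = {"name": name}  # `name` is the only required field in the schema
    if environments is not None:
        # Each entry is an EnvironmentReference: {"id": ..., "version_id": ...}
        payload["environments"] = environments
    if run_id is not None:
        payload["run_id"] = run_id
    if suite_id is not None:
        payload["suite_id"] = suite_id
    if team_id is not None:
        # When present, the evaluation is owned by the team instead of the user
        payload["team_id"] = team_id
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def describe_response(status, body):
    """Summarize a decoded JSON response body by HTTP status code."""
    if status == 201:
        return f"created {body['evaluation_id']} ({body['eval_type']}, {body['status']})"
    if status == 401:
        return "authorization failed"
    if status == 422:
        # ErrorResponse carries a list of {"param": ..., "details": ...} entries
        return "; ".join(f"{e['param']}: {e['details']}" for e in body["errors"])
    return f"unexpected status {status}"
```

To send the request, pass it to `urllib.request.urlopen`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx responses, so the 401 and 422 cases need to be read from the exception rather than from a normal return value.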



## OpenAPI

````yaml https://api.primeintellect.ai/openapi.json post /api/v1/evaluations/
openapi: 3.1.0
info:
  title: PI API
  version: 0.1.0
servers:
  - url: https://api.primeintellect.ai
security: []
paths:
  /api/v1/evaluations/:
    post:
      tags:
        - evals
      summary: Create Evaluation
      description: >-
        Create a new evaluation


        This endpoint supports:

        - Environment evaluations: Provide environments

        - Prime RL evaluations: Provide run_id

        - Suite evaluations: Provide suite_id


        Ownership:

        - If team_id is provided in request, the evaluation will be owned by the
        team

        - Otherwise, the evaluation will be owned by the authenticated user
      operationId: create_evaluation_api_v1_evaluations__post
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateEvaluationRequest'
      responses:
        '201':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/CreateEvaluationResponse'
        '401':
          description: Authorization failed
        '422':
          description: Invalid request data
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
      security:
        - HTTPBearer: []
components:
  schemas:
    CreateEvaluationRequest:
      properties:
        name:
          type: string
          title: Name
          description: Name of the evaluation
        team_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Team Id
          description: Team ID if creating evaluation for a team
        environments:
          anyOf:
            - items:
                $ref: '#/components/schemas/EnvironmentReference'
              type: array
            - type: 'null'
          title: Environments
          description: List of environment references with optional version IDs
        suite_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Suite Id
          description: Suite ID if this evaluation is part of a suite
        run_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Run Id
          description: Run ID for Prime RL runs (optional)
        is_hosted:
          type: boolean
          title: Is Hosted
          description: Whether this is a hosted evaluation
          default: false
        inference_model:
          anyOf:
            - type: string
            - type: 'null'
          title: Inference Model
          description: Prime Inference model ID
        eval_config:
          anyOf:
            - $ref: '#/components/schemas/HostedEvalConfig'
            - type: 'null'
          description: Hosted evaluation configuration
        model_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Model Name
          description: Model name
        dataset:
          anyOf:
            - type: string
            - type: 'null'
          title: Dataset
          description: Dataset name
        framework:
          anyOf:
            - type: string
            - type: 'null'
          title: Framework
          description: Framework used (e.g., 'prime-rl', 'openai/evals')
        task_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Task Type
          description: Type of task (e.g., 'classification', 'generation')
        description:
          anyOf:
            - type: string
            - type: 'null'
          title: Description
          description: Description of the evaluation
        tags:
          items:
            type: string
          type: array
          title: Tags
          description: Tags for categorization
          default: []
        metadata:
          anyOf:
            - type: object
            - type: 'null'
          title: Metadata
          description: Additional metadata
        metrics:
          anyOf:
            - type: object
            - type: 'null'
          title: Metrics
          description: High-level metrics summary
        is_public:
          type: boolean
          title: Is Public
          description: Whether this evaluation is publicly shareable by link
          default: false
        show_on_leaderboard:
          type: boolean
          title: Show On Leaderboard
          description: Whether this public evaluation appears on environment leaderboards
          default: false
      type: object
      required:
        - name
      title: CreateEvaluationRequest
      description: Request to create a new evaluation
    CreateEvaluationResponse:
      properties:
        evaluation_id:
          type: string
          title: Evaluation Id
          description: ID of the created evaluation
        name:
          type: string
          title: Name
        status:
          $ref: '#/components/schemas/EvaluationStatus'
        eval_type:
          $ref: '#/components/schemas/EvaluationType'
          description: 'Type: ''suite'', ''training'', or ''environment'''
        environment_ids:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Environment Ids
        suite_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Suite Id
        run_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Run Id
        version_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Version Id
        viewer_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Viewer Url
        created_at:
          type: string
          format: date-time
          title: Created At
      type: object
      required:
        - evaluation_id
        - name
        - status
        - eval_type
        - created_at
      title: CreateEvaluationResponse
      description: Response after creating an evaluation
    ErrorResponse:
      properties:
        errors:
          items:
            $ref: '#/components/schemas/ErrorDetail'
          type: array
          title: Errors
      type: object
      required:
        - errors
      title: ErrorResponse
    EnvironmentReference:
      properties:
        id:
          type: string
          title: Id
        version_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Version Id
      type: object
      required:
        - id
      title: EnvironmentReference
      description: Environment reference with optional version
    HostedEvalConfig:
      properties:
        num_examples:
          type: integer
          maximum: 9007199254740991
          minimum: -1
          title: Num Examples
          description: Number of examples to evaluate (-1 for all)
        rollouts_per_example:
          type: integer
          maximum: 2048
          minimum: 1
          title: Rollouts Per Example
          description: Rollouts per example
        env_args:
          anyOf:
            - type: object
            - type: 'null'
          title: Env Args
          description: Optional environment arguments to pass to the evaluation
        allow_sandbox_access:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Allow Sandbox Access
          description: Allow sandbox read/write access
          default: false
        allow_instances_access:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Allow Instances Access
          description: Allow instance creation and management access
          default: false
        allow_tunnel_access:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Allow Tunnel Access
          description: Allow tunnel creation and management access
          default: false
        timeout_minutes:
          anyOf:
            - type: integer
              maximum: 1440
              minimum: 120
            - type: 'null'
          title: Timeout Minutes
          description: >-
            Custom timeout in minutes for the hosted eval run (default: 1440 =
            24 hours, min: 120, max: 1440)
        custom_secrets:
          anyOf:
            - additionalProperties:
                type: string
              type: object
            - type: 'null'
          title: Custom Secrets
          description: >-
            Custom secrets to set in the evaluation sandbox (e.g., API keys,
            tokens)
        sampling_args:
          anyOf:
            - type: object
            - type: 'null'
          title: Sampling Args
          description: >-
            Optional sampling arguments forwarded to `prime eval run
            --sampling-args`
        max_concurrent:
          anyOf:
            - type: integer
              maximum: 9007199254740991
              minimum: 1
            - type: 'null'
          title: Max Concurrent
          description: >-
            Optional max concurrency forwarded to `prime eval run
            --max-concurrent`
        max_retries:
          anyOf:
            - type: integer
              maximum: 9007199254740991
              minimum: 0
            - type: 'null'
          title: Max Retries
          description: Optional max retries forwarded to `prime eval run --max-retries`
        state_columns:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: State Columns
          description: Optional state columns forwarded to `prime eval run --state-columns`
        independent_scoring:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Independent Scoring
          description: Forward `--independent-scoring` to the hosted eval runner
          default: false
        verbose:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Verbose
          description: Forward `--verbose` to the hosted eval runner
          default: false
        headers:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Headers
          description: Optional repeated headers forwarded to `prime eval run --header`
        extra_env_kwargs:
          anyOf:
            - type: object
            - type: 'null'
          title: Extra Env Kwargs
          description: >-
            Optional environment constructor kwargs forwarded to `prime eval run
            --extra-env-kwargs`
        api_client_type:
          anyOf:
            - type: string
            - type: 'null'
          title: Api Client Type
          description: >-
            Optional API client type forwarded to `prime eval run
            --api-client-type`
        api_base_url:
          anyOf:
            - type: string
            - type: 'null'
          title: Api Base Url
          description: >-
            Optional inference base URL forwarded to `prime eval run
            --api-base-url`
        api_key_var:
          anyOf:
            - type: string
            - type: 'null'
          title: Api Key Var
          description: Optional API key env var forwarded to `prime eval run --api-key-var`
      type: object
      required:
        - num_examples
        - rollouts_per_example
      title: HostedEvalConfig
      description: Hosted evaluation configuration
    EvaluationStatus:
      type: string
      enum:
        - PENDING
        - RUNNING
        - PROCESSING
        - COMPLETED
        - FAILED
        - TIMEOUT
        - CANCELLED
      title: EvaluationStatus
      description: Evaluation status enum
    EvaluationType:
      type: string
      enum:
        - suite
        - training
        - environment
      title: EvaluationType
      description: Evaluation type enum
    ErrorDetail:
      properties:
        param:
          type: string
          title: Param
        details:
          type: string
          title: Details
      type: object
      required:
        - param
        - details
      title: ErrorDetail
  securitySchemes:
    HTTPBearer:
      type: http
      scheme: bearer

````
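
The numeric bounds in `HostedEvalConfig` can be checked client-side before the request is sent, which avoids a round trip that would end in a 422. The sketch below is one way to do that, under stated assumptions: the helper names are hypothetical, the example identifiers are placeholders, and combining `is_hosted`, `environments`, `inference_model`, and `eval_config` in a single payload is an assumption about how a hosted evaluation is requested, since the schema lists these fields but does not spell out which combinations are required.

```python
def build_hosted_eval_config(num_examples, rollouts_per_example,
                             timeout_minutes=None, env_args=None):
    """Return a HostedEvalConfig dict after checking the schema's numeric bounds."""
    if num_examples < -1:
        raise ValueError("num_examples must be >= -1 (-1 means all examples)")
    if not 1 <= rollouts_per_example <= 2048:
        raise ValueError("rollouts_per_example must be between 1 and 2048")
    if timeout_minutes is not None and not 120 <= timeout_minutes <= 1440:
        raise ValueError("timeout_minutes must be between 120 and 1440")

    # Only the two required fields are always present
    config = {
        "num_examples": num_examples,
        "rollouts_per_example": rollouts_per_example,
    }
    if timeout_minutes is not None:
        config["timeout_minutes"] = timeout_minutes
    if env_args is not None:
        config["env_args"] = env_args
    return config


def build_hosted_evaluation_payload(name, environment_id, inference_model, config):
    """Assemble a request body for a hosted evaluation (assumed field combination)."""
    return {
        "name": name,
        "environments": [{"id": environment_id}],
        "is_hosted": True,
        "inference_model": inference_model,
        "eval_config": config,
    }


# Placeholder values for illustration only
example_config = build_hosted_eval_config(-1, 4, timeout_minutes=120)
example_payload = build_hosted_evaluation_payload(
    "nightly-eval", "example-env-id", "example-model-id", example_config
)
```

Optional fields are omitted rather than sent as `null`, so the server applies its own defaults (for example, the 1440-minute timeout stated in the schema description).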