Prime Intellect Inference provides OpenAI-compatible API access to state-of-the-art language models. Our inference service routes requests to various model providers, offering flexible model selection suited for running large-scale evaluations.
The Inference API is currently in closed beta. Features and model availability may change as we continue to improve the service.

Getting Started

1. Get Your API Key

First, obtain your API key from the Prime Intellect Platform:
  1. Navigate to your account settings
  2. Go to the API Keys section
  3. Generate a new API key for inference access

2. Set Up Authentication

Set your API key as an environment variable:
export PRIME_API_KEY="your-api-key-here"

3. Access through the CLI or API

You can use Prime Inference in two ways: through the Prime CLI or through the OpenAI-compatible API directly. The Prime CLI provides easy access to inference models and is especially useful for running evaluations:
# List available models
prime inference models

# Use with environment evaluations (most common use case)
prime env eval gsm8k -m meta-llama/llama-3.1-70b-instruct -n 25
For evaluations: see the Environment Evaluations guide for comprehensive examples and best practices.

Direct API Access (OpenAI-Compatible)

import os

import openai

# Configure the client to use Prime Inference
client = openai.OpenAI(
    api_key=os.environ.get("PRIME_API_KEY"),
    base_url="https://api.pinference.ai/api/v1"
)

# Make a chat completion request
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "user", "content": "What is Prime Intellect?"}
    ]
)

print(response.choices[0].message.content)

Available Models

Prime Inference provides access to various state-of-the-art language models. You can list all available models using the models endpoint:

Get All Available Models

# List all available models
prime inference models
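
Because the API is OpenAI-compatible, you can also list models programmatically. A minimal sketch, reusing the client configured in the Direct API Access section above and assuming the standard models listing endpoint:

# Iterate over the models exposed by the API and print their IDs
for model in client.models.list():
    print(model.id)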

Using the Inference Endpoint

Basic Chat Completion

response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_tokens=500,
    temperature=0.7
)
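
The response object follows the OpenAI format, so the generated text and token counts can be read directly (the usage fields match those shown in the API Reference below):

# Print the generated text and the total tokens consumed
print(response.choices[0].message.content)
print(response.usage.total_tokens)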

Streaming Responses

For real-time applications, use streaming to receive responses as they’re generated:
stream = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "user", "content": "Write a short story about a robot."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
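
If you also need the complete text once streaming finishes, one option is to accumulate the chunks as you print them; a small variant of the loop above:

# Collect streamed chunks while printing them, then join into the full text
chunks = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="")
        chunks.append(delta)

full_text = "".join(chunks)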

Advanced Parameters

Prime Inference supports all standard OpenAI API parameters:
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Your prompt here"}],

    # Generation parameters
    max_tokens=1000,
    temperature=0.8,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,

    # Advanced options
    stream=False,
    stop=["END", "\n\n"],
    logprobs=True,
    top_logprobs=3
)
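
When logprobs are requested, the OpenAI response format exposes per-token log probabilities on each choice. Support may vary by model provider, so treat this as a sketch:

# Inspect per-token log probabilities (only present if the provider returns them)
if response.choices[0].logprobs is not None:
    for token_info in response.choices[0].logprobs.content:
        print(token_info.token, token_info.logprob)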

Pricing and Billing

Prime Inference uses token-based pricing with competitive rates:
  • Input tokens: Charged for tokens in your prompt
  • Output tokens: Charged for tokens in the model’s response
  • Billing: Automatic deduction from your Prime Intellect account balance
Pricing varies by model. We will provide more details on pricing soon and make it available through the models API.
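
Every response includes a usage object with token counts, so you can estimate per-request spend yourself. The per-token rates below are placeholders, not published prices:

# Placeholder rates in USD per 1M tokens -- actual pricing varies by model
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50

usage = response.usage  # from any chat completion response above
estimated_cost = (
    usage.prompt_tokens * INPUT_PRICE_PER_M
    + usage.completion_tokens * OUTPUT_PRICE_PER_M
) / 1_000_000
print(f"Estimated cost: ${estimated_cost:.6f}")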

Support and Feedback

Since the Inference API is in closed beta, we welcome your feedback.

API Reference

The Prime Intellect Inference API provides OpenAI-compatible endpoints:

Available Endpoints

GET /models
Returns a list of all available models that you can use for inference requests.

Response:
{
  "object": "list",
  "data": [
    {
      "id": "meta-llama/llama-3.1-70b-instruct",
      "object": "model",
      "owned_by": "meta",
      "created": 1693721698
    }
  ]
}
GET /models/{model_id}
Retrieves detailed information about a specific model.

Parameters:
  • model_id (path): The ID of the model to retrieve
Response:
{
  "id": "meta-llama/llama-3.1-70b-instruct",
  "object": "model",
  "owned_by": "meta",
  "created": 1693721698
}
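
With the OpenAI Python client, this maps to the models retrieve call; a minimal sketch, assuming the endpoint follows the standard OpenAI path layout:

# Fetch metadata for a single model by ID
model = client.models.retrieve("meta-llama/llama-3.1-70b-instruct")
print(model.id, model.owned_by)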
POST /chat/completions
Creates a model response for the given chat conversation.

Request Body:
{
  "model": "meta-llama/llama-3.1-70b-instruct",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": false
}
Response:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1693721698,
  "model": "meta-llama/llama-3.1-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 9,
    "total_tokens": 17
  }
}
Interactive API Documentation: Full interactive API documentation with request/response examples will be available soon. The current endpoints are fully compatible with OpenAI’s API format.

Next Steps