Prime Intellect Inference provides OpenAI-compatible API access to state-of-the-art language models. Our inference service routes requests to various model providers, offering flexible model selection designed for running large-scale evaluations.
Inference API is currently in closed beta. Features and model availability may change as we continue to improve the service.

Getting Started

1. Get Your API Key

First, obtain your API key from the Prime Intellect Platform:
  1. Navigate to your account settings
  2. Go to the API Keys section
  3. Generate a new API key with Inference permission enabled
Make sure to select the Inference permission when creating your API key. Without this permission, your requests will fail with authentication errors.

2. Set Up Authentication

Set your API key as an environment variable:
export PRIME_API_KEY="your-api-key-here"

3. Access through the CLI or API

You can use Prime Inference in two ways: through the Prime CLI or directly via the OpenAI-compatible API. The Prime CLI provides easy access to inference models and is especially useful for running evaluations:
# List available models
prime inference models

# Use with environment evaluations (most common use case)
prime env eval gsm8k -m meta-llama/llama-3.1-70b-instruct -n 25
For evaluations: See the Environment Evaluations guide for comprehensive examples and best practices.

Direct API Access (OpenAI-Compatible)

Team accounts: Include the X-Prime-Team-ID header to use team credits instead of your personal account balance. Find your team ID via prime teams list or on your Team Profile page.
import openai
import os

# Personal account
client = openai.OpenAI(
    api_key=os.environ.get("PRIME_API_KEY"),
    base_url="https://api.pinference.ai/api/v1"
)

# Team account (add X-Prime-Team-ID header)
client = openai.OpenAI(
    api_key=os.environ.get("PRIME_API_KEY"),
    base_url="https://api.pinference.ai/api/v1",
    default_headers={
        "X-Prime-Team-ID": "your-team-id-here"
    }
)

# Make a chat completion request
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "user", "content": "What is Prime Intellect?"}
    ]
)

print(response.choices[0].message.content)

Available Models

Prime Inference provides access to various state-of-the-art language models. You can list all available models using the models endpoint:

Get All Available Models

# List all available models
prime inference models
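
If you are calling the API directly, the same list is exposed through the OpenAI-compatible models endpoint. Below is a minimal sketch using the Python SDK, assuming the base URL and PRIME_API_KEY environment variable shown above:

import os

import openai

# Point the OpenAI SDK at the Prime Inference base URL shown above
client = openai.OpenAI(
    api_key=os.environ.get("PRIME_API_KEY"),
    base_url="https://api.pinference.ai/api/v1"
)

# List the models exposed by the OpenAI-compatible models endpoint
for model in client.models.list():
    print(model.id)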

Pricing and Billing

Prime Inference uses token-based pricing with competitive rates:
  • Input tokens: Charged for tokens in your prompt
  • Output tokens: Charged for tokens in the model’s response
  • Billing: Automatic deduction from your Prime Intellect account balance
Pricing varies by model. We will provide more details on pricing soon and make it available through the models API.
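
Token counts for each request are returned in the usage field of the response, so you can track consumption per call. The sketch below reuses the client from the Direct API Access example above; the per-token rates are hypothetical placeholders, since actual pricing varies by model and has not yet been published.

# Token usage is reported on every chat completion response
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Hello"}]
)

usage = response.usage
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Completion tokens: {usage.completion_tokens}")

# Hypothetical example rates (USD per 1M tokens) -- replace with the
# actual rates for your model once pricing is published
INPUT_RATE = 0.50
OUTPUT_RATE = 1.50
cost = (usage.prompt_tokens * INPUT_RATE + usage.completion_tokens * OUTPUT_RATE) / 1_000_000
print(f"Estimated cost: ${cost:.6f}")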

Viewing Your Inference Usage

Track your inference usage and billing on the Billing Dashboard under the Inference tab.

Next Steps
