# Chat Completions

> Generate text responses using language models

Generate model responses for chat conversations through an OpenAI-compatible API, so existing OpenAI client libraries work once you point them at the base URL below.

## Base URL

```
https://api.pinference.ai/api/v1
```

## Authentication

All requests require a Bearer token in the Authorization header:

```bash theme={null}
Authorization: Bearer your_api_key
```

### Team Account Usage

<Warning>
  When using a team account, you must include the `X-Prime-Team-ID` header. Without this header, requests default to your personal account instead of your team account.
</Warning>

```bash theme={null}
X-Prime-Team-ID: your-team-id-here
```

Find your Team ID on your [Team's Profile page](https://app.primeintellect.ai/dashboard/team-profile).
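A minimal sketch of assembling these headers in Python, assuming the API key and optional team ID are kept in environment variables. The variable names `PRIME_API_KEY` and `PRIME_TEAM_ID` and the helper `build_headers` are hypothetical, chosen only for illustration.

```python
import os


def build_headers(api_key=None, team_id=None):
    """Return request headers for the chat completions endpoint.

    Falls back to the hypothetical PRIME_API_KEY / PRIME_TEAM_ID
    environment variables when arguments are not given.
    """
    api_key = api_key or os.environ.get("PRIME_API_KEY")
    if not api_key:
        raise ValueError("An API key is required")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

    # Without this header, requests default to the personal account
    team_id = team_id or os.environ.get("PRIME_TEAM_ID")
    if team_id:
        headers["X-Prime-Team-ID"] = team_id
    return headers
```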

## Create Chat Completion

Generate a response from a language model given a conversation history.

### Request

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST https://api.pinference.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "meta-llama/llama-3.1-70b-instruct",
      "messages": [
        {"role": "user", "content": "What is the capital of France?"}
      ]
    }'

  # With team account (add X-Prime-Team-ID header)
  curl -X POST https://api.pinference.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $API_KEY" \
    -H "X-Prime-Team-ID: your-team-id-here" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "meta-llama/llama-3.1-70b-instruct",
      "messages": [
        {"role": "user", "content": "What is the capital of France?"}
      ]
    }'
  ```

  ```python Python theme={null}
  import openai

  # Personal account
  client = openai.OpenAI(
      api_key="your-api-key-here",
      base_url="https://api.pinference.ai/api/v1"
  )

  # Team account: create the client this way instead of the one above
  # (without X-Prime-Team-ID, requests go to your personal account)
  client = openai.OpenAI(
      api_key="your-api-key-here",
      base_url="https://api.pinference.ai/api/v1",
      default_headers={
          "X-Prime-Team-ID": "your-team-id-here"
      }
  )

  response = client.chat.completions.create(
      model="meta-llama/llama-3.1-70b-instruct",
      messages=[
          {"role": "user", "content": "What is the capital of France?"}
      ]
  )

  print(response.choices[0].message.content)
  ```

  ```javascript JavaScript theme={null}
  const response = await fetch('https://api.pinference.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your-api-key-here',
      'Content-Type': 'application/json'
      // Add 'X-Prime-Team-ID': 'your-team-id-here' for team accounts
    },
    body: JSON.stringify({
      model: 'meta-llama/llama-3.1-70b-instruct',
      messages: [
        { role: 'user', content: 'What is the capital of France?' }
      ]
    })
  });

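  // Surface HTTP errors (e.g. 400 or 429) instead of failing on a missing `choices` field
  if (!response.ok) {
    throw new Error(`Request failed (${response.status}): ${await response.text()}`);
  }

  const data = await response.json();
  console.log(data.choices[0].message.content);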
  ```
</CodeGroup>
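If you would rather not depend on the `openai` package, the same request can be sent with Python's standard library. This is a minimal sketch: `build_request` and `create_chat_completion` are hypothetical helper names, and error handling is reduced to letting `urllib` raise `HTTPError` on non-2xx responses.

```python
import json
import urllib.request

BASE_URL = "https://api.pinference.ai/api/v1"


def build_request(api_key, payload, team_id=None):
    """Build (but do not send) a POST request to /chat/completions."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if team_id:
        # Route usage to the team account instead of the personal one
        headers["X-Prime-Team-ID"] = team_id
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def create_chat_completion(api_key, payload, team_id=None, timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_request(api_key, payload, team_id)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx status codes
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `create_chat_completion(api_key, {"model": "meta-llama/llama-3.1-70b-instruct", "messages": [...]})` should return a dictionary with the same shape as the Response example below.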

### Parameters

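| Parameter     | Type            | Required | Description                                                                                           |
| ------------- | --------------- | -------- | ----------------------------------------------------------------------------------------------------- |
| `model`       | string          | Yes      | ID of the model to use, e.g. `meta-llama/llama-3.1-70b-instruct`                                      |
| `messages`    | array           | Yes      | Conversation so far, as message objects with a `role` (`system`, `user`, or `assistant`) and `content` |
| `max_tokens`  | integer         | No       | Maximum number of tokens to generate in the completion                                                |
| `temperature` | number          | No       | Sampling temperature from 0 to 2; lower values give more focused output, higher values more varied    |
| `stream`      | boolean         | No       | If `true`, return the response incrementally as it is generated                                       |
| `stop`        | string or array | No       | One or more sequences at which the model stops generating                                             |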
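The table above can be turned into a client-side check before a request is sent. This sketch enforces the types and the 0 to 2 `temperature` range from the table, plus obvious sanity checks such as a positive `max_tokens`; the server remains the source of truth, and the helper name `build_payload` is hypothetical.

```python
def build_payload(model, messages, max_tokens=None, temperature=None,
                  stream=None, stop=None):
    """Assemble a chat completion request body, checking the documented types."""
    if not isinstance(model, str) or not model:
        raise ValueError("model must be a non-empty string")
    if not isinstance(messages, list):
        raise ValueError("messages must be a list of message objects")

    payload = {"model": model, "messages": messages}

    # Optional parameters are only sent when explicitly set
    if max_tokens is not None:
        if not isinstance(max_tokens, int) or max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if stream is not None:
        payload["stream"] = bool(stream)
    if stop is not None:
        if not isinstance(stop, (str, list)):
            raise ValueError("stop must be a string or a list of strings")
        payload["stop"] = stop
    return payload
```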

### Response

```json theme={null}
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1693721698,
  "model": "meta-llama/llama-3.1-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
```
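A short sketch of pulling the useful fields out of this response body with the standard library, using the example above as input. `summarize_completion` is a hypothetical helper; the note on `"length"` follows the usual OpenAI-compatible convention and is an assumption for this API.

```python
import json


def summarize_completion(body):
    """Extract the reply text, finish reason, and token usage from a response."""
    data = json.loads(body) if isinstance(body, str) else body
    choice = data["choices"][0]
    usage = data.get("usage", {})
    return {
        "content": choice["message"]["content"],
        # "stop" means the model finished naturally; in OpenAI-compatible
        # APIs "length" usually means max_tokens was reached
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


example = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "The capital of France is Paris."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
}
print(summarize_completion(example))
```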

## Streaming

Set `stream` to `true` to receive the response incrementally as tokens are generated, instead of waiting for the full completion:

<CodeGroup>
  ```python Python theme={null}
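  # Reuses the `client` created in the Request example above
  stream = client.chat.completions.create(
      model="meta-llama/llama-3.1-70b-instruct",
      messages=[{"role": "user", "content": "Tell me a story"}],
      stream=True
  )

  for chunk in stream:
      # Skip chunks without text, such as a role-only first chunk
      # or any chunk with an empty choices list
      if chunk.choices and chunk.choices[0].delta.content:
          print(chunk.choices[0].delta.content, end="", flush=True)
  print()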
  ```

  ```bash cURL theme={null}
  # -N disables output buffering so tokens print as they arrive
  curl -N -X POST https://api.pinference.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "meta-llama/llama-3.1-70b-instruct",
      "messages": [{"role": "user", "content": "Tell me a story"}],
      "stream": true
    }'
  ```

  ```javascript JavaScript theme={null}
  const response = await fetch('https://api.pinference.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your-api-key-here',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'meta-llama/llama-3.1-70b-instruct',
      messages: [{ role: 'user', content: 'Tell me a story' }],
      stream: true
    })
  });

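  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters split across reads intact
    buffer += decoder.decode(value, { stream: true });

    // Events are newline-delimited; keep any incomplete line for the next read
    const lines = buffer.split('\n');
    buffer = lines.pop();

    for (const line of lines) {
      const trimmed = line.trim();
      // Assumes OpenAI-style SSE: "data: {json}" lines ending with "data: [DONE]"
      if (!trimmed.startsWith('data:')) continue;
      const payload = trimmed.slice(5).trim();
      if (payload === '[DONE]') continue;

      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      // Node.js 18+; in a browser, append `delta` to the page instead
      if (delta) process.stdout.write(delta);
    }
  }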
  ```
</CodeGroup>
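In OpenAI-compatible APIs, each streamed event usually arrives as a server-sent event line of the form `data: {...}`, and `data: [DONE]` marks the end of the stream. The sketch below assumes that format and reassembles the full reply from the `delta.content` fields. `collect_stream` is a hypothetical helper, and in practice the lines would come from the HTTP response rather than a hard-coded list.

```python
import json


def collect_stream(lines):
    """Join delta.content fragments from an iterable of SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        # Ignore blank separator lines and anything that is not a data field
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Some chunks may carry an empty choices list; only read the first choice
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


events = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Once upon"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": " a time"}}]}',
    "data: [DONE]",
]
print(collect_stream(events))  # Once upon a time
```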

## Advanced Parameters

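### Temperature Control

Lower `temperature` values make the output more focused and repeatable; higher values make it more varied: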

```python theme={null}
# Low temperature: focused, more repeatable output (not strictly deterministic)
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Solve: 2+2=?"}],
    temperature=0.1
)

# Creative output
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Write a poem"}],
    temperature=0.9
)
```
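Conceptually, temperature rescales the model's next-token scores (logits) before they become probabilities: each logit is divided by the temperature and a softmax is applied. The sketch below illustrates that standard formulation with made-up logits; it is not this service's actual sampler, and a temperature of 0 is typically treated as greedy decoding (always taking the top token) rather than as a division by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for this formula")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
for t in (0.1, 0.9, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At `0.1` almost all probability lands on the top token, which is why low temperatures give repeatable answers; at `2.0` the distribution flattens and less likely tokens are sampled more often.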

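### System Messages

A `system` message, placed before the user turns, sets instructions that apply to the rest of the conversation: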

```python theme={null}
response = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "Explain calculus"}
    ]
)
```
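The `messages` array is also how conversation history is carried. Assuming, as with other OpenAI-compatible chat completion APIs, that the server keeps no state between requests, each call must resend earlier turns. A minimal sketch with a hypothetical `build_messages` helper; the assistant reply in the usage example is placeholder text.

```python
def build_messages(system_prompt, history, user_input):
    """Return a messages list: system prompt, prior turns, then the new user turn.

    history is a list of (user_text, assistant_text) pairs from earlier requests.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_input})
    return messages


msgs = build_messages(
    "You are a helpful math tutor.",
    [("Explain calculus", "Calculus studies rates of change and accumulation.")],
    "Give me an example of a derivative",
)
```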

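## Error Handling

Failed requests return a non-2xx HTTP status code and a JSON body with an `error` object containing `message`, `type`, and `code`.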

### Rate Limit (429)

```json theme={null}
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
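A common way to handle `429` responses is to retry with exponential backoff. The sketch below wraps any zero-argument request function and retries only on HTTP 429. The retry count, delays, and the helper name `with_backoff` are assumptions rather than documented limits, and a `Retry-After` header is honored only if the server sends one as a number of seconds.

```python
import random
import time
import urllib.error


def with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                 sleep=time.sleep):
    """Call send() and retry with exponential backoff on HTTP 429."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Only rate-limit errors are retried; re-raise everything else,
            # and give up once the retry budget is spent
            if err.code != 429 or attempt == max_retries:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                # base_delay * 2**attempt (1s, 2s, 4s, ... by default),
                # capped at max_delay, plus up to 50% random jitter
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay / 2)
            sleep(delay)
```

The `sleep` parameter exists so tests can record delays instead of waiting; in normal use, pass something like `lambda: create_request_and_send()` as `send`.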

### Invalid Model (400)

```json theme={null}
{
  "error": {
    "message": "Invalid model specified",
    "type": "invalid_request_error",
    "code": "invalid_model"
  }
}
```

### Context Length Exceeded (400)

```json theme={null}
{
  "error": {
    "message": "Context length exceeded",
    "type": "invalid_request_error",
    "code": "context_length_exceeded"
  }
}
```
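All three error bodies share the same shape, so one helper can decode them. A minimal sketch with a hypothetical `parse_api_error` helper; treating only `429` as retryable is an assumption based on the examples on this page, since the `400` errors need the request itself to change.

```python
import json
from dataclasses import dataclass


@dataclass
class APIError:
    status: int
    type: str
    code: str
    message: str

    @property
    def retryable(self):
        # Rate limits clear over time; invalid models or oversized
        # prompts need a different request before retrying
        return self.status == 429


def parse_api_error(status, body):
    """Decode an error response body into an APIError."""
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        # Non-JSON bodies (e.g. from a proxy) fall back to defaults
        error = {}
    return APIError(
        status=status,
        type=error.get("type", "unknown"),
        code=error.get("code", "unknown"),
        message=error.get("message", body if isinstance(body, str) else ""),
    )
```

For example, a `context_length_exceeded` error calls for shortening the conversation in `messages` rather than retrying the same request.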
