
# Deploying LoRA Adapters for Inference

> Deploy trained LoRA adapters from Hosted Training runs and query them via an OpenAI-compatible API

When a Hosted Training run completes, it produces a LoRA adapter: a lightweight set of model weights that captures what the model learned during training. You can deploy an adapter for live inference and query it through an OpenAI-compatible API with the tools and SDKs you already use.

## Prerequisites

* A completed training run with a **READY** LoRA adapter (see [End-to-End Training Run](/hosted-training/end-to-end-run))
* The Prime CLI installed and authenticated (`prime login`)
* A Prime API key with **Inference** permission (see [Inference Overview](/inference/overview))

## Step 1: List Your LoRA Adapters

View all LoRA adapters from your training runs and their current status:

```bash theme={null}
prime deployments list
```

You'll see a table like this:

| ID                         | Name     | Base Model                  | Status        | Deployed At |
| -------------------------- | -------- | --------------------------- | ------------- | ----------- |
| `gw3zytpj9den6zgp4w9xosnk` | my-model | Qwen/Qwen3-4B-Instruct-2507 | NOT\_DEPLOYED | -           |

The **Status** column shows where each adapter is in the deployment lifecycle — see the [Status Reference](#status-reference) below for all possible states.

<Tip>
  Use `prime deployments list -o json` for machine-readable output, or `--team <team_id>` to filter by team.
</Tip>
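If you script against the JSON output, a small filter can pick out the adapters in a given state. The sketch below is illustrative only: this page does not document the JSON schema, so the top-level list shape and the `id` and `status` field names are assumptions, and `adapters_with_status` is a hypothetical helper. Inspect the real output of `prime deployments list -o json` and adjust the keys to match.

```python
import json


def adapters_with_status(json_text, status):
    """Return the IDs of adapters whose status equals `status`.

    Assumption (not confirmed by this page): the JSON is a list of
    objects that each carry "id" and "status" keys.
    """
    records = json.loads(json_text)
    return [r["id"] for r in records if r.get("status") == status]


# Hypothetical sample shaped like the table above, not real CLI output.
sample = json.dumps([
    {"id": "gw3zytpj9den6zgp4w9xosnk", "name": "my-model", "status": "NOT_DEPLOYED"},
])
print(adapters_with_status(sample, "NOT_DEPLOYED"))
```

In a real script, `json_text` could come from `subprocess.run(["prime", "deployments", "list", "-o", "json"], capture_output=True, text=True).stdout`.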

## Step 2: Deploy a LoRA Adapter

Deploy a LoRA adapter by its ID:

```bash theme={null}
prime deployments create <adapter_id>
```

The CLI shows the adapter details and asks for confirmation:

```
Deploying model:
  ID: gw3zytpj9den6zgp4w9xosnk
  Name: my-model
  Base Model: Qwen/Qwen3-4B-Instruct-2507

Are you sure you want to deploy this model? [y/N]: y

Deployment initiated successfully!
Status: DEPLOYING

The model is being deployed. This may take a few minutes.
Use 'prime deployments list' to check deployment status.
```

<Note>
  Deployment typically takes a few minutes. Use `prime deployments list` to check when the status changes to **DEPLOYED**.
</Note>
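If you want to automate the wait rather than re-run the command by hand, a polling loop is one option. The following is a minimal sketch, not part of the Prime CLI: `wait_until_deployed` and the `get_status` callable are hypothetical, and the timeout and interval defaults are arbitrary choices based on the "few minutes" guidance above. You would supply a `get_status` that returns the adapter's current status string, for example by reading the output of `prime deployments list`.

```python
import time


def wait_until_deployed(get_status, timeout_s=600.0, interval_s=10.0, sleep=time.sleep):
    """Poll `get_status()` until it returns "DEPLOYED".

    `get_status` is a caller-supplied (hypothetical) function returning the
    adapter's current status string. Raises RuntimeError if the status
    becomes DEPLOY_FAILED and TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "DEPLOYED":
            return status
        if status == "DEPLOY_FAILED":
            raise RuntimeError("deployment failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout_s} seconds")
        sleep(interval_s)


# Fake status source standing in for the CLI, with sleeping disabled.
fake_statuses = iter(["DEPLOYING", "DEPLOYING", "DEPLOYED"])
print(wait_until_deployed(lambda: next(fake_statuses), sleep=lambda _: None))
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.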

## Step 3: Run Inference

Once deployed, query your LoRA adapter through the OpenAI-compatible inference API. The model identifier uses the format `base_model:adapter_id`.

Set your API key if you haven't already:

```bash theme={null}
export PRIME_API_KEY="your-api-key-here"
```

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST https://api.pinference.ai/api/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $PRIME_API_KEY" \
    -d '{
      "model": "Qwen/Qwen3-4B-Instruct-2507:gw3zytpj9den6zgp4w9xosnk",
      "messages": [{"role": "user", "content": "Hello"}],
      "max_tokens": 100
    }'
  ```

  ```python Python theme={null}
  import os

  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.pinference.ai/api/v1",
      api_key=os.environ["PRIME_API_KEY"],  # read the key exported above
  )

  response = client.chat.completions.create(
      model="Qwen/Qwen3-4B-Instruct-2507:gw3zytpj9den6zgp4w9xosnk",
      messages=[{"role": "user", "content": "Hello"}],
      max_tokens=100
  )

  print(response.choices[0].message.content)
  ```
</CodeGroup>

<Tip>
  Replace `Qwen/Qwen3-4B-Instruct-2507:gw3zytpj9den6zgp4w9xosnk` with your actual base model and adapter ID from `prime deployments list`.
</Tip>
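When the base model and adapter ID come from variables, it can help to build the `base_model:adapter_id` string in one place. This is a small sketch; `lora_model_id` is a hypothetical helper name, not part of any SDK.

```python
def lora_model_id(base_model, adapter_id):
    """Join a base model name and adapter ID as `base_model:adapter_id`."""
    if not base_model or not adapter_id:
        raise ValueError("base_model and adapter_id must be non-empty")
    return f"{base_model}:{adapter_id}"


print(lora_model_id("Qwen/Qwen3-4B-Instruct-2507", "gw3zytpj9den6zgp4w9xosnk"))
```

The returned string is what you pass as `model` in the chat completions request.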

## Step 4: Unload a LoRA Adapter

When you no longer need the LoRA adapter for inference, unload it:

```bash theme={null}
prime deployments delete <adapter_id>
```

```
Unload initiated successfully!
Status: UNLOADING

The model is being unloaded.
Use 'prime deployments list' to check status.
```

<Note>
  Unloading removes the LoRA adapter from serving but preserves the model files. You can redeploy the same adapter at any time with `prime deployments create`.
</Note>

## Status Reference

| Status          | Description                                           |
| --------------- | ----------------------------------------------------- |
| `NOT_DEPLOYED`  | Adapter is not loaded for inference                   |
| `DEPLOYING`     | Adapter is being loaded onto inference infrastructure |
| `DEPLOYED`      | Adapter is live and accepting inference requests      |
| `UNLOADING`     | Adapter is being removed from inference               |
| `DEPLOY_FAILED` | Deployment failed — check error details               |
| `UNLOAD_FAILED` | Unload failed — check error details                   |

## Troubleshooting

### Deployment stuck in DEPLOYING

Deployments typically complete within a few minutes. If `prime deployments list` still shows **DEPLOYING** well beyond that, contact support or post in the #inference channel in our [Discord](https://discord.gg/ZTFydGWPKj).

### DEPLOY\_FAILED or UNLOAD\_FAILED

These error states indicate an infrastructure issue. You can retry deployment with `prime deployments create <adapter_id>`, or unload a failed deployment with `prime deployments delete <adapter_id>`. If the error recurs, contact support or post in the #inference channel in our [Discord](https://discord.gg/ZTFydGWPKj).

<CardGroup cols={2}>
  <Card title="End-to-End Training Run" icon="arrow-trend-up" href="/hosted-training/end-to-end-run">
    Walk through a complete Hosted Training run from start to finish.
  </Card>

  <Card title="Inference Overview" icon="bolt" href="/inference/overview">
    Learn more about the OpenAI-compatible inference API.
  </Card>

  <Card title="Troubleshooting" icon="wrench" href="/hosted-training/troubleshooting">
    Solutions for common issues with Hosted Training runs.
  </Card>
</CardGroup>
