When a hosted training run completes, it produces a LoRA adapter — a lightweight set of model weights that captures what the model learned during training. You can deploy these adapters for live inference and query them through an OpenAI-compatible API, using the same tools and SDKs you already use.

Prerequisites

  • A completed training run with a READY LoRA adapter (see End-to-End Training Run)
  • The Prime CLI installed and authenticated (prime login)
  • A Prime API key with Inference permission (see Inference Overview)

Step 1: List Your LoRA Adapters

View all LoRA adapters from your training runs and their current status:
prime deployments list
You’ll see a table like this:
                                   Adapter Deployments
┌──────────────────────────┬──────────┬────────────────────┬──────────────┬─────────────┐
│ ID                       │ Name     │ Base Model         │ Status       │ Deployed At │
├──────────────────────────┼──────────┼────────────────────┼──────────────┼─────────────┤
│ gw3zytpj9den6zgp4w9xosnk │ my-model │ Qwen/Qwen3-4B-Ins… │ NOT_DEPLOYED │ -           │
└──────────────────────────┴──────────┴────────────────────┴──────────────┴─────────────┘
The Status column shows where each adapter is in the deployment lifecycle — see the Status Reference below for all possible states.
Use prime deployments list -o json for machine-readable output, or --team <team_id> to filter by team.
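
If you want to script against this list, here is a minimal sketch that reads the JSON output. It assumes the output is a JSON array of adapter objects with id and status fields; the exact schema isn't shown in this guide, so check your own output and adjust the key names if they differ.

import json
import subprocess

# Fetch the adapter list as machine-readable JSON via the CLI.
raw = subprocess.run(
    ["prime", "deployments", "list", "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout

# Assumed schema: a JSON array of objects with "id" and "status" keys.
# Verify against your actual output before relying on these field names.
for adapter in json.loads(raw):
    print(adapter.get("id"), adapter.get("status"))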

Step 2: Deploy a LoRA Adapter

Deploy a LoRA adapter by its ID:
prime deployments create <adapter_id>
The CLI will show LoRA adapter details and ask for confirmation:
Deploying model:
  ID: gw3zytpj9den6zgp4w9xosnk
  Name: my-model
  Base Model: Qwen/Qwen3-4B-Instruct-2507

Are you sure you want to deploy this model? [y/N]: y

Deployment initiated successfully!
Status: DEPLOYING

The model is being deployed. This may take a few minutes.
Use 'prime deployments list' to check deployment status.
Deployment typically takes a few minutes. Use prime deployments list to check when the status changes to DEPLOYED.
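
If you would rather wait for the status change in a script than re-run the command by hand, here is a rough polling sketch. It scans the human-readable table for your adapter's row, so adjust it if your output is formatted differently.

import subprocess
import time

ADAPTER_ID = "gw3zytpj9den6zgp4w9xosnk"  # replace with your adapter ID

# Re-check the deployment table every 30 seconds until this adapter's row
# reports DEPLOYED, and stop early if the deployment fails.
while True:
    table = subprocess.run(
        ["prime", "deployments", "list"],
        capture_output=True, text=True, check=True,
    ).stdout
    row = next((line for line in table.splitlines() if ADAPTER_ID in line), "")
    if "DEPLOY_FAILED" in row:
        raise RuntimeError("Deployment failed -- see Troubleshooting below")
    if "DEPLOYED" in row and "NOT_DEPLOYED" not in row:
        print("Adapter is DEPLOYED and ready for inference")
        break
    print("Still deploying, checking again in 30 seconds...")
    time.sleep(30)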

Step 3: Run Inference

Once deployed, query your LoRA adapter through the OpenAI-compatible inference API. The model identifier uses the format base_model:adapter_id. Set your API key if you haven’t already, then send a chat completion request:
export PRIME_API_KEY="your-api-key-here"
curl -X POST https://api.pinference.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PRIME_API_KEY" \
  -d '{
    "model": "Qwen/Qwen3-4B-Instruct-2507:gw3zytpj9den6zgp4w9xosnk",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
Replace Qwen/Qwen3-4B-Instruct-2507:gw3zytpj9den6zgp4w9xosnk with your actual base model and adapter ID from prime deployments list.
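
Because the endpoint is OpenAI-compatible, you can also call it with the OpenAI Python SDK. A minimal sketch, assuming the base URL is the /api/v1 prefix from the curl command above:

import os
from openai import OpenAI

# Point the standard OpenAI client at the Prime inference endpoint.
client = OpenAI(
    base_url="https://api.pinference.ai/api/v1",
    api_key=os.environ["PRIME_API_KEY"],
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-4B-Instruct-2507:gw3zytpj9den6zgp4w9xosnk",  # base_model:adapter_id
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
)
print(response.choices[0].message.content)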

Step 4: Unload a LoRA Adapter

When you no longer need the LoRA adapter for inference, unload it:
prime deployments delete <adapter_id>
Unload initiated successfully!
Status: UNLOADING

The model is being unloaded.
Use 'prime deployments list' to check status.
Unloading removes the LoRA adapter from serving but preserves the model files. You can redeploy the same adapter at any time with prime deployments create.

Status Reference

Status          Description
NOT_DEPLOYED    Adapter is not loaded for inference
DEPLOYING       Adapter is being loaded onto inference infrastructure
DEPLOYED        Adapter is live and accepting inference requests
UNLOADING       Adapter is being removed from inference
DEPLOY_FAILED   Deployment failed — check error details
UNLOAD_FAILED   Unload failed — check error details

Troubleshooting

Deployment stuck in DEPLOYING

Deployments typically complete within a few minutes. If the status remains DEPLOYING well beyond that, re-run prime deployments list after a short wait to confirm it hasn’t progressed. If it stays stuck, contact support or post in the #inference channel in our Discord.

DEPLOY_FAILED or UNLOAD_FAILED

These error states indicate an infrastructure issue. You can retry deployment with prime deployments create <adapter_id>, or unload a failed deployment with prime deployments delete <adapter_id>. If the error recurs, contact support or post in the #inference channel in our Discord.