When a hosted training run completes, it produces a LoRA adapter — a lightweight set of model weights that captures what the model learned during training. You can deploy these adapters for live inference and query them through an OpenAI-compatible API, using the same tools and SDKs you already use.

Prerequisites

  • A completed training run with a READY LoRA adapter (see End-to-End Training Run)
  • The Prime CLI installed and authenticated (prime login)
  • A Prime API key with Inference permission (see Inference Overview)

Step 1: List Your LoRA Adapters

View all LoRA adapters from your training runs and their current status:
prime deployments list
You’ll see a table like this:
                                   Adapter Deployments
┌──────────────────────────┬──────────┬────────────────────┬──────────────┬─────────────┐
│ ID                       │ Name     │ Base Model         │ Status       │ Deployed At │
├──────────────────────────┼──────────┼────────────────────┼──────────────┼─────────────┤
│ gw3zytpj9den6zgp4w9xosnk │ my-model │ Qwen/Qwen3-4B-Ins… │ NOT_DEPLOYED │ -           │
└──────────────────────────┴──────────┴────────────────────┴──────────────┴─────────────┘
The Status column shows where each adapter is in the deployment lifecycle — see the Status Reference below for all possible states.
Use prime deployments list -o json for machine-readable output, or --team <team_id> to filter by team.
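
If you want to script against this list, here is a minimal sketch that reads the JSON output. It assumes the output is a JSON array of adapter objects with id and status fields; the exact schema isn't shown in this guide, so check your own output and adjust the key names if they differ.

import json
import subprocess

# Fetch the adapter list as machine-readable JSON via the CLI.
raw = subprocess.run(
    ["prime", "deployments", "list", "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout

# Assumed schema: a JSON array of objects with "id" and "status" keys.
# Verify against your actual output before relying on these field names.
for adapter in json.loads(raw):
    print(adapter.get("id"), adapter.get("status"))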

Step 2: Deploy a LoRA Adapter

Deploy a LoRA adapter by its ID:
prime deployments create <adapter_id>
The CLI will show LoRA adapter details and ask for confirmation:
Deploying model:
  ID: gw3zytpj9den6zgp4w9xosnk
  Name: my-model
  Base Model: Qwen/Qwen3-4B-Instruct-2507

Are you sure you want to deploy this model? [y/N]: y

Deployment initiated successfully!
Status: DEPLOYING

The model is being deployed. This may take a few minutes.
Use 'prime deployments list' to check deployment status.
Deployment typically takes a few minutes. Use prime deployments list to check when the status changes to DEPLOYED.
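
If you would rather wait for the status change in a script than re-run the command by hand, here is a rough polling sketch. It scans the human-readable table for your adapter's row, so adjust it if your output is formatted differently.

import subprocess
import time

ADAPTER_ID = "gw3zytpj9den6zgp4w9xosnk"  # replace with your adapter ID

# Re-check the deployment table every 30 seconds until this adapter's row
# reports DEPLOYED, and stop early if the deployment fails.
while True:
    table = subprocess.run(
        ["prime", "deployments", "list"],
        capture_output=True, text=True, check=True,
    ).stdout
    row = next((line for line in table.splitlines() if ADAPTER_ID in line), "")
    if "DEPLOY_FAILED" in row:
        raise RuntimeError("Deployment failed -- see Troubleshooting below")
    if "DEPLOYED" in row and "NOT_DEPLOYED" not in row:
        print("Adapter is DEPLOYED and ready for inference")
        break
    print("Still deploying, checking again in 30 seconds...")
    time.sleep(30)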

Step 3: Run Inference

Once deployed, query your LoRA adapter through the OpenAI-compatible inference API. The model identifier uses the format base_model:adapter_id. Set your API key if you haven’t already, then send a chat completion request:
export PRIME_API_KEY="your-api-key-here"
curl -X POST https://api.pinference.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $PRIME_API_KEY" \
  -d '{
    "model": "Qwen/Qwen3-4B-Instruct-2507:gw3zytpj9den6zgp4w9xosnk",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
Replace Qwen/Qwen3-4B-Instruct-2507:gw3zytpj9den6zgp4w9xosnk with your actual base model and adapter ID from prime deployments list.
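
Because the endpoint is OpenAI-compatible, you can also call it with the OpenAI Python SDK. A minimal sketch, assuming the base URL is the /api/v1 prefix from the curl command above:

import os
from openai import OpenAI

# Point the standard OpenAI client at the Prime inference endpoint.
client = OpenAI(
    base_url="https://api.pinference.ai/api/v1",
    api_key=os.environ["PRIME_API_KEY"],
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-4B-Instruct-2507:gw3zytpj9den6zgp4w9xosnk",  # base_model:adapter_id
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=100,
)
print(response.choices[0].message.content)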

Step 4: Unload a LoRA Adapter

When you no longer need the LoRA adapter for inference, unload it:
prime deployments delete <adapter_id>
Unload initiated successfully!
Status: UNLOADING

The model is being unloaded.
Use 'prime deployments list' to check status.
Unloading removes the LoRA adapter from serving but preserves the model files. You can redeploy the same adapter at any time with prime deployments create.

Status Reference

Status          Description
NOT_DEPLOYED    Adapter is not loaded for inference
DEPLOYING       Adapter is being loaded onto inference infrastructure
DEPLOYED        Adapter is live and accepting inference requests
UNLOADING       Adapter is being removed from inference
DEPLOY_FAILED   Deployment failed — check error details
UNLOAD_FAILED   Unload failed — check error details

Troubleshooting

Deployment stuck in DEPLOYING

Deployments typically complete within a few minutes. If the status remains DEPLOYING well beyond that, re-run prime deployments list after a short wait to confirm it hasn’t progressed. If it stays stuck, contact support or post in the #inference channel in our Discord.

DEPLOY_FAILED or UNLOAD_FAILED

These error states indicate an infrastructure issue. You can retry deployment with prime deployments create <adapter_id>, or unload a failed deployment with prime deployments delete <adapter_id>. If the error recurs, contact support or post in the #inference channel in our Discord.