Prerequisites
- A completed training run with a READY LoRA adapter (see End-to-End Training Run)
- The Prime CLI installed and authenticated (`prime login`)
- A Prime API key with Inference permission (see Inference Overview)
Step 1: List Your LoRA Adapters
View all LoRA adapters from your training runs and their current status:
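This guide does not show the exact listing command, so the sketch below reuses `prime deployments list`, which later steps use to check adapter status; the subcommand for listing adapters may differ in your CLI version.

```bash
# Assumption: adapters and their statuses are surfaced by the same command
# used for status checks later in this guide.
prime deployments list
```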
Step 2: Deploy a LoRA Adapter
Deploy a LoRA adapter by its ID:
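For example, using the adapter ID from the previous step (the angle-bracket value is a placeholder):

```bash
prime deployments create <adapter_id>
```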
Deployment typically takes a few minutes. Use `prime deployments list` to check when the status changes to DEPLOYED.
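To watch for the change, you can re-run the status check periodically; a rough sketch (the polling interval is arbitrary):

```bash
# Re-check the deployment status every 30 seconds until it reads DEPLOYED.
while true; do prime deployments list; sleep 30; done
```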
Step 3: Run Inference
Once deployed, query your LoRA adapter through the OpenAI-compatible inference API. The model identifier uses the format `base_model:adapter_id`.
Set your API key if you haven’t already:
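A minimal sketch of a chat completion request against the deployed adapter. The environment variable name, endpoint URL, and request path below are assumptions based on the usual OpenAI-compatible convention; substitute the values from the Inference Overview and replace the angle-bracket placeholders with your base model and adapter ID.

```bash
# Env var name and endpoint are assumptions; use the values from the Inference Overview.
export PRIME_API_KEY=<your_api_key>

curl https://<inference_base_url>/v1/chat/completions \
  -H "Authorization: Bearer $PRIME_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<base_model>:<adapter_id>",
        "messages": [{"role": "user", "content": "Summarize the benefits of LoRA fine-tuning."}]
      }'
```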
Step 4: Unload a LoRA Adapter
When you no longer need the LoRA adapter for inference, unload it:
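For example (the angle-bracket value is a placeholder for your adapter's ID):

```bash
prime deployments delete <adapter_id>
```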
Unloading removes the LoRA adapter from serving but preserves the model files. You can redeploy the same adapter at any time with `prime deployments create`.
Status Reference
| Status | Description |
|---|---|
| NOT_DEPLOYED | Adapter is not loaded for inference |
| DEPLOYING | Adapter is being loaded onto inference infrastructure |
| DEPLOYED | Adapter is live and accepting inference requests |
| UNLOADING | Adapter is being removed from inference |
| DEPLOY_FAILED | Deployment failed; check error details |
| UNLOAD_FAILED | Unload failed; check error details |
Troubleshooting
Deployment stuck in DEPLOYING
Deployments typically complete within a few minutes. If the status remains DEPLOYING for an extended period, run `prime deployments list` again to confirm the current status. If the issue persists, contact support or post in the #inference channel in our Discord.
DEPLOY_FAILED or UNLOAD_FAILED
These error states indicate an infrastructure issue. You can retry deployment with `prime deployments create <adapter_id>`, or unload a failed deployment with `prime deployments delete <adapter_id>`. If the error recurs, contact support or post in the #inference channel in our Discord.