Go from idea to training run to deployed agent using Lab. No programming or RL experience required.
In this guide, we’ll walk you through setting up your Lab workspace, creating your first agent environment, using it to evaluate baseline performance, launching an RL run with Hosted Training, and deploying your model for inference. These instructions are intended for use on a Mac or Linux CPU development environment. No previous experience with RL is required. In fact, coding experience isn’t required either: we’ll use agents for everything. We’ll just assume that you have Claude Code, Codex, Cursor, OpenCode, Amp, or some other similar coding agent installed on your computer.
Ensure you have uv installed for managing Python packages:
curl -LsSf https://astral.sh/uv/install.sh | sh
Install the prime CLI:
uv tool install prime
Choose a folder on your machine as your Lab workspace (e.g. ~/dev/my-lab) and do:
prime lab setup
This command prepares your workspace with:
Python project bootstrap: Creates a Python project and installs verifiers for environment development.
Coding-agent setup: Configures your workspace for coding-agent workflows.
Instruction files: Downloads agent instruction files like AGENTS.md and Agent Skills.
Starter configs: Downloads example training and evaluation configs.
~/dev/demo prime lab setup
Supported coding agents: codex, claude, cursor, opencode, amp
Primary coding agent [codex]:
Using multiple coding agents? [y/N]:
No pyproject.toml found, initializing uv project...
Running: uv init
Initialized project `demo`
Running: uv add verifiers
...
... # install + download outputs omitted
...
[................................................................................] 1371 / 1371
Downloaded configs/rl/wordle.toml from https://github.com/primeintellect-ai/verifiers

+--------------------------------- get started ----------------------------------+
|                                                                                |
|                    idea -> environment -> eval -> training                     |
|                                                                                |
|    +----------------------------- ask codex ------------------------------+    |
|    |                                                                      |    |
|    |  I want to train a model for <my task domain>. Propose an initial    |    |
|    |  environment scaffold including relevant tools, generate a small     |    |
|    |  synthetic dataset, run a quick eval baseline, inspect the results,  |    |
|    |  and decide how to iterate on refining the implementation.           |    |
|    |                                                                      |    |
|    +----------------------------------------------------------------------+    |
|                                                                                |
|              +----------------- quick commands -----------------+              |
|              |                                                  |              |
|              |  $ prime env init my-env                         |              |
|              |  $ prime eval run my-env -m gpt-5-nano -n 5      |              |
|              |  $ prime eval tui                                |              |
|              |  $ prime rl run configs/rl/wiki-search.toml      |              |
|              |  $ prime gepa run my-env -m gpt-5-nano           |              |
|              |                                                  |              |
|              +--------------------------------------------------+              |
|                                                                                |
+--------------------------------------------------------------------------------+
Use one Lab workspace per research project and version it with Git. A workspace can contain multiple environments, configs, scripts, data, and eval outputs.
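For example, a minimal way to create and version a workspace might look like the following (the path and commit message are illustrative):

```shell
# Create a workspace folder and put it under version control.
mkdir -p ~/dev/my-lab
cd ~/dev/my-lab
git init

# After running `prime lab setup` here, commit the generated
# project files (pyproject.toml, AGENTS.md, configs, etc.).
git add .
git commit -m "Initial Lab workspace" --allow-empty
```

Committing after setup gives you a clean baseline to diff against as your agent iterates on environments and configs.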
For many low-to-medium complexity environments, we find that the latest coding agents are often capable of “one-shotting” them when equipped with the context provided by prime lab setup and given a sufficiently detailed prompt. Providing the prompt below to a frontier coding agent (OpenCode + Codex 5.3) resulted in a fully functional environment for a calendar scheduling agent:
Save this as prompt.md and pass it directly to your coding agent as your initial task prompt.
prompt.md
Make an environment for a calendar scheduling agent.

In each task, there should be a set of people with busy calendars, and individual + global constraints for scheduling the meeting.
Some constraints can be "hard" (not allowed to violate), others can be "soft", where violating a constraint incurs some utility cost for certain attendees.
Each attendee has a utility for the proposed meeting time between 0 and 1, and the task score will be the weighted average of attendee scores if an acceptable meeting time is found, and 0 otherwise.
Attendee importance weights should be normalized to 1 for each task.

We should be able to programmatically generate task problems, and deterministically validate that satisfying solutions exist (and what their best possible score would be).
We should have fine-grained controls for key degrees of freedom in task generation, with higher-level parameters ("easy" / "medium" / "hard") for the full task set, which then map into setting ranges for the more fine-grained controls.

Be creative, and use your judgment to design clean composition rules for converting meeting choices and conflicts into scores.
Avoid complex branching/conditional logic where possible.
Think carefully about designing your system in a way which discourages "backdoor" strategies or reward hacks.
The best approach for an agent should be to make a good-faith effort to satisfy constraints as best as possible.
Experiment with sampling strategies to ensure that tasks are solvable most of the time (so that we can pre-filter any unsolvable tasks cheaply), and that they aren't too easy -- there shouldn't be an abundance of valid solutions, random proposal times should be a bad strategy.

Types of constraints we want to potentially account for:
- Conflicting schedules
- Time zones + early/late/day preferences
- Meeting length
- Room availability
- Back-to-back meeting preferences
- Desired-but-optional attendees
- Other related constraints which reflect real-world calendar challenges

Degrees of freedom:
- Number of attendees
- Window of consideration
- Types of constraints
- Tightness of constraints

Use the StatefulToolEnv pattern, and in-memory data structures for the calendar + attendee information.

The agent should have tools for things like:
- Checking attendee calendars
- Viewing attendee constraints
- Checking score of a proposed window
- Submitting a window

The environment should have a max_turns parameter, and tool results should show the remaining turns to the agent.
The default limit should be enough to allow reasonable exploration, but not so high that the agent can brute-force search all times.

We should also have a nice standalone script in the environment which creates a TUI to visualize a "calendar problem" similar to typical meeting apps, including attendees, timeblocks, and constraints, but fully in the terminal, using Rich styling, similar design language to the `prime eval tui` viewer implemented within the `verifiers` library (inspect verifiers source for reference).

Create a detailed design doc and plan for testing (PLAN.md), implement in full, revise PLAN.md after major milestones to reflect accomplishments and updated TODOs, and run basic small evals throughout as needed.
You are welcome to use the PRIME_API_KEY set in my environment for inference tests (see configs/endpoints.toml for models).
Let me know when you're happy with your implementation.
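As a concrete reference for the scoring rule the prompt describes, here is a minimal sketch (the function and variable names are ours, not part of the generated environment): the task score is 0 if no acceptable meeting time is found; otherwise it is the importance-weighted average of per-attendee utilities, with weights normalized to sum to 1.

```python
def task_score(utilities, importances, found_acceptable_time):
    """Weighted average of attendee utilities in [0, 1], or 0.0 on failure.

    utilities:   per-attendee utility for the proposed time, each in [0, 1]
    importances: per-attendee importance weights (any positive numbers)
    """
    if not found_acceptable_time:
        return 0.0
    total = sum(importances)
    # Normalize importance weights so they sum to 1 for the task.
    weights = [w / total for w in importances]
    return sum(u * w for u, w in zip(utilities, weights))
```

For example, two equally important attendees with utilities 1.0 and 0.5 yield a score of 0.75, while any task where no acceptable time is submitted scores 0.0 regardless of utilities.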
You can view the created environment in the Environments Hub:
After the environment is created, we prompt our agent to test performance more exhaustively, then to start an RL training run using Qwen/Qwen3-30B-A3B-Instruct-2507, which is available for LoRA finetuning via Hosted Training.
test with GPT-4.1 models, make sure we're seeing proper rollouts that succeed with non-zero scores. those models should be able to solve; if they can't we have a bug somewhere or need clearer instructions in the env. Once you feel good about the env, make a RL config and start a training run with bs 128, 8 rollouts per group, for the 30b instruct qwen model.
Available models can be viewed with:
prime rl models
Example training configs (in configs/rl after running prime lab setup) look like:
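The exact schema depends on your prime-rl version, so the fragment below is an illustrative sketch only: the table and key names are assumptions, not the real schema. It reflects the settings used in this walkthrough (batch size 128, 8 rollouts per group, the Qwen 30B instruct model).

```toml
# Illustrative sketch only -- check the downloaded configs in configs/rl
# for the actual schema used by your version of the tooling.
[model]
name = "Qwen/Qwen3-30B-A3B-Instruct-2507"

[env]
id = "my-env"

[train]
batch_size = 128
rollouts_per_group = 8
```

Compare against the downloaded examples such as configs/rl/wordle.toml or configs/rl/wiki-search.toml before launching a run.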
From the platform, you can view training curves, rollouts, configs, logs, and checkpoints. Training runs are shareable. You can view this example run here:
Under Deployments, you can deploy LoRA adapters for inference with a single click. You can then share the Model Identifier with your coding agent and ask it to run some more evals if desired, or incorporate it into your application directly. And with that, you have now successfully deployed your first RL-trained model!