About

Environments Hub is a community-powered platform for aggregating and showcasing environments, for both RL training and downstream evaluation. You can browse all available environments on the hub.

Motivation

We see a few interrelated issues with the current ecosystem for both evals and RL environments, which we aim to address with this hub:
  • Despite the rapidly growing interest in training LLMs with RL, there is currently no established community platform for exploring and sharing train-ready environments.
  • Environment implementations are often tied to a specific RL training stack and can be difficult to adapt to a new trainer.
  • Popular evaluation suites (lm_eval, lighteval, openbench, simple-evals, HELM) offer convenient entrypoints into many single-turn Q&A evals, but they generally lack support for tasks that are agentic in nature or require complex infrastructure setups (TAU-bench, TerminalBench, SWE-bench), resulting in a proliferation of independent eval repos without shared entrypoints or specs.
  • RL environments and agent evals are basically the same thing (dataset + harness + scoring rules), but current open-source efforts generally treat them as fundamentally separate.
  • Realistic agent environments can be complex pieces of software with their own dependencies and versioning requirements, and are ill-served by monorepo-style environment collections, which can quickly become unmaintainable.

With the Environments Hub, we've built a community platform that doubles as a proper Python package registry. Environments are modules that declare their dependencies in a pyproject.toml and are distributed as wheels. Because environments adopt the verifiers spec, development effort can focus on task-specific components (datasets, tools or harnesses, reward functions) while automatically leveraging existing infrastructure for running evaluations or training models with RL.
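
To make the spec concrete, below is a minimal sketch of what an environment module might look like. It assumes the verifiers Python API (e.g. vf.SingleTurnEnv, vf.Rubric, Parser.parse_answer); the toy dataset, reward function, and system prompt are purely illustrative, and exact signatures should be checked against the verifiers docs for your installed version:

    import verifiers as vf
    from datasets import Dataset


    def load_environment(**kwargs) -> vf.Environment:
        # Tiny inline dataset for illustration; real environments usually
        # load a dataset from the Hugging Face Hub instead.
        dataset = Dataset.from_dict({
            "question": ["What is 6 * 7?", "What is 20 + 22?"],
            "answer": ["42", "42"],
        })

        parser = vf.Parser()  # plain-text parsing; structured parsers also exist

        def correct_answer(completion, answer, **kwargs) -> float:
            # Reward 1.0 if the parsed model response contains the reference answer.
            response = parser.parse_answer(completion) or ""
            return 1.0 if answer in response else 0.0

        rubric = vf.Rubric(funcs=[correct_answer], weights=[1.0])

        return vf.SingleTurnEnv(
            dataset=dataset,
            system_prompt="Answer with just the number.",
            parser=parser,
            rubric=rubric,
        )

Because the module exposes a single load_environment entrypoint and declares its dependencies in pyproject.toml, the same package can be built as a wheel and plugged into either an evaluation run or an RL training loop without modification.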

Support

Join the Prime Intellect Discord to discuss, share feedback, and ask questions.