The Runloop Agent API adds a cleaner way to package, version, and deploy agents across Devboxes. Register an agent once, then mount it by name or ID wherever it needs to run.
Setting up agentic workflows today usually means juggling three separate concerns:
Runloop Blueprints solve the first part well. But agents have often ended up as the awkward middle layer: either baked into a blueprint, or installed ad hoc after launch. That creates two common failure modes.
First, coupling agents to blueprints slows down iteration. Agents tend to change far more often than the underlying sandbox image. If every prompt change, package upgrade, or source update requires a blueprint rebuild, development gets slower than it needs to be.
Second, installing agents from public registries at Devbox startup is fragile. It adds startup latency and introduces external points of failure:
The result is a setup that’s hard to reproduce consistently. That’s a problem for developers trying to move quickly, and for teams running benchmarks or production workflows that need predictable behavior.
The Agent API lets you define an agent once and reuse it anywhere on Runloop. You specify where the code comes from and how it should be installed, and then reference that agent whenever you launch a Devbox.
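At a high level, the flow is simple: register an agent with its source and setup commands, then attach it to a Devbox at launch through an agent_mount entry.
This release supports four source types, covering the main ways teams ship agent code: Git, npm, pip (PyPI), and storage objects.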
That flexibility matters because teams don’t all work the same way. Some agents are under active development in Git. Some are already published as packages. Some need to be bundled ahead of time for fast, deterministic startup.
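One way to picture the four source types is as a tagged union, where each variant carries only the fields that source needs. The sketch below is illustrative only: the type names, the `kind` tag, and every field name (`repository`, `ref`, `packageName`, `objectId`) are assumptions made for this example, not the Agent API's actual request schema.

```typescript
// Hypothetical shapes for illustration; the real Agent API fields may differ.
type AgentSource =
  | { kind: "git"; repository: string; ref: string }
  | { kind: "npm"; packageName: string }
  | { kind: "pip"; packageName: string }
  | { kind: "object"; objectId: string };

// Hypothetical agent definition: a name, a version label, and one source.
interface AgentDefinition {
  name: string;
  version: string;
  source: AgentSource;
}

// True when installing the agent means fetching from an external host or
// registry; object sources are packaged ahead of time instead.
function fetchesExternally(source: AgentSource): boolean {
  switch (source.kind) {
    case "git":
    case "npm":
    case "pip":
      return true;
    case "object":
      return false;
  }
}

// Short human-readable summary of where the agent code comes from.
function describeSource(source: AgentSource): string {
  switch (source.kind) {
    case "git":
      return `git ${source.repository}@${source.ref}`;
    case "npm":
      return `npm ${source.packageName}`;
    case "pip":
      return `pip ${source.packageName}`;
    case "object":
      return `object ${source.objectId}`;
  }
}
```

Modeling the source as a union keeps the rest of the definition (name, version) identical across source types, which mirrors the idea that how an agent is shipped is separate from how it is referenced.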
Agent objects are immutable. Once created, an agent is a fixed snapshot of source and configuration.
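That gives teams a clear versioning model: each agent carries a name and a version label such as 1.0.0, and shipping a change means registering a new agent rather than editing an existing one.
This is especially useful for evaluations and benchmark runs. If a result came from a specific agent ID or version, it always resolves to the same agent definition.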
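The immutability guarantee can be modeled locally with a small in-memory registry. This is a toy model of the idea, not Runloop's implementation: the class name, the `agt_local_` ID format, and the method names are all invented for this sketch.

```typescript
// A frozen snapshot of one registered agent version.
interface AgentRecord {
  readonly id: string;
  readonly name: string;
  readonly version: string;
}

// Toy in-memory registry (hypothetical). Registering always adds a new
// frozen record; nothing here edits a record after it has been created.
class InMemoryAgentRegistry {
  private readonly records: { [id: string]: AgentRecord } = {};
  private nextSerial = 1;

  register(name: string, version: string): AgentRecord {
    const id = `agt_local_${this.nextSerial}`;
    this.nextSerial += 1;
    const record: AgentRecord = Object.freeze({ id, name, version });
    this.records[id] = record;
    return record;
  }

  // An ID always resolves to the exact record that was registered under it.
  resolve(id: string): AgentRecord {
    if (!Object.prototype.hasOwnProperty.call(this.records, id)) {
      throw new Error(`unknown agent id: ${id}`);
    }
    return this.records[id];
  }
}
```

Because a new version gets a new ID, a benchmark result that recorded the ID of version 1.0.0 keeps pointing at that snapshot even after later versions are registered.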
Here’s a minimal TypeScript example that mounts two previously created agents onto Devboxes built from the same blueprint. Notice there is no manual shell setup, no custom bootstrap script per launch, and no blueprint rebuild just to change the agent.
import { RunloopSDK } from "@runloop/api-client";

const runloop = new RunloopSDK({
  bearerToken: process.env.RUNLOOP_API_KEY,
});

async function main() {
  // Reuse one stable blueprint for multiple agent roles.
  const blueprintId = "bpt_123456789";

  // Use agents created and saved previously.
  const codingAgentId = "agt_abcdefghi";
  const reviewAgentId = "agt_jklmnopqr";

  // Devbox for code generation
  const codingDevbox = await runloop.devbox.create({
    blueprint_id: blueprintId,
    mounts: [
      {
        type: "agent_mount",
        agent_id: codingAgentId,
        agent_path: "/home/user/agent",
      },
    ],
  });

  // Devbox for review, using the same blueprint but a different agent
  const reviewDevbox = await runloop.devbox.create({
    blueprint_id: blueprintId,
    mounts: [
      {
        type: "agent_mount",
        agent_id: reviewAgentId,
        agent_path: "/home/user/agent",
      },
    ],
  });

  console.log("Coding devbox:", codingDevbox.id);
  console.log("Review devbox:", reviewDevbox.id);
}

main().catch(console.error);
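The two `devbox.create` calls above differ only in the agent ID, so the request body can be built by a small helper. The sketch below reuses the field names from the example (`blueprint_id`, `mounts`, `type`, `agent_id`, `agent_path`). The interface names, the helper, and the default-path constant are hypothetical and declared locally so the block runs without the SDK.

```typescript
// Local stand-ins for the request shape used in the example above.
interface AgentMount {
  type: "agent_mount";
  agent_id: string;
  agent_path: string;
}

interface DevboxCreateParams {
  blueprint_id: string;
  mounts: AgentMount[];
}

// Hypothetical default; the example above mounts agents at this path.
const DEFAULT_AGENT_PATH = "/home/user/agent";

// Builds the create-request body for one blueprint and one mounted agent.
function devboxParamsFor(
  blueprintId: string,
  agentId: string,
  agentPath: string = DEFAULT_AGENT_PATH
): DevboxCreateParams {
  return {
    blueprint_id: blueprintId,
    mounts: [{ type: "agent_mount", agent_id: agentId, agent_path: agentPath }],
  };
}
```

With a helper like this, each call in the example would reduce to `runloop.devbox.create(devboxParamsFor(blueprintId, codingAgentId))`, assuming the SDK accepts the same object shape shown above.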
The Agent API is designed to support a few common patterns especially well.
Benchmarks need stable environments and easy agent swapping. That’s exactly where the Agent API helps.
Instead of rebuilding benchmark blueprints for every agent revision, teams can keep the blueprint stable and compare different agents or versions independently. That makes it easier to answer practical questions like:
Because agent objects are versioned and immutable, benchmark runs are easier to audit and reproduce later.
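A benchmark harness can make that audit trail explicit by planning every run up front and storing the agent ID next to the request it will send. The following is a rough sketch under stated assumptions: `BenchmarkRun`, `planBenchmarkRuns`, and the label scheme are invented for illustration, and only the mount fields come from the example earlier in this post.

```typescript
// One planned benchmark run: which agent it uses and the request to send.
interface BenchmarkRun {
  label: string;
  agentId: string;
  createParams: {
    blueprint_id: string;
    mounts: { type: "agent_mount"; agent_id: string; agent_path: string }[];
  };
}

// Plans one run per agent against a single, fixed blueprint. Labels are
// sorted so the plan is deterministic regardless of object key order.
function planBenchmarkRuns(
  blueprintId: string,
  agentIdsByLabel: { [label: string]: string },
  agentPath: string = "/home/user/agent"
): BenchmarkRun[] {
  return Object.keys(agentIdsByLabel)
    .sort()
    .map((label): BenchmarkRun => {
      const agentId = agentIdsByLabel[label];
      return {
        label,
        agentId,
        createParams: {
          blueprint_id: blueprintId,
          mounts: [{ type: "agent_mount", agent_id: agentId, agent_path: agentPath }],
        },
      };
    });
}
```

Each entry's `createParams` could then be passed to `runloop.devbox.create`, and the stored `agentId` kept with the results so a score can be traced back to the exact agent that produced it.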
Some workflows need more than one agent role. A coding agent may write changes, a review agent may inspect them, and a verification agent may run checks or validate results.
The key benefit here is decoupling. Different agents can run against the same underlying blueprint without forcing separate environment definitions for each role.
This pattern is useful whether the workflow is orchestrated formally or managed through application logic. The environment remains stable, while agent roles can evolve independently.
Most teams don’t want to rebuild core Devbox definitions every time agent code changes. In many cases, a blueprint remains stable for months while agents are updated frequently.
The Agent API separates those concerns cleanly:
That separation reduces rebuild cycles and keeps agent iteration lightweight.
Object-based agents are the fastest and most controlled way to install agents on your Devbox. Instead of fetching code from Git, npm, or PyPI at startup, the agent is packaged ahead of time and stored on Runloop infrastructure as a storage object.
This changes startup behavior in a useful way:
For production workloads, this makes startup both faster and more reliable.
Object-based agents are a strong fit when you need:
They also preserve the same mounting pattern as every other source type. That means teams can move from a Git-, npm-, or pip-based agent to an object-based one without changing how Devboxes reference it.
The Agent API also fits naturally into CI/CD.
The runloopai/deploy-agent GitHub Action automates agent creation from a pipeline. It supports all source types, including Git-based deployment and workflows that first package a tar archive and then publish it as an object-based agent.
This lets teams treat agent deployment as a release step:
On main, publish the latest development version
name: Deploy agent on release
on:
  release:
    types: [published]
jobs:
  deploy-agent:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy Git-based agent to Runloop
        uses: runloopai/deploy-agent@v1
        with:
          runloop-api-key: ${{ secrets.RUNLOOP_API_KEY }}
          source-type: git
          name: my-agent
          version: ${{ github.event.release.tag_name }}
          repository: ${{ github.server_url }}/${{ github.repository }}
          ref: ${{ github.event.release.tag_name }}
This closes an important loop for teams shipping agents regularly:
The result is a cleaner path from source control to runnable infrastructure.
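If a pipeline publishes both tagged releases and development builds from main, it needs some rule for choosing the version label. The sketch below shows one possible convention, not something the deploy action prescribes: the function name and the `dev-<short sha>` format are assumptions for illustration. The `refs/tags/` and `refs/heads/` prefixes are the standard Git ref formats that GitHub Actions exposes.

```typescript
// Maps a Git ref to a version label under a hypothetical convention:
//   refs/tags/<tag>   -> "<tag>"               (release build)
//   refs/heads/main   -> "dev-<first 7 of sha>" (development build)
//   anything else     -> null                   (do not publish)
function versionLabelFor(ref: string, sha: string): string | null {
  const tagPrefix = "refs/tags/";
  if (ref.indexOf(tagPrefix) === 0) {
    return ref.slice(tagPrefix.length);
  }
  if (ref === "refs/heads/main") {
    return `dev-${sha.slice(0, 7)}`;
  }
  return null;
}
```

Returning `null` for other branches gives the pipeline an explicit signal to skip deployment rather than publish an agent under an ambiguous version.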
The fastest way to try the Agent API is to register one agent and mount it onto a Devbox.
A practical starting sequence looks like this: choose a name and a version, define the agent_setup commands, and attach the agent to a Devbox with agent_mount.
A few best practices help early: use descriptive names such as code-review-agent instead of generic labels, and reference agents by agent_id when exact resolution matters.
If you want a managed starting point, Runloop also provides public wrappers for several common coding agents, including Claude Code, Codex, OpenCode, Gemini CLI, and DeepAgents, through the public agent listing APIs documented in the Agents docs.
The Runloop Agent API adds a missing layer to agent infrastructure: a versioned, reusable way to define and mount agents without tying every change to a blueprint rebuild.
That matters in three places immediately:
By making agents declarative, pluggable, and immutable, the platform gives teams a cleaner way to iterate on the part of the stack that changes fastest.
To learn more, see the Runloop Agents documentation and API reference, then create an agent and mount it onto a Devbox. The setup is simple, and the operational payoff shows up quickly.