Agent Management

why runloop

Manage, version, and measure your agents.

The Runloop platform puts your agents in one place: a registry to organize them, versioning to track every iteration, and benchmarking to measure them. Teams compare agents head-to-head to evaluate cost and performance.

Every agent, organized in one place

A complete agent sandbox does three things: define and provision the environment, control its state across runs, and run anything the agent needs inside it. Runloop ships each as a composable primitive, so you can use one on its own or wire them together.

Devboxes: Hardware-isolated microVMs with full system access and sub-second startup. Your agent gets a real machine to work in, and nothing it breaks can reach anything outside it.

Blueprints: Your Devbox environment as code, including a Dockerfile, system setup commands, code mounts, build args, secrets, named contexts, and network policies.

Together, they define what your agent's environment looks like and provision it on demand.

Configurable

Query the stream from any control surface: UI, API, CLI, SDK

Scale

Production-scale durability and availability

Find Out More

Every agent, organized in one place

Streamlined Surface Area:
Create a session, publish an event, subscribe to the stream. Hold the whole surface in your head.

Same Calls, No Matter the Workload:
The surface stays the same no matter how complex the workflow is.
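To make the three-call surface (create a session, publish an event, subscribe to the stream) concrete, here is a toy, in-memory model of it. The `Session` class, `create_session`, and the `publish`/`subscribe` signatures are hypothetical stand-ins, not Runloop's API; the sketch only shows that a subscriber sees one ordered stream, with history replayed first, regardless of what the events contain.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Session:
    """Hypothetical in-memory session: an append-only, ordered event log."""
    name: str
    events: List[Any] = field(default_factory=list)
    subscribers: List[Callable[[int, Any], None]] = field(default_factory=list)

    def publish(self, event: Any) -> int:
        # Append the event, then fan it out to every current subscriber.
        offset = len(self.events)
        self.events.append(event)
        for callback in self.subscribers:
            callback(offset, event)
        return offset

    def subscribe(self, callback: Callable[[int, Any], None], from_offset: int = 0) -> None:
        # Replay stored history from the requested offset, then receive live events.
        for offset in range(from_offset, len(self.events)):
            callback(offset, self.events[offset])
        self.subscribers.append(callback)


def create_session(name: str) -> Session:
    # Hypothetical factory standing in for "create a session".
    return Session(name=name)


session = create_session("agent-run-1")
session.publish({"type": "tool_call", "tool": "pytest"})

received = []
session.subscribe(lambda offset, event: received.append((offset, event)))
session.publish({"type": "tool_result", "exit_code": 0})

print(received)
```

The subscriber that joins late still receives event 0 (replayed) followed by event 1 (live), which is the property that lets the same calls serve both simple and complex workflows.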

Find Out More

Validate Agent Behavior Across Multi-Step Workflows

When an agent is tasked with resolving a GitHub issue, it makes dozens of sequential decisions: which files to read, what edits to make, which tests to run, whether to retry a failed command. A single misstep at step 14 can invalidate the entire run, wasting the 13 steps that came before it. Runloop captures the complete decision trajectory so you can pinpoint exactly where agent reasoning diverges from expected behavior.
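As a simplified illustration of pinpointing a divergence, the sketch below compares a recorded trajectory with an expected one and reports the first step at which they differ. It assumes each step can be reduced to a hypothetical `(action, argument)` pair; real trajectories carry far more context, and this is not how Runloop stores or compares them.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical step representation: (action, argument), e.g. ("read_file", "src/app.py").
Step = Tuple[str, str]


def first_divergence(recorded: Sequence[Step], expected: Sequence[Step]) -> Optional[int]:
    """Return the 1-based index of the first step where the trajectories differ.

    Returns None when they match step for step. If one trajectory is a prefix
    of the other, the first step past the shorter one counts as the divergence.
    """
    for index, (got, want) in enumerate(zip(recorded, expected), start=1):
        if got != want:
            return index
    if len(recorded) != len(expected):
        return min(len(recorded), len(expected)) + 1
    return None


expected = [("read_file", "src/app.py"), ("edit_file", "src/app.py"), ("run", "pytest")]
recorded = [("read_file", "src/app.py"), ("edit_file", "src/util.py"), ("run", "pytest")]
print(first_divergence(recorded, expected))  # 2
```

Here the agent edited the wrong file at step 2, so everything after that step is suspect even though step 3 looks identical.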

Trajectory capture

Full execution path recording with decision-point annotations

Replay and modify

Change parameters at any step and re-run from that point

Repeated trials

Run each scenario N times to distinguish real failures from noise

Regression detection

Compare current agent against baseline across identical scenarios
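Running each scenario N times and comparing against a baseline needs some rule for "real failure versus noise." One common choice, swapped in here for illustration and not necessarily what Runloop uses, is the Wilson score interval on each pass rate: a scenario is flagged as a regression only when the current agent's upper bound falls below the baseline's lower bound. The scenario names and run results are made up.

```python
import math
from typing import Dict, List, Tuple


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95%)."""
    if runs == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = passes / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def regressions(baseline: Dict[str, List[bool]], current: Dict[str, List[bool]]) -> List[str]:
    """Scenarios where the current agent is confidently worse than baseline.

    Conservative rule: flag only when the intervals do not overlap, i.e. the
    current upper bound is below the baseline lower bound.
    """
    flagged = []
    for scenario, base_runs in baseline.items():
        cur_runs = current.get(scenario, [])
        base_lo, _ = wilson_interval(sum(base_runs), len(base_runs))
        _, cur_hi = wilson_interval(sum(cur_runs), len(cur_runs))
        if cur_hi < base_lo:
            flagged.append(scenario)
    return flagged


# Hypothetical results: 20 runs per scenario, True = pass.
baseline = {"fix-issue-101": [True] * 19 + [False], "fix-issue-202": [True] * 10 + [False] * 10}
current = {"fix-issue-101": [True] * 6 + [False] * 14, "fix-issue-202": [True] * 9 + [False] * 11}
print(regressions(baseline, current))  # ['fix-issue-101']
```

The drop from 10/20 to 9/20 on the second scenario is within noise and is not flagged; the drop from 19/20 to 6/20 is.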

View Documentation

developer quickstart

Three lines of code to production infrastructure

Runloop is API-first, with SDKs for Python and TypeScript and a full CLI. Every operation available in the dashboard also works through the API.

Create a Devbox

A Devbox is an isolated Linux microVM with its own filesystem, network, and process space. Create one from a Blueprint or from scratch.

PYTHON

import runloop

client = runloop.Client(api_key="rl_...")
devbox = client.devboxes.create(
    blueprint_id="bp_python312",
    launch_parameters={
        "keep_alive_time_seconds": 600,
        "architecture": "x86_64",
    },
)

print(devbox.id)       # dvb_abc123
print(devbox.status)   # "running"

Response

JSON

{
  "id": "dvb_abc123",
  "status": "running",
  "blueprint_id": "bp_python312",
  "architecture": "x86_64",
  "created_at": "2025-01-15T08:30:00Z"
}

Full Devbox API Reference

Run a Command

Execute any shell command inside the Devbox. Synchronous calls block until completion. Async calls return immediately with an execution ID you can poll.

PYTHON

result = devbox.execute_sync(
    command="python -m pytest tests/ -v"
)

print(result.stdout)
print(result.exit_code)  # 0
print(result.duration_ms) # 1823

Response

JSON

{
  "id": "exec_xyz789",
  "status": "completed",
  "stdout": "47 passed in 1.82s",
  "stderr": "",
  "exit_code": 0,
  "duration_ms": 1823
}
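The async path implies a polling loop on the client side. Below is a generic sketch with capped exponential backoff. `fetch_status` stands in for whatever call retrieves an execution record by its ID (that call is not shown here and is not a documented Runloop method), and the terminal statuses other than `"completed"` are assumptions.

```python
import time
from typing import Callable, Dict

# "completed" appears in the response above; "failed" and "killed" are assumed.
TERMINAL_STATUSES = {"completed", "failed", "killed"}


def wait_for_execution(
    fetch_status: Callable[[], Dict],
    timeout_s: float = 60.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 5.0,
) -> Dict:
    """Poll until the execution reaches a terminal status, backing off between polls."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        record = fetch_status()
        if record["status"] in TERMINAL_STATUSES:
            return record
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"execution still {record['status']!r} after {timeout_s}s")
        time.sleep(delay)
        delay = min(delay * 2, max_delay_s)  # double the wait, up to a cap


# Fake status source for illustration: reports "running" twice, then completes.
responses = iter([
    {"id": "exec_xyz789", "status": "running"},
    {"id": "exec_xyz789", "status": "running"},
    {"id": "exec_xyz789", "status": "completed", "exit_code": 0},
])
result = wait_for_execution(lambda: next(responses), initial_delay_s=0.01)
print(result["exit_code"])  # 0
```

Backoff keeps short commands responsive while avoiding a burst of status requests for long-running ones such as a full test suite.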

Take a Snapshot

Capture the full state of a running Devbox: filesystem, installed packages, and environment variables. Snapshots are delta-compressed, so storage cost scales with the actual changes, not the total disk size.

PYTHON

snapshot = devbox.snapshot(
    name="after-setup",
    metadata={"step": "post-install"}
)

print(snapshot.id)          # snap_def456
print(snapshot.size_bytes)  # 12582912

Response

JSON

{
  "id": "snap_def456",
  "devbox_id": "dvb_abc123",
  "name": "after-setup",
  "size_bytes": 12582912,
  "created_at": "2025-01-15T08:31:22Z"
}
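To see why delta compression makes storage track actual changes, here is a toy block-level version: hash fixed-size blocks, keep only the blocks that differ from the base image, and rebuild the image on restore. This is a generic illustration of the idea; the block size, the use of SHA-256, and the whole scheme are assumptions, not a description of how Runloop implements snapshots.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # illustrative block size, not Runloop's


def block_hashes(image: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """SHA-256 digest of each fixed-size block of a disk image."""
    return [
        hashlib.sha256(image[i:i + block_size]).hexdigest()
        for i in range(0, len(image), block_size)
    ]


def delta(base: bytes, current: bytes, block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Blocks of `current` that differ from `base`, keyed by block index."""
    base_hashes = block_hashes(base, block_size)
    changed = {}
    for index, digest in enumerate(block_hashes(current, block_size)):
        if index >= len(base_hashes) or digest != base_hashes[index]:
            changed[index] = current[index * block_size:(index + 1) * block_size]
    return changed


def restore(base: bytes, changes: Dict[int, bytes], length: int,
            block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the snapshotted image from the base image plus the stored delta."""
    blocks = [base[i:i + block_size] for i in range(0, len(base), block_size)]
    for index, data in changes.items():
        while len(blocks) <= index:
            blocks.append(b"")
        blocks[index] = data
    return b"".join(blocks)[:length]  # trim if the image shrank


base = bytes(BLOCK_SIZE * 8)  # 8 zeroed blocks (32 KiB)
edited = bytearray(base)
edited[5 * BLOCK_SIZE:5 * BLOCK_SIZE + 3] = b"pkg"  # touch a single block
current = bytes(edited)

changes = delta(base, current)
print(sorted(changes))  # [5]
print(restore(base, changes, len(current)) == current)  # True
```

Only one 4 KiB block is stored for this snapshot, even though the image is 32 KiB.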

Fork from Snapshot

Launch new Devboxes from any snapshot. Each fork starts from identical state but runs independently. Use this for parallel evaluation, A/B testing, or competitive benchmarking.

PYTHON

# Fork into 3 parallel environments
forks = [
    client.devboxes.create(
        snapshot_id="snap_def456"
    )
    for _ in range(3)
]

# Each fork has identical filesystem state
for fork in forks:
    fork.execute_sync(command="python solve.py")
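The loop above runs the forks one after another, because each `execute_sync` call blocks. Since those calls mostly wait on the network, a thread pool is one straightforward way to run them concurrently. The sketch assumes, without confirmation here, that the SDK client tolerates concurrent calls from multiple threads; `FakeFork` is a hypothetical stand-in that sleeps instead of running a command.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")
R = TypeVar("R")


def run_on_forks(forks: Sequence[F], job: Callable[[F], R], max_workers: int = 8) -> List[R]:
    """Run `job` on every fork concurrently; results come back in fork order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(job, forks))


class FakeFork:
    """Hypothetical stand-in for a forked Devbox."""

    def __init__(self, name: str):
        self.name = name

    def execute_sync(self, command: str) -> str:
        time.sleep(0.1)  # simulate a blocking remote call
        return f"{self.name}: ran {command}"


fakes = [FakeFork(f"fork-{i}") for i in range(3)]
start = time.monotonic()
outputs = run_on_forks(fakes, lambda f: f.execute_sync(command="python solve.py"))
elapsed = time.monotonic() - start
print(outputs[0])  # fork-0: ran python solve.py
```

With real forks, the same helper would be called as `run_on_forks(forks, lambda fork: fork.execute_sync(command="python solve.py"))`, and total wall time approaches that of the slowest fork rather than the sum of all of them.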

Suspend / Resume

Pause a Devbox while the agent is idle, such as when it is waiting for human review, a webhook, or a scheduled trigger. A suspended Devbox incurs no compute cost, and resuming picks up from the exact prior state.

PYTHON

# Agent reaches a human-in-the-loop checkpoint
devbox.suspend()
# Status: "suspended" -- no compute charges

# Human approves via webhook callback
devbox.resume()
# Status: "running" -- exact prior state restored

result = devbox.execute_sync(
    command="python finalize.py"
)
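The cost claim can be made concrete with a toy accounting model: compute is metered only while the status is `"running"`, and the allowed transitions are running to suspended and back. `DevboxLifecycle` is hypothetical and not part of any Runloop SDK; per-second metering is an assumption, since billing granularity is not stated here. An injectable clock keeps the example deterministic.

```python
from typing import Callable


class DevboxLifecycle:
    """Toy model: compute is metered only while the box is "running"."""

    def __init__(self, clock: Callable[[], float]):
        self._clock = clock
        self.status = "running"
        self._since = clock()  # start of the current running interval
        self.billable_seconds = 0.0

    def suspend(self) -> None:
        if self.status != "running":
            raise RuntimeError(f"cannot suspend from {self.status!r}")
        self.billable_seconds += self._clock() - self._since  # close the interval
        self.status = "suspended"

    def resume(self) -> None:
        if self.status != "suspended":
            raise RuntimeError(f"cannot resume from {self.status!r}")
        self._since = self._clock()  # open a new running interval
        self.status = "running"

    def total(self) -> float:
        """Billable seconds so far, including any in-progress running interval."""
        if self.status == "running":
            return self.billable_seconds + (self._clock() - self._since)
        return self.billable_seconds


now = [0.0]  # fake clock, in seconds
box = DevboxLifecycle(clock=lambda: now[0])
now[0] = 120.0
box.suspend()   # 2 minutes of agent work, then wait for human review
now[0] = 3720.0
box.resume()    # an hour spent suspended is not metered
now[0] = 3780.0
print(box.total())  # 180.0
```

Of the 3,780 seconds of wall time, only the 180 seconds spent running count, which is the point of suspending at a human-in-the-loop checkpoint.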

Quickstart

Everything You Need to Know

We’re dedicated to solving the complex challenges of productionizing AI for software engineering at scale.

The Execution platform for AI Agents

Manage, version, and measure your agents.

Every agent, organized in one place

Validate Agent Behavior Across Multi-Step Workflows

Three lines of code to production infrastructure

Build on the Primitives your AI Agents Need

Everything You Need to Know

Get Started With Runloop
