The Execution Platform for AI Agents


why runloop

Manage, version, and measure your agents.

The Runloop platform puts your agents in one place: a registry to organize them, versioning to track every iteration, and benchmarking to measure them. Teams compare agents head-to-head to evaluate cost and performance.


Every agent, organized in one place

A complete agent sandbox does three things: define and provision the environment, control its state across runs, and run anything the agent needs inside it. Runloop ships each as a composable primitive, so you can use one on its own or wire them together.


Devboxes: Hardware-isolated microVMs with full system access and sub-second startup. Your agent gets a real machine to work in, with nothing it breaks reaching anything else.

Blueprints: Your devbox environment as code: a Dockerfile, system setup commands, code mounts, build args, secrets, named contexts, and network policies.

Together, they define what your agent's environment looks like and provision it on demand.

Configurable

Query the stream from any control surface: UI, API, CLI, SDK

Scale

Production-scale durability and availability

Find Out More

Every agent, organized in one place


Streamlined surface area:
Create a session, publish an event, subscribe to the stream. Hold the whole surface in your head.
Same calls no matter the workload:
The surface stays the same no matter how complex the workflow is.
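
To make the three-call shape concrete, here is a toy in-memory sketch of create, publish, and subscribe. It is not the Runloop API: the `Session` class and its method names are hypothetical and exist only to illustrate how small such a surface can be.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class Session:
    """Toy in-memory event stream (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []
        self.events: List[Event] = []

    def subscribe(self, handler: Handler) -> None:
        """Register a callback that receives every event published from now on."""
        self._handlers.append(handler)

    def publish(self, event: Event) -> None:
        """Record the event and deliver it to all current subscribers."""
        self.events.append(event)
        for handler in self._handlers:
            handler(event)


session = Session()                        # create a session
received: List[Event] = []
session.subscribe(received.append)         # subscribe to the stream
session.publish({"type": "step", "n": 1})  # publish an event
print(received)  # [{'type': 'step', 'n': 1}]
```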
Find Out More

Validate Agent Behavior Across Multi-Step Workflows

When an agent is tasked with resolving a GitHub issue, it makes dozens of sequential decisions: which files to read, what edits to make, which tests to run, whether to retry a failed command. A single misstep at step 14 can invalidate the preceding 13 steps. Runloop captures the complete decision trajectory so you can pinpoint exactly where agent reasoning diverges from expected behavior.
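
One way to picture pinpointing a divergence is to compare a recorded trajectory against an expected one, step by step, and report the first step where they differ. The sketch below is illustrative only: `Step` and `first_divergence` are hypothetical names, and the shape of a recorded step is an assumption, not Runloop's trajectory format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Step:
    """One recorded agent decision (hypothetical shape, for illustration)."""
    action: str  # e.g. "read_file", "edit", "run_tests"
    target: str  # e.g. a file path or a shell command


def first_divergence(expected: Sequence[Step], actual: Sequence[Step]) -> Optional[int]:
    """Return the 1-based step number where the trajectories first differ, or None."""
    for number, (want, got) in enumerate(zip(expected, actual), start=1):
        if want != got:
            return number
    # One trajectory is a strict prefix of the other: they diverge right after it.
    if len(expected) != len(actual):
        return min(len(expected), len(actual)) + 1
    return None


expected = [Step("read_file", "app.py"), Step("edit", "app.py"), Step("run_tests", "pytest")]
actual = [Step("read_file", "app.py"), Step("edit", "utils.py"), Step("run_tests", "pytest")]
print(first_divergence(expected, actual))  # 2
```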

Sentry fires an exception alert

Full execution path recording with decision-point annotations

Replay and modify

Change parameters at any step and re-run from that point

Hands back a diagnosis

Run each scenario N times to distinguish real failures from noise

Regression detection

Compare current agent against baseline across identical scenarios
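
The last two ideas, repeated runs and baseline comparison, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `pass_rate`, `is_regression`, and the 5-point tolerance are hypothetical choices, and the scenarios are random stand-ins rather than real agent runs.

```python
import random
from typing import Callable


def pass_rate(run_scenario: Callable[[], bool], n: int) -> float:
    """Run a scenario n times and return the fraction of passing runs."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for _ in range(n) if run_scenario()) / n


def is_regression(baseline_rate: float, current_rate: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the pass rate drops below baseline by more than tolerance."""
    return baseline_rate - current_rate > tolerance


# Stand-in scenarios: a real harness would launch the agent in a sandbox instead.
rng = random.Random(0)
baseline = pass_rate(lambda: rng.random() < 0.9, n=200)
current = pass_rate(lambda: rng.random() < 0.6, n=200)
print(baseline, current, is_regression(baseline, current))
```

A single failing run says little about a nondeterministic agent; a pass rate over N runs gives the comparison something stable to work with.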

View Documentation
MCP Hub
developer quickstart

Three lines of code to production infrastructure

Runloop is API-first. SDKs for Python and TypeScript. Full CLI. Every operation that works in the dashboard works through the API.

Create a Devbox

A Devbox is an isolated Linux microVM with its own filesystem, network, and process space. Create one from a Blueprint or from scratch.

PYTHON
import runloop

# Authenticate, then create a Devbox from an existing Blueprint.
client = runloop.Client(api_key="rl_...")
devbox = client.devboxes.create(
    blueprint_id="bp_python312",
    launch_parameters={
        "keep_alive_time_seconds": 600,
        "architecture": "x86_64",
    },
)

print(devbox.id)       # dvb_abc123
print(devbox.status)   # "running"
  
  

Response

JSON
{
  "id": "dvb_abc123",
  "status": "running",
  "blueprint_id": "bp_python312",
  "architecture": "x86_64",
  "created_at": "2025-01-15T08:30:00Z"
}
  
  

Run a Command

Execute any shell command inside the Devbox. Synchronous calls block until completion. Async calls return immediately with an execution ID you can poll.

PYTHON
result = devbox.execute_sync(
    command="python -m pytest tests/ -v"
)

print(result.stdout)
print(result.exit_code)  # 0
print(result.duration_ms) # 1823
  
  

Response

JSON
{
  "id": "exec_xyz789",
  "status": "completed",
  "stdout": "47 passed in 1.82s",
  "stderr": "",
  "exit_code": 0,
  "duration_ms": 1823
}
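
For the async path, the caller typically polls until the execution reaches a terminal state. The helper below is a sketch, not an SDK function: `wait_for_completion` is a hypothetical name, the status lookup is passed in as a plain callable, and the `"queued"` and `"failed"` states are assumptions (only `"completed"` appears in the response above).

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[], str],
    poll_interval_s: float = 0.5,
    timeout_s: float = 30.0,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in ("completed", "failed"):  # assumed terminal states
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution still {status!r} after {timeout_s}s")
        time.sleep(poll_interval_s)


# Stand-in for an API call that looks up an execution by its ID.
_statuses = iter(["queued", "running", "running", "completed"])
print(wait_for_completion(lambda: next(_statuses), poll_interval_s=0.01))  # completed
```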
  
  

Take a Snapshot

Capture the full state of a running Devbox: filesystem, installed packages, and environment variables. Snapshots are delta-compressed, so storage cost scales with actual changes, not total disk size.

PYTHON
snapshot = devbox.snapshot(
    name="after-setup",
    metadata={"step": "post-install"}
)

print(snapshot.id)          # snap_def456
print(snapshot.size_bytes)  # 12582912
  

Response

JSON
{
  "id": "snap_def456",
  "devbox_id": "dvb_abc123",
  "name": "after-setup",
  "size_bytes": 12582912,
  "created_at": "2025-01-15T08:31:22Z"
}
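
To see why storage can scale with changes rather than disk size, consider block-level diffing, one common way to build delta snapshots. The page does not say how Runloop implements delta compression, so treat this purely as an illustration; `BLOCK_SIZE`, `block_hashes`, and `delta_size` are hypothetical.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # illustrative block size, not a Runloop parameter


def block_hashes(data: bytes) -> List[str]:
    """Split data into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def delta_size(base: bytes, current: bytes) -> int:
    """Bytes that must be stored for `current` when `base` is already stored."""
    base_hashes = block_hashes(base)
    stored = 0
    for index, digest in enumerate(block_hashes(current)):
        unchanged = index < len(base_hashes) and base_hashes[index] == digest
        if not unchanged:
            start = index * BLOCK_SIZE
            stored += len(current[start:start + BLOCK_SIZE])
    return stored


base = bytes(10 * BLOCK_SIZE)            # a 40 KiB "disk" of zeros
current = bytearray(base)
current[5 * BLOCK_SIZE] = 1              # modify a single byte in block 5
print(delta_size(base, bytes(current)))  # 4096
```

Changing one byte costs one block of storage, not the full 40 KiB image.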
  
  

Fork from Snapshot

Launch new Devboxes from any snapshot. Each fork starts from identical state but runs independently. Use this for parallel evaluation, A/B testing, or competitive benchmarking.

PYTHON
# Fork into 3 parallel environments
forks = [
    client.devboxes.create(
        snapshot_id="snap_def456"
    )
    for _ in range(3)
]

# Each fork has identical filesystem state
for fork in forks:
    fork.execute_sync(command="python solve.py")
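
Because synchronous calls block, a plain loop over the forks runs them one after another. One way to overlap the work is a thread pool, sketched here with a stand-in class; `FakeFork` is hypothetical and only mimics a blocking `execute_sync` call, so swap in real Devbox objects under whatever interface your SDK version exposes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeFork:
    """Stand-in for a forked Devbox; only mimics a blocking execute_sync call."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute_sync(self, command: str) -> str:
        time.sleep(0.1)  # simulate a blocking remote command
        return f"{self.name}: ran {command!r}"


forks = [FakeFork(f"fork-{i}") for i in range(3)]

# One worker thread per fork, so the blocking calls overlap instead of queueing.
with ThreadPoolExecutor(max_workers=len(forks)) as pool:
    results = list(pool.map(lambda f: f.execute_sync(command="python solve.py"), forks))

for line in results:
    print(line)
```

`pool.map` returns results in the same order as `forks`, which keeps per-fork comparisons straightforward.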
  

Suspend / Resume

Pause a Devbox when the agent is idle, for example while waiting for human review, a webhook, or a scheduled trigger. Zero compute cost while suspended. Resume picks up from the exact prior state.

PYTHON
# Agent reaches a human-in-the-loop checkpoint
devbox.suspend()
# Status: "suspended" -- no compute charges

# Human approves via webhook callback
devbox.resume()
# Status: "running" -- exact prior state restored

result = devbox.execute_sync(
    command="python finalize.py"
)
  
Quickstart
FAQs

Everything You Need to Know

We’re dedicated to solving the complex challenges of productionizing AI for software engineering at scale.

How easy is it to integrate Runloop with existing AI development pipelines?
What makes Runloop's AI code execution infrastructure enterprise-grade?
How does Runloop ensure safe and secure code execution for AI agents?