
AI Infrastructure for the Future of Software Engineering

Foundational AI Infrastructure

Build in Secure & Scalable Development Environments
Scalable VMs available on demand, with robust connections to GitHub repositories and SSH-secured data stores
Standardize Containers & Streamline Work with Blueprints
Construct SDEs to match every task or agent, from configuration settings to preinstalled packages

Public & Custom Benchmarks

Public Benchmarks Beyond SWE-Bench
Runloop provides automated benchmarking tools to evaluate AI agents on real-world coding tasks, ensuring measurable progress and increased reliability
Custom Defined Code Scenarios & Scoring Functions
Compound proprietary advantages by constructing custom benchmarks that refine your agents' performance on your priorities

Self-Improving Code Agents

Supervised and Reinforcement Fine Tuning
Use Runloop's native capabilities to run Supervised Fine-Tuning and Reinforcement Learning Fine-Tuning on data from your custom benchmarks
AI Research in Production
Realize the benefits of the latest AI research without the delays and overhead of in-house solutions

The complete platform for building, testing, and scaling AI-powered software engineering products.

Join Waitlist
// DEVELOPER EXPERIENCE

Built by Developers for Developers

Turbo-charged infrastructure to accelerate AI Agent deployment

Concurrency

Running a thousand devboxes is as easy as running one. Orchestrate agents like a maestro

Computer Use

Access remote computers as if they were local. Give agents the right tools to execute

Browser Use

Simplified interface for browser use. Add a world of context to any existing agent

Long-Lived Code Sandboxes

Devboxes live as long as you need them. Customize the timespan

Model Agnostic

Runloop supports all frontier models, so your agents can get to work with your existing tooling

CLI, UI and API

Build on Runloop from anywhere. Extensions and LLM prompts accelerate your development
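To give a feel for what building on the API can look like, here is a minimal Python sketch of a devbox lifecycle: create a long-lived devbox, run a command in it, and shut it down when finished. It assumes Runloop's Python SDK (runloop_api_client) and an API key in the environment; the method names are illustrative assumptions, not a definitive reference — see the developer docs for the real interface.

```python
import os

from runloop_api_client import Runloop  # assumption: Runloop's Python SDK

client = Runloop(bearer_token=os.environ["RUNLOOP_API_KEY"])

# Create a devbox and wait for it to reach the running state.
# Method names below are illustrative; check the docs for the exact calls.
devbox = client.devboxes.create_and_await_running()

try:
    # Run a shell command inside the remote sandbox.
    result = client.devboxes.execute_sync(devbox.id, command="python --version")
    print(result.stdout)
finally:
    # Devboxes can live as long as you need them; shut down when finished.
    client.devboxes.shutdown(devbox.id)
```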

Want to learn more about Runloop?

Explore our developer docs to see what's possible.

Explore Docs
// Features

The Building Blocks for AI-Powered Developer Tools

Everything you need to build reliable, production-ready AI development tools.

// USE CASES

Solutions for Every Phase of AI-Driven Software Engineering

Discover how Runloop empowers teams at every stage to build, test, and optimize AI solutions for software engineering.

Fan Out Development Patterns

Give existing SWE agents the ability to try multiple solutions. Snapshot a Devbox and scale from one to N candidate fixes. Pick the implementation that works best and iterate quickly

Developers focus on building AI products rather than managing infrastructure, security, and scaling.

Track Snapshots and Devboxes across the development lifecycle. Revert and branch as needed to navigate between agent outputs and development trajectories
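As a rough sketch of the fan-out pattern, under the same assumptions as above (Runloop's Python SDK with illustrative method names such as snapshot_disk; the repository URL and patch names are hypothetical): prepare one devbox, snapshot it, boot N copies from the snapshot, let each copy attempt a different fix, and keep the candidates whose tests pass.

```python
import os

from runloop_api_client import Runloop  # assumption: Runloop's Python SDK

client = Runloop(bearer_token=os.environ["RUNLOOP_API_KEY"])

# Prepare one devbox with the repository and dependencies, then snapshot it.
base = client.devboxes.create_and_await_running()
client.devboxes.execute_sync(
    base.id,
    command="git clone https://github.com/example/app && cd app && pip install -r requirements.txt",
)
snapshot = client.devboxes.snapshot_disk(base.id)  # illustrative method name

# Hypothetical candidate patches produced by your agent.
candidate_patches = ["fix_a.patch", "fix_b.patch", "fix_c.patch"]
results = []

# Fan out: boot one devbox per candidate from the same snapshot.
for patch in candidate_patches:
    box = client.devboxes.create_and_await_running(snapshot_id=snapshot.id)
    client.devboxes.execute_sync(box.id, command=f"cd app && git apply {patch}")
    tests = client.devboxes.execute_sync(box.id, command="cd app && pytest -q")
    results.append((patch, tests.exit_status))
    client.devboxes.shutdown(box.id)

# Keep the implementations whose tests pass, then iterate.
print("passing candidates:", [p for p, status in results if status == 0])
```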

Model Context Protocol Server

See the Demo

MCP servers and tools hosted on secure Devboxes allow context to be persisted and shared between users. Hand off MCP servers between teams or organizations to give your agents deep, shared context

Dangerous code doesn’t touch your local machine when running on Runloop. Experiment and work safely with highly configurable settings for ingress and egress in remote Devboxes

Load as many tools as you need onto your MCP server with scalable resources on our platform. Connect multiple MCP servers and explore protocol trajectories not available on locally hosted servers.
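To make the hosting model concrete, here is a hedged sketch: start an MCP server process inside a devbox and expose its port so agents elsewhere can attach to the shared context. The server module (my_mcp_server) and the execute_async and create_tunnel calls are illustrative assumptions, not the documented API.

```python
import os

from runloop_api_client import Runloop  # assumption: Runloop's Python SDK

client = Runloop(bearer_token=os.environ["RUNLOOP_API_KEY"])

# Boot a devbox that will host the MCP server and its tools.
box = client.devboxes.create_and_await_running()

# Launch a hypothetical MCP server process inside the sandbox.
client.devboxes.execute_async(box.id, command="python -m my_mcp_server --port 8080")

# Expose the port so other agents and teammates can attach to the shared context.
tunnel = client.devboxes.create_tunnel(box.id, port=8080)  # illustrative call
print("Point your MCP clients at:", tunnel.url)
```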

Performance Benchmarks

Use SWE-bench Verified, Multi-SWE-bench, and more in the weeks to come to evaluate your AI agents. We provide the starter logic to hook your AI agent into industry-standard benchmarks with minimal extra code.

Build subsets of existing benchmarks or full custom benchmarks to design the tests for what you want to improve. Broad and customizable tools on the platform allow you to build benchmarks for almost any situation your agent may encounter

Use our dashboard product to track the trajectory of benchmarks and scenarios. Follow the signals toward quantitatively improving your agent outcomes via reproducible and traceable outputs.
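Hooking an agent into a hosted benchmark might look roughly like the sketch below. Every name here — benchmarks.start_run, scenarios.start_run, score_and_complete, and the my_agent module — is an illustrative assumption, not the documented interface; consult the developer docs for the actual API.

```python
import os

from runloop_api_client import Runloop  # assumption: Runloop's Python SDK
from my_agent import solve  # hypothetical: your agent's entry point

client = Runloop(bearer_token=os.environ["RUNLOOP_API_KEY"])

# Start a run against a public benchmark (method and field names are illustrative).
run = client.benchmarks.start_run(benchmark_name="SWE-bench Verified")

for scenario_id in run.pending_scenario_ids:
    # Each scenario boots its own environment; the agent attempts the task,
    # then the scenario's scoring function grades the result.
    scenario_run = client.scenarios.start_run(scenario_id=scenario_id)
    solve(devbox_id=scenario_run.devbox_id)
    result = client.scenarios.score_and_complete(scenario_run.id)
    print(scenario_id, result.score)
```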

Fine-Tuning

Implementing Supervised Fine-Tuning and Reinforcement Learning Fine-Tuning creates more reliable, context-aware coding agents that produce higher-quality code while reducing implementation costs and speeding development cycles.

Scale and deploy Devboxes as endpoints for extending the SFT process. Models run remotely and can be orchestrated at scale

Work with trusted design partners to tune models to your specific tasks. Combine the power of Devboxes and benchmarks for an end-to-end reinforcement fine-tuning solution.
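One simple way to turn benchmark results into fine-tuning data is to keep only the trajectories that scored well and export them as prompt/completion pairs. The records below are hypothetical placeholders for data you would collect from your own runs.

```python
import json

# Hypothetical trajectories collected from benchmark runs: each record holds the
# task prompt, the agent's final patch, and the score the benchmark assigned.
trajectories = [
    {"prompt": "Fix the failing date-parsing test in utils/dates.py",
     "completion": "<patch>...</patch>", "score": 1.0},
    {"prompt": "Add null handling to the login handler",
     "completion": "<patch>...</patch>", "score": 0.0},
]

# Keep only the trajectories that scored well and export them as an SFT dataset
# in the common prompt/completion JSONL format.
with open("sft_dataset.jsonl", "w") as f:
    for t in trajectories:
        if t["score"] >= 1.0:
            f.write(json.dumps({"prompt": t["prompt"], "completion": t["completion"]}) + "\n")
```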

// BUSINESS VALUE

Build AI Coding Agents for Economic Impact

Clear the hurdles to realizing the business value of your AI Agents.

Reliable
Enterprise-grade reliability with robust infrastructure, automated scaling, and a dedicated on-call team, ensuring your AI coding investments deliver consistent business returns.
Quantified
With Runloop's comprehensive benchmarking tools, precisely measure AI agent performance against business KPIs, enabling improvements that accelerate ROI
Secure
With SOC2 compliance, isolated development environments, and end-to-end encrypted connections, protect your intellectual property and prevent costly security breaches
Transparent
Full operational visibility through comprehensive logging and detailed usage dashboards, providing actionable insights to optimize resource allocation and reduce operational costs
Supportive
Support options for every tier, ensuring rapid resolution of issues to maintain development velocity, minimize engineering bottlenecks, and protect your technology investment
Experienced
Tech industry veterans with deep expertise in enterprise-scale engineering, our team brings decades of experience from Google, Stripe, and other tech leaders
// Programming Languages

Run AI-Generated Code in Production

Secure, scalable development environments ready in milliseconds.

Boot: 300ms
Auto-scaling
Secure sandbox
Production ready
Python Environment

Complete Python development environment

Core Tools
> Python 3.x runtime
> pip, conda package managers
> venv environment management
Development Tools
> Pytest test framework
> black code formatter
> mypy type checking
• Enterprise security • Native debugging
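As an illustration of running model-generated code away from your local machine, here is a sketch under the same assumptions as the earlier examples (Runloop's Python SDK with illustrative method names such as write_file_contents and execute_sync): write the generated snippet into a devbox and execute it there.

```python
import os

from runloop_api_client import Runloop  # assumption: Runloop's Python SDK

client = Runloop(bearer_token=os.environ["RUNLOOP_API_KEY"])

# Code produced by a model; it never runs on the local machine.
generated_code = """
import statistics
print(statistics.mean([3, 5, 8, 13]))
"""

box = client.devboxes.create_and_await_running()
try:
    # Write the generated snippet into the sandbox, then execute it there.
    client.devboxes.write_file_contents(
        box.id, file_path="/tmp/snippet.py", contents=generated_code
    )
    result = client.devboxes.execute_sync(box.id, command="python /tmp/snippet.py")
    print(result.stdout)  # expected: 7.25
finally:
    client.devboxes.shutdown(box.id)
```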
Boot: 300ms
Auto-scaling
Secure sandbox
Production ready
TypeScript Environment

Complete TypeScript development environment

Core Tools
> Node.js runtime
> npm, yarn package managers
> TypeScript compiler
Development Tools
> jest testing framework
> eslint linter
> prettier formatter
• Enterprise security • Instant scaling • Native debugging • Full system access
Boot: 300ms
Auto-scaling
Secure sandbox
Production ready
Java Environment

Complete Java development environment

Core Tools
> JDK environment
> maven, gradle build tools
> jar packaging support
Development Tools
> junit test framework
> checkstyle linter
> debugger integration
• Enterprise security • Instant scaling • Native debugging • Full system access
Boot: 300ms
Auto-scaling
Secure sandbox
Production ready
C++ Environment

Complete C++ development environment

Core Tools
> gcc/clang compilers
> cmake build system
> package managers (conan/vcpkg)
Development Tools
> gtest/catch2 testing
> clang-format
> debugging tools
• Enterprise security • Instant scaling • Native debugging • Full system access
Boot: 300ms
Auto-scaling
Secure sandbox
Production ready
Go Environment

Complete Go development environment

Core Tools
> Go toolchain
> module support
> dependency management
Development Tools
> go test framework
> golangci-lint
> delve debugger
• Enterprise security • Native debugging
// Use Cases

The Platform for AI-Driven Software Engineering Tools

Explore the types of AI-powered developer tools you can build

AI Pair Programming Assistant

Your company is creating an AI that provides real-time coding suggestions and assistance.

High-Performance Infrastructure

Ensure your AI responds rapidly to user inputs.

Contextual Code Analysis

Utilize deep code understanding for relevant recommendations.

Suggestion Quality Metrics

Evaluate the helpfulness and accuracy of your AI-generated code snippets and advice.

[Screenshot: AI assistant explains why undefined !== null in a JavaScript null-check on user data.]
[Screenshot: AI assistant flags a daylight-saving-time bug in a TypeScript lastLoginTime calculation and suggests a fix.]

AI-Enhanced Code Review System

Your product streamlines code reviews using AI to identify issues and suggest improvements.

Parallel Processing Capabilities

Analyze multiple pull requests concurrently, enhancing scalability.

Customizable Evaluation Criteria

Adapt your AI's review standards to different coding guidelines.

Review Quality Assessments

Measure the accuracy and relevance of your AI-generated comments.

Intelligent Test Generation Platform

You're developing an AI solution that automatically generates comprehensive test coverage.

Language-Agnostic Environments

Deploy your AI across various programming languages.

Development Tool Integrations

Leverage IDE and language server connections for precise code analysis.

Test Coverage Evaluations

Quantify the comprehensiveness and effectiveness of your AI-generated tests.

[Chart: "Coverage Over Time" shows test coverage rising across six runs to 89%, with 368 total tests, 322 passed, and 46 failed.]

Scale your AI Infrastructure solution faster.

Stop building infrastructure. Start building your AI engineering product.

Join Waitlist
Explore Docs