Benchmarks

The Execution Platform for AI Agents

Reflex is a single place for teams to run coding agents. Create and deploy agents, develop automated multi-step flows, and manage permissions, costs, and performance across every team. Reflex is model-agnostic and lets multiple users direct the same agent in real time. Every agent runs on Runloop's primitives: isolated execution, full observability, and persistent state.


Why We Built Reflex

Reflex began as an internal project, once our primitives had made running powerful agents at scale easy. The hard part was managing the flows once you had them: coordinating multi-step work and keeping teams in sync when deploying multiple agents.

So we built Reflex: a shared place where powerful agents work alongside people. Orchestrate complex flows and entire projects to push agents beyond task-by-task work.

Get Early Access

Reflex in Action

A complete agent sandbox does three things: define and provision the environment, control its state across runs, and run anything the agent needs inside it. Runloop ships each as a composable primitive, so you can use one on its own or wire them together.


Devboxes: Hardware-isolated microVMs with full system access and sub-second startup. Your agent gets a real machine to work in, with nothing it breaks reaching anything else.

Blueprints: Your devbox environment as code: a Dockerfile, system setup commands, code mounts, build args, secrets, named contexts, and network policies.

Together, they define what your agent's environment looks like and provision it on demand.
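
To make "environment as code" concrete, here is a minimal sketch of what a blueprint-like definition could look like as plain data. The class name, field names, and validation rules are all hypothetical and chosen for illustration; they are not Runloop's actual schema or SDK.

```python
from dataclasses import dataclass, field


@dataclass
class BlueprintSketch:
    """Hypothetical stand-in for a blueprint; not Runloop's actual schema."""

    name: str
    dockerfile: str
    setup_commands: list[str] = field(default_factory=list)
    code_mounts: dict[str, str] = field(default_factory=dict)  # repo -> mount path
    build_args: dict[str, str] = field(default_factory=dict)
    secret_names: list[str] = field(default_factory=list)  # names only, never values
    allowed_hosts: list[str] = field(default_factory=list)  # toy network policy

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the sketch looks sane."""
        problems = []
        has_from = any(
            line.strip().upper().startswith("FROM ")
            for line in self.dockerfile.splitlines()
        )
        if not has_from:
            problems.append("dockerfile has no FROM instruction")
        for repo, path in self.code_mounts.items():
            if not path.startswith("/"):
                problems.append(f"mount path for {repo} must be absolute: {path}")
        return problems


blueprint = BlueprintSketch(
    name="python-agent-env",
    dockerfile="FROM python:3.12-slim\nRUN pip install pytest\n",
    setup_commands=["apt-get update"],
    code_mounts={"example/repo": "/home/user/repo"},
    allowed_hosts=["pypi.org"],
)
print(blueprint.validate())
```

Keeping the definition as data means it can be reviewed, versioned, and checked before anything is provisioned.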

Configurable

Query the stream from any control surface: UI, API, CLI, SDK

Scale

Production-scale durability and availability

Find Out More

Provision Execution


Token Efficiency
01
Resume Copy  
02

Launch Agent


Streamlined Surface Area:
Create a session, publish an event, subscribe to the stream. Hold the whole surface in your head.
Same Calls No Matter the Workload:
The surface stays the same no matter how complex the workflow is.
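
The three calls named above (create a session, publish an event, subscribe to the stream) can be illustrated with a toy in-memory version. This is a sketch under stated assumptions: the class and method names are hypothetical and do not represent the Reflex or Runloop API.

```python
import uuid
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class InMemoryStream:
    """Toy event stream; hypothetical, not the Reflex or Runloop API."""

    def __init__(self) -> None:
        self._events: dict[str, list[Event]] = {}
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def create_session(self) -> str:
        """Create an empty session and return its id."""
        session_id = uuid.uuid4().hex
        self._events[session_id] = []
        return session_id

    def publish(self, session_id: str, event: Event) -> None:
        """Record the event, then deliver it to current subscribers."""
        self._events[session_id].append(event)
        for handler in self._handlers[session_id]:
            handler(event)

    def subscribe(self, session_id: str, handler: Handler) -> None:
        """Replay past events first so late subscribers see the full stream."""
        for event in self._events[session_id]:
            handler(event)
        self._handlers[session_id].append(handler)


stream = InMemoryStream()
session = stream.create_session()
stream.publish(session, {"type": "step", "n": 1})
stream.subscribe(session, print)  # replays step 1, then receives later events
stream.publish(session, {"type": "step", "n": 2})
```

The same three methods serve a single step or a long multi-step flow, which is the point of keeping the surface small.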
Find Out More

PRINCIPLES

A Unified Execution Model for AI Workflows


Predictable Iteration

Changes can be tested in isolation and compared across runs, making progress measurable instead of anecdotal.

Clear Evaluation

Results are attributable to intentional differences (models, prompts, or code), not hidden state or infrastructure drift.

Scalable Execution

Workflows behave the same way at small scale and under parallel load, enabling reliable evaluation as systems grow.

Reflex Brings Runloop's Core Primitives Together


Execution
View Integration
Agent Management
View Integration
Coordination
View Integration
Security
View Integration

Validate Agent Behavior Across Multi-Step Workflows

When an agent is tasked with resolving a GitHub issue, it makes dozens of sequential decisions: which files to read, what edits to make, which tests to run, whether to retry a failed command. A single misstep at step 14 can invalidate the preceding 13 steps. Runloop captures the complete decision trajectory so you can pinpoint exactly where agent reasoning diverges from expected behavior.

Sentry fires an exception alert

Full execution path recording with decision-point annotations

Replay and modify

Change parameters at any step and re-run from that point

Hands back a diagnosis

Run each scenario N times to distinguish real failures from noise

Regression detection

Compare current agent against baseline across identical scenarios
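
The last two ideas above, repeating a scenario N times and comparing against a baseline, can be sketched in a few lines. The function names and the fixed tolerance are assumptions made for illustration, not part of Runloop's product; a real comparison would likely use a proper statistical test rather than a fixed threshold.

```python
from typing import Callable


def pass_rate(scenario: Callable[[], bool], runs: int) -> float:
    """Run a scenario `runs` times and return the fraction of runs that passed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if scenario())
    return passed / runs


def is_regression(
    baseline_rate: float, current_rate: float, tolerance: float = 0.05
) -> bool:
    """Flag a regression when the current rate falls more than `tolerance` below baseline.

    The tolerance is a hypothetical noise allowance, not a calibrated value.
    """
    return (baseline_rate - current_rate) > tolerance
```

A single failed run says little on its own; a pass rate over many runs, compared with the baseline agent on the same scenario, separates a real regression from ordinary variance.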

View Documentation
MCP Hub
Threat Landscape

Prompt injection is the number one security risk in LLM applications

The OWASP Foundation identified prompt injection as the most critical vulnerability in large language model applications, noting that adversarial prompts can manipulate agents into exfiltrating credentials, bypassing access controls, and executing unauthorized actions. When agents hold API keys and database credentials, this vulnerability class becomes a credential theft vector.

OWASP Top 10 for Large Language Model Applications

OWASP Foundation, 2025

Read the full report
Content Radar

Sign up for Early Access


14-day trial – No credit card required


FAQs

Everything You Need to Know

We’re dedicated to solving the complex challenges of productionizing AI for software engineering at scale.

How easy is it to integrate Runloop with existing AI development pipelines?
What makes Runloop's AI code execution infrastructure enterprise-grade?
How does Runloop ensure safe and secure code execution for AI agents?