30,000+

concurrent environments

<10ms

Credential Gateway latency

50ms

command execution

<20ms

MCP Hub routing
THE EXECUTION BOTTLENECK
Fine-tuning a model for code generation requires execution. The model generates a candidate patch, that solution must be applied to a real codebase in a real environment, tests must run, and the pass/fail signal feeds back into the training process. For a typical RFT experiment, that is thousands to tens of thousands of sandbox instantiations per training run. Most teams cobble this together with Docker containers, custom orchestration scripts, and significant engineering effort that has nothing to do with the actual research. The infrastructure overhead delays experiments, limits scale, and contaminates training signals with noise.
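The loop described above can be sketched in a few lines. The sandbox executor below is a stub standing in for the expensive part (provisioning, patch application, test execution); all names are illustrative rather than any particular API:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    scenario_id: str
    patch: str

def execute_in_sandbox(candidate: Candidate) -> bool:
    """Stub: in a real loop this applies the patch to a fresh, isolated
    copy of the codebase, runs the test suite, and returns pass/fail.
    It is called once per candidate -- thousands of times per run."""
    return "fix" in candidate.patch  # placeholder pass/fail rule

def collect_rewards(candidates: list[Candidate]) -> list[tuple[str, float]]:
    # The pass/fail signal from each execution becomes the reward
    # that feeds back into the training step.
    return [(c.scenario_id, 1.0 if execute_in_sandbox(c) else 0.0)
            for c in candidates]
```

Everything interesting about scaling this loop lives inside `execute_in_sandbox` -- which is exactly the part teams end up rebuilding with containers and orchestration scripts.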
Learn how Runloop solves this
EXECUTION INFRASTRUCTURE FOR TRAINING LOOPS

Thousands of sandboxes, orchestrated for training signal generation

Runloop provisions isolated sandboxes at the scale training loops demand -- fast enough that environment provisioning does not bottleneck the experiment, isolated enough that concurrent executions do not interfere, and instrumented enough that every execution produces clean scoring signals. Snapshots enable efficient environment reuse across training steps.

```python
import runloop

# Configure execution environments for RFT training loop
blueprint = runloop.blueprints.get("python-ml-training-env")

# Submit scenario batch for reward signal generation
job = runloop.benchmark_jobs.create(
    name="rft-epoch-42-reward-signals",
    scenarios=candidate_solutions,
    blueprint_id=blueprint.id,
    config={
        "concurrency": 200,
        "timeout_seconds": 120,
        "retry_attempts": 1
    }
)

# Collect scoring signals for training loop
results = runloop.benchmark_jobs.wait(job.id)
reward_signals = [(r.scenario_id, r.score) for r in results.runs]
```
pip install runloop
```typescript
import Runloop from '@runloop/api-client';

const runloop = new Runloop();

// Configure execution environments for RFT training loop
const blueprint = await runloop.blueprints.get('python-ml-training-env');

// Submit scenario batch for reward signal generation
const job = await runloop.benchmarkJobs.create({
  name: 'rft-epoch-42-reward-signals',
  scenarios: candidateSolutions,
  blueprintId: blueprint.id,
  config: {
    concurrency: 200,
    timeoutSeconds: 120,
    retryAttempts: 1
  }
});

// Collect scoring signals for training loop
const results = await runloop.benchmarkJobs.wait(job.id);
const rewardSignals = results.runs.map(r => [r.scenarioId, r.score]);
```
npm install @runloop/api-client
```bash
# Submit scenario batch for reward signal generation
runloop benchmark run \
  --name "rft-epoch-42-reward-signals" \
  --scenarios ./candidate-solutions.yaml \
  --blueprint "python-ml-training-env" \
  --concurrency 200 \
  --timeout 120 \
  --retry-attempts 1

# Collect scoring signals
runloop benchmark results --job rft-epoch-42-reward-signals --format json
```
npm install @runloop/api-client

Infrastructure primitives for RFT and SFT at scale

Four capabilities from the [Runloop Platform](/product) that eliminate the gap between training framework and execution environment.

Blueprint Environments

Define execution environments as code. Language runtimes, dependencies, and toolchains specified declaratively. Same Blueprints work for training, evaluation, and production.
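As a sketch, an environment-as-code definition might look like the following. The field names are hypothetical, not the actual Blueprint schema -- the point is that runtimes, dependencies, and toolchains are data, so one definition serves training, evaluation, and production:

```python
# Hypothetical Blueprint spec -- illustrative field names only.
blueprint_spec = {
    "name": "python-ml-training-env",
    "base_image": "python:3.11-slim",
    "setup_commands": [
        "pip install pytest numpy",
    ],
    "env_vars": {"PYTHONDONTWRITEBYTECODE": "1"},
}
```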

Scoring Contracts

Define scoring criteria per scenario: pass/fail, numeric scores, or custom evaluation functions. Structured results API returns clean signal for every execution.
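The three contract shapes can be sketched as plain functions; these are illustrative, not the platform's evaluation API:

```python
def pass_fail(exit_code: int) -> float:
    """Binary contract: did the test command exit cleanly?"""
    return 1.0 if exit_code == 0 else 0.0

def fraction_passing(tests_passed: int, tests_total: int) -> float:
    """Numeric contract: partial credit per passing test."""
    return tests_passed / tests_total

def custom(result: dict) -> float:
    """Custom contract: e.g. weight correctness over runtime."""
    fast = 1.0 if result["duration_s"] < 10 else 0.0
    return 0.8 * result["correct"] + 0.2 * fast
```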

Parallel Orchestration

Configurable concurrency limits match your training throughput. Retry policies handle transient failures without corrupting signal. Submit a BenchmarkJobDef and the platform manages the fleet.
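One way to see why retries need not corrupt signal: retry only transient infrastructure failures, and treat any completed execution as final. A minimal sketch, not the platform's actual retry implementation:

```python
def run_with_retry(execute, scenario, retry_attempts: int = 1):
    """Retry transient infrastructure failures; a completed run is
    never re-executed, so flakiness in the substrate does not leak
    into the scoring signal."""
    last_error = None
    for _ in range(retry_attempts + 1):
        try:
            return execute(scenario)       # completed runs are final
        except ConnectionError as e:       # transient infra failure only
            last_error = e
    raise last_error
```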

Credential Gateway

Opaque tokens injected into sandboxes. Real credentials never exposed. Devbox-bound tokens expire on termination. Secure access to private repos during training.
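A conceptual model of the opaque-token pattern, not the Runloop implementation: the sandbox only ever holds a random token, while the mapping to the real credential lives outside it and dies with the devbox.

```python
import secrets

class GatewaySketch:
    """Illustrative credential gateway: issue opaque tokens bound to a
    devbox; resolve them gateway-side only; revoke on termination."""
    def __init__(self):
        self._tokens = {}

    def issue(self, devbox_id: str, real_credential: str) -> str:
        token = secrets.token_urlsafe(16)
        self._tokens[token] = (devbox_id, real_credential)
        return token  # this is all the sandbox ever sees

    def resolve(self, token: str) -> str:
        return self._tokens[token][1]  # happens gateway-side only

    def on_terminate(self, devbox_id: str) -> None:
        # Devbox-bound tokens expire with the devbox.
        self._tokens = {t: v for t, v in self._tokens.items()
                        if v[0] != devbox_id}
```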

From candidate solutions to training signals

Four steps from evaluation design to reward signal collection.

01
Define Your Scenarios

Encode training tasks as scenario definitions with candidate solution, runtime environment (via Blueprint), and scoring contract. Same format for both RFT reward generation and SFT data validation.
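In the spirit of the CLI example above (`--scenarios ./candidate-solutions.yaml`), a scenario entry might look like this; the field names are illustrative, not the documented schema:

```yaml
# Hypothetical scenario entry -- illustrative field names only.
- scenario_id: fix-off-by-one-0042
  blueprint: python-ml-training-env        # runtime environment
  candidate_solution:
    patch_file: patches/0042.diff          # the model's proposed change
  scoring_contract:
    type: pass_fail
    command: pytest tests/test_parser.py   # produces the reward signal
```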

02
Configure Execution Parameters

Set concurrency, retry policy, and timeouts to match training loop throughput. Use a BenchmarkJobDef template for standardized runs or submit ad-hoc for experimental iterations.

03
Submit and Execute

Submit via API. The platform schedules across isolated sandboxes, manages the lifecycle, and streams progress. Each scenario runs in its own environment -- no shared state, no cross-contamination.

04
Collect Reward Signals

Retrieve structured results for every scenario: identifier, score, duration, pass/fail status. Feed signals directly into your training loop for weight updates.
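As a sketch of the hand-off, hypothetical result records with the fields named above can be centered against the batch mean -- a common shaping step before a policy-gradient weight update:

```python
from statistics import mean

# Hypothetical result records: identifier, score, pass/fail, duration.
results = [
    {"scenario_id": "s1", "score": 1.0, "passed": True,  "duration_s": 4.2},
    {"scenario_id": "s2", "score": 0.0, "passed": False, "duration_s": 9.8},
    {"scenario_id": "s3", "score": 1.0, "passed": True,  "duration_s": 3.1},
]

# Subtract the batch-mean baseline so rewards are centered.
baseline = mean(r["score"] for r in results)
reward_signals = {r["scenario_id"]: r["score"] - baseline for r in results}
```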


No other AI agent infrastructure platform offers CI/CD-integrated benchmark evaluation as a product capability. Regression testing for AI agents is not a feature competitors provide -- it is a category Runloop created.

The execution substrate that fine-tuning requires

Three capabilities no alternative provides

Training-Loop Speed

Standard container solutions make you build scheduling, concurrency, retry, and result aggregation from scratch before a single training step can run. Runloop provides all of this as infrastructure via the benchmark job orchestration layer, so environment setup keeps pace with the training loop.

Grounded Scoring

Other sandboxes provide execution but not scoring infrastructure. Runloop combines execution with scoring contracts, structured results, and scenario definitions. Reward signals come from the platform.

Training Data Security

Credential Gateway prevents candidate solutions from exfiltrating API keys. For regulated environments, VPC deployment keeps training data inside your boundary.

F.A.Q

Fine-tuning infrastructure questions

Common questions about using Runloop as the execution layer for RFT, SFT, and RL experiments.

Is Runloop a training framework or an execution environment?
How fast can Runloop provision environments for training loop iterations?
Does Runloop support GPU workloads for fine-tuning?
Can I use Runloop with OpenAI's or Anthropic's fine-tuning programs?
How does Runloop handle training data security for proprietary codebases?
More questions? Visit our docs or send us a message