Thousands of sandboxes, orchestrated for training signal generation
Runloop provisions isolated sandboxes at the scale training loops demand -- fast enough that environment provisioning does not bottleneck the experiment, isolated enough that concurrent executions do not interfere, and instrumented enough that every execution produces clean scoring signals. Snapshots enable efficient environment reuse across training steps.
```python
import runloop
# Configure execution environments for RFT training loop
blueprint = runloop.blueprints.get("python-ml-training-env")
# Submit scenario batch for reward signal generation
job = runloop.benchmark_jobs.create(
    name="rft-epoch-42-reward-signals",
    scenarios=candidate_solutions,
    blueprint_id=blueprint.id,
    config={
        "concurrency": 200,
        "timeout_seconds": 120,
        "retry_attempts": 1,
    },
)
# Collect scoring signals for training loop
results = runloop.benchmark_jobs.wait(job.id)
reward_signals = [(r.scenario_id, r.score) for r in results.runs]
```
```typescript
import Runloop from 'runloop';

const runloop = new Runloop(); // instantiate the API client
// Configure execution environments for RFT training loop
const blueprint = await runloop.blueprints.get('python-ml-training-env');
// Submit scenario batch for reward signal generation
const job = await runloop.benchmarkJobs.create({
  name: 'rft-epoch-42-reward-signals',
  scenarios: candidateSolutions,
  blueprintId: blueprint.id,
  config: {
    concurrency: 200,
    timeoutSeconds: 120,
    retryAttempts: 1
  }
});
// Collect scoring signals for training loop
const results = await runloop.benchmarkJobs.wait(job.id);
const rewardSignals = results.runs.map(r => [r.scenarioId, r.score]);
```

```bash
# Submit scenario batch for reward signal generation
runloop benchmark run \
  --name "rft-epoch-42-reward-signals" \
  --scenarios ./candidate-solutions.yaml \
  --blueprint "python-ml-training-env" \
  --concurrency 200 \
  --timeout 120 \
  --retry-attempts 1
# Collect scoring signals
runloop benchmark results --job rft-epoch-42-reward-signals --format json
```

Infrastructure primitives for RFT and SFT at scale
Four capabilities from the [Runloop Platform](/product) that eliminate the gap between training framework and execution environment.
Define execution environments as code. Language runtimes, dependencies, and toolchains specified declaratively. Same Blueprints work for training, evaluation, and production.

Define scoring criteria per scenario: pass/fail, numeric scores, or custom evaluation functions. Structured results API returns clean signal for every execution.
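As a concrete illustration of a per-scenario scoring contract, here is a minimal Python sketch -- the field names and the pass-fraction rule are assumptions for illustration, not the Runloop results schema:

```python
# Hypothetical scoring contract: map a raw execution result to a bounded
# numeric score. Field names are illustrative, not the Runloop API.
from dataclasses import dataclass

@dataclass
class ExecutionResult:
    exit_code: int      # process exit status from the sandbox
    tests_passed: int   # test cases that succeeded
    tests_total: int    # test cases executed

def score(result: ExecutionResult) -> float:
    """Return 0.0 on hard failure, else the fraction of tests passed."""
    if result.exit_code != 0 or result.tests_total == 0:
        return 0.0
    return result.tests_passed / result.tests_total
```

The same shape generalizes to pass/fail (score is 0 or 1) or to a custom evaluation function that inspects sandbox output.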

Configurable concurrency limits match your training throughput. Retry policies handle transient failures without corrupting signal. Submit a BenchmarkJobDef and the platform manages the fleet.
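The retry semantics matter: transient infrastructure failures should be retried, while genuine scenario failures must be scored as-is so they do not corrupt the signal. A minimal sketch of that policy, with all names hypothetical:

```python
# Sketch of the retry semantics described above: transient infrastructure
# errors are retried; genuine scenario failures propagate normally so the
# reward signal is not corrupted. All names are illustrative.
class TransientError(Exception):
    """Infrastructure hiccup (e.g. provisioning timeout), safe to retry."""

def run_with_retries(execute, retry_attempts: int = 1):
    """Call `execute()` with up to `retry_attempts` retries on transient errors."""
    for attempt in range(retry_attempts + 1):
        try:
            return execute()
        except TransientError:
            if attempt == retry_attempts:
                raise  # exhausted retries; surface the infrastructure failure
```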

Opaque tokens injected into sandboxes. Real credentials never exposed. Devbox-bound tokens expire on termination. Secure access to private repos during training.

Environment-grounded reward signals at training scale
RFT uses execution outcomes as reward signals. The training loop generates candidate solutions, executes them in sandboxed environments, scores them against defined criteria, and feeds that signal back to the model. The infrastructure requirements are demanding: thousands of scenarios per epoch, each requiring a clean, isolated environment. Execution must be fast -- a training loop that waits minutes for each sandbox spends its wall-clock budget idling instead of learning. Scoring must be reliable -- noisy signals from flaky environments corrupt the training process.
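The loop described above can be sketched in a few lines, with the execute-and-score step standing in for submission to sandboxes; none of these names are the Runloop API:

```python
# Minimal sketch of the RFT outer-loop shape. `execute_and_score` stands in
# for sandboxed execution plus scoring; `update` stands in for the gradient
# step. All names are illustrative, not a real training framework.
def rft_epoch(generate, execute_and_score, update, tasks):
    """One epoch: sample a candidate per task, score it by execution,
    then apply one weight update over the collected batch."""
    batch = []
    for task in tasks:
        candidate = generate(task)                    # model proposes a solution
        reward = execute_and_score(task, candidate)   # sandboxed execution -> scalar
        batch.append((task, candidate, reward))
    update(batch)                                     # e.g. a policy-gradient step
    return batch
```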

Generate and validate SFT training data through execution
SFT uses curated input-output pairs to teach a model specific behaviors. For code tasks, generating high-quality training data often requires execution -- running candidate solutions to verify they actually work before including them in the training set. Runloop supports SFT workflows by providing the execution environments needed to validate training examples at scale. Generate candidates with a base model, execute in sandboxes to verify correctness, and filter for high-quality examples.
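The generate-execute-filter flow can be sketched as follows; `generate` and `execute` are stand-ins for a base-model call and a sandboxed run, not Runloop APIs:

```python
# Sketch of the SFT data-validation flow described above: generate
# candidates, execute them, and keep only verified pairs. `generate` and
# `execute` are hypothetical stand-ins.
def build_sft_dataset(prompts, generate, execute, n_candidates=4):
    """Keep (prompt, solution) pairs whose solution passes execution."""
    dataset = []
    for prompt in prompts:
        for _ in range(n_candidates):
            solution = generate(prompt)
            if execute(solution):              # True iff the solution verifies
                dataset.append((prompt, solution))
                break                          # one verified example per prompt
    return dataset
```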

From candidate solutions to training signals
Four steps from evaluation design to reward signal collection.
Encode training tasks as scenario definitions with candidate solution, runtime environment (via Blueprint), and scoring contract. Same format for both RFT reward generation and SFT data validation.

Set concurrency, retry policy, and timeouts to match training loop throughput. Use a BenchmarkJobDef template for standardized runs or submit ad-hoc for experimental iterations.

Submit via API. The platform schedules across isolated sandboxes, manages the lifecycle, and streams progress. Each scenario runs in its own environment -- no shared state, no cross-contamination.

Retrieve structured results for every scenario: identifier, score, duration, pass/fail status. Feed signals directly into your training loop for weight updates.
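One common way to turn the structured results into training-ready rewards is to mean-center scores within each scenario group before the weight update -- a baseline choice made on the training-loop side, not something Runloop prescribes. A minimal sketch:

```python
# Sketch of post-processing structured results into rewards: mean-center
# scores within each scenario group so the update compares candidates for
# the same task. The (scenario_id, score) shape mirrors the examples above.
from collections import defaultdict

def centered_rewards(runs):
    """runs: iterable of (scenario_id, score) pairs.
    Returns (scenario_id, score - group_mean) in the original order."""
    groups = defaultdict(list)
    for scenario_id, score in runs:
        groups[scenario_id].append(score)
    return [
        (scenario_id, score - sum(groups[scenario_id]) / len(groups[scenario_id]))
        for scenario_id, score in runs
    ]
```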

The execution substrate that fine-tuning requires
Three capabilities no alternative provides
Standard container solutions require you to build scheduling, concurrency control, retry handling, and result aggregation from scratch. Runloop provides this as infrastructure via the benchmark job orchestration layer.

Other sandboxes provide execution but not scoring infrastructure. Runloop combines execution with scoring contracts, structured results, and scenario definitions. Reward signals come from the platform.

Credential Gateway prevents candidate solutions from exfiltrating API keys. For regulated environments, VPC deployment keeps training data inside your boundary.

Fine-tuning infrastructure questions
Common questions about using Runloop as the execution layer for RFT, SFT, and RL experiments.
Runloop is an execution environment, not a training framework. It does not manage model weights, compute gradients, or run training loops. Runloop provides the infrastructure layer that training frameworks call when they need to execute and score code. Your training loop generates candidate solutions, submits them to Runloop for execution and scoring via the API, and receives structured results back. This separation means Runloop works alongside any training framework: custom PyTorch training loops, provider-hosted fine-tuning programs, or proprietary research infrastructure.
Runloop provisions isolated sandboxes quickly, and snapshot-and-restore further reduces setup time for environments with large dependency trees. For RFT workloads running thousands of scenarios per epoch, the orchestration layer manages a pool of concurrent environments so that provisioning latency does not become the bottleneck in the training loop. Concurrency is configurable per job to match your training throughput requirements.
Runloop's fine-tuning infrastructure focuses on the execution and scoring side of the training loop -- running candidate code solutions in isolated environments and returning reward signals. The GPU-intensive portion (forward pass, gradient computation, weight updates) runs on your existing training infrastructure. Runloop handles the part that GPU clusters cannot: provisioning thousands of isolated code execution environments, running and scoring candidate solutions, and returning structured signals to your training loop.
Runloop is designed to serve as the execution substrate for provider fine-tuning programs that require environment-grounded evaluation. When a provider's RFT program needs to execute candidate solutions and collect scoring signals, Runloop provides the sandboxed environments where that execution happens. The platform's scoring contracts, scenario definitions, and structured results API produce the signal format these programs consume. Contact sales to discuss integration with specific provider programs.
Fine-tuning on proprietary code requires that training data, candidate solutions, and scoring results stay within your control boundary. Runloop executes each scenario in an ephemeral sandbox destroyed after execution. The Credential Gateway injects repository access tokens as opaque, devbox-bound credentials. For organizations with strict data governance requirements, Runloop supports deployment inside your own cloud infrastructure.