Blog

The Latest in AI Development

Runloop provides infrastructure for building and deploying AI coding agents at scale. Explore tutorials, insights, and the future of AI-assisted development.

Benchmarks

SWE-Bench Deep Dive: Unmasking the Limitations of a Popular Benchmark

Discover the hidden flaws in SWE-bench, a widely used benchmark for AI coding agents. Learn why deeper evaluation matters for real-world performance.

February 21, 2025

Model Performance

LLM Fine-Tuning Methods: Post-Training Optimization Techniques

Learn LLM fine-tuning methods like PEFT, LoRA, RLHF, and DPO, with practical tips to improve models after pre-training for real use.

February 16, 2025

AI Ecosystem

Latency VS Tokenization: The Trade-off Shaping LLM Research

Learn how latency vs tokenization trade-offs shape LLM design, with real examples and practical tips for developers.

February 11, 2025

Benchmarks

Evaluation != Benchmarking: Distinction in AI Generated Code

AI-generated code needs more than benchmarks. Real tests and evaluation help check quality, security, and performance in real use.

February 5, 2025

Benchmarks

Making Sure AI-Generated Code Actually Works

Learn how to verify AI-generated code with real tests, logs, and observability. Go beyond pass/fail to ensure reliability in production systems.

February 2, 2025

Model Performance

How Knowledge Distillation Powers Efficient AI Models

Knowledge distillation makes big AI models smaller and faster without losing much quality. It helps run powerful tech on phones and other small devices.

February 2, 2025

Benchmarks

Assessing AI Code Quality: 10 Critical Dimensions for Evaluation

Struggling to assess AI-generated code quality? Learn how to evaluate correctness, efficiency, and security for comprehensive results.

February 1, 2025

Benchmarks

Understanding LLM Code Benchmarks: From HumanEval to SWE-bench

See the evolution of AI code benchmarks from simple tests to SWE-bench and LiveCodeBench, measuring integration, system design, and engineering.

January 31, 2025

Coding Agents

Function-Calling VS Model Context Protocol (MCP): Complete Guide

Function-calling and MCP help shape LLM output for real use. They make AI more reliable, but each has different strengths and use cases.

January 27, 2025

Coding Agents

Model Context Protocol (MCP) - Understanding the Game-Changer

MCP lets LLMs plug into real tools and data. With support from GitHub, Slack, Cloudflare, and Sentry, it makes AI way more useful in real work.

January 25, 2025

Coding Agents

Mastering LLM Function Calling: A Guide to Enhancing AI Capabilities

Function calling lets LLMs do real actions, not just text. It can order stuff or automate tasks using JSON schemas and tools like LangChain.

January 23, 2025

Product

Runloop Devbox: The Future of AI-Driven Development Environments

Discover how Runloop Devboxes are revolutionizing software development with AI-optimized environments, advanced security features, and intelligent resource management for modern dev teams.

January 21, 2025

Product

Announcing: Transparent Proxy for Runloop Tunnels

If you're building a product on top of Runloop devboxes, your end users probably shouldn't see Runloop tunnel URLs. Today we're shipping a small but useful feature: a new X-Runloop-Host header that lets you front Runloop tunnels with your own domain without rewriting the request.

May 20, 2026

Product

Working with Runloop's Axons: Introducing Remote Agent SDK

Runloop Launches Remote Agents SDK: An SDK for Working with Remotely Hosted Agents

May 5, 2026

Product

Axons: Distributed Event Streams for Agents at Scale

Introducing Axons: Runloop’s secure, distributed event stream for scalable, multi-client agents with audit trails, structured state, and full suspend/resume.

April 27, 2026

Product

Runloop 🤝 OpenAI Agents SDK Provider Integration

The OpenAI Agents SDK fixes an important part of this by giving developers a clean way to define agent behavior. You can structure reasoning, attach tools, and build workflows that actually resemble programs.

April 15, 2026

Product

Wake on HTTP: Spin Up Agents On-Demand with Tunnels

Now you can run an agent, suspend the box, and wake it up again when you need it.

April 2, 2026

Product

Safer Agents with Network Policies

This feature addresses a fundamental challenge: AI agents need controlled access to external resources. Network Policies make network access explicit, auditable, and enforceable—so teams can safely run agents, limit outbound traffic, and meet security and compliance requirements by default.

January 23, 2026

Product

Backends for AI Agents That Handle Sensitive Data (Runloop AI TaxMan Series)

A practical walkthrough of building production-ready AI agents for tax form document processing - with agentic architectural patterns you can apply to any compliance-sensitive use case.

January 21, 2026

Product

RLI - An interactive CLI for interacting with the Runloop.ai platform

rli is released! rli is a TUI for developers who want to manage devboxes without leaving their terminal, with scriptable commands for automation and faster workflows for repetitive tasks.

January 19, 2026

Product

Object Store Generally Available

Discover the general availability of our Object Storage SDK: a first-class way to ship binaries, datasets, and runtime resources onto devboxes.

January 1, 2026

Model Performance

RAG in an Era of Fine-Tuning: Understanding RAFT's Evolution

RAFT mixes RAG and fine-tuning to boost LLM performance in specific fields. It improves accuracy and makes models faster and more useful for real tasks.

March 5, 2025

Model Performance

Q-Learning for LLMs: Smarter AI with Reinforcement Learning

Q-learning helps LLMs make choices and handle tasks. It improves reasoning and makes AI more useful in jobs like coding agents.

March 4, 2025

Model Performance

Remember Reinforcement Learning? It's Never Been More Relevant

How reinforcement learning moved from research labs to powering modern LLMs and Runloop.ai’s self-improving agent workflows.

February 24, 2025

AI Ecosystem

Functional Correctness: Ensuring AI-Generated Code Works

Discover how to ensure AI-generated code works flawlessly. Boost reliability, prevent costly errors, and build user trust with proven testing methods.

April 19, 2025