Runloop Launches Remote Agents SDK: An SDK for Working with Remotely Hosted Agents


Jer Crane's AI agent deleted his production database in nine seconds. The instinct when something like this happens is to write better rules. That's the wrong fix.
Last weekend, Jer Crane, founder of PocketOS, went viral after sharing how his AI coding agent made his production database disappear in nine seconds.
His agent was working on a routine staging task when it hit a credential mismatch. It decided the fastest path forward was to delete a Railway volume, roughly the equivalent of tearing down a building because you tried using the wrong key.
That should have been impossible. Agent policies specifically required human approval before any destructive action. At a minimum, the deletion should have been scoped to the staging environment.
But Jer's agent went ahead anyway. It found an API token in a completely unrelated file and used it to execute the deletion. Nine seconds later, everything was gone.
When Jer asked the agent to explain itself, it said: "I violated every principle I was given. I guessed instead of verifying. I ran a destructive action without being asked."
The instinct after hearing what Jer's agent did is to distrust agents entirely. Capabilities have moved faster than anyone anticipated, and the safety infrastructure hasn't kept up.
But pulling back isn't a real option. AI agents have proven genuinely capable of work that empowers individuals and small teams to operate at a pace and scale that would've been impossible just a few months ago. The pressure to deploy agents into increasingly complex, high-leverage workflows is intensifying. Leadership wants an AI strategy, while investors want efficiency gains. Competitors are shipping faster. Agents are going into real workflows whether the security infrastructure is ready or not.
The problem is that system prompt policy isn't enough to protect against failure when agents have real capabilities and real access to real systems. As Jer's example shows, policies can nudge behavior and prevent some mistakes, but they're not a safeguard against catastrophes when an agent can simply decide to ignore them.
Policy is text. Models read it, but they don't always enforce it. Under ambiguity or task pressure, in-context reasoning degrades. And that's assuming genuine intent in the first place.
Prompt injection is an active and growing attack surface. An agent that can be pushed into ignoring its rules by a confused task state can be argued into ignoring them by a malicious input, because text can always be manipulated by more text. The same flexibility that makes agents capable of handling ambiguous instructions makes them vulnerable to adversarial ones.
Jer isn't alone. Amazon reportedly dealt with its own AI agent deleting production code, contributing to an AWS outage last December. These aren't isolated incidents limited to scrappy teams or solo founders. They're an early signal of what happens at every scale when agents have more access than they need.
System prompts are advisory guidelines, not an enforcement layer. And as agents get trusted with more autonomy, more tools, and more access, that becomes an increasingly unreliable leash. This is a structural problem.
The answer is to design agent safety at the infrastructure level from the start. Instead of trying to prevent bad behavior through instructions, constrain it by deliberately limiting what's possible in the first place. An agent can't violate rules it physically can't reach.
In practice, this has four layers.
Isolation inside a microVM: Each agent task runs in its own isolated compute environment: a lightweight virtual machine spun up for the task and torn down when it's done. The agent can't see other processes, can't access the broader host system, and can't persist anything outside its designated workspace. The environment itself is the boundary.
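To make the pattern concrete, here's a minimal Python sketch of the per-task lifecycle. The Sandbox class is a hypothetical stand-in for whatever microVM provisioning API you actually use; the shape is what matters: one fresh environment per task, with teardown guaranteed even if the task fails.

```python
from contextlib import contextmanager
from dataclasses import dataclass

@dataclass
class Sandbox:
    """Hypothetical handle for a microVM; swap in your provider's client."""
    name: str

    @classmethod
    def create(cls, name: str) -> "Sandbox":
        print(f"provisioning microVM {name}")  # real impl: Firecracker or a hosted service
        return cls(name)

    def exec(self, cmd: str) -> None:
        print(f"[{self.name}] exec: {cmd}")

    def destroy(self) -> None:
        print(f"tearing down {self.name}")  # nothing the agent wrote survives

@contextmanager
def isolated_task_env(task_id: str):
    """One fresh microVM per task, torn down no matter how the task ends."""
    vm = Sandbox.create(name=f"agent-task-{task_id}")
    try:
        yield vm
    finally:
        vm.destroy()

with isolated_task_env("fix-staging-credentials") as vm:
    vm.exec("python run_agent.py")
```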
Network policies: Outbound network access is explicitly defined and enforced at the infrastructure level. The agent can only reach the endpoints it's been granted access to for this specific task. Everything else is blocked by the network layer, not a rule in the prompt. The connection simply never happens.
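A minimal sketch of the allowlist decision, with hypothetical hostnames. In a real deployment this lives in firewall or proxy rules attached to the microVM rather than in application code; the Python below just makes the logic visible.

```python
from urllib.parse import urlparse

# Provisioned per task at environment creation, never by the prompt.
ALLOWED_HOSTS = {
    "api.github.com",
    "staging-db.internal.example.com",
}

def egress_allowed(url: str) -> bool:
    """Deny by default; only explicitly granted hosts are reachable."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS

assert egress_allowed("https://staging-db.internal.example.com/query")
assert not egress_allowed("https://prod-infra.example.com/volumes")  # connection never happens
```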
Scoped tokens: Credentials are provisioned per task and limited to the exact permissions the task requires. A token that can read a database cannot write to it. A token that can manage domains cannot touch volumes. The scope is defined at creation, not enforced by the model at runtime.
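Here's a sketch of per-task minting, assuming a hypothetical token service. Scopes are fixed at creation time; there is no call the model can make later to widen them.

```python
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: scopes cannot be mutated after minting
class ScopedToken:
    value: str
    scopes: frozenset[str]

def mint_token(task_scopes: set[str]) -> ScopedToken:
    """Issue a credential limited to exactly what this task needs."""
    return ScopedToken(secrets.token_urlsafe(32), frozenset(task_scopes))

# The staging task gets read access to the staging database and nothing else:
token = mint_token({"db:read:staging"})
assert "volume:delete" not in token.scopes
```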
Tool restrictions: The set of tools available to an agent is explicitly provisioned for each task. If a tool isn't on the list, it doesn't exist from the agent's perspective. There's no instruction telling the agent not to use it. It simply isn't there.
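A sketch of explicit tool provisioning, with hypothetical tool names. The registry knows every tool in the codebase; the agent only ever receives the subset resolved for its task.

```python
def read_file(path: str) -> str: ...
def run_tests(target: str) -> str: ...
def delete_volume(volume_id: str) -> None: ...  # exists in the codebase, never handed out lightly

TOOL_REGISTRY = {
    "read_file": read_file,
    "run_tests": run_tests,
    "delete_volume": delete_volume,
}

def tools_for_task(allowed: list[str]) -> dict:
    """Build the tool set the agent will actually see."""
    return {name: TOOL_REGISTRY[name] for name in allowed}

# A staging credential-fix task never includes delete_volume, so from
# the agent's perspective the tool does not exist:
agent_tools = tools_for_task(["read_file", "run_tests"])
assert "delete_volume" not in agent_tools
```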
Run the PocketOS incident back through that lens, with infrastructure-level controls in place. Behind a credential gateway, a domain-management token cannot call volumeDelete, because that permission simply doesn't exist at the API layer. Even if every policy fails and the agent goes rogue, there's nothing to reach.
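As a sketch, the gateway check might look like this (operation and scope names are illustrative, not any provider's real API). Every call maps to a required scope and is checked before it's forwarded; the model's reasoning never enters into it.

```python
OPERATION_SCOPES = {
    "domain.create": "domains:write",
    "domain.delete": "domains:write",
    "volume.delete": "volumes:delete",
}

def gateway_call(token_scopes: frozenset[str], operation: str) -> str:
    """Check the token's scopes before the request ever reaches the provider."""
    required = OPERATION_SCOPES[operation]
    if required not in token_scopes:
        return f"DENIED: {operation} requires scope {required!r}"
    return f"forwarded: {operation}"

domain_token = frozenset({"domains:write"})
print(gateway_call(domain_token, "domain.create"))  # forwarded
print(gateway_call(domain_token, "volume.delete"))  # DENIED: dies at the gateway
```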
In that version of the story, Jer's database is still there. The customers' reservations are still there. And the founder isn't spending his Saturday scrambling to reconstruct bookings from Stripe receipts.
The PocketOS story represents a tension that's only going to intensify: agents can do more, which means they need more access, and more access means more ways for something to go wrong.
That possibility will slow teams down. Without trust, teams design around agents rather than with them. The most capable agents get kept away from the systems where they'd generate real value, and they never get the chance to prove what they can actually do.
The gap between what agents can do and what teams are willing to let them do isn't a capability problem. It's a trust problem. And trust doesn't come from better instructions. It comes from an environment where the nine-second catastrophe is structurally impossible.