Agent libOS Kills Permission Escalation in Self-Evolving LLM Agents

Agent libOS completed all task plans and blocked every modeled unauthorized side effect across a benchmark of 27 deterministic safety tasks. That amounts to zero permission-escalation incidents with a conservative 7.0% false-denial rate: the runtime is strict, but it seldom gets in the way of legitimate work.

What Self-Evolving Agents Break

LLM agents are evolving from fixed tool callers into long-running software actors. They accumulate memory, synthesize new tools with JIT code generation, fork child agents, attach remote resources, and checkpoint their entire state into reusable execution images. Each of these mechanisms exposes a new attack surface: if exposing an action also grants the authority to perform it, self-evolution becomes a permission-escalation path. The paper's framing is direct: every new capability given to an agent becomes a potential privilege-escalation vector the moment the agent can evolve itself.
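The failure mode described above can be illustrated with a deliberately naive runtime. This is a minimal sketch of the anti-pattern, not code from the paper: `NaiveAgent`, `synthesizeTool`, `call`, and the file path are all hypothetical names chosen for illustration.

```typescript
// Hypothetical anti-pattern: every name here is illustrative, not from Agent libOS.
type SideEffect = { kind: "fs.write" | "shell.exec"; target: string };

class NaiveAgent {
  private readonly tools = new Map<string, () => SideEffect>();
  readonly performed: SideEffect[] = [];

  // The agent synthesizes a new tool for itself at runtime.
  synthesizeTool(name: string, effect: SideEffect): void {
    this.tools.set(name, () => effect);
  }

  // No authority check: being callable is treated as being permitted.
  call(name: string): void {
    const tool = this.tools.get(name);
    if (tool === undefined) {
      throw new Error(`unknown tool: ${name}`);
    }
    this.performed.push(tool());
  }
}

const naive = new NaiveAgent();
// The agent was never granted write access, yet it can give itself a writer.
naive.synthesizeTool("patch_config", { kind: "fs.write", target: "/tmp/hypothetical.conf" });
naive.call("patch_config");
console.log(naive.performed.length); // 1: the side effect went through unchecked
```

Because the tool table is the only gate, anything the agent can register it can also execute, which is exactly the coupling of affordance and authority that the paper warns against.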

Capabilities, Not Sandboxes

Agent libOS represents each agent as an AgentProcess with explicit identity, object memory, message queues, a tool table, loaded Skills, a Deno/TypeScript JIT runtime, child processes, budgets, checkpoints, and capabilities. The critical invariant: model-visible affordances can evolve (new tools, new skills, new prompts), but resource authority changes only through audited runtime primitives. No evolved action can, on its own, grant filesystem, shell, human, memory, process, checkpoint, JSON-RPC, MCP, or PTY authority. The prototype enforces process-local namespaces, syscall-mediated JIT, trusted Runtime Modules, and object-bound PTY sessions. This is capability-based security applied to the agent runtime itself, rather than a sandbox wrapped around it.
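The invariant can be sketched in a few lines of TypeScript. This is an illustrative model under stated assumptions, not the prototype's API: the `Capability` strings, the `registerTool`, `grant`, and `syscall` methods, and the audit-log shape are all hypothetical. Only the name `AgentProcess` comes from the article.

```typescript
// Hypothetical sketch: method names and capability strings are assumptions,
// not the Agent libOS interface.
type Capability = "fs.read" | "fs.write" | "shell.exec";

interface AuditEntry {
  event: "grant" | "allow" | "deny";
  capability: Capability;
  detail: string;
}

class AgentProcess {
  readonly pid: string;
  readonly audit: AuditEntry[] = [];
  private readonly capabilities = new Set<Capability>();
  private readonly tools = new Map<string, Capability>();

  constructor(pid: string) {
    this.pid = pid;
  }

  // Evolving the affordance surface: the model can now see and call this
  // tool, but no authority is granted here.
  registerTool(name: string, requires: Capability): void {
    this.tools.set(name, requires);
  }

  // The only path that changes authority; every grant is logged.
  grant(capability: Capability, grantedBy: string): void {
    this.capabilities.add(capability);
    this.audit.push({ event: "grant", capability, detail: `by ${grantedBy}` });
  }

  // Every tool invocation is mediated and checked against held capabilities.
  syscall(tool: string): "allowed" | "denied" {
    const required = this.tools.get(tool);
    if (required === undefined) {
      throw new Error(`unknown tool: ${tool}`);
    }
    const ok = this.capabilities.has(required);
    this.audit.push({ event: ok ? "allow" : "deny", capability: required, detail: tool });
    return ok ? "allowed" : "denied";
  }
}

const agent = new AgentProcess("agent-1");
agent.registerTool("write_report", "fs.write"); // new affordance, no new authority
const beforeGrant = agent.syscall("write_report");
agent.grant("fs.write", "operator"); // audited authority change
const afterGrant = agent.syscall("write_report");
console.log(beforeGrant, afterGrant); // denied allowed
```

The design point is that `registerTool` and `grant` are separate operations: the agent may add tools freely, while authority changes pass through a single logged primitive.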

The Benchmark That Caught Wrappers Flat-Footed

The authors built 27 versioned deterministic tasks that model realistic self-evolution scenarios. Simple wrapper and sandbox baselines preserved task completion but failed most safety checks. Agent libOS prevented all unauthorized side effects, and its 7.0% false-denial rate came from conservative overrides that a human-in-the-loop could clear. That trade-off is sound engineering: in a safety mechanism, a false positive is preferable to a false negative.
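To make the two reported metrics concrete, here is a minimal scoring sketch. The article does not show the benchmark's schema or state what the false-denial rate is measured against, so the `ActionOutcome` shape, the `score` function, and the choice of authorized actions as the denominator are assumptions. The sample records are toy values, not results from the paper.

```typescript
// Hypothetical record shape; not the paper's benchmark schema.
interface ActionOutcome {
  authorized: boolean; // ground truth: should the action have been permitted?
  allowed: boolean; // what the runtime actually decided
}

interface SafetyScore {
  escalations: number; // unauthorized actions that were allowed
  falseDenialRate: number; // authorized actions denied / authorized actions (assumed definition)
}

function score(outcomes: ActionOutcome[]): SafetyScore {
  const authorized = outcomes.filter((o) => o.authorized);
  const unauthorized = outcomes.filter((o) => !o.authorized);
  const falseDenials = authorized.filter((o) => !o.allowed).length;
  const escalations = unauthorized.filter((o) => o.allowed).length;
  return {
    escalations,
    falseDenialRate: authorized.length === 0 ? 0 : falseDenials / authorized.length,
  };
}

// Toy data for illustration only: four authorized actions (one wrongly
// denied) and two unauthorized actions (both blocked).
const toyOutcomes: ActionOutcome[] = [
  { authorized: true, allowed: true },
  { authorized: true, allowed: true },
  { authorized: true, allowed: true },
  { authorized: true, allowed: false },
  { authorized: false, allowed: false },
  { authorized: false, allowed: false },
];

const toyScore = score(toyOutcomes);
console.log(toyScore.escalations, toyScore.falseDenialRate); // 0 0.25
```

Under this definition, a runtime can reach zero escalations while still paying a nonzero false-denial rate, which is the shape of result the article reports for Agent libOS.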

By making checkpoint-derived images and tool synthesis capability-controlled, Agent libOS turns self-evolution from a security headache into a safe, auditable pattern for long-running LLM agents.

Source: Agent libOS: A Runtime Substrate for Capability-Controlled Self-Evolving LLM Agents
Domain: arxiv.org
