Queen-Bee Architecture Achieves 96.4% Task Success with Zero Governance Failures

A governed multi-agent system called Queen-Bee achieves a task success rate of 96.4% across 59 enterprise-style tasks while maintaining zero governance violations.

queen bee, mcp, model context protocol, multi agent systems, enterprise governance, arxiv

A 96.4% task success rate with zero governance failures across 59 enterprise-style tasks: that is the headline result for the Queen-Bee architecture, a governed multi-agent system designed to keep LLMs on a short leash inside enterprise environments.

The Queen Control Plane and BeeSpec Execution

Queen-Bee splits the agent brain into two distinct roles. A central Queen control plane handles capability retrieval and task planning, then compiles a structured BeeSpec: think of it as a signed work order that binds each subtask to specific tools, data tenants, and policy constraints. Specialized Bee agents then execute those BeeSpecs under constrained tool access, so no Bee agent can reach outside its scoped permissions. The architecture plugs into Model Context Protocol (MCP) interfaces for private tools and internal knowledge, with tenant-scoped connectors and audit-backed runtime governance.
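To make the work-order idea concrete, here is a minimal sketch of how a BeeSpec could bind a subtask to tools and a tenant, and how a Bee could refuse anything outside that scope. This is an illustration only, not the paper's implementation: the class names, fields (`allowed_tools`, `requires_approval`), and the `ScopeViolation` exception are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BeeSpec:
    """Hypothetical work order the Queen compiles for one subtask."""
    subtask: str
    tenant: str                      # the one data tenant this Bee may touch
    allowed_tools: frozenset[str]    # tools bound to this subtask
    requires_approval: bool = False  # assumed policy flag for an approval gate


class ScopeViolation(Exception):
    """Raised when a Bee tries to act outside its BeeSpec."""


class Bee:
    def __init__(self, spec: BeeSpec):
        self.spec = spec
        self.audit_log: list[tuple[str, str, str]] = []

    def call_tool(self, tool: str, tenant: str, approved: bool = False) -> str:
        # Deny by default: anything not bound in the spec is rejected.
        if tool not in self.spec.allowed_tools or tenant != self.spec.tenant:
            self.audit_log.append(("denied", tool, tenant))
            raise ScopeViolation(f"{tool}@{tenant} is outside the BeeSpec")
        # Gated subtasks stay blocked until an approval is supplied.
        if self.spec.requires_approval and not approved:
            self.audit_log.append(("blocked", tool, tenant))
            raise ScopeViolation(f"{tool} needs approval first")
        self.audit_log.append(("allowed", tool, tenant))
        return f"ran {tool} for {self.spec.subtask}"
```

Every decision, allowed or not, lands in the audit log, which is one simple way to read "audit-backed runtime governance"; a real system would presumably enforce the scope at the connector layer rather than inside the agent object.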

Evaluation on Enterprise-Style Benchmarks

The team tested Queen-Bee on 59 tasks spanning governance-sensitive requests, retrieval-driven provisioning, scoped local execution, and chemistry workflow integration. The retrieval-driven variant (Queen-Bee with a lightweight structured retriever for capability lookup) reached the 0.964 success rate, while the static Queen-Bee baseline and a permissive single-agent baseline both fell behind. The paper also reports a multi-Bee chemistry workflow with explicit approval gating and a top-3 shortlist grounded in real upstream evidence.

Hybrid retrieval and LLM-guided provisioning backends were tried but didn't beat the simple structured retriever on the current small, highly structured capability registry. That’s a pragmatic finding: a richer retrieval stack isn’t always better when the domain is constrained.
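For intuition about why a simple retriever can be enough on a small, structured registry, here is a rough sketch of structured capability lookup: filter on an exact field, then rank by tag overlap. The registry entries, field names, and scoring rule are invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical registry entry; the fields are assumptions."""
    name: str
    domain: str
    tags: frozenset


# Toy registry, made up for this example.
REGISTRY = [
    Capability("ticket_search", "itsm", frozenset({"read", "tickets"})),
    Capability("ticket_close", "itsm", frozenset({"write", "tickets"})),
    Capability("assay_lookup", "chemistry", frozenset({"read", "assays"})),
]


def retrieve(domain: str, wanted_tags: set, k: int = 3) -> list:
    """Exact-match on domain, then rank by the number of shared tags."""
    candidates = [c for c in REGISTRY if c.domain == domain]
    scored = [(len(c.tags & wanted_tags), c.name) for c in candidates]
    # Drop zero-overlap entries; break ties alphabetically for determinism.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:k]]


print(retrieve("itsm", {"read", "tickets"}))  # ['ticket_search', 'ticket_close']
```

With only a handful of well-labeled capabilities, exact field matching leaves little for embeddings or an LLM reranker to improve on, which is consistent with the finding reported above.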

Why This Matters for Enterprise Agent Platforms

Most agent demos show off raw capability (“look, it can book a flight!”), but enterprises care about who can do what, to which data, under whose approval. Queen-Bee directly addresses governed provisioning, isolation behavior, and scoped execution quality. The paper is upfront that this is prototype-level evidence, not a production deployment study. Still, zero governance failures across 59 tasks is a concrete bar that future enterprise agent platforms will need to clear.

Enterprise deployments should stop evaluating agents solely on task completion and start measuring governed provisioning and artifact-aware coordination. Queen-Bee shows one viable path to that.
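One way to act on that advice is to report governance outcomes as a separate number rather than folding them into task success. The sketch below assumes a hypothetical per-task record format; the field names and the toy data are illustrative and do not reproduce the paper's evaluation.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical per-task result; the fields are assumptions."""
    task_id: str
    succeeded: bool
    governance_violations: int


def summarize(records: list) -> dict:
    """Report task success and governance failures as separate metrics."""
    if not records:
        raise ValueError("no records to summarize")
    return {
        "task_success_rate": sum(r.succeeded for r in records) / len(records),
        # A task counts as a governance failure if it had any violation,
        # whether or not the task itself completed.
        "governance_failures": sum(r.governance_violations > 0 for r in records),
    }


# Toy data: t3 completes its task but breaks policy along the way.
toy = [
    RunRecord("t1", True, 0),
    RunRecord("t2", True, 0),
    RunRecord("t3", True, 1),
    RunRecord("t4", False, 0),
]
print(summarize(toy))  # {'task_success_rate': 0.75, 'governance_failures': 1}
```

Keeping the two numbers apart makes a case like `t3` visible: a completion-only score would count it as a success.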


Source: Queen-Bee Agents: A BeeSpec-Centered Architecture for Governed Enterprise MCP Orchestration
Domain: arxiv.org
