
SidConArena Benchmark Shows LLM Agents Flunk Positive-Sum Bargaining

A new three-phase benchmark reveals that top LLMs misvalue resources, bargain passively, and fail at long-term investment planning, even when they win the tournament.

Tags: sidconarena, llm agents, multi agent systems, benchmarking, negotiation, economic planning

In SidConArena, even the strongest frontier LLM agents consistently misvalue resources and bargain passively. The benchmark shows how far current agents are from handling real-world economic negotiation.

SidConArena, described in arXiv 2606.27397, formalizes a multi-player economy as a finite-horizon partially observable stochastic game with three coupled phases. Agents first negotiate binding trades in free-form natural language, then run a deterministic production step in which converters turn inputs into outputs at fixed ratios, and finally bid in sealed-bid auctions for durable, long-term capital goods. The interface combines structured observations, phase-aware agent dispatching, a neural-symbolic action interface, and asynchronous execution, which lets agents interact freely while keeping evaluation rule-grounded.
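To make the production phase concrete, here is a minimal Python sketch of a deterministic fixed-ratio converter. The `Converter` class, the helper names, and any resource names used with it are hypothetical illustrations, not taken from the SidConArena paper or codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Converter:
    """Hypothetical converter: each run consumes `inputs` and yields `outputs`."""
    inputs: dict[str, int]   # resource -> units consumed per run (assumed > 0)
    outputs: dict[str, int]  # resource -> units produced per run


def max_runs(inventory: dict[str, int], conv: Converter) -> int:
    """Largest number of runs the inventory can pay for (0 if there are no inputs)."""
    return min(
        (inventory.get(res, 0) // qty for res, qty in conv.inputs.items()),
        default=0,
    )


def produce(inventory: dict[str, int], conv: Converter, runs: int) -> dict[str, int]:
    """Return a new inventory after `runs` deterministic conversions."""
    if runs < 0 or runs > max_runs(inventory, conv):
        raise ValueError("not enough resources for the requested number of runs")
    result = dict(inventory)  # leave the caller's inventory untouched
    for res, qty in conv.inputs.items():
        result[res] = result.get(res, 0) - qty * runs
    for res, qty in conv.outputs.items():
        result[res] = result.get(res, 0) + qty * runs
    return result
```

Because production in this sketch is deterministic, an agent can compute exactly what a negotiated trade will let it produce before agreeing to it. That is the kind of forward calculation the benchmark is designed to probe.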

Three Phases That Expose Agent Weaknesses

Each phase targets a distinct failure mode. During negotiation, agents must create positive-sum surplus through binding trade deals rather than merely haggle over a fixed pie. The production phase requires converting raw resources into finished goods under known but dynamic conversion tables. The sealed-bid auction forces agents to value long-term assets whose payoff only materializes after several rounds.
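As an illustration of what "positive-sum" means in the negotiation phase, the sketch below scores a proposed swap under each party's own per-unit valuations. The valuation tables and function names are hypothetical, and the linear valuation model is a simplifying assumption rather than the paper's scoring rule.

```python
from __future__ import annotations


def trade_gain(values: dict[str, int], gives: dict[str, int],
               receives: dict[str, int]) -> int:
    """Net value change for one party, measured in its own valuations."""
    gained = sum(values.get(res, 0) * qty for res, qty in receives.items())
    lost = sum(values.get(res, 0) * qty for res, qty in gives.items())
    return gained - lost


def evaluate_trade(values_a: dict[str, int], values_b: dict[str, int],
                   a_gives: dict[str, int], b_gives: dict[str, int]) -> dict[str, object]:
    """Report each side's gain, the total surplus, and whether both sides benefit."""
    gain_a = trade_gain(values_a, a_gives, b_gives)
    gain_b = trade_gain(values_b, b_gives, a_gives)
    return {
        "gain_a": gain_a,
        "gain_b": gain_b,
        "total_surplus": gain_a + gain_b,
        "mutually_beneficial": gain_a > 0 and gain_b > 0,
    }
```

When the two parties value resources differently, for example because their converters need different inputs, a swap can leave both better off. In a purely zero-sum exchange the two gains would cancel out.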

Across homogeneous tournaments (all agents from the same model) and heterogeneous tournaments (mixed models), stronger frontier models consistently achieve higher economic outcomes. Yet the paper reports that all tested agents still misvalue resources: they overpay for assets that won't pay off or undervalue critical production inputs. They bargain passively, rarely making first offers or proposing creative deals. And their long-horizon investment planning remains limited: agents fail to accumulate capital across rounds even when doing so would dominate short-term consumption.
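A rough way to see why overpaying is easy in a finite-horizon game: an asset bought late has few rounds left in which to pay off. The sketch below computes a break-even value for such an asset and resolves a sealed-bid auction. It assumes a first-price rule, no discounting, and a fixed per-round payoff. The paper summary here does not specify the auction format or payoff schedule, so these are illustrative assumptions, and all names are hypothetical.

```python
from __future__ import annotations


def asset_value(payoff_per_round: int, rounds_left: int, delay: int) -> int:
    """Total undiscounted payoff of an asset that starts paying after `delay` rounds."""
    paying_rounds = max(0, rounds_left - delay)
    return payoff_per_round * paying_rounds


def resolve_first_price(bids: dict[str, int]) -> tuple[str, int] | None:
    """Assumed first-price rule: highest bid wins and pays its own bid.

    Ties go to the bidder that appears first in `bids` (dict insertion order).
    """
    if not bids:
        return None
    winner = max(bids, key=lambda agent: bids[agent])
    return winner, bids[winner]


def net_return(bid: int, payoff_per_round: int, rounds_left: int, delay: int) -> int:
    """Winner's profit: lifetime asset value minus the price paid."""
    return asset_value(payoff_per_round, rounds_left, delay) - bid
```

Under these assumptions, an asset paying 5 per round with a 2-round delay is worth 20 when 6 rounds remain, but nothing when only 1 round remains. A bid that is sensible early in the game becomes a pure loss near the end.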

What SidConArena Tells Us About Next Steps

The benchmark itself is a solid engineering contribution, and the environment is available and reproducible. But the empirical gap it exposes is the real news. Current LLM agents can handle zero-sum games like Diplomacy or poker, but positive-sum negotiation with delayed returns is a fundamentally harder problem. The authors suggest that future work needs to explicitly train agents on resource valuation under uncertainty and multi-step planning, possibly using reinforcement learning with structured state representations.

SidConArena won't be the last word in agentic bargaining benchmarks, but it sets a clear bar: if your agent can't navigate a mixed-motive economy with delayed returns, it's not ready for real-world trade or contract negotiation.


Source: SidConArena: An Environment Evaluating Agents in Open-Ended, Positive-Sum Bargaining Game
Domain: arxiv.org
