Personality composition in multi-agent LLM teams is far from a one-size-fits-all lever: low-agreeableness prompting tanks performance on open-ended collaboration and bargaining, yet barely registers on structured coding tasks. That's the core finding from a new study on arXiv (2606.27443) that systematically manipulated the personality traits of frontier LLM agents across three task domains: structured coding, open-ended research collaboration, and competitive bargaining.
Personality Shifts Communication, Not Outcomes — In Coding
When agents are prompted with low agreeableness, they produce adversarial language; high-agreeableness prompts yield cooperative chatter. Prior work confirmed those behavioral shifts, but nobody traced whether they actually changed objective task outcomes. The researchers found that in coding tasks, those large communication shifts had almost no effect on milestone completion. You can field an abrasive coder that argues with teammates, and the project still ships on time. The structure of the task — clear specs, deterministic pass/fail — insulates the outcome from social friction.
Open-Ended Tasks Reveal the Real Cost of Adversarial Agents
Switch to open-ended research collaboration or competitive bargaining, and the same low-agreeableness manipulation substantially degrades performance. Without a rigid process to enforce progress, adversarial language derails cooperation, blocks consensus, and sinks the final product. The effect is not subtle: the paper describes it as “substantial degradation.” Personality composition matters precisely when the task lacks a hard scaffold.
Design Implications for Multi-Agent Systems
For teams building multi-agent frameworks, this is a concrete design rule: tune personality prompts to the task structure. Running agents on structured coding pipelines? You can probably skip cooperative prompting and save the token budget. Deploying agents for brainstorming, negotiation, or open-ended research? Invest in agreeableness prompts or risk watching your agents talk themselves into failure. The study also hints at limits: personality manipulation is not a universal knob, and its effect appears to depend on how much the task relies on free-form coordination.
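As a rough illustration of that rule, here is a minimal Python sketch of a prompt builder that adds a cooperative persona only for coordination-heavy tasks. Everything in it is an assumption made for illustration: the `TaskType` categories simply mirror the three domains named above, and `COOPERATIVE_PERSONA` and `build_system_prompt` are hypothetical names, not the study's prompts or any framework's API.

```python
from enum import Enum


class TaskType(Enum):
    # Categories mirror the three task domains named in the article.
    STRUCTURED_CODING = "structured_coding"
    OPEN_ENDED_RESEARCH = "open_ended_research"
    BARGAINING = "bargaining"


# Hypothetical persona text; the study's actual prompts are not reproduced here.
COOPERATIVE_PERSONA = (
    "You are cooperative and considerate. Acknowledge teammates' points "
    "and work toward consensus."
)

# Task types where the article reports that low agreeableness hurt outcomes.
COORDINATION_HEAVY = {TaskType.OPEN_ENDED_RESEARCH, TaskType.BARGAINING}


def build_system_prompt(task_type: TaskType, role_instructions: str) -> str:
    """Prepend the cooperative persona only where coordination is free-form."""
    if task_type in COORDINATION_HEAVY:
        return f"{COOPERATIVE_PERSONA}\n\n{role_instructions}"
    # Structured coding: skip the persona text and keep the prompt short.
    return role_instructions


if __name__ == "__main__":
    print(build_system_prompt(TaskType.STRUCTURED_CODING, "Implement the parser."))
    print(build_system_prompt(TaskType.BARGAINING, "Negotiate the contract price."))
```

The mapping is deliberately a plain lookup: which tasks count as coordination-heavy is a judgment call that each team would need to validate on its own workloads.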
The next step is obvious: feed these findings back into prompt engineering tools and agent orchestration layers so that systems automatically match personality composition to task type, rather than blindly applying one persona across all workflows.
Source: When Does Personality Composition Matter for Multi-Agent LLM Teams?
Domain: arxiv.org