CCKS Lets MARL Agents Ignore Bad Advice Through a Consensus Constraint

The plug-and-play framework uses contrastive learning to build a consensus model, keeps agents from blindly following their teachers, and strengthens cooperation in Google Research Football and StarCraft II.

Tags: ccks, multi agent reinforcement learning, contrastive learning, google research football, starcraft ii, dtde

Most action-advising schemes in decentralized multi-agent RL turn students into obedient copycats, degrading stability and performance when teacher and student aren't compatible.

The new CCKS framework from this paper fixes that by forcing agents to evaluate advice against a consensus-derived constraint before accepting it. Instead of blindly following a teacher, the agent scores candidate actions based on both shared knowledge and how well they align with a consensus model built during training.

Contrastive Learning Builds the Consensus from Local Observations

The trick is the consensus model itself. During training, each agent uses contrastive learning on its own local observations to construct a representation of what actions the group would agree on. There is no centralized oracle, just local signals yielding a shared understanding of cooperative behavior.
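To make the contrastive step concrete, here is a minimal sketch of an InfoNCE-style loss, a common contrastive objective. The article does not say which loss CCKS uses, so InfoNCE is an assumption, as are the names `cosine` and `info_nce`, the temperature value, and the choice of positive and negative pairs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for a single anchor embedding (illustrative only).

    anchor:    embedding of this agent's local observation
    positive:  embedding ASSUMED to share the anchor's consensus
               (for example, a teammate's observation at the same step)
    negatives: embeddings ASSUMED to belong to a different consensus
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    # The loss is the negative log-probability of picking the positive.
    return log_denom - logits[0]


aligned = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned < misaligned)
```

Minimizing a loss of this shape pulls embeddings of observations that should agree closer together and pushes the others apart, which is the general mechanism by which local signals can yield a shared representation.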

At action selection time, the agent runs a scoring function that weighs the teacher's recommendation against this consensus model. If the advice deviates too far from consensus, the agent explores or falls back to its own policy. This balances exploration with learning from experienced peers without the overfitting that plagues vanilla action advising.
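The gating logic described above can be sketched in a few lines. This is not the authors' scoring function: `select_action`, the fixed `threshold`, the `explore_prob` parameter, and the dictionary-based consensus scores are all hypothetical stand-ins chosen for illustration.

```python
import random


def select_action(own_q, teacher_action, consensus_scores,
                  threshold=0.5, explore_prob=0.1, rng=random):
    """Accept the teacher's advice only if it is consistent with consensus.

    own_q:            dict mapping action -> the student's own value estimate
    teacher_action:   action recommended by the teacher, or None
    consensus_scores: dict mapping action -> agreement score in [0, 1],
                      as produced by a (hypothetical) consensus model
    Returns a tuple (action, reason).
    """
    if teacher_action is not None:
        # Unknown actions get a score of 0.0 and are therefore rejected.
        if consensus_scores.get(teacher_action, 0.0) >= threshold:
            return teacher_action, "advice"
    # Advice missing or rejected: explore occasionally, otherwise act greedily.
    if rng.random() < explore_prob:
        return rng.choice(sorted(own_q)), "explore"
    return max(own_q, key=own_q.get), "own_policy"


own_q = {"pass": 0.2, "shoot": 0.7, "dribble": 0.4}
print(select_action(own_q, "pass", {"pass": 0.9}, explore_prob=0.0))
print(select_action(own_q, "pass", {"pass": 0.1}, explore_prob=0.0))
```

With exploration disabled, the first call follows the teacher because the advice clears the consensus threshold, and the second falls back to the student's own greedy choice. A real implementation would likely learn or anneal the threshold rather than hard-code it.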

Plug-and-Play Integration Beats DTDE Baselines in Two Hard Environments

CCKS is designed as a drop-in module for existing Decentralized Training and Decentralized Execution (DTDE) algorithms. No architecture rewrites, no new training loops.

The authors tested it in Google Research Football and the StarCraft II Multi-Agent Challenge (SMAC). Across both environments, CCKS-integrated agents improved cooperation efficiency, learning speed, and overall performance compared to current DTDE baselines. While the abstract doesn't cite specific win rates or reward numbers, the claim is consistent across two distinct, well-known benchmarks.

The code is available at https://github.com/yuanxpy/CCKS, so you can replicate or adapt it right away. If you're building multi-agent systems where agents share advice, this consensus constraint is the simplest fix I've seen for the “blind follower” problem.


Source: CCKS: Consensus-based Communication and Knowledge Sharing
Domain: arxiv.org
