CCKS Lets MARL Agents Ignore Bad Advice Through a Consensus Constraint

Most action-advising schemes in decentralized multi-agent RL turn students into obedient copycats, degrading stability and performance when teacher and student aren't compatible.

The new CCKS framework from this paper fixes that by forcing agents to evaluate advice against a consensus-derived constraint before accepting it. Instead of blindly following a teacher, the agent scores candidate actions based on both shared knowledge and how well they align with a consensus model built during training.

Contrastive Learning Builds the Consensus from Local Observations

The trick is the consensus model itself. During training, each agent uses contrastive learning on its own local observations to construct a representation of what actions the group would agree on. There is no centralized oracle; local signals alone yield a shared understanding of cooperative behavior.
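To make the contrastive step concrete, here is a minimal sketch of an InfoNCE-style loss, a common contrastive objective. It is not the paper's exact loss: the function names, the temperature value, and the way positives and negatives are chosen are all assumptions for illustration. The idea shown is only that an agent's observation embedding is pulled toward a "positive" embedding (assumed here to come from the same underlying situation) and pushed away from "negatives" (assumed to come from unrelated situations).

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Vector,
    positive: Vector,
    negatives: Sequence[Vector],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """InfoNCE-style loss: low when the anchor is closer to the positive
    than to any negative, high otherwise."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability assigned to the positive pair.
    return log_sum - logits[0]


# Hypothetical 2-D embeddings of local observations.
aligned = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup, `aligned` comes out far smaller than `misaligned`, which is the gradient signal an encoder would be trained on. How CCKS actually builds its pairs from local observations is described in the paper and repository, not here.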

At action selection time, the agent runs a scoring function that weighs the teacher's recommendation against this consensus model. If the advice deviates too far from consensus, the agent explores or falls back to its own policy. This balances exploration with learning from experienced peers without the overfitting that plagues vanilla action advising.
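The gating logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' scoring function: `select_action`, `consensus_logits`, `min_consensus_prob`, and the simple threshold rule are all hypothetical names and choices. The consensus model is assumed to output one logit per action, and advice is accepted only if the consensus assigns the advised action enough probability.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def softmax(xs: Sequence[float]) -> list:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(
    own_q: Sequence[float],             # the agent's own action values
    consensus_logits: Sequence[float],  # assumed output of a consensus model
    teacher_action: Optional[int] = None,
    min_consensus_prob: float = 0.2,    # hypothetical acceptance threshold
    epsilon: float = 0.0,               # exploration rate
    rng: Optional[random.Random] = None,
) -> Tuple[int, str]:
    """Return (action, reason): follow advice only if it agrees with consensus."""
    rng = rng or random.Random()
    n_actions = len(own_q)
    if rng.random() < epsilon:
        return rng.randrange(n_actions), "explore"
    own_best = max(range(n_actions), key=lambda a: own_q[a])
    if teacher_action is None:
        return own_best, "own_policy"
    consensus_probs = softmax(consensus_logits)
    if consensus_probs[teacher_action] >= min_consensus_prob:
        return teacher_action, "advice"
    # Advice deviates too far from consensus: fall back to the agent's own policy.
    return own_best, "own_policy"


accepted = select_action([0.1, 0.9, 0.3], [2.0, 0.0, -1.0], teacher_action=0)
rejected = select_action([0.1, 0.9, 0.3], [2.0, 0.0, -1.0], teacher_action=2)
```

Here `accepted` follows the teacher because the consensus strongly favors action 0, while `rejected` falls back to the agent's own best action because the consensus gives action 2 very little weight. A hard threshold is the simplest possible gate; a weighted score that blends teacher and consensus signals would be a natural alternative.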

Plug-and-Play Integration Beats DTDE Baselines in Two Hard Environments

CCKS is designed as a drop-in module for existing Decentralized Training and Decentralized Execution (DTDE) algorithms. No architecture rewrites, no new training loops.

The authors tested it in Google Research Football and the StarCraft II Multi-Agent Challenge (SMAC). Across both environments, CCKS-integrated agents improved cooperation efficiency, learning speed, and overall performance compared to current DTDE baselines. While the abstract doesn't cite specific win rates or reward numbers, the claim is consistent across two distinct, well-known benchmarks.

The code is available at https://github.com/yuanxpy/CCKS, so you can replicate or adapt it right away. If you're building multi-agent systems where agents share advice, this consensus constraint is the simplest fix I've seen for the “blind follower” problem.

Source: CCKS: Consensus-based Communication and Knowledge Sharing
Domain: arxiv.org
