hammer: Multi-level coordination of reinforcement learning agents via learned messaging

Gupta, Nikunj; Srinivasaraghavan G.; Mohalik, Swarup; Kumar, Nishant; Taylor, Matthew E.

dc.contributor.author	Gupta, Nikunj
dc.contributor.author	Srinivasaraghavan G.
dc.contributor.author	Mohalik, Swarup
dc.contributor.author	Kumar, Nishant
dc.contributor.author	Taylor, Matthew E.
dc.date.accessioned	2024-02-08T12:06:24Z
dc.date.available	2024-02-08T12:06:24Z
dc.date.issued	2023-10-24
dc.identifier.issn	09410643
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/2853
dc.description	This paper is published with affiliation IIT (BHU), Varanasi in Open Access Mode.	en_US
dc.description.abstract	Cooperative multi-agent reinforcement learning (MARL) has achieved significant results, most notably by leveraging the representation-learning abilities of deep neural networks. However, large centralized approaches quickly become infeasible as the number of agents scales, and fully decentralized approaches can miss important opportunities for information sharing and coordination. Furthermore, not all agents are equal: in some cases, individual agents may not even have the ability to send communication to other agents or explicitly model other agents. This paper considers the case where there is a single, powerful, central agent that can observe the entire observation space, and there are multiple, low-powered local agents that can only receive local observations and are not able to communicate with each other. The central agent’s job is to learn what message needs to be sent to different local agents based on the global observations, not by centrally solving the entire problem and sending action commands, but by determining what additional information an individual agent should receive so that it can make a better decision. In this work, we present our MARL algorithm hammer, describe where it would be most applicable, and implement it in the cooperative navigation and multi-agent walker domains. Empirical results show that (1) learned communication does indeed improve system performance, (2) results generalize to heterogeneous local agents, and (3) results generalize to different reward structures.	en_US
dc.description.sponsorship	This work commenced at Ericsson Research Laboratory Bangalore, and most of the follow-up work was done at the International Institute of Information Technology-Bangalore. Part of this work has taken place in the Intelligent Robot Learning (IRL) Laboratory at the University of Alberta, which is supported in part by research grants from Alberta Innovates; the Alberta Machine Intelligence Institute (Amii); a Canada CIFAR AI Chair, Amii; Compute Canada; Huawei; Mitacs; and NSERC. We would like to thank Laura Petrich, Shahil Mawjee and anonymous reviewers for comments and suggestions on this paper.	en_US
dc.language.iso	en	en_US
dc.publisher	Springer Science and Business Media	en_US
dc.relation.ispartofseries	Neural Computing and Applications;
dc.subject	Heterogeneous agent learning	en_US
dc.subject	Learning to communicate	en_US
dc.subject	Multi-agent reinforcement learning	en_US
dc.title	hammer: Multi-level coordination of reinforcement learning agents via learned messaging	en_US
dc.type	Article	en_US