Learning to Cooperate in Multi-Agent Environments


By Jiachen Yang

Human intelligence evolved in an environment shared with other humans, and it does far more than play Atari games or solve Rubik’s cubes alone in a room. The presence of other people demands that we handle a wide spectrum of complex interactions: we cooperate with colleagues on projects, compete against opponents in strategic team sports, and negotiate with other parties to settle contracts.

While we juggle these interactions ourselves, there has been a growing effort to develop generally intelligent agents that can help us handle this complex environment. There will likely come a time when these agents no longer work in isolation, but with one another.

Teaching AI to Work Together to Improve Performance

In our paper, CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning, which has been accepted to the International Conference on Learning Representations (ICLR) 2020, my collaborators and I propose an algorithm called CM3. It tackles two challenges of cooperative multi-agent learning within a novel curriculum learning framework, in which progressing from an easier task to a harder task improves overall performance.

This work targets the fully cooperative multi-agent setting, which is particularly relevant to our daily lives.

In scenarios like coordinating a fleet of autonomous vehicles, optimizing distributed logistics, and controlling a city-wide network of traffic signals, we may wish to optimize a single global measurement of performance. However, the scale of the problem requires decentralized execution: each agent (e.g., a vehicle, a delivery robot, a traffic light controller) must take cooperative actions based on its own local observations of the environment, instead of relying on a single controller to issue commands. 

In addition, many real-world scenarios have a multi-goal nature: the global optimum is attained precisely when all agents cooperate to reach their individual goals. In autonomous driving, for example, one vehicle may wish to exit a highway lane while an adjacent vehicle attempts to merge onto the highway, and both must cooperate for mutual success.

Giving Credit Where Credit is Due

This multi-goal, multi-agent setting poses two significant challenges. The first, inherent in cooperative multi-agent learning, is the problem of multi-agent credit assignment.

For example, we have probably all seen the scenario in soccer where a striker celebrates flamboyantly on the field after scoring a goal. How much credit is really due to this single player? What about the second striker who made a perfect pass, the midfielder who broke past the opponent’s defense, or even the goalkeeper who kicked to a particular region of the field and started the whole sequence of play?

The credit assignment problem already arises when all agents share a single team goal, and it becomes more pronounced in the multi-goal setting, where agents must learn about their effect on one another’s individual goals.

To address credit assignment in the multi-goal setting, we propose an explicit mechanism for each agent to improve its behavior by evaluating the long-term impact of its own action on another agent’s goal. For example, if an agent vehicle blocked another vehicle from exiting a highway lane, the former would learn to avoid such noncooperative actions.
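As a rough illustration of this idea (a simplified sketch, not the exact formulation in the paper), suppose an agent already has estimates of how each of its actions affects the long-term return of every goal, its own and its neighbors'. One common way to turn such estimates into a learning signal is to compare the chosen action against the average action under the agent's current policy, then sum the differences over all goals. The function name, the two-action highway example, and all numbers below are hypothetical.

```python
def credit_advantage(credit_values, policy_probs, action):
    """Sum over goals of how much `action` beats the policy's average action.

    credit_values[n][a]: assumed estimate of the long-term return for goal n
                         if this agent takes action a (hypothetical inputs).
    policy_probs[a]:     probability the agent's current policy assigns to a.
    """
    total = 0.0
    for goal_values in credit_values:
        # Baseline: expected value for this goal under the current policy.
        baseline = sum(p * q for p, q in zip(policy_probs, goal_values))
        total += goal_values[action] - baseline
    return total


# Hypothetical highway example with two actions: 0 = block the lane, 1 = yield.
credit_values = [
    [1.0, 0.8],    # effect on the agent's own goal
    [-2.0, 1.0],   # effect on the neighboring vehicle's goal (exit the lane)
]
policy_probs = [0.5, 0.5]

block = credit_advantage(credit_values, policy_probs, action=0)
yield_ = credit_advantage(credit_values, policy_probs, action=1)
print(f"block: {block:+.2f}, yield: {yield_:+.2f}")
```

In this toy case, blocking looks slightly better for the agent's own goal but is strongly negative once the neighbor's goal is counted, so a policy update driven by this signal would shift probability toward yielding.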

[Video panels: antipodal, merge, random]

Improving Exploration

The second challenge we address is exploration, which is particularly difficult in cooperative multi-goal learning.

To improve exploration, we first train an agent to achieve individual goals in an induced single-agent setting without the presence of other agents. Once the agent has mastered, or at least made progress toward, all goals, we initialize all agents with those pre-trained parameters and instantiate them in the multi-agent environment. The key intuition is that agents that can already act toward individual goals are better prepared to encounter those areas of multi-agent state space where cooperative solutions can be easily discovered with additional exploration.
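The two-stage structure can be sketched on a toy problem. The example below is only an illustration under strong assumptions: it swaps in tabular Q-learning on a hypothetical five-cell corridor in place of the neural networks and environments used in the paper, and it stops at the point where multi-agent training would begin. All names (`train_single_agent`, `init_multi_agent`, and so on) are invented for this sketch.

```python
import copy
import random

GOALS = (0, 4)        # two possible goal cells on a 5-cell corridor (toy setup)
ACTIONS = (-1, +1)    # step left, step right
GAMMA = 0.9           # discount factor


def train_single_agent(episodes_per_goal=20, seed=0):
    """Stage 1: goal-conditioned Q-learning with no other agents present."""
    rng = random.Random(seed)
    q = {(g, p): [0.0, 0.0] for g in GOALS for p in range(5)}
    for goal in GOALS:
        for _ in range(episodes_per_goal):
            pos = 2  # always start in the middle of the corridor
            while pos != goal:
                a = rng.randrange(2)                    # purely random exploration
                nxt = min(4, max(0, pos + ACTIONS[a]))  # walls at both ends
                if nxt == goal:
                    target = 1.0                        # reward for reaching the goal
                else:
                    target = GAMMA * max(q[(goal, nxt)])
                # Learning rate 1 is enough because this toy world is deterministic.
                q[(goal, pos)][a] = target
                pos = nxt
    return q


def greedy_action(q, goal, pos):
    """Return the step (-1 or +1) with the highest estimated value."""
    values = q[(goal, pos)]
    return ACTIONS[values.index(max(values))]


def init_multi_agent(q, agent_goals):
    """Start of stage 2: every agent gets its own copy of the stage-1 parameters."""
    return [{"goal": g, "q": copy.deepcopy(q)} for g in agent_goals]


pretrained = train_single_agent()
agents = init_multi_agent(pretrained, agent_goals=[4, 0, 4])
first_moves = [greedy_action(a["q"], a["goal"], 2) for a in agents]
print(first_moves)
```

Each agent begins the multi-agent stage already heading toward its own goal, so further exploration can focus on the situations where agents interfere with one another rather than on rediscovering basic goal-directed behavior.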

Evaluating CM3 

We experimentally evaluated CM3 on diverse and challenging simulated multi-goal multi-agent environments and found that CM3 learns significantly faster and is more robust than existing multi-agent algorithms. The video above shows a particular environment in which agents must navigate to individual goal locations that correspond to their own colors while avoiding collisions. We find that agents attain their individual goals while enabling the success of the group, by choosing cooperative rather than greedy actions. We believe these results are a promising step toward a more cooperative multi-agent future.


Jiachen Yang, Alireza Nakhaei, David Isele, Kikuo Fujimura, Hongyuan Zha. CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning. ICLR 2020.

