A far-sighted approach to machine learning | MIT News

Imagine two teams squaring off on a soccer field. Each player cooperates with teammates to achieve a shared goal while competing against opponents with conflicting interests. That mix of cooperation and competition is how the game works.

Creating artificial intelligence agents that can learn to compete and cooperate as effectively as humans remains a thorny problem. A key challenge is enabling AI agents to predict the future behaviors of other agents when all of them are learning simultaneously.

Because of the complexity of this problem, current approaches tend to be short-sighted: agents can only guess the next few moves of their teammates or competitors, which leads to poor performance in the long run.

Researchers from MIT, the MIT-IBM Watson AI Lab and elsewhere have developed a new approach that gives AI agents a farsighted perspective. Their machine learning framework allows cooperative or competitive AI agents to think about what other agents will do as time approaches infinity, rather than just a few next steps. The agents then adjust their behavior accordingly to influence the future behavior of other agents and arrive at an optimal, long-term solution.

This framework could be used by a group of autonomous drones working together to find a lost hiker in a dense forest, or by self-driving cars striving to keep passengers safe by anticipating the future movements of other vehicles on a busy highway.

“When AI agents cooperate or compete, what matters most is when their behaviors converge at some point in the future. There are many transient behaviors along the way that don’t matter much in the long run. Reaching this converged behavior is what we really care about, and we now have a mathematical way to make that possible,” says Dong-Ki Kim, a graduate student in MIT’s Laboratory for Information and Decision Systems (LIDS) and lead author of a paper describing this framework.

The senior author is Jonathan P. How, the Richard C. Maclaurin Professor of Aeronautics and Astronautics and a member of the MIT-IBM Watson AI Lab. Co-authors include others from the MIT-IBM Watson AI Lab, IBM Research, the Mila-Quebec Artificial Intelligence Institute, and Oxford University. The research will be presented at the Conference on Neural Information Processing Systems.


In this demo video, the red robot, trained with the researchers’ machine learning system, is able to defeat the green robot by learning more effective behaviors that take advantage of its opponent’s ever-changing strategy.

More agents, more problems

Researchers focused on a problem known as multiagent reinforcement learning. Reinforcement learning is a form of machine learning in which an AI agent learns through trial and error. Researchers reward the agent for “good” behavior that helps it achieve a goal. The agent adjusts its behavior to maximize this reward, eventually becoming an expert at a task.
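The trial-and-error loop described above can be sketched with tabular Q-learning, one of the simplest single-agent reinforcement learning algorithms (this is only an illustration of reinforcement learning in general, not the researchers' method). In this hypothetical example, an agent in a five-state corridor is rewarded only for reaching the rightmost state and gradually learns to move right:

```python
import random

# Minimal tabular Q-learning sketch (illustrative only; not the paper's method).
# A 5-state corridor: the agent starts at state 0 and receives a reward
# only upon reaching state 4. Through trial and error it learns to move right.

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left, move right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

random.seed(0)
for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the current estimates, sometimes explore.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Reward-driven update: nudge the estimate toward
        # (immediate reward + discounted best future value).
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
        s = s2

# After training, the greedy policy should move right (+1) from every state.
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(GOAL)}
print(policy)
```

The catch this article goes on to describe: as soon as several such agents learn in the same environment at once, each agent's "environment" keeps shifting under it, and this simple recipe no longer suffices.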

But when many cooperative or competing agents are learning at the same time, things become increasingly complex. As agents consider more future moves by their peers and how their own behavior affects others, the problem soon requires far too much computing power to solve efficiently. For this reason, other approaches only focus on the short term.

“The AIs really want to think about the end of the game, but they don’t know when the game will end. They need to think about how to keep adapting their behavior into infinity so they can win at some far point in the future. Essentially, our paper proposes a new objective that enables an AI to think about infinity,” says Kim.

However, since it is impossible to plug infinity into an algorithm, the researchers designed their system so that agents focus on a future point where their behavior will converge with that of other agents, known as an equilibrium. An equilibrium point determines the long-term performance of agents, and multiple equilibria can exist in a multiagent scenario. An effective agent therefore actively influences the future behavior of other agents so that they reach an equilibrium that is desirable from the agent’s point of view. If all agents influence one another, they converge to a general concept the researchers call an “active equilibrium.”

The machine learning framework they developed, known as FURTHER (which stands for FUlly Reinforcing acTive influence witH averagE Reward), enables agents to learn how to adjust their behavior as they interact with other agents in order to achieve this active equilibrium.
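One standard way to formalize "thinking about infinity," and presumably what the "average reward" in FURTHER's name refers to, is to score a policy by its long-run reward rate rather than a discounted sum. A discounted objective weights early rewards most heavily and is thus effectively short-sighted; the average-reward objective instead values only where behavior eventually settles:

```latex
% Discounted objective: geometric weighting favors near-term rewards.
J_\gamma(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t\right]

% Average-reward objective: the long-run reward rate as time goes to
% infinity, which depends on the converged (equilibrium) behavior
% rather than the transient behaviors along the way.
J(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=0}^{T-1} r_t\right]
```

Under the second objective, any finite stretch of transient behavior contributes nothing in the limit, which matches the researchers' point that only the converged behavior matters in the long run.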

FURTHER uses two machine learning modules for this. The first, an inference module, enables an agent to guess the future behaviors of other agents, and the learning algorithms they use, based solely on their prior actions.

This information is fed into the reinforcement learning module, which the agent uses to adjust its behavior and influence other agents in a way that maximizes its reward.

“The challenge was to think about infinity. We had to use a lot of different mathematical tools to make this possible and make some assumptions to make it work in practice,” says Kim.

Winning in the long run

They tested their approach against other multiagent reinforcement learning frameworks in several different scenarios, including a pair of robots fighting sumo-style and a battle pitting two 25-agent teams against each other. In both cases, the AI agents using FURTHER were more likely to win the games.

Because their approach is decentralized, meaning the agents learn to win the games independently, it’s also more scalable than other methods that require a central computer to control the agents, explains Kim.

Researchers used games to test their approach, but FURTHER could be used to tackle any type of multi-agent problem. For example, it could be applied by economists who want to develop sound policies in situations where many interacting stakeholders have behaviors and interests that change over time.

Economics is one application Kim is particularly excited about. He also wants to delve deeper into the concept of active equilibrium and continue to improve the FURTHER framework.

This research is funded in part by the MIT-IBM Watson AI Lab.

