I. Introduction
Reinforcement learning (RL) is extending its boundaries to a variety of game genres. In player versus environment settings, such as those found in Atari 2600 games, RL agents have exceeded human-level performance using various methods [5], [15], [16], [19]. Likewise, in player versus player (PVP) settings, neural networks combined with search-based methods have beaten the best human players in turn-based games with two or more players, such as Go, Chess [20], and Mahjong [28]. Recently, RL research in games has shifted its focus to the PVP settings found in more complex video games such as StarCraft2 [24], Quake3 [10], and Dota2 [18]. Grandmaster-level RL agents have even been developed for StarCraft2 [29], a highly complex imperfect-information game in which an agent has to control multiple units at a time.