Nowadays, intelligent unmanned vehicles, such as unmanned aircraft and tanks, are involved in many complex tasks in the modern battlefield. They compose the networked intelligent systems with varying degrees of operational autonomy, which will continue to be used increasingly on the future battlefield. To deal with such a highly unstable environment, intelligent agents need to collaborate to explore the information and achieve the entire goal. In this paper, we will establish a novel comprehensive cooperative deep deterministic policy gradients (C2DDPG) algorithm by designing a special reward function for each agent to help collaboration and exploration. The agents will receive states information from their neighboring teammates to achieve better teamwork. The method is demonstrated in a real-time strategy game, StarCraft micromanagement, which is similar to a battlefield with two groups of units.
|