TY - GEN
T1 - Multi-vehicle Flocking Control with Deep Deterministic Policy Gradient Method
AU - Xu, Zhao
AU - Lyu, Yang
AU - Pan, Quan
AU - Hu, Jinwen
AU - Zhao, Chunhui
AU - Liu, Shuai
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/8/21
Y1 - 2018/8/21
N2 - Flocking control has been studied extensively alongside the wide application of multi-vehicle systems. In this paper, flocking control of a multi-vehicle system (MVS) with collision avoidance and communication preservation is considered within a deep reinforcement learning framework. Specifically, the deep deterministic policy gradient (DDPG) algorithm with centralized training and distributed execution is implemented to obtain the flocking control policy. First, to handle the dynamically changing observation, a three-layer tensor-based representation of the observation is used so that the state representation remains fixed even as the observation dimension changes. Second, a reward function is designed to guide waypoint tracking, collision avoidance, and communication preservation, and is augmented by introducing the local reward functions of neighbors. Finally, a centralized training process trains the shared policy on a common training set across all agents. The proposed method is tested under simulated scenarios with different setups.
AB - Flocking control has been studied extensively alongside the wide application of multi-vehicle systems. In this paper, flocking control of a multi-vehicle system (MVS) with collision avoidance and communication preservation is considered within a deep reinforcement learning framework. Specifically, the deep deterministic policy gradient (DDPG) algorithm with centralized training and distributed execution is implemented to obtain the flocking control policy. First, to handle the dynamically changing observation, a three-layer tensor-based representation of the observation is used so that the state representation remains fixed even as the observation dimension changes. Second, a reward function is designed to guide waypoint tracking, collision avoidance, and communication preservation, and is augmented by introducing the local reward functions of neighbors. Finally, a centralized training process trains the shared policy on a common training set across all agents. The proposed method is tested under simulated scenarios with different setups.
UR - http://www.scopus.com/inward/record.url?scp=85053128664&partnerID=8YFLogxK
U2 - 10.1109/ICCA.2018.8444355
DO - 10.1109/ICCA.2018.8444355
M3 - Conference contribution
AN - SCOPUS:85053128664
SN - 9781538660898
T3 - IEEE International Conference on Control and Automation, ICCA
SP - 306
EP - 311
BT - 2018 IEEE 14th International Conference on Control and Automation, ICCA 2018
PB - IEEE Computer Society
T2 - 14th IEEE International Conference on Control and Automation, ICCA 2018
Y2 - 12 June 2018 through 15 June 2018
ER -