TY - JOUR
T1 - Cooperative defense of autonomous surface vessels with quantity disadvantage using behavior cloning and deep reinforcement learning
AU - Sun, Siqing
AU - Li, Tianbo
AU - Chen, Xiao
AU - Dong, Huachao
AU - Wang, Xinjing
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/10
Y1 - 2024/10
N2 - Autonomous Surface Vessels (ASVs) excel at undertaking hazardous tasks and have garnered significant attention recently. In particular, ASV cooperative defense is a crucial application for protecting harbors and combating smugglers, in which ASVs intercept intruders before they reach a protected region. Unlike most research, which assumes defenders with a numerical advantage, this work considers a more practical defense mission with fewer defenders, possible damage to defenders, and intruders employing evasion strategies. These conditions introduce additional interception challenges, including underactuated ASV dynamics, a limited interception time window, and environmental nonstationarity, so directly applying existing defense methods to such missions may not succeed. To surmount these challenges, we propose an ASV decision-making framework that integrates supervised learning and deep reinforcement learning. Initially, supervised learning uses actions from a bi-level controller to train the ASVs, addressing underactuated dynamics and aiding policy convergence. Subsequently, deep reinforcement learning explores more effective policies to improve interception rates. Furthermore, hybrid rewards are carefully designed to drive policy optimization while mitigating environmental nonstationarity. Finally, numerical simulations verify the effectiveness of the proposed approach.
AB - Autonomous Surface Vessels (ASVs) excel at undertaking hazardous tasks and have garnered significant attention recently. In particular, ASV cooperative defense is a crucial application for protecting harbors and combating smugglers, in which ASVs intercept intruders before they reach a protected region. Unlike most research, which assumes defenders with a numerical advantage, this work considers a more practical defense mission with fewer defenders, possible damage to defenders, and intruders employing evasion strategies. These conditions introduce additional interception challenges, including underactuated ASV dynamics, a limited interception time window, and environmental nonstationarity, so directly applying existing defense methods to such missions may not succeed. To surmount these challenges, we propose an ASV decision-making framework that integrates supervised learning and deep reinforcement learning. Initially, supervised learning uses actions from a bi-level controller to train the ASVs, addressing underactuated dynamics and aiding policy convergence. Subsequently, deep reinforcement learning explores more effective policies to improve interception rates. Furthermore, hybrid rewards are carefully designed to drive policy optimization while mitigating environmental nonstationarity. Finally, numerical simulations verify the effectiveness of the proposed approach.
KW - Autonomous surface vessels
KW - Behavior cloning
KW - Cooperative defense
KW - Deep reinforcement learning
KW - Reward design
UR - http://www.scopus.com/inward/record.url?scp=85198558605&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2024.111968
DO - 10.1016/j.asoc.2024.111968
M3 - Article
AN - SCOPUS:85198558605
SN - 1568-4946
VL - 164
JO - Applied Soft Computing
JF - Applied Soft Computing
M1 - 111968
ER -