Cooperative defense of autonomous surface vessels with quantity disadvantage using behavior cloning and deep reinforcement learning

Siqing Sun, Tianbo Li, Xiao Chen, Huachao Dong, Xinjing Wang

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Autonomous Surface Vessels (ASVs) are well suited to hazardous tasks and have attracted significant attention in recent years. ASV cooperative defense, in which ASVs intercept intruders before they reach a protected region, is a key application for protecting harbors and combating smuggling. Unlike most research, which assumes that defenders have a numerical advantage, this work considers a more practical defense mission with fewer defenders than intruders, defender damage, and intruders that employ evasion strategies. This setting introduces additional interception challenges, including underactuated ASV dynamics, a limited interception time window, and environmental nonstationarity, so directly applying existing defense methods may not succeed. To overcome these challenges, we propose an ASV decision-making framework that integrates supervised learning and deep reinforcement learning. First, supervised learning trains the ASVs on actions from a bi-level controller, which addresses the underactuated dynamics and aids policy convergence. Next, deep reinforcement learning explores more effective policies to improve the interception rate. Hybrid rewards are designed to drive policy optimization while mitigating environmental nonstationarity. Finally, numerical simulations verify the effectiveness of the approach.
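
The supervised (behavior cloning) stage described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional state (a heading error), the proportional "expert" standing in for the bi-level controller, the gain of 2.0, and the linear policy. The deep reinforcement learning stage and the hybrid rewards are omitted.

```python
import random


def expert_action(heading_error, gain=2.0):
    """Hypothetical expert: proportional steering command from heading error.

    Stands in for the bi-level controller; the gain is an assumed value.
    """
    return gain * heading_error


def collect_demonstrations(n_samples, seed=0):
    """Sample states and record the expert's action for each one."""
    rng = random.Random(seed)
    states = [rng.uniform(-0.5, 0.5) for _ in range(n_samples)]
    return [(s, expert_action(s)) for s in states]


def behavior_cloning(demos, lr=0.5, epochs=500):
    """Fit a linear policy a = w * s + b to the demonstrations.

    Uses full-batch gradient descent on the mean squared error between
    the policy's action and the expert's action (supervised learning).
    """
    w, b = 0.0, 0.0
    n = len(demos)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * s + b - a) * s for s, a in demos) / n
        grad_b = sum(2.0 * (w * s + b - a) for s, a in demos) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


demos = collect_demonstrations(200)
w, b = behavior_cloning(demos)
print(f"cloned policy: action = {w:.3f} * heading_error + {b:.3f}")
```

In a two-stage scheme of the kind the abstract outlines, the cloned parameters would serve as the starting point for reinforcement learning rather than as the final policy.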

Original language: English
Article number: 111968
Journal: Applied Soft Computing
Volume: 164
State: Published - Oct 2024

Keywords

  • Autonomous surface vessels
  • Behavior cloning
  • Cooperative defense
  • Deep reinforcement learning
  • Reward design
