TY - JOUR
T1 - A plug-and-play fully on-the-job real-time reinforcement learning algorithm for a direct-drive tandem-wing experiment platforms under multiple random operating conditions
AU - Zhang, Minghao
AU - Song, Bifeng
AU - Yang, Xiaojun
AU - Wang, Liang
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/5/15
Y1 - 2025/5/15
N2 - This study addresses the motion control problem of the Direct-Drive Tandem-Wing Experiment Platform (DDTWEP). Using artificial intelligence (AI) techniques, it focuses on designing effective direct and transitional operating strategies for pitch, roll, and yaw under the nonlinear, unsteady aerodynamic interference caused by high-frequency oscillations and closely spaced tandem wings. To tackle this challenge, the Concerto Reinforcement Learning Extension (CRL2E) algorithm, a novel AI approach, is proposed; it features the Physics-Inspired Rule-Based Policy Composer strategy and is validated experimentally. The results demonstrate that the CRL2E algorithm maintains safety and efficiency throughout the training process, even with randomly initialized policy weights. In DDTWEP's plug-and-play, fully on-the-job motion control problem, the algorithm achieves a performance improvement of at least fourteen-fold and up to sixty-six-fold within the first five hundred interactions compared to the Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. To further verify the rationality and performance of the module and algorithm design, this study introduces two perturbations, Time-Interleaved Capability Perturbation and Composer Perturbation, and develops multiple algorithms for comparative experiments. The experimental results show that, compared to existing Concerto Reinforcement Learning (CRL) frameworks, the CRL2E algorithm achieves an 8.3%–60.4% enhancement in tracking accuracy, a 36.11%–57.64% improvement in convergence speed over the CRL with Composer Perturbation algorithm, and a 43.52%–65.85% improvement over the CRL with Time-Interleaved Capability Perturbation and Composer Perturbation algorithms, supporting the rationality of the CRL2E algorithm design.
Regarding generalizability, the CRL2E algorithm demonstrates significant applicability in quadrotor flight control, highlighting its potential versatility. From a technical affinity perspective, the CRL2E algorithm is well-suited for integrating pretraining techniques, demonstrating excellent safety and efficiency in addressing cross-task plug-and-play and fully on-the-job fine-tuning problems. Regarding deployability, hardware requirements were analyzed through ten thousand runs on diverse edge computing platforms, computational models, and operating systems to guide real-world deployment. Based on the experimental results, a real-time hardware-in-the-loop simulation system was constructed to validate the algorithm's effectiveness under realistic conditions. Additionally, an innovative yaw mechanism and its corresponding system model are introduced in this study to enhance the complexity of the system dynamics. These contributions provide valuable insights for addressing motion control challenges in complex mechanical systems.
KW - Artificial intelligence
KW - Direct-drive tandem-wing
KW - Plug-and-play control
KW - Policy composer
KW - Time-interleaved control
UR - http://www.scopus.com/inward/record.url?scp=86000551267&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2025.110373
DO - 10.1016/j.engappai.2025.110373
M3 - Article
AN - SCOPUS:86000551267
SN - 0952-1976
VL - 148
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 110373
ER -