A plug-and-play fully on-the-job real-time reinforcement learning algorithm for a direct-drive tandem-wing experiment platform under multiple random operating conditions

Zhang Minghao, Song Bifeng, Yang Xiaojun, Wang Liang

Research output: Contribution to journal › Article › peer-review

Abstract

This study addresses the motion control problem of the Direct-Drive Tandem-Wing Experiment Platform (DDTWEP), focusing on designing effective direct and transitional operating strategies for pitch, roll, and yaw under the nonlinear, unsteady aerodynamic interference caused by high-frequency oscillations and closely spaced tandem wings, by leveraging advanced artificial intelligence (AI) techniques. The Concerto Reinforcement Learning Extension (CRL2E) algorithm, a novel AI approach, is proposed to tackle this challenge, featuring a Physics-Inspired Rule-Based Policy Composer strategy and experimental validation. The results demonstrate that the CRL2E algorithm maintains safety and efficiency throughout the training process, even with randomly initialized policy weights. In DDTWEP's plug-and-play, fully on-the-job motion control problem, the algorithm achieves a performance improvement of fourteen- to sixty-six-fold within the first five hundred interactions compared with the Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. To further verify the rationality and performance of the module and algorithm design, this study introduces two perturbations, Time-Interleaved Capability Perturbation and Composer Perturbation, and develops multiple algorithms for comparative experiments. The experimental results show that, compared with the existing Concerto Reinforcement Learning (CRL) framework, the CRL2E algorithm achieves an 8.3%–60.4% improvement in tracking accuracy, a 36.11%–57.64% improvement in convergence speed over the CRL with Composer Perturbation algorithm, and a 43.52%–65.85% improvement over the CRL with Time-Interleaved Capability Perturbation and Composer Perturbation algorithms, confirming the rationality of the CRL2E design. Regarding generalizability, the CRL2E algorithm demonstrates strong applicability to quadrotor flight control, highlighting its versatility. From a technical-affinity perspective, the CRL2E algorithm is well suited to integration with pretraining techniques, showing excellent safety and efficiency in cross-task plug-and-play and fully on-the-job fine-tuning problems. Regarding deployability, hardware requirements were analyzed through ten thousand runs on diverse edge computing platforms, computational models, and operating systems to guide real-world deployment. Based on these results, a real-time hardware-in-the-loop simulation system was constructed to validate the algorithm's effectiveness under realistic conditions. Additionally, an innovative yaw mechanism and its corresponding system model are introduced to increase the complexity of the system dynamics. These contributions provide valuable insights for addressing motion control challenges in complex mechanical systems.
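The abstract does not detail the internals of the Physics-Inspired Rule-Based Policy Composer, but the idea of letting physics-inspired rules keep exploration safe while a learned policy is still untrained can be illustrated with a minimal sketch. The following Python snippet is purely illustrative and assumes a generic blend of a hand-tuned rule policy with a learned residual policy; all names, gains, state layouts, and the trust schedule are assumptions, not the authors' implementation.

    import numpy as np

    # Hypothetical sketch of a rule-based policy composer: physics-inspired
    # rules propose a safe baseline action, and a learned policy refines it.
    # Names and numbers below are illustrative assumptions only.

    def physics_rule_policy(state: np.ndarray) -> np.ndarray:
        """Physics-inspired rule: proportional correction toward zero error.
        Assumed state layout: [pitch_err, roll_err, yaw_err]."""
        gains = np.array([0.8, 0.8, 0.5])            # assumed hand-tuned gains
        return np.clip(-gains * state, -1.0, 1.0)    # bounded baseline action

    def learned_policy(state: np.ndarray, weights: np.ndarray) -> np.ndarray:
        """Stand-in for an RL actor (e.g., an SAC/TD3-style network); a simple
        linear map here so the sketch runs without a training loop."""
        return np.tanh(weights @ state)

    def compose(state: np.ndarray, weights: np.ndarray, trust: float) -> np.ndarray:
        """Blend rule-based and learned actions. With trust near 0 the rules
        dominate, which keeps early interactions safe even when the learned
        policy weights are randomly initialized; trust grows as training
        improves the learned policy."""
        rule_action = physics_rule_policy(state)
        rl_action = learned_policy(state, weights)
        return (1.0 - trust) * rule_action + trust * rl_action

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        state = rng.normal(size=3)                   # [pitch_err, roll_err, yaw_err]
        weights = rng.normal(scale=0.1, size=(3, 3)) # randomly initialized actor
        for trust in (0.0, 0.5, 1.0):
            print(trust, compose(state, weights, trust))

In this sketch the composer degrades gracefully: at trust 0.0 the output is exactly the rule-based action, so a freshly initialized learned policy cannot drive the platform into unsafe commands, which mirrors the safety-during-training property the abstract attributes to CRL2E.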

Original language: English
Article number: 110373
Journal: Engineering Applications of Artificial Intelligence
Volume: 148
DOIs
State: Published - 15 May 2025

Keywords

  • Artificial intelligence
  • Direct-drive tandem-wing
  • Plug-and-play control
  • Policy composer
  • Time-interleaved control
