A plug-and-play fully on-the-job real-time reinforcement learning algorithm for a direct-drive tandem-wing experiment platform under multiple random operating conditions

Zhang Minghao, Song Bifeng, Yang Xiaojun, Wang Liang

Research output: Contribution to journal › Article › peer-review

Abstract

This study addresses the motion control problem of the Direct-Drive Tandem-Wing Experiment Platform (DDTWEP), focusing on designing effective direct and transitional operating strategies for pitch, roll, and yaw under the nonlinear, unsteady aerodynamic interference caused by high-frequency oscillations and closely spaced tandem wings, by leveraging advanced artificial intelligence (AI) techniques. The Concerto Reinforcement Learning Extension (CRL2E) algorithm, a novel AI approach, is proposed to tackle this challenge, featuring a Physics-Inspired Rule-Based Policy Composer strategy and experimental validation. The results demonstrate that the CRL2E algorithm maintains safety and efficiency throughout the training process, even with randomly initialized policy weights. In DDTWEP's plug-and-play, fully on-the-job motion control problem, the algorithm achieves a performance improvement of fourteen- to sixty-six-fold within the first five hundred interactions compared with the Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. To further verify the rationality and performance of the module and algorithm design, this study introduces two perturbations, Time-Interleaved Capability Perturbation and Composer Perturbation, and develops multiple algorithms for comparative experiments. The experimental results show that, compared with the existing Concerto Reinforcement Learning (CRL) framework, the CRL2E algorithm achieves an 8.3%–60.4% improvement in tracking accuracy, a 36.11%–57.64% improvement in convergence speed over the CRL with Composer Perturbation algorithm, and a 43.52%–65.85% improvement over the CRL with Time-Interleaved Capability Perturbation and Composer Perturbation algorithms, confirming the rationality of the CRL2E design. Regarding generalizability, the CRL2E algorithm demonstrates strong applicability to quadrotor flight control, highlighting its versatility. From a technical-affinity perspective, the CRL2E algorithm is well suited to integration with pretraining techniques, showing excellent safety and efficiency in cross-task plug-and-play and fully on-the-job fine-tuning problems. Regarding deployability, hardware requirements were analyzed through ten thousand runs on diverse edge computing platforms, computational models, and operating systems to guide real-world deployment. Based on these results, a real-time hardware-in-the-loop simulation system was constructed to validate the algorithm's effectiveness under realistic conditions. Additionally, an innovative yaw mechanism and its corresponding system model are introduced to increase the complexity of the system dynamics. These contributions provide valuable insights for addressing motion control challenges in complex mechanical systems.
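The abstract does not detail the internals of the Physics-Inspired Rule-Based Policy Composer, but the idea of letting physics-inspired rules keep exploration safe while a learned policy is still untrained can be illustrated with a minimal sketch. The following Python snippet is purely illustrative and assumes a generic blend of a hand-tuned rule policy with a learned residual policy; all names, gains, state layouts, and the trust schedule are assumptions, not the authors' implementation.

    import numpy as np

    # Hypothetical sketch of a rule-based policy composer: physics-inspired
    # rules propose a safe baseline action, and a learned policy refines it.
    # Names and numbers below are illustrative assumptions only.

    def physics_rule_policy(state: np.ndarray) -> np.ndarray:
        """Physics-inspired rule: proportional correction toward zero error.
        Assumed state layout: [pitch_err, roll_err, yaw_err]."""
        gains = np.array([0.8, 0.8, 0.5])            # assumed hand-tuned gains
        return np.clip(-gains * state, -1.0, 1.0)    # bounded baseline action

    def learned_policy(state: np.ndarray, weights: np.ndarray) -> np.ndarray:
        """Stand-in for an RL actor (e.g., an SAC/TD3-style network); a simple
        linear map here so the sketch runs without a training loop."""
        return np.tanh(weights @ state)

    def compose(state: np.ndarray, weights: np.ndarray, trust: float) -> np.ndarray:
        """Blend rule-based and learned actions. With trust near 0 the rules
        dominate, which keeps early interactions safe even when the learned
        policy weights are randomly initialized; trust grows as training
        improves the learned policy."""
        rule_action = physics_rule_policy(state)
        rl_action = learned_policy(state, weights)
        return (1.0 - trust) * rule_action + trust * rl_action

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        state = rng.normal(size=3)                   # [pitch_err, roll_err, yaw_err]
        weights = rng.normal(scale=0.1, size=(3, 3)) # randomly initialized actor
        for trust in (0.0, 0.5, 1.0):
            print(trust, compose(state, weights, trust))

In this sketch the composer degrades gracefully: at trust 0.0 the output is exactly the rule-based action, so a freshly initialized learned policy cannot drive the platform into unsafe commands, which mirrors the safety-during-training property the abstract attributes to CRL2E.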

Original language: English
Article number: 110373
Journal: Engineering Applications of Artificial Intelligence
Volume: 148
DOIs
State: Published - 15 May 2025

Keywords

  • Artificial intelligence
  • Direct-drive tandem-wing
  • Plug-and-play control
  • Policy composer
  • Time-interleaved control
