TY - JOUR
T1 - ConcertoRL
T2 - A reinforcement learning approach for finite-time single-life enhanced control and its application to direct-drive tandem-wing experiment platforms
AU - Zhang, Minghao
AU - Song, Bifeng
AU - Chen, Changhao
AU - Lang, Xinyu
AU - Wang, Liang
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
PY - 2024/12
Y1 - 2024/12
N2 - Achieving control of mechanical systems using finite-time single-life methods presents significant challenges in safety and efficiency for existing control algorithms. To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism based on Lipschitz conditions that integrates classical controllers with reinforcement learning-based controllers to enhance initial-stage safety under single-life conditions, and a policy composer based on finite-time Lyapunov convergence conditions that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, typical mechanical systems operating under nonlinear unsteady load conditions. First, compared with established algorithms such as the Soft Actor-Critic (SAC) algorithm, Proximal Policy Optimization (PPO) algorithm, and Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, ConcertoRL demonstrates nearly an order-of-magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module yields a performance improvement of nearly two orders of magnitude in single-life last average reward. Furthermore, integrating this module provides a substantial performance boost of approximately 60% over scenarios without reinforcement learning enhancements and a 30% increase in efficiency compared to reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. Third, ablation studies on the rule-based policy composer further verify its significant impact on enhancing ConcertoRL's convergence speed.
Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions. It enables plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for challenges posed by direct-drive platforms under tandem wing influence.
KW - Control Precision
KW - Online Training Stability
KW - Policy composer
KW - Reinforcement Learning
KW - Time-Interleaved Control
UR - http://www.scopus.com/inward/record.url?scp=85207324226&partnerID=8YFLogxK
U2 - 10.1007/s10489-024-05720-7
DO - 10.1007/s10489-024-05720-7
M3 - Article
AN - SCOPUS:85207324226
SN - 0924-669X
VL - 54
SP - 13121
EP - 13159
JO - Applied Intelligence
JF - Applied Intelligence
IS - 24
ER -