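Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning (基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法)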

Tao Zhang; Wen Tao Zhang; Ling Dai; Jing Yi Chen; Li Wang; Qian Ru Wei

doi:10.12263/DZXB.20211268

School of Cybersecurity, Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Dynamic reconfiguration is an efficient fault-tolerance approach for integrated modular avionics (IMA) systems. The reconfiguration blueprint specifies how applications are migrated and resources are reconfigured when the system fails, and it is the key to restoring system functions at minimum cost. The difficulty lies in generating effective reconfiguration blueprints quickly and automatically under complex, multi-level correlated failure modes. This paper proposes an IMA system reconfiguration method based on sequential-game multi-agent reinforcement learning to address this problem. The method introduces a sequential game model in which each application that must be migrated is an agent, and the order of moves is determined by application priority. To handle competition and cooperation among the agents during the sequential game, the algorithm introduces reinforcement-learning policy gradients and optimizes the reconfiguration result by adjusting action-selection probabilities through interaction with the environment. A policy-gradient Monte Carlo tree search algorithm based on biased estimation is used to update the game strategy, addressing the oscillation, poor convergence, and long computation time of traditional policy-gradient algorithms. Experimental results indicate that, compared with differential evolution and Q-learning, the proposed algorithm has significant advantages in convergence and efficiency.
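The abstract describes the method only at a high level. The following is a minimal, hypothetical sketch of the general idea, not the authors' algorithm: applications that must migrate act as agents that move in priority order, each later agent observes earlier placements through a remaining-capacity mask, and every agent's softmax action-selection probabilities are updated with plain REINFORCE policy gradient on a shared reward. It deliberately does not implement the paper's biased-estimation policy-gradient Monte Carlo tree search. The instance data (`APPS`, `CAPACITY`, `DISTANCE`) and all function names are assumptions made for illustration.

```python
import math
import random

# Hypothetical toy instance (not from the paper).
# Applications that must migrate off a failed module: name -> (priority, load).
APPS = {"nav": (3, 4), "display": (2, 3), "comm": (1, 2)}
# Assumed spare capacity on the healthy modules that can host migrated applications.
CAPACITY = {"M1": 5, "M2": 4, "M3": 3}
# Assumed per-unit-load migration cost to each module (e.g. network distance).
DISTANCE = {"M1": 3, "M2": 1, "M3": 2}

def softmax(prefs):
    """Action-selection probabilities from a dict of preferences."""
    top = max(prefs.values())
    exps = {k: math.exp(v - top) for k, v in prefs.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def move_order(apps):
    """Sequential game: higher-priority applications move first."""
    return sorted(apps, key=lambda a: -apps[a][0])

def feasible(app, remaining):
    """Modules that still fit this app after earlier agents have moved."""
    ok = [m for m in CAPACITY if remaining[m] >= APPS[app][1]]
    return ok or list(CAPACITY)  # fall back to all modules if nothing fits

def reward(assignment):
    """Shared (cooperative) reward: negative migration cost plus overload penalty."""
    used = {m: 0 for m in CAPACITY}
    cost = 0.0
    for app, mod in assignment.items():
        load = APPS[app][1]
        used[mod] += load
        cost += load * DISTANCE[mod]
    overflow = sum(max(0, used[m] - CAPACITY[m]) for m in CAPACITY)
    return -(cost + 10.0 * overflow)

def rollout(theta, rng):
    """Play one sequential game; later agents see earlier placements via the mask."""
    remaining = dict(CAPACITY)
    assignment, trace = {}, []
    for app in move_order(APPS):
        options = feasible(app, remaining)
        p = softmax({m: theta[app][m] for m in options})
        mod = rng.choices(options, weights=[p[m] for m in options])[0]
        remaining[mod] -= APPS[app][1]
        assignment[app] = mod
        trace.append((app, mod, p))
    return assignment, trace

def train(episodes=2000, lr=0.05, seed=0):
    """Plain REINFORCE with a running-average baseline (not the paper's MCTS variant)."""
    rng = random.Random(seed)
    theta = {a: {m: 0.0 for m in CAPACITY} for a in APPS}
    baseline = None
    for _ in range(episodes):
        assignment, trace = rollout(theta, rng)
        r = reward(assignment)
        if baseline is None:
            baseline = r  # avoid a huge first advantage
        adv = r - baseline
        baseline += 0.05 * (r - baseline)
        for app, mod, p in trace:
            for m in p:  # d log softmax / d preference, over selectable modules only
                theta[app][m] += lr * adv * ((1.0 if m == mod else 0.0) - p[m])
    return theta

def blueprint(theta):
    """Greedy reconfiguration blueprint from the learned preferences."""
    remaining = dict(CAPACITY)
    plan = {}
    for app in move_order(APPS):
        options = feasible(app, remaining)
        mod = max(options, key=lambda m: theta[app][m])
        remaining[mod] -= APPS[app][1]
        plan[app] = mod
    return plan

if __name__ == "__main__":
    plan = blueprint(train())
    print(plan, reward(plan))
```

The capacity mask is what makes this a sequential rather than simultaneous game: a lower-priority agent's choice set depends on where higher-priority agents have already been placed. For this toy instance, the minimum-cost feasible plan is nav on M2, display on M3, comm on M1 (reward -16.0); the sampled policy gradient usually, but not provably, settles on it.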

Translated title of the contribution	Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning
Original language	Chinese (Simplified)
Pages (from-to)	954-966
Number of pages	13
Journal	Tien Tzu Hsueh Pao/Acta Electronica Sinica
Volume	50
Issue number	4
DOIs	https://doi.org/10.12263/DZXB.20211268
State	Published - Apr 2022

Cite this

@article{8f057e9cda8a4539b833d2e93d570dc8,

title = "基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法",

keywords = "Integrated modular avionics(IMA) system, Monte Carlo tree search, Multi-agent reinforcement learning, Policy gradient, Reconfiguration, Sequential game",

author = "Zhang, Tao and Zhang, {Wen Tao} and Dai, Ling and Chen, {Jing Yi} and Wang, Li and Wei, {Qian Ru}",

year = "2022",

month = apr,

doi = "10.12263/DZXB.20211268",

language = "Chinese (Simplified)",

volume = "50",

pages = "954--966",

journal = "Tien Tzu Hsueh Pao/Acta Electronica Sinica",

issn = "0372-2112",

publisher = "Chinese Institute of Electronics",

number = "4",

}

TY - JOUR

T1 - 基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法

AU - Zhang, Tao

AU - Zhang, Wen Tao

AU - Dai, Ling

AU - Chen, Jing Yi

AU - Wang, Li

AU - Wei, Qian Ru

PY - 2022/4

Y1 - 2022/4

KW - Integrated modular avionics(IMA) system

KW - Monte Carlo tree search

KW - Multi-agent reinforcement learning

KW - Policy gradient

KW - Reconfiguration

KW - Sequential game

UR - http://www.scopus.com/inward/record.url?scp=85130045134&partnerID=8YFLogxK

U2 - 10.12263/DZXB.20211268

DO - 10.12263/DZXB.20211268

M3 - Article

AN - SCOPUS:85130045134

SN - 0372-2112

VL - 50

SP - 954

EP - 966

JO - Tien Tzu Hsueh Pao/Acta Electronica Sinica

JF - Tien Tzu Hsueh Pao/Acta Electronica Sinica

IS - 4

ER -
