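基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法 (Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning)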

Tao Zhang; Wen Tao Zhang; Ling Dai; Jing Yi Chen; Li Wang; Qian Ru Wei

doi:10.12263/DZXB.20211268

School of Cybersecurity

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › Peer-reviewed

6 citations (Scopus)

Abstract

Dynamic reconfiguration is an efficient fault-tolerance approach for integrated modular avionics (IMA) systems. The reconfiguration blueprint defines the application migration and resource reconfiguration scheme to apply when the system fails, and it is the key to recovering system functions at minimum cost. The difficulty lies in generating effective reconfiguration blueprints rapidly and automatically under complex, multi-level associated failure modes. To solve this problem, this paper proposes an IMA system reconfiguration method based on sequential game multi-agent reinforcement learning. The method introduces a sequential game model in which each application that needs to be migrated is defined as an agent, and the order of play is determined by the priority of the application software. To handle competition and cooperation among multiple agents during the sequential game, the algorithm introduces the policy gradient method of reinforcement learning and optimizes the reconfiguration result by controlling the action selection probability through interaction with the environment. A policy gradient Monte Carlo tree search algorithm based on biased estimation is applied to update the game strategy, which addresses the oscillation, difficult convergence, and long computation time of the traditional policy gradient algorithm. Experimental results indicate that, compared with differential evolution and Q-learning methods, the proposed algorithm has significant advantages in convergence and efficiency.
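The priority-ordered game and the policy gradient idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it replaces the policy gradient Monte Carlo tree search with a plain REINFORCE update and a running-average baseline, and the application names, loads, module capacities, and reward values are all hypothetical values chosen for illustration.

```python
import math
import random

# Hypothetical toy problem: three applications must migrate off a failed module.
# Each entry is (name, priority, load); all values are assumptions.
APPS = [("nav", 3, 2), ("display", 2, 2), ("logging", 1, 1)]
CAPACITY = [3, 3]  # assumed free capacity of two surviving modules


def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    peak = max(prefs)
    exps = [math.exp(p - peak) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def play_episode(theta, rng):
    """One sequential game: agents act in descending priority order."""
    free = list(CAPACITY)
    order = sorted(range(len(APPS)), key=lambda i: -APPS[i][1])
    choices, reward = {}, 0.0
    for i in order:
        probs = softmax(theta[i])
        module = rng.choices(range(len(free)), weights=probs)[0]
        choices[i] = module
        load = APPS[i][2]
        if free[module] >= load:
            free[module] -= load
            reward += 1.0  # application recovered on the chosen module
        else:
            reward -= 1.0  # module overloaded, migration fails
    return choices, reward


def train(episodes=3000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline on the shared reward."""
    rng = random.Random(seed)
    theta = [[0.0] * len(CAPACITY) for _ in APPS]
    baseline = 0.0
    for _ in range(episodes):
        choices, reward = play_episode(theta, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for i, module in choices.items():
            probs = softmax(theta[i])
            for m in range(len(probs)):
                # Gradient of log softmax: indicator(chosen) - probability.
                grad = (1.0 if m == module else 0.0) - probs[m]
                theta[i][m] += lr * advantage * grad
    return theta
```

Calling `train()` returns one row of module preferences per application; passing each row through `softmax` gives that agent's learned selection probabilities. The shared reward is a simplification that makes the agents cooperative, while the limited module capacity is what creates competition between them.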

Translated title of the contribution	Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning
Original language	Traditional Chinese
Pages (from-to)	954-966
Number of pages	13
Journal	Tien Tzu Hsueh Pao/Acta Electronica Sinica
Volume	50
Issue number	4
DOI	https://doi.org/10.12263/DZXB.20211268
Publication status	Published - Apr 2022

Keywords

Integrated modular avionics (IMA) system
Monte Carlo tree search
Multi-agent reinforcement learning
Policy gradient
Reconfiguration
Sequential game

Cite this

@article{8f057e9cda8a4539b833d2e93d570dc8,
title = "基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法",
keywords = "Integrated modular avionics (IMA) system, Monte Carlo tree search, Multi-agent reinforcement learning, Policy gradient, Reconfiguration, Sequential game",
author = "Tao Zhang and Zhang, {Wen Tao} and Ling Dai and Chen, {Jing Yi} and Li Wang and Wei, {Qian Ru}",
year = "2022",
month = apr,
doi = "10.12263/DZXB.20211268",
language = "Traditional Chinese",
volume = "50",
pages = "954--966",
journal = "Tien Tzu Hsueh Pao/Acta Electronica Sinica",
issn = "0372-2112",
publisher = "Chinese Institute of Electronics",
number = "4",
}

TY - JOUR
T1 - 基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法
AU - Zhang, Tao
AU - Zhang, Wen Tao
AU - Dai, Ling
AU - Chen, Jing Yi
AU - Wang, Li
AU - Wei, Qian Ru
PY - 2022/4
Y1 - 2022/4
KW - Integrated modular avionics (IMA) system
KW - Monte Carlo tree search
KW - Multi-agent reinforcement learning
KW - Policy gradient
KW - Reconfiguration
KW - Sequential game
UR - http://www.scopus.com/inward/record.url?scp=85130045134&partnerID=8YFLogxK
U2 - 10.12263/DZXB.20211268
DO - 10.12263/DZXB.20211268
M3 - Article
AN - SCOPUS:85130045134
SN - 0372-2112
VL - 50
SP - 954
EP - 966
JO - Tien Tzu Hsueh Pao/Acta Electronica Sinica
JF - Tien Tzu Hsueh Pao/Acta Electronica Sinica
IS - 4
ER -