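Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning (基于序贯博弈多智能体强化学习的综合模块化航空电子系统重构方法)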

Tao Zhang, Wen Tao Zhang, Ling Dai, Jing Yi Chen, Li Wang, Qian Ru Wei

Research output: Contribution to journal, Article, peer-reviewed

6 citations (Scopus)

Abstract

Dynamic reconfiguration is an efficient fault-tolerance approach for integrated modular avionics (IMA) systems. The reconfiguration blueprint defines the application migration and resource reconfiguration scheme to apply when the system fails, and it is the key to reconfiguring the system and recovering its functions at minimum cost. The difficulty lies in generating effective reconfiguration blueprints rapidly and automatically under complex, multi-level associated failure modes. This paper proposes an IMA system reconfiguration method based on sequential game multi-agent reinforcement learning to solve this problem. The method introduces a sequential game model in which each application that needs to be migrated is an agent, and the order of play is determined by application priority. To handle competition and cooperation among the agents during the sequential game, the algorithm introduces the policy gradient from reinforcement learning and optimizes the reconfiguration result by controlling the action selection probabilities while interacting with the environment. A policy gradient Monte Carlo tree search algorithm based on biased estimation is applied to update the game strategy, which addresses the oscillation, difficult convergence, and long computation time of the traditional policy gradient algorithm. Experimental results indicate that, compared with differential evolution and Q-learning methods, the proposed algorithm has significant advantages in convergence and efficiency.
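The sequential-game idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it swaps the paper's policy gradient Monte Carlo tree search with biased estimation for a plain REINFORCE update with a running baseline, and each agent's policy ignores the moves of earlier agents. The application names, demands, capacities, costs, and the overload penalty are all invented for illustration.

```python
import math
import random

# Hypothetical toy instance: every number and name below is an assumption.
APPS = [  # (name, priority, resource demand)
    ("flight_ctrl", 3, 4),
    ("nav", 2, 3),
    ("display", 1, 2),
]
CAPACITY = [5, 4, 3]      # spare capacity of each healthy module
COST = [[1, 2, 3],        # COST[a][m]: cost of migrating app a to module m
        [2, 1, 3],
        [3, 2, 1]]
OVERLOAD_PENALTY = 10.0   # charged when a move overfills a module


def softmax(logits):
    """Turn logits into action selection probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def play_order():
    """Indices of APPS, highest priority first (the order of play)."""
    return sorted(range(len(APPS)), key=lambda a: -APPS[a][1])


def episode(theta, rng, greedy=False):
    """Play one sequential game; return (assignment, shared reward)."""
    free = list(CAPACITY)
    assignment = {}
    reward = 0.0
    for a in play_order():
        probs = softmax(theta[a])
        if greedy:
            m = max(range(len(probs)), key=probs.__getitem__)
        else:
            m = rng.choices(range(len(probs)), weights=probs)[0]
        assignment[a] = m
        free[m] -= APPS[a][2]
        reward -= COST[a][m]
        if free[m] < 0:
            reward -= OVERLOAD_PENALTY
    return assignment, reward


def train(episodes=3000, lr=0.05, seed=0):
    """REINFORCE with a running-mean baseline, one logit vector per agent."""
    rng = random.Random(seed)
    theta = [[0.0] * len(CAPACITY) for _ in APPS]
    baseline = 0.0
    for _ in range(episodes):
        assignment, reward = episode(theta, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for a, m in assignment.items():
            probs = softmax(theta[a])
            for k in range(len(probs)):
                # Gradient of log softmax: indicator(k == m) - probs[k].
                grad = (1.0 if k == m else 0.0) - probs[k]
                theta[a][k] += lr * advantage * grad
    return theta
```

Calling `episode(train(), None, greedy=True)` returns the blueprint (an application-to-module assignment) that the learned policies prefer, together with its reward. Because the agents learn independently from a shared reward, this sketch may settle on a feasible but non-minimal-cost assignment.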

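Translated title of the contribution: Integrated Modular Avionics System Reconstruction Method Based on Sequential Game Multi-agent Reinforcement Learning
Original language: Chinese (Traditional)
Pages (from-to): 954-966
Number of pages: 13
Journal: Tien Tzu Hsueh Pao/Acta Electronica Sinica
Volume: 50
Issue number: 4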
Publication status: Published - Apr 2022

Keywords

  • Integrated modular avionics (IMA) system
  • Monte Carlo tree search
  • Multi-agent reinforcement learning
  • Policy gradient
  • Reconfiguration
  • Sequential game
