An adaptive algorithm for consensus improving in group decision making based on reinforcement learning

Zhang Hengsheng, Zhu Rui, Wang Quantao, Shi Haobin, Kao Shing Hwang

Research output: Contribution to journal › Article › peer-review


Abstract

In group decision-making problems with reciprocal preference relations, the process of improving individual consistency and the consensus degree among decision-makers is dynamic and iterative. Traditional automatic consensus-reaching processes have several problems, such as adopting a fixed strategy in a deterministic environment without considering the dynamics of the decision-making environment, and the destruction of individual consistency. To address these problems, an adaptive consensus-reaching model for a dynamic environment is proposed in this paper. First, a Q-learning algorithm is used to build an environment model over the different decision-making states of matrix modification; under the constraint of modifying the preference matrix with only a small matrix deviation, the optimal modification strategy is learned to improve the consensus degree among decision-makers. Second, a method is proposed to control individual consistency during the consensus-reaching process by means of a reward function. Finally, several numerical examples are used to illustrate the effectiveness and feasibility of the proposed algorithm. The experimental results show that the proposed algorithm significantly improves the consensus degree of the decision-makers with a small matrix deviation and ensures that the decision-makers' individual consistency is not destroyed.
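The abstract gives only a high-level description of the method. As a rough illustration of the kind of scheme it describes, the sketch below implements a plain tabular Q-learning loop in which states are discretized consensus levels, actions select which entry of an expert's reciprocal preference matrix to move toward the collective matrix, and the reward trades off consensus gain against matrix deviation and any loss of individual (additive) consistency. Every detail here (the consensus and consistency measures, the state discretization, the step size, and the reward weights) is an assumption made for illustration, not the authors' actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def consensus_degree(P, G):
    """Consensus taken as 1 minus the mean absolute deviation from the collective matrix G (assumed measure)."""
    return 1.0 - np.mean(np.abs(P - G))

def consistency_index(P):
    """Additive-consistency proxy for a reciprocal preference relation P (1 = fully consistent; assumed measure)."""
    n = len(P)
    dev = np.mean([abs(P[i, j] + P[j, k] - P[i, k] - 0.5)
                   for i in range(n) for j in range(n) for k in range(n)])
    return 1.0 - dev

def discretize(cd, bins=10):
    """Map a consensus degree in [0, 1] to a discrete state index."""
    return min(int(cd * bins), bins - 1)

def step_env(P, G, action, delta=0.05):
    """Move one upper-triangle entry of P toward G by at most delta and keep P reciprocal."""
    i, j = action
    P = P.copy()
    P[i, j] += np.sign(G[i, j] - P[i, j]) * min(delta, abs(G[i, j] - P[i, j]))
    P[j, i] = 1.0 - P[i, j]
    return P

def reward(P_old, P_new, G, lam=0.5):
    """Reward consensus gain; penalise the size of the modification and any drop in consistency."""
    gain = consensus_degree(P_new, G) - consensus_degree(P_old, G)
    deviation = np.mean(np.abs(P_new - P_old))
    consistency_drop = max(0.0, consistency_index(P_old) - consistency_index(P_new))
    return gain - lam * deviation - consistency_drop

# Toy data (made up): one expert's reciprocal preference matrix P0 and the collective matrix G.
P0 = np.array([[0.5, 0.8, 0.3],
               [0.2, 0.5, 0.6],
               [0.7, 0.4, 0.5]])
G = np.array([[0.5, 0.6, 0.4],
              [0.4, 0.5, 0.55],
              [0.6, 0.45, 0.5]])

n = len(P0)
actions = [(i, j) for i in range(n) for j in range(i + 1, n)]  # one action per upper-triangle entry
Q = np.zeros((10, len(actions)))                               # Q-table: states x actions
alpha, gamma, eps = 0.1, 0.9, 0.2                              # learning rate, discount, exploration

for episode in range(200):
    P = P0.copy()
    for t in range(30):
        s = discretize(consensus_degree(P, G))
        a = rng.integers(len(actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        P_next = step_env(P, G, actions[a])
        r = reward(P, P_next, G)
        s_next = discretize(consensus_degree(P_next, G))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        P = P_next

print("final consensus degree:", round(consensus_degree(P, G), 3))
print("final consistency index:", round(consistency_index(P), 3))
```

In the paper's full setting the loop would presumably iterate over every decision-maker's matrix until a group consensus threshold is reached; the toy example above only adjusts a single 3×3 matrix, so it should be read as a sketch of the control structure rather than a reproduction of the proposed algorithm.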

Original language: English
Pages (from-to): 161-174
Number of pages: 14
Journal: Journal of the Chinese Institute of Engineers, Transactions of the Chinese Institute of Engineers, Series A
Volume: 45
Issue number: 2
DOIs
State: Published - 2022

Keywords

  • Consensus
  • Group decision making
  • Reinforcement learning
