Multi-Agent Reward-Iteration Fuzzy Q-Learning

Lixiong Leng, Jingchen Li, Jinhui Zhu, Kao Shing Hwang, Haobin Shi

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Fuzzy Q-learning extends Q-learning to continuous state spaces and has been applied to a wide range of applications, such as robot control. In a multi-agent system, however, the non-stationary environment makes it difficult for the joint policy to converge. To give agents more suitable rewards in a multi-agent environment, multi-agent reward-iteration fuzzy Q-learning (RIFQ) is proposed for multi-agent cooperative tasks. The state space is divided into three channels by the proposed fuzzy-logic state-divider. Each agent's reward is reshaped iteratively according to its state, an update sequence is constructed by computing the relations among the agents' states, and the value functions are then updated top-down. By replacing the reward given by the environment with the reshaped reward, agents avoid the most unreasonable punishments and receive rewards selectively. RIFQ provides a feasible reward relationship among agents, which makes multi-agent training more stable. Several simulation experiments show that RIFQ is not limited by the number of agents and converges faster than the baselines.
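The abstract describes RIFQ only at a high level, so the sketch below illustrates the two ingredients it names: fuzzy Q-learning over a continuous state, and reward reshaping applied before the value update, with agents updated in a computed order. This is a minimal, hypothetical reconstruction in Python/NumPy, not the authors' implementation: the triangular membership functions, the distance-based reshaping rule, the nearest-to-goal update order, and the names `FuzzyQLearner` and `reshape_rewards` are all illustrative assumptions.

```python
import numpy as np


class FuzzyQLearner:
    """Fuzzy Q-learning over a 1-D continuous state (illustrative sketch)."""

    def __init__(self, centers, n_actions, alpha=0.1, gamma=0.95, width=1.0):
        self.centers = np.asarray(centers, dtype=float)     # rule centers on the state axis
        self.width = width                                  # shared triangular half-width
        self.q = np.zeros((len(self.centers), n_actions))   # Q-table: rules x actions
        self.alpha, self.gamma = alpha, gamma

    def firing(self, s):
        # Triangular memberships, normalized so the firing strengths sum to 1.
        mu = np.maximum(0.0, 1.0 - np.abs(s - self.centers) / self.width)
        total = mu.sum()
        return mu / total if total > 0 else np.full(len(mu), 1.0 / len(mu))

    def q_values(self, s):
        # Continuous-state action values: firing-strength-weighted rule Q-values.
        return self.firing(s) @ self.q

    def act(self, s, eps=0.1, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        if rng.random() < eps:
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q_values(s)))

    def update(self, s, a, r, s_next):
        # TD update with the (possibly reshaped) reward r; the TD error is
        # spread over the active rules in proportion to their firing strengths.
        phi = self.firing(s)
        target = r + self.gamma * self.q_values(s_next).max()
        td = target - self.q_values(s)[a]
        self.q[:, a] += self.alpha * td * phi


def reshape_rewards(states, env_rewards):
    # Hypothetical stand-in for RIFQ's reward iteration: punishments are scaled
    # by each agent's distance to the goal (state 0 here), so agents are
    # shielded from the most unreasonable shared punishments; positive rewards
    # pass through unchanged.
    d = np.abs(np.asarray(states, dtype=float))
    w = d / d.sum() if d.sum() > 0 else np.full(len(d), 1.0 / len(d))
    return [r if r >= 0 else r * wi for r, wi in zip(env_rewards, w)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.linspace(-2.0, 2.0, 9)
    learners = [FuzzyQLearner(centers, n_actions=2) for _ in range(3)]
    states = [1.5, -0.5, 2.0]
    for _ in range(100):
        actions = [ql.act(s, rng=rng) for ql, s in zip(learners, states)]
        # Toy dynamics: action 0 moves toward the goal at 0, action 1 away.
        next_states = [s - 0.1 if a == 0 else s + 0.1
                       for s, a in zip(states, actions)]
        env_rewards = [-abs(s) for s in next_states]     # punish distance to goal
        reshaped = reshape_rewards(states, env_rewards)  # selective punishment
        # "Top-down" updates in a hypothetical order: nearest-to-goal agent first.
        for i in np.argsort([abs(s) for s in states]):
            learners[i].update(states[i], actions[i], reshaped[i], next_states[i])
        states = next_states
```

The paper's actual state-divider, three-channel decomposition, and reward-iteration rule are not specified in the abstract; the distance-based weighting above merely shows where a reshaped reward would replace the environment reward in the update loop.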

Original language: English
Pages (from-to): 1669-1679
Number of pages: 11
Journal: International Journal of Fuzzy Systems
Volume: 23
Issue number: 6
Publication status: Published - Sep. 2021
