Multi-Agent Reward-Iteration Fuzzy Q-Learning

Lixiong Leng, Jingchen Li, Jinhui Zhu, Kao Shing Hwang, Haobin Shi

Research output: Contribution to journal › Article › peer-review


Abstract

Fuzzy Q-learning extends Q-learning to continuous state spaces and has been applied to a wide range of tasks such as robot control. In a multi-agent system, however, the non-stationary environment makes it difficult for the joint policy to converge. To give agents more suitable rewards in a multi-agent environment, multi-agent reward-iteration fuzzy Q-learning (RIFQ) is proposed for multi-agent cooperative tasks. The state space is divided into three channels by the proposed state-divider, which uses fuzzy logic. The reward of each agent is reshaped iteratively according to its state, and an update sequence is constructed by computing the relations among the states of different agents; the value functions are then updated top-down. By replacing the reward given by the environment with the reshaped reward, agents avoid the most unreasonable punishments and receive rewards selectively. RIFQ provides a feasible reward relationship among agents, which makes multi-agent training more stable. Several simulation experiments show that RIFQ is not limited by the number of agents and converges faster than the baselines.
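The fuzzy Q-learning that RIFQ builds on can be sketched in a few lines: the continuous state is fuzzified by membership functions, Q(s, a) is the membership-weighted sum of per-rule action values, and the TD update is distributed over the rules by their firing strengths. The sketch below is a minimal single-agent illustration under stated assumptions; the triangular membership functions, all parameters, and the `reshaped_r` argument (standing in for RIFQ's iteratively reshaped reward) are illustrative choices, not the paper's specification of the state-divider or the reward-iteration procedure.

```python
import numpy as np

def triangular_memberships(s, centers, width=1.0):
    """Normalized degree of membership of state s in each triangular fuzzy set."""
    mu = np.maximum(0.0, 1.0 - np.abs(s - centers) / width)
    total = mu.sum()
    return mu / total if total > 0 else mu

class FuzzyQ:
    """Minimal fuzzy Q-learning on a 1-D continuous state (illustrative sketch)."""

    def __init__(self, centers, n_actions, alpha=0.1, gamma=0.9):
        self.centers = np.asarray(centers, dtype=float)
        self.q = np.zeros((len(centers), n_actions))  # per-rule action values
        self.alpha, self.gamma = alpha, gamma

    def value(self, s, a):
        # Q(s, a) = sum over rules of membership(s) * rule value for action a
        mu = triangular_memberships(s, self.centers)
        return float(mu @ self.q[:, a])

    def update(self, s, a, reshaped_r, s_next):
        # TD update distributed over rules by membership degree.
        # `reshaped_r` is a placeholder for a reward already reshaped
        # (e.g., by RIFQ's iteration) rather than the raw environment reward.
        mu = triangular_memberships(s, self.centers)
        mu_next = triangular_memberships(s_next, self.centers)
        target = reshaped_r + self.gamma * np.max(mu_next @ self.q)
        td = target - self.value(s, a)
        self.q[:, a] += self.alpha * td * mu

agent = FuzzyQ(centers=[0.0, 1.0, 2.0], n_actions=2)
agent.update(s=0.5, a=1, reshaped_r=1.0, s_next=1.5)
```

After one update from a zero-initialized table, only the rules whose fuzzy sets fire at s = 0.5 (the first two) are adjusted, so the learned value generalizes smoothly over the continuous state rather than over discrete cells.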

Original language: English
Pages (from-to): 1669-1679
Number of pages: 11
Journal: International Journal of Fuzzy Systems
Volume: 23
Issue number: 6
DOIs
State: Published - Sep 2021

Keywords

  • Fuzzy Q-learning
  • Multi-agent reinforcement learning
  • Multi-agent system
  • Reward shaping

