TY - JOUR
T1 - RSMDP-based robust Q-learning for optimal path planning in a dynamic environment
AU - Zhang, Yunfei
AU - Li, Weilin
AU - De Silva, Clarence W.
PY - 2016
Y1 - 2016
N2 - This paper presents a robust Q-learning method for path planning in a dynamic environment. The method consists of three steps: first, a regime-switching Markov decision process (RSMDP) is formed to represent the dynamic environment; second, a probabilistic roadmap (PRM) is constructed, integrated with the RSMDP, and stored as a graph whose nodes correspond to collision-free world states for the robot; finally, an online Q-learning method with a dynamic step size, which facilitates robust convergence of the Q-value iteration, is integrated with the PRM to determine an optimal path to the goal. In this manner, the robot is able to use past experience to improve its performance in avoiding not only static obstacles but also moving obstacles, without knowing the nature of the obstacle motion. The use of regime switching in the avoidance of obstacles with unknown motion is particularly innovative. The developed approach is applied to a homecare robot in computer simulation. The results show that the online path planner with Q-learning is able to rapidly and successfully converge to the correct path.
KW - Markov decision process
KW - Online Q-learning
KW - Optimal path planning
KW - Probabilistic roadmap
KW - Unknown dynamic obstacles
UR - http://www.scopus.com/inward/record.url?scp=84983332083&partnerID=8YFLogxK
U2 - 10.2316/Journal.206.2016.4.206-4255
DO - 10.2316/Journal.206.2016.4.206-4255
M3 - Article
AN - SCOPUS:84983332083
SN - 0826-8185
VL - 31
SP - 290
EP - 300
JO - International Journal of Robotics and Automation
JF - International Journal of Robotics and Automation
IS - 4
ER -