TY - JOUR
T1 - RSMDP-based robust Q-learning for optimal path planning in a dynamic environment
AU - Zhang, Yunfei
AU - Li, Weilin
AU - De Silva, Clarence W.
PY - 2016
Y1 - 2016
N2 - This paper presents a robust Q-learning method for path planning in a dynamic environment. The method consists of three steps: first, a regime-switching Markov decision process (RSMDP) is formed to represent the dynamic environment; second, a probabilistic roadmap (PRM) is constructed, integrated with the RSMDP, and stored as a graph whose nodes correspond to collision-free world states for the robot; finally, an online Q-learning method with a dynamic step size, which facilitates robust convergence of the Q-value iteration, is integrated with the PRM to determine an optimal path to the goal. In this manner, the robot is able to use past experience to improve its performance in avoiding not only static obstacles but also moving obstacles, without knowing the nature of the obstacle motion. The use of regime switching in the avoidance of obstacles with unknown motion is particularly innovative. The developed approach is applied to a homecare robot in computer simulation. The results show that the online path planner with Q-learning is able to rapidly and successfully converge to the correct path.
KW - Markov decision process
KW - Online Q-learning
KW - Optimal path planning
KW - Probabilistic roadmap
KW - Unknown dynamic obstacles
UR - http://www.scopus.com/inward/record.url?scp=84983332083&partnerID=8YFLogxK
U2 - 10.2316/Journal.206.2016.4.206-4255
DO - 10.2316/Journal.206.2016.4.206-4255
M3 - Article
AN - SCOPUS:84983332083
SN - 0826-8185
VL - 31
SP - 290
EP - 300
JO - International Journal of Robotics and Automation
JF - International Journal of Robotics and Automation
IS - 4
ER -