TY - GEN
T1 - Deep Reinforcement Learning-based End-to-End Navigation of Mobile Robots With Reward Shaping
AU - Li, Yufeng
AU - Gao, Jian
AU - Chen, Yimin
AU - He, Yaozhen
AU - Min, Boxu
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This paper proposes an end-to-end autonomous navigation algorithm for unknown environments based on deep reinforcement learning (DRL), which maps the lidar data collected by the robot into control commands. The proposed LM-TD3 algorithm uses the Twin Delayed Deep Deterministic (TD3) policy gradient network as the backbone to generate robot action control in continuous spaces. On this basis, a Long Short-Term Memory (LSTM) neural network is introduced into the actor and critic networks, allowing the model to store long-term navigation experiences and improving its ability to perceive and handle surrounding obstacles. Furthermore, a novel DRL reward function is designed to smooth the motion pose of the robot while controlling it to achieve target tracking. Finally, to improve the early learning efficiency of the DRL network, a Hindsight Experience Replay (HER) strategy is designed specifically for the autonomous navigation system, accelerating the convergence of the algorithm. To validate the effectiveness of the LM-TD3 algorithm in simulation experiments, scenarios of varying complexity are designed to verify its navigation ability. Compared with the TD3 algorithm, the proposed LM-TD3 method generates shorter paths with enhanced obstacle avoidance capabilities while maintaining more stable robot posture control.
AB - This paper proposes an end-to-end autonomous navigation algorithm for unknown environments based on deep reinforcement learning (DRL), which maps the lidar data collected by the robot into control commands. The proposed LM-TD3 algorithm uses the Twin Delayed Deep Deterministic (TD3) policy gradient network as the backbone to generate robot action control in continuous spaces. On this basis, a Long Short-Term Memory (LSTM) neural network is introduced into the actor and critic networks, allowing the model to store long-term navigation experiences and improving its ability to perceive and handle surrounding obstacles. Furthermore, a novel DRL reward function is designed to smooth the motion pose of the robot while controlling it to achieve target tracking. Finally, to improve the early learning efficiency of the DRL network, a Hindsight Experience Replay (HER) strategy is designed specifically for the autonomous navigation system, accelerating the convergence of the algorithm. To validate the effectiveness of the LM-TD3 algorithm in simulation experiments, scenarios of varying complexity are designed to verify its navigation ability. Compared with the TD3 algorithm, the proposed LM-TD3 method generates shorter paths with enhanced obstacle avoidance capabilities while maintaining more stable robot posture control.
KW - Autonomous navigation
KW - Deep reinforcement learning
KW - Mobile robot
KW - Robot pose
UR - http://www.scopus.com/inward/record.url?scp=85215534693&partnerID=8YFLogxK
U2 - 10.1109/INDIN58382.2024.10774473
DO - 10.1109/INDIN58382.2024.10774473
M3 - Conference contribution
AN - SCOPUS:85215534693
T3 - IEEE International Conference on Industrial Informatics (INDIN)
BT - Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE International Conference on Industrial Informatics, INDIN 2024
Y2 - 18 August 2024 through 20 August 2024
ER -