Deep Reinforcement Learning-based End-to-End Navigation of Mobile Robots With Reward Shaping

Yufeng Li; Jian Gao; Yimin Chen; Yaozhen He; Boxu Min

doi:10.1109/INDIN58382.2024.10774473

School of Marine Science and Technology

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes an end-to-end autonomous navigation algorithm for unknown environments based on deep reinforcement learning (DRL), which maps the lidar data collected by the robot into control commands. The proposed LM-TD3 algorithm uses the Twin Delayed Deep Deterministic Policy Gradient (TD3) network as the backbone to generate robot action control in continuous spaces. Based on this, a Long Short-Term Memory (LSTM) neural network is introduced into the actor and critic networks, allowing the model to store long-term navigation experiences and increasing its ability to perceive and handle surrounding obstacles. Furthermore, a novel DRL reward function is designed to smooth the motion pose of the robot while controlling the robot to achieve target tracking. Finally, a Hindsight Experience Replay (HER) strategy designed specifically for the autonomous navigation system improves the early learning efficiency of the DRL network and speeds up convergence. The LM-TD3 algorithm is validated in simulation experiments on scenarios of varying complexity. Compared with the TD3 algorithm, the proposed LM-TD3 method can generate shorter paths with enhanced obstacle avoidance capabilities, while also maintaining more stable robot posture control.
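The abstract does not spell out how its HER strategy is tailored to navigation, so the following is only a minimal sketch of the generic "final" relabeling idea from Hindsight Experience Replay: after an episode that missed its goal, the transitions are stored a second time with the position actually reached treated as the goal. The `Transition` fields, the sparse `goal_reward` function, and the 0.3 tolerance are all hypothetical choices made for illustration, not details from the paper.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Transition:
    """One step of a navigation episode (hypothetical layout)."""
    position: Point       # robot position before the action
    next_position: Point  # robot position after the action
    goal: Point           # goal the policy was conditioned on
    reward: float


def goal_reward(next_position: Point, goal: Point, tolerance: float = 0.3) -> float:
    """Assumed sparse reward: 1.0 within `tolerance` of the goal, else 0.0."""
    return 1.0 if math.dist(next_position, goal) <= tolerance else 0.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Generic HER 'final' strategy: treat the last reached position as the goal."""
    if not episode:
        return []
    achieved = episode[-1].next_position
    return [
        replace(t, goal=achieved, reward=goal_reward(t.next_position, achieved))
        for t in episode
    ]
```

In a replay-buffer setting, both the original and the relabeled transitions would typically be stored, so that even failed episodes contribute some rewarded samples early in training.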

Original language	English
Title of host publication	Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9798331527471
DOI	https://doi.org/10.1109/INDIN58382.2024.10774473
Publication status	Published - 2024
Event	22nd IEEE International Conference on Industrial Informatics, INDIN 2024 - Beijing, China. Duration: 18 Aug 2024 → 20 Aug 2024

Publication series

Name	IEEE International Conference on Industrial Informatics (INDIN)
ISSN (Print)	1935-4576

Conference

Conference	22nd IEEE International Conference on Industrial Informatics, INDIN 2024
Country/Territory	China
City	Beijing
Period	18/08/24 → 20/08/24

Cite this

Li, Y., Gao, J., Chen, Y., He, Y., & Min, B. (2024). Deep Reinforcement Learning-based End-to-End Navigation of Mobile Robots With Reward Shaping. In Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024 (IEEE International Conference on Industrial Informatics (INDIN)). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INDIN58382.2024.10774473

@inproceedings{19e6c8b1848f4f4795d845fed3de0806,

title = "Deep Reinforcement Learning-based End-to-End Navigation of Mobile Robots With Reward Shaping",

abstract = "This paper proposes an end-to-end autonomous navigation algorithm for unknown environments based on deep reinforcement learning (DRL), which maps the lidar data collected by the robot into control commands. The proposed LM-TD3 algorithm utilizes the Twin Delayed Deep Deterministic Policy Gradient (TD3) network as the backbone to generate robot action control in continuous spaces. Based on this, the Long Short-Term Memory (LSTM) neural network is introduced into the actor and critic networks, allowing the model to store long-term navigation experiences to increase its ability to perceive and handle surrounding obstacles. Furthermore, a novel reward function in DRL is designed to smooth the motion pose of the robot while controlling the robot to achieve target tracking. Finally, to enhance the early learning efficiency of the DRL network, a Hindsight Experience Replay (HER) strategy is designed specifically for the autonomous navigation system to enhance the convergence speed of the algorithm. To validate the effectiveness of the LM-TD3 algorithm with simulation experiments, scenarios of varying complexities are designed to verify the navigation ability. Compared with the TD3 algorithm, the proposed LM-TD3 method can generate shorter paths with enhanced obstacle avoidance capabilities, while also maintaining more stable robot posture control.",

keywords = "Autonomous navigation, Deep reinforcement learning, Mobile robot, Robot pose",

author = "Yufeng Li and Jian Gao and Yimin Chen and Yaozhen He and Boxu Min",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 22nd IEEE International Conference on Industrial Informatics, INDIN 2024 ; Conference date: 18-08-2024 Through 20-08-2024",

year = "2024",

doi = "10.1109/INDIN58382.2024.10774473",

language = "English",

series = "IEEE International Conference on Industrial Informatics (INDIN)",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024",

}

Li, Y, Gao, J, Chen, Y, He, Y & Min, B 2024, Deep Reinforcement Learning-based End-to-End Navigation of Mobile Robots With Reward Shaping. in Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024. IEEE International Conference on Industrial Informatics (INDIN), Institute of Electrical and Electronics Engineers Inc., 22nd IEEE International Conference on Industrial Informatics, INDIN 2024, Beijing, China, 18/08/24. https://doi.org/10.1109/INDIN58382.2024.10774473

Deep Reinforcement Learning-based End-to-End Navigation of Mobile Robots With Reward Shaping. / Li, Yufeng; Gao, Jian; Chen, Yimin et al.
Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024. Institute of Electrical and Electronics Engineers Inc., 2024. (IEEE International Conference on Industrial Informatics (INDIN)).

TY - GEN

T1 - Deep Reinforcement Learning-based End-to-End Navigation of Mobile Robots With Reward Shaping

AU - Li, Yufeng

AU - Gao, Jian

AU - Chen, Yimin

AU - He, Yaozhen

AU - Min, Boxu

PY - 2024

Y1 - 2024

N2 - This paper proposes an end-to-end autonomous navigation algorithm for unknown environments based on deep reinforcement learning (DRL), which maps the lidar data collected by the robot into control commands. The proposed LM-TD3 algorithm utilizes the Twin Delayed Deep Deterministic Policy Gradient (TD3) network as the backbone to generate robot action control in continuous spaces. Based on this, the Long Short-Term Memory (LSTM) neural network is introduced into the actor and critic networks, allowing the model to store long-term navigation experiences to increase its ability to perceive and handle surrounding obstacles. Furthermore, a novel reward function in DRL is designed to smooth the motion pose of the robot while controlling the robot to achieve target tracking. Finally, to enhance the early learning efficiency of the DRL network, a Hindsight Experience Replay (HER) strategy is designed specifically for the autonomous navigation system to enhance the convergence speed of the algorithm. To validate the effectiveness of the LM-TD3 algorithm with simulation experiments, scenarios of varying complexities are designed to verify the navigation ability. Compared with the TD3 algorithm, the proposed LM-TD3 method can generate shorter paths with enhanced obstacle avoidance capabilities, while also maintaining more stable robot posture control.

AB - This paper proposes an end-to-end autonomous navigation algorithm for unknown environments based on deep reinforcement learning (DRL), which maps the lidar data collected by the robot into control commands. The proposed LM-TD3 algorithm utilizes the Twin Delayed Deep Deterministic Policy Gradient (TD3) network as the backbone to generate robot action control in continuous spaces. Based on this, the Long Short-Term Memory (LSTM) neural network is introduced into the actor and critic networks, allowing the model to store long-term navigation experiences to increase its ability to perceive and handle surrounding obstacles. Furthermore, a novel reward function in DRL is designed to smooth the motion pose of the robot while controlling the robot to achieve target tracking. Finally, to enhance the early learning efficiency of the DRL network, a Hindsight Experience Replay (HER) strategy is designed specifically for the autonomous navigation system to enhance the convergence speed of the algorithm. To validate the effectiveness of the LM-TD3 algorithm with simulation experiments, scenarios of varying complexities are designed to verify the navigation ability. Compared with the TD3 algorithm, the proposed LM-TD3 method can generate shorter paths with enhanced obstacle avoidance capabilities, while also maintaining more stable robot posture control.

KW - Autonomous navigation

KW - Deep reinforcement learning

KW - Mobile robot

KW - Robot pose

UR - http://www.scopus.com/inward/record.url?scp=85215534693&partnerID=8YFLogxK

U2 - 10.1109/INDIN58382.2024.10774473

DO - 10.1109/INDIN58382.2024.10774473

M3 - Conference contribution

AN - SCOPUS:85215534693

T3 - IEEE International Conference on Industrial Informatics (INDIN)

BT - Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 22nd IEEE International Conference on Industrial Informatics, INDIN 2024

Y2 - 18 August 2024 through 20 August 2024

ER -

Li Y, Gao J, Chen Y, He Y, Min B. Deep Reinforcement Learning-based End-to-End Navigation of Mobile Robots With Reward Shaping. In Proceedings - 2024 IEEE 22nd International Conference on Industrial Informatics, INDIN 2024. Institute of Electrical and Electronics Engineers Inc. 2024. (IEEE International Conference on Industrial Informatics (INDIN)). doi: 10.1109/INDIN58382.2024.10774473
