TY - JOUR
T1 - Robotic Locomotion Skill Learning Using Unsupervised Reinforcement Learning With Controllable Latent Space Partition
AU - He, Ziming
AU - Chen, Pengyu
AU - Shi, Haobin
AU - Li, Jingchen
AU - Hwang, Kao-Shing
N1 - Publisher Copyright:
© 2005-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Effective skill learning in an unsupervised manner is one of the capabilities an intelligent agent or robot should have. The discovered task-agnostic skills can be fine-tuned for downstream long-horizon tasks to improve execution efficiency. Unfortunately, the self-learning of locomotion skills, which occurs naturally in infancy, has been slow to develop in robotics. The instability of existing skill-learning methods makes them difficult to apply directly to complex control tasks, such as humanoid robots. To acquire reliable robotic locomotion skills, this article proposes a controllable latent space partition framework to assist reinforcement learning in accomplishing practicability-oriented unsupervised skill discovery (PoSD). Specifically, we use a distance similarity measure in the trajectory feature space to introduce indicative information from expert demonstrations into the partitioning and mapping of the latent space. In addition, intrinsic subrewards based on contrastive learning and particle entropy are designed to promote skill diversity and encourage exploration. Finally, reinforcement learning generates a skill-conditioned policy driven by composite intrinsic rewards. The performance of our method is evaluated on five robots with more than 15 skills. The results indicate that PoSD achieves noticeable improvements in adaptation efficiency and practicability compared with other state-of-the-art (SOTA) unsupervised skill discovery methods.
KW - Deep reinforcement learning (DRL)
KW - robotic control
KW - skill discovery
KW - unsupervised reinforcement learning (URL)
UR - http://www.scopus.com/inward/record.url?scp=85207462074&partnerID=8YFLogxK
U2 - 10.1109/TII.2024.3468453
DO - 10.1109/TII.2024.3468453
M3 - Article
AN - SCOPUS:85207462074
SN - 1551-3203
VL - 21
SP - 902
EP - 911
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
IS - 1
ER -