TY - JOUR
T1 - PDRL: Towards Deeper States and Further Behaviors in Unsupervised Skill Discovery by Progressive Diversity
AU - He, Ziming
AU - Song, Chao
AU - Li, Jingchen
AU - Shi, Haobin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - We present Progressive Diversity Reinforcement Learning (PDRL), an unsupervised reinforcement learning (URL) method for discovering diverse skills. PDRL encourages learning behaviors that span multiple steps, particularly by introducing 'deeper states': states that require a longer sequence of actions to reach without repetition. To address the challenges of weak skill diversity and weak exploration in partially observable environments, PDRL employs two indicators for skill learning that foster exploration and skill diversity, emphasizing the accuracy of each observation and sub-trajectory relative to its predecessor. Skill latent variables are represented by mappings from states or trajectories, helping to distinguish and recover learned skills. This dual representation promotes exploration and skill diversity without additional modeling or prior knowledge. PDRL also integrates intrinsic rewards computed from a combination of observations and sub-trajectories, effectively preventing skill duplication. Experiments across multiple benchmarks show that PDRL discovers a broader range of skills than existing methods. Additionally, pre-training with PDRL accelerates fine-tuning on goal-conditioned reinforcement learning (GCRL) tasks, as demonstrated on Fetch robotic manipulation tasks.
KW - Reinforcement learning (RL)
KW - Goal-conditioned reinforcement learning (GCRL)
KW - unsupervised skill discovery
UR - http://www.scopus.com/inward/record.url?scp=85205764425&partnerID=8YFLogxK
U2 - 10.1109/TCDS.2024.3471645
DO - 10.1109/TCDS.2024.3471645
M3 - Article
AN - SCOPUS:85205764425
SN - 2379-8920
JO - IEEE Transactions on Cognitive and Developmental Systems
JF - IEEE Transactions on Cognitive and Developmental Systems
ER -