TY - JOUR
T1 - PDRL: Towards Deeper States and Further Behaviors in Unsupervised Skill Discovery by Progressive Diversity
T2 - IEEE Transactions on Cognitive and Developmental Systems
AU - He, Ziming
AU - Song, Chao
AU - Li, Jingchen
AU - Shi, Haobin
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - We present progressive diversity reinforcement learning (PDRL), an unsupervised reinforcement learning (URL) method for discovering diverse skills. PDRL encourages learning behaviors that span multiple steps, in particular by introducing “deeper states,” states that require a longer sequence of actions to reach without repetition. To address weak skill diversity and weak exploration in partially observable environments, PDRL employs two indicators for skill learning that foster exploration and skill diversity, emphasizing the accuracy of each observation and subtrajectory relative to its predecessor. Skill latent variables are represented by mappings from states or trajectories, helping to distinguish and recover learned skills. This dual representation promotes exploration and skill diversity without additional modeling or prior knowledge. PDRL also derives intrinsic rewards from a combination of observations and subtrajectories, effectively preventing skill duplication. Experiments across multiple benchmarks show that PDRL discovers a broader range of skills than existing methods. Additionally, pretraining with PDRL accelerates fine-tuning on goal-conditioned reinforcement learning (GCRL) tasks, as demonstrated on Fetch robotic manipulation tasks.
KW - Goal-conditioned reinforcement learning (GCRL)
KW - reinforcement learning (RL)
KW - unsupervised skill discovery
UR - http://www.scopus.com/inward/record.url?scp=85205764425&partnerID=8YFLogxK
U2 - 10.1109/TCDS.2024.3471645
DO - 10.1109/TCDS.2024.3471645
M3 - Article
AN - SCOPUS:85205764425
SN - 2379-8920
VL - 17
SP - 495
EP - 509
JO - IEEE Transactions on Cognitive and Developmental Systems
JF - IEEE Transactions on Cognitive and Developmental Systems
IS - 3
ER -