TY - CONF
T1 - MJPR
T2 - 2025 IEEE International Conference on Robotics and Automation, ICRA 2025
AU - Wang, Zehan
AU - He, Ziming
AU - Wang, Zijia
AU - He, Hua
AU - Yang, Beiya
AU - Shi, Haobin
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Multi-modal reinforcement learning (RL) has been brought into focus due to its ability to provide complementary information from different sensors, enriching observations of agents. However, the introduction of multi-modal high-dimensional observations brings challenges to sample efficiency. There is a lack of research on how to efficiently obtain multi-modal latent states while encouraging them to generate complementary information. To address this, we propose a representation learning method, Multi-modal Joint Predictive Representation (MJPR), which utilizes multi-modal interactive information to predict future latent states. The joint prediction method trains the representation of each modality and encourages each modality to generate complementary information related to the predictions of the others. In addition, we introduce multi-modal loss balancing to promote training equilibrium and cross-modal contrastive learning (CMCL) to align the modalities for effective modal interaction. We establish multi-modal environments in the DeepMind Control Suite (DMC) and Webots and compare our method with current RL representation methods. Experimental results show that MJPR outperforms state-of-the-art methods by an average of 12.0% on six subtasks in DMC environments. It outperforms advanced methods by 16.7% and 55.4% on simple and complex tasks in the Webots environment, respectively. Moreover, ablation experiments are conducted in the DMC environment to verify the importance of each module to MJPR.
UR - https://www.scopus.com/pages/publications/105016640577
U2 - 10.1109/ICRA55743.2025.11128137
DO - 10.1109/ICRA55743.2025.11128137
M3 - Conference contribution
AN - SCOPUS:105016640577
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 4775
EP - 4781
BT - 2025 IEEE International Conference on Robotics and Automation, ICRA 2025
A2 - Ott, Christian
A2 - Admoni, Henny
A2 - Behnke, Sven
A2 - Bogdan, Stjepan
A2 - Bolopion, Aude
A2 - Choi, Youngjin
A2 - Ficuciello, Fanny
A2 - Gans, Nicholas
A2 - Gosselin, Clement
A2 - Harada, Kensuke
A2 - Kayacan, Erdal
A2 - Kim, H. Jin
A2 - Leutenegger, Stefan
A2 - Liu, Zhe
A2 - Maiolino, Perla
A2 - Marques, Lino
A2 - Matsubara, Takamitsu
A2 - Mavromatti, Anastasia
A2 - Minor, Mark
A2 - O'Kane, Jason
A2 - Park, Hae Won
A2 - Park, Hae-Won
A2 - Rekleitis, Ioannis
A2 - Renda, Federico
A2 - Ricci, Elisa
A2 - Riek, Laurel D.
A2 - Sabattini, Lorenzo
A2 - Shen, Shaojie
A2 - Sun, Yu
A2 - Wieber, Pierre-Brice
A2 - Yamane, Katsu
A2 - Yu, Jingjin
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 May 2025 through 23 May 2025
ER -