TY - JOUR
T1 - DR-IAL
T2 - Decoupling-to-recoupling guided interaction-aware learning for egocentric action recognition
AU - Shao, Jiang
AU - Zou, Xiaochun
AU - Zhao, Xinbo
AU - Zhang, Yanning
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2026/4
Y1 - 2026/4
N2 - In the domain of egocentric action recognition, current auxiliary-supervised static interaction-aware learning methodologies demonstrate considerable shortcomings in addressing inter-individual action variability, temporal dynamics, and the inherent long-tailed distribution characteristics of egocentric datasets, largely attributable to rigid feature aggregation mechanisms. This rigidity leads to challenges in generalization, primarily due to an insufficient range of visual experiences. To address these limitations, we propose the Decoupling-to-Recoupling Guided Interaction-Aware Learning framework with Motion-Prompted Adaptive Fusion (DR-IAL). This novel framework mimics the dynamic plasticity inherent in human visual systems through a cognitive learning paradigm characterized by “Perception – Decoupling – Recoupling”. It utilizes a dual-pathway motion perception approach to effectively capture both temporal and spatial motion cues, thereby enabling the adaptive fusion of multi-level visual tempos. Furthermore, we integrate learnable Gaussian prior knowledge and differentiable thresholded binarization techniques to bolster feature robustness in critical interaction zones while minimizing background noise. Notably, we present a spatiotemporal decoupling-to-recoupling algorithm that effectively separates orthogonal components using attention masks. This algorithm calculates cross-instance similarity matrices to identify challenging “interactive foreground – contextual background” pairs. Additionally, it implements stochastic channel-mixing recoupling in conjunction with spatiotemporal alignment, all while maintaining interpretable attention distributions through the application of semantic-level label constraints. Empirical results demonstrate that our approach achieves state-of-the-art performance on established benchmarks, including EGTEA and EPIC-KITCHENS-100.
AB - In the domain of egocentric action recognition, current auxiliary-supervised static interaction-aware learning methodologies demonstrate considerable shortcomings in addressing inter-individual action variability, temporal dynamics, and the inherent long-tailed distribution characteristics of egocentric datasets, largely attributable to rigid feature aggregation mechanisms. This rigidity leads to challenges in generalization, primarily due to an insufficient range of visual experiences. To address these limitations, we propose the Decoupling-to-Recoupling Guided Interaction-Aware Learning framework with Motion-Prompted Adaptive Fusion (DR-IAL). This novel framework mimics the dynamic plasticity inherent in human visual systems through a cognitive learning paradigm characterized by “Perception – Decoupling – Recoupling”. It utilizes a dual-pathway motion perception approach to effectively capture both temporal and spatial motion cues, thereby enabling the adaptive fusion of multi-level visual tempos. Furthermore, we integrate learnable Gaussian prior knowledge and differentiable thresholded binarization techniques to bolster feature robustness in critical interaction zones while minimizing background noise. Notably, we present a spatiotemporal decoupling-to-recoupling algorithm that effectively separates orthogonal components using attention masks. This algorithm calculates cross-instance similarity matrices to identify challenging “interactive foreground – contextual background” pairs. Additionally, it implements stochastic channel-mixing recoupling in conjunction with spatiotemporal alignment, all while maintaining interpretable attention distributions through the application of semantic-level label constraints. Empirical results demonstrate that our approach achieves state-of-the-art performance on established benchmarks, including EGTEA and EPIC-KITCHENS-100.
KW - Decoupling-to-recoupling
KW - Egocentric action recognition
KW - Interaction-aware learning
KW - Motion-prompted adaptive fusion
UR - https://www.scopus.com/pages/publications/105022211523
U2 - 10.1016/j.patcog.2025.112731
DO - 10.1016/j.patcog.2025.112731
M3 - Article
AN - SCOPUS:105022211523
SN - 0031-3203
VL - 172
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 112731
ER -