TY - JOUR
T1 - Inter-view dual-domain guided stable diffusion for real-world stereo image super-resolution
AU - Zhang, Jingcheng
AU - Zhu, Yu
AU - Niu, Axi
AU - Sun, Jinqiu
AU - Zhang, Yanning
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2026/1/15
Y1 - 2026/1/15
N2 - Recently, pre-trained text-to-image diffusion models have shown promising potential in restoring texture details lost in low-resolution images due to their powerful generative ability. However, directly applying these models to real-world stereo image super-resolution (Real-SSR) neglects the essential consistency between left and right views, resulting in inconsistent stereo content and amplified visual artifacts. To address this problem, we propose a Complementary Semantic-aware Inter-view Dual-domain Guided Stable Diffusion (CSID-Diff) network for Real-SSR, which leverages complementary texture, structural, and semantic information from low-resolution stereo images to guide the diffusion model to generate high-quality results with inter-view consistency. Specifically, we propose a Dual-domain Guided ControlNet that establishes complementary interactions in both the image feature domain and the disparity domain, and fuses dual-domain features to enforce inter-view texture and structural consistency. To further address viewpoint-induced discrepancies in semantic information between left and right images, we introduce a Complementary Semantic Feature Extraction Module (CSFEM) to enforce inter-view semantic consistency. Extensive experiments demonstrate that our approach delivers superior stereo image reconstruction, achieving both high quality and inter-view consistency, outperforming state-of-the-art methods on both synthetic and real-world datasets.
KW - ControlNet
KW - Stable diffusion
KW - Stereo image super-resolution
UR - https://www.scopus.com/pages/publications/105012260138
U2 - 10.1016/j.eswa.2025.129201
DO - 10.1016/j.eswa.2025.129201
M3 - Article
AN - SCOPUS:105012260138
SN - 0957-4174
VL - 296
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 129201
ER -