Abstract
With the growing prevalence of screen content images in multimedia communication, efficient compression has become increasingly crucial. Unlike natural scene images, screen content typically contains rich text regions that exhibit unique characteristics and low correlation with surrounding non-text elements. The intricate mixture of text and non-text within images poses significant challenges for existing learned compression networks, as the text and non-text features are severely entangled in the latent domain along the channel dimension, leading to compromised reconstruction quality and suboptimal entropy estimation. In this paper, we propose a novel Disentangled Image Compression Architecture (DICA) that enhances the analysis module and the entropy model of existing compression architectures to address these limitations. First, we introduce a Disentangled Analysis Module (DAM) by augmenting original analysis modules with an additional text approximation branch and a disentangling network. The two components work in concert to disentangle latent features into text and non-text classes along the channel dimension, resulting in a more structured feature distribution that better aligns with compression requirements. Second, we propose a Disentangled Channel-Conditional Entropy Model (DCEM) that efficiently leverages the feature distribution bias introduced by DAM, thereby further improving compression performance. Experimental results demonstrate that the proposed DICA, along with DAM and DCEM, can be integrated into various channel-conditional compression backbones, significantly improving their performance in screen content compression, particularly in hard-to-compress text regions. When integrated with an advanced WACNN backbone, our method achieves a 13% overall BD-Rate gain and a 16% BD-Rate gain in text regions on the SIQAD dataset.
| Original language | English |
|---|---|
| Pages (from-to) | 2505-2519 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 36 |
| Issue number | 2 |
| DOIs | |
| State | Published - 2026 |
Keywords
- Image compression
- latent feature disentanglement
- screen content image
Fingerprint
Dive into the research topics of 'Text and Non-Text Latent Feature Disentanglement for Screen Content Image Compression'. Together they form a unique fingerprint.