TY - JOUR
T1 - Diffusion Models
T2 - Unlocking the “4 Secrets” of High-Quality Image Generation
AU - Zhou, Tao
AU - Zhang, Zhe
AU - Zhang, Mingzhe
AU - Chai, Wenwen
AU - Xia, Yong
AU - Hu, Fuyuan
N1 - Publisher Copyright: © 2026 by the authors.
PY - 2026/4
Y1 - 2026/4
AB - The diffusion model (DM) is a hot topic in deep generative models and is widely applied in image generation. In diffusion models, there are four main “secrets” that affect high-quality image generation: constructing the diffusion model, improving the sampling velocity, designing the diffusion process, and guiding diffusion models. How should one construct the diffusion model? How can one improve the sampling velocity? How should one design the diffusion process? How should one guide diffusion models? These questions are critical to enhancing diffusion model performance. However, most existing review papers focus on applications, while discussion of the four key technical aspects remains limited. In response, this paper summarizes four key technologies and six representative application directions. First, the basic principles of diffusion models are reviewed from three perspectives: denoising diffusion probabilistic models, noise conditional score network models, and stochastic differential equation models. Second, key techniques for improving sampling velocity are summarized from three perspectives: non-Markovian sampling, knowledge distillation sampling, and discrete optimization sampling. Third, the diffusion process design is summarized from three perspectives: latent space, Transformer-based diffusion, and non-Euclidean space. Fourth, guidance strategies are summarized from three perspectives: classifier guidance, classifier-free guidance, and multimodal guidance. Fifth, the advantages and applications of diffusion models are discussed in high-quality text-to-image generation, high-quality text-to-video generation, and high-quality image-to-image generation. Finally, this paper discusses the challenges faced by diffusion models in image generation. Overall, this review systematically discusses the four “secrets” of diffusion models for image generation and provides a useful reference for future research in this field.
KW - denoising diffusion model
KW - diffusion model
KW - image generation
KW - noisy conditional scoring network
KW - score-based models
UR - https://www.scopus.com/pages/publications/105037214545
U2 - 10.3390/electronics15081755
DO - 10.3390/electronics15081755
M3 - Review article
AN - SCOPUS:105037214545
SN - 2079-9292
VL - 15
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 8
M1 - 1755
ER -