T-Person-GAN: Text-to-Person image generation with identity-consistency and manifold mix-up

Deyin Liu, Lin Yuanbo Wu, Bo Li, Ye Zhao, Zongyuan Ge, Jian Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we introduce an end-to-end solution for generating high-resolution person images based solely on textual descriptions. While text-to-image models have made great strides in generating images of objects like flowers and birds, creating person images presents a unique set of challenges: 1) Identity Consistency: For the same person, it's crucial that the generated images exhibit visual details that maintain identity consistency. This means that features like identity-related textures, clothing, and even footwear should be consistent across different images of the same person. 2) Discriminative Power: The generated person images need to be robust in the face of inter-person variations caused by visual ambiguities. To tackle these challenges, we propose a generative model that leverages two novel mechanisms: 1) T-Person-GAN-ID: This mechanism integrates a one-stream generator with an identity-preserving network. It regularizes the representations of generated data in their feature space to ensure identity-consistency. This ensures that images of the same person maintain their unique identity-related features. 2) T-Person-GAN-ID-MM: Manifold mix-up is introduced to create mixed images, which involves linear interpolation between generated images from different manifold identities. We further enforce these interpolated images to be linearly classified in the feature space, essentially learning a linear classification boundary that can perfectly separate images from two distinct identities. The proposed method demonstrates a significant improvement in the challenging task of generating person images from text descriptions. We achieve impressive results with a Fréchet Inception Distance of 47.81, an Inception Score of 3.96, and a Visual-Semantic Similarity of 0.21 on the benchmark dataset.
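The mix-up idea described in the abstract can be illustrated with a small sketch. This is generic mix-up on plain vectors with a linear softmax classifier, not the authors' implementation: the function names, the use of flat Python lists in place of images or learned features, and the soft-label cross-entropy loss are all assumptions made for illustration.

```python
import math

def mixup(feat_a, feat_b, lam):
    """Linearly interpolate two equal-length vectors with weight lam in [0, 1]."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]

def soft_label(id_a, id_b, lam, num_ids):
    """Mix the one-hot identity labels with the same weight used for the vectors."""
    y = [0.0] * num_ids
    y[id_a] += lam
    y[id_b] += 1.0 - lam
    return y

def linear_logits(weights, bias, feat):
    """Scores of a linear classifier: one row of weights per identity (hypothetical)."""
    return [sum(w * x for w, x in zip(row, feat)) + b
            for row, b in zip(weights, bias)]

def soft_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for t, v in zip(target, logits))

# Two stand-in "generated" samples from identities 0 and 2 (toy values).
sample_a = [1.0, 0.0]
sample_b = [0.0, 1.0]
lam = 0.25

mixed = mixup(sample_a, sample_b, lam)
target = soft_label(0, 2, lam, num_ids=3)

# An untrained linear classifier with all-zero parameters (assumption).
weights = [[0.0, 0.0] for _ in range(3)]
bias = [0.0, 0.0, 0.0]
loss = soft_cross_entropy(linear_logits(weights, bias, mixed), target)
```

Minimizing such a loss over many mixed pairs pushes the classifier to respond linearly along the path between two identities, which is the general intuition behind asking interpolated samples to be linearly classified.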

Original language: English
Article number: 128178
Journal: Expert Systems with Applications
Volume: 288
State: Published - 1 Sep 2025

Keywords

  • Conditional generative adversarial networks
  • Manifold mix-up
  • Text-to-Person image generation
