23 May 2023
Semantic layout aware generative adversarial network for text-to-image generation
Jieyu Huang, YongHua Zhu, Zhuo Bi, Wenjun Zhang
Proceedings Volume 12604, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2022); 126041W (2023) https://doi.org/10.1117/12.2674685
Event: 2nd International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2022), 2022, Guangzhou, China
Abstract
Text-to-image (T2I) generation methods aim to synthesize a high-quality image that is semantically consistent with a given text description. Previous T2I generative adversarial networks (GANs) generally first create a low-resolution image with rough shapes and colors, and then refine this initial image into a high-resolution one. Most such stacked architectures still suffer from two main problems. (1) The final images depend heavily on the quality of the initial image. If the initial image is poorly generated, the resulting image looks like a simple combination of visual features from several image scales. (2) The text-image cross-modal fusion methods widely adopted in previous works are limited in how thoroughly they fuse the two modalities. In this paper, we propose a novel generation model that introduces a one-stage backbone, which directly generates high-quality images without multiple generators, and a novel semantic layout deep fusion network that thoroughly fuses text features and image features. Experiments on the challenging CUB and COCO-Stuff datasets demonstrate the ability of our model to generate images with both semantic consistency with the input text description and high visual fidelity.
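The abstract does not say how the semantic layout deep fusion network combines text and image features. As a rough illustration of what text-image feature fusion can mean inside a one-stage generator, the sketch below shows one common generic technique, a FiLM-style conditional affine transformation: a sentence embedding predicts a per-channel scale (gamma) and shift (beta) that modulate an image feature map. This is an assumption for illustration only, not the authors' method; the function names, shapes, and toy weights are hypothetical, and plain Python lists stand in for a deep learning framework so the example is self-contained.

```python
from typing import List

Matrix = List[List[float]]


def linear(weights: Matrix, bias: List[float], x: List[float]) -> List[float]:
    """Dense layer: returns weights @ x + bias (one output per weight row)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def affine_fuse(feature: List[Matrix], sentence: List[float],
                w_gamma: Matrix, b_gamma: List[float],
                w_beta: Matrix, b_beta: List[float]) -> List[Matrix]:
    """FiLM-style fusion (hypothetical sketch, not the paper's network).

    feature:  image feature map with shape C x H x W
    sentence: text embedding of length D
    w_gamma, w_beta: C x D weights that map the text to per-channel parameters
    """
    gamma = linear(w_gamma, b_gamma, sentence)  # one scale per channel
    beta = linear(w_beta, b_beta, sentence)     # one shift per channel
    # Every spatial position of channel c is transformed as gamma[c] * v + beta[c].
    return [
        [[g * v + b for v in row] for row in channel]
        for channel, g, b in zip(feature, gamma, beta)
    ]


if __name__ == "__main__":
    # Hypothetical toy sizes: 2 channels, 2x2 spatial map, 3-d sentence embedding.
    feature = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
    sentence = [1.0, 0.0, 2.0]
    w_gamma = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    b_gamma = [0.0, 0.0]
    w_beta = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    b_beta = [0.5, 0.0]
    # gamma = [1.0, 2.0], beta = [0.5, 3.0]
    print(affine_fuse(feature, sentence, w_gamma, b_gamma, w_beta, b_beta))
    # → [[[1.5, 2.5], [3.5, 4.5]], [[4.0, 4.0], [4.0, 4.0]]]
```

In a trained model the gamma and beta weights would be learned. A block like this could be applied at every resolution of a one-stage generator, so that the text conditions the whole synthesis path rather than only an initial low-resolution stage.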
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jieyu Huang, YongHua Zhu, Zhuo Bi, and Wenjun Zhang "Semantic layout aware generative adversarial network for text-to-image generation", Proc. SPIE 12604, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2022), 126041W (23 May 2023); https://doi.org/10.1117/12.2674685
KEYWORDS
Image fusion

Semantics

Education and training

Computer vision technology

Image processing

Image quality

Data modeling