Semantic layout aware generative adversarial network for text-to-image generation

Jieyu Huang; YongHua Zhu; Zhuo Bi; Wenjun Zhang

doi:10.1117/12.2674685

Published 23 May 2023

Proceedings Volume 12604, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2022); 126041W (2023) https://doi.org/10.1117/12.2674685
Event: 2nd International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2022), 2022, Guangzhou, China

Abstract

Text-to-image (T2I) generation methods aim to synthesize a high-quality image that is semantically consistent with a given text description. Previous T2I generative adversarial networks (GANs) generally first create a low-resolution image with rough shapes and colors and then refine it into a high-resolution image. Most of these stacked architectures still suffer from two main problems. (1) The final images depend heavily on the quality of the initial image: if the initial image is not generated correctly, the result looks like a simple combination of visual features from several image scales. (2) The cross-modal text-image fusion methods widely adopted in previous works are limited in how thoroughly they fuse text and image information. In this paper, we propose a novel generation model that introduces a one-stage backbone, which directly generates high-quality images without multiple generators, and a novel semantic layout deep fusion network, which thoroughly fuses text features and image features. Experiments on the challenging CUB and COCO-Stuff datasets demonstrate the ability of our model to generate images in terms of both semantic consistency with the input text descriptions and visual fidelity.
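The abstract does not describe the internals of the semantic layout deep fusion network. As a rough illustration of what "deep" text-image fusion commonly means in one-stage T2I GANs, the following minimal sketch (an assumption, not the authors' method) stacks text-conditioned affine transformations in the style of DF-GAN: a sentence embedding is projected to channel-wise scale and shift parameters that modulate the image feature maps, each followed by a LeakyReLU. All function names, parameter names, and shapes here are hypothetical, and the projection weights would be learned in practice rather than drawn at random.

```python
import numpy as np


def text_conditioned_affine(features, sentence_emb, w_gamma, b_gamma, w_beta, b_beta):
    """Modulate image feature maps with a sentence embedding (hypothetical helper).

    features:        (C, H, W) image feature map
    sentence_emb:    (D,) sentence embedding from a text encoder
    w_gamma, w_beta: (C, D) projection weights (learned in a real model)
    b_gamma, b_beta: (C,) projection biases
    """
    gamma = w_gamma @ sentence_emb + b_gamma  # (C,) channel-wise scale
    beta = w_beta @ sentence_emb + b_beta     # (C,) channel-wise shift
    # Broadcast the per-channel scale and shift over the spatial dimensions.
    return gamma[:, None, None] * features + beta[:, None, None]


def leaky_relu(x, slope=0.2):
    """Elementwise LeakyReLU."""
    return np.where(x > 0, x, slope * x)


def fusion_block(features, sentence_emb, params):
    """Apply several affine + LeakyReLU stages so the text conditions every step.

    params: list of (w_gamma, b_gamma, w_beta, b_beta) tuples, one per stage.
    """
    h = features
    for w_gamma, b_gamma, w_beta, b_beta in params:
        h = text_conditioned_affine(h, sentence_emb, w_gamma, b_gamma, w_beta, b_beta)
        h = leaky_relu(h)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, D = 4, 8, 8, 16  # toy sizes (hypothetical)
    feats = rng.standard_normal((C, H, W))
    emb = rng.standard_normal(D)
    # Two stages; scales start near 1 and shifts near 0.
    params = [
        (rng.standard_normal((C, D)) * 0.1, np.ones(C),
         rng.standard_normal((C, D)) * 0.1, np.zeros(C))
        for _ in range(2)
    ]
    out = fusion_block(feats, emb, params)
    print(out.shape)  # (4, 8, 8)
```

Stacking several such modulations lets the text condition shape the image features repeatedly rather than only once at the generator input, which is the usual intuition behind deep fusion. How the paper injects semantic layout information into the fusion is not covered by this sketch.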

Citation

Jieyu Huang, YongHua Zhu, Zhuo Bi, and Wenjun Zhang "Semantic layout aware generative adversarial network for text-to-image generation", Proc. SPIE 12604, International Conference on Computer Graphics, Artificial Intelligence, and Data Processing (ICCAID 2022), 126041W (23 May 2023); https://doi.org/10.1117/12.2674685

Proceedings paper, 5 pages

KEYWORDS

Image fusion

Semantics

Education and training

Computer vision technology

Image processing

Image quality

Data modeling
