3 February 2023 A CNN-LSTM-based model for fashion image aesthetic captioning
Binbin Yan
Proceedings Volume 12511, Third International Conference on Computer Vision and Data Mining (ICCVDM 2022); 1251119 (2023) https://doi.org/10.1117/12.2660052
Event: Third International Conference on Computer Vision and Data Mining (ICCVDM 2022), 2022, Hulun Buir, China
Abstract
We propose a new task, fashion image aesthetic captioning: describing apparel in an aesthetic way. The task can benefit e-commerce, where large numbers of clothing items need captions that catch customers' attention, and it can also help people understand fashion better. We adopt an encoder-decoder architecture as our baseline. We introduce two classifiers, a color harmony classifier pretrained on the AVA dataset and a clothes type classifier, to help the encoder extract more accurate features from clothing images. For the decoder, we use an LSTM with an attention mechanism to generate sentences. Additionally, we build a new dataset containing 79,105 fashion images with aesthetic descriptions and attributes. Experiments on this dataset show good results for our model.
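The attention step in the decoder can be illustrated with a minimal sketch. The abstract does not specify which attention variant is used, so the example below assumes plain dot-product soft attention over a handful of image-region feature vectors; the function names `softmax` and `attend`, the toy feature values, and the two-dimensional feature size are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Dot-product soft attention (an assumed variant, not the paper's).

    region_features: list of feature vectors, one per image region
                     (in a CNN-LSTM captioner these would come from the encoder).
    hidden_state:    the decoder's current hidden state, same length
                     as each feature vector.
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each region by its dot product with the decoder state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    dim = len(region_features[0])
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
print(weights)
print(context)
```

In a full captioning model, the context vector would typically be fed into the LSTM together with the previous word embedding at each decoding step, so the decoder can focus on different parts of the garment as it generates each word.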
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Binbin Yan "A CNN-LSTM-based model for fashion image aesthetic captioning", Proc. SPIE 12511, Third International Conference on Computer Vision and Data Mining (ICCVDM 2022), 1251119 (3 February 2023); https://doi.org/10.1117/12.2660052
KEYWORDS
Computer programming
Data modeling
Neural networks
Performance modeling
