Generating description with multi-feature and saliency maps of image

Lisha Liu; Chunna Tian; Ruiguo Zhang; Yuxuan Ding

doi:10.1117/12.2557584

3 January 2020

Proceedings Volume 11373, Eleventh International Conference on Graphics and Image Processing (ICGIP 2019); 113730Z (2020) https://doi.org/10.1117/12.2557584
Event: Eleventh International Conference on Graphics and Image Processing, 2019, Hangzhou, China

Abstract

Automatically generating a description of an image is a task that connects computer vision and natural language processing, and it has attracted growing attention in the field of artificial intelligence. In this paper, we present a model that generates descriptions of images using a recurrent neural network (RNN), representing each image with multiple features weighted by object attention. We use long short-term memory (LSTM), a type of RNN, to translate the multiple image features into text. Most existing methods use a single convolutional neural network (CNN) trained on ImageNet to extract image features, which mainly focus on the objects in an image. However, the scene context is also informative for image captioning, so we incorporate a scene feature extracted with a CNN trained on Places205. We evaluate our model on the MSCOCO dataset using standard metrics. Experiments show that multiple features perform better than a single feature. In addition, the saliency weighting emphasizes the salient objects in an image as the subject of its description. The results show that our model performs better than several state-of-the-art methods on image captioning.
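
The abstract does not say how the object feature, the scene feature, and the saliency weights are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the object CNN yields one feature vector per image region, that saliency gives one non-negative score per region, and that the pooled object feature is simply concatenated with the scene feature before being passed to the LSTM. The function names `saliency_weighted_pool` and `fuse_features` are hypothetical.

```python
from typing import List, Sequence


def saliency_weighted_pool(
    feature_map: Sequence[Sequence[float]],
    saliency: Sequence[float],
) -> List[float]:
    """Average region features, weighting each region by its saliency.

    feature_map: one feature vector per image region (all the same length).
    saliency:    one non-negative saliency score per region.
    Assumption: a normalized weighted average; the paper may weight differently.
    """
    if not feature_map:
        raise ValueError("feature_map must contain at least one region")
    if len(feature_map) != len(saliency):
        raise ValueError("need exactly one saliency score per region")

    total = sum(saliency)
    if total <= 0:
        # No salient region at all: fall back to a plain average.
        weights = [1.0 / len(saliency)] * len(saliency)
    else:
        weights = [s / total for s in saliency]

    dim = len(feature_map[0])
    return [
        sum(w * region[d] for w, region in zip(weights, feature_map))
        for d in range(dim)
    ]


def fuse_features(
    object_regions: Sequence[Sequence[float]],
    saliency: Sequence[float],
    scene_vector: Sequence[float],
) -> List[float]:
    """Build one image representation from object and scene features.

    Assumption: fusion by concatenation; the result would be the input
    that a caption decoder such as an LSTM conditions on.
    """
    pooled = saliency_weighted_pool(object_regions, saliency)
    return pooled + list(scene_vector)


# Toy example: two regions with 2-D object features and a 1-D scene feature.
fused = fuse_features(
    object_regions=[[1.0, 0.0], [0.0, 1.0]],
    saliency=[3.0, 1.0],
    scene_vector=[0.5],
)
print(fused)
```

In this toy example the first region is three times as salient as the second, so it dominates the pooled object feature, and the scene feature is appended unchanged.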

Citation

Lisha Liu, Chunna Tian, Ruiguo Zhang, and Yuxuan Ding "Generating description with multi-feature and saliency maps of image", Proc. SPIE 11373, Eleventh International Conference on Graphics and Image Processing (ICGIP 2019), 113730Z (3 January 2020); https://doi.org/10.1117/12.2557584

PROCEEDINGS
9 PAGES

KEYWORDS

Visualization

Feature extraction

Neural networks

Visual process modeling

Content addressable memory

Image retrieval

Image segmentation
