Visual question answering using hierarchical dynamic memory networks

Jiayu Shang; Shiren Li; Zhikui Duan; Junwei Huang

doi:10.1117/12.2302484

10 April 2018

Proceedings Volume 10615, Ninth International Conference on Graphic and Image Processing (ICGIP 2017); 106153V (2018) https://doi.org/10.1117/12.2302484
Event: Ninth International Conference on Graphic and Image Processing, 2017, Qingdao, China

Abstract

Visual Question Answering (VQA), one of the most active research areas in machine learning, aims to enable computers to answer natural-language questions about images. In this paper, we propose a new method called hierarchical dynamic memory networks (HDMN), which, inspired by the Co-Attention method (among the best-performing algorithms at the time of writing), takes both question attention and visual attention into account. In addition, we replace the original recurrent units with bi-directional LSTMs, which retain more information from the question and the image, so that information from both preceding and following sentences can be captured and used. We then rebuild the hierarchical architecture so that it covers visual attention as well as question attention. Finally, we accelerate training with Batch Normalization, which helps the network converge faster than the other methods compared. Experimental results show that our model improves on the state of the art on the large-scale COCO-QA dataset compared with other methods.
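To make "question attention and visual attention" concrete, below is a minimal NumPy sketch of one common co-attention formulation, the parallel co-attention of Lu et al. (2016). The abstract does not give HDMN's exact equations, so this is an illustration of the general technique, not the authors' model. The weight names (`W_b`, `W_q`, `W_v`, `w_hq`, `w_hv`), the feature sizes, and the random initialisation are all assumptions made for the example. Question features `Q` would in practice come from a (bi-directional) LSTM and image features `V` from a CNN; here both are random.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - x.max())
    return e / e.sum()


def parallel_co_attention(Q, V, W_b, W_q, W_v, w_hq, w_hv):
    """Parallel co-attention (sketch; weight names are hypothetical).

    Q: (d, T) question word features, V: (d, N) image region features.
    Returns attended image vector, attended question vector and both
    attention distributions.
    """
    C = np.tanh(Q.T @ W_b @ V)                # (T, N) word-region affinity
    H_v = np.tanh(W_v @ V + (W_q @ Q) @ C)    # (k, N) question-guided image map
    H_q = np.tanh(W_q @ Q + (W_v @ V) @ C.T)  # (k, T) image-guided question map
    a_v = softmax(w_hv @ H_v)                 # (N,) visual attention weights
    a_q = softmax(w_hq @ H_q)                 # (T,) question attention weights
    v_hat = V @ a_v                           # (d,) weighted sum of regions
    q_hat = Q @ a_q                           # (d,) weighted sum of words
    return v_hat, q_hat, a_v, a_q


# Usage with random (hypothetical) features and weights.
rng = np.random.default_rng(0)
d, T, N, k = 8, 5, 6, 4  # feature dim, words, regions, hidden dim
Q = rng.standard_normal((d, T))
V = rng.standard_normal((d, N))
W_b = rng.standard_normal((d, d))
W_q = rng.standard_normal((k, d))
W_v = rng.standard_normal((k, d))
w_hq = rng.standard_normal(k)
w_hv = rng.standard_normal(k)

v_hat, q_hat, a_v, a_q = parallel_co_attention(Q, V, W_b, W_q, W_v, w_hq, w_hv)
print(v_hat.shape, q_hat.shape, a_v.shape, a_q.shape)  # (8,) (8,) (6,) (5,)
```

The affinity matrix `C` is what couples the two attentions: each modality's attention map is computed with the other modality's features mixed in, which is the sense in which question and visual attention are considered jointly.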

Citation

Jiayu Shang, Shiren Li, Zhikui Duan, and Junwei Huang "Visual question answering using hierarchical dynamic memory networks", Proc. SPIE 10615, Ninth International Conference on Graphic and Image Processing (ICGIP 2017), 106153V (10 April 2018); https://doi.org/10.1117/12.2302484

Proceedings paper, 9 pages

KEYWORDS

Visualization

Neural networks

Computer programming

Visual process modeling

Convolution

Data modeling

Electronics
