Presentation + Paper
Efficient multi-attribute image classification through context-driven networks
31 May 2022
Abstract
Performing many tasks simultaneously on a resource-limited device is challenging because of the limited computational resources available. Efficient and universal model architectures are the key to solving this problem. Work in existing sub-fields of machine learning, such as Multi-Task Learning (MTL), has shown that learning multiple tasks with a single neural network architecture is possible and can even improve sample efficiency and memory efficiency while being less prone to overfitting. In Visual Question Answering (VQA), a model ingests multi-modal input to produce text-based responses in the context of an image. Our proposed architecture merges the MTL and VQA concepts to form TaskNet. TaskNet solves the visual MTL problem by using an input task to provide context to the network and guide its attention mechanism towards a relevant response. Our approach saves memory without sacrificing performance relative to naively training independent models. TaskNet efficiently provides multiple fine-grained classifications for a single input image and seamlessly incorporates context-specific metadata to further boost performance in high-variance settings.
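The abstract does not give implementation details, so the following is only a minimal sketch of how a task-conditioned ("context-driven") multi-attribute classifier of this general kind could be structured: a shared image backbone, a learned task embedding used as the attention query over image features, and per-task classification heads. All names (TaskConditionedClassifier, num_tasks, classes_per_task) and design choices here are assumptions for illustration, not the authors' actual TaskNet architecture.

```python
import torch
import torch.nn as nn

class TaskConditionedClassifier(nn.Module):
    """Hypothetical sketch: a shared backbone, with a learned task embedding
    used as the cross-attention query over image features, producing a
    task-specific fine-grained classification."""

    def __init__(self, num_tasks, classes_per_task, embed_dim=256, num_heads=4):
        super().__init__()
        # Shared convolutional backbone (stand-in for any feature extractor).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One embedding per task supplies the "context" query.
        self.task_embed = nn.Embedding(num_tasks, embed_dim)
        # Cross-attention: the task query attends over spatial image features.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # One lightweight head per task (label sets may differ per attribute).
        self.heads = nn.ModuleList(
            [nn.Linear(embed_dim, c) for c in classes_per_task]
        )

    def forward(self, image, task_id):
        feats = self.backbone(image)                   # (B, D, H, W)
        tokens = feats.flatten(2).transpose(1, 2)      # (B, H*W, D)
        query = self.task_embed(task_id).unsqueeze(1)  # (B, 1, D)
        pooled, _ = self.attn(query, tokens, tokens)   # task-guided pooling
        pooled = pooled.squeeze(1)
        # Route each sample to the head of its requested task.
        return [self.heads[t](pooled[i]) for i, t in enumerate(task_id.tolist())]

# Usage: ask two different "questions" (tasks) of a batch of images.
model = TaskConditionedClassifier(num_tasks=3, classes_per_task=[10, 5, 7])
images = torch.randn(2, 3, 64, 64)
tasks = torch.tensor([0, 2])  # the task index acts as the context input
outputs = model(images, tasks)
```

The key point the sketch illustrates is that the backbone and attention module are shared across all attributes, so adding a new attribute costs only one embedding row and one small head rather than a full independent model.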
Conference Presentation
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Sean Banger, Ryan Ceresani, and Jason Twedt "Efficient multi-attribute image classification through context-driven networks", Proc. SPIE 12096, Automatic Target Recognition XXXII, 120960B (31 May 2022); https://doi.org/10.1117/12.2618977
KEYWORDS
Transformers, Image classification, Visualization, Image processing, Visual process modeling, Data modeling, Neural networks