Automatic Target Recognition (ATR) often confronts intricate visual scenes, demanding models that can discern subtle distinctions. Real-world datasets such as the Defense Systems Information Analysis Center (DSIAC) ATR database are unimodal (imagery only) and lack contextual information for each frame, which limits model performance. To address these limitations, we enrich the DSIAC dataset by algorithmically generating captions and proposing new train/test splits, thereby creating a rich multimodal training landscape. To leverage these captions effectively, we integrate a vision-language model, Contrastive Language-Image Pre-training (CLIP), which couples visual perception with linguistic descriptors. At the core of our methodology lies a homotopy-based multi-objective optimization technique designed to balance model precision, generalizability, and interpretability. Our framework, built on PyTorch Lightning with Ray Tune for distributed hyperparameter optimization, produces models that meet the intricate demands of practical ATR applications. All code and data are available at https://github.com/sabraha2/ATR-CLIP-Multi-Objective-Homotopy-Optimization.
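The homotopy idea behind the multi-objective optimization above can be sketched in miniature: a convex combination of two competing losses is minimized while a homotopy parameter sweeps from one objective to the other, warm-starting each step from the previous solution so the minimizer is traced rather than re-discovered. This is an illustrative sketch only, with toy quadratic objectives standing in for the paper's precision/interpretability terms; the function names and settings here are our own assumptions, not the released codebase.

```python
def homotopy_trace(f0, f1, x0, steps=11, lr=0.1, inner_iters=300):
    """Trace minimizers of f_t(x) = (1 - t) * f0(x) + t * f1(x)
    as t sweeps from 0 to 1, warm-starting each step from the last."""
    def grad(f, x, eps=1e-6):
        # central finite-difference gradient (scalar x, for illustration)
        return (f(x + eps) - f(x - eps)) / (2 * eps)

    path = []
    x = float(x0)
    for k in range(steps):
        t = k / (steps - 1)
        f_t = lambda z, t=t: (1 - t) * f0(z) + t * f1(z)
        for _ in range(inner_iters):
            x -= lr * grad(f_t, x)  # gradient descent on the blended objective
        path.append((t, x))
    return path


# Toy stand-ins for two competing objectives (e.g. accuracy vs. interpretability),
# whose blended minimizer moves smoothly from x = 1 to x = -1 as t goes 0 -> 1.
f_acc = lambda x: (x - 1.0) ** 2
f_interp = lambda x: (x + 1.0) ** 2
trace = homotopy_trace(f_acc, f_interp, x0=0.0)
```

Warm-starting is the point of the homotopy: each blended problem is solved from the previous solution, so the optimizer follows a continuous path of minima instead of restarting on every trade-off weight.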
Deep learning has expedited important breakthroughs in research and commercial applications for next-generation technologies across many domains, including Automatic Target Recognition (ATR). The success of these models in a specific application is often attributed to well-chosen hyperparameters: user-configured values that control the model's ability to learn from data. Tuning hyperparameters, however, remains a difficult and computationally expensive task, and poorly tuned models often fall short of ATR performance requirements. We demonstrate the efficacy of our hyperparameter optimization method in boosting the effectiveness of a given search strategy. Specifically, we use a generalized additive model surrogate homotopy hyperparameter optimization strategy that approximates regions of interest and traces minimal points over regions of the hyperparameter space, rather than inefficiently evaluating the entire hyperparameter surface. We integrate our approach into SHADHO (Scalable Hardware-Aware Distributed Hyperparameter Optimization), a hyperparameter optimization framework that computes the relative complexity of each search space and monitors the performance of the learning task over the trials. We demonstrate that our approach effectively finds optimal hyperparameters for object detection: in a model search optimizing multiple object detection algorithms on a subset of the DSIAC ATR Algorithm Development Image Database, it finds models that achieve comparable or lower validation loss in fewer iterations than standard techniques and manual tuning practices.
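The surrogate strategy described above can be illustrated with a deliberately simplified sketch: per-dimension quadratic fits stand in for the generalized additive model's smoothers, and the fitted surrogate ranks candidate hyperparameter configurations so that only promising ones are evaluated, rather than the entire surface. All names here are hypothetical; this is not the SHADHO implementation, and a real GAM would use spline smoothers with backfitting rather than global quadratics.

```python
import numpy as np

def fit_additive_surrogate(X, y):
    """Fit a crude additive surrogate over evaluated configurations:
    one quadratic per hyperparameter dimension, summed. The quadratics
    are a stand-in for the smoothers of a generalized additive model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = y.mean()
    resid = y - offset
    n_dims = X.shape[1]
    # Fit each dimension against an equal share of the centered residual.
    coefs = [np.polyfit(X[:, d], resid / n_dims, 2) for d in range(n_dims)]

    def surrogate(x):
        x = np.asarray(x, dtype=float)
        return offset + sum(np.polyval(c, x[d]) for d, c in enumerate(coefs))

    return surrogate

def propose_next(surrogate, candidates):
    """Rank candidate configurations by predicted loss and return the
    most promising one, instead of evaluating the whole surface."""
    return min(candidates, key=surrogate)

# Toy search: a separable validation-loss surface with optimum at (0.3, 0.7),
# e.g. two normalized hyperparameters such as learning rate and momentum.
xs = np.linspace(0.0, 1.0, 11)
X = np.array([(a, b) for a in xs for b in xs])
y = np.array([(a - 0.3) ** 2 + (b - 0.7) ** 2 for a, b in X])
surrogate = fit_additive_surrogate(X, y)
best = propose_next(surrogate, [tuple(p) for p in X])
```

Because the surrogate is additive, each hyperparameter's effect can be fitted and minimized largely independently, which is what makes tracing minimal points over regions of the space cheap relative to exhaustive evaluation.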