SPIE Journal Paper | 1 February 2024
Nikola Pižurica, Kosta Pavlović, Slavko Kovačević, Igor Jovančević, Miguel de Prado
KEYWORDS: Image segmentation, Data modeling, Mathematical optimization, Performance modeling, Education and training, Network architectures, Visual inspection, Defect detection, Visual process modeling, Deep learning
Visual inspection plays a pivotal role in numerous industrial production processes, and the pursuit of automation has surged with the rise of deep learning and convolutional neural networks (CNNs). Deploying visual inspection CNNs on resource-constrained edge devices is a critical problem: these devices are the most affordable and best suited for many industrial applications, e.g., production chains, yet they struggle to meet the computational demands of deep CNN models. Consequently, optimizing these models for efficient operation in such settings is imperative. Visual inspection tasks are often highly specialized and differ significantly from general computer vision tasks. As a result, state-of-the-art CNNs can be excessively large for achieving high accuracy on these specific datasets. To address this challenge, this paper introduces a novel approach utilizing neural architecture search (NAS) and hyperparameter optimization. We present the Generic Toolkit for NAS (GT-NAS), an open-source toolkit available for public use on GitLab (https://gitlab.com/pmf5/open-source/generic-toolkit-for-neural-architecture-search). We showcase the results of applying our methodology to two established state-of-the-art CNN models designed for surface defect detection, a problem that encompasses binary classification and segmentation of images. Our approach yields significantly smaller models relative to the baselines, with accuracy in line with current state-of-the-art results, demonstrating the potential for enhanced efficiency in industrial visual inspection systems. In one experimental setting (optimizing the Mixed Supervision model on the KolektorSDD2 dataset), GT-NAS produced an architecture that is 6.2 times faster than the baseline while sacrificing only 0.25% of its average precision for binary classification. In a second set of experiments (optimizing the TriNet model on the SensumSODF dataset), GT-NAS found a TriNet architecture five times smaller than the baseline, at the small cost of a 0.25% drop in the ROC-AUC classification score on the capsule subset of the SensumSODF dataset. Furthermore, on the softgel subset of the same dataset, GT-NAS produced a model that was 2.7 times smaller than the baseline, yet 0.19% more precise.
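To make the idea of searching over architecture and training hyperparameters concrete, the sketch below shows a minimal random-search loop over a hypothetical CNN search space. It is an illustration of the general NAS/hyperparameter-optimization setting only, not the GT-NAS API; the search space, the `evaluate` stand-in, and all names are assumptions introduced here for exposition.

```python
# Minimal sketch of random-search NAS over CNN hyperparameters.
# Illustration only -- not the GT-NAS toolkit API. The search space and the
# evaluate() stand-in are hypothetical placeholders.
import random

# Hypothetical search space: per-stage channel widths, depth, and a learning rate.
SEARCH_SPACE = {
    "stage_channels": [(8, 16, 32), (16, 32, 64), (32, 64, 128)],
    "blocks_per_stage": [1, 2, 3],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}


def sample_architecture(space):
    """Draw one candidate configuration uniformly at random from the space."""
    return {name: random.choice(choices) for name, choices in space.items()}


def evaluate(config):
    """Stand-in for training and validation.

    A real run would build the CNN from `config`, train it on the
    defect-detection data, and return the validation score (e.g., average
    precision or ROC-AUC). Here we return a random proxy so the loop runs.
    """
    return random.random()


def random_search(space, n_trials=20):
    """Keep the candidate with the best validation score.

    In practice, ties or near-ties could be broken by model size or latency
    to favour smaller, faster architectures for edge deployment.
    """
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_architecture(space)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    config, score = random_search(SEARCH_SPACE)
    print(f"best config: {config}, proxy score: {score:.3f}")
```

Random search is only the simplest instance of this loop; more sample-efficient strategies (e.g., Bayesian optimization or evolutionary search) plug into the same sample-evaluate-select structure.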