Lightweight Vision Transformer for damaged wheat detection and classification using spectrograms
Hao Lin, Min Guo, Miao Ma
Abstract

Grain is one of the basic human necessities, and its quality and safety directly impact human dietary health. Various issues arise during grain storage, primarily mold and pest infestation. With the development of artificial intelligence, an increasing number of technologies are being applied to grain detection and classification, and transformer-based models have become popular for this task. Although transformer models exhibit excellent performance, they are often large and cumbersome, which limits practical applications. We propose a framework named KD-ASF, based on intermediate-layer knowledge distillation and one-shot neural architecture search, to optimize the hyperparameters of a vision transformer (ViT) for detecting and classifying molded wheat kernels (MDK), insect-damaged wheat kernels (IDK), and undamaged wheat kernels (UDK). In KD-ASF, we use a ViT model as the teacher network. Next, we design a search space of adjustable hyperparameters for the transformer building blocks. A super-network stacking the maximum number of transformer building blocks is trained under the guidance of the teacher network. The trained super-network then undergoes an evolutionary search, and the resulting networks are used to classify the different wheat kernels. We conducted experiments with five-fold cross-validation and obtained an F1 score of 97.13%, with a final model size of only 5.94M parameters. The results demonstrate that this method not only outperforms the majority of neural networks in performance but also has a significantly smaller model size than most network models. Its lightweight nature facilitates easy deployment and application. These findings indicate that the structure of KD-ASF is feasible and effective.
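To make the pipeline described above concrete, the sketch below illustrates (not the authors' code) the two ingredients KD-ASF combines: an intermediate-layer distillation loss against a ViT teacher, and an evolutionary search over transformer building-block hyperparameters. All names, dimensions, loss weights, and the fitness proxy are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of intermediate-layer knowledge distillation plus
# evolutionary hyperparameter search; every numeric choice is an assumption.
import copy
import random
import torch
import torch.nn.functional as F


# --- (1) distillation: soft logits + an intermediate feature match -------
def distill_loss(student_logits, teacher_logits,
                 student_feat, teacher_feat, proj, T=4.0, alpha=0.5):
    """KL divergence on temperature-softened logits plus MSE between a
    projected student feature and the matching teacher feature.
    Temperature T and weight alpha are illustrative."""
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    feat = F.mse_loss(proj(student_feat), teacher_feat)
    return alpha * kd + (1.0 - alpha) * feat


# --- (2) evolutionary search over ViT building-block hyperparameters -----
SEARCH_SPACE = {                      # illustrative ranges only
    "embed_dim": [192, 256, 320],
    "depth":     [6, 8, 10, 12],
    "num_heads": [3, 4, 6],
    "mlp_ratio": [2.0, 3.0, 4.0],
}


def sample_config():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}


def mutate(cfg, p=0.3):
    # Randomly re-sample each hyperparameter with probability p.
    child = copy.deepcopy(cfg)
    for k, choices in SEARCH_SPACE.items():
        if random.random() < p:
            child[k] = random.choice(choices)
    return child


def fitness(cfg):
    """Stand-in for evaluating a sub-network inherited from the trained
    super-network; here we simply penalize parameter-heavy configs."""
    params = cfg["embed_dim"] * cfg["depth"] * cfg["mlp_ratio"]
    return -params


def evolve(generations=10, population=8, elite=2):
    pop = [sample_config() for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]
        pop = parents + [mutate(random.choice(parents))
                         for _ in range(population - elite)]
    return max(pop, key=fitness)


if __name__ == "__main__":
    print("best config (toy proxy):", evolve())
```

In the actual framework the fitness of each candidate would be measured by the validation performance of the corresponding sub-network inherited from the distillation-trained super-network; the toy proxy above only shows the shape of the search loop.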

© 2024 SPIE and IS&T
Hao Lin, Min Guo, and Miao Ma "Lightweight Vision Transformer for damaged wheat detection and classification using spectrograms," Journal of Electronic Imaging 33(5), 053063 (31 October 2024). https://doi.org/10.1117/1.JEI.33.5.053063
Received: 17 July 2024; Accepted: 9 October 2024; Published: 31 October 2024
KEYWORDS
Transformers; Education and training; Visual process modeling; Head; Performance modeling; Network architectures; Deep convolutional neural networks