Spiking ViT: spiking neural networks with transformer-attention for steel surface defect classification
Liang Gong, Hang Dong, Xinyu Zhang, Xin Cheng, Fan Ye, Liangchao Guo, Zhenghui Ge
Abstract

Throughout the steel production process, a variety of surface defects inevitably occur. These defects impair the quality of steel products and reduce manufacturing efficiency, so it is crucial to study and categorize the multiple defects that appear on the surface of steel strips. The vision transformer (ViT) is a neural network model based on a self-attention mechanism that is widely used across many disciplines. However, a conventional ViT ignores the specifics of biological neural signaling and instead uses continuous activation functions to approximate real neurons. One of the fundamental building blocks of a spiking neural network is the leaky integrate-and-fire (LIF) neuron, whose biodynamic characteristics are akin to those of a real neuron. LIF neurons operate in an event-driven manner, so higher performance can be achieved at lower power. The goal of this work is to integrate ViT and LIF neurons to build and train an end-to-end hybrid network architecture, the spiking vision transformer (S-ViT), for the classification of steel surface defects. The framework builds on the ViT architecture by replacing its activation functions with LIF neurons, constructing a spiking transformer encoder as a global spike-feature fusion module together with a spiking-MLP classification head, and using these as the basic building blocks of S-ViT. Experimental results show that the method achieves outstanding classification performance across all metrics: the overall test accuracies of S-ViT are 99.41%, 99.65%, 99.54%, and 99.77% on NEU-CLS, and 95.70%, 95.93%, 96.94%, and 97.19% on XSDD. S-ViT achieves superior classification performance compared with convolutional neural networks and recently reported methods, and it also improves on the original ViT model. Furthermore, robustness tests show that S-ViT maintains reliable accuracy when recognizing images corrupted by Gaussian noise.
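To make the core idea concrete, the sketch below shows one way LIF neurons could stand in for the usual activation inside a ViT-style MLP block. This is a minimal, illustrative PyTorch sketch, not the authors' implementation: the module names (SurrogateSpike, LIFNeuron, SpikingMLPBlock), the hyperparameters (tau, v_threshold, number of time steps T), and the sigmoid surrogate gradient are assumptions made for illustration only.

```python
# Illustrative sketch (not the paper's code): an LIF neuron layer replacing
# the GELU activation inside a ViT-style MLP block. All names and
# hyperparameters here are assumptions, not taken from the paper.
import torch
import torch.nn as nn


class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, sigmoid surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, v_minus_thresh):
        ctx.save_for_backward(v_minus_thresh)
        return (v_minus_thresh > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_thresh,) = ctx.saved_tensors
        sg = torch.sigmoid(4.0 * v_minus_thresh)
        return grad_output * 4.0 * sg * (1.0 - sg)


class LIFNeuron(nn.Module):
    """Leaky integrate-and-fire layer applied over T time steps with a hard reset."""

    def __init__(self, tau: float = 2.0, v_threshold: float = 1.0):
        super().__init__()
        self.tau = tau                  # membrane time constant (assumed value)
        self.v_threshold = v_threshold  # firing threshold (assumed value)

    def forward(self, x):               # x: [T, B, N, C], time-major input current
        v = torch.zeros_like(x[0])      # membrane potential
        spikes = []
        for t in range(x.shape[0]):
            v = v + (x[t] - v) / self.tau                   # leaky integration
            s = SurrogateSpike.apply(v - self.v_threshold)  # emit binary spikes
            v = v * (1.0 - s)                               # hard reset where spiked
            spikes.append(s)
        return torch.stack(spikes)      # binary spike trains, shape [T, B, N, C]


class SpikingMLPBlock(nn.Module):
    """ViT MLP block with LIF neurons substituted for the usual activation function."""

    def __init__(self, dim: int = 192, hidden: int = 768):
        super().__init__()
        self.fc1, self.lif1 = nn.Linear(dim, hidden), LIFNeuron()
        self.fc2, self.lif2 = nn.Linear(hidden, dim), LIFNeuron()

    def forward(self, x):               # x: [T, B, N, C]
        return self.lif2(self.fc2(self.lif1(self.fc1(x))))


if __name__ == "__main__":
    T, B, N, C = 4, 2, 16, 192          # time steps, batch, tokens, embedding dim
    out = SpikingMLPBlock(C)(torch.randn(T, B, N, C))
    print(out.shape)                    # torch.Size([4, 2, 16, 192]), values in {0, 1}
```

The event-driven character mentioned in the abstract shows up in the binary output: downstream layers only receive 0/1 spike tensors, which is what enables low-power inference on neuromorphic hardware.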

© 2024 SPIE and IS&T
Liang Gong, Hang Dong, Xinyu Zhang, Xin Cheng, Fan Ye, Liangchao Guo, and Zhenghui Ge, "Spiking ViT: spiking neural networks with transformer-attention for steel surface defect classification," Journal of Electronic Imaging 33(3), 033001 (2 May 2024). https://doi.org/10.1117/1.JEI.33.3.033001
Received: 21 September 2023; Accepted: 8 April 2024; Published: 2 May 2024
KEYWORDS
Neurons
Leaky integrate-and-fire (LIF)
Transformers
Neural networks
Education and training
Data modeling
Feature extraction