The inspection of solder joints on printed circuit boards is a difficult task because defects inside the joints cannot be observed directly. In addition, because anomalous samples are rarely obtained in a general anomaly detection situation, many methods use only normal samples in the learning phase. However, sometimes a small number of anomalous samples are available for learning. We propose a method to improve performance using a small number of anomalous samples for training in such situations. Specifically, our proposal is an anomaly detection method using an adversarial autoencoder (AAE) and Hotelling’s T-squared distribution. First, the AAE learns features of the solder joint following the standard Gaussian distribution from a large number of normal samples and a small number of anomalous samples. Then, the anomaly score of a solder joint is calculated by Hotelling’s T-squared method from the features learned by the AAE. Finally, anomaly detection is performed by thresholding using this anomaly score. In experiments, we show that our method performs anomaly detection with few false positives in such situations. Moreover, we confirmed that our method outperforms the conventional method using handcrafted features and a one-class support vector machine.
1. Introduction
Inspection of the solder joints on a printed circuit board (PCB) is challenging because the joints are sandwiched between the PCB and an integrated circuit (IC) chip, so their defects cannot be observed directly. To solve this problem, automated x-ray inspection, which is nondestructive, is generally employed [1,2]. Our method uses automated x-ray inspection: the x-ray inspection machine collects sliced images of the solder joints by x-ray computed tomography (CT) scans, and defects in the solder joints are detected from these images.
In recent years, automatic visual inspection systems using machine learning, especially deep learning, have been studied as a means of classifying normal and anomalous samples. This is motivated by the fact that inspection by human experts is error-prone: fatigue can cause an expert to miss anomalous samples. One of the most popular anomaly detection methods using machine learning is the one-class support vector machine (OCSVM) [3]. This method requires handcrafted features designed by human experts in advance. The extracted features are input to the trained OCSVM, and the samples are classified by its output. The OCSVM can be trained with only normal samples, but it has the disadvantage that the features must be designed by human experts in advance and the feature extraction method must be redesigned whenever the product specification changes. With deep learning methods, product images are input directly to neural networks, so feature extraction by human experts is not required. Therefore, even if the product specification changes, only network retraining is required, and the operating cost can be greatly reduced.
A common deep learning approach to anomaly detection is to classify normal and anomalous samples with a binary classifier [4,5]. However, in anomaly detection for industrial products, it is difficult to secure enough anomalous samples to train the classifier because defects rarely occur on the production line. Therefore, anomaly detection is generally performed using only normal data [3,6]. Nevertheless, a small number of anomalous samples is sometimes available for the learning phase, and adding them to the training dataset can be expected to improve performance. Our method therefore uses normal samples together with a small number of anomalous samples for learning. In particular, it extracts features following the standard Gaussian distribution from such imbalanced samples with an adversarial autoencoder (AAE) [7]. Anomaly scores are then calculated from the features by Hotelling’s T-squared method [8], and each solder joint is classified by thresholding its anomaly score. In our experiments, we show that our method is superior to the method using handcrafted features and OCSVM on the imbalanced samples. Our contribution is a method that detects defects using a large number of normal samples and a small number of anomalous samples in the quality inspection of industrial products.
2. Related Work
Recently, the x-ray CT method has been the main means of detecting anomalies in PCB solder joints because they cannot be observed directly. X-rays pass through the PCB because it consists of materials with low atomic weight, whereas solder joints are imaged because they have high atomic weight [9]. For example, the solder ball portion of the solder joints is represented as voxel data, built from two-dimensional x-ray CT images taken from multiple directions, to capture the condition of the solder joints [10]. The voxel data are input to a three-dimensional convolutional neural network and classified by the output of the network.
However, in typical anomaly detection tasks, a neural network classifier requires both normal and anomalous samples for training, and its prediction performance is unstable for unknown anomalous samples not seen during training. Therefore, training methods are needed that produce satisfactory classification results when only normal samples, possibly with a small number of anomalous samples, are used. A previously developed anomaly detection method uses an OCSVM in the latent space of extracted features. However, this has some disadvantages: the feature extraction method must be designed beforehand, and the features must be changed for every target. To solve this problem, an autoencoder [11], which is a type of neural network, extracts the features in the latent space from the input samples automatically.
Anomaly detection methods using an autoencoder are based either on the reconstruction error [6] or on a model of the normal condition in the latent space [12]. The proposed method belongs to the latter approach. In the former approach, the networks are usually trained with only normal samples. As a consequence, the networks reconstruct normal samples with small reconstruction errors but cannot reconstruct anomalous samples, whose reconstruction errors become large. Hence, the samples are classified by a threshold on the reconstruction error. In the latter approach, a normal model is defined in the latent space, and the likelihood of an input sample under this model is calculated to classify it. In Ref. 12, test samples are classified by a threshold not only on the reconstruction error but also on the likelihood under a Gaussian distribution of the features extracted by the AAE. That method differs from ours in that it thresholds the reconstruction error and the likelihood, whereas ours thresholds anomaly scores calculated by Hotelling’s T-squared method.
3. Proposed Method
3.1. X-Ray Computed Tomography
Because the solder joints sandwiched between the PCB and the IC chip cannot be inspected directly, we obtain sliced images of the solder joints with x-ray CT. When the IC chip and the PCB are joined, many solder joints are formed. Our approach is to detect each solder joint and crop its region in advance so that sliced images are captured for each solder joint individually. A fixed number of sliced images is taken from each solder joint, and we define this set of sliced images as the sample for one solder joint. Hence, on a PCB that contains anomalous solder joints, only those joints are treated as anomalous samples. An overview of the method for capturing sliced images of a solder joint is shown in Fig. 1. We took eight sliced images from one solder joint. Each is assigned a layer number corresponding to its image layer. Examples of the captured sliced images of normal samples are shown in Fig. 2(a), and anomalous samples are shown in Fig. 2(b).
3.2. Hotelling’s T-Squared Method
The number of anomalous samples is much smaller than the number of normal samples; thus, the normal model is defined from only normal samples or from normal samples and a small number of anomalous samples. Assume that the normal model $p(\boldsymbol{x} \mid \boldsymbol{\theta})$, estimated from the dataset $\mathcal{D}$, is represented by the parameter $\boldsymbol{\theta}$. The negative log-likelihood of an unknown sample $\boldsymbol{x}'$ is then defined as the anomaly score in the following equation:
$$a(\boldsymbol{x}') = -\ln p(\boldsymbol{x}' \mid \boldsymbol{\theta}) \tag{1}$$
In the normal model $p(\boldsymbol{x} \mid \boldsymbol{\theta})$, the probability density of the normal samples is high and that of the anomalous samples is low. Therefore, the anomaly scores of the former are low and those of the latter are high, and it is possible to classify normal and anomalous samples by a threshold on the anomaly score.
Hotelling’s T-squared method is an anomaly detection method that can be applied to a dataset following a Gaussian distribution. Here, $a(\boldsymbol{x}')$ of an $M$-dimensional sample $\boldsymbol{x}'$ is calculated as Eq. (2) using the two parameters of the Gaussian distribution, the mean vector $\boldsymbol{\mu}$ and the variance-covariance matrix $\boldsymbol{\Sigma}$:
$$a(\boldsymbol{x}') = \frac{M}{2}\ln(2\pi) + \frac{1}{2}\ln\lvert\boldsymbol{\Sigma}\rvert + \frac{1}{2}(\boldsymbol{x}' - \boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}' - \boldsymbol{\mu}) \tag{2}$$
Up to the factor $1/2$, the last term of Eq. (2) is the squared Mahalanobis distance. Moreover, if $\boldsymbol{\mu} = \boldsymbol{0}$ and $\boldsymbol{\Sigma} = \boldsymbol{I}$, the dataset follows the standard Gaussian distribution, and $a(\boldsymbol{x}')$ is calculated by the following equation:
$$a(\boldsymbol{x}') = \frac{M}{2}\ln(2\pi) + \frac{1}{2}\boldsymbol{x}'^{\top}\boldsymbol{x}' \tag{3}$$
Up to the factor $1/2$, the last term of Eq. (3) is the squared Euclidean distance from the origin. In Hotelling’s T-squared method, this squared distance $\boldsymbol{x}'^{\top}\boldsymbol{x}'$ follows the chi-square distribution with $M$ degrees of freedom and a scale factor of 1. The corresponding chi-square distribution is shown in Fig. 3. In Fig. 3, the graph shows the likelihood of the value of the squared distance for a sample drawn from the normal model following the standard Gaussian distribution. When this value is high, the probability of being a normal sample is low; therefore, the sample can be regarded as anomalous. Hence, it is possible to classify normal and anomalous samples by predetermining an upper probability on the graph and setting the corresponding one-dimensional threshold.
3.3. Adversarial Autoencoder
Although images are high-dimensional data, they can be compressed to lower-dimensional features in the latent space. This is because normal samples are assumed to have common features. An autoencoder is a low-dimensional feature extractor built from neural networks. It is composed of two networks: an encoder (En) and a decoder (De). En is trained to extract features as a latent vector $\boldsymbol{z}$ from an input $\boldsymbol{x} \sim p_{\mathrm{data}}(\boldsymbol{x})$, where $p_{\mathrm{data}}(\boldsymbol{x})$ is the data distribution of the input samples. De is trained to reconstruct the input $\boldsymbol{x}$ from $\boldsymbol{z}$. The loss function is shown as follows:
$$L_{\mathrm{AE}} = \mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}(\boldsymbol{x})}\left[\lVert \boldsymbol{x} - \mathrm{De}(\mathrm{En}(\boldsymbol{x})) \rVert^{2}\right] \tag{4}$$
Principal component analysis [13] is another conventional dimensionality reduction method, but it can only map linearly from the high-dimensional space to the low-dimensional latent space. The autoencoder enables nonlinear mapping using activation functions and deep layers. Because the projection functions En and De are more flexible, the model can extract more representative features of complex structured data.
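As a rough illustration of the scoring described in Sec. 3.2, the following Python sketch computes the squared Euclidean norm of a latent vector and converts it into an upper-tail probability of the chi-square distribution. It is not the authors' implementation: the function names, the example vectors, and the upper probability `alpha = 0.001` are assumptions introduced here, and the closed-form tail used below is valid only when the latent dimension is even.

```python
import math


def squared_norm(z):
    """Squared Euclidean distance of latent vector z from the origin."""
    return sum(v * v for v in z)


def chi2_upper_prob(x, dof):
    """P(X > x) for a chi-square variable X with an even number of degrees of freedom.

    For dof = 2m the tail has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only covers positive even dof")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(dof // 2):
        total += term            # adds (x/2)^i / i!
        term *= half / (i + 1)   # prepares the next term of the series
    return math.exp(-half) * total


def is_anomalous(z, alpha=0.001):
    """Flag z when its squared norm lies in the upper alpha tail (alpha is an assumed value)."""
    return chi2_upper_prob(squared_norm(z), len(z)) < alpha


# Hypothetical four-dimensional latent vectors, for illustration only.
near_origin = [0.1, -0.2, 0.0, 0.3]
far_from_origin = [5.0, 5.0, 5.0, 5.0]
```

Here `is_anomalous(near_origin)` is false and `is_anomalous(far_from_origin)` is true, because a vector far from the origin is very unlikely under a standard Gaussian. Thresholding the tail probability at `alpha` is equivalent to thresholding the squared norm at the matching chi-square quantile.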
Although low-dimensional features of input samples can be acquired by the autoencoder, the distribution of the features in the latent space cannot be specified. Therefore, to apply Hotelling’s T-squared method described in Sec. 3.2 to the features extracted by the autoencoder, we employ an AAE consisting of the autoencoder and a discriminator network, as shown in Fig. 4. The AAE matches the distribution of the latent space to an arbitrary distribution in an adversarial manner [7]. To incorporate Hotelling’s T-squared method into the deep generative model, we train the AAE with an adversarial loss between the distribution of the encoded latent vectors and the standard Gaussian distribution. Furthermore, we assume the real-world situation in which a large number of normal samples and a small number of anomalous samples are available. Adversarial training with such imbalanced samples encourages the normal samples to be mapped to the high-density region of the standard Gaussian distribution and the anomalous samples to the low-density region. This means that the AAE constructs a normal model that follows the standard Gaussian distribution in the latent space. Therefore, it is possible to apply Hotelling’s T-squared method in the latent space. The standard Gaussian distribution is chosen as the target distribution to simplify the anomaly score calculations described in Sec. 3.2.
The discriminator ($D$) is trained to determine whether the input vector $\boldsymbol{z}$ is sampled from the latent distribution $q(\boldsymbol{z})$ or from the standard Gaussian distribution $p(\boldsymbol{z})$. In contrast, the En is trained to make $q(\boldsymbol{z})$ approximate $p(\boldsymbol{z})$. These actions are called adversarial training and are defined by the function $V$ in Eq. (5):
$$\min_{\mathrm{En}}\max_{D} V(\mathrm{En}, D) = \mathbb{E}_{\boldsymbol{z} \sim p(\boldsymbol{z})}\left[\log D(\boldsymbol{z})\right] + \mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}(\boldsymbol{x})}\left[\log\left(1 - D(\mathrm{En}(\boldsymbol{x}))\right)\right] \tag{5}$$
The En is trained to minimize and the discriminator $D$ is trained to maximize the function $V$, and $\mathbb{E}$ denotes the expectation of the term in square brackets over the distribution in its subscript. The discriminator updates its own parameters to output 1 when the input vector $\boldsymbol{z}$ is sampled from $p(\boldsymbol{z})$ and to output 0 when $\boldsymbol{z}$ is sampled from $q(\boldsymbol{z})$.
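To make the adversarial objective more concrete, the sketch below estimates the value function from discriminator outputs. It assumes the standard adversarial formulation of Ref. 7, in which the discriminator outputs a probability strictly between 0 and 1 that its input came from the standard Gaussian prior. The function names and the example outputs are hypothetical and are not taken from the authors' code.

```python
import math


def value_function(d_prior, d_encoded):
    """Monte Carlo estimate of V from discriminator outputs in (0, 1).

    d_prior:   outputs for vectors drawn from the standard Gaussian prior
    d_encoded: outputs for latent vectors produced by the encoder
    """
    prior_term = sum(math.log(d) for d in d_prior) / len(d_prior)
    encoded_term = sum(math.log(1.0 - d) for d in d_encoded) / len(d_encoded)
    return prior_term + encoded_term


def discriminator_loss(d_prior, d_encoded):
    """The discriminator maximizes V, so it minimizes -V."""
    return -value_function(d_prior, d_encoded)


def encoder_loss(d_encoded):
    """The encoder minimizes the only part of V that depends on it."""
    return sum(math.log(1.0 - d) for d in d_encoded) / len(d_encoded)


# Hypothetical discriminator outputs, for illustration only.
confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])   # discriminator separates the two sources
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])   # discriminator cannot tell them apart
```

A discriminator that separates the two sources well gets a lower loss than one that outputs 0.5 everywhere, while the encoder's loss falls as the discriminator's outputs on encoded vectors move toward 1, that is, as the encoded vectors become indistinguishable from prior samples.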
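When the discriminator maximizes Eq. (5), it can thus determine whether the input $\boldsymbol{z}$ is sampled from the standard Gaussian distribution $p(\boldsymbol{z})$ or from the latent distribution $q(\boldsymbol{z})$. The loss function of the discriminator is obtained from Eq. (5) as Eq. (6):
$$L_{D} = -\mathbb{E}_{\boldsymbol{z} \sim p(\boldsymbol{z})}\left[\log D(\boldsymbol{z})\right] - \mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}(\boldsymbol{x})}\left[\log\left(1 - D(\mathrm{En}(\boldsymbol{x}))\right)\right] \tag{6}$$
In contrast, Eq. (5) is minimized when the En makes $q(\boldsymbol{z})$ approximate $p(\boldsymbol{z})$ sufficiently well, and the loss function of the En is obtained from Eq. (5) as Eq. (7):
$$L_{\mathrm{En}} = \mathbb{E}_{\boldsymbol{x} \sim p_{\mathrm{data}}(\boldsymbol{x})}\left[\log\left(1 - D(\mathrm{En}(\boldsymbol{x}))\right)\right] \tag{7}$$
To summarize, the AAE is trained to repeat the following procedure:
(i) The En and De are updated to minimize the reconstruction loss in Eq. (4).
(ii) The discriminator is updated to minimize the loss in Eq. (6).
(iii) The En is updated to minimize the loss in Eq. (7).
4. Experiments
We performed experiments with the proposed method using the AAE and x-ray CT images of solder joints on PCBs. The anomaly detection procedure of the proposed method is as follows:
(i) The AAE is trained with a large number of normal samples and a small number of anomalous samples so that the extracted features follow the standard Gaussian distribution.
(ii) The anomaly score of each solder joint is calculated by Hotelling’s T-squared method from the features extracted by the En.
(iii) Each solder joint is classified as normal or anomalous by thresholding the anomaly score.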
The AAE architecture used in the experiments is shown in Fig. 5. Each sample consisted of eight sliced images, one per layer, as shown in Fig. 1. We resized the sliced images and input them to the AAE network as eight channels. We compared our method with a method using handcrafted features and OCSVM. The purpose was to show that our method, which uses features extracted automatically by the AAE, is superior to classification by machine learning using features designed by human experts. The handcrafted features designed by human experts were four-dimensional: the substrate area, head-in-pillow area, circularity, and luminance ratio. The experimental results are shown in Table 1. In this table, the result for handcrafted features + OCSVM was obtained using all normal samples to train the OCSVM; results of training the OCSVM with fewer normal samples are shown in Table 2. These results show that the accuracy improved as the number of training samples increased; however, the OCSVM was inferior to the proposed method even when all normal samples were used for training.
For the AAE, we used a batch size of 64 and 100 epochs; for the OCSVM, we used a radial basis function kernel. Our code is available at https://github.com/rearwist3/aae_solder_tf. We chose 100 epochs empirically by observing the performance of the model every 20 epochs over 200 epochs. Figure 6(a) contains all of the results, and Fig. 6(b) omits the results at 20 epochs to show the details of the false positive rate (FPR) from 40 to 200 epochs. Because a low FPR was obtained at 100 epochs and 120 epochs with 10 anomalous training samples, we chose 100 epochs. The learning phase of 100 epochs was run on an RTX 2080 Ti GPU.
Table 1. Comparison of our AAE method (ours) with handcrafted features + OCSVM.
Table 2. Comparison of results of handcrafted features + OCSVM with different numbers of samples used for training the OCSVM. The rows denote the results for each number of training samples (small, medium, and large).
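The eight-channel input described in this section can be pictured with a short sketch that stacks the eight per-layer slices of one solder joint into a single multi-channel sample. The channels-last layout, the function name, and the tiny 2 x 3 slice size are assumptions chosen for illustration; the actual preprocessing in the authors' repository may differ.

```python
def stack_slices(slices):
    """Stack per-layer 2D images (lists of rows) into one height x width x channels sample."""
    if not slices:
        raise ValueError("at least one slice is required")
    height, width = len(slices[0]), len(slices[0][0])
    for image in slices:
        if len(image) != height or any(len(row) != width for row in image):
            raise ValueError("all slices must share the same height and width")
    # Pixel (r, c) of the sample holds the values of that pixel in every layer.
    return [[[image[r][c] for image in slices] for c in range(width)]
            for r in range(height)]


# Eight hypothetical 2 x 3 slices; every pixel of layer k has the value k.
layers = [[[k] * 3 for _ in range(2)] for k in range(8)]
sample = stack_slices(layers)
```

Each pixel of `sample` then carries eight values, one per layer, so a two-dimensional convolution over the sample sees all eight layers of the joint at once.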
We set the threshold so that the true positive rate (TPR) was 100% in both models, to avoid classifying anomalous samples as normal. The FPR of handcrafted features + OCSVM was 1.10% after training with 3,510,000 normal samples. In contrast, AAE + Hotelling’s T-squared method could be trained with only 40,000 normal and 10 anomalous samples, and it classified normal and anomalous samples with fewer false positives. To verify the results, we selected 10 anomalous training samples at random three times and trained the network with each dataset for 100 epochs. The mean and standard deviation of the resulting FPR were computed over these three runs. We also show the results of training the network with 0, 20, 50, and 100 anomalous samples, both to show that including anomalous samples in the training dataset improves anomaly detection performance and to find the optimal balance between normal and anomalous samples for training the network. The resulting FPR is 5.15%, 0.93%, 0.10%, and 1.25%, respectively. The results confirm that including anomalous samples in the training dataset is effective, and the case with 10 anomalous training samples had the best performance with the fewest anomalous training samples.
Moreover, we compared our method with classification by a binary classifier, which is a typical anomaly detection method using deep learning. With this experiment, we show the effectiveness of the proposed method when a sufficient number of anomalous samples for training the classifier cannot be guaranteed. The result when the binary classifier is trained with a large number of normal samples and a small number of anomalous samples is shown in Table 3. The classifier could not separate normal and anomalous samples when the numbers of normal and anomalous training samples were imbalanced. Table 3 also shows the result for the binary classifier when the numbers of normal and anomalous samples are balanced. In this experiment, we reduced the number of normal samples to match the number of anomalous samples and then trained the classifier. Neither result was as good as that of the proposed method, and we thus conclude that the proposed method is effective when the number of anomalous samples is small.
Table 3. Results when the samples were classified by the binary classifier. The first row denotes the results of training the classifier with the imbalanced dataset (without undersampling). The second row denotes the results of undersampling the dataset (with undersampling).
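The evaluation criterion used in this section, the FPR at a threshold that yields a 100% TPR, can be illustrated with the following sketch. It reflects one plausible reading of the criterion, in which the threshold is the lowest anomaly score among the anomalous evaluation samples and a sample is flagged when its score is at or above the threshold. The function names and scores are made up for illustration and are not experimental data.

```python
def threshold_for_full_tpr(anomalous_scores):
    """Largest threshold that still flags every anomalous sample (score >= threshold)."""
    return min(anomalous_scores)


def false_positive_rate(normal_scores, threshold):
    """Fraction of normal samples whose score reaches the threshold."""
    flagged = sum(1 for score in normal_scores if score >= threshold)
    return flagged / len(normal_scores)


# Hypothetical anomaly scores, for illustration only.
normal_scores = [0.5, 1.2, 3.1, 0.9, 7.4]
anomalous_scores = [6.8, 12.5, 9.3]

threshold = threshold_for_full_tpr(anomalous_scores)   # 6.8
fpr = false_positive_rate(normal_scores, threshold)    # 1 of 5 normal samples is flagged
```

Fixing the TPR at 100% puts the entire cost of a weak score on the FPR, which is why a single FPR value per method is enough to compare the methods in the tables.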
5. Conclusion
In this paper, we propose a method for inspecting solder joints on PCBs by an anomaly detection method using an AAE. We captured sliced images of solder joints using x-ray CT, and the sliced image features following the standard Gaussian distribution were extracted by the AAE. Defects were detected by applying Hotelling’s T-squared method to these features. Experimental results showed that the AAE could classify normal and anomalous samples with few false positives even when the number of data samples was small. However, when compressing high-dimensional data to low-dimensional space, the number of low latent dimensions required for the full expression of high-dimensional data depends on the inputs, and we need to select the optimal number of latent dimensions for every dataset. Statistical implementation of methods to optimize the number of latent dimensions will be studied in future work.
References
1. A. Teramoto et al., “Development of high speed oblique x-ray CT system for printed circuit board,” SICE Trans. Ind. Appl. 6(9), 72–77 (2007).
2. Z. C. Feng et al., “Characterization of solder defects on package on packages with AXI systems for inspection quality improvement,” (2015).
3. B. Schölkopf et al., “Estimating the support of a high-dimensional distribution,” Neural Comput. 13(7), 1443–1471 (2001). https://doi.org/10.1162/089976601750264965 NEUCEB 0899-7667
4. D. Soukup and R. Huber-Mörk, “Convolutional neural networks for steel surface defect detection from photometric stereo images,” in Int. Symp. Vis. Comput., 668–677 (2014).
5. D. Weimer, B. Scholz-Reiter and M. Shpitalni, “Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection,” CIRP Ann. 65(1), 417–420 (2016). https://doi.org/10.1016/j.cirp.2016.04.072 CIRAAT 0007-8506
6. M. Sakurada and T. Yairi, “Anomaly detection using autoencoders with nonlinear dimensionality reduction,” in Proc. MLSDA 2014 2nd Workshop Mach. Learn. for Sens. Data Anal. (2014).
7. A. Makhzani et al., “Adversarial autoencoders,” in Proc. Int. Conf. Learn. Represent. Workshop (2016).
8. H. Hotelling, “The generalization of Student’s ratio,” Breakthroughs in Statistics, 54–65, Springer, New York (1992).
9. G. Leinbach and S. Oresjo, “The Why, Where, What, How, and When of Automated X-Ray Inspection,” Agilent Technologies, Loveland, Colorado (2001).
10. B.-J. Lin et al., “Use 3D convolutional neural network to inspect solder ball defects,” in Int. Conf. Neural Inf. Process., 263–274 (2018).
11. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science 313(5786), 504–507 (2006). https://doi.org/10.1126/science.1127647 SCIEAS 0036-8075
12. L. Beggel, M. Pfeiffer and B. Bischl, “Robust anomaly detection in images using adversarial autoencoders,” (2019).
13. H. Hotelling, “Analysis of a complex of statistical variables into principal components,” J. Educ. Psychol. 24(6), 417 (1933). https://doi.org/10.1037/h0071325 JLEPA5 1939-2176
Biography
Keisuke Goto is a graduate student at Gifu University of Natural Science and Technology. He received his BS degree from Gifu University of Faculty of Engineering in 2018. His research interests are visual inspection and deep learning.
Kunihito Kato is an associate professor at Gifu University of Faculty of Engineering. He received his BS and MS degrees and his PhD from Chukyo University of Faculty of Information Science in 1993, 1995, and 1996, respectively. He has been a faculty member at University of Maryland Institute for Advanced Computer Studies. His research interests include image processing, pattern analysis, and computer vision.