1. Introduction

Image classification is crucial in the interpretation of remote sensing images with high spatial resolution (HSR).1 The availability of HSR remote sensing imagery obtained from satellites (e.g., WorldView-2, IKONOS, QuickBird, ZY-3C, GF-1, and GF-2) increases the possibility of accurate Earth observations. Such HSR imagery provides highly valuable geometric and detailed information, which is important for various applications, such as precision agriculture, security applications, and damage assessment for environmental disasters and land use.2 In these applications, mapping high-resolution images for land use and land cover (LULC) is particularly relevant. For LULC classification using remote sensing images, medium-resolution Landsat series satellite imagery is important in regional LULC and land use/cover change studies.3–7 For processing high-resolution remote sensing images, numerous classification algorithms are being developed, such as the object-oriented approach8–10 and methods based on the support vector machine (SVM)11–13 and Markov random fields (MRF).14–18

Local features19–23 have been successfully applied to image retrieval, semantic segmentation, and scene understanding. These features gained popularity in the remote sensing community because of their robustness to rotation, scale changes, and occlusion. Sparse coding is one of the most effective approaches to grouping local features and performs well in object categorization, scene-level land use classification, and related tasks.24–36 The sparse coding method, combined with max-pooling and spatial pyramid matching (SPM), can be used to learn midlevel features. In this approach, a class type is represented by the distribution of a set of visual words, which are usually obtained by unsupervised k-means clustering of a set of low-level feature descriptors. However, because the visual words are learned in an unsupervised manner, the resulting midlevel features are less discriminative.
This characteristic reduces the classification accuracy. Moreover, several conventional low-level features, such as spectral features, are neglected when building midlevel features. Some studies have addressed this drawback and effectively incorporated spectral and local features.33,34 Hu et al.37 developed a method that combines convolutional neural networks (CNN) and sparse coding to learn discriminative features for scene-level land use classification, and impressive results were obtained, with the total accuracy reaching about 96%. However, this method is limited by the lack of LULC class-type information, because the parameters of the CNN model are estimated on the ImageNet dataset.38

In addition to feature learning, the selection of a classifier is particularly important for LULC classification based on high-resolution remote sensing images. Many classification methods, such as maximum likelihood, MRF, and SVM models, have been developed. The SVM classifier is widely used for various computer vision tasks and LULC classification, because this model has shown advantages in high-dimensional feature spaces. MRF39 and conditional random field (CRF)40 models are structured output models that consider the interactions of random variables. These approaches have been successfully developed in the remote sensing14–17,41 and computer vision communities.42–49 Moser et al.14 proposed an LULC classification for high-resolution remote sensing images based on the MRF model. However, the results of this model always exhibit an oversmoothed appearance.9,48 Another drawback of the MRF is its difficulty in handling high-dimensional feature spaces. The CRF model overcomes these drawbacks and shows advantages in image classification and semantic segmentation.
Thus, we establish an LULC classification framework for HSR remote sensing images that exploits labeled data: midlevel feature learning and the SVM classifier yield multifeature soft-probability feature descriptors, and a CRF classification method jointly models the unary and pairwise costs. In this paper, a multifeature soft-probability cascading and CRF (MFSC-CRF) classification model is designed to learn discriminative midlevel features in a supervised manner. First, we extract the spectral, gray-level co-occurrence matrix (GLCM), and dense scale-invariant feature transform (DSIFT) features as low-level feature descriptors. Three types of midlevel feature descriptors are obtained by adopting sparse coding, superpixel segmentation, and max-pooling methods. Then, the probability that each labeled sample belongs to each LULC class is calculated, and the three probability values are cascaded to construct the feature descriptor for each superpixel. Finally, the CRF model is introduced to generate the LULC classification. The supervised learned feature descriptors are obtained using the SVM classifier with training samples; this classifier has been demonstrated to effectively incorporate low-level features. Using the CRF classifier, the local spatial relationship between neighboring superpixels is considered in combination with the learned feature descriptor. Thus, the proposed method achieves better classification results than traditional methods.

The rest of this paper is structured as follows. In Sec. 2, the proposed method for midlevel feature learning, soft-probability cascading, and CRF classification is presented. In Sec. 3, the experiments on the rural residential area dataset of Wuhan are discussed. Conclusions are drawn in Sec. 4.

2. MFSC-CRF Classification Framework

An HSR remote sensing image classification framework for LULC classification is proposed.
This method is based on midlevel feature learning, integrating sparse coding and the CRF method to utilize spectral, structural, and spatial contextual information. Three kinds of typical features, namely, GLCM, DSIFT, and spectral features, are selected to construct the low-level features. The whole pipeline of the MFSC-CRF classification framework consists of two main steps, namely, feature learning and CRF classification (Fig. 1). During the feature learning step, midlevel feature descriptors are obtained from the three features by combining sparse coding, SPM, and max-pooling methods. The class probabilities are calculated by the SVM classifier using the training samples, and the resulting probability values form the new discriminative feature descriptors. During CRF classification, the CRF model is introduced to classify the superpixels according to the land cover class types. The probability feature descriptor from the first step is used in this step, and an SVM classifier is adopted to construct the unary potentials. The pairwise potentials are acquired by calculating the distance between neighboring superpixels. The graph-cut-based α-expansion algorithm is executed to obtain the classification result of the CRF model.

2.1. Midlevel Feature Descriptors

As discussed above, three typical features are adopted for the low-level feature descriptors, and the details are described as follows.
The low-level feature descriptors are extracted from the images. The visual dictionary of K visual words obtained by the unsupervised k-means clustering algorithm can be defined as B = {d_1, d_2, ..., d_K}, where each visual word d_k is represented as a linear classifier with weight w_k and bias b_k, whose score for a descriptor x is calculated as f_k(x) = w_k^T x + b_k. Following Ref. 23, an encoding scheme based on the classification score obtained for each dictionary word is used instead of sparse coding to encode each descriptor. If x is a descriptor vector, its coding vector corresponding to dictionary B is given as u(x) = [f_1(x), f_2(x), ..., f_K(x)]. Intuitively, the descriptor should be similar to only a few words in the dictionary if the visual words of dictionary B are sufficiently discriminative. Therefore, the vector u(x) is expected to have only a few values that are greater than zero.

Given a dictionary B and a set of segmented superpixel regions over an image I, we represent the image by spatial max-pooling. For each superpixel region R_s of image I, where s = 1, ..., S and S represents the number of superpixels extracted from the image, let x_i be a descriptor vector extracted from region R_s, where i indexes the image pixels in region R_s. Thus, given the dictionary B, region R_s can be encoded using max spatial pooling as z_s[k] = max_{i in R_s} f_k(x_i), k = 1, ..., K, where z_s represents the midlevel feature descriptor of superpixel R_s, and the set {z_1, ..., z_S} represents the midlevel feature descriptor of image I. If the codes of the pixels in a segmentation region are similar to some of the visual words, these features can be used to represent the characteristics of the region, with the similarity measured over the whole region.

2.2. Probability Feature Descriptors

Let z be the midlevel feature vector of an image region. This feature is a vector in a K-dimensional space, with K the size of the dictionary B. If three different types of features (DSIFT, spectral band, and GLCM) are used in the sparse coding phase, then an image can be represented by three different corresponding vectors.
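The classifier-score encoding and superpixel max-pooling described above can be sketched as follows. All names are illustrative, and the rectification max(0, ·) is an assumption consistent with the text's remark that only a few code values exceed zero, not necessarily the paper's exact formulation:

```python
import numpy as np

# Sketch of the midlevel encoding step. Each visual word d_k acts as a
# linear classifier (w_k, b_k); a descriptor's code is its vector of
# classifier scores, and a superpixel is represented by the elementwise
# max of the codes of its pixels. The rectification is an assumption.

def encode(X, W, b):
    """Code each descriptor (rows of X) against K dictionary classifiers."""
    return np.maximum(X @ W.T + b, 0.0)      # (n_pixels, K)

def pool_superpixel(X, W, b):
    """Midlevel feature of one superpixel: max-pool the pixel codes."""
    return encode(X, W, b).max(axis=0)       # (K,)

rng = np.random.default_rng(0)
d, K = 64, 16                                # descriptor dim, dictionary size
W, b = rng.standard_normal((K, d)), rng.standard_normal(K)
region = rng.standard_normal((50, d))        # descriptors of 50 pixels
z = pool_superpixel(region, W, b)
print(z.shape)                               # (16,)
```

Because the pooled code takes the strongest response per visual word over the region, it is insensitive to how many pixels the region contains, which is what allows superpixels of different sizes to be compared.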
That is, each superpixel s of the image can be represented by the vectors z_s^DSIFT, z_s^SPEC, and z_s^GLCM, where B_1, B_2, and B_3 are the dictionaries extracted from the DSIFT, spectral, and GLCM features, respectively, and K_1, K_2, and K_3 are the corresponding dictionary sizes. These midlevel features, combined with the training samples, are used to estimate the SVM classifier parameters and to calculate the probability of each vector belonging to each LULC class. The probability vectors of the different midlevel feature descriptors can be represented as p_s^f = [P(c_1 | z_s^f), ..., P(c_L | z_s^f)] for f in {DSIFT, SPEC, GLCM}, where L represents the number of land cover classes. The MFSC feature descriptor for the final classification is obtained by cascading the three probability vectors, v_s = [p_s^DSIFT, p_s^SPEC, p_s^GLCM], whose size is thrice the number of LULC classes. The size of the MFSC feature descriptors is thus much smaller than the size of the midlevel feature descriptors as in Eq. (5).

2.3. CRF Classification Model

The CRF model for the final classification of high-resolution remote sensing images is proposed. The CRF is defined over the set of superpixels V extracted from the image I. Each superpixel i in V is associated with a class label y_i. The labeling of the image is denoted by the vector y = (y_1, ..., y_S). The interaction among the superpixels of the CRF is captured by the set of edges E, where each edge corresponds to a pair of superpixels that share a boundary. The CRF energy, which consists of unary and pairwise costs, can be formulated as E(y) = λ_u Σ_{i in V} ψ_i(y_i) + λ_p Σ_{(i,j) in E} ψ_ij(y_i, y_j), where λ_u and λ_p are the relative weights of the unary and pairwise potentials, respectively.

The unary potential ψ_i(y_i), expressed as in Eq. (8), models the cost of assigning a class label y_i to superpixel i in image I. This potential is defined from the score of a kernel SVM classifier for class y_i applied to the MFSC feature vector v_i of superpixel i described in Eq. (7). The classifier for each class is trained using the MFSC feature vectors extracted from the superpixels in the training set that carry that class label.
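The soft-probability cascading step can be illustrated with a small synthetic sketch. One SVM is trained per feature type and its per-class probabilities are concatenated; the use of scikit-learn's `SVC(probability=True)` (Platt scaling) and all data below are assumptions for illustration only:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical sketch: one SVM per midlevel feature type produces a
# vector of LULC class probabilities per superpixel; cascading the three
# vectors yields an MFSC descriptor of length 3 * n_classes.

rng = np.random.default_rng(1)
n_train, n_test, n_classes = 180, 10, 6
y_train = np.repeat(np.arange(n_classes), n_train // n_classes)

def make_features(dim):
    """Synthetic midlevel features, weakly correlated with the labels."""
    X = rng.standard_normal((n_train + n_test, dim))
    X[:n_train, 0] += y_train            # inject some class signal
    return X[:n_train], X[n_train:]

mfsc_parts = []
for dim in (20, 12, 16):                 # stand-ins for DSIFT/spectral/GLCM
    Xtr, Xte = make_features(dim)
    clf = SVC(kernel="rbf", probability=True, random_state=0).fit(Xtr, y_train)
    mfsc_parts.append(clf.predict_proba(Xte))   # (n_test, n_classes)

mfsc = np.hstack(mfsc_parts)             # cascaded MFSC descriptor
print(mfsc.shape)                        # (10, 18)
```

Each of the three probability blocks sums to one per superpixel, so the cascaded descriptor is low-dimensional (3L values) yet carries the evidence of all three feature types.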
The radial basis function (RBF) kernel is adopted for SVM classification. The pairwise potential ψ_ij(y_i, y_j) models the cost of assigning labels y_i and y_j to the neighboring superpixels i and j, respectively. When a CRF formulation is used for classification, the pairwise potentials are usually used to ensure the smoothness of the label assignments. A contrast-sensitive cost is used: ψ_ij(y_i, y_j) = L_ij exp[-(g_i - g_j)^2 / (2σ^2)] if y_i ≠ y_j, and 0 otherwise, where L_ij is the length of the shared boundary between superpixels i and j, g_i and g_j are the gray mean values of superpixels i and j, respectively, and σ is a bandwidth parameter. The parameters λ_u and λ_p in Eq. (8) are estimated by the cutting plane method, the details of which are described in Ref. 49. The classification result of the CRF model is achieved by minimizing Eq. (8).

3. Experimental Results

We conduct experiments using high-resolution aerial images to evaluate the effectiveness of the proposed MFSC-CRF framework for LULC classification. Building on the work of Jain et al.,49 comparative experiments are conducted by combining feature descriptors and classification methods. We compare the different methods using single-object class accuracy and total accuracy. The low-level feature, midlevel feature, and classifier associated with SF-SVM, U-SVM, GLCM-SVM, MFSC-SVM, SF-CRF, U-CRF, GLCM-CRF, and MFSC-CRF are reported in Table 1. The details are described as follows.
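A minimal numeric sketch of this energy follows; the symbols λ_u, λ_p, σ, the edge representation, and the toy values are illustrative assumptions (actual minimization would use graph-cut α-expansion rather than the direct evaluation shown here):

```python
import numpy as np

# Toy evaluation of the CRF energy E(y) = λ_u Σ ψ_i(y_i) + λ_p Σ ψ_ij(y_i, y_j)
# with the contrast-sensitive pairwise cost L_ij * exp(-(g_i - g_j)^2 / (2σ^2))
# charged only when neighboring superpixels receive different labels.

def pairwise_cost(yi, yj, blen, gi, gj, sigma=10.0):
    if yi == yj:
        return 0.0
    return blen * np.exp(-((gi - gj) ** 2) / (2.0 * sigma ** 2))

def crf_energy(labels, unary, edges, lam_u=1.0, lam_p=0.5):
    """unary: (n_superpixels, n_classes) cost table; edges: tuples of
    (i, j, boundary length, gray mean i, gray mean j) per adjacent pair."""
    e = lam_u * sum(unary[i, yi] for i, yi in enumerate(labels))
    e += lam_p * sum(pairwise_cost(labels[i], labels[j], L, gi, gj)
                     for i, j, L, gi, gj in edges)
    return e

unary = np.array([[0.2, 0.9], [0.8, 0.3]])      # 2 superpixels, 2 classes
edges = [(0, 1, 2.0, 120.0, 121.0)]             # similar gray means
smooth = crf_energy([0, 0], unary, edges)       # unary only: 0.2 + 0.8 = 1.0
split = crf_energy([0, 1], unary, edges)        # 0.2 + 0.3 + pairwise penalty
print(smooth, round(split, 3))
```

The example shows the intended behavior: assigning different labels across a boundary between superpixels with similar gray means incurs nearly the full pairwise penalty, pushing the minimizer toward smooth labelings except at genuine contrast edges.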
Table 1. Information of the different classification methods.
The experimental results are evaluated using three kinds of accuracy measures, namely, the accuracy of each class, the overall accuracy (OA), and the kappa coefficient (Kappa). OA is the fraction of correctly classified pixels over all labeled pixels, whereas the per-class accuracy is computed over the pixels of each ground-truth class. For a fair comparison, the classification results with the highest OA are selected for all classification algorithms. The effect of the number of training samples on the MFSC-CRF model is further investigated.

3.1. Experimental Data Description

3.1.1. Experimental datasets (testing site 1)

The first test image was captured over a rural residential area in Wuhan city, Hubei Province, China, through unmanned aerial vehicle photography and includes red, green, and blue spectral bands. The image has a spatial resolution of 0.2 m and three multispectral channels. An overview of this dataset is shown in Fig. 2(a), and the corresponding ground truth is shown in Fig. 2(b). The testing image was segmented into 52,654 superpixels using the simple linear iterative clustering method. Six classes of interest, namely, low vegetation, homestead, farmland, waterbody, road, and woodland, are considered and listed in Table 2. Rural homestead is the main type of rural residential land and is rather scattered; this class contains various houses, walls, and other facilities with spatial correlation and semantic structure characteristics. The other five class types are mainly land cover types. A total of 100 training samples for each LULC class type is drawn from the reference ground-truth data, and the remaining samples are used to evaluate the accuracy. The results are shown in Table 2.

Table 2. Class information of the Wuhan rural residential area dataset of testing site 1.
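The two aggregate measures above can be computed directly from a confusion matrix C, where C[i, j] counts pixels of ground-truth class i assigned to class j; the small matrix below is a made-up example, not experimental data:

```python
import numpy as np

# Overall accuracy and Cohen's kappa from a confusion matrix. OA is the
# observed agreement; kappa discounts the agreement expected by chance
# from the row (ground truth) and column (prediction) marginals.

def overall_accuracy(C):
    return np.trace(C) / C.sum()

def kappa(C):
    n = C.sum()
    po = np.trace(C) / n                                 # observed agreement (OA)
    pe = (C.sum(axis=0) * C.sum(axis=1)).sum() / n**2    # chance agreement
    return (po - pe) / (1 - pe)

C = np.array([[50, 2, 3],
              [4, 40, 1],
              [2, 3, 45]])
print(round(overall_accuracy(C), 3))   # 0.9
print(round(kappa(C), 3))              # 0.849
```

Kappa is lower than OA here because part of the 90% agreement would arise by chance given the class proportions, which is why both measures are reported in the tables.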
3.1.2. Experimental datasets (testing site 2)

This testing image was also captured over a rural residential area in Wuhan city, Hubei Province, China. The image has a spatial resolution of 0.2 m and three multispectral channels. Compared with testing site 1, testing site 2 is larger and has a more complex scene: more trees surround the homesteads in this rural residential area, and the shadow effect is more obvious. This image therefore presents a challenging LULC classification task. The ground-truth image corresponding to the high-resolution image (HRI) was classified manually into the six most common LULC classes, and the classification data (label images) are shown in Fig. 3(b). The testing image was segmented into 92,441 superpixels. Similar to testing site 1, six classes of interest are considered and described in Table 3, which also shows the number of training and testing samples for each class. The training samples are randomly chosen from the reference ground-truth data and are shown in Table 3. The dictionary size is set to 500, and 20,000 pixels are randomly selected for training the dictionary via the k-means clustering method. A total of 500 training samples per LULC class is randomly selected for estimating the classifier parameters (Table 3).

Table 3. Class information of the Wuhan rural residential area dataset of testing site 2.
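The dictionary-training setup described above (sample pixels at random, cluster them with k-means, keep the cluster centers as visual words) can be sketched with scikit-learn's KMeans; sizes are scaled down here from the paper's 20,000 sampled pixels and dictionary size of 500, and the random features are stand-ins:

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of visual-dictionary learning: randomly sample pixel descriptors,
# cluster them with k-means, and take the cluster centers as visual words.
# Sizes are reduced from the paper's 20,000 samples / 500 words for speed.

rng = np.random.default_rng(2)
all_descriptors = rng.standard_normal((10000, 16))   # stand-in pixel features
sample = all_descriptors[rng.choice(10000, size=2000, replace=False)]

dict_size = 8
km = KMeans(n_clusters=dict_size, n_init=5, random_state=0).fit(sample)
dictionary = km.cluster_centers_                     # (dict_size, 16) words
print(dictionary.shape)                              # (8, 16)
```

Subsampling before clustering is what keeps dictionary training tractable: k-means cost grows with the number of points, while a few tens of thousands of pixels already cover the feature-space variability of the scene.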
3.2. Experimental Results and Analysis for Testing Site 1

The experimental results for testing site 1 are reported to validate the effectiveness of the proposed MFSC-CRF for LULC classification. The classification accuracies of the various midlevel feature learning methods, namely, SF-SVM, GLCM-SVM, U-SVM, MFSC-SVM, GLCM-CRF, and U-CRF, which are different combinations of low-level feature descriptors and classifiers, are compared. The SVM classifier with an RBF kernel has proven successful in the supervised classification of high-dimensional HRI data. Among the SVM-based methods, MFSC-SVM achieves better classification results than the other three methods [Figs. 4(c)–4(f)]. However, the SVM algorithms do not consider any neighborhood spatial contextual information and therefore produce isolated salt-and-pepper classification noise. For the proposed MFSC-CRF algorithm, which combines different effective features, the oversmoothing is less serious, as shown in the red boxes of Figs. 4(e) and 4(h), and the boundaries of homestead are better preserved. By contrast, SF-SVM focuses mainly on the spectral information and thus depends much less on the structural information, which probably explains the misclassification of U-CRF. The quantitative performances with the highest classification accuracies obtained by SF-SVM, U-SVM, GLCM-SVM, MFSC-SVM, SF-CRF, U-CRF, GLCM-CRF, and MFSC-CRF are reported in Table 4; the best result of each column is in bold. The results show that the algorithms in which spatial contextual information is considered significantly outperform SVM classification in classification accuracy.
Moreover, the accuracy of MFSC-CRF is higher than that of the three other CRF-based classification methods (i.e., SF-CRF, U-CRF, and GLCM-CRF), indicating that MFSC-CRF can adaptively incorporate different low-level feature descriptors. With GLCM as the low-level feature descriptor, the GLCM-CRF method achieves much higher accuracy than SF-SVM, SF-CRF, U-SVM, and U-CRF. This result shows that GLCM can be very effective for LULC classification. On the testing site 1 dataset of the Wuhan rural residential area (Table 4), the reported quantitative performance of MFSC-CRF exhibits a clear improvement in OA. Additionally, the roughly 21% higher accuracy of MFSC-CRF compared with U-SVM (from 64.9% to 86.3%) shows that MFSC-CRF makes better use of spatial contextual information. Thus, spatial contextual information and other effective feature descriptors should be considered. Overall, MFSC-CRF obtains the highest accuracy.

Table 4. Classification accuracy for the Wuhan rural residential area using the dataset of testing site 1 with different classifiers.
Figure 5 shows the confusion matrices of the different classification methods with various feature descriptors and classifiers. The methods that use only spectral features as low-level feature descriptors (SF-SVM and SF-CRF) misclassify 14% of homestead as road. The reason is that the two LULC types have similar spectral characteristics, as both belong to impervious surfaces. This confusion is less serious for the GLCM-based (GLCM-SVM and GLCM-CRF) and MFSC-based methods (MFSC-SVM and MFSC-CRF). The MFSC-CRF method, which incorporates different low-level feature descriptors, achieves 89% accuracy for homestead.

3.3. Experimental Results and Analysis for Testing Site 2

The resulting classification maps for this testing image are shown in Figs. 6(a)–6(h). The quantitative classification results of the different classification methods are shown in Table 5 (the best result of each column is in bold) and Figs. 7(a)–7(h). The proposed MFSC-CRF method achieves a higher OA and Kappa than SF-SVM, U-SVM, GLCM-SVM, MFSC-SVM, SF-CRF, U-CRF, and GLCM-CRF. Compared with SF-SVM and U-SVM, the MFSC-SVM method achieves remarkably enhanced OA and homestead accuracy. Compared with GLCM-SVM, the MFSC-SVM method improves the classification accuracy of each LULC class. By considering neighborhood spatial contextual information, MFSC-CRF shows a 0.1% accuracy improvement (from 87.4% to 87.5%) over the MFSC-SVM method.

Table 5. Classification accuracy for the Wuhan rural residential area dataset of testing site 2 with different classifiers.
3.4. Parameter Sensitivity Analysis

The performance of the proposed MFSC-CRF method is further evaluated using different numbers of training samples. Testing image 1 is selected for the parameter sensitivity analysis, and the effect of the number of training samples on the MFSC-CRF algorithm is examined. Sample sizes ranging from 100 to 1000 per LULC class are tested at intervals of 100. The training samples are randomly selected from the overall ground truth, and the remaining samples are used to evaluate the classification accuracies. As shown in Fig. 8, the classification accuracy of MFSC-CRF initially increases with the number of training samples per class (from 85.6% to 93.2%) and remains slightly higher than that of the GLCM-CRF (from 84.0% to 92.0%) and MFSC-SVM (from 85.0% to 92.8%) approaches on the Wuhan rural residential area dataset of testing site 1. The accuracy then levels off at about 900 training samples per class and slightly decreases thereafter. Moreover, the classification accuracy of the proposed method remains higher than that of the other seven methods at every training sample size. The experiments show that the classification accuracies of the methods incorporating spatial contextual information (i.e., SF-CRF, U-CRF, GLCM-CRF, and the proposed MFSC-CRF) are all better than those of the SVM-based classification methods. Moreover, the MFSC-CRF method is more robust than the other classification methods across different numbers of training samples.

4. Conclusion

A classification method for HSR remote sensing images based on MFSC and CRF models is proposed. The proposed MFSC-CRF method can effectively incorporate spectral, structural, and textural features, as well as spatial contextual information.
Midlevel feature learning based on sparse coding is very important in image classification, and the proposed feature combination method can significantly improve the classification accuracy by effectively combining three complementary features, namely, DSIFT, spectral bands, and GLCM. Experiments on the Wuhan residential area datasets also show that the GLCM features can achieve more promising results than the original spectral features. The proposed model is open-ended, making it convenient to cascade additional features to further improve the accuracy of image classification. Recently, convolutional neural networks have been widely used in image classification and have achieved good results. However, CNN models require a large number of training samples to estimate their parameters. Therefore, our next step is to use a small amount of training samples to fine-tune a convolutional neural network model so that it can be effectively applied to remote sensing image classification.

References

1. D. Li, L. Zhang and G. S. Xia,
“Automatic analysis and mining of remote sensing big data,” Acta Geod. Cartogr. Sin., 43, 1211–1216 (2014).
2. G. S. Xia et al., “Structural high-resolution satellite image indexing,” in ISPRS TC VII Symp. 100 Years ISPRS, 298–303 (2010).
3. P. Gong et al., “Finer resolution observation and monitoring of global land cover: first mapping results with Landsat TM and ETM+ data,” Int. J. Remote Sens., 34(7), 2607–2654 (2013). http://dx.doi.org/10.1080/01431161.2012.748992
4. Z. Zhu et al., “Assessment of spectral, polarimetric, temporal, and spatial dimensions for urban and peri-urban land cover classification using Landsat and SAR data,” Remote Sens. Environ., 117, 72–82 (2012). http://dx.doi.org/10.1016/j.rse.2011.07.020
5. A. Paul, M. A. Peter and J. C. Paul, “Fine spatial resolution simulated satellite sensor imagery for land cover mapping in the United Kingdom,” Remote Sens. Environ., 68, 206–216 (1999). http://dx.doi.org/10.1016/S0034-4257(98)00112-6
6. O. Debeir et al., “Textural and contextual land-cover classification using single and multiple classifier system,” Photogramm. Eng. Remote Sens., 68, 597–605 (2002).
7. K. Jia et al., “Land cover classification of finer resolution remote sensing data integrating temporal features from time series coarser resolution data,” ISPRS J. Photogramm. Remote Sens., 93, 49–55 (2014). http://dx.doi.org/10.1016/j.isprsjprs.2014.04.004
8. T. Blaschke et al., “Geographic object-based image analysis towards a new paradigm,” ISPRS J. Photogramm. Remote Sens., 87, 180–191 (2014). http://dx.doi.org/10.1016/j.isprsjprs.2013.09.014
9. Y. Zhong, J. Zhao and L. Zhang, “A hybrid object-oriented conditional random field classification framework for high spatial resolution remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., 52(11), 7023–7037 (2014). http://dx.doi.org/10.1109/TGRS.2014.2306692
10. K. Biro et al., “Exploitation of TerraSAR-X data for land use/land cover analysis using object-oriented classification approach in the African Sahel Area, Sudan,” J. Indian Soc. Remote Sens., 41(3), 539–553 (2013). http://dx.doi.org/10.1007/s12524-012-0230-7
11. M. Ustuner, F. Balik Sanli and B. Dixon, “Application of support vector machines for land use classification using high-resolution rapid eye images: a sensitivity analysis,” Eur. J. Remote Sens., 48, 403–422 (2015). http://dx.doi.org/10.5721/EuJRS20154823
12. S. D. Jawak et al., “Advancement in land cover classification using very high resolution remotely sensed 8-band WorldView-2 satellite data,” Int. J. Earth Sci. Eng., 6(2), 1742–1749 (2013).
13. X. Huang and L. Zhang, “An SVM ensemble approach combining spectral, structural, and semantic features for the classification of high-resolution remotely sensed imagery,” IEEE Trans. Geosci. Remote Sens., 51(1), 257–272 (2013). http://dx.doi.org/10.1109/TGRS.2012.2202912
14. G. Moser, S. B. Serpico and J. A. Benediktsson, “Land-cover mapping by Markov modeling of spatial–contextual information in very-high-resolution remote sensing images,” Proc. IEEE, 101(3), 631–651 (2013). http://dx.doi.org/10.1109/JPROC.2012.2211551
15. L. Wang and Q. Wang, “Subpixel mapping using Markov random field with multiple spectral constraints from subpixel shifted remote sensing images,” IEEE Geosci. Remote Sens. Lett., 10(3), 598–602 (2013). http://dx.doi.org/10.1109/LGRS.2012.2215573
16. A. Voisin et al., “Classification of very high resolution SAR images of urban areas using copulas and texture in a hierarchical Markov random field model,” IEEE Geosci. Remote Sens. Lett., 10(1), 96–100 (2013). http://dx.doi.org/10.1109/LGRS.2012.2193869
17. W. Yang et al., “SAR-based terrain classification using weakly supervised hierarchical Markov aspect models,” IEEE Trans. Image Process., 21(9), 4232–4243 (2012). http://dx.doi.org/10.1109/TIP.2012.2199127
18. X. L. Li et al., “A survey on scene image classification,” Sci. Sin. Inf., 45, 827–848 (2015). http://dx.doi.org/10.1360/N112014-00286
19. H. Bay et al., “Speeded-up robust features (SURF),” Comput. Vis. Image Understanding, 110(3), 346–359 (2008). http://dx.doi.org/10.1016/j.cviu.2007.09.014
20. G. S. Xia, J. Delon and Y. Gousseau, “Shape-based invariant texture indexing,” Int. J. Comput. Vision, 88(3), 382–403 (2010). http://dx.doi.org/10.1007/s11263-009-0312-3
21. G. S. Xia, J. Delon and Y. Gousseau, “Accurate junction detection and characterization in natural images,” Int. J. Comput. Vision, 106(1), 31–56 (2014). http://dx.doi.org/10.1007/s11263-013-0640-1
22. D. G. Lowe, “Distinctive image features from scale-invariant key points,” Int. J. Comput. Vision, 60, 91–110 (2004). http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94
23. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 886–893 (2005). http://dx.doi.org/10.1109/CVPR.2005.177
24. H. Goncalves, L. Corte-Real and J. Goncalves, “Automatic image registration through image segmentation and SIFT,” IEEE Trans. Geosci. Remote Sens., 49(7), 2589–2600 (2011). http://dx.doi.org/10.1109/TGRS.2011.2109389
25. A. M. Cheriyadat, “Unsupervised feature learning for aerial scene classification,” IEEE Trans. Geosci. Remote Sens., 52(1), 439–451 (2014). http://dx.doi.org/10.1109/TGRS.2013.2241444
26. F. F. Li and P. Perona, “Bayesian hierarchy model for learning natural scene categories,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 524–531 (2005). http://dx.doi.org/10.1109/CVPR.2005.16
27. Y. Boureau et al., “Ask the locals: multi-way local pooling for image recognition,” in IEEE Int. Conf. on Computer Vision (ICCV), 2651–2658 (2011). http://dx.doi.org/10.1109/ICCV.2011.6126555
28. Y. Cao et al., “Spatial-bag-of-features,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 3352–3359 (2011). http://dx.doi.org/10.1109/CVPR.2010.5540021
29. Y. Huang et al., “Salient coding for image classification,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1753–1760 (2011). http://dx.doi.org/10.1109/CVPR.2011.5995682
30. Z. L. Jiang, Z. Lin and L. S. Davis, “Label consistent K-SVD: learning a discriminative dictionary for recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 35(11), 2651–2664 (2013). http://dx.doi.org/10.1109/TPAMI.2013.88
31. W. Yang, X. Yin and G. S. Xia, “Learning high-level features for satellite image classification with limited labelled samples,” IEEE Trans. Geosci. Remote Sens., 53(8), 4472–4482 (2015). http://dx.doi.org/10.1109/TGRS.2015.2400449
32. H. Lobel, R. Vidal and A. Soto, “Learning shared, discriminative, and compact representations for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 37(11), 2218–2231 (2015). http://dx.doi.org/10.1109/TPAMI.2015.2408349
33. G. F. Sheng et al., “High-resolution satellite scene classification using a sparse coding based multiple feature combination,” Int. J. Remote Sens., 33(8), 2395–2412 (2012). http://dx.doi.org/10.1080/01431161.2011.608740
34. K. Qi et al., “Land-use scene classification in high-resolution remote sensing images using improved correlatons,” IEEE Geosci. Remote Sens. Lett., 12(12), 2403–2407 (2015). http://dx.doi.org/10.1109/LGRS.2015.2478966
35. S. S. Chen and Y. L. Tian, “Pyramid of spatial relations for scene-level land use classification,” IEEE Trans. Geosci. Remote Sens., 53(4), 1947–1957 (2015). http://dx.doi.org/10.1109/TGRS.2014.2351395
36. F. Hu et al., “Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 8(5), 2015–2030 (2015). http://dx.doi.org/10.1109/JSTARS.2015.2444405
37. F. Hu et al., “Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery,” Remote Sens., 7(11), 14680–14707 (2015). http://dx.doi.org/10.3390/rs71114680
38. K. Nogueira, O. A. B. Penatti and J. A. D. Santos, “Towards better exploiting convolutional neural networks for remote sensing scene classification,” Pattern Recogn., 61, 539–556 (2017). http://dx.doi.org/10.1016/j.patcog.2016.07.001
39. J. Besag, “Spatial interaction and the statistical analysis of lattice systems,” J. R. Stat. Soc. Ser. B, 36(2), 192–236 (1974). http://dx.doi.org/10.2307/2984812
40. J. Lafferty, A. McCallum and F. Pereira, “Conditional random fields: probabilistic models for segmenting and labeling sequence data,” in Int. Conf. on Machine Learning (2001).
41. G. S. Xia, C. He and H. Sun, “Integration of synthetic aperture radar image segmentation method using Markov random field on region adjacency graph,” IET Radar Sonar Navig., 1(5), 348–353 (2007). http://dx.doi.org/10.1049/iet-rsn:20060128
42. B. Y. Liu and X. M. He, “Multiclass semantic video segmentation with object-level active inference,” in IEEE Conf. on Computer Vision and Pattern Recognition (2015). http://dx.doi.org/10.1109/CVPR.2015.7299057
43. X. M. He and G. Stephen, “An exemplar-based CRF for multi-instance object segmentation,” in IEEE Conf. on Computer Vision and Pattern Recognition (2014). http://dx.doi.org/10.1109/CVPR.2014.45
44. Y. S. Ming, H. D. Li and X. M. He, “Connected contours: a contour completion model that respects closure-effect,” in IEEE Conf. on Computer Vision and Pattern Recognition (2012). http://dx.doi.org/10.1109/CVPR.2012.6247755
45. X. M. He, R. Zemel and M. Carreira-Perpinan, “Multiscale conditional random fields for image labelling,” in IEEE Conf. on Computer Vision and Pattern Recognition (2004). http://dx.doi.org/10.1109/CVPR.2004.1315232
46. V. Michele and F. Vittorio, “Semantic segmentation of urban scenes by learning local class interactions,” in IEEE Conf. on Computer Vision and Pattern Recognition Workshops (2015). http://dx.doi.org/10.1109/CVPRW.2015.7301377
47. V. Michele and F. Vittorio, “Structured prediction for urban scene semantic segmentation with geographic context,” in Joint Urban Remote Sensing Event (2015).
48. P. Zhong and R. Wang, “Learning conditional random fields for classification of hyperspectral images,” IEEE Trans. Image Process., 19(7), 1890–1907 (2010). http://dx.doi.org/10.1109/TIP.2010.2045034
49. A. Jain et al., “Visual dictionary learning for joint object categorization and segmentation,” Lect. Notes Comput. Sci., 7576(5), 718–731 (2012). http://dx.doi.org/10.1007/978-3-642-33715-4
50. A. Bosch, A. Zisserman and X. Munoz, “Image classification using random forests and ferns,” in IEEE Int. Conf. on Computer Vision, 1–8 (2007). http://dx.doi.org/10.1109/ICCV.2007.4409066
51. Y. Chen, N. M. Nasrabadi and T. D. Tran, “Hyperspectral image classification using dictionary-based sparse representation,” IEEE Trans. Geosci. Remote Sens., 49(10), 3973–3985 (2011). http://dx.doi.org/10.1109/TGRS.2011.2129595
Biography

Bin Zhang received his BS, MS, and PhD degrees from the School of Electronic Information, Wuhan University, in 2007, 2009, and 2013, respectively. He is currently working at China University of Geosciences. His research interests include image classification, scene-level land use classification, and deep learning.