Land use and land cover classification for rural residential areas in China using soft-probability cascading of multifeatures
Bin Zhang, Yueyan Liu, Zuyu Zhang, Yonglin Shen
Abstract
A multifeature soft-probability cascading scheme is proposed to solve the problem of land use and land cover (LULC) classification of high-spatial-resolution images for mapping rural residential areas in China. The proposed method builds midlevel LULC features. Local features are frequently used as low-level descriptors in midlevel feature learning, whereas spectral and textural features, which are very effective low-level features, are often neglected. Moreover, the dictionary used in sparse coding is learned in an unsupervised manner, which reduces the discriminative power of the midlevel features. Thus, we propose to learn supervised features based on sparse coding, a support vector machine (SVM) classifier, and a conditional random field (CRF) model, so as to exploit several effective low-level features and improve the discriminability of the midlevel feature descriptors. First, three kinds of typical low-level features, namely, dense scale-invariant feature transform, gray-level co-occurrence matrix, and spectral features, are extracted separately. Second, combined with sparse coding and the SVM classifier, the probabilities of the different LULC classes are inferred to build supervised feature descriptors. Finally, the CRF model, which consists of a unary potential and a pairwise potential, is employed to construct the LULC classification map. Experimental results show that the proposed classification scheme achieves impressive performance, with a total accuracy of about 87%.

1.

Introduction

Image classification is crucial in the interpretation of remote sensing images with high spatial resolution (HSR).1 The availability of HSR remote sensing imagery obtained from satellites (e.g., WorldView-2, IKONOS, QuickBird, ZY-3C, GF-1, and GF-2) increases the possibility of accurate Earth observations. Such HSR imagery provides highly valuable geometric and detailed information, which is important for various applications, such as precision agriculture, security applications, and damage assessment for environmental disasters and land use.2 In these applications, mapping a high-resolution image for land use and land cover (LULC) is particularly relevant.

In terms of LULC classification using remote sensing images, Landsat series satellite imagery with medium resolution is important in regional LULC and land use/cover change studies.3–7 For high-resolution remote sensing images, numerous classification algorithms have been developed, such as object-oriented approaches,8–10 support vector machine (SVM) classification,11–13 and Markov random fields (MRF).14–18

Local features19–23 have been successfully applied to image retrieval, semantic segmentation, and scene understanding. These features have gained popularity in the remote sensing community because of their robustness to rotation, scale changes, and occlusion. Sparse coding is one of the most effective approaches to grouping local features and performs well in object categorization, scene-level land use classification, and related tasks.24–36 The sparse coding method combined with max-pooling and spatial pyramid matching (SPM) can be used to learn midlevel features. In this approach, a class type is represented by the distribution of a set of visual words, which are usually obtained by unsupervised K-means clustering of a set of low-level feature descriptors. However, because the visual words are learned in an unsupervised manner, the resulting midlevel features are less discriminative, which reduces classification accuracy. In addition, several conventional low-level features, such as spectral features, are neglected in the building of midlevel features. Some studies have addressed this drawback and effectively incorporated spectral and local features.33,34 Hu et al.37 developed a method that combines convolutional neural networks (CNN) and sparse coding to learn discriminative features for scene-level land use classification, obtaining impressive results with a total accuracy of about 96%. However, this method is limited by the lack of information on the LULC class types, because the parameters of the CNN model are estimated on the ImageNet dataset.38

In addition to feature learning, the selection of a classifier is particularly important for LULC classification based on high-resolution remote sensing images. Many classification methods, such as maximum likelihood, MRF, and SVM models, have been developed. The SVM classifier is widely used for various computer vision tasks and for LULC classification because of its advantages in high-dimensional feature spaces. MRF39 and conditional random field (CRF)40 models are structured output models that consider interactions among random variables. These approaches have been successfully developed in the remote sensing14–17,41 and computer vision communities.42–49 Moser et al.14 proposed an LULC classification for high-resolution remote sensing images based on the MRF model. However, the results of this model always exhibit an oversmoothed appearance.9,48 Another drawback of the MRF is its difficulty in handling high-dimensional feature spaces. The CRF model overcomes these drawbacks and shows advantages in image classification and semantic segmentation.

Thus, we establish an LULC classification framework for HSR remote sensing images that exploits labeled data through midlevel feature learning and the SVM classifier to obtain multifeature soft-probability feature descriptors, and we employ a CRF classification method to jointly model the unary and pairwise costs.

In this paper, a multifeature soft-probability cascading and CRF (MFSC-CRF) classification model is designed to learn discriminative midlevel features in a supervised manner. First, we extract the spectral, gray-level co-occurrence matrix (GLCM), and dense scale-invariant feature transform (DSIFT) features as low-level feature descriptors. Three types of midlevel feature descriptors are then obtained by adopting sparse coding, superpixel segmentation, and max-pooling. Next, the probability that each labeled sample belongs to each LULC class is calculated, and the three probability vectors are cascaded to construct the feature descriptor for each superpixel. Finally, the CRF model is introduced to generate the LULC classification.

The supervised feature descriptors are obtained using the SVM classifier with training samples. This classifier has been demonstrated to effectively incorporate low-level features. With the CRF classifier, the local spatial relationship between neighboring superpixels is considered in combination with the learned feature descriptors. Thus, the proposed method achieves better classification results than traditional methods.

The rest of this paper is structured as follows. In Sec. 2, the proposed method for midlevel feature learning and soft-probability cascading and CRF classification is presented. In Sec. 3, the experiments on the rural residential area dataset of Wuhan are discussed. Conclusions are drawn in Sec. 4.

2.

MFSC-CRF Classification Framework

An HSR remote sensing image classification framework for LULC classification is proposed. This method is based on midlevel feature learning and integrates sparse coding and the CRF method to utilize spectral, structural, and spatial contextual information. Three kinds of typical features, namely, GLCM, DSIFT, and spectral features, are selected to construct the low-level features. The whole pipeline of the MFSC-CRF classification framework consists of two main steps, namely, feature learning and CRF classification (Fig. 1).

Fig. 1

Flowchart of the MFSC-CRF classification framework.


Midlevel feature descriptors are obtained during the feature learning step from the three features by combining sparse coding, SPM, and max-pooling. The class probabilities are then calculated by the SVM classifier using the training samples, and the resulting probability values form the new discriminative feature descriptors.

During CRF classification, the CRF model is introduced to classify the superpixels according to the land cover class types. The probability feature descriptor from the first step is considered in this step, and an SVM classifier is adopted to construct the unary potentials. The pairwise potentials can be acquired by calculating the distance between neighboring superpixels. The graph-cut-based α-expansion algorithm is executed to obtain the classification result of the CRF models.

2.1.

Midlevel Feature Descriptors

As discussed above, three typical features are adopted for the low-level feature descriptors, and the details are described as follows.

  • 1. Spectral features: Surfaces on the Earth reflect, absorb, transmit, and emit electromagnetic energy from the sun. A measurement commonly used in remote sensing of the Earth is the reflected energy (e.g., visible light and near-infrared) coming from land and water surfaces. The amount of energy reflected from these surfaces is usually expressed as a percentage of the amount of energy striking the objects. The band values of the remote sensing images are used as the spectral features in this article.

  • 2. GLCM: The GLCM is a texture measure used in many image analyses. In this article, the GLCM is extracted with ENVI software. Eight features, including the mean, variance, homogeneity, contrast, dissimilarity, and entropy, are computed and normalized to form the feature vectors.

  • 3. DSIFT: DSIFT descriptors are computed at points on a regular grid. At each grid point, the descriptors are computed over four circular support patches with different radii; consequently, each point is represented by four SIFT descriptors. Multiple descriptors are computed to allow for scale variation between images.50
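
For concreteness, a minimal Python sketch of the three low-level extraction steps is given below, using OpenCV for DSIFT and scikit-image for the GLCM. The grid step, patch scale, gray-level quantization, and GLCM distance/angle are illustrative assumptions rather than the paper's exact settings (the paper computes the GLCM features with ENVI).

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def dense_sift(gray, step=8):
    """DSIFT sketch: SIFT descriptors on a regular grid (single assumed scale)."""
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), float(step))
           for y in range(step, gray.shape[0] - step, step)
           for x in range(step, gray.shape[1] - step, step)]
    _, desc = sift.compute(gray, kps)  # (N, 128) descriptors
    return desc

def glcm_features(gray_patch, levels=32):
    """Eight GLCM statistics for one patch; distance/angle are illustrative."""
    q = (gray_patch // (256 // levels)).astype(np.uint8)  # quantize gray levels
    glcm = graycomatrix(q, distances=[1], angles=[0], levels=levels, normed=True)
    p = glcm[:, :, 0, 0]
    i, _ = np.indices(p.shape)
    mu = (p * i).sum()                                    # GLCM mean
    return np.array([
        mu,
        (p * (i - mu) ** 2).sum(),                        # variance
        graycoprops(glcm, "homogeneity")[0, 0],
        graycoprops(glcm, "contrast")[0, 0],
        graycoprops(glcm, "dissimilarity")[0, 0],
        -(p[p > 0] * np.log(p[p > 0])).sum(),             # entropy
        graycoprops(glcm, "energy")[0, 0],                # angular second moment
        graycoprops(glcm, "correlation")[0, 0],
    ])

def spectral_feature(image, y, x):
    """Spectral descriptor: the raw band values of a pixel (R, G, B here)."""
    return image[y, x, :].astype(np.float32)
```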

The low-level feature descriptors are extracted from the images, and each feature descriptor has size $T$. The visual dictionary $D$ of $K$ visual words, obtained by the unsupervised K-means clustering algorithm, can be defined as follows:

Eq. (1)

$D = [d_1, d_2, \ldots, d_K] \in \mathbb{R}^{T \times K},$

where each $d_k$ is represented as a linear classifier with bias and calculated as follows:

Eq. (2)

$d_k = [D_{k,1}, D_{k,2}, \ldots, D_{k,T}]^{T} \in \mathbb{R}^{T}.$
An encoding scheme based on the classification score obtained by each dictionary word is used, instead of sparse coding, to encode each descriptor. This step is suggested in Ref. 23. If $\alpha$ is a descriptor vector, its coding vector $f_{d_k}(\alpha_i^l)$ corresponding to dictionary $D$ is given as follows:

Eq. (3)

$f_{d_k}(\alpha_i^l) = [\alpha_k^1 \cdot d_k, \ldots, \alpha_k^N \cdot d_k] \in \mathbb{R}^{N}.$

Intuitively, the descriptor $\alpha$ should be similar to only a few words in the dictionary if the visual words of dictionary $D$ are sufficiently discriminative. Therefore, the vector $f_{d_k}(\alpha_i^l)$ is expected to have only a few values that are greater than zero.

Given a dictionary $D$ and a set of segmented superpixel regions $L$ over an image, we represent the image by spatial max-pooling. For each superpixel region $l \in [1, \ldots, N_S]$ of image $i$, where $N_S$ represents the number of superpixels extracted from the image, let $\alpha_j^l$ be a descriptor vector extracted from region $l$, where $j \in [1, \ldots, N_l]$ indexes the $N_l$ image pixels extracted from region $l$. Thus, given a dictionary $D$, region $l$ can be encoded using max spatial pooling, as follows:

Eq. (4)

$x_i^{l,D} = [\max_{j \in N_l} \langle \alpha_j^l, d_1 \rangle, \ldots, \max_{j \in N_l} \langle \alpha_j^l, d_K \rangle] \in \mathbb{R}^{K}, \quad x_D(i) = [x_D(l_1), \ldots, x_D(l_{N_S})] \in \mathbb{R}^{K \times N_S},$

where $x_i^{l,D}$ represents the midlevel feature descriptor of superpixel $l$, and $x_D(i)$ represents the midlevel feature descriptor of image $i$. If the midlevel features of the pixels in the segmentation region are more similar to some of the visual words, these features can be used to represent the characteristics of the region, and the similarity is measured for the whole region.
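
The dictionary learning of Eq. (1) and the encoding and max-pooling of Eqs. (3) and (4) can be sketched as follows. This minimal NumPy/scikit-learn version assumes that the low-level descriptors, their pixel positions, and a superpixel label map are already available, and it uses a plain dot-product score between descriptors and visual words.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_dictionary(descriptors, K=500, seed=0):
    """Eq. (1): K visual words from unsupervised K-means over low-level descriptors."""
    km = KMeans(n_clusters=K, random_state=seed, n_init=10).fit(descriptors)
    return km.cluster_centers_              # D, rows are the visual words d_k

def encode_and_pool(descriptors, positions, superpixels, D):
    """Eqs. (3)-(4): score each descriptor against every word, then
    max-pool the scores within each superpixel region."""
    scores = descriptors @ D.T              # (N, K) responses to the K words
    n_sp = superpixels.max() + 1
    pooled = np.zeros((n_sp, D.shape[0]), dtype=np.float32)
    for (y, x), s in zip(positions, scores):
        l = superpixels[y, x]               # superpixel containing this descriptor
        pooled[l] = np.maximum(pooled[l], s)  # max pooling, Eq. (4)
    return pooled                           # one K-dim midlevel vector per superpixel
```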

2.2.

Probability Feature Descriptors

Let $x_D$ be the midlevel feature vector of an image. This feature represents a vector in a $K$-dimensional space with a dictionary $D$. If three different types of features (DSIFT, spectral band, and GLCM) are used in the sparse coding phase, then an image can be represented by three corresponding vectors. That is, each image $i$ can be represented by the following vectors:

Eq. (5)

$x_{D_1}(i) = [x_{D_1}(l_1), \ldots, x_{D_1}(l_{N_S})] \in \mathbb{R}^{K_1 \times N_S},$
$x_{D_2}(i) = [x_{D_2}(l_1), \ldots, x_{D_2}(l_{N_S})] \in \mathbb{R}^{K_2 \times N_S},$
$x_{D_3}(i) = [x_{D_3}(l_1), \ldots, x_{D_3}(l_{N_S})] \in \mathbb{R}^{K_3 \times N_S},$

where $D_1$, $D_2$, and $D_3$ are the dictionaries extracted from the DSIFT, spectral, and GLCM features, respectively, $l$ represents the superpixels, and $K_1$, $K_2$, and $K_3$ are the dictionary sizes. These three kinds of midlevel features, combined with the training samples, are used to estimate the SVM classifier parameters and to calculate the probability of each vector belonging to each LULC class.

The probability vectors of the different midlevel feature descriptors can be represented as follows:

Eq. (6)

$P_1 = [p_1(l_1), \ldots, p_1(l_{N_S})] \in \mathbb{R}^{K_L \times N_S},$
$P_2 = [p_2(l_1), \ldots, p_2(l_{N_S})] \in \mathbb{R}^{K_L \times N_S},$
$P_3 = [p_3(l_1), \ldots, p_3(l_{N_S})] \in \mathbb{R}^{K_L \times N_S},$

where $K_L$ represents the number of land cover classes. The MFSC feature descriptors for the final classification are given as follows:

Eq. (7)

$P = [P_1, P_2, P_3] \in \mathbb{R}^{3K_L \times N_S},$

where $3K_L$ is the size of the feature descriptors, that is, three times the number of LULC classes. The size of the MFSC feature descriptors is much smaller than the size of the midlevel feature descriptors in Eq. (5).
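
A compact sketch of the cascading step of Eqs. (5)-(7) is shown below. The per-feature SVMs use a standard RBF kernel with probability outputs as an assumption; the text does not specify the kernel used at this stage. The result is returned with one row per superpixel, i.e., the transpose of the $3K_L \times N_S$ convention in Eq. (7).

```python
import numpy as np
from sklearn.svm import SVC

def mfsc_descriptors(pooled_feats, labels, train_idx):
    """Eqs. (5)-(7): per-feature SVM class probabilities, cascaded per superpixel.
    pooled_feats is a list of three (N_S, K_m) midlevel matrices
    (DSIFT, spectral, GLCM); labels holds the class of each training superpixel."""
    prob_blocks = []
    for X in pooled_feats:
        svm = SVC(kernel="rbf", probability=True).fit(X[train_idx], labels[train_idx])
        prob_blocks.append(svm.predict_proba(X))  # P_m: (N_S, K_L) probabilities
    return np.hstack(prob_blocks)                 # (N_S, 3*K_L), Eq. (7) transposed
```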

2.3.

CRF Classification Model

The CRF model for the final classification of high-resolution remote sensing images is proposed. The CRF is defined over a set of superpixels $\nu$ extracted from the image $I$. Each superpixel $i \in \nu$ is associated with a class label $x \in \mathcal{L} = \{1, \ldots, L\}$. The labeling of the image is denoted by the vector $\mathbf{x} \in \mathcal{L}^{|\nu|}$. The interaction among the superpixels of the CRF is captured by the set of edges $\epsilon \subset \nu \times \nu$, where each edge $e_{ij} \in \epsilon$ corresponds to a pair of superpixels $i, j \in \nu$ that share a boundary.

The CRF energy, which consists of unary and pairwise costs, can be formulated as follows:

Eq. (8)

$E(\mathbf{x}, I) = \lambda_U \sum_{i \in \nu} \psi_i^U(x_i, I) + \lambda_P \sum_{e_{ij} \in \epsilon} \psi_{ij}^P(x_i, x_j, I),$

where $\lambda_U \geq 0$ and $\lambda_P \geq 0$ are the relative weights of the unary and pairwise potentials, respectively.

The unary potential, expressed as $\psi_i^U(x_i, I)$ in Eq. (8), models the cost of assigning a class label $x_i \in \mathcal{L}$ to superpixel $i$ in image $I$. This potential is defined as the score of a kernel SVM classifier for class $x_i$ applied to the MFSC feature vector of superpixel $i$ described in Eq. (7). The classifier for class $l$ is trained using the MFSC feature vectors extracted from the superpixels in the training set that are labeled $l$. The radial basis function (RBF)-$\chi^2$ kernel is adopted for the SVM classification.
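
The RBF-$\chi^2$ kernel can be written as $K(x, y) = \exp(-\gamma \sum_k (x_k - y_k)^2 / (x_k + y_k))$, which suits nonnegative histogram-like inputs such as the MFSC probability vectors. A small NumPy sketch follows; $\gamma$ is a free parameter, and scikit-learn's chi2_kernel implements the same formula and could be plugged into an SVM with a precomputed kernel.

```python
import numpy as np

def chi2_rbf_kernel(X, Y, gamma=1.0, eps=1e-10):
    """RBF-chi^2 kernel: K(x, y) = exp(-gamma * sum_k (x_k - y_k)^2 / (x_k + y_k)),
    for nonnegative feature vectors (eps avoids division by zero)."""
    d = X[:, None, :] - Y[None, :, :]
    s = X[:, None, :] + Y[None, :, :] + eps
    chi2 = np.sum(d * d / s, axis=2)
    return np.exp(-gamma * chi2)

# Assumed usage pattern with scikit-learn's precomputed-kernel SVM:
# svm = SVC(kernel="precomputed").fit(chi2_rbf_kernel(X_train, X_train), y_train)
# scores = svm.decision_function(chi2_rbf_kernel(X_test, X_train))
```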

The pairwise potential, $\psi_{ij}^P(x_i, x_j, I)$, models the cost of assigning labels $x_i$ and $x_j$ to the neighboring superpixels $i$ and $j$, respectively. When a CRF formulation is used for classification, the pairwise potentials usually serve to ensure the smoothness of the label assignments. A contrast-sensitive cost is used as follows:

Eq. (9)

$\psi_{ij}^P(x_i, x_j, I) = \frac{L_{ij}\,\delta(x_i \neq x_j)}{1 + \|\bar{I}_i - \bar{I}_j\|},$

where $L_{ij}$ is the length of the shared boundary between superpixels $i$ and $j$, and $\bar{I}_i$ and $\bar{I}_j$ are the mean gray values of superpixels $i$ and $j$, respectively. The parameters $\lambda_U$ and $\lambda_P$ in Eq. (8) are estimated by the cutting plane method, the details of which are described in Ref. 49. The classification result of the CRF model is obtained by minimizing Eq. (8).
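
To make Eqs. (8) and (9) concrete, the sketch below evaluates the CRF energy of a candidate labeling over the superpixel graph. Minimizing this energy with graph-cut $\alpha$-expansion requires a dedicated solver, so only the energy itself is shown; the edge list, boundary lengths, and gray means are assumed to come from the segmentation.

```python
import numpy as np

def crf_energy(labels, unary, edges, boundary_len, gray_mean,
               lambda_u=1.0, lambda_p=1.0):
    """Eq. (8): E(x, I) = lambda_U * sum_i psi_U(x_i) + lambda_P * sum_ij psi_P(x_i, x_j).
    unary[i, c] is the SVM-based cost of class c for superpixel i;
    edges lists neighboring superpixel pairs (i, j) sharing a boundary."""
    e_unary = unary[np.arange(len(labels)), labels].sum()
    e_pair = 0.0
    for (i, j) in edges:
        if labels[i] != labels[j]:
            # Eq. (9): contrast-sensitive Potts cost, weighted by boundary length
            e_pair += boundary_len[(i, j)] / (1.0 + abs(gray_mean[i] - gray_mean[j]))
    return lambda_u * e_unary + lambda_p * e_pair
```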

3.

Experimental Results

We conduct experiments on high-resolution aerial images to evaluate the effectiveness of the proposed MFSC-CRF framework for LULC classification. Following the work of Jain et al.,49 comparative experiments are conducted by combining different feature descriptors and classification methods. We compare the methods using single-class accuracy and total accuracy. The low-level feature, midlevel feature, and classifier associated with SF-SVM, U-SVM, GLCM-SVM, MFSC-SVM, SF-CRF, U-CRF, GLCM-CRF, and MFSC-CRF are reported in Table 1. The details are described as follows.

  • 1. SF-SVM: This method uses only the unary segmentation cost. Spectral features are used as the low-level features in this technique. After midlevel feature learning, the SVM classifier is adopted to obtain the classification results. This method is very similar to the simultaneous orthogonal matching pursuit method proposed by Chen et al.51

  • 2. U-SVM: This method is similar to SF-SVM, but they differ in the selection of low-level features. As described in Ref. 26, the DSIFT feature is considered as the low-level feature, and the SVM classifier is used for superpixel level classification.

  • 3. GLCM-SVM: The GLCM feature is considered as the low-level feature in this method, and the SVM classifier is used for superpixel level classification.

  • 4. MFSC-SVM: Multifeature soft-probability is used for the feature vector in this method, and SVM is adopted for LULC classification.

  • 5. SF-CRF: Spectral features are considered as the low-level features in this method, which combines sparse coding and the CRF to achieve the classification results.

  • 6. U-CRF: Sparse coding and the CRF model are used in this technique, and DSIFT is considered as the low-level feature, as described in Ref. 48.

  • 7. GLCM-CRF: GLCM is considered as the low-level feature descriptor in this model, in which CRF is adopted for classification.

  • 8. MFSC-CRF: Probabilities are considered as feature descriptors in this proposed method, in which CRF is adopted for supervised classification.

Table 1

Information of different classification methods.

Method | Low-level feature | Midlevel feature | Classifier
SF-SVM | Spectral features | Sparse coding and max-pooling [Eq. (4)] | SVM
U-SVM | DSIFT | Sparse coding and max-pooling [Eq. (4)] | SVM
GLCM-SVM | GLCM | Sparse coding and max-pooling [Eq. (4)] | SVM
MFSC-SVM | Spectral features, DSIFT, and GLCM | MFSC [Eq. (7)] | SVM
SF-CRF | Spectral features | Sparse coding and max-pooling [Eq. (4)] | CRF
U-CRF | DSIFT | Sparse coding and max-pooling [Eq. (4)] | CRF
GLCM-CRF | GLCM | Sparse coding and max-pooling [Eq. (4)] | CRF
MFSC-CRF | Spectral features, DSIFT, and GLCM | MFSC [Eq. (7)] | CRF

The experimental results are evaluated using three kinds of accuracy measures, namely, the accuracy of each class, the overall accuracy (OA), and the kappa coefficient (Kappa). The per-class accuracy is the fraction of correctly classified pixels among all pixels of that ground-truth class, and OA is the fraction of correctly classified pixels over all test pixels. For a fair comparison, the classification results with the highest OA are selected for all classification algorithms. The effect of the number of training samples on the MFSC-CRF model is further investigated.
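
For reference, OA and Kappa can be computed from the confusion matrix as in the short sketch below (a standard formulation, not code from the paper).

```python
import numpy as np

def oa_and_kappa(confusion):
    """Overall accuracy and kappa from a K_L x K_L confusion matrix
    (rows: ground truth, columns: prediction)."""
    n = confusion.sum()
    oa = np.trace(confusion) / n
    # expected agreement under chance, from the row and column marginals
    pe = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / (n * n)
    return oa, (oa - pe) / (1.0 - pe)
```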

3.1.

Experimental Data Description

3.1.1.

Experimental datasets (testing site 1)

The first test image was captured over a rural residential area in Wuhan city, Hubei Province, China, by unmanned aerial vehicle photography, with red, green, and blue spectral bands. The image is 1024×1200 pixels, with a spatial resolution of 0.2 m and three multispectral channels. An overview of this dataset is shown in Fig. 2(a), and the corresponding ground truth is shown in Fig. 2(b). The test image was segmented into 52,654 superpixels using the simple linear iterative clustering (SLIC) method. Six classes of interest, namely, low vegetation, homestead, farmland, waterbody, road, and woodland, are considered and listed in Table 2. Rural homestead is the main type of rural residential land and is relatively scattered. This class contains various houses, walls, and other facilities with spatial correlation and semantic structure characteristics. The other five class types are mainly land cover types. A total of 100 training samples for each LULC class type is used from the reference ground-truth data, and the remaining samples are used to evaluate the accuracy. The class information is shown in Table 2.
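
As a sketch of this segmentation step, the snippet below uses the SLIC implementation in scikit-image; the n_segments and compactness values are assumptions chosen to approximate the reported superpixel count, not the parameters actually used.

```python
from skimage.segmentation import slic

def segment(image, n_segments=52654, compactness=10.0):
    """Simple linear iterative clustering (SLIC) over an RGB image;
    returns an integer label map with one id per superpixel."""
    return slic(image, n_segments=n_segments, compactness=compactness, start_label=0)
```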

Fig. 2

Wuhan rural residential area dataset (testing site 1): (a) RGB and (b) ground-truth images (low vegetation, homestead, woodland, farmland, waterbody, and road).


Table 2

Class information of Wuhan rural residential area dataset of testing site 1.

Class name | Training samples | Testing samples
Low vegetation | 100 | 6055
Homestead | 100 | 6518
Woodland | 100 | 17,710
Farmland | 100 | 13,022
Waterbody | 100 | 2294
Road | 100 | 7055

3.1.2.

Experimental datasets (testing site 2)

This test image was also captured over a rural residential area in Wuhan city, Hubei Province, China. The image is 1113×1777 pixels, with a spatial resolution of 0.2 m and three multispectral channels. Compared with testing site 1, testing site 2 is larger and has a more complex scene. More trees surround the homesteads in this rural residential area, and the shadow effect is more pronounced, so this image presents a challenging LULC classification task. The ground-truth image corresponding to the high-resolution image (HRI) was classified manually into the six most common LULC classes; the resulting label image is shown in Fig. 3(b). The test image was segmented into 92,441 superpixels. Similar to testing site 1, six classes of interest are considered and described in Table 3, which also shows the number of training and testing samples for each class. The training samples are randomly chosen from the reference ground-truth data (Table 3). The dictionary size is set to 500, and 20,000 pixels are randomly selected to train the dictionary via the K-means clustering method. A total of 500 training samples per LULC class is randomly selected to estimate the classifier parameters (Table 3).

Fig. 3

Wuhan rural residential area dataset (testing site 2): (a) RGB and (b) ground-truth images. (low vegetation, homestead, woodland, farmland, waterbody, and road).


Table 3

Class information of Wuhan rural residential area dataset of testing site 2.

Class name | Training samples | Testing samples
Low vegetation | 500 | 8964
Homestead | 500 | 9215
Woodland | 500 | 20,304
Farmland | 500 | 41,528
Waterbody | 500 | 1781
Road | 500 | 10,649

3.2.

Experimental Results and Analysis for Testing Site 1

The experimental results for testing site 1 are reported to validate the effectiveness of the proposed MFSC-CRF for LULC classification. The classification accuracies of the various midlevel feature learning methods, namely, SF-SVM, GLCM-SVM, U-SVM, MFSC-SVM, GLCM-CRF, and U-CRF, which are different combinations of low-level feature descriptors and classifiers, are compared. The SVM classifier with the RBF kernel has proven successful in the supervised classification of high-dimensional HRI data. Among the SVM-based methods, MFSC-SVM achieves better classification results than the other three methods [Figs. 4(c)-4(f)]. However, the SVM algorithm does not consider any neighborhood spatial contextual information and therefore produces isolated salt-and-pepper classification noise.

Fig. 4

Classification of the Wuhan rural residential area dataset (testing site 1): (a) SF-SVM, (b) U-SVM, (c) GLCM-SVM, (d) MFSC-SVM, (e) SF-CRF, (f) U-CRF, (g) GLCM-CRF, and (h) MFSC-CRF. [The red rectangles in (e) and (h) indicate the differences in the classification results.]


For the MFSC-CRF algorithm, which is proposed to combine the different effective features, the oversmoothing is less serious, as shown in the red boxes of Figs. 4(e) and 4(h), and the boundaries of the homesteads are better preserved. By contrast, SF-SVM focuses more on the spectral information, so its classification depends much less on the structural information; the opposite reliance on structural information alone probably explains the misclassifications of U-CRF.

The quantitative performances with the highest classification accuracies obtained by SF-SVM, U-SVM, GLCM-SVM, MFSC-SVM, SF-CRF, U-CRF, GLCM-CRF, and MFSC-CRF are reported in Table 4, with the best result of each column in bold. The results show that the algorithms that consider spatial contextual information significantly outperform the SVM classification in classification accuracy. Moreover, the accuracy of MFSC-CRF is higher than that of the three other CRF-based classification methods (i.e., SF-CRF, U-CRF, and GLCM-CRF), indicating that MFSC-CRF can adaptively incorporate different low-level feature descriptors. With GLCM as the low-level feature descriptor, the GLCM-CRF method achieves much higher accuracy than U-SVM and U-CRF and is comparable with SF-SVM and SF-CRF, showing that GLCM can be very effective for LULC classification. On the testing site 1 dataset of the Wuhan rural residential area (Table 4), MFSC-CRF exhibits a clear improvement in OA; its accuracy is about 21 percentage points higher than that of U-SVM (86.3% versus 64.9%), which shows the benefit of incorporating spatial contextual information together with additional effective feature descriptors. Overall, MFSC-CRF obtains the highest accuracy.

Table 4

Classification accuracy for Wuhan rural residential area using dataset of testing site 1 with different classifiers.

Methods | Low vegetation (%) | Homestead (%) | Woodland (%) | Farmland (%) | Waterbody (%) | Road (%) | OA (%) | Kappa
U-SVM | 55.7±5.6 | 65.4±7.8 | 47.8±1.8 | 68.9±6.2 | 94.4±2.5 | 57.9±4.1 | 64.9±0.7 | 0.596±0.000059
SF-SVM | 74.3±15.2 | 83.8±1.2 | 79.5±7.6 | 91.4±3.1 | 97.4±0.8 | 72.9±5.1 | 83.4±0.3 | 0.795±0.000042
GLCM-SVM | 73.9±6.0 | 87.5±8.2 | 77.8±0.9 | 92.4±3.2 | 97.9±1.9 | 71.4±16.7 | 83.3±0.7 | 0.795±0.000096
MFSC-SVM | 77.6±1.2 | 87.7±1.2 | 80.1±3.0 | 94.7±0.9 | 98.1±0.3 | 75.9±3.2 | 85.7±0.1 | 0.823±0.000050
U-CRF | 59.0±8.4 | 67.9±12.7 | 52.9±4.0 | 79.7±24.7 | 97.1±2.3 | 58.1±8.1 | 69.1±0.8 | 0.639±0.000087
SF-CRF | 74.6±17.5 | 85.6±5.1 | 81.3±8.8 | 92.5±3.8 | 97.8±0.5 | 73.4±8.4 | 84.4±0.6 | 0.807±0.000092
GLCM-CRF | 73.2±5.0 | 89.0±8.5 | 79.7±1.4 | 93.3±3.4 | 97.8±0.2 | 72.3±16.5 | 84.2±0.7 | 0.805±0.000097
MFSC-CRF | 78.2±2.2 | 88.7±0.9 | 81.4±3.5 | 95.1±0.9 | 98.4±0.1 | 76.6±2.8 | 86.3±0.1 | 0.830±0.000044

Figure 5 shows the confusion matrices of the different classification methods with various feature descriptors and classifiers. The methods that use only spectral features as low-level feature descriptors (SF-SVM and SF-CRF) misclassify about 14% of the homestead pixels as road, because the two LULC types have similar spectral characteristics and both belong to the impermeable surface. This confusion is less serious for the GLCM-based (GLCM-SVM and GLCM-CRF) and MFSC-based methods (MFSC-SVM and MFSC-CRF). The MFSC-CRF method, which incorporates the different low-level feature descriptors, reaches 89% accuracy for homestead.

Fig. 5

Confusion matrices on Wuhan rural residential area datasets (testing site 1): (a) SF-SVM, (b) U-SVM, (c) GLCM-SVM, (d) MFSC-SVM, (e) SF-CRF, (f) U-CRF, (g) GLCM-CRF, and (h) MFSC-CRF.


3.3.

Experimental Results and Analysis for Testing Site 2

The resulting classification maps for this test image are shown in Figs. 6(a)-6(h). The quantitative classification results of the different methods are shown in Table 5 (the best result of each column is in bold) and Figs. 7(a)-7(h). The proposed MFSC-CRF method achieves higher OA and Kappa than SF-SVM, U-SVM, GLCM-SVM, MFSC-SVM, SF-CRF, U-CRF, and GLCM-CRF. Compared with SF-SVM and U-SVM, the MFSC-SVM method achieves remarkably enhanced OA and homestead accuracy. Compared with GLCM-SVM, the classification accuracy of the MFSC-SVM method shows an approximately 3% improvement for most LULC classes. By considering neighborhood spatial contextual information, MFSC-CRF shows a further 0.1% improvement in OA (from 87.4% to 87.5%) over the MFSC-SVM method.

Fig. 6

Classification of Wuhan rural residential area dataset (testing site 2): (a) SF-SVM, (b) U-SVM, (c) GLCM-SVM, (d) MFSC-SVM, (e) SF-CRF, (f) U-CRF, (g) GLCM-CRF, and (h) MFSC-CRF.


Table 5

Classification accuracy for Wuhan rural residential area dataset of testing site 2 with different classifiers.

Methods | Low vegetation (%) | Homestead (%) | Woodland (%) | Farmland (%) | Waterbody (%) | Road (%) | OA (%) | Kappa
U-SVM | 86.6±1.2 | 80.2±0.5 | 76.4±2.9 | 85.4±0.5 | 98.3±0.6 | 74.0±0.9 | 83.5±0.1 | 0.601±0.000036
SF-SVM | 87.7±0.2 | 82.7±3.0 | 76.1±0.2 | 85.4±0.4 | 98.2±0.3 | 74.4±1.8 | 84.1±0.1 | 0.616±0.000022
GLCM-SVM | 86.4±1.4 | 82.7±4.4 | 76.6±0.8 | 83.1±3.5 | 97.9±0.1 | 74.5±1.8 | 83.6±0.2 | 0.612±0.000013
MFSC-SVM | 91.2±1.9 | 85.7±1.9 | 81.5±0.5 | 89.5±0.1 | 96.6±0.3 | 79.6±1.0 | 87.4±0.1 | 0.671±0.000040
U-CRF | 87.4±0.3 | 80.9±0.2 | 76.9±3.0 | 85.7±0.7 | 98.4±0.4 | 75.5±2.2 | 84.1±0.3 | 0.614±0.000160
SF-CRF | 88.3±0.1 | 83.8±1.8 | 76.3±1.0 | 86.0±0.3 | 98.2±0.2 | 75.4±1.9 | 84.6±0.1 | 0.625±0.000059
GLCM-CRF | 87.0±1.9 | 82.0±3.9 | 77.1±0.6 | 83.4±2.6 | 97.9±0.1 | 76.2±1.3 | 84.0±0.1 | 0.619±0.000007
MFSC-CRF | 91.2±0.6 | 85.9±2.7 | 81.4±0.5 | 89.4±0.2 | 96.7±0.3 | 80.4±0.7 | 87.5±0.1 | 0.674±0.000084

Fig. 7

Confusion matrices on Wuhan rural residential area datasets (testing site 2): (a) SF-SVM, (b) U-SVM, (c) GLCM-SVM, (d) MFSC-SVM, (e) SF-CRF, (f) U-CRF, (g) GLCM-CRF, and (h) MFSC-CRF.


3.4.

Parameter Sensitivity Analysis

The performance of the proposed MFSC-CRF method is further evaluated using different numbers of training samples. Testing image 1 is selected for the parameter sensitivity analysis, and the effect of the number of training samples on the MFSC-CRF algorithm is examined. Training set sizes ranging from 100 to 1000 samples per LULC class are tested at intervals of 100.

As shown in Fig. 8, the classification accuracy of MFSC-CRF initially increases as the number of training samples per class grows (from 85.6% to 93.2%), remaining slightly higher than the GLCM-CRF (from 84.0% to 92.0%) and MFSC-SVM (from 85.0% to 92.8%) approaches on the Wuhan rural residential area dataset of testing site 1. The accuracy then levels off and decreases slightly once the number of training samples reaches 900. Moreover, the classification accuracy of the proposed method remains higher than that of the other seven methods for every training set size. The training samples are randomly selected from the overall ground truth, and the remaining samples are used to evaluate the classification accuracies. The experiments show that the methods incorporating spatial contextual information (i.e., SF-CRF, U-CRF, GLCM-CRF, and the proposed MFSC-CRF) all outperform the SVM-based classification methods, and the MFSC-CRF method is more robust than the other classification methods across the different training set sizes.
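
The evaluation protocol just described can be sketched as a simple loop. The classify callable standing in for the full MFSC-CRF pipeline is hypothetical, as are the variable names; per-class sample counts follow the 100-to-1000 schedule above.

```python
import numpy as np

def sensitivity_curve(P, labels, classify, sizes=range(100, 1001, 100), seed=0):
    """For each training size, draw that many samples per class at random,
    classify the rest, and record the resulting accuracy."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for n in sizes:
        train_idx = np.hstack([
            rng.choice(np.flatnonzero(labels == c), size=n, replace=False)
            for c in np.unique(labels)
        ])
        test_mask = np.ones(len(labels), dtype=bool)
        test_mask[train_idx] = False
        pred = classify(P, labels, train_idx)   # hypothetical MFSC-CRF pipeline call
        accuracies.append((pred[test_mask] == labels[test_mask]).mean())
    return list(sizes), accuracies
```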

Fig. 8

Effect of training data size on the classification results for Wuhan rural residential area dataset of testing site 1.


4.

Conclusion

A classification method for HSR remote sensing images based on the MFSC and CRF models is proposed. The proposed MFSC-CRF method effectively incorporates spectral, structural, and textural features, as well as spatial contextual information. Midlevel feature learning based on sparse coding is very important in image classification, and the proposed feature combination method significantly improves the classification accuracy by effectively combining three complementary features, namely, DSIFT, spectral bands, and GLCM. Experiments on the Wuhan residential area datasets also show that the GLCM features achieve more promising results than the original spectral features. The method is an open framework in which different features can be conveniently cascaded to improve the accuracy of image classification. Recently, convolutional neural networks have been widely used in image classification and have achieved good results; however, they require a large number of training samples to estimate their parameters. Therefore, our next step is to fine-tune a convolutional neural network model with a small number of training samples so that it can be effectively applied to remote sensing image classification.

References

1. 

D. Li, L. Zhang and G. S. Xia, “Automatic analysis and mining of remote sensing big data,” Acta Geod. Cartogr. Sin., 43 1211 –1216 (2014). Google Scholar

2. 

G. S. Xia et al., “Structural high-resolution satellite image indexing,” in ISPRS TC VII Symp. 100 Years ISPRS, 298 –303 (2010). Google Scholar

3. 

P. Gong et al., “Finer resolution observation and monitoring of global land cover: first mapping results with Landsat TM and ETM+ data,” Int. J. Remote Sens., 34 (7), 2607 –2654 (2013). http://dx.doi.org/10.1080/01431161.2012.748992 IJSEDK 0143-1161 Google Scholar

4. 

Z. Zhu et al., “Assessment of spectral, polarimetric, temporal, and spatial dimensions for urban and peri-urban land cover classification using Landsat and SAR data,” Remote Sens. Environ., 117 72 –82 (2012). http://dx.doi.org/10.1016/j.rse.2011.07.020 RSEEA7 0034-4257 Google Scholar

5. 

A. Paul, M. A. Peter and J. C. Paul, “Fine spatial resolution simulated satellite sensor imagery for land cover mapping in the United Kingdom,” Remote Sens. Environ., 68 206 –216 (1999). http://dx.doi.org/10.1016/S0034-4257(98)00112-6 RSEEA7 0034-4257 Google Scholar

6. 

O. Debeir et al., “Textural and contextual land-cover classification using single and multiple classifier system,” Photogramm. Eng. Remote Sens., 68 597 –605 (2002). Google Scholar

7. 

K. Jia et al., “Land cover classification of finer resolution remote sensing data integrating temporal features from time series coarser resolution data,” ISPRS J. Photogramm. Remote Sens., 93 49 –55 (2014). http://dx.doi.org/10.1016/j.isprsjprs.2014.04.004 IRSEE9 0924-2716 Google Scholar

8. 

T. Blaschke et al., “Geographic object-based image analysis towards a new paradigm,” ISPRS J. Photogramm. Remote Sens., 87 180 –191 (2014). http://dx.doi.org/10.1016/j.isprsjprs.2013.09.014 IRSEE9 0924-2716 Google Scholar

9. 

Y. Zhong, J. Zhao and L. Zhang, “A hybrid object-oriented conditional random field classification framework for high spatial resolution remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., 52 (11), 7023 –7037 (2014). http://dx.doi.org/10.1109/TGRS.2014.2306692 IGRSD2 0196-2892 Google Scholar

10. 

K. Biro et al., “Exploitation of TerraSAR-X data for land use/land cover analysis using object-oriented classification approach in the African Sahel Area, Sudan,” J. Indian Soc. Remote Sens., 41 (3), 539 –553 (2013). http://dx.doi.org/10.1007/s12524-012-0230-7 Google Scholar

11. 

M. Ustuner, F. Balik Sanli and B. Dixon, “Application of support vector machines for land use classification using high-resolution rapid eye images: a sensitivity analysis,” Eur. J. Remote Sens., 48 403 –422 (2015). http://dx.doi.org/10.5721/EuJRS20154823 Google Scholar

12. 

S. D. Jawak et al., “Advancement in land cover classification using very high resolution remotely sensed 8-band WorldView-2 satellite data,” Int. J. Earth Sci. Eng., 6 (2), 1742 –1749 (2013). Google Scholar

13. 

X. Huang and L. Zhang, “An SVM ensemble approach combining spectral, structural, and semantic features for the classification of high-resolution remotely sensed imagery,” IEEE Trans. Geosci. Remote Sens., 51 (1), 257 –272 (2013). http://dx.doi.org/10.1109/TGRS.2012.2202912 IGRSD2 0196-2892 Google Scholar

14. 

G. Moser, S. B. Serpico and J. A. Benediktsson, “Land-cover mapping by Markov modeling of spatial–contextual information in very-high-resolution remote sensing images,” Proc. IEEE, 101 (3), 631 –651 (2013). http://dx.doi.org/10.1109/JPROC.2012.2211551 IEEPAD 0018-9219 Google Scholar

15. 

L. Wang and Q. Wang, “Subpixel mapping using Markov random field with multiple spectral constraints from subpixel shifted remote sensing images,” IEEE Geosci. Remote Sens. Lett., 10 (3), 598 –602 (2013). http://dx.doi.org/10.1109/LGRS.2012.2215573 IGRSBY 1545-598X Google Scholar

16. 

A. Voisin et al., “Classification of very high resolution SAR images of urban areas using copulas and texture in a hierarchical Markov random field model,” IEEE Geosci. Remote Sens. Lett., 10 (1), 96 –100 (2013). http://dx.doi.org/10.1109/LGRS.2012.2193869 IGRSBY 1545-598X Google Scholar

17. 

W. Yang et al., “SAR-based terrain classification using weakly supervised hierarchical Markov aspect models,” IEEE Trans. Image Process., 21 (9), 4232 –4243 (2012). http://dx.doi.org/10.1109/TIP.2012.2199127 IIPRE4 1057-7149 Google Scholar

18. 

X. L. Li et al., “A survey on scene image classification,” Sci. Sin. Inf., 45 827 –848 (2015). http://dx.doi.org/10.1360/N112014-00286 Google Scholar

19. 

H. Bay et al., “Speeded-up robust features (SURF),” Comput. Vis. Image Understanding, 110 (3), 346 –359 (2008). http://dx.doi.org/10.1016/j.cviu.2007.09.014 CVIUF4 1077-3142 Google Scholar

20. 

G. S. Xia, J. Delon and Y. Gousseau, “Shape-based invariant texture indexing,” Int. J. Comput. Vision, 88 (3), 382 –403 (2010). http://dx.doi.org/10.1007/s11263-009-0312-3 IJCVEQ 0920-5691 Google Scholar

21. 

G. S. Xia, J. Delon and Y. Gousseau, “Accurate junction detection and characterization in natural images,” Int. J. Comput. Vision, 106 (1), 31 –56 (2014). http://dx.doi.org/10.1007/s11263-013-0640-1 IJCVEQ 0920-5691 Google Scholar

22. 

D. G. Lowe, “Distinctive image features from scale-invariant key points,” Int. J. Comput. Vision, 60 91 –110 (2004). http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94 IJCVEQ 0920-5691 Google Scholar

23. 

N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 886 –893 (2005). http://dx.doi.org/10.1109/CVPR.2005.177 Google Scholar

24. 

H. Goncalves, L. Corte-Real and J. Goncalves, “Automatic image registration through image segmentation and SIFT,” IEEE Trans. Geosci. Remote Sens., 49 (7), 2589 –2600 (2011). http://dx.doi.org/10.1109/TGRS.2011.2109389 IGRSD2 0196-2892 Google Scholar

25. 

A. M. Cheriyadat, “Unsupervised feature learning for aerial scene classification,” IEEE Trans. Geosci. Remote Sens., 52 (1), 439 –451 (2014). http://dx.doi.org/10.1109/TGRS.2013.2241444 IGRSD2 0196-2892 Google Scholar

26. 

F. F. Li and P. Perona, “Bayesian hierarchy model for learning natural scene categories,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 524 –531 (2005). http://dx.doi.org/10.1109/CVPR.2005.16 Google Scholar

27. 

Y. Boureau et al., “Ask the locals: multi-way local pooling for image recognition,” in IEEE Int. Conf. on Computer Vision (ICCV), 2651 –2658 (2011). http://dx.doi.org/10.1109/ICCV.2011.6126555 Google Scholar

28. 

Y. Cao et al., “Spatial-bag-of-features,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 3352 –3359 (2011). http://dx.doi.org/10.1109/CVPR.2010.5540021 Google Scholar

29. 

Y. Huang et al., “Salient coding for image classification,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1753 –1760 (2011). http://dx.doi.org/10.1109/CVPR.2011.5995682 Google Scholar

30. 

Z. L. Jiang, Z. Lin and L. S. Davis, “Label consistent K-SVD: learning a discriminative dictionary for recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 35 (11), 2651 –2664 (2013). http://dx.doi.org/10.1109/TPAMI.2013.88 ITPIDJ 0162-8828 Google Scholar

31. 

W. Yang, X. Yin and G. S. Xia, “Learning high-level features for satellite image classification with limited labelled samples,” IEEE Trans. Geosci. Remote Sens., 53 (8), 4472 –4482 (2015). http://dx.doi.org/10.1109/TGRS.2015.2400449 IGRSD2 0196-2892 Google Scholar

32. 

H. Lobel, R. Vidal and A. Soto, “Learning shared, discriminative, and compact representations for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 37 (11), 2218 –2231 (2015). http://dx.doi.org/10.1109/TPAMI.2015.2408349 ITPIDJ 0162-8828 Google Scholar

33. 

G. F. Sheng et al., “High-resolution satellite scene classification using a sparse coding based multiple feature combination,” Int. J. Remote Sens., 33 (8), 2395 –2412 (2012). http://dx.doi.org/10.1080/01431161.2011.608740 IJSEDK 0143-1161 Google Scholar

34. 

K. Qi et al., “Land-use scene classification in high-resolution remote sensing images using improved correlatons,” IEEE Geosci. Remote Sens. Lett., 12 (12), 2403 –2407 (2015). http://dx.doi.org/10.1109/LGRS.2015.2478966 IGRSBY 1545-598X Google Scholar

35. 

S. S. Chen and Y. L. Tian, “Pyramid of spatial relations for scene-level land use classification,” IEEE Trans. Geosci. Remote Sens., 53 (4), 1947 –1957 (2015). http://dx.doi.org/10.1109/TGRS.2014.2351395 IGRSD2 0196-2892 Google Scholar

36. 

F. Hu et al., “Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 8 (5), 2015 –2030 (2015). http://dx.doi.org/10.1109/JSTARS.2015.2444405 1939-1404 Google Scholar

37. 

F. Hu et al., “Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery,” Remote Sens., 7 (11), 14680 –14707 (2015). http://dx.doi.org/10.3390/rs71114680 Google Scholar

38. 

K. Nogueira, O. A. B. Penatti and J. A. D. Santos, “Towards better exploiting convolutional neural networks for remote sensing scene classification,” Pattern Recogn., 61 539 –556 (2017). http://dx.doi.org/10.1016/j.patcog.2016.07.001 PTNRA8 0031-3203 Google Scholar

39. 

J. Besag, “Spatial interaction and the statistical analysis of lattice systems,” J. R. Stat. Soc. Ser. B, 36 (2), 192 –236 (1974). http://dx.doi.org/10.2307/2984812 JSTBAJ 0035-9246 Google Scholar

40. 

J. Lafferty, A. McCallum and F. Pereira, “Conditional random fields: probabilistic models for segmenting and labeling sequence data,” in Int. Conf. on Machine Learning, (2001). Google Scholar

41. 

G. S. Xia, C. He and H. Sun, “Integration of synthetic aperture radar image segmentation method using Markov random field on region adjacency graph,” IET Radar Sonar Navig., 1 (5), 348 –353 (2007). http://dx.doi.org/10.1049/iet-rsn:20060128 IRSNBX 1751-8784 Google Scholar

42. 

B. Y. Liu and X. M. He, “Multiclass semantic video segmentation with object-level active inference,” in IEEE Conf. on Computer Vision and Pattern Recognition, (2015). http://dx.doi.org/10.1109/CVPR.2015.7299057 Google Scholar

43. 

X. M. He and G. Stephen, “An exemplar-based CRF for multi-instance object segmentation,” in IEEE Conf. on Computer Vision and Pattern Recognition, (2014). http://dx.doi.org/10.1109/CVPR.2014.45 Google Scholar

44. 

Y. S. Ming, H. D. Li and X. M. He, “Connected contours: a contour completion model that respects closure-effect,” in IEEE Conf. on Computer Vision and Pattern Recognition, (2012). http://dx.doi.org/10.1109/CVPR.2012.6247755 Google Scholar

45. 

X. M. He, R. Zemel and M. Carreira-Perpinan, “Multiscale conditional random fields for image labelling,” in IEEE Conf. on Computer Vision and Pattern Recognition, (2004). http://dx.doi.org/10.1109/CVPR.2004.1315232 Google Scholar

46. 

V. Michele and F. Vittorio, “Semantic segmentation of urban scenes by learning local class interactions,” in IEEE Conf. on Computer Vision and Pattern Recognition Workshops, (2015). http://dx.doi.org/10.1109/CVPRW.2015.7301377 Google Scholar

47. 

V. Michele and F. Vittorio, “Structured prediction for urban scene semantic segmentation with geographic context,” in Joint Urban Remote Sensing Event, (2015). Google Scholar

48. 

P. Zhong and R. Wang, “Learning conditional random fields for classification of hyperspectral images,” IEEE Trans. Image Process., 19 (7), 1890 –1907 (2010). http://dx.doi.org/10.1109/TIP.2010.2045034 IIPRE4 1057-7149 Google Scholar

49. 

A. Jain et al., “Visual dictionary learning for joint object categorization and segmentation,” Lect. Notes Comput. Sci., 7576 (5), 718 –731 (2012). http://dx.doi.org/10.1007/978-3-642-33715-4 LNCSD9 0302-9743 Google Scholar

50. 

A. Bosch, A. Zisserman and X. Munoz, “Image classification using random forests and ferns,” in IEEE Int. Conf. on Computer Vision, 1 –8 (2007). http://dx.doi.org/10.1109/ICCV.2007.4409066 Google Scholar

51. 

Y. Chen, N. M. Nasrabadi and T. D. Tran, “Hyperspectral image classification using dictionary-based sparse representation,” IEEE Trans. Geosci. Remote Sens., 49 (10), 3973 –3985 (2011). http://dx.doi.org/10.1109/TGRS.2011.2129595 IGRSD2 0196-2892 Google Scholar

Biography

Bin Zhang received his BS, MS, and PhD degrees from the School of Electronic Information, Wuhan University, in 2007, 2009, and 2013, respectively. He is currently working at China University of Geosciences. His research interests include image classification, scene-level land use classification, and deep learning.

Biographies for the other authors are not available.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Bin Zhang, Yueyan Liu, Zuyu Zhang, and Yonglin Shen "Land use and land cover classification for rural residential areas in China using soft-probability cascading of multifeatures," Journal of Applied Remote Sensing 11(4), 045010 (1 December 2017). https://doi.org/10.1117/1.JRS.11.045010
Received: 17 April 2017; Accepted: 10 November 2017; Published: 1 December 2017