8 September 2020 Global chlorophyll-a concentration estimation from moderate resolution imaging spectroradiometer using convolutional neural networks
Bowen Yu, Linlin Xu, Jun-huan Peng, Zhongzheng Hu, Alexander Wong
Abstract

The accurate and timely estimation of global chlorophyll-a (Chla) concentration from large volumes of remote sensing data is crucial for supporting various applications. The moderate resolution imaging spectroradiometer (MODIS) is one of the most widely used Earth observation data sources, characterized by global coverage, high spectral resolution, and a short revisit period, so the fast and accurate estimation of global Chla concentration from MODIS imagery is significant. Nevertheless, estimating Chla concentration from MODIS using traditional machine learning approaches is challenging because of their limited capability to model the complex relationship between MODIS spatial–spectral observations and the Chla concentration, and because of their low computational efficiency on large MODIS datasets. We, therefore, explore the potential of deep convolutional neural networks (CNNs) for Chla concentration estimation from MODIS imagery. The Ocean Color Climate Change Initiative (OC-CCI) Chla concentration image is used as ground truth because it is a well-recognized Chla concentration product produced by assimilating different satellite data through a complex processing chain. A total of 12 monthly OC-CCI global Chla concentration maps and the associated MODIS images are used to investigate the CNN approach with cross-validation. A classical machine learning approach, support vector regression (SVR), is used for comparison with the proposed CNN approach. Compared with the SVR, the CNN performs better, with a mean log root-mean-square error of 0.129 and R2 of 0.901, indicating that, using the MODIS images alone, the CNN approach can achieve results close to the OC-CCI Chla concentration images. These results demonstrate that CNNs may provide reliable, stable, and timely Chla concentration images, and as such the CNN constitutes a useful technique for operational Chla concentration estimation from large MODIS data.

1. Introduction

Chlorophyll-a (Chla) concentration is a key indicator of the biophysical status of water bodies. Numerous satellite programs operated by the National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), and the European Space Agency (ESA) have global ocean monitoring capability (Table 1). Accurate and fast estimation of global Chla concentration from these satellites' images is crucial for various purposes, such as net primary production derivation,1,2 harmful algal bloom detection,3–6 physical and biological interactions,7,8 and ecosystem model validation.9,10

Table 1

Summary of primary ocean color satellites.

Acronym | Full name | Operating institution | Service period
CZCS | Coastal zone color scanner | NASA | 1978 to 1986
SeaWiFS | Sea-viewing wide field-of-view sensor | NASA | 1997 to 2010
MERIS | Medium resolution imaging spectrometer | ESA | 2002 to 2012
MODIS | Moderate resolution imaging spectroradiometer | NASA | 1999 to present
VIIRS | Visible and infrared imager radiometer suite | NASA and NOAA | 2011 to present
OLCI | Ocean and land color instrument | ESA | 2016 to present

The twin moderate resolution imaging spectroradiometers (MODIS), Terra-MODIS and Aqua-MODIS, have a high temporal resolution, viewing the entire Earth's surface every 1 to 2 days, and have thus provided big remote sensing data. The big data characteristics of satellite remote sensing call for a Chla concentration estimation algorithm that can handle the data in an efficient and effective manner. Nevertheless, the relationship between the Chla concentration and the remote sensing observations is a very complex nonlinear function; as a consequence, the design of an end-to-end machine learning approach with strong learning capability that can capture this complex nonlinear relationship is an important research issue. The support vector machine, a classical machine learning approach, has limitations due to its relatively weak modeling capability, compared to neural networks, for handling the complexity and uncertainty of the inverse problem, and also due to its low computational efficiency.11 Deep learning, especially convolutional neural networks (CNNs), is well known for using both spatial and spectral information, for strong modeling capability, and for leveraging GPUs for high computational efficiency, greatly outperforming other machine learning techniques in many computer vision tasks.12–16 Remote sensing scientists have seen the value of CNNs for tasks of classification, segmentation, object detection, change detection, and super-resolution.17–19 However, very limited studies focus on using CNNs to solve regression problems. Although CNNs have huge potential for addressing the problems in Chla estimation, to the best of our knowledge, no study has explored the capability of CNNs for the estimation of global Chla concentration. Therefore, this paper explores a deep CNN for the estimation of global Chla concentration. Nevertheless, a deep CNN requires many training samples, and the ground-truth field data are very limited, so we adopt the Ocean Color Climate Change Initiative (OC-CCI) images as ground truth.

The OC-CCI project is one of the 13 projects in the ESA CCI program, each of which addresses a particular essential climate variable from satellite observations. The objective of this project is to produce a long-term, multisensor time series of satellite ocean color data, e.g., Chla concentration, with specific information on errors and uncertainties.20 The OC-CCI Chla concentration dataset is produced by combining the ocean chlorophyll 3 (OC3) algorithm, the ocean chlorophyll 5 (OC5) algorithm, and the OCI algorithm [ocean chlorophyll 4 (OC4) algorithm + color index (CI) algorithm]21–23 on the merged data of SeaWiFS, MERIS, MODIS, and VIIRS. Implemented through a sophisticated processing chain, OC-CCI intends to provide a global Chla concentration product of the highest quality, particularly for Chla retrievals from case 2 waters at global scale, although these may not be the latest.24 Although inevitable biases exist in the OC-CCI products when matched against in situ observations, they are the most accessible and reliable Chla concentration product.

This paper explores the use of the CNN regression method for the end-to-end estimation of the global Chla concentration from the imagery recorded by the MODIS sensor on board the Aqua satellite. Unlike approaches that build the relationship between reflectance and Chla concentration obtained from the SeaBASS dataset or cruise measurements,25,26 the CNN method here derives the relationship between reflectance and the Chla concentration given by the OC-CCI Chla image, which can provide abundant training samples for effective training of the CNN model. MODIS remote sensing reflectance (Rrs) images and OC-CCI monthly Chla concentration images are used as input and ground truth, respectively. Nearly 15,000,000 samples are selected for training the patch-based CNN regression model. A total of 12 monthly global Chla concentration images are produced by the CNN and support vector regression (SVR), respectively. Qualitative and quantitative analyses are used for evaluation and comparison, and the results imply that the CNN model can be successfully used to estimate the global Chla concentration. This effort represents an initial attempt to estimate Chla concentration using a CNN.

2. CNN

A CNN usually takes an image as input and outputs its corresponding class label. CNN architectures typically comprise several types of layers: convolutional layers, pooling layers, and fully connected layers. A slightly modified version of the classical LeNet12 architecture is shown in Fig. 1 to explain the CNN.

Fig. 1

Structure of a CNN that is slightly different from the classical LeNet. The dimensions of the 3-D X and a are channel × height × width. The kernels W in each convolution layer are four-dimensional, and the number before the @ symbol stands for the number of 3-D kernels, so the dimensions of the four-dimensional kernels W are number × channel × height × width.


Convolutional layer 1 takes an RGB image X and kernels W(1), which consist of weights, as inputs and outputs feature maps a(1). The size of X is channel×height×width = 3×28×28, and the size of W(1) is number×channel×height×width = 20×3×5×5, which means that there are 20 three-dimensional (3-D) kernels. Kernels must have the same number of channels as the corresponding input image. In general, there are multiple filters, so multiple feature maps are generated and stacked to form a multilayer feature map.27,28 Each feature map is obtained by the dot product between the receptive field and the kernel as the kernel scans the input image according to specific rules. The weights in the kernel are learned by training. The process of convolutional layer 1 can be expressed as

Eq. (1)

a(1)=f[W(1)*X+B(1)],
where the operation of convolution is denoted by * and f(·) denotes the activation function. B(1) is the bias of the first convolutional layer, with the same size as a(1). The height and width of the feature maps a(1) are decided by the height and width of the kernels W(1) and the step size of the kernels, named the stride. They are calculated as

Eq. (2)

a_height(1) = [X_height − W_height(1)]/stride + 1,

Eq. (3)

a_width(1) = [X_width − W_width(1)]/stride + 1.
The stride of W(1) is 1, so the height and width of the feature maps a(1) are (28 − 5)/1 + 1 = 24. The size of a(1) is channel×height×width = 20×24×24, and the channel equals the number in the kernels W(1). Thus, the 20 feature maps a(1) can be treated as a 3-D image, like X. The activation function applies a nonlinear transformation [e.g., sigmoid, tanh, rectified linear unit (ReLU), etc.] to each element in the output of the convolutional process.
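The size rule of Eqs. (2) and (3) can be sketched as a small helper function; the function name is illustrative.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1) -> int:
    """Spatial output size of a valid (no-padding) convolution:
    (input - kernel) / stride + 1, as in Eqs. (2) and (3)."""
    return (input_size - kernel_size) // stride + 1

# The Fig. 1 example: a 28x28 input convolved with 5x5 kernels at stride 1
# yields 24x24 feature maps.
print(conv_output_size(28, 5))  # 24
```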

Conventionally, each convolutional layer is followed by a pooling layer in order to reduce the number of parameters to learn and to reduce the variance of the features,29–31 which can be accomplished by operations such as maximum, average, etc. No weights need to be learned in this layer. The process of pooling layer 1 can be expressed as

Eq. (4)

a(2)=down[a(1)],
where down[·] represents downsampling. In general, a pooling layer halves the size of its input, so the size of a(2) is channel×height×width = 20×12×12. The processes of convolutional layer 2 and pooling layer 2 are the same as those of convolutional layer 1 and pooling layer 1, which can be expressed as

Eq. (5)

a(3)=f[W(3)*a(2)+B(3)],

Eq. (6)

a(4)=down[a(3)].

A fully connected layer takes all neurons in the input layer, whether a convolutional layer or a fully connected layer, and connects them to each neuron in its own layer. It is regularly used as the last few layers in a CNN architecture and yields as many outputs as labels to a classifier layer. The process of the two fully connected layers can be expressed as

Eq. (7)

a(5)=f[W(5)·a(4)(:)+B(5)],

Eq. (8)

a(6)=W(6)·a(5)+B(6).
In fully connected layer 1, the 3-D a(4) is flattened to a vector, with the size changing from channel×height×width = 50×4×4 to 50×4×4 = 800. An inner product is performed between W(5) and the flattened a(4), and the size of B(5) is the same as that of a(5). Fully connected layer 2 is similar but without an activation function, and its output is set to 10 neurons according to the number of classes, i.e., the 10 digits in the handwritten and machine-printed character recognition task of LeNet. In the last layer, a softmax function σ(·) is used as the classifier to generate the label Ŷ. Based on the equations above, the output of the CNN is formulated as

Eq. (9)

Y^=σ{W(6)·f[W(5)·down{f[W(3)*down{f[W(1)*X+B(1)]}+B(3)]}(:)+B(5)]+B(6)}.
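The forward pass of Eq. (9) can be sketched in PyTorch. This is a minimal sketch of the Fig. 1 LeNet-like classifier; the hidden width of fully connected layer 1 (500 here) is an assumption, since the text gives only the flattened input size (50×4×4 = 800) and the 10-class output.

```python
import torch
import torch.nn as nn

class LeNetLike(nn.Module):
    """Sketch of the Fig. 1 architecture; hidden width 500 is assumed."""
    def __init__(self, n_classes: int = 10, hidden: int = 500):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 20, kernel_size=5),   # 3x28x28 -> 20x24x24
            nn.ReLU(),
            nn.MaxPool2d(2),                   # -> 20x12x12
            nn.Conv2d(20, 50, kernel_size=5),  # -> 50x8x8
            nn.ReLU(),
            nn.MaxPool2d(2),                   # -> 50x4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                      # -> 800
            nn.Linear(50 * 4 * 4, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),      # softmax applied by the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = LeNetLike()(torch.randn(1, 3, 28, 28))
print(logits.shape)  # torch.Size([1, 10])
```

In practice, the softmax σ(·) is usually folded into the cross-entropy loss rather than applied inside the network.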

3. Methodology

In this paper, a patch-based CNN regression method is used to estimate global Chla concentration from MODIS imagery. The patch-based regression CNN model scans through the images and crops a patch at each location of the scan. For each patch, it uses the regression CNN to predict the Chla concentration value of the pixel at the patch center. The model then assembles all the predicted values to form the global Chla concentration map. The complete procedure is shown in Fig. 2. It consists of three parts: the preprocessing of the input data, the training of the CNN model, and the prediction using the trained CNN, which follows the same procedure as testing. The performance of the CNN method is evaluated on the whole dataset.

Fig. 2

Flowchart of the proposed algorithm.


3.1. Data

To train and evaluate the CNN model for the task of Chla concentration estimation, the Aqua MODIS level 3 monthly standard mapped image Rrs and the OC-CCI monthly Chla concentration in version 3.1 are used as input and ground truth, respectively. The monthly data of January 2016 are used for training and validation, and the monthly data of the whole year (including January) are used for testing. Details are shown in Table 2. Usually, there are only training and testing datasets for an inverse problem. Because CNNs have a large number of parameters, it is common practice to use a validation dataset to help tune them. The CNN model is validated after a certain number of iterations by calculating the cost function on the validation data using the current model. The CNN with the smallest validation error is selected as the trained CNN.

Table 2

Details of the dataset used for training, validating, and testing.

Set | Input | Ground truth
Training | January 2016 MODIS Rrs 443, 488, 547, and 667 | January 2016 OC-CCI
Validation | January 2016 MODIS Rrs 443, 488, 547, and 667 | January 2016 OC-CCI
Testing | January 2016 MODIS Rrs 443, 488, 547, and 667 | January 2016 OC-CCI
 | February 2016 MODIS Rrs 443, 488, 547, and 667 | February 2016 OC-CCI
 | March 2016 MODIS Rrs 443, 488, 547, and 667 | March 2016 OC-CCI
 | April 2016 MODIS Rrs 443, 488, 547, and 667 | April 2016 OC-CCI
 | May 2016 MODIS Rrs 443, 488, 547, and 667 | May 2016 OC-CCI
 | June 2016 MODIS Rrs 443, 488, 547, and 667 | June 2016 OC-CCI
 | July 2016 MODIS Rrs 443, 488, 547, and 667 | July 2016 OC-CCI
 | August 2016 MODIS Rrs 443, 488, 547, and 667 | August 2016 OC-CCI
 | September 2016 MODIS Rrs 443, 488, 547, and 667 | September 2016 OC-CCI
 | October 2016 MODIS Rrs 443, 488, 547, and 667 | October 2016 OC-CCI
 | November 2016 MODIS Rrs 443, 488, 547, and 667 | November 2016 OC-CCI
 | December 2016 MODIS Rrs 443, 488, 547, and 667 | December 2016 OC-CCI

The Rrs products are obtained from the NASA ocean color website.32 These data have a monthly temporal resolution and a 4-km (at the equator) spatial resolution with global coverage in an equirectangular projection. The atmospheric correction algorithm employed by NASA has been described in detail in previous studies.33–35 We download Rrs images at 443, 488, 547, and 667 nm for each month of 2016. These wavelengths are selected because they are the main input data for the OC3 and CI algorithms used in the OC-CCI processing chain. In total, 48 images, 4 for each month, are used, all with the same image size of 4320×8640.

On the other hand, 12 images from the year 2016, one for each month, derived from the OC-CCI Chla concentration product, are used as ground truth. All Chla images have the same spatial resolution, coverage, map projection, and size as the Rrs images described above, and as such they can readily be used for establishing the relationship between the MODIS images and the OC-CCI Chla concentration. In addition, according to the OC-CCI processing chain, all available ocean color satellites should be used; for the Chla concentration product in 2016, MODIS and VIIRS are the only data available to merge, owing to the service periods. Further information about the Chla data can be found in Ref. 36.

3.2. Preprocessing

The ground-truth OC-CCI Chla images are logarithmically transformed due to the heavy-tailed distribution of Chla.37–40 The four input MODIS Rrs images associated with a ground-truth image are scaled to the same range as the log-transformed Chla (Table 3). Compared with the normalization operation,41–43 the logarithmic transformation proves in this study to be a more effective way to improve the performance of a CNN. Pixels flagged for land, clouds, failure in atmospheric correction, stray light, etc., are assigned the value NaN and excluded from training.
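The preprocessing above can be sketched as follows. This is a minimal sketch under assumptions: the Chla image is log10-transformed, and each Rrs band is linearly rescaled so that it spans the same range as the log-transformed Chla; the function and variable names are illustrative.

```python
import numpy as np

def preprocess(rrs_bands, chla):
    """Log10-transform Chla and rescale each Rrs band to the same range.

    rrs_bands: list of 2-D arrays (one per band); chla: 2-D array.
    NaN pixels (flagged land, clouds, etc.) are ignored when computing
    the ranges and stay NaN in the outputs.
    """
    log_chla = np.log10(chla)                       # heavy-tailed -> ~normal
    lo, hi = np.nanmin(log_chla), np.nanmax(log_chla)
    scaled = []
    for band in rrs_bands:
        b_lo, b_hi = np.nanmin(band), np.nanmax(band)
        scaled.append((band - b_lo) / (b_hi - b_lo) * (hi - lo) + lo)
    return np.stack(scaled), log_chla
```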

Table 3

Details of five images before and after preprocessing.

 | Before preprocessing | | | | After preprocessing | | |
 | Max | Min | Mean | Mid | Max | Min | Mean | Mid
Rrs 443 | 0.0658 | 0.00096 | 0.0079 | 0.0078 | 6.5752 | 0 | 0.78917 | 0.7762
Rrs 488 | 0.0738 | 0.000104 | 0.006 | 0.0062 | 7.384 | 0.0104 | 0.59786 | 0.6154
Rrs 547 | 0.0635 | 0.000802 | 0.0022 | 0.0019 | 6.3532 | 0.0802 | 0.22099 | 0.1878
Rrs 667 | 0.0342 | −0.003 | 0.000225 | 0.000168 | 3.4212 | 0 | 0.02247 | 0.0168
Chla | 96.3701 | 0.0012 | 0.2367 | 0.1353 | 1.9839 | −2.9345 | −0.8374 | −0.8687

An image patch of 15×15 pixels (60 km × 60 km) scans the whole Rrs images to generate the samples used in this study. Because the Chla image has the same size as the Rrs images, it is convenient to obtain the corresponding Chla concentration located at the patch center. If an image patch contains NaN pixels or lies outside the boundary, it is discarded. A total of 18,566,240 sample pairs are created, each of which consists of a 3-D 4×15×15 input image patch and a scalar output label. Validation is crucial to train a strong CNN model, so nearly 3,000,000 samples are selected for validation and the remaining 15,000,000 samples are used for training. The preprocessing procedure is the same during testing.
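The patch sampling described above can be sketched as follows, under the stated rules: a 15×15 window scans the 4-band Rrs stack, windows containing NaN pixels or falling outside the image are discarded, and the label is the Chla value at the patch center. Names are illustrative.

```python
import numpy as np

def extract_patches(rrs, chla, patch=15):
    """rrs: (4, H, W) array, chla: (H, W) array -> list of (patch, label)."""
    half = patch // 2
    _, h, w = rrs.shape
    samples = []
    for i in range(half, h - half):          # centers with a full window
        for j in range(half, w - half):
            window = rrs[:, i - half:i + half + 1, j - half:j + half + 1]
            label = chla[i, j]
            if np.isnan(window).any() or np.isnan(label):
                continue                      # skip flagged pixels
            samples.append((window, label))
    return samples
```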

3.3. Structure of the Patch-Based Regression CNN

The structure of the patch-based regression CNN is shown in Fig. 3. The main difference from Fig. 1 is the last layer, which is specifically designed to output only one value to solve the regression problem. The CNN consists of two convolutional layers, two pooling layers, and two fully connected layers. The strategy for tuning the parameters to obtain this structure is explained in Sec. 5.

Fig. 3

Structure of the patch-based regression CNN. The dimensions of the 3-D X and a are channel × height × width. The kernels W in each convolution layer are four-dimensional, and the number before the @ symbol stands for the number of 3-D kernels, so the dimensions of the four-dimensional kernels W are number × channel × height × width. The red T(n) is the residual error in the n'th layer during backpropagation, which has the same size as the corresponding a.


The input X is an image patch of size 4×15×15. The output of convolutional layer 1 is the 3-D feature map a(1) of size 20×13×13, which results from convolving the input patch with the kernels W(1) at stride 1. A max-pooling layer follows to reduce the size to 20×6×6; the maximum operation is chosen in the pooling layer owing to its simplicity and effectiveness. Convolving a(2) with the kernels W(3) at stride 1 yields the feature map a(3) of size 50×4×4, and a subsequent max-pooling yields a(4) of size 50×2×2. The first fully connected layer takes an input of dimension 50×2×2 = 200 and outputs 100 neurons. Unlike the classical LeNet, which employs the sigmoid activation function, the ReLU activation function, max{0, z}, is used in each convolutional layer and in the first fully connected layer because it helps accelerate convergence during training.44 Unlike in the classification problem, where the output size of the last fully connected layer is the number of classes, to address the regression here the last fully connected layer takes the 100 neurons as input and yields one neuron, which corresponds to the Chla concentration value.
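The architecture above can be sketched in PyTorch. This is a minimal sketch assuming 3×3 kernels, the only size consistent with the stated feature-map sizes (15 → 13 → 6 → 4 → 2); the class name is illustrative.

```python
import torch
import torch.nn as nn

class ChlaRegressionCNN(nn.Module):
    """Sketch of the Fig. 3 patch-based regression CNN."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 20, kernel_size=3),   # 4x15x15 -> 20x13x13
            nn.ReLU(),
            nn.MaxPool2d(2),                   # -> 20x6x6
            nn.Conv2d(20, 50, kernel_size=3),  # -> 50x4x4
            nn.ReLU(),
            nn.MaxPool2d(2),                   # -> 50x2x2
            nn.Flatten(),                      # -> 200
            nn.Linear(200, 100),
            nn.ReLU(),
            nn.Linear(100, 1),                 # single Chla value
        )

    def forward(self, x):
        return self.net(x)

y = ChlaRegressionCNN()(torch.randn(8, 4, 15, 15))
print(y.shape)  # torch.Size([8, 1])
```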

3.4. Training and Testing

After building the CNN, a loss function needs to be defined, which is minimized by training. In place of the softmax loss, which is regularly adopted in classification CNNs, the Euclidean loss is adopted for this regression CNN to minimize the error between the CNN output and the ground-truth Chla concentration provided by the OC-CCI Chla image. The loss function is

Eq. (10)

L(W,B) = (1/2N) Σ_{n=1}^{N} (Y_n − Ŷ_n)²,
where W represents the weights in the kernels, B the biases, and N the number of samples in each mini-batch, a small subset of the training samples.45 Ŷ is the output of the CNN with the current parameters, and Y is the ground-truth Chla concentration from the OC-CCI Chla image. Ŷ is calculated as

Eq. (11)

Y^=W(6)·f[W(5)·down{f[W(3)*down{f[W(1)*X+B(1)]}+B(3)]}(:)+B(5)]+B(6),
which is similar to Eq. (9) in Sec. 2 but without the softmax layer. With the loss function defined, the CNN is trained to minimize the cost using the stochastic gradient descent (SGD) optimization algorithm.46,47 The update rule for the weights W in SGD is given as

Eq. (12)

W_(i+1) = W_i + ΔW_(i+1),

Eq. (13)

ΔW_(i+1) = αΔW_i − ε(∂L/∂W + λW_i),
where W_(i+1) and ΔW_(i+1) are the weights and the weight update at iteration i+1, the learning rate ε decides how much to learn in each step, the momentum α, motivated by a physical perspective on the optimization problem, accelerates the convergence rate, and the weight decay λ helps to avoid overfitting.
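The SGD update of Eqs. (12) and (13), momentum plus weight decay, can be sketched as a single step; here grad stands for ∂L/∂W computed on a mini-batch, and the function name is illustrative.

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=1e-4, momentum=0.9, weight_decay=5e-4):
    """One SGD update: v <- a*v - eps*(dL/dW + lambda*W); W <- W + v."""
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity
```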

In Fig. 3, T(n) is the residual error in the n'th layer, which has the same size as the corresponding a. The gradient ∂L/∂W is calculated in each layer from back to front. The ∂L/∂W(6) in fully connected layer 2 is computed as

Eq. (14)

∂L/∂W(6) = T(6) · a(5)^T,

Eq. (15)

T(6) = Y − Ŷ.
The ∂L/∂W(5) in fully connected layer 1 is computed as

Eq. (16)

∂L/∂W(5) = T(5) · a(4)^T,

Eq. (17)

T(5) = f′[W(6)^T · T(6)],
where f′(·) represents the derivative of the ReLU. No weights or biases need to be trained in the pooling layers, and the ∂L/∂W(3) in convolutional layer 2 is computed as

Eq. (18)

∂L/∂W(3) = T(3) ⊛ a(2)^T,

Eq. (19)

T(3) = f′{up[T(4)]},

Eq. (20)

T(4) = W(5)^T · T(5),
where ⊛ represents the reverse convolution operation and up(·) represents upsampling. The ∂L/∂W(1) in convolutional layer 1 is computed as

Eq. (21)

∂L/∂W(1) = T(1) ⊛ X^T,

Eq. (22)

T(1) = f′{up[T(2)]},

Eq. (23)

T(2) = W(3)^T ⊛ T(3).
The update rule of the bias B is the same as that of the weights W, and the ∂L/∂B in each layer is computed as

Eq. (24)

∂L/∂B(6) = T(6),

Eq. (25)

∂L/∂B(5) = T(5),

Eq. (26)

∂L/∂B(3) = T(3),

Eq. (27)

∂L/∂B(1) = T(1).

We train the CNN for 100,000 iterations with a mini-batch size of 128. During training, validation is carried out every 1000 iterations, using 3000 forward passes on validation mini-batches of size 1000. All the parameters for this CNN are chosen after comprehensive experiments with different values, and the setting with the best validation performance is selected. The learning rate, momentum, and weight decay are set to 0.0001, 0.9, and 0.0005, respectively. Once the patch-based CNN model is trained, the Chla concentration is estimated by applying the trained model to the Rrs images pixel by pixel. The testing results are evaluated against the corresponding Chla images.
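The training configuration above can be sketched with PyTorch's built-in SGD (learning rate 0.0001, momentum 0.9, weight decay 0.0005) and the Euclidean (mean-squared-error) loss of Eq. (10); the placeholder linear model and the synthetic mini-batch here are illustrative stand-ins for the regression CNN and the real patch samples.

```python
import torch

# Placeholder model: any network mapping a 4x15x15 patch to one value.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(4 * 15 * 15, 1))

# Hyperparameters as stated in the text.
optimizer = torch.optim.SGD(
    model.parameters(), lr=1e-4, momentum=0.9, weight_decay=5e-4
)
loss_fn = torch.nn.MSELoss()   # Euclidean loss of Eq. (10), up to a constant

x = torch.randn(128, 4, 15, 15)   # one mini-batch of 128 patches
y = torch.randn(128, 1)           # ground-truth log10 Chla values (synthetic)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```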

3.5. Comparison with SVR

For the purpose of comparison, an SVR is adopted to estimate the Chla concentration using the same data source as the CNN method. The SVR solves complex regression problems by computing a linear regression function in a higher-dimensional feature space into which the input data are mapped by a nonlinear function. The SVR is implemented using the fitrsvm function in MATLAB. A grid search is used to tune the soft margin parameter and the Gaussian kernel parameter to maximize performance; the two tuned parameters are 1 and 2, respectively. Unlike the patch-based CNN method, the SVR estimates the Chla concentration in a pixel-based manner. Since the MATLAB SVR cannot handle large training data, only 50,000 representative samples are adopted for training.
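An equivalent pixel-based SVR baseline can be sketched with scikit-learn in place of MATLAB, assuming an RBF (Gaussian) kernel and the tuned parameter values reported above (soft margin 1, kernel parameter 2); note that the exact parameterization of the Gaussian kernel differs between libraries, and the data here are synthetic placeholders for the per-pixel Rrs spectra.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                 # per-pixel spectra (4 Rrs bands)
y = X @ np.array([0.5, -0.2, 0.1, 0.3])       # synthetic regression target

# RBF-kernel SVR with the reported tuned parameters (assumed mapping).
svr = SVR(kernel="rbf", C=1.0, gamma=2.0)
svr.fit(X, y)
pred = svr.predict(X)                         # one value per pixel
```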

3.6. Numerical Assessment of the Method

To numerically assess the agreement between the Chla concentration values produced by the CNN model and the ground-truth OC-CCI Chla concentration values, a set of statistical indices is used: the coefficient of determination [R²; Eq. (28)], the root-mean-square error [RMSE; Eq. (29)], the mean bias [Eq. (30)], the mean absolute percentage error [MAPE; Eq. (31)], and the mean absolute error [MAE; Eq. (32)].

Eq. (28)

R² = {Σ[(Chla_CNN,i − mean(Chla_CNN)) × (Chla_OC-CCI,i − mean(Chla_OC-CCI))] / (N × σ_Chla_CNN × σ_Chla_OC-CCI)}²,

Eq. (29)

RMSE = √[Σ(Chla_CNN − Chla_OC-CCI)² / N],

Eq. (30)

bias = Σ(Chla_CNN − Chla_OC-CCI) / N,

Eq. (31)

MAPE = [Σ(|Chla_CNN − Chla_OC-CCI| / Chla_OC-CCI) / N] × 100,

Eq. (32)

MAE = Σ|Chla_CNN − Chla_OC-CCI| / N,
where Chla_CNN denotes the Chla concentration values estimated by the CNN model and Chla_OC-CCI denotes the ground-truth OC-CCI Chla concentration values. R² measures the correlation between the predicted and ground-truth datasets, and RMSE measures the absolute error.48 Pixels with NaN values in the CNN prediction map or the OC-CCI Chla ground-truth map are not used in calculating the above measures. Moreover, because the natural distribution of Chla is lognormal, the evaluation is done in log10 space.43,49,50
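The measures of Eqs. (28)–(32) can be sketched as follows. NaN pixels are dropped and the comparison is done in log10 space, as described above; whether MAPE is computed in log or linear space is not fully specified, so log space is assumed here, and the function name is illustrative.

```python
import numpy as np

def evaluate(pred, truth):
    """Compute R2, RMSE, bias, MAPE, and MAE in log10 space.

    pred, truth: arrays of linear Chla values; NaN pixels are excluded.
    """
    mask = ~(np.isnan(pred) | np.isnan(truth))
    p, t = np.log10(pred[mask]), np.log10(truth[mask])
    n = p.size
    r2 = (np.sum((p - p.mean()) * (t - t.mean())) / (n * p.std() * t.std())) ** 2
    return {
        "R2": r2,
        "RMSE": np.sqrt(np.mean((p - t) ** 2)),
        "bias": np.mean(p - t),
        "MAPE": np.mean(np.abs(p - t) / np.abs(t)) * 100,
        "MAE": np.mean(np.abs(p - t)),
    }
```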

4. Results

Using the patch-based CNN method, 12 monthly global Chla concentration images are predicted, and the prediction of each image takes about 2 h. A one-month Chla concentration map estimated from the Rrs images by the CNN model and the SVR model, together with the corresponding OC-CCI ground-truth image, is shown in Fig. 4. For both the SVR and CNN predictions, values below 0.001 or above 100 mg m−3 (representing <0.001% of the Chla estimates) are removed, limiting the product to the same range as the OC-CCI ground-truth image. From Figs. 4(a) and 4(b), we can see that the prediction map achieved by the CNN model has a strong spatial consistency with the OC-CCI ground-truth Chla concentration map, which is produced by assimilating various sources of information, such as the proximity to land, the depth of the ocean, and ocean currents.51 Figure 6(a) shows the scatter plot of the predicted values achieved by the CNN model against the ground-truth OC-CCI Chla concentration values. In a scatter plot, good performance means that the points lie close to the equal-value line. Figure 6(a) shows that the majority of data points are distributed around the equal-value line, indicating that the CNN method has the capacity to predict very accurately the Chla concentration information in the OC-CCI data by learning the complex relationship between the Rrs spectra and the OC-CCI data.

Fig. 4

Spatial distribution of the monthly global Chla concentration in March 2016 (a) obtained using the CNN method, (b) from the OC-CCI data, and (c) obtained using the SVR method. In general, Figs. 4(a) and 4(c) show strong spatial consistency with Fig. 4(b). However, in the three red rectangle areas, Fig. 4(c) is much deeper red than Fig. 4(a), indicating heavier overestimation in the prediction map achieved by the SVR method than in that of the CNN model.


In Table 4, we can see that the R² achieved by the CNN model is between 0.879 and 0.918, with a mean of 0.901, indicating a high consistency between the Chla concentration values predicted by the CNN model and the ground-truth OC-CCI values. The other numerical measures also indicate the small error and high accuracy of the proposed CNN model: the overall RMSE of the estimated Chla concentration is 0.129, and the maximum bias and MAPE are 0.054 and 26.687, respectively. The maximum and minimum MAE are 0.153 and 0.102, respectively, with a mean value of 0.125. We want to highlight the fact that although the proposed CNN model is trained only on the January data, it achieves stably good performance on the other 11 months' data, indicating the strong generalization capability of the CNN model.

Table 4

Statistical results for 12-month Chla concentration estimated by the CNN model.

Month | R² | RMSE | Bias | MAPE | MAE
1 | 0.915 | 0.123 | 0.008 | 23.519 | 0.117
2 | 0.884 | 0.148 | 0.018 | 24.997 | 0.124
3 | 0.893 | 0.121 | 0.004 | 21.454 | 0.153
4 | 0.914 | 0.116 | 0.007 | 20.020 | 0.102
5 | 0.918 | 0.126 | 0.005 | 18.946 | 0.151
6 | 0.912 | 0.128 | 0.013 | 20.669 | 0.118
7 | 0.902 | 0.128 | 0.038 | 26.480 | 0.117
8 | 0.899 | 0.134 | 0.050 | 26.362 | 0.123
9 | 0.883 | 0.146 | 0.032 | 23.757 | 0.127
10 | 0.918 | 0.114 | 0.049 | 25.243 | 0.112
11 | 0.895 | 0.124 | 0.035 | 25.966 | 0.141
12 | 0.879 | 0.145 | 0.054 | 26.687 | 0.119
Mean | 0.901 | 0.129 | 0.016 | 23.675 | 0.125

Figure 7 shows the RMSE and R² performance comparison between the CNN model and the SVR method on the 12-month Chla concentration estimation. We can see that the R² curve achieved by the CNN model is above that of the SVR method, indicating that the CNN outperforms the SVR over all 12 monthly predictions. Moreover, the RMSEs achieved by the CNN are also consistently lower than those of the SVR. The mean RMSE and R² over the 12 months are (0.129, 0.901) for the CNN and (0.175, 0.828) for the SVR, respectively. Figures 4(c) and 5(c) show that the color of some case 2 water areas is much deeper red than in Figs. 4(a) and 5(a), i.e., the Yellow Sea and East China Sea (red rectangle 1), the English Channel and the south of the North Sea (red rectangle 2), and the east of the Bering Sea (red rectangle 3), which means that heavier overestimation occurred in the prediction map achieved by the SVR method than in that of the CNN model. Figure 6 supports the visual analysis above: although both the CNN model and the SVR method overestimate in the high Chla concentration areas, the SVR method performs much worse. From Fig. 6, we can also see that the CNN model outperforms the SVR method in the low Chla concentration areas.

Fig. 5

Zoomed-in regions of the corresponding red rectangle areas in Fig. 4: (a) CNN, (b) OC-CCI, and (c) SVR.


Fig. 6

Scatter plots between (a) the Chla concentration values estimated by the CNN model and the ground-truth OC-CCI Chla concentration values and (b) the values achieved by the SVR method and the OC-CCI values. The SVR approach tends to overestimate when the Chla concentration is high, which is consistent with Figs. 4 and 5, where the SVR results have higher values than the ground truth in high Chla concentration areas.


Fig. 7

Performance comparison of the CNN and SVR for the 12-month Chla concentration estimation based on RMSE and R².


5. Discussion

Based on the above experimental results, we can conclude that the CNN approach learns the complex relationship between the Rrs and the OC-CCI Chla values better than the SVR, demonstrating that the CNN may be used to build an end-to-end approach for efficient Chla concentration estimation. Song et al.52 proposed a novel inversion method for rough surface parameters using a CNN, in which microwave images of rough surfaces are used for training. In Ref. 53, a CNN-based method is employed to establish a solar flare forecasting model, in which forecasting patterns are learned from line-of-sight magnetograms of solar active regions; magnetograms observed by SOHO/MDI and SDO/HMI and the corresponding soft x-ray solar flares observed by GOES form the training dataset. CNNs have thus previously been used for rough surface parameter and solar flare inversion. Here we demonstrate that they can also be used for Chla concentration inversion by learning the complex nonlinear relationship between the Rrs and the OC-CCI Chla values.

CNNs have a large number of hyperparameters, and in this paper the patch size, kernel size, number of kernels, number of neurons in the fully connected layers, number of layers, and training parameters need to be tuned to obtain good performance for estimating global Chla concentration. Adopting a grid search to find the best combination of hyperparameters would be very time-consuming. In this experiment, the hyperparameters are tuned with the following strategy. The CNN structure in Fig. 1 follows the classical LeNet, and the default hyperparameters in this experiment are based on the parameters in Fig. 1. The number of layers is first changed manually from Fig. 1 to generate a few experimental groups with different depths. Then the other hyperparameters in each group are tuned to find the best-performing CNN model, which has the structure described in Fig. 3.

In this paper, only January 2016 data are used for training and the whole-year 12-month data for testing. The CNN still works very well, indicating strong robustness and generalization capability, so it may be used for predicting long-term Chla concentration without the need to fine-tune on new observations. The mean R2 of the CNN is 0.901, so its predictions are very close to the OC-CCI Chla concentration; using MODIS data only, the CNN approach can approximate the OC-CCI products. This is probably due to the high learning capability of the CNN, which can fully exploit the information in MODIS through the complex nonlinear relationship between the Rrs and the OC-CCI Chla values. In addition, the OC-CCI Chla concentration dataset is adopted as the ground truth because it is a well-recognized Chla concentration product produced by assimilating different satellite data through a complex series of data processing steps. An Nvidia Quadro M2000 GPU is used in this study, with 4 GB of GPU memory, 768 NVIDIA CUDA cores, and a boost clock of 1180 MHz. A more powerful GPU such as the Nvidia GeForce GTX 1080 Ti, with 11 GB of GPU memory, 3584 CUDA cores, and a boost clock of 1582 MHz, would help deal with big data. With a better GPU, more GPUs, and batch processing during the prediction stage, the processing time can be shortened further.
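The log RMSE and R2 reported above are computed in log10 space, as is standard for Chla because of its lognormal distribution (Ref. 37). A minimal sketch of these two metrics, with illustrative values rather than the paper's actual test set:

```python
# Sketch of RMSE and R^2 computed on log10-transformed Chla values
# (the sample predictions/truths below are illustrative, not from the paper).
import math

def log_metrics(pred, truth):
    lp = [math.log10(p) for p in pred]
    lt = [math.log10(t) for t in truth]
    n = len(lt)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(lp, lt)) / n)
    mean_t = sum(lt) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(lp, lt))   # residual sum of squares
    ss_tot = sum((a - mean_t) ** 2 for a in lt)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return rmse, r2

rmse, r2 = log_metrics([0.11, 0.32, 1.1, 2.9], [0.10, 0.30, 1.00, 3.00])
```

Working in log10 space prevents the few very high coastal Chla values from dominating the error statistics of the much larger open-ocean area.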

Although the mean RMSE and R2 of the CNN are acceptable, overestimation by the CNN can be seen in Figs. 4(a), 5, and 6(a) in the high Chla concentration areas, which may be caused by the imbalanced quantities of case 1 and case 2 water training samples. During sample selection, we scan the whole global Chla image to train the CNN model, resulting in far more case 1 samples than case 2 samples. More comprehensive studies, such as combining sea surface temperature and digital bathymetry model data as input, are required in the next stage to solve the overestimation problem. Although OC-CCI is the most accessible and reliable Chla concentration product, uncertainties may exist that affect the performance of the model. In the future, if more reliable data become available, further studies will be conducted to train and evaluate the CNN model.

6.

Conclusion

In this study, a CNN method was applied to MODIS Rrs images to estimate global Chla concentration. The CNN took patches of four MODIS Rrs images as input and generated the Chla concentration directly. The OC-CCI Chla concentration image was used as ground truth for training. A total of 12 monthly global Chla concentration images were produced, with each image taking about 2 h to generate. Qualitative and quantitative analyses were used for evaluation and comparison, and the results imply that the CNN constitutes an accurate, fast, and robust method for estimating the global Chla concentration. Considering the big data characteristics of remotely sensed global Chla data, these properties of the CNN model are of great importance.

7.

Appendix

All experiments are conducted on a 64-bit Intel Xeon E5-2640 v3 workstation with a 2.6-GHz clock and 32 GB of RAM, with an Nvidia Quadro M2000 GPU with 4 GB of memory and CUDA 8.0. Caffe,54 short for convolutional architecture for fast feature embedding, a popular deep learning framework especially for CNNs, is used in this paper under Ubuntu 14.04 LTS. Caffe provides a convenient way to implement CNN models by defining the architecture, such as the number of layers, the type of each layer, and the optimization strategy. The preprocessing of the data and the numerical assessment are performed in MATLAB R2014a. The CNN in Caffe reads the image patches and their corresponding Chla values in the HDF5 format instead of the default Lightning Memory-Mapped Database format, to easily store multichannel images and float labels.
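As a sketch of the HDF5 packaging described above: Caffe's HDF5Data layer expects datasets named "data" and "label" and a plain-text file listing the HDF5 file paths. The patch size, band count, and random values below are illustrative placeholders, not the paper's actual training data.

```python
# Hypothetical sketch: packaging multichannel Rrs patches and float Chla
# labels into HDF5 for Caffe's HDF5Data layer. Shapes are illustrative.
import numpy as np
import h5py

n_patches, bands, patch = 100, 4, 13          # 4 Rrs bands; 13x13 patch size is illustrative
patches = np.random.rand(n_patches, bands, patch, patch).astype(np.float32)
labels = np.random.rand(n_patches, 1).astype(np.float32)   # float regression targets

with h5py.File("train.h5", "w") as f:
    f.create_dataset("data", data=patches)    # N x C x H x W multichannel images
    f.create_dataset("label", data=labels)    # float labels (not class indices)

# Caffe's HDF5Data layer reads the file paths from a plain-text list:
with open("train_h5_list.txt", "w") as f:
    f.write("train.h5\n")
```

This is what makes HDF5 preferable to LMDB here: LMDB's default image datum assumes integer labels, whereas HDF5 stores float regression targets and multichannel float images directly.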

Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments, which helped to improve the quality of this paper. The authors declare no conflicts of interest.

References

1. 

M. J. Behrenfeld and P. G. Falkowski, “Photosynthetic rates derived from satellite-based chlorophyll concentration,” Limnol. Oceanogr., 42 (1), 1 –20 (1997). https://doi.org/10.4319/lo.1997.42.1.0001 LIOCAH 0024-3590 Google Scholar

2. 

M.-E. Carr et al., “A comparison of global estimates of marine primary production from ocean color,” Deep Sea Res. Part II, 53 (5–7), 741 –770 (2006). https://doi.org/10.1016/j.dsr2.2006.01.028 Google Scholar

3. 

R. P. Stumpf, “Applications of Satellite Ocean color sensors for monitoring and predicting harmful algal blooms,” Hum. Ecol. Risk Assess., 7 (5), 1363 –1368 (2001). https://doi.org/10.1080/20018091095050 Google Scholar

4. 

Y.-J. Park, K. Ruddick and G. Lacroix, “Detection of algal blooms in European waters based on satellite chlorophyll data from MERIS and MODIS,” Int. J. Remote Sens., 31 (24), 6567 –6583 (2010). https://doi.org/10.1080/01431161003801369 IJSEDK 0143-1161 Google Scholar

5. 

P. Shanmugam, “A new bio-optical algorithm for the remote sensing of algal blooms in complex ocean waters,” J. Geophys. Res. Oceans, 116 (C4), (2011). https://doi.org/10.1029/2010JC006796 JGRCEY 0148-0227 Google Scholar

6. 

D. Blondeau-Patissier et al., “A review of ocean color remote sensing methods and statistical techniques for the detection, mapping and analysis of phytoplankton blooms in coastal and open oceans,” Prog. Oceanogr., 123 123 –144 (2014). https://doi.org/10.1016/j.pocean.2013.12.008 POCNA8 0079-6611 Google Scholar

7. 

J. I. Goes et al., “Warming of the Eurasian landmass is making the Arabian sea more productive,” Science, 308 (5721), 545 –547 (2005). https://doi.org/10.1126/science.1106610 SCIEAS 0036-8075 Google Scholar

8. 

C. R. McClain, “A decade of satellite ocean color observations,” Annu. Rev. Mar. Sci., 1 19 –42 (2009). https://doi.org/10.1146/annurev.marine.010908.163650 1941-1405 Google Scholar

9. 

W. W. Gregg and N. W. Casey, “Modeling coccolithophores in the global oceans,” Deep Sea Res. Part II, 54 (5–7), 447 –477 (2007). https://doi.org/10.1016/j.dsr2.2006.12.007 Google Scholar

10. 

G. Lacroix et al., “Validation of the 3D biogeochemical model MIRO&Co with field nutrient and phytoplankton data and MERIS-derived surface chlorophyll a images,” J. Mar. Syst., 64 (1–4), 66 –88 (2007). https://doi.org/10.1016/j.jmarsys.2006.01.010 JMASE5 0924-7963 Google Scholar

11. 

M. T. McCann, K. H. Jin and M. Unser, “Convolutional neural networks for inverse problems in imaging: a review,” IEEE Signal Process. Mag., 34 (6), 85 –95 (2017). https://doi.org/10.1109/MSP.2017.2739299 Google Scholar

12. 

Y. LeCun et al., “Gradient-based learning applied to document recognition,” Proc. IEEE, 86 (11), 2278 –2324 (1998). https://doi.org/10.1109/5.726791 IEEPAD 0018-9219 Google Scholar

13. 

A. Krizhevsky, I. Sutskever and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Adv. Neural Inf. Process. Syst., 1097 –1105 (2012). Google Scholar

14. 

C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 1 –9 (2015). https://doi.org/10.1109/CVPR.2015.7298594 Google Scholar

15. 

Y. LeCun, Y. Bengio and G. Hinton, “Deep learning,” Nature, 521 (7553), 436 (2015). https://doi.org/10.1038/nature14539 Google Scholar

16. 

K. He et al., “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 770 –778 (2016). https://doi.org/10.1109/CVPR.2016.90 Google Scholar

17. 

L. Zhang, L. Zhang and B. Du, “Deep learning for remote sensing data: a technical tutorial on the state of the art,” IEEE Geosci. Remote Sens. Mag., 4 (2), 22 –40 (2016). https://doi.org/10.1109/MGRS.2016.2540798 Google Scholar

18. 

X. X. Zhu et al., “Deep learning in remote sensing: a comprehensive review and list of resources,” IEEE Geosci. Remote Sens. Mag., 5 (4), 8 –36 (2017). https://doi.org/10.1109/MGRS.2017.2762307 Google Scholar

19. 

J. E. Ball, D. T. Anderson and C. S. Chan, “Comprehensive survey of deep learning in remote sensing: theories, tools, and challenges for the community,” J. Appl. Remote Sens., 11 (4), 042609 (2017). https://doi.org/10.1117/1.JRS.11.042609 Google Scholar

20. 

S. Lavender, T. Jackson and S. Sathyendranath, “The ocean colour climate change initiative: Merging ocean colour observations seamlessly,” Ocean Challenge, 21 29 –31 (2015). OCCHEZ Google Scholar

21. 

H. R. Gordon and A. Y. Morel, “Remote assessment of ocean color for interpretation of satellite visible imagery: a review,” Lect. Notes Coast. Estuarine Stud., 4 (1983). https://doi.org/10.1029/LN004 Google Scholar

22. 

J. E. O’Reilly et al., “Ocean color chlorophyll algorithms for SeaWiFS,” J. Geophys. Res. Oceans, 103 (C11), 24937 –24953 (1998). https://doi.org/10.1029/98JC02160 JGRCEY 0148-0227 Google Scholar

23. 

C. Hu, Z. Lee and B. Franz, “Chlorophyll a algorithms for oligotrophic oceans: A novel approach based on three-band reflectance difference,” J. Geophys. Res. Oceans, 117 (C1), C01011 (2012). https://doi.org/10.1029/2011JC007395 JGRCEY 0148-0227 Google Scholar

24. 

Plymouth Marine Laboratory, “Ocean colour climate change initiative product user guide,” (2016) http://www.esa-oceancolour-cci.org/?q=webfm_send/317 Google Scholar

25. 

M. W. Matthews, “A current review of empirical procedures of remote sensing in inland and near-coastal transitional waters,” Int. J. Remote Sens., 32 (21), 6855 –6899 (2011). https://doi.org/10.1080/01431161.2010.512947 IJSEDK 0143-1161 Google Scholar

26. 

D. Odermatt et al., “Review of constituent retrieval in optically deep and complex waters from satellite imagery,” Remote Sens. Environ., 118 116 –126 (2012). https://doi.org/10.1016/j.rse.2011.11.013 Google Scholar

27. 

L. Wang, K. A. Scott and D. A. Clausi, “Sea ice concentration estimation during freeze-up from sar imagery using a convolutional neural network,” Remote Sens., 9 (5), 408 (2017). https://doi.org/10.3390/rs9050408 RSEND3 Google Scholar

28. 

L. Wang et al., “Sea ice concentration estimation during melt from dual-pol SAR scenes using deep convolutional neural networks: a case study,” IEEE Trans. Geosci. Remote Sens., 54 (8), 4524 –4533 (2016). https://doi.org/10.1109/TGRS.2016.2543660 IGRSD2 0196-2892 Google Scholar

29. 

F. Hu et al., “Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery,” Remote Sens., 7 (11), 14680 –14707 (2015). https://doi.org/10.3390/rs71114680 RSEND3 Google Scholar

30. 

K. Nogueira, O. A. Penatti and J. A. dos Santos, “Towards better exploiting convolutional neural networks for remote sensing scene classification,” Pattern Recognit., 61 539 –556 (2017). https://doi.org/10.1016/j.patcog.2016.07.001 Google Scholar

31. 

E. Maggiori et al., “Convolutional neural networks for large-scale remote-sensing image classification,” IEEE Trans. Geosci. Remote Sens., 55 (2), 645 –657 (2017). https://doi.org/10.1109/TGRS.2016.2612821 IGRSD2 0196-2892 Google Scholar

33. 

H. R. Gordon and M. Wang, “Influence of oceanic whitecaps on atmospheric correction of ocean-color sensors,” Appl. Opt., 33 (33), 7754 –7763 (1994). https://doi.org/10.1364/AO.33.007754 APOPAI 0003-6935 Google Scholar

34. 

Z. Ahmad et al., “New aerosol models for the retrieval of aerosol optical thickness and normalized water-leaving radiances from the SeaWiFS and MODIS sensors over coastal regions and open oceans,” Appl. Opt., 49 (29), 5545 –5560 (2010). https://doi.org/10.1364/AO.49.005545 APOPAI 0003-6935 Google Scholar

35. 

S. W. Bailey, B. A. Franz and P. J. Werdell, “Estimation of near-infrared water-leaving reflectance for satellite ocean color data processing,” Opt. Express, 18 (7), 7521 –7527 (2010). https://doi.org/10.1364/OE.18.007521 OPEXFF 1094-4087 Google Scholar

37. 

J. W. Campbell, “The lognormal distribution as a model for bio-optical variability in the sea,” J. Geophys. Res. Oceans, 100 (C7), 13237 –13254 (1995). https://doi.org/10.1029/95JC00458 JGRCEY 0148-0227 Google Scholar

38. 

P. Cipollini et al., “Retrieval of sea water optically active parameters from hyperspectral data by means of generalized radial basis function neural networks,” IEEE Trans. Geosci. Remote Sens., 39 (7), 1508 –1524 (2001). https://doi.org/10.1109/36.934081 IGRSD2 0196-2892 Google Scholar

39. 

L. Xu, J. Li and A. Brenning, “A comparative study of different classification techniques for marine oil spill identification using radarsat-1 imagery,” Remote Sens. Environ., 141 14 –23 (2014). https://doi.org/10.1016/j.rse.2013.10.012 Google Scholar

40. 

N. Al-Naimi et al., “Evaluation of satellite retrievals of chlorophyll-a in the Arabian Gulf,” Remote Sens., 9 (3), 301 (2017). https://doi.org/10.3390/rs9030301 RSEND3 Google Scholar

41. 

V. Mnih, “Machine learning for aerial image labeling,” University of Toronto, (2013). Google Scholar

42. 

J. Long, E. Shelhamer and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit., 3431 –3440 (2015). https://doi.org/10.1109/CVPR.2015.7298965 Google Scholar

43. 

R. Alshehhi et al., “Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks,” ISPRS J. Photogramm. Remote Sens., 130 139 –149 (2017). https://doi.org/10.1016/j.isprsjprs.2017.05.002 IRSEE9 0924-2716 Google Scholar

44. 

V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn., 807 –814 (2010). Google Scholar

45. 

N. Mboga et al., “Detection of informal settlements from VHR images using convolutional neural networks,” Remote Sens., 9 (11), 1106 (2017). https://doi.org/10.3390/rs9111106 Google Scholar

46. 

L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proc. COMPSTAT, 177 –186 (2010). Google Scholar

47. 

Y. Bengio, “Practical recommendations for gradient-based training of deep architectures,” Lect. Notes Comput. Sci., 7700 437 –478 (2012). https://doi.org/10.1007/978-3-642-35289-8_26 LNCSD9 0302-9743 Google Scholar

48. 

L. G. Vilas, E. Spyrakos and J. M. T. Palenzuela, “Neural network estimation of chlorophyll a from MERIS full resolution data for the coastal waters of Galician Rias (NW Spain),” Remote Sens. Environ., 115 (2), 524 –535 (2011). https://doi.org/10.1016/j.rse.2010.09.021 Google Scholar

49. 

D. A. Siegel et al., “Regional to global assessments of phytoplankton dynamics from the SeaWiFS mission,” Remote Sens. Environ., 135 77 –91 (2013). https://doi.org/10.1016/j.rse.2013.03.025 Google Scholar

50. 

M. Woźniak et al., “Empirical model for phycocyanin concentration estimation as an indicator of cyanobacterial bloom in the optically complex coastal waters of the Baltic Sea,” Remote Sens., 8 (3), 212 (2016). https://doi.org/10.3390/rs8030212 RSEND3 Google Scholar

51. 

S. Dasgupta, R. P. Singh and M. Kafatos, “Comparison of global chlorophyll concentrations using MODIS data,” Adv. Space Res., 43 (7), 1090 –1100 (2009). https://doi.org/10.1016/j.asr.2008.11.009 ASRSDW 0273-1177 Google Scholar

52. 

T. Song et al., “Inversion of rough surface parameters from SAR images using simulation-trained convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., 15 1130 –1134 (2018). https://doi.org/10.1109/LGRS.2018.2822821 Google Scholar

53. 

X. Huang et al., “Deep learning based solar flare forecasting model. I. Results for line-of-sight magnetograms,” Astrophys. J., 856 (1), 7 (2018). https://doi.org/10.3847/1538-4357/aaae00 ASJOAB 0004-637X Google Scholar

54. 

Y. Jia et al., “Caffe: convolutional architecture for fast feature embedding,” in Proc. 22nd ACM Int. Conf. Multimedia, 675 –678 (2014). Google Scholar

Biography

Bowen Yu received his BSc, MEng, and PhD degrees from China University of Geosciences, Beijing, China, in 2012, 2015, and 2019, respectively, all in remote sensing. He is currently an engineer at China Academy of Electronics and Information Technology. His research interests are in the field of deep learning for remote sensing application.

Linlin Xu received his BEng and MSc degrees in geomatics engineering from the China University of Geosciences, Beijing, China, in 2007 and 2010, respectively, and his PhD in geography from the University of Waterloo, Waterloo, Ontario, Canada, in 2014. His research interests include machine learning, deep learning, and remote sensing image processing, especially hyperspectral, and synthetic aperture radar image processing.

Junhuan Peng received his PhD in geodesy from the Wuhan University, Wuhan, China, in 2003. Currently, he is a professor with the School of Land Science and Techniques, China University of Geosciences, Beijing, China. His research interests are in the areas of spatial statistics, robust estimation and their associated application in surveying engineering, image geodesy, remote sensing, and satellite geodesy.

Zhongzheng Hu received his BEng and MEng from China University of Geosciences, Beijing, China, in 2015 and 2019, both in surveying and mapping. Currently, he is an engineer at China Center for Resources Satellite Data and Application. His research interests are in the field of deep learning for remote sensing application.

Alexander Wong received his BASc degree in computer engineering, his MASc degree in electrical and computer engineering, and his PhD in systems design engineering from the University of Waterloo, Waterloo, Ontario, Canada, in 2005, 2007, and 2010, respectively. Currently, he is an assistant professor with the Department of Systems Design Engineering, University of Waterloo. He has authored refereed journal and conference papers, as well as patents, in various fields, such as computer vision, graphics, image processing, multimedia systems, and wireless communications. His research interests include image processing, computer vision, pattern recognition, and cognitive radio networks, with a focus on biomedical and remote sensing image processing and analysis such as image registration, image denoising and reconstruction, image super-resolution, image segmentation, tracking, and image and video coding and transmissions. He was a recipient of the Outstanding Performance Award, the Engineering Research Excellence Award, the Early Researcher Award from the Ministry of Economic Development and Innovation, the Best Paper Award by the Canadian Image Processing and Pattern Recognition Society, and the Alumni Gold Medal.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Bowen Yu, Linlin Xu, Jun-huan Peng, Zhongzheng Hu, and Alexander Wong "Global chlorophyll-a concentration estimation from moderate resolution imaging spectroradiometer using convolutional neural networks," Journal of Applied Remote Sensing 14(3), 034520 (8 September 2020). https://doi.org/10.1117/1.JRS.14.034520
Received: 7 May 2020; Accepted: 7 August 2020; Published: 8 September 2020
KEYWORDS: MODIS, Data modeling, Image resolution, Satellites, Convolution, Remote sensing, Convolutional neural networks
