TY - JOUR
T1 - Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes
AU - Hu, Shuibo
AU - Liu, Huizeng
AU - Zhao, Wenjing
AU - Shi, Tiezhu
AU - Hu, Zhongwen
AU - Li, Qingquan
AU - Wu, Guofeng
N1 - The research was supported by the National Key R&D Program of China (No. 2017YFC0506200), the Basic Research Program of Shenzhen Science and Technology Innovation Committee (No. JCYJ20151117105543692), National Natural Science Foundation of China grants (Grant Nos. 41606199, 41506202), the General Financial Grant from the China Postdoctoral Science Foundation (No. 2016M592521), the Open Project Program of State Key Laboratory of Tropical Oceanography, South China Sea Institute of Oceanology, Chinese Academy of Sciences (Project No. LTO1611), and the Shenzhen Future Industry Development Funding Program (No. 201507211219247860). We thank NASA GSFC for providing in situ data and SeaWiFS products, NOAA NODC for providing SST data, and CERSAT for providing wind stress products.
Publisher Copyright:
© 2018 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2018/3
Y1 - 2018/3
AB - The size of phytoplankton not only influences its physiology, metabolic rates, and the marine food web, but also serves as an indicator of phytoplankton functional roles in ecological and biogeochemical processes. Therefore, some algorithms have been developed to infer the synoptic distribution of phytoplankton cell size, denoted as phytoplankton size classes (PSCs), in surface ocean waters by means of remotely sensed variables. This study, using the NASA bio-Optical Marine Algorithm Data set (NOMAD) high performance liquid chromatography (HPLC) database and satellite match-ups, aimed to compare the effectiveness of modeling techniques, including partial least squares (PLS), artificial neural networks (ANN), support vector machine (SVM) and random forests (RF), and feature selection techniques, including genetic algorithm (GA), successive projection algorithm (SPA) and recursive feature elimination based on support vector machine (SVM-RFE), for inferring PSCs from remote sensing data. Results showed that: (1) SVM-RFE worked better in selecting sensitive features; (2) RF performed better than PLS, ANN and SVM in calibrating PSC retrieval models; (3) machine learning techniques produced better performance than the chlorophyll-a based three-component method; (4) sea surface temperature, wind stress, and spectral curvature derived from the remote sensing reflectance at 490, 510, and 555 nm were among the features most sensitive to PSCs; and (5) the combination of SVM-RFE feature selection and random forests regression was recommended for inferring PSCs. This study demonstrated the effectiveness of machine learning techniques in selecting sensitive features and calibrating models for PSC estimation with remote sensing.
KW - Feature selection
KW - Machine learning
KW - Phytoplankton size classes
KW - Random forest
KW - Remote sensing
UR - http://www.scopus.com/inward/record.url?scp=85044204949&partnerID=8YFLogxK
U2 - 10.3390/rs10030191
DO - 10.3390/rs10030191
M3 - Journal article
AN - SCOPUS:85044204949
SN - 2072-4292
VL - 10
JO - Remote Sensing
JF - Remote Sensing
IS - 3
M1 - 191
ER -