TY - JOUR
T1 - Effectiveness of Semi-Supervised Learning and Multi-Source Data in Detailed Urban Landuse Mapping with a Few Labeled Samples
AU - Sun, Bo
AU - Zhang, Yang
AU - Zhou, Qiming
AU - Zhang, Xinchang
N1 - Funding Information:
This work was supported by the Natural Science Foundation of China (NSFC) General Program [Grant number: 41971386], the Hong Kong Research Grant Council (RGC) General Research Fund [Grant number: HKBU 12301820] and Shenzhen Science and Technology Innovation Committee General Project [Grant number: JCYJ20210324101406019].
Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/2
Y1 - 2022/2
N2 - Detailed urban landuse information plays a fundamental role in smart city management. A sufficient sample size has been identified as a very crucial pre-request in machine learning algorithms for urban landuse classification. However, it is often difficult to recognize and label landuse categories from remote sensing images alone. Alternatively, field investigation is time-consuming with a high demand in human resources and monetary cost. Therefore, previous studies on urban landuse classification have often relied on a small size of labeled samples with very uneven spatial distribution. This study aims to explore the effectiveness of a semi-supervised classification framework with multi-source data for detailed urban landuse classification with a few labeled samples. A disagreement-based semi-supervised learning approach, the co-forest, was employed and compared with traditional supervised methods (e.g., random forest and XGBoost). Multi-source geospatial data were utilized including optical and nighttime light remote sensing and geospatial big data, which present the physical and socio-economic features of landuse categories. Taking urban landuse classification in Shenzhen City as a case, results show that the classification accuracy of the semi-supervised method are generally on par with that of traditional supervised methods, and less labeled samples are needed to achieve a comparable result under different training set ratios. Given a small sample size, the accuracy tends to be stable with training samples no less than 5% in total. Our results also indicate that the classification accuracy by using multi-source data is significantly higher than that with any single data source being applied. Among these data, map POI and high-resolution optical remote sensing data make larger contributions on the classification, followed by mobile data and nighttime light remote sensing data.
AB - Detailed urban landuse information plays a fundamental role in smart city management. A sufficient sample size has been identified as a very crucial pre-request in machine learning algorithms for urban landuse classification. However, it is often difficult to recognize and label landuse categories from remote sensing images alone. Alternatively, field investigation is time-consuming with a high demand in human resources and monetary cost. Therefore, previous studies on urban landuse classification have often relied on a small size of labeled samples with very uneven spatial distribution. This study aims to explore the effectiveness of a semi-supervised classification framework with multi-source data for detailed urban landuse classification with a few labeled samples. A disagreement-based semi-supervised learning approach, the co-forest, was employed and compared with traditional supervised methods (e.g., random forest and XGBoost). Multi-source geospatial data were utilized including optical and nighttime light remote sensing and geospatial big data, which present the physical and socio-economic features of landuse categories. Taking urban landuse classification in Shenzhen City as a case, results show that the classification accuracy of the semi-supervised method are generally on par with that of traditional supervised methods, and less labeled samples are needed to achieve a comparable result under different training set ratios. Given a small sample size, the accuracy tends to be stable with training samples no less than 5% in total. Our results also indicate that the classification accuracy by using multi-source data is significantly higher than that with any single data source being applied. Among these data, map POI and high-resolution optical remote sensing data make larger contributions on the classification, followed by mobile data and nighttime light remote sensing data.
KW - Multi-source geospatial data
KW - Sampling strategy
KW - Semi-supervised classification
KW - Small sample learning
KW - Urban landuse
UR - http://www.scopus.com/inward/record.url?scp=85123687957&partnerID=8YFLogxK
U2 - 10.3390/rs14030648
DO - 10.3390/rs14030648
M3 - Journal article
SN - 2072-4292
VL - 14
JO - Remote Sensing
JF - Remote Sensing
IS - 3
M1 - 648
ER -