Unsupervised machine learning methods were applied to multivariate geophysical and geochemical datasets from ocean-floor sediment cores collected in the South China Sea. The well-preserved, continuous cores contain high-resolution Cenozoic sediment records that allow detailed paleoenvironmental studies. Bayesian age-depth chronological models, constructed from biostratigraphic control points at the drilling sites, are applied to cluster boundaries generated by two popular unsupervised learning methods: K-means and random forest. Both methods produced compact and unambiguous clusters, indicating that previously unknown data patterns can be revealed when all variables in the datasets are considered simultaneously. The synchroneity of past events represented by the cluster boundaries is studied across geographically separated ocean drilling sites by converting the fixed depths of the cluster boundaries into chronological ranges, represented as Gaussian density plots, which are then compared with known past events in the region. A Gaussian density peak at around 7.2 Ma is identified in the results from all three sites and is suggested to coincide with the initiation of the East Asian monsoon. In contrast to traditional statistical approaches, unsupervised learning requires no a priori assumptions, and the clustering results serve as a novel data-driven proxy for studying the complex and dynamic paleoenvironmental processes surrounding the ocean sediment. This work is a pioneering approach to extracting valuable information about regional events and opens up a systematic and objective way to study the vast global ocean sediment datasets.
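The depth-to-age step described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the Bayesian age-depth model is replaced here by plain linear interpolation between control points, each boundary age is given the same assumed standard deviation, and the site names, control points, boundary depths, and `SIGMA_MA` value are all hypothetical numbers chosen only so that two boundaries from different sites fall at the same age.

```python
import math

def depth_to_age(depth, control_points):
    """Linearly interpolate age (Ma) at a depth (m) from (depth, age) tie points.

    Simplification: a stand-in for a Bayesian age-depth model.
    """
    pts = sorted(control_points)
    if not pts[0][0] <= depth <= pts[-1][0]:
        raise ValueError("depth outside the range of the control points")
    for (d0, a0), (d1, a1) in zip(pts, pts[1:]):
        if d0 <= depth <= d1:
            return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)

def gaussian_pdf(x, mean, sigma):
    """Normal probability density at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def boundary_density(ages, sigma, grid):
    """Sum one Gaussian per boundary age over an age grid."""
    return [sum(gaussian_pdf(t, a, sigma) for a in ages) for t in grid]

# Hypothetical inputs for illustration only (not data from the study).
sites = {
    "site_A": {"control": [(0.0, 0.0), (100.0, 5.0), (300.0, 10.0)],
               "boundaries": [80.0, 200.0]},
    "site_B": {"control": [(0.0, 0.0), (200.0, 8.0), (400.0, 12.0)],
               "boundaries": [100.0, 300.0]},
}
SIGMA_MA = 0.3  # assumed, fixed age uncertainty per boundary

# Convert every cluster-boundary depth to an age, pooled over sites.
boundary_ages = [
    depth_to_age(depth, site["control"])
    for site in sites.values()
    for depth in site["boundaries"]
]

# Evaluate the summed density on a 0-12 Ma grid with 0.1 Myr spacing.
age_grid = [i / 10 for i in range(121)]
density = boundary_density(boundary_ages, SIGMA_MA, age_grid)

# The highest peak marks the age where boundaries from both sites coincide.
peak_age = age_grid[density.index(max(density))]
print(f"boundary ages (Ma): {boundary_ages}")
print(f"highest density peak at {peak_age} Ma")
```

In this toy setup, boundaries from the two sites that map to the same age reinforce each other, so the summed density peaks there. A real age-depth model would supply a depth-dependent uncertainty for each boundary instead of a single fixed value.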
Scopus Subject Areas
- Earth and Planetary Sciences(all)
- machine learning
- ocean sediments
- unsupervised classification