Predicting the distribution of organic carbon content in surface sediments of the eastern China seas using random forest algorithm
Abstract: Clarifying the distribution and controlling factors of sedimentary organic carbon in the marginal seas of China helps establish an organic carbon cycle model for the East Asian marginal seas and its "source-to-sink" pattern. At present, organic carbon distribution maps of the eastern China marginal seas are drawn mainly by mathematical interpolation between sampling points. This approach is strongly constrained by the location and number of sampling stations. It also ignores how samples differ in environmental factors such as seawater physicochemical properties, seabed topography, and ocean currents, which oversimplifies a complex geological problem. Machine learning methods can extract key information from high-dimensional, complex data and build a mapping between environmental attribute features and the variable to be predicted. In this study, the random forest (RF) algorithm, a commonly used machine learning method, was used to predict the organic carbon content of surface sediments in the eastern China marginal seas by learning the mapping between 405 marine sediment organic carbon measurements and 50 environmental attribute features. Compared with the organic carbon distribution map produced by Kriging interpolation from the same number of samples, the RF predictions had a smaller mean absolute error, root mean square error, and maximum residual, and the ten-fold cross-validation R² reached 0.60, indicating relatively high fitting accuracy. In particular, for sea areas with low sampling density or with sample gaps caused by sampling difficulties, the RF algorithm predicted surface sediment organic carbon content more accurately, showing potential for more realistic prediction and an advantage in extrapolation. The RF model established in this study can also serve as a reference for predicting other geochemical indicators of marine sediments, and it has practical significance for resource surveys and environmental protection in the eastern China marginal seas.
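The abstract names four evaluation measures (mean absolute error, root mean square error, maximum residual, and a ten-fold cross-validation R²) without giving their formulas. The sketch below uses the standard textbook definitions, which is an assumption about what the authors computed. The helper names (`ten_fold_indices`, `cross_val_predict`, `mean_baseline`) and the toy numbers are hypothetical, and the mean-value predictor only stands in for whatever model (RF or Kriging) is being scored.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def max_residual(y_true, y_pred):
    """Largest absolute difference between observation and prediction."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_val_predict(fit_predict, X, y, n_folds=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    preds = [None] * len(y)
    for test_idx in ten_fold_indices(len(y), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        fold_pred = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        for i, p in zip(test_idx, fold_pred):
            preds[i] = p
    return preds


def mean_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in model: predicts the training mean everywhere."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)


# Toy observed vs. predicted organic carbon values (made-up numbers).
y_true = [0.2, 0.5, 0.9, 0.4]
y_pred = [0.3, 0.5, 0.7, 0.4]
print(round(mae(y_true, y_pred), 4), round(rmse(y_true, y_pred), 4),
      round(max_residual(y_true, y_pred), 4), round(r2(y_true, y_pred), 4))

# Ten-fold out-of-fold predictions for 405 synthetic samples.
X = [[float(i)] for i in range(405)]
y = [0.01 * i for i in range(405)]
oof = cross_val_predict(mean_baseline, X, y)
print(round(r2(y, oof), 4))
```

Scoring both methods on out-of-fold predictions from the same fold assignment keeps the comparison like-for-like, since every sample is then predicted by a model that did not see it during fitting.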
Key words: spatial interpolation; machine learning; random forest; surface sediment; organic carbon contents
Table 1. The 50 environmental attribute features used for initial prediction

| No. | Predictor | Feature name |
| --- | --- | --- |
| 1 | GL_DIST_TO_COAST_KM_ETOPO.5m.ggg | Distance to coast |
| 2 | GL_ELEVATION_M_ASL_SRTM15+V2.5m.ggg | Elevation |
| 3 | GL_LATITUDE_DD.5m.ggg | Latitude |
| 4 | GL_LONGITUDE.5m.ggg | Longitude |
| 5 | GL_RIVERMOUTH_CO2_TGCYR-1_ORNL.5m.ggg | River-mouth CO2 |
| 6 | GL_RIVERMOUTH_DOC_TGCYR-1_ORNL.5m.ggg | River-mouth DOC |
| 7 | GL_RIVERMOUTH_HCO3_TGCYR-1_ORNL.5m.ggg | River-mouth HCO3− |
| 8 | GL_RIVERMOUTH_POC_TGCYR-1_ORNL.5m.ggg | River-mouth POC |
| 9 | GL_RIVERMOUTH_TSS_TGYR-1_ORNL.5m.ggg | River-mouth TSS |
| 10 | GL_TOT_SED_THICK_M_CRUST1_NOAA.5m.ggg | Sediment thickness (CRUST1 model) |
| 11 | GL_TOT_SED_THICK_M_GLOBSED_Straume.5m.ggg | Sediment thickness (GLOBSED model) |
| 12 | SF_AVG_SEA_DENSITY_KGM3_DECADAL_MEAN_woa13x.5m.ggg | Mean seawater density (decadal mean) |
| 13 | SF_AVG_SEA_SOUNDSPEED_MS_DECADAL_MEAN_woa13x.5m.ggg | Mean seawater sound speed (decadal mean) |
| 14 | SF_CALSIL_FRAC_DUTKIEWICZ.5m.ggg | Calcium silicate content |
| 15 | SF_CLAYFRACTION_FRAC_NGDC.5m.ggg | Clay content |
| 16 | SF_CURRENT_EAST_MS_2012_12_HYCOMx.5m.ggg | Eastward current velocity |
| 17 | SF_CURRENT_NORTH_MS_2012_12_HYCOMx.5m.ggg | Northward current velocity |
| 18 | SF_GRAINSIZE_D16_MM_NGDC.5m.ggg | D16 grain size |
| 19 | SF_GRAINSIZE_D50_MM_NGDC.5m.ggg | D50 grain size |
| 20 | SF_GRAINSIZE_D84_MM_NGDC.5m.ggg | D84 grain size |
| 21 | SF_SEA_CONDUCTIVITY_SM_DECADAL_MEAN_woa13v2x.5m.ggg | Seawater conductivity (decadal mean) |
| 22 | SF_SEA_NITRATE_MCML_DECADAL_MEAN_woa13v2x.5m.ggg | Seawater nitrate concentration |
| 23 | SF_SEA_OXYGEN_MLL_DECADAL_MEAN_woa13v2x.5m.ggg | Seawater dissolved oxygen concentration |
| 24 | SF_SEA_OXYGEN_PCTSAT_DECADAL_MEAN_woa13v2x.5m.ggg | Seawater dissolved oxygen saturation |
| 25 | SF_SEA_PHOSPHATE_MCML_DECADAL_MEAN_woa13v2x.5m.ggg | Seawater phosphate concentration |
| 26 | SF_SEA_SALINITY_PSU_DECADAL_MEAN_woa13v2x.5m.ggg | Seawater salinity |
| 27 | SF_SEA_SEA_OXYGEN_UTILIZATION_MOLM3_DECADAL_MEAN_woa13v2x.5m.ggg | Seawater oxygen utilization |
| 28 | SF_SEA_SILICATE_MCML_DECADAL_MEAN_woa13v2x.5m.ggg | Seawater silicate concentration |
| 29 | SF_SEA_TEMPERATURE_C_DECADAL_MEAN_woa13v2x.5m.ggg | Seawater temperature |
| 30 | SF_TERBIO_FRAC_DUTKIEWICZ.5m.ggg | Terrigenous biogenic silica |
| 31 | SF_TOC_PDW.5m.ggg | Total organic carbon |
| 32 | SL_GEOID_M_ABOVE_WGS84_NGA_egm2008.5m.ggg | Geoid |
| 33 | SS_BIOMASS_BACTERIA_LOG10_MGCM2_WEI2010x.5m.ggg | Bacterial biomass |
| 34 | SS_BIOMASS_FISH_LOG10_MGCM2_WEI2010x.5m.ggg | Fish biomass |
| 35 | SS_BIOMASS_INVERTEBRATE_LOG10_MGCM2_WEI2010x.5m.ggg | Invertebrate biomass |
| 36 | SS_BIOMASS_MACROFAUNA_LOG10_MGCM2_WEI2010x.5m.ggg | Macrofauna biomass |
| 37 | SS_BIOMASS_MEGAFAUNA_LOG10_MGCM2_WEI2010x.5m.ggg | Megafauna biomass |
| 38 | SS_BIOMASS_MEIOFAUNA_LOG10_MGCM2_WEI2010x.5m.ggg | Meiofauna biomass |
| 39 | SS_BIOMASS_TOTAL_LOG10_MGCM2_WEI2010x.5m.ggg | Total biomass |
| 40 | SS_CHLOROPHYLL_LOG_MG_M3_MODIS_Aqua_MISSION_MEANx.5m.ggg | Chlorophyll |
| 41 | SS_DENSITY_KGM-3_SACD_Aquarius_MISSION_MEANx.5m.ggg | Mean seawater density |
| 42 | SS_PHOTO_AVAIL_RAD_EINSTEIN_M2_DAY_SNPP_VIIRS_MISSION_MEANx.5m.ggg | Available light |
| 43 | SS_PHYTO_ABSORPTION_443NM_M-1_SNPP_VIIRS_MISSION_MEANx.5m.ggg | Light absorption |
| 44 | SS_PIC_LOG_MOL_M3-1_MODIS_Aqua_MISSION_MEANx.5m.ggg | PIC |
| 45 | SS_POC_LOG_MOL_M3-1_MODIS_Aqua_MISSION_MEANx.5m.ggg | POC |
| 46 | SS_WAVE_DIRECTION_DEG_2012_12_WAVEWATCH3x.5m.ggg | Wave direction |
| 47 | SS_WAVE_HEIGHT_M_2012_12_WAVEWATCH3x.5m.ggg | Wave height |
| 48 | SS_WAVE_PERIOD_S_2012_12_WAVEWATCH3x.5m.ggg | Wave period |
| 49 | SS_WINDSPEED_MS-1_SACD_Aquarius_MISSION_MEANx.5m.ggg | Wind speed |
| 50 | Sediment type | Sediment type |

Note: variables set in bold in the table denote the 38 important environmental features retained by the feature selection method.
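The table note says that 38 of the 50 features were retained by a feature selection method, but this section does not say which method. As one plausible illustration only, the sketch below ranks features by random forest impurity importance in scikit-learn and keeps the top 38. The data are synthetic (405 samples by 50 random predictors), and the choice of importance ranking, the tree count, and all variable names are assumptions rather than the authors' procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n_samples, n_features, n_keep = 405, 50, 38

# Synthetic stand-in for the predictor table: only columns 0 and 1 carry signal.
X = rng.normal(size=(n_samples, n_features))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

# Fit a forest on all 50 features and rank them by impurity-based importance.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]

# Keep the n_keep highest-ranked feature columns (assumed selection rule).
selected = np.sort(ranking[:n_keep])

# Ten-fold cross-validated R2 with all features vs. the selected subset.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
r2_all = cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=0),
    X, y, cv=cv, scoring="r2",
).mean()
r2_sel = cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=0),
    X[:, selected], y, cv=cv, scoring="r2",
).mean()

print("kept features:", selected.tolist())
print("R2 (50 features):", round(r2_all, 3))
print("R2 (38 features):", round(r2_sel, 3))
```

Ranking the features on the full dataset before cross-validating lets information from the test folds leak into the selection. A stricter workflow would repeat the ranking inside each training fold.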