Prediction of lead content in soil based on model population analysis coupled with ELM algorithm
Abstract: This study seeks the optimal inversion model for regional soil heavy metal content. With Longhai City as the study area, the raw soil spectra were preprocessed separately with four methods: Savitzky-Golay (SG) smoothing, wavelet transform (WT), Gaussian filtering (GF), and multiplicative scatter correction (MSC). Four wavelength selection algorithms developed under the model population analysis (MPA) strategy, namely competitive adaptive reweighted sampling (CARS), variable iterative space shrinkage approach (VISSA), iteratively variable subset optimization (IVSO), and interval combination optimization (ICO), were then used to eliminate interfering and uninformative wavelength variables. Finally, soil lead (Pb) content was predicted with three regression models: the linear partial least squares regression (PLSR), the nonlinear support vector machine (SVM), and the neural-network-based extreme learning machine (ELM). The results are as follows.
① Among the Pb inversion models built with the different preprocessing methods, the model based on spectra reconstructed from the seventh level of the wavelet transform had the highest prediction accuracy, with validation-set R2=0.736, RMSE=5.426, RPD=1.976, and RPIQ=2.560. ② CARS, VISSA, IVSO, and ICO, all developed under the MPA strategy, each significantly improved model interpretability and generalization performance, and also improved modeling efficiency. ③ The overall prediction performance of the three regression models ranked ELM>PLSR>SVM. Among all models, ICO-ELM had the highest prediction accuracy, with validation-set R2=0.863, RMSE=3.953, RPD=2.712, and RPIQ=3.514. The optimal model established in this study can provide a new theoretical reference for the rapid and accurate monitoring of regional land quality and ecological indicators.
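To make the modeling and evaluation steps above concrete, the following is a minimal sketch, not the authors' implementation. It shows a basic ELM regressor and the four reported metrics, assuming the common definitions RPD = SD/RMSE and RPIQ = IQR/RMSE computed on the observed values. The class name `ELMRegressor`, the function `regression_metrics`, the sigmoid activation, the hidden-node count, the uniform weight range, and the synthetic data are all illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, RPD and RPIQ (assumed definitions, see lead-in)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    rmse = np.sqrt(np.mean(residual ** 2))
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    sd = np.std(y_true, ddof=1)                 # sample standard deviation
    q1, q3 = np.percentile(y_true, [25, 75])    # interquartile range bounds
    return {"R2": r2, "RMSE": rmse, "RPD": sd / rmse, "RPIQ": (q3 - q1) / rmse}


class ELMRegressor:
    """Single-hidden-layer ELM: input weights are random and fixed,
    output weights are solved by least squares (pseudo-inverse)."""

    def __init__(self, n_hidden=30, seed=0):
        self.n_hidden = n_hidden                # hypothetical default
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer (an assumption).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y       # minimum-norm least-squares fit
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


if __name__ == "__main__":
    # Synthetic stand-in for selected wavelength variables and Pb content.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(120, 8))
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=120)
    model = ELMRegressor(n_hidden=30, seed=0).fit(X[:80], y[:80])
    print(regression_metrics(y[80:], model.predict(X[80:])))
```

Under these definitions the reported figures are mutually consistent: RPD × RMSE gives the same validation-set standard deviation for both reported models (1.976 × 5.426 ≈ 2.712 × 3.953 ≈ 10.72), and RPIQ × RMSE gives the same interquartile range (≈ 13.89).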