Handling imbalanced samples in landslide susceptibility evaluation
Abstract: In landslide susceptibility assessment, the way sample imbalance is handled can introduce considerable uncertainty into the evaluation results. To address this issue, this study takes several counties (districts) of Changdu City in eastern Tibet as the study area and constructs an imbalanced dataset of landslide and non-landslide samples. Three treatment schemes are compared: no treatment, downsampling, and oversampling with the synthetic minority oversampling technique (SMOTE). A logistic regression model of landslide susceptibility is built under each scheme. Classification performance is verified with the ROC curve, accuracy, precision, recall, missed detection rate, and other evaluation indicators, together with the comprehensive evaluation index F1′ score. The results show that models built on balanced datasets (downsampling or oversampling) perform substantially better than the model built on untreated data, with the F1′ score increasing by up to 53.17%. Of the two balancing schemes, oversampling raises the F1′ score by 16.30% relative to downsampling, indicating that oversampling handles the imbalanced samples more effectively. The results can serve as a reference for dataset processing before landslide prediction and geological hazard prediction, and provide theoretical and technical support for further improving regional disaster prevention and mitigation.

Key words:
- landslide susceptibility
- synthetic minority oversampling technique (SMOTE)
- evaluation model
- Changdu
- imbalanced data
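The two balancing schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the function names `random_undersample` and `smote`, and the neighbour count `k=3` are all assumptions made for illustration. Downsampling discards majority (non-landslide) samples at random, while SMOTE adds synthetic minority (landslide) samples by interpolating between a sample and one of its nearest minority neighbours.

```python
import random


def random_undersample(majority, minority, rng):
    """Keep every minority sample and an equally sized random subset of the majority."""
    kept = rng.sample(majority, len(minority))
    return kept, list(minority)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


rng = random.Random(0)
# Hypothetical toy data: two normalised factor values per grid cell.
non_landslide = [(rng.random(), rng.random()) for _ in range(200)]  # majority class
landslide = [(0.8 + 0.2 * rng.random(), 0.8 + 0.2 * rng.random())
             for _ in range(10)]                                    # minority class

# Scheme 2: downsampling (majority reduced to the minority size).
down_major, down_minor = random_undersample(non_landslide, landslide, rng)

# Scheme 3: SMOTE oversampling (minority grown to the majority size).
n_needed = len(non_landslide) - len(landslide)
over_minor = landslide + smote(landslide, n_needed, k=3, rng=rng)

print(len(down_major), len(down_minor))
print(len(non_landslide), len(over_minor))
```

In practice the resampling would normally be applied to the training split only, so that the test set keeps the original class ratio; whether the study did this is not stated in this excerpt.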
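Table 1. Classification and frequency ratio of each evaluation factor

| Factor | Class | Share of landslide grid cells/% | Share of total grid cells/% | Frequency ratio | Normalized value |
|---|---|---|---|---|---|
| Slope/(°) | [0, 10) | 0.05 | 0.09 | 0.55 | 0 |
| | [10, 17) | 0.09 | 0.13 | 0.71 | 0.07 |
| | [17, 23) | 0.13 | 0.16 | 0.81 | 0.12 |
| | [23, 28) | 0.18 | 0.18 | 1.03 | 0.22 |
| | [28, 33) | 0.18 | 0.18 | 0.98 | 0.19 |
| | [33, 39) | 0.17 | 0.15 | 1.08 | 0.24 |
| | [39, 47) | 0.15 | 0.09 | 1.66 | 0.50 |
| | [47, 81] | 0.06 | 0.02 | 2.79 | 1.00 |
| Elevation/m | [2496, 3435) | 0.34 | 0.03 | 10.53 | 1.00 |
| | [3435, 3787) | 0.26 | 0.07 | 3.58 | 0.34 |
| | [3787, 4055) | 0.21 | 0.12 | 1.73 | 0.16 |
| | [4055, 4278) | 0.12 | 0.17 | 0.71 | 0.07 |
| | [4278, 4487) | 0.05 | 0.21 | 0.24 | 0.02 |
| | [4487, 4705) | 0.02 | 0.19 | 0.12 | 0.01 |
| | [4705, 4966) | 0 | 0.14 | 0.02 | 0 |
| | [4966, 5784] | 0 | 0.07 | 0.02 | 0 |
| Curvature | [−9.23, −1.25) | 0.07 | 0.03 | 2.55 | 1.00 |
| | [−1.25, −0.65) | 0.16 | 0.10 | 1.60 | 0.46 |
| | [−0.65, −0.28) | 0.21 | 0.18 | 1.13 | 0.20 |
| | [−0.28, 0.09) | 0.19 | 0.24 | 0.78 | 0 |
| | [0.09, 0.46) | 0.18 | 0.22 | 0.86 | 0.04 |
| | [0.46, 0.90) | 0.12 | 0.14 | 0.82 | 0.02 |
| | [0.90, 1.57) | 0.06 | 0.08 | 0.81 | 0.02 |
| | [1.57, 9.63] | 0.02 | 0.02 | 0.98 | 0.11 |
| Aspect | Flat | 0 | 0 | 0.50 | 0 |
| | North | 0.10 | 0.12 | 0.83 | 0.49 |
| | Northeast | 0.17 | 0.15 | 1.17 | 0.98 |
| | East | 0.14 | 0.13 | 1.06 | 0.83 |
| | Southeast | 0.13 | 0.11 | 1.18 | 1.00 |
| | South | 0.13 | 0.11 | 1.18 | 1.00 |
| | Southwest | 0.13 | 0.14 | 0.95 | 0.66 |
| | West | 0.13 | 0.12 | 1.00 | 0.74 |
| | Northwest | 0.07 | 0.11 | 0.60 | 0.15 |
| Distance to fault/m | [0, 500) | 0.15 | 0.14 | 1.04 | 0.55 |
| | [500, 1000) | 0.12 | 0.13 | 0.90 | 0.12 |
| | [1000, 2000) | 0.22 | 0.22 | 0.99 | 0.41 |
| | [2000, 4000) | 0.22 | 0.26 | 0.86 | 0 |
| | ≥4000 | 0.29 | 0.25 | 1.18 | 1.00 |
| Engineering geological rock group | Relatively hard layered clastic rock group | 0.13 | 0.19 | 0.68 | 0.25 |
| | Relatively hard layered carbonate rock group | 0.10 | 0.12 | 0.89 | 0.52 |
| | Interbedded soft and hard clastic rock group | 0.48 | 0.41 | 1.16 | 0.86 |
| | Hard massive intrusive rock group | 0.15 | 0.13 | 1.16 | 0.85 |
| | Relatively weak thin-layered low-grade metamorphic rock group | 0.06 | 0.08 | 0.73 | 0.31 |
| | Hard thick-layered to massive high-grade metamorphic rock group | 0.07 | 0.05 | 1.27 | 1.00 |
| | Quaternary loose rock group | 0.01 | 0.02 | 0.49 | 0 |
| Distance to river/m | [0, 500) | 0.72 | 0.23 | 3.09 | 1.00 |
| | [500, 1000) | 0.13 | 0.21 | 0.62 | 0.18 |
| | [1000, 1500) | 0.08 | 0.19 | 0.42 | 0.11 |
| | [1500, 2000) | 0.04 | 0.16 | 0.28 | 0.06 |
| | [2000, 2500) | 0.02 | 0.12 | 0.19 | 0.03 |
| | ≥2500 | 0.01 | 0.10 | 0.10 | 0 |
| Vegetation index | [0, 0.10) | 0.01 | 0.09 | 0.07 | 0 |
| | [0.10, 0.21) | 0.06 | 0.07 | 0.85 | 0.45 |
| | [0.21, 0.30) | 0.23 | 0.13 | 1.79 | 1.00 |
| | [0.30, 0.37) | 0.30 | 0.22 | 1.35 | 0.74 |
| | [0.37, 0.43) | 0.21 | 0.23 | 0.90 | 0.48 |
| | [0.43, 0.51) | 0.11 | 0.15 | 0.71 | 0.37 |
| | [0.51, 0.62) | 0.06 | 0.08 | 0.73 | 0.38 |
| | [0.62, 1] | 0.03 | 0.03 | 1.14 | 0.62 |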
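As a rough illustration of how the last two columns of Table 1 relate to the others, the sketch below assumes that the frequency ratio of a class is its share of landslide grid cells divided by its share of all grid cells, and that the normalized value is a min-max rescaling of the frequency ratios within one factor. The normalization formula is inferred from the printed values rather than stated in this excerpt, and the function names are hypothetical.

```python
def frequency_ratio(landslide_share, total_share):
    """Share of landslide cells in a class divided by the class's share of all cells."""
    return landslide_share / total_share


def min_max_normalise(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Slope class [39, 47) from Table 1: landslide share 0.15, total share 0.09.
fr_39_47 = frequency_ratio(0.15, 0.09)

# Frequency ratios of the eight slope classes as printed in Table 1.
slope_fr = [0.55, 0.71, 0.81, 1.03, 0.98, 1.08, 1.66, 2.79]
slope_norm = [round(v, 2) for v in min_max_normalise(slope_fr)]

print(round(fr_39_47, 2))
print(slope_norm)
```

Because the shares and ratios in the table are already rounded to two decimals, values recomputed this way can differ from the printed ones in the last digit.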
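Table 2. Evaluation of logistic regression model prediction results under different treatment schemes

| Treatment scheme | Accuracy/% | Precision/% | Recall/% | Missed detection rate/% |
|---|---|---|---|---|
| No treatment | 99.16 | 60.43 | 24.96 | 75.04 |
| Downsampling | 82.25 | 91.57 | 71.95 | 28.05 |
| Oversampling | 95.78 | 16.76 | 90.48 | 9.52 |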
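The four indicators in Table 2 follow from a binary confusion matrix, as the sketch below shows under the usual definitions. The counts are hypothetical and chosen only to mimic a heavily imbalanced test set; the composite F1′ index is not defined in this excerpt and is therefore not computed here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and missed detection rate from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "miss_rate": fn / (tp + fn),  # complement of recall
    }


# Hypothetical counts: 80 landslide cells among 10 000 test cells.
m = classification_metrics(tp=30, fp=20, tn=9900, fn=50)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")

# Recall and missed detection rate from Table 2 (no treatment, downsampling, oversampling).
table2 = [(24.96, 75.04), (71.95, 28.05), (90.48, 9.52)]
sums = [recall + miss for recall, miss in table2]
print(sums)
```

With these counts accuracy is high while most landslide cells are missed, the same pattern as the untreated row of Table 2: on imbalanced data, accuracy alone says little about how well the minority class is detected. The recall and missed detection rate in each row of Table 2 sum to 100%, consistent with the miss rate being the complement of recall.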