Vu Duy Dong, Nguyen An Hung, Nguyen Tien Phat, Nguyen Thi Huyen, Nguyen Thi Nhat Thanh

Abstract

This study proposes a multi-layer machine learning architecture for multi-class rainfall estimation in Central Vietnam. The input data comprise Himawari-8 satellite imagery, ERA5 reanalysis data, the ASTER DEM, and rain gauge observations. Four regional satellite-based rainfall products (IMERG Final Run V07, IMERG Early Run V07, GSMaP_MVK_Gauge V08, and PERSIANN_CCS) served as comparative datasets. Three machine learning algorithms, Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGB), and Random Forest (RF), were employed within the proposed architecture. Evaluation against rain gauge observations showed that the LGBM-based rainfall product achieved the highest classification performance of the three machine learning-based products, with a Probability of Detection (POD) of 0.80, a Critical Success Index (CSI) of 0.54, a Matthews Correlation Coefficient (MCC) of 0.59, and a Symmetric Extremal Dependence Index (SEDI) of 0.58. Compared with the best-performing satellite-based comparison product (GSMaP_MVK_Gauge V08), the LGBM-based product improved classification performance by 6.67% in POD, 8.00% in CSI, 11.32% in MCC, and 20.83% in SEDI. In rainfall regression, the LGBM-based product also outperformed the other evaluated products, with the lowest errors: a Mean Absolute Error (MAE) of 2.91 mm/h, a Root Mean Square Error (RMSE) of 5.81 mm/h, and a Mean Logarithmic Squared Error (MLSE) of 0.47.
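
The scores named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of POD, CSI, MCC, SEDI, MAE, and RMSE on a binary rain/no-rain contingency table. Several points are assumptions rather than details taken from the study: the 0.1 mm/h rain threshold, the reduction of the multi-class problem to a single binary split, the reading of MLSE as the mean squared difference of log(1 + rate), and all function names.

```python
import math


def contingency(obs, est, threshold=0.1):
    """Count hits, misses, false alarms, and correct negatives.

    The 0.1 mm/h rain/no-rain threshold is an illustrative assumption.
    """
    h = m = f = c = 0
    for o, e in zip(obs, est):
        o_rain, e_rain = o >= threshold, e >= threshold
        if o_rain and e_rain:
            h += 1          # rain observed and estimated
        elif o_rain:
            m += 1          # rain observed but not estimated
        elif e_rain:
            f += 1          # rain estimated but not observed
        else:
            c += 1          # no rain in either
    return h, m, f, c


def classification_scores(h, m, f, c):
    """POD, CSI, MCC, and SEDI from a 2x2 contingency table.

    SEDI is undefined when the hit rate or false alarm rate is 0 or 1.
    """
    pod = h / (h + m)                     # hit rate
    csi = h / (h + m + f)
    mcc = (h * c - f * m) / math.sqrt((h + f) * (h + m) * (c + f) * (c + m))
    pofd = f / (f + c)                    # false alarm rate
    logs = [math.log(pofd), math.log(pod), math.log(1 - pofd), math.log(1 - pod)]
    sedi = (logs[0] - logs[1] - logs[2] + logs[3]) / sum(logs)
    return {"POD": pod, "CSI": csi, "MCC": mcc, "SEDI": sedi}


def regression_errors(obs, est):
    """MAE, RMSE, and MLSE (assumed here to be the mean of squared log1p differences)."""
    n = len(obs)
    mae = sum(abs(e - o) for o, e in zip(obs, est)) / n
    rmse = math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / n)
    mlse = sum((math.log1p(e) - math.log1p(o)) ** 2 for o, e in zip(obs, est)) / n
    return {"MAE": mae, "RMSE": rmse, "MLSE": mlse}


def relative_gain(new, baseline):
    """Percentage change of a score relative to a baseline score."""
    return 100.0 * (new - baseline) / baseline
```

If the reported improvements are read as relative gains, `relative_gain(0.54, 0.50)` gives 8.00%, so the stated CSI increase would be consistent with a baseline CSI of about 0.50. That baseline is back-calculated here, not quoted from the study.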

Keywords: Rainfall estimation, Machine learning, LGBM, Random forest, Himawari-8, ERA5.
