Articles | Volume 10, issue 2
https://doi.org/10.5194/soil-10-679-2024
© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.
Insights into the prediction uncertainty of machine-learning-based digital soil mapping through a local attribution approach
Jeremy Rohmer
BRGM, 3 Av. C. Guillemin, 45060 Orléans Cedex 2, France
Stephane Belbeze
BRGM, 3 Av. C. Guillemin, 45060 Orléans Cedex 2, France
Dominique Guyonnet
BRGM, 3 Av. C. Guillemin, 45060 Orléans Cedex 2, France
Related authors
Jeremy Rohmer, Remi Thieblemont, Goneri Le Cozannet, Heiko Goelzer, and Gael Durand
The Cryosphere, 16, 4637–4657, https://doi.org/10.5194/tc-16-4637-2022, 2022
Short summary
To improve the interpretability of process-based projections of the sea-level contribution from land ice components, we apply the machine-learning-based SHapley Additive exPlanations approach to a subset of a multi-model ensemble study for the Greenland ice sheet. This allows us to quantify the influence of particular modelling decisions (related to numerical implementation, initial conditions, or parametrisation of ice-sheet processes) directly in terms of sea-level change contribution.
Jeremy Rohmer, Deborah Idier, Remi Thieblemont, Goneri Le Cozannet, and François Bachoc
Nat. Hazards Earth Syst. Sci., 22, 3167–3182, https://doi.org/10.5194/nhess-22-3167-2022, 2022
Short summary
We quantify the influence of wave–wind characteristics, offshore water level and sea level rise (projected up to 2200) on the occurrence of flooding events at Gâvres, French Atlantic coast. Our results outline the overwhelming influence of sea level rise over time compared to the others. By showing the robustness of our conclusions to the errors in the estimation procedure, our approach proves to be valuable for exploring and characterizing uncertainties in assessments of future flooding.
Ryota Wada, Jeremy Rohmer, Yann Krien, and Philip Jonathan
Nat. Hazards Earth Syst. Sci., 22, 431–444, https://doi.org/10.5194/nhess-22-431-2022, 2022
Short summary
Characterizing extreme wave environments caused by tropical cyclones in the Caribbean Sea near Guadeloupe is difficult because cyclones rarely pass near the location of interest. The STM-E (space-time maxima and exposure) model uses wave data during cyclones over a spatial neighbourhood. Long-duration wave data generated from a database of synthetic tropical cyclones are used to evaluate the performance of STM-E. Results indicate that STM-E provides estimates with small bias and realistic uncertainty.
Rémi Thiéblemont, Gonéri Le Cozannet, Jérémy Rohmer, Alexandra Toimil, Moisés Álvarez-Cuesta, and Iñigo J. Losada
Nat. Hazards Earth Syst. Sci., 21, 2257–2276, https://doi.org/10.5194/nhess-21-2257-2021, 2021
Short summary
Sea level rise and its acceleration are projected to aggravate coastal erosion over the 21st century. Resulting shoreline projections are deeply uncertain, however, which constitutes a major challenge for coastal planning and management. Our work presents a new extra-probabilistic framework to develop future shoreline projections and shows that deep uncertainties could be drastically reduced by better constraining sea level projections and improving coastal impact models.
Jeremy Rohmer, Pierre Gehl, Marine Marcilhac-Fradin, Yves Guigueno, Nadia Rahni, and Julien Clément
Nat. Hazards Earth Syst. Sci., 20, 1267–1285, https://doi.org/10.5194/nhess-20-1267-2020, 2020
Short summary
Fragility curves (FCs) are key tools for seismic probabilistic safety assessments that are performed at the level of the nuclear power plant (NPP). These statistical methods relate the probabilistic seismic hazard loading at the given site to the required performance of the NPP safety functions. In the present study, we investigate how the tools of non-stationary extreme value analysis can be used to model in a flexible manner the FCs for NPP.
T.J. B. Dewez, D. Girardeau-Montaut, C. Allanic, and J. Rohmer
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLI-B5, 799–804, https://doi.org/10.5194/isprs-archives-XLI-B5-799-2016, 2016
J. Rohmer and T. Dewez
Nat. Hazards Earth Syst. Sci., 15, 349–362, https://doi.org/10.5194/nhess-15-349-2015, 2015
Short summary
This article uses summary statistics of spatial point process theory to study the spatio-temporal pattern of a rockfall inventory recorded with repeated terrestrial laser scanning surveys at a chalk coastal cliff site in Normandy, France. This allows testing and quantifying the significance of geomorphological observations. From a spatial distribution perspective, behaviours of small and large scars cannot be considered equivalent, suggesting that erosion processes and triggering factors differ.
Related subject area
Pedometrics
Reference soil groups map of Ethiopia based on legacy data and machine learning-technique: EthioSoilGrids 1.0
Ashenafi Ali, Teklu Erkossa, Kiflu Gudeta, Wuletawu Abera, Ephrem Mesfin, Terefe Mekete, Mitiku Haile, Wondwosen Haile, Assefa Abegaz, Demeke Tafesse, Gebeyhu Belay, Mekonen Getahun, Sheleme Beyene, Mohamed Assen, Alemayehu Regassa, Yihenew G. Selassie, Solomon Tadesse, Dawit Abebe, Yitbarek Wolde, Nesru Hussien, Abebe Yirdaw, Addisu Mera, Tesema Admas, Feyera Wakoya, Awgachew Legesse, Nigat Tessema, Ayele Abebe, Simret Gebremariam, Yismaw Aregaw, Bizuayehu Abebaw, Damtew Bekele, Eylachew Zewdie, Steffen Schulz, Lulseged Tamene, and Eyasu Elias
SOIL, 10, 189–209, https://doi.org/10.5194/soil-10-189-2024, 2024
Short summary
This paper focuses on collating legacy soil profile data and on the production of an updated national soil type map of Ethiopia, EthioSoilGrids version 1.0, using legacy data and a machine-learning approach. Given its quantitative digital representation, the map and the associated data make tremendous contributions to agricultural development planning and digital agricultural solutions, as well as improving the accuracy of global predictive soil mapping efforts.
Accuracy of regional-to-global soil maps for on-farm decision-making: are soil maps “good enough”?
Jonathan J. Maynard, Edward Yeboah, Stephen Owusu, Michaela Buenemann, Jason C. Neff, and Jeffrey E. Herrick
SOIL, 9, 277–300, https://doi.org/10.5194/soil-9-277-2023, 2023
Short summary
Accurate information on soil properties is critical for identifying soil limitations and the management practices needed to improve crop yields on smallholder farms. This study evaluated the accuracy of soil map information for agronomic decision-making. Based on four publicly available soil maps in Ghana, we found that soil map data significantly overestimated crop suitability, potentially leading to ineffective agronomic investments by smallholder farmers.
Shapley values reveal the drivers of soil organic carbon stock prediction
Alexandre M. J.-C. Wadoux, Nicolas P. A. Saby, and Manuel P. Martin
SOIL, 9, 21–38, https://doi.org/10.5194/soil-9-21-2023, 2023
Short summary
We introduce Shapley values for machine learning model interpretation and reveal the local and global controlling factors of soil organic carbon (SOC) stocks. The method enables spatial analysis of the important variables. Vegetation and topography determine much of the SOC stock variation in mainland France. We conclude that SOC stock variation is complex and should be interpreted at multiple levels.
How well does digital soil mapping represent soil geography? An investigation from the USA
David G. Rossiter, Laura Poggio, Dylan Beaudette, and Zamir Libohova
SOIL, 8, 559–586, https://doi.org/10.5194/soil-8-559-2022, 2022
Short summary
Maps of soil properties made by machine learning techniques are increasingly applied in Earth surface process modelling and agronomy. Maps of the same area made by different methods appear quite different and also differ from field-based polygon soil survey maps. We explore these differences both visually and numerically, using methods that quantify the spatial patterns. Readers can apply the methods to their areas of interest in the USA with the supplied R Markdown scripts.
Short summary
Machine learning (ML) models have become key ingredients for digital soil mapping. To explain why the ML model is reliable, we apply a popular method from explainable artificial intelligence to the prediction uncertainty, with an application to the mapping of hydrocarbon pollutants in urban soil. We show the benefit of jointly analysing the influence on the best estimate and on the uncertainty to improve communication with end users and support decisions regarding covariates’ characterisation.