Using 3D observations with high spatio-temporal resolution to calibrate and evaluate a process-focused cellular automaton model of soil erosion by water

Eltner, Anette; Favis-Mortlock, David; Grothum, Oliver; Neumann, Martin; Laburda, Tomáš; Kavka, Petr

doi:https://doi.org/10.5194/soil-11-413-2025

Articles | Volume 11, issue 1

https://doi.org/10.5194/soil-11-413-2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

https://doi.org/10.5194/soil-11-413-2025

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

Articles | Volume 11, issue 1

Original research article

| Highlight paper

|

12 Jun 2025

Original research article | Highlight paper |

| 12 Jun 2025

Using 3D observations with high spatio-temporal resolution to calibrate and evaluate a process-focused cellular automaton model of soil erosion by water

Anette Eltner, David Favis-Mortlock, Oliver Grothum, Martin Neumann, Tomáš Laburda, and Petr Kavka

Download

Final revised paper (published on 12 Jun 2025)
Supplement to the final revised paper
Preprint (discussion started on 12 Sep 2024)
Supplement to the preprint

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2024-2648', Anonymous Referee #1, 14 Oct 2024
This manuscript is precisely what we need in soil erosion modelling: innovative, nuanced approaches to improve model evaluation and an honest assessment of model performance.
In the specific comments below, I made several questions that came up while reading the manuscript and some suggestions to hopefully improve the paper. Specifically, I recommend adjusting some of the modelling terminology (e.g. the usage of terms such as calibration, validation, and evaluation) and reducing the focus on identifying a ‘best model run’. Moreover, the figures generally could use some improvements. I expand on these topics in specific comments below.
Specific comments
Abstract: Here and in the introduction you could highlight the novelty of your work. To my knowledge, this is the first time such high spatiotemporal resolution data is used for evaluating erosion models.
L50-75: I found the model description a bit on the long side for the introduction. I would consider moving most of this to section 2.3 in the methods.
L74: Please define the abbreviations RD-ST and RD-FT.
L105: “Validate” and “evaluate” seem to be used interchangeably, but these terms can signify different meanings (Oreskes, 1998; Oreskes et al., 1994). Beven and Young (2013) suggest avoiding the term “validation” in hydrological modelling.
L121: Consider rephrasing to: “Ten objective functions were considered to calibrate model parameters”.
L122: By model runs, do you mean model realisations, i.e. “one random sample taken from the set of all possible random samples in a Monte Carlo simulation” (Beven, 2009)?
I missed a stronger statement about the importance and novelty of your work. What you have done is innovative and exciting and creates new possibilities for testing erosion models.
L167: How did you choose the DEM resolution?
L168: Why do the time-lapse data have finer resolutions than the DEMs used as input for the model?
L170: What is M3C2-PM?
L170-180: It’s great to have this spatially distributed error estimate for the point clouds. Have you considered using this as part of the model evaluation process, i.e. for defining limits of acceptability of model error (Beven, 2018)?
I appreciate the narrative model description, but having a list of model equations in the supplement would be very helpful. Please add this information.
L204: Please check if this formulation is correct: “If a wet cell’s sediment load is less than the transport capacity, then soil is eroded from the cell using a probabilistic detachment equation by Nearing (1991)”.
Shouldn’t you first calculate soil detachment for a given cell, sum it to the sediment load delivered to this cell by upstream cells, and then compare it to the transport capacity of the overland flow for this given cell to estimate the amount of sediment routed downstream? That is, why would soil detachment (and not transport) be dependent on the transport capacity of the overland flow? Maybe I misunderstood something – please clarify.
L231: I found it strange to calibrate the parameter ‘DEM base level’, as this is a measurable quantity that would not need to be estimated via calibration. Can you explain your rationale here?
L242-243: Based on this initial simulation, how did you choose the parameters for calibration? Based on some kind of sensitivity measure?
What was the parameter space sampled with the Latin hypercube simulation? Please give ranges (assuming you were sampling from uniform distributions) for each calibrated parameter.
L254: How did you assess the suitability of a function for calibrating the model?
L269-280: How is the DL metric interpreted? The higher the value, the greater the similarity?
L310: Why did you smooth out the DoDs and not the simulated DEMs?
Explaining the methods employed in section 2.5 was difficult, but I think you did a good job. Still, I have some questions/comments:
Is it possible to compare the measured sediment yield from the plot to the soil loss calculated from the DoDs (e.g. Cândido et al., 2020)?

In Table 2, the column named “single objective function” for the space-time averaged and time-averaged model evaluations just mentions the type of data used for model testing, e.g. Total EC and Total SY. But I reckon you calculated e.g. the RMSE and the NSE for the total EC and total SY data. Is this correct?

In lines 286-292, you explain the metrics (the objective functions?) used for model evaluation for the time-averaged data. But they are also used for the other data, right? This got a little confusing.

Have you considered calibrating the model with the lab data and testing it against the field data or vice versa?

Instead of searching for ‘best models runs’, have you considered looking at model realisations within the observational data's measurement errors? A ‘best’ model is always only ‘conditionally-best’ (e.g. on the range of conditions used for calibration, on imperfect evaluation data, prior assumptions, boundary conditions, and the criteria used for evaluating the model – as you have shown) (Beven, 2009, 2012). Using a limits-of-acceptability approach based on the errors in the observational data would allow you to identify behavioural model realisations while avoiding Type I errors (rejecting good models because of uncertain forcing data) (Beven, 2019, 2018). The behavioural realisations could still be analysed in light of the objective functions you defined, but without this quest for a single ‘true’ parameter set that would optimise all functions.

Figure 4: The font size for the axis text in the DTW WC and DTW SY panels is too small. The legend for the rasters is missing – I do not think referring to the legend from figure 2 is very helpful here. Moreover, it would be nice to identify the panels (e.g. a, b, c…) to improve readability.
L339: Where is this shown in Figure 6?
L345-347: This is very cool!
L349-350: Maybe the DEM smoothing is necessary for this kind of model application. Edit: Why was this not necessary for the lab rainfall simulation?
L355: I did not understand that “the metrics capture different aspects of soil surface change, including erosion”. The example in the next sentence did not clarify your point to me. Moreover, are there any processes potentially leading to changes in the soil surface that RillGrow does not represent?
L359-363: This is what I meant above – a single model realisation that optimises all functions is irrelevant. If you choose different functions, repeat the rainfall simulation experiment, or change any steps in the DEM processing, you’ll end up with a different optimal parameter set. Moreover, what is a good fit in this case? How do you define if the realisation fits a function “well”? I suggest rephrasing this to something along the lines of “We are looking to explore the behavioural parameter space constrained by different sources of data and objective functions”.
L364-365: This is a great demonstration of the equifinality problem!
L378: Where are these metrics being used for “validation”? From what I understand, so far you have explored different metrics as part of the model calibration procedure.
L389: What does it mean that the model does not predict splash or interill erosion? Is this identified by a given parameterisation or by the outputs? Moreover, I thought RillGrow did not differentiate between rill and interill processes (L207).
L390: Do you mean more splash is modelled for the realisations with better DTW and SY metrics?
You go into a lot of detail describing single model realisations, which makes the text long and sometimes difficult to follow. I think this stems from your focus on identifying a single realisation to optimise all functions. I suggest focusing on more generalisable patterns and shortening some of the results.
L405-406: Calibration and validation seem to get confused, please check or define this somewhere. Beven (2009) defines calibration as “the process of adjusting parameter values of a model to obtain a better fit between observed and predicted variables”. A calibrated model can then be tested against new data not used doing the calibration procedure (Klemeš, 1986). After reading the manuscript, I understand you tested different data and functions for calibrating RillGrow. Of course, this can be considered part of an evaluation process, but I suggest being precise about the terminology.
L412: Could this result from model input variables changing during the simulation and this not being picked up by the model parametrisation?
L415: Please try to be more precise when describing model performance. What is a very close fit to the observation?
L440: Similar error metrics?
Figure 9: Please add a legend for the point colours. Moreover, while the ten ‘best’ realisations can be very scattered, using a larger number of behavioural realisations might help you identify and describe patterns in the dotty plots.
L453-455: Is this a limitation of the model or the data? As you mentioned above, the initial changes in the soil surface are too small to be detected, considering the DEM errors.
Figure 10: Here the comma is used as a decimal separator.
L455-464: I had a hard time understanding this paragraph. What are these ranges? Which parameters do they represent?
Figure 11: Would be great to have the observed DoD here. Also, shouldn’t the abbreviations in the panel titles be described in the figure legend?
L530-535: It makes sense that the same model realisations that simulate higher changes in elevation also simulate higher sediment yield, right? The output variables should be correlated. Getting the rill patterns right is a different story.
Figure 15: Please check the decimal separators. Why doesn’t the y-axis start at zero? I also don’t understand this figure; what is this parameter range?
L580-595: There have been multiple attempts to evaluate erosion models using spatial data, e.g. from field surveys, aerial images, and fallout-radionuclide data (Brazier et al., 2001; Fischer et al., 2018; Jetten et al., 2003; Saggau et al., 2022; Vigiak et al., 2006; Wilken et al., 2020). So, I am not sure that model evaluation has lagged behind the models – the technology is out there; the problem is that it is so much easier not to use it.
What I think is really unique and exciting in your approach is the quality, the spatiotemporal resolution, and the different sources of data (plot outlet and SfM) used for model calibration.
L605-610: Yes, I found this similarity index very useful!
References
Beven, K.: Towards a methodology for testing models as hypotheses in the inexact sciences, Proc. R. Soc. A Math. Phys. Eng. Sci., 475(2224), doi:10.1098/rspa.2018.0862, 2019.
Beven, K. J.: Environmental Modelling: An Uncertain Future, Routledge, Oxon., 2009.
Beven, K. J.: Rainfall-Runoff Modelling, 2nd ed., John Wiley & Sons, Chichester., 2012.
Beven, K. J.: On hypothesis testing in hydrology: Why falsification of models is still a really good idea, WIREs Water, 5, e1278, doi:10.1002/wat2.1278, 2018.
Beven, K. J. and Young, P.: A guide to good practice in modeling semantics for authors and referees, Water Resour. Res., 49(8), 5092–5098, doi:10.1002/wrcr.20393, 2013.
Brazier, R. E., Beven, K. J., Anthony, S. G. and Rowan, J. S.: Implications of model uncertainty for the mapping of hillslope-scale soil erosion predictions, Earth Surf. Process. Landforms, 26, 1333–1352, 2001.
Cândido, B. M., Quinton, J. N., James, M. R., Silva, M. L. N., de Carvalho, T. S., de Lima, W., Beniaich, A. and Eltner, A.: High-resolution monitoring of diffuse (sheet or interrill) erosion using structure-from-motion, Geoderma, 375(May), 114477, doi:10.1016/j.geoderma.2020.114477, 2020.
Fischer, F. K., Kistler, M., Brandhuber, R., Maier, H., Treisch, M. and Auerswald, K.: Validation of official erosion modelling based on high-resolution radar rain data by aerial photo erosion classification, Earth Surf. Process. Landforms, 43(1), 187–194, doi:10.1002/esp.4216, 2018.
Jetten, V., Govers, G. and Hessel, R.: Erosion models: Quality of spatial predictions, Hydrol. Process., 17(5), 887–900, doi:10.1002/hyp.1168, 2003.
Klemeš, V.: Operational testing of hydrological simulation models, Hydrol. Sci. J., 31(1), 13–24, doi:10.1080/02626668609491024, 1986.
Oreskes, N.: Evaluation (not validation) of quantitative models, Environ. Health Perspect., 106(6), 1453–1460, doi:10.1289/ehp.98106s61453, 1998.
Oreskes, N., Shrader-Frechette, K. and Belitz, K.: Verification, validation, and confirmation of numerical models in the Earth Sciences, Science (80-. )., 263, 641–646, doi:10.1126/science.263.5147.641, 1994.
Saggau, P., Kuhwald, M., Hamer, W. B. and Duttmann, R.: Are compacted tramlines underestimated features in soil erosion modeling? A catchment-scale analysis using a process-based soil erosion model, L. Degrad. Dev., 33(3), 452–469, doi:10.1002/ldr.4161, 2022.
Takken, I., Beuselinck, L., Nachtergaele, J., Govers, G., Poesen, J. and Degraer, G.: Spatial evaluation of a physically-based distributed erosion model (LISEM), Catena, 37(3–4), 431–447, doi:10.1016/S0341-8162(99)00031-4, 1999.
Vigiak, O., Sterk, G., Romanowicz, R. J. and Beven, K. J.: A semi-empirical model to assess uncertainty of spatial patterns of erosion, Catena, 66(3), 198–210, doi:10.1016/j.catena.2006.01.004, 2006.
Warren, S. D., Mitasova, H., Hohmann, M. G., Landsberger, S., Iskander, F. Y., Ruzycki, T. S. and Senseman, G. M.: Validation of a 3-D enhancement of the Universal Soil Loss Equation for prediction of soil erosion and sediment deposition, Catena, 64(2–3), 281–296, doi:10.1016/j.catena.2005.08.010, 2005.
Citation: https://doi.org/10.5194/egusphere-2024-2648-RC1
- AC1: 'Reply on RC1', Anette Eltner, 20 Nov 2024
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2648/egusphere-2024-2648-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2648-AC1
RC2:
'Comment on egusphere-2024-2648', Anonymous Referee #2, 23 Oct 2024
This is one of the most interesting and insightful studies I have read over the past few years. First, I would like to give a high respect to the heavy workload and computation efforts that the authors endeavored to produce such a meticulously designed and solid results. To be honest, it took me a couple of days to really finish reading through this excellent work. Very impressive.
As one of the most advanced techniques, cellular automation becomes increasingly popular in modeling soil erosion processes. In the Introduction part, the authors did a good job in reviewing and summarizing the state-of-the-art of cellular automation in the field of soil erosion modeling. This therefore makes it quite easy to comprehend the knowledge gaps in current studies, and in turn not difficult to understand the novelty of this study. Yet, I reserve some concerns that the authors may consider to include in the revised manuscript.
The opening of the Abstract seems a bit too long, and key research questions or knowledge gaps come in a bit too late. It would be better if the authors could specify the research question of this study more explicitly, and also save more writing space to present the key findings in the following part. Apart from the methods, the results and observations are a bit too general. More specific data or quantitative descriptions would be more helpful to underscore the novelty.

In general, the Results part is a bit too long and saturated with figures. I would suggest remove some figures into the supplementary, or selectively show some of the subfigures. I believe, there must have been far more figures plotted out during the entire modelling and analyzing. The authors must have already tried a lot to reduce the number of illustrations. Yet, still, as a piece of a regular research article, too many figures and too lengthy results somehow might make the readers feel overwhelmed. Is it possible to include a table listing out the major performances of different model runs under the three different approaches? Or, the authors may even consider to develop a sort of “graphical abstract” or “conceptual diagram” to summarize the research questions, critical methods and key findings? So as to help the authors develop a “holistic” comprehension over this study?

In addition, there are always bits and pieces of discussion mixed in the Result part. This on the one hand makes the Results part quite lengthy; on the other hand, in the current state, the Discussion part is more into “limitations and future implications”, but short in in-depths explanations, coherent arguments and discussions with other peer studies (most of which is actually scattering in Results).

L580 to L595 in the Discussion section is actually a review over currently available models. They should be moved to the Introduction part, to better specify the knowledge gaps of current studies.

Some of the results were described in present tense. Should they be in past tense? For instance, L485 to L515 in subsection 3.2.

Although the authors mentioned the specific subprocesses, such as raindrop detachment, splash transport, flow transport and flow detachment (mostly derived from Kinnell 2001), the potential effects or impacts of these subprocesses were not adequately discussed. For instance, the selectivity of runoff over eroding time in carrying soil particles of different sizes. This may partly contribute to the unmatched temporal variations of sediment yield.

L615, the statement on “erosion model calibration might use EC-based measures only, and even possible without using sediment yield” is somewhat a bit bold, I think. That the relatively smaller errors of EC-based approaches were valid in this study, at least to some extent, was because the soil surface was prepared with compaction and heavy bulk density. Some minor changes in EC, such as the settling of soil surface after wetting, the removal and in turn runout of depositional sediment over time, the periodic initiations of different rills and rejuvenated eroding surface, and the progressive equilibrium of runoff and sediment in the intervals of rill development, may trigger major changes in sediment yield.

Overall, this is a well-written manuscript offering a huge amount of information and new thoughts. Yet, I think it would be even better appreciated, if the authors may consider to trim and condense it a bit shorter (even just for the sake of less APC 😊).
Citation: https://doi.org/10.5194/egusphere-2024-2648-RC2
- AC2: 'Reply on RC2', Anette Eltner, 20 Nov 2024
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2024/egusphere-2024-2648/egusphere-2024-2648-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2024-2648-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (06 Dec 2024) by Nikolaus J. Kuhn

AR by Anette Eltner on behalf of the Authors (14 Jan 2025) Author's response Manuscript

ED: Referee Nomination & Report Request started (22 Jan 2025) by Nikolaus J. Kuhn

RR by Pedro Batista (17 Feb 2025)

EF by Polina Shvedko (06 Feb 2025) Author's tracked changes

ED: Publish subject to minor revisions (review by editor) (18 Feb 2025) by Nikolaus J. Kuhn

AR by Anette Eltner on behalf of the Authors (03 Mar 2025) Author's response Author's tracked changes Manuscript

ED: Publish subject to technical corrections (15 Mar 2025) by Nikolaus J. Kuhn

ED: Publish as is (17 Mar 2025) by Peter Fiener (Executive editor)

AR by Anette Eltner on behalf of the Authors (17 Mar 2025) Manuscript

Executive editor

Annette Eltner et al. presented a very insightful and innovative study, which is exactly what is needed in erosion modelling. The high quality of the manuscript was underlined by both anonymous reviewers who were quite enthusiastic about it. Overall, it is a manuscript worth reading for new ideas in erosion modelling.

Using 3D observations with high spatio-temporal resolution to calibrate and evaluate a process-focused cellular automaton model of soil erosion by water

Download

Interactive discussion

Peer review completion

Suggestions for revision or reasons for rejection