08 Nov 2021
Status: a revised version of this preprint is currently under review for the journal SOIL.

Performance of three machine learning algorithms for predicting soil organic carbon in German agricultural soil

Ali Sakhaee1, Anika Gebauer2, Mareike Ließ2, and Axel Don1
  • 1Thünen Institute of Climate Smart Agriculture, Braunschweig, Germany
  • 2Department Soil System Science, Helmholtz Centre for Environmental Research – UFZ, Halle (Saale), Germany

Abstract. Soil organic carbon (SOC), as the largest terrestrial carbon pool, has the potential to influence climate change and its mitigation; consequently, SOC monitoring is important within the frameworks of various international treaties. There is therefore a need for high-resolution SOC maps. Machine learning (ML) offers new opportunities to produce such maps because of its capacity to mine large datasets. The aim of this study, therefore, was to test three algorithms commonly used in digital soil mapping – random forest (RF), boosted regression trees (BRT) and support vector machine for regression (SVR) – on the first German Agricultural Soil Inventory to model agricultural topsoil SOC content. Nested cross-validation was implemented for model evaluation and parameter tuning. Moreover, grid search and the differential evolution algorithm were applied to ensure that each algorithm was suitably tuned and optimised. The SOC content of the German Agricultural Soil Inventory was highly variable, ranging from 4 g kg−1 to 480 g kg−1. However, only 4 % of all soils contained more than 87 g kg−1 SOC and were considered organic or degraded organic soils. The results show that SVR provided the best performance, with an RMSE of 32 g kg−1, when the algorithms were trained on the full dataset. However, the average RMSE of all algorithms decreased by 34 % when mineral and organic soils were modelled separately, with the best result from SVR with an RMSE of 21 g kg−1. Model performance is often limited by the size and quality of the soil dataset available for calibration and validation. Therefore, the impact of enlarging the training data was tested by including 1223 data points from the European Land Use/Land Cover Area Frame Survey for agricultural sites in Germany. Model performance improved by at most 1 % for mineral soils and 2 % for organic soils.
Despite the capability of machine learning algorithms in general, and SVR in particular, to model SOC on a national scale, the study showed that the most important step for improving model performance was the separate modelling of mineral and organic soils.
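
The nested cross-validation with grid search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, a simple k-nearest-neighbour regressor stands in for RF, BRT and SVR, and every name here (`nested_cv`, `knn_predict`, `GRID`, the fold counts) is an assumption made for illustration. The point is only the structure: the inner loop tunes a hyperparameter on the training portion, and the outer loop reports RMSE on data that played no part in tuning.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, k):
    """Stand-in model: mean target of the k nearest training points (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def evaluate(fit_idx, test_idx, xs, ys, k):
    """Fit on fit_idx, predict test_idx, return the RMSE."""
    fx = [xs[i] for i in fit_idx]
    fy = [ys[i] for i in fit_idx]
    preds = [knn_predict(fx, fy, xs[i], k) for i in test_idx]
    return rmse([ys[i] for i in test_idx], preds)


def nested_cv(xs, ys, grid, outer_k=5, inner_k=3, seed=0):
    """Return one (best hyperparameter, outer-fold RMSE) pair per outer fold."""
    rng = random.Random(seed)
    outer_folds = k_fold_indices(len(xs), outer_k, rng)
    results = []
    for o, test_idx in enumerate(outer_folds):
        train_idx = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        # Inner loop: grid search that only ever sees the outer training data.
        inner_folds = k_fold_indices(len(train_idx), inner_k, rng)
        best_k, best_score = None, float("inf")
        for k in grid:
            scores = []
            for j, val_pos in enumerate(inner_folds):
                val_idx = [train_idx[p] for p in val_pos]
                fit_idx = [train_idx[p]
                           for f, fold in enumerate(inner_folds) if f != j
                           for p in fold]
                scores.append(evaluate(fit_idx, val_idx, xs, ys, k))
            mean_score = sum(scores) / len(scores)
            if mean_score < best_score:
                best_k, best_score = k, mean_score
        # Outer loop: refit with the tuned value and score on the held-out fold.
        results.append((best_k, evaluate(train_idx, test_idx, xs, ys, best_k)))
    return results


# Synthetic data (hypothetical, not soil inventory values).
data_rng = random.Random(42)
xs = [data_rng.uniform(0.0, 10.0) for _ in range(60)]
ys = [10.0 + 5.0 * x + data_rng.gauss(0.0, 2.0) for x in xs]

GRID = [1, 3, 5, 9]  # candidate numbers of neighbours
results = nested_cv(xs, ys, GRID)
for fold, (best_k, score) in enumerate(results, start=1):
    print(f"outer fold {fold}: tuned k={best_k}, RMSE={score:.2f}")
```

Averaging the outer-fold RMSE values gives a performance estimate that is not biased by the tuning step. In this sketch, a differential evolution search would replace the loop over `GRID` with a population-based optimiser over a continuous parameter range.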

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on soil-2021-107', Anonymous Referee #1, 27 Dec 2021
    • AC1: 'Reply on RC1', Ali Sakhaee, 20 Feb 2022
  • RC2: 'Comment on soil-2021-107', Anonymous Referee #2, 13 Jan 2022
    • AC2: 'Reply on RC2', Ali Sakhaee, 20 Feb 2022

Total article views: 680 (including HTML, PDF, and XML)
  • HTML: 462
  • PDF: 202
  • XML: 16
  • Total: 680
  • Supplement: 45
  • BibTeX: 10
  • EndNote: 8
Views and downloads (calculated since 08 Nov 2021)

Viewed (geographical distribution)

Total article views: 668 (including HTML, PDF, and XML) Thereof 668 with geography defined and 0 with unknown origin.
Latest update: 27 Jun 2022
Short summary
Demand for high-resolution maps is increasing sharply as soil carbon becomes a key component of climate-smart agriculture. Meanwhile, machine learning algorithms are being widely applied and open up new solutions in soil mapping. This paper shows which algorithms perform best and how soil inventory data can be used most efficiently for digital soil mapping. It explores the different options and methods for deriving high-resolution soil carbon data at a large regional scale.