Articles | Volume 6, issue 2
Original research article
17 Nov 2020

The influence of training sample size on the accuracy of deep learning models for the prediction of soil properties with near-infrared spectroscopy data

Wartini Ng, Budiman Minasny, Wanderson de Sousa Mendes, and José Alexandre Melo Demattê

Related authors

Digital soil mapping of lithium in Australia
Wartini Ng, Budiman Minasny, Alex McBratney, Patrice de Caritat, and John Wilford
Earth Syst. Sci. Data Discuss., 2023
Revised manuscript accepted for ESSD

Related subject area

Soil and methods
Spatial prediction of organic carbon in German agricultural topsoil using machine learning algorithms
Ali Sakhaee, Anika Gebauer, Mareike Ließ, and Axel Don
SOIL, 8, 587–604, 2022
On the benefits of clustering approaches in digital soil mapping: an application example concerning soil texture regionalization
István Dunkl and Mareike Ließ
SOIL, 8, 541–558, 2022
An open Soil Structure Library based on X-ray CT data
Ulrich Weller, Lukas Albrecht, Steffen Schlüter, and Hans-Jörg Vogel
SOIL, 8, 507–515, 2022
Identification of thermal signature and quantification of charcoal in soil using differential scanning calorimetry and benzene polycarboxylic acid (BPCA) markers
Brieuc Hardy, Nils Borchard, and Jens Leifeld
SOIL, 8, 451–466, 2022
Estimating soil fungal abundance and diversity at a macroecological scale with deep learning spectrotransfer functions
Yuanyuan Yang, Zefang Shen, Andrew Bissett, and Raphael A. Viscarra Rossel
SOIL, 8, 223–235, 2022

Short summary
The number of samples used to build a predictive model affects its performance. This study examines how many samples a deep learning model needs in order to outperform traditional machine learning models when predicting soil properties from visible near-infrared spectroscopy data. The deep learning model outperformed the machine learning models when the sample size exceeded 2000.