Digital soil mapping (DSM) has been widely used as a cost-effective method for generating soil maps. However, current DSM data representation rarely incorporates contextual information of the landscape. DSM models are usually calibrated using point observations intersected with spatially corresponding point covariates. Here, we demonstrate the use of a convolutional neural network (CNN) model that incorporates contextual information surrounding an observation to significantly improve the prediction accuracy over conventional DSM models. We describe a CNN model that takes images of covariates as inputs and explores spatial contextual information by finding non-linear local spatial relationships of neighbouring pixels. Unique features of the proposed model include input represented as a 3-D stack of images, data augmentation to reduce overfitting, and the simultaneous prediction of multiple outputs. Using a soil mapping example in Chile, the CNN model was trained to simultaneously predict soil organic carbon at multiple depths across the country. The results showed that, in this study, the CNN model reduced the error by 30 % compared with conventional techniques that only used point information of covariates. In this example of country-wide mapping at 100 m resolution, neighbourhood sizes from 3 to 9 pixels were more effective than either a single point location or larger neighbourhood sizes. In addition, the CNN model produced less prediction uncertainty and predicted soil carbon at deeper soil layers more accurately. Because the CNN model takes covariates represented as images, it offers a simple and effective framework for future DSM models.

Digital soil mapping (DSM) has now been widely used globally for mapping soil
classes and properties

The formalisation of the DSM methodology was done by the publication of

The usual steps for deriving the scorpan spatial soil prediction
functions include intersecting soil observations (point data) with the
scorpan factors (raster images at a particular resolution) and
calibrating a prediction function

Attempts have been made to incorporate more local information in the
scorpan covariates, in particular topography. Approaches to include
covariate information about the vicinity around the observations

DSM can be thought of as linking observable landscape structure and soil
processes expressed as observed soil properties. To effectively link
structure and processes,

Spatial filtering, multi-scale terrain calculation, and contextual mapping
approaches require the preprocessing of each covariate independently. The
useful scale for each covariate must be determined through numerical
experiments, and most of the time the process relies on ad hoc decisions.
Here, we take advantage of the success of deep learning models that are used
for image recognition, as an effective tool in DSM to optimally search for
local contextual information of covariates. This work aims to expand the
classic DSM approach by including information about the vicinity around

The theoretical background of DSM is based on the relationship between a soil
attribute and soil-forming factors. In practice, a single soil observation is
usually described as a point

Soils are highly dependent on their position in the landscape, and
information at a particular pixel might not be sufficient to represent that
complex relationship. Our method expands the classic DSM approach by
replacing the covariates, usually represented as a vector, with a 3-D array
with shape
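The difference between the two input representations can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the function names `point_vector` and `extract_window`, the (row, column, covariate) nesting of the raster stack, and the choice to reject windows that cross the raster edge are ours and are not taken from the original implementation.

```python
def point_vector(stack, row, col):
    """Conventional DSM input: covariate values at the observation pixel only."""
    return list(stack[row][col])


def extract_window(stack, row, col, size):
    """Contextual input: a size x size window of covariate vectors centred
    on the observation pixel (assumed layout: stack[row][col][covariate])."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so the observation is centred")
    half = size // 2
    n_rows, n_cols = len(stack), len(stack[0])
    if row - half < 0 or col - half < 0 or row + half >= n_rows or col + half >= n_cols:
        # Assumption: windows crossing the raster edge are rejected, not padded.
        raise ValueError("window extends beyond the raster")
    return [
        [list(stack[r][c]) for c in range(col - half, col + half + 1)]
        for r in range(row - half, row + half + 1)
    ]


# Toy raster: 5 x 5 pixels, two synthetic covariates per pixel.
toy_stack = [[[r * 10 + c, r - c] for c in range(5)] for r in range(5)]
window = extract_window(toy_stack, row=2, col=2, size=3)
```

The centre element of `window` is identical to `point_vector(toy_stack, 2, 2)`, so the conventional representation is the special case of a window of size 1.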

As described in the introduction, while multi-scale or contextual mapping
approaches have been used in DSM, they still rely on a vector representation
of covariates and on machine learning methods such as random forest to
select important predictors. While deep learning methods have been used in
DSM (e.g.

In the following sections, we introduce the use of convolutional neural networks to exploit the spatial information of covariates and thereby perform DSM more effectively.

Representation of the vicinity around a soil observation

Deep learning is a machine learning method that is able to learn the
representation of data through a series of processing layers. In agriculture
and environmental mapping, it is mainly used in hyperspectral and
multispectral image classification problems, e.g. land cover classification

In this section we briefly introduce CNNs and some associated methods used
during this work. For a more detailed and general description about CNNs we
refer the reader to

CNNs are based on the concept of a layer of convolving windows which move
along a data array in order to detect features (e.g. edges) of the data by
using different filters (Fig.

A CNN has a number of three-dimensional hidden layers, with each layer learning to
detect different features of the input images

Example of the first three steps of a convolution of a
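The sliding-window operation can be illustrated with a minimal, dependency-free sketch. The 3 x 3 vertical-edge filter and the toy image below are assumptions chosen for illustration; in a trained CNN the filter weights are learned. As in most CNN implementations, the kernel is applied without flipping (cross-correlation) and without padding.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image and return the map of weighted sums
    (no padding, stride 1, kernel not flipped)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0.0
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy image with a vertical edge between the third and fourth columns.
image = [[0, 0, 0, 1, 1] for _ in range(4)]
# Hypothetical filter that responds to left-to-right increases in value.
vertical_edge = [[-1, 0, 1] for _ in range(3)]
feature_map = convolve2d_valid(image, vertical_edge)
# feature_map == [[0.0, 3.0, 3.0], [0.0, 3.0, 3.0]]
```

The filter gives no response over the homogeneous left part of the image and a positive response wherever the window covers the edge.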

A pooling operation merges similar features by performing non-linear
down-sampling. Here we used max-pooling layers which combine inputs from a
small
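A max-pooling step can be sketched as follows. The 2 x 2 window with non-overlapping steps is an assumption made for illustration and is not necessarily the configuration used in this work.

```python
def max_pool2d(feature_map, size=2):
    """Down-sample a 2-D feature map by keeping the maximum of each
    non-overlapping size x size block (assumed stride equal to size)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, cols - size + 1, size)
        ]
        for i in range(0, rows - size + 1, size)
    ]


toy_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(toy_map)
# pooled == [[4, 2], [2, 8]]
```

Each output value summarises a block of four inputs, so the 4 x 4 map is reduced to 2 x 2 while the strongest responses are retained.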

To obtain optimal weights, we trained the network on a training dataset,
adjusting the weights to minimise the error with a gradient-based algorithm,
the Adam optimiser
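The update rule of the Adam optimiser can be sketched for a single weight as below. The decay rates and epsilon are the commonly used defaults of the algorithm; the quadratic toy loss, the learning rate of 0.1, and the number of steps are assumptions for illustration and do not reflect the training settings of this study.

```python
import math


def adam_minimise(grad, w0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a one-parameter loss with Adam, given its gradient function."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of the gradient
        v = beta2 * v + (1 - beta2) * g * g    # running mean of the squared gradient
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy loss (w - 3)^2, whose gradient is 2 (w - 3); the minimum is at w = 3.
w_final = adam_minimise(lambda w: 2.0 * (w - 3.0), w0=0.0, lr=0.1, steps=500)
```

In a network the same update is applied to every weight, with the gradients obtained by backpropagation.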

CNNs have the capacity to predict multiple properties simultaneously. By doing so,
a multi-task CNN is capable of sharing learned representations between
different targets and also using the other targets as “clues” during the
prediction process. In consequence, the error of the simultaneous prediction
is generally lower compared with a single prediction for each target

In DSM, large extents, high resolution, and bootstrap routines combine to require multiple model realisations over billions of pixels. Because CNNs also use a group of pixels around each soil observation instead of a single pixel, the time and computational resources required for training and inference are an important factor. Since a multi-task CNN trains on and predicts multiple targets simultaneously, it reduces both training and inference time compared with single-task models.

The data used in this work correspond to Chilean soil information. Since most
observations are distributed on agricultural lands, we complemented that
information with a second small data collection compiled from the literature
and collaborators. We selected soil organic carbon (SOC) content (%) at depths
0–5, 5–15, 15–30, 30–60, and 60–100 cm as our target attribute. In total, 485 soil
profiles were used after excluding soil profiles with a total depth of less than
100 cm (to ensure that all the profiles have observations at all
depth intervals). For more details about the data and depth standardisation
we refer the reader to

As covariates, we used (a) a digital elevation model (HydroSHEDS,

Deep learning techniques are described as “data-hungry” since they usually
work better with large volumes of data. The direct effect of data
augmentation is to generate new samples by modifying the original data
without changing its meaning

A secondary effect of data augmentation is regularisation, reducing the
variance of the model and overfitting

The multi-task CNN used in this study (Fig.

Architecture of the multi-task network. “Shared layers” represent the layers shared by all the depth ranges. Each branch, one per depth range, first flattens the information to a 1-D array, followed by a series of two fully connected layers and a fully connected layer of size equal to 1, which corresponds to the final prediction.
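The branching structure described in the caption can be sketched as a forward pass in plain Python. This is a schematic only: the shared convolutional layers are represented by their output, and the layer sizes, random weights, ReLU activations, and function names are hypothetical. Only the five depth ranges and the flatten, two fully connected layers, single-output pattern come from the text.

```python
import random

DEPTHS = ["0-5", "5-15", "15-30", "30-60", "60-100"]  # depth ranges in cm


def dense(x, weights, biases, relu=True):
    """Fully connected layer; ReLU is an assumed activation."""
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def init_dense(n_in, n_out, rng):
    """Small random weights and zero biases (placeholder for trained values)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def flatten(feature_maps):
    """Turn a 3-D stack of feature maps into a 1-D list."""
    return [v for plane in feature_maps for row in plane for v in row]


def build_branches(n_features, hidden=(8, 4), seed=0):
    """One branch per depth range: two hidden layers plus an output of size 1."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden, 1]
    return {
        depth: [init_dense(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        for depth in DEPTHS
    }


def predict_all_depths(shared_features, branches):
    """Feed the output of the shared layers through every depth branch."""
    x0 = flatten(shared_features)  # identical for every branch, so computed once
    predictions = {}
    for depth, layers in branches.items():
        x = x0
        for i, (weights, biases) in enumerate(layers):
            x = dense(x, weights, biases, relu=(i < len(layers) - 1))
        predictions[depth] = x[0]
    return predictions


# Hypothetical output of the shared layers: two 3 x 3 feature maps.
shared = [[[0.1 * (r + c + k) for c in range(3)] for r in range(3)] for k in range(2)]
predictions = predict_all_depths(shared, build_branches(n_features=18))
```

Because the shared layers are evaluated once per input window, adding a depth range only adds the cost of one small branch, which is the source of the time saving of the multi-task design.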

The many connections between the layers generate a large number of
parameters. To reduce the risk of overfitting, we applied dropout: in
between the layers, 30 % of the connections were randomly
disconnected
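A minimal sketch of dropout with a rate of 30 % is given below. As assumptions for illustration, the mask is applied to a list of activations (as is common in practice) and the retained values are rescaled by 1 / (1 - rate), so that their expected sum is unchanged; the text does not specify these details.

```python
import random


def dropout(activations, rate=0.3, rng=None, training=True):
    """Randomly zero a fraction of the activations during training."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is switched off at prediction time
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Assumed "inverted" dropout: surviving values are scaled up by 1 / keep.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


dropped = dropout([1.0] * 10, rate=0.3, rng=random.Random(1))
```

A different random subset is disconnected at every training step, which prevents the network from relying on any single connection.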

Sequence of layers used to build the multi-task neural network.

As explained in Sect.

As the vicinity size increases, so does the number of parameters of the
network (considering a fixed network architecture) and the risk of
overfitting. To minimise overfitting, we modified the architecture of the
network depending on the vicinity size (Table

List of modifications made to the base network architecture for specific input window sizes.

First, 10 % (

As a control, we compared our results with a previous study by

In this work (and in

The CNN was implemented in Python (v3.6.2;

To generalise and improve the CNN model, we created new data using only
information from the training data by rotating the original image input. Data
augmentation was effective at reducing model error and variability
(Fig.

Regarding the spatial autocorrelation of the data, augmentation yields four samples at each location with exactly the same SOC content, which implies zero variance at a distance of 0. That is theoretically true if the distance is exactly 0. In practice, when calculating the semivariogram, the semivariance of the first bin will be lower, but this does not significantly affect the final model.
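The rotation-based augmentation can be sketched as follows. We assume rotations in steps of 90 degrees, which is consistent with obtaining four samples per location; the function names and the toy window are hypothetical.

```python
def rotate90(window):
    """Rotate the two spatial axes of a window by 90 degrees counterclockwise.
    Each pixel may hold a single value or a list of covariates."""
    return [list(row) for row in zip(*window)][::-1]


def augment(window, target):
    """Return the original window and its three rotations, all paired with
    the same target value observed at the centre pixel."""
    samples = []
    current = window
    for _ in range(4):
        samples.append((current, target))
        current = rotate90(current)
    return samples


toy_window = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
samples = augment(toy_window, target=1.2)
# samples[1][0] == [[3, 6, 9], [2, 5, 8], [1, 4, 7]]
```

The centre pixel, where the soil observation is located, stays in place under every rotation, so the label remains valid while the surrounding context is presented in a new orientation.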

Effect of using data augmentation as a pretreatment on a 7 pixel

Effect of vicinity size on prediction error, by depth range.
Ref_

To incorporate contextual information for DSM prediction, we represent the
input as an image: a square window centred on the observation and including
its surrounding pixels. The size of the neighbourhood
window (vicinity) has a significant effect on the prediction error
(Fig.

Distribution of the original dataset and the test dataset. The random sampling excludes some observations with high SOC values.

As described in Sect.

Soil-forming factors interact in complex ways and affect soil properties with
different strengths. At the local scale, a broader context (i.e. a larger vicinity
size) does not necessarily provide extra information to the model, for
instance when one of the factors is relatively homogeneous. The extra
information could even be detrimental if the vicinity size is well beyond the
area of influence of a factor, which is probably what happened when we
increased the vicinity size above 9 pixels (radius

We compared our approach with the Cubist model used in our previous study

To compare our results with a method that uses contextual information, we ran
a test using wavelet decomposition as per

Our approach uses a multi-task CNN to predict multiple depths simultaneously
in order to produce a synergistic effect. Compared with predicting each depth
range in isolation by training a network with the same structure
(Sect.

In DSM, there are two main approaches to deal with the vertical variation of
a soil attribute: 2.5-D and 3-D modelling. In the first one, an independent
model is fitted for each depth range. The latter explicitly incorporates
depth in order to obtain a single model for the whole profile. Interestingly,
both approaches show a decrease in the variance explained by the model as the
prediction depth increases. In a 3-D mapping of SOC for a 125 km

The prediction of the adjacent layers served as guidance, producing a
synergistic effect. A soil attribute through a profile usually has a
predictable behaviour (unless there are lithological discontinuities), which
has been described by many authors in the form of depth functions

Percentage change in model

Vertical SOC distribution for 20 randomly selected profiles. Predictions correspond to the multi-task CNN.

Visually, the maps generated with the Cubist tree model and our multi-task
CNN showed differences (Fig.

Detailed view of the

A recommended DSM practice is to present a map of a predicted attribute along
with its associated uncertainty

In terms of the spatial patterns of the uncertainty
(Fig.

Median prediction interval width (PIW, SOC %) and proportion of observations that fell within the 90 % prediction interval (PICP) estimated at the test dataset locations. For the Cubist model, values were extracted from the final maps. For the CNN models, the values correspond to the mean of the 100 bootstrap iterations.
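The two statistics in this table can be computed from bootstrap predictions as sketched below. As an assumption for illustration, the 90 % prediction interval at each location is taken as the 5th to 95th percentile of the bootstrap predictions; the procedure used in the study may include additional variance terms. Function names and the toy data are hypothetical.

```python
import statistics


def percentile(values, q):
    """Percentile with linear interpolation between closest ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def interval_statistics(bootstrap_predictions, observed, level=90):
    """Return (median PIW, PICP in %) for one depth range.

    bootstrap_predictions: one list of bootstrap predictions per location.
    observed: the measured value at each location, in the same order.
    """
    tail = (100 - level) / 2.0
    widths, inside = [], 0
    for predictions, obs in zip(bootstrap_predictions, observed):
        lower = percentile(predictions, tail)
        upper = percentile(predictions, 100 - tail)
        widths.append(upper - lower)
        inside += lower <= obs <= upper
    return statistics.median(widths), 100.0 * inside / len(observed)


# Toy data: three locations with 101 bootstrap predictions each.
toy_predictions = [
    [float(i) for i in range(101)],
    [i / 10 for i in range(101)],
    [2.0 * i for i in range(101)],
]
toy_observed = [50.0, 9.9, 100.0]
median_piw, picp = interval_statistics(toy_predictions, toy_observed)
```

A well-calibrated 90 % interval should give a PICP close to 90 %, while a narrower median PIW at the same coverage indicates lower prediction uncertainty.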

Percentage change in the prediction interval width using as a
reference a Cubist model

The incorporation of contextual information into DSM models is an important aspect
that deserves more attention. Since a soil surveyor will look at the
surrounding landscape to make a prediction of soil type, DSM models should
also incorporate information surrounding an observation. We demonstrated the
use of a convolutional neural network as an efficient, effective, and
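accurate method to achieve this goal. In particular, we introduced a deep
learning model for DSM with the following innovative features: input
represented as a 3-D stack of images, data augmentation to reduce
overfitting, and the simultaneous prediction of multiple outputs.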

Overall, in this study, we observed an error reduction of 30 % compared with conventional techniques. The resulting predictions also had less uncertainty. Furthermore, the use of this data structure with a CNN seems to eliminate artefacts generally found in DSM products, which arise from the incompatible scale of covariates and from the sharp discontinuities produced by tree models.

A CNN can handle a large number of covariates and has advantages over other
machine learning algorithms used in DSM, such as random forests and Cubist
regression tree models, because its architecture is flexible and explicitly takes
spatial information of covariates around observations. While there have been
attempts to include information surrounding an observation as covariates in a
random forest model, those inputs still do not have a spatial relationship. A CNN
does not require preprocessing such as wavelet transformation; rather, such
a function is built into the model. There are other features such as handling
missing values via data imputation

The example presented in this paper is for country-wide modelling at 100 m resolution, and the approach needs further testing for regional- to landscape-scale mapping. The CNN model would also be highly suitable for mapping soil classes. In addition, the presented model can be used for other environmental mapping applications.

The data were manually extracted from books, which are publicly available and cited in Padarian (2017).

The authors declare that they have no conflict of interest.

This research was supported by Sydney Informatics Hub, funded by the University of Sydney.

Edited by: Bas van Wesemael
Reviewed by: two anonymous referees