Comparing three approaches of spatial disaggregation of legacy soil maps based on DSMART algorithm

Abstract. Enhancing the spatial resolution of pedological information is a great challenge in the field of Digital Soil Mapping (DSM). Several techniques have emerged to disaggregate conventional soil maps initially available at a coarser spatial resolution than required for solving environmental and agricultural issues. At the regional level, polygon maps represent soil cover as a tessellation of polygons defining Soil Map Units (SMU), where each SMU can include one or several Soil Type Units (STU) with given proportions derived from expert knowledge. Such polygon maps can be disaggregated at finer spatial resolution by machine learning, using the Disaggregation and Harmonisation of Soil Map Units Through Resampled Classification Trees (DSMART) algorithm. This study aimed to compare three approaches of spatial disaggregation of legacy soil maps based on DSMART decision trees, to test the hypothesis that the disaggregation of soil landscape distribution rules may improve the accuracy of the resulting soil maps. Overall, two modified DSMART algorithms (DSMART with extra soil profiles, DSMART with soil landscape relationships) and the original DSMART algorithm were tested. The quality of disaggregated soil maps at 50 m resolution was assessed over a large study area (6775 km²) using an external validation based on 135 independent soil profiles selected by probability sampling, 755 legacy soil profiles and existing detailed 1 : 25 000 soil maps. Pairwise comparisons were also performed, using the Shannon entropy measure, to spatially locate differences between disaggregated maps. The main results show that adding soil landscape relationships in the disaggregation process enhances the performance of prediction of soil type distribution.
Considering the three most probable STU and using 135 independent soil profiles, the overall accuracy was 19.8 % for DSMART with expert rules, against 18.1 % for the original DSMART and 16.9 % for DSMART with extra soil profiles. These measures were almost twofold higher when validated using 3 × 3 windows. They achieved 28.5 % for DSMART with soil landscape relationships, and 25.3 % and 21 % for the original DSMART and DSMART with extra soil observations, respectively. In general, adding soil landscape relationships as well as extra soil observations constrains the model to predict a specific STU that can occur in specific environmental conditions. Thus, including global soil landscape expert rules in the DSMART algorithm is crucial to obtain consistent soil maps with clear internal disaggregation of SMU across the landscape.



To improve soil variability knowledge and overcome the limitation of a coarse mapping scale,

This study aimed to test the hypothesis that adding soil landscape relationships in the disaggregation procedure improved the accuracy of produced disaggregated soil maps. This involves assessing the

To assess the quality of disaggregated soil maps, three validation datasets were used (

All soil profiles were allocated to a suitable STU by an expert after description and analysis. Both legacy soil profiles and detailed maps were converted to raster format to match the prediction raster at 50 m spatial resolution.

All soil environmental covariates were converted to raster format at 50 m spatial resolution.

Including soil landscape relationships in the disaggregation process was explored by Vincent et al. (2018) in a specific regional pedoclimatic context in Brittany (France). Expert soil landscape relationships were used to assign STU to sampling points. These relationships were based on expert pedological knowledge, which takes into account soil parent material as well as topography and waterlogging in the STU allocation procedure. This approach combines two data sources to calibrate the model. The first one was derived from semantic information for each SMU/STU combination. It consists of assigning a barcode to each SMU/STU combination, derived from a concatenation of four features contained in the RRP database (parent material, SMU identifier, TPI and waterlogging index), and comparing these barcodes to a stack of regional covariates representing the same four features, in order to assign each pixel of the study area to a suitable STU. This

Hirschberg, 2007). This is a spatial method developed to compare maps in the form of vector objects, and it is commonly used in computer science to compare (non-spatial) clusterings.

We divide the entire study area into two different sets of regions, referred to as regionalizations R and Z. The first regionalization R divides the domain into n regions $r_i$ (i = 1 to n) and the second regionalization Z divides the domain into m zones $z_j$ (j = 1 to m). Superposition of the two regionalizations R and Z divides the domain into n × m segments, each with area $a_{ij}$. The total area of a region $r_i$ is $A_i = \sum_{j=1}^{m} a_{ij}$, the total area of a zone $z_j$ is $A_j = \sum_{i=1}^{n} a_{ij}$, and the total area of the domain is $A = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}$.

The SABRE package calculates a degree of spatial agreement between two regionalizations using

that measures the homogeneity of a given zone in terms of regions is given via Eq 2.

Analogous to homogeneity but with the roles of regions and zones reversed, the dispersion of zones over the entire area is also computed using Shannon entropy (Eq 4 and Eq 7), and a global indicator C (Eq 5) measures the homogeneity of a given region in terms of zones.
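
As an illustration of how these entropy-based indicators can be computed from the overlap areas, the following is a minimal Python sketch, not the SABRE implementation. It assumes the usual V-measure definitions: homogeneity and completeness are one minus an area-weighted ratio of conditional to global Shannon entropy, and the two are combined as a β-weighted harmonic mean. The function name `v_measure_from_areas` and its arguments are hypothetical.

```python
import numpy as np


def _entropy(p):
    """Shannon entropy (natural log) of a proportion vector, ignoring zeros."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def v_measure_from_areas(a, beta=1.0):
    """Return (homogeneity, completeness, V) for an n x m overlap-area matrix.

    a[i, j] is the area shared by region r_i and zone z_j.
    The formulas are the standard V-measure ones (an assumption here).
    """
    a = np.asarray(a, dtype=float)
    total = a.sum()
    region_area = a.sum(axis=1)  # total area of each region r_i
    zone_area = a.sum(axis=0)    # total area of each zone z_j

    # Dispersion of regions, and of zones, over the whole domain.
    s_regions = _entropy(region_area / total)
    s_zones = _entropy(zone_area / total)

    # Area-weighted entropy of regions inside each zone, and of zones inside each region.
    s_regions_in_zones = sum(
        (zone_area[j] / total) * _entropy(a[:, j] / zone_area[j])
        for j in range(a.shape[1]) if zone_area[j] > 0
    )
    s_zones_in_regions = sum(
        (region_area[i] / total) * _entropy(a[i, :] / region_area[i])
        for i in range(a.shape[0]) if region_area[i] > 0
    )

    h = 1.0 if s_regions == 0 else 1.0 - s_regions_in_zones / s_regions
    c = 1.0 if s_zones == 0 else 1.0 - s_zones_in_regions / s_zones
    denom = beta * h + c
    v = 0.0 if denom == 0 else (1.0 + beta) * h * c / denom
    return h, c, v
```

With two identical regionalizations (a diagonal overlap matrix) all three values equal 1, and with two statistically independent regionalizations they fall to approximately 0.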
β is a coefficient that allows promoting the first or the second regionalization, and by default, β

The quality of maps resulting from DSMART-based approaches was quantified via the probabilities of occurrence of each predicted STU and the confusion index maps (Fig. 5). The latter measure indicated areas where the probabilities of occurrence of the two most probable soil types were close.
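
The confusion index can be sketched in a few lines of Python. This is a minimal illustration, assuming the common definition CI = 1 − (p1 − p2), where p1 and p2 are the highest and second-highest class probabilities at a pixel; the exact formula used in the study is not given in this passage, and the function name `confusion_index` is hypothetical.

```python
import numpy as np


def confusion_index(prob_stack):
    """Per-pixel confusion index from a (n_classes, rows, cols) probability stack.

    Assumed definition: CI = 1 - (p_first - p_second). Values close to 1 mean
    the two most probable classes have nearly equal probabilities.
    """
    prob_stack = np.asarray(prob_stack, dtype=float)
    # Sort along the class axis; the last two slices hold the two highest probabilities.
    top_two = np.sort(prob_stack, axis=0)[-2:]
    second, first = top_two[0], top_two[1]
    return 1.0 - (first - second)
```

A pixel with class probabilities (0.6, 0.3, 0.1) gets a confusion index of 0.7, whereas a pixel with (0.4, 0.4, 0.2) gets 1.0, flagging it as ambiguous between its two most probable soil types.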

Over the study area, the average probability of occurrence of the most probable soil type achieved

These areas were predominantly deep loamy soils or soils developed in alluvial and colluvial deposits.

Figure 6 compares the cumulative area of the STUs estimated from the three disaggregated maps and that derived from the regional soil database. For each STU, its relative predicted area was estimated by counting the number of pixels where it was predicted. For the regional soil database,

(Table 2).

(Table 3).
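
The pixel-counting step used to estimate the relative predicted area of each STU can be sketched as follows. This is a minimal Python illustration under stated assumptions: the disaggregated map is an integer raster of STU codes, and the function name `predicted_area_share` and the `nodata` value of -1 are hypothetical.

```python
import numpy as np


def predicted_area_share(stu_map, nodata=-1):
    """Return {STU code: share of valid pixels} for a raster of predicted STU codes."""
    stu_map = np.asarray(stu_map)
    valid = stu_map[stu_map != nodata]  # drop pixels outside the study area
    codes, counts = np.unique(valid, return_counts=True)
    return {int(code): float(n) / valid.size for code, n in zip(codes, counts)}
```

Because all pixels have the same size (50 m × 50 m, i.e. 0.0025 km² each), the share of pixels equals the share of area, and multiplying a pixel count by 0.0025 gives the corresponding area in km².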

Using a 3 × 3 window of pixels markedly improved the global accuracies for the two validation datasets (

(Table 3). Visually, the Fig. 8.b map seemed to be more homogeneous than the map Fig. 8

A quantitative comparison between disaggregated soil maps was performed using a novel approach called the V-measure method. This method has commonly been used to assess the spatial agreement between land cover maps and maps of thematic biotic and abiotic factors, as done by Nowosad and Stepinski (2018) in the United States, but never before for soil maps.

In the present study, V1 (0.53) was larger than V2 (0.47) suggesting that DSMART with expert soil