Development of pedotransfer functions for tropical mountain soilscapes: Spotlight on parameter tuning in machine learning

Machine learning algorithms are good in computing non-linear problems and fitting complex composite functions, which makes them an adequate tool to address multiple environmental research questions. One important application is the development of pedotransfer functions (PTF). This study aims to develop water retention PTFs for two remote tropical 10 mountain regions of rather different soil-landscapes, dominated by (1) organic soils under volcanic influence, and (2) tropical mineral soils. Two tuning procedures were compared to fit boosted regression tree models: 1) tuning by grid search, which is the standard approach in pedometrics and 2) tuning by differential evolution optimization. A nested cross-validation approach was applied to generate robust models. The developed area-specific PTFs outrival other more general PTFs. Furthermore, the first PTF for typical soils of Páramo landscapes, i.e. organic soils under volcanic influence, is presented. 15 Overall, results confirmed the differential evolution algorithm’s high potential for tuning machine learning models. While models based on tuning by grid search roughly predicted the response variables’ mean for both areas, models applying the differential evolution algorithm for parameter tuning explained up to 22 times more of the response variables’ variance.

mentioned above, it usually leads to better results and a comparatively low computing time (Price et al., 2005). 65 This study first of all aims to develop water retention PTFs for two tropical soil-landscapes dominated by organic soils under volcanic influence, like they are commonly occurring in Páramo regions, and tropical soils of a dry climate. Currently, PTFs suitable for the soils of these regions are at best very few, if any, at all. As we assume the parameter tuning technique to affect the performance of the machine learning based PTFs, our second and equally important aim is to test the power of differential evolution optimization for tuning by comparing it to the grid search approach usually applied in pedometrics. We 70 assume that the pre-eminence of optimization for parameter tuning in machine learning will particularly show when applying it to a machine learning algorithm that requires the fitting of numerous real-valued parameters. This is why we have chosen to fit boosted regression tree models. According to our knowledge, this is the first time both parameter tuning techniques are directly compared for a machine learning application in soil science.

Research areas
The two investigated soil-landscapes are situated in southern Ecuador (Fig. 1). The Quinuas Catchment encompasses an area of about 93 km 2 , including parts of the Cajas National Park (Fig. 1c). Being located between 3000 and 4400 m above sea level (a.s.l.), the mean annual temperature ranges between 5.3 and 8.7 °C with no seasonality (Carrillo-Rojas et al., 2016).
With peaks in March/May and October (Celleri et al., 2007) mean annual precipitation varies between 900 and 1600 mm 80 (Crespo et al., 2011). Due to volcanic ash deposits and the cold and wet climate, soils with a low bulk density and high SOC contents are typical . The Quinuas Catchment can be allocated to the Páramo ecosystem (Guio Blanco et al., 2018), which plays a major role in the water supply of the inter-Andean region (Buytaert et al., 2006a(Buytaert et al., , 2006b. The Laipuna dry forest region is part of the "Laipuna Conservation and Development Area" and covers approximately 16 km² (Fig. 1d). Its temperature profile shows little seasonal variability, while there is a rain period from January to May. 85 Depending on the altitude ranging between 400 and 1500 m a.s.l., the mean annual temperature varies between 16 and 23 °C and mean annual precipitation between 540 mm and 630 mm Richter, 2011b, 2011a). Additionally, El Niño-Southern Oscillation influences the area (Bendix et al., 2003(Bendix et al., , 2011. Laipuna is part of an ecosystem with high biodiversity and many endemic species (Best and Kessler, 1995;Linares-Palomino et al., 2009), which are strongly adapted to the ecosystem and may be threatened by possible climate-induced changes of the water supply. 90  Ließ (2015). Topographical data use with permission from the Ecuadorian Geographical Institute (2013, national base, scale 1:50.000), further GIS data provided by NCI and ETAPA.

Soil data 95
To select representative datasets for the two areas, random stratified sampling was applied using the algorithm "QC-arLUS" (Ließ, 2015). It allows representing a research area`s complete landscape structure in sampling site selection while actual site selection is limited to the accessible area. Two sites were sampled per landscape stratum, resulting in 46 sites for Quinuas and 55 for Laipuna. Water retention at pF 0, 0.5, 1.5 and 2.5 was measured applying hanging water columns of increasing length to undisturbed samples of 100 cm³. For BD measurements, the samples were oven-dried at 105 °C for three days. For 100 Laipuna, disturbed samples were oven-dried at 40 °C, sieved to 2 mm and PSD was determined according to DIN ISO 11277:2002-08 in two (sand fractions) and three (clay and silt fractions) replicate samples. Measurements distinguish the following particle size classes: clay (< 2 µm), fine silt (2 -6.3 µm), medium silt ( for carbonates prior to the SOC determination using dry combustion. Multivariate outlier removal was carried out using the 105 "hclust" function from R-package "fastcluster" (Müller, 2018), version 3.4.4.

Boosted regression trees
The BRT algorithm combines the machine learning techniques regression trees and boosting. Tree models use decision rules, which involve the predictor variables to recursively partition the response variable data into increasingly similar subgroups until terminal nodes are reached (Kuhn and Johnson, 2013). The values of the terminal nodes are averaged to be used for 110 prediction (James et al., 2017). The boosting machine learning technique improves the overall model accuracy by combining a number of simple models (Witten and Frank, 2005). To develop the PTFs, BRT models were trained using the "gbm" Rpackage, version 2.1.3 (Ridgeway, 2017), which is based on Friedman's (2002) stochastic gradient boosting. This boosting technique iteratively fits a number of simple regression tree models to random data subsets to finally make them form a linear combination. Thereby, each new regression tree is fitted to explain most of the previous model's residuals. To apply a 115 BRT model, usually up to four parameters are tuned: number of trees (n.trees), shrinkage, interaction depth, and bag fraction (e.g. Ottoy et al., 2017;Wang et al., 2017;Yang et al., 2016). Elith et al. (2008) provide a detailed analysis of their function: The n.trees parameter describes the number of regression tree models to be iteratively fitted. Shrinkage defines the model's learning rate by scaling the outcome of each simple regression tree, thereby controlling their contribution to the final prediction. The interaction depth parameter controls the number of splits in each tree to divide the response variable data into 120 subgroups. The bag fraction parameter determines the size of the randomly selected data subsets. This is able to reduce the risk of overfitting (Friedman, 2002), but may lower the model robustness (Elith et al., 2008). To develop PTFs for Quinuas and Laipuna, these four parameters were tuned following the steps described in Section 2.4 and 2.5.

Parameter tuning
Parameter tuning was done in two different ways: (1) by grid search and (2) by optimization applying the differential 125 evolution algorithm. In order to reduce computing time, the number of values to be compared by grid search was limited to five for each of the parameters. The selected values were based on the recommendations of Elith et al. (2008) and Ridgeway (2012). They are summarised in Table 1 four-dimensional vectors were compared. In contrast to this, the mathematical optimization algorithm "differential evolution" by Storn and Price (1995) is able to search the multivariate space between defined upper and lower parameter 130 limits. The parameter values are optimized by minimizing an objective function, which defines their suitability. The objective function is allowed to be stochastic and noisy and does not need to be differentiable or continuous (Mullen et al., 2011). Following evolutionary theory, this is done by repeating three steps for i iterations: mutation, crossover, and selection ( Fig. 2). At first, an initial parent population of -dimensional parameter vectors is generated randomly. With each iteration i, these vectors are changed by mutation and randomly mixed by crossover to generate a new population. Selection 135 compares the objective function values belonging to the parent and the new vector to decide whether a new vector replaces its parent vector. Differential evolution was applied using the R-package "DEoptim", version 2.2.4 (Ardia et al., 2016). For each tuning parameter, optimization limits correspond to the smallest and largest grid search values (Table 1). The number of vectors of size = 4 tuning parameters was set to = 100. The R-package's default mutation strategy was used, which changes each parent vector by adding two summands: (1) the difference between two random parent vectors and (2) the 140 difference between the vector to be perturbed and the best vector found in the parent population. Summands were scaled by the factor 0.8. During crossover the probability of randomly mixing the parent and the mutated vectors' elements was set to 50%. To reduce computing time, the optimisation process was stopped after i max = 10 iterations without improving the objective function or a maximum number of 200 iterations, respectively. Prior to the selection step, the discrete tuning parameter values (n.trees, interaction depth) were rounded, as the differential evolution algorithm treats all values as real 145 numbers during mutation and crossover. To select the final tuning parameter values, grid search and differential evolution both minimised the cross validated RMSE T as objective function. RMSE T calculation is explained in Section 2.5.  bag fraction 0.5; 0.6; 0.7; 0.8; 0.9 0.5 0.9

Performance evaluation
In order to build robust models, we followed a nested cross-validation approach. Stratified five-fold cross-validation (CV) 160 was applied for two purposes: (1) to conduct robust parameter tuning on resampled data subsets by either grid search or the differential evolution algorithm and (2) to evaluate the final performance of models built on tuned parameter values. CV provides error metrics with good bias and variance properties, is beneficial for small datasets and avoids overfitting (Arlot and Celisse, 2009;James et al., 2017). Following the steps shown in Fig. 3, the stratified five-fold CV was implemented with five repetitions for model evaluation and one repetition for parameter tuning. In Step 1, the complete dataset (n = 100%) was 165 split into five folds with each of them (n = 20%) once used as the test set, leaving the remaining folds as the model training set. For resampling in parameter tuning (Step 2), each model training set was again subdivided, similar to Step 1. Each tuning parameter vector in grid search and the differential evolution algorithm was evaluated by the cross-validated RMSE T (Step 3, Step 4). By comparing the RMSE T , the best vector of tuning parameter values for each model evaluation training set was selected and applied (Step 5, Step 6). To assess model performance, the coefficient of determination (R² E ) and the root 170 mean squared error (RMSE E ) of model evaluation were calculated by predicting the associated test set data (Step 7). To divide the datasets into folds, the function "partition_cv_strat" from R-package "sperrorest", version 2.0.0 (Brenning et al., 2017) was applied, with three equal probability strata of the response variable's density function. 175 (2018). The tree icons symbolize BRT models, which are repeatedly (circular arrows) trained and tested on different data sets. The numbers within black circles belong to the modeling steps' description. RMSE T = root mean squared error of parameter tuning, RMSE E = root mean squared error of model evaluation, R 2 E = coefficient of determination of model evaluation, GS = grid search, DE = differential evolution.
3 Results and discussion 180

Model input
After removing multivariate outliers, the dataset contained 51 and 46 values for Laipuna and Quinuas, respectively. A summary of the data is shown in Fig. 4 and 5. As expected both areas show huge differences regarding the values of response and predictor variables. BD values in Quinuas range from 64 to 807 kg m -3 , while SOC values vary between 8.8 and 46.4 wt. %. SOC values are normally distributed, while BD data is skewed right. Averagely decreasing by 22%, water 185 retention ranges from 0.25 cm 3 cm -3 (pF 2.5) to 0.94 cm 3 cm -3 (pF 0). While the data is skewed to the right for pF 0, the data distribution for the other pF values is left-skewed. For Laipuna, BD ranges between 1157 and 1727 kg m -3 , displaying a right-skewed distribution. The SOC content is normally distributed and varies between 0.4 and 3.8 wt. %. Clay content ranges between 17 and 48 %, silt between 24 and 45 % and sand between 14 and 50 %. Especially fine and medium silt show a skewed distribution. Water retention values are ranging between 0.25 cm 3 cm -3 (pF 2.5) and 0.61 cm 3 cm -3 (pF 0). On 190 average, they decrease by 37 ± 0.09 % with increasing water tension. Data is skewed right for pF 0 and skewed left for pF 0.5.
Quinuas soils go along with the low density, porous soils, rich in organic material, that are found throughout the Paute river basin Poulenard et al., 2003). Loosely bedded volcanic ash deposits further explain the low bulk density values . High SOC contents are caused by low redox potential and the presence of 195 organometallic complexes inhibiting degradation processes (Buytaert et al., 2006a). Comparatively high water retention values can be attributed to the porous structure of Páramo soils being able to retain a lot of water . The relatively small decrease in water retention with increasing water tension is assumed to be caused by the high amounts of organic matter being characterized by a high water holding capacity . Measured BD and SOC content are in accordance with data observed for other Páramo regions (e.g. Buytaert et al., 2007Buytaert et al., , 2006b. The water retention values 200 are also comparable to data obtained in other Páramo areas (Buytaert et al., 2005) and soils with high organic matter content (Schwärzel et al., 2002(Schwärzel et al., , 2006. Extreme values in BD and water retention ( Fig. 4 and Fig. 5) correspond to soil samples with a higher proportion of mineral components or andic properties (Guio Blanco et al., 2018). Expecting these values to be reliable, they were not removed from the model input. The BD and SOC data measured for the Laipuna soil samples correspond to other dry forest ecosystems (e.g. Conti et al., 2014;de Araújo Filho et al., 2017;Singh et al., 2015), whereas 205 the PSD shows higher clay content compared to the dry forest soils investigated by Cotler andOrtega-Larrocea (2006), Jha et al. (1996) and Sagar et al. (2003). Measured water retention values are higher than those obtained in a tropical dry forest in Brazil (Vasques et al., 2016), probably caused by the higher clay content enhancing the water holding capacity.

Model performance
The performance of the final models built on parameters selected by grid search and the differential evolution algorithm is 215 demonstrated by the error metrics R² E and RMSE E in Fig. 6 (Quinuas) and 7 (Laipuna) as well as by scatterplots comparing observed and predicted water retention values in Fig. 8 (Quinuas) and 9 (Laipuna). Models optimized by the differential evolution algorithm show a better predictive performance than models tuned by grid search. Based on scaled water retention values, all grid search models resulted in very similar RMSE E values (mean 0.20 ± 0.01) for Quinuas, while for Laipuna mean RMSE E range between 0.19 ± 0.00 (pF 2.5) and 0.25 ± 0.00 (pF 0 and pF 0.5). Looking at both error metrics, the 220 performance of the differential evolution based Quinuas models worsens with increasing pF values, whereas for Laipuna model performance improves: final models with parameter tuning by differential evolution correspond to mean RMSE E values ranging from 0.11 ± 0.01 (pF 0) to 0.17 ± 0.01 (pF 2.5) for Quinuas and from 0.15 ± 0.01 (pF 2.5) to 0.26 ± 0.01 (pF 0) for Laipuna. Mean R² E values resulting from grid search, are varying between 0.04 ± 0.03 (pF 0) and 0.09 ± 0.07 (pF 1.5) for Quinuas and between 0.03 ± 0.04 (pF 0.5) and 0.05 ± 0.05 (pF 1.5) for Laipuna. The differential evolution algorithm 225 resulted in mean R² E values ranging from 0.58 ± 0.09 (pF 2.5) to 0.79 ± 0.04 (pF 0) for Quinuas and from 0.27 ± 0.08 (pF 0) to 0.61 ± 0.04 (pF 2.5) for Laipuna. In comparison to grid search, models tuned by differential evolution show a higher predictive performance for all Quinuas models and the Laipuna models for pF 1.5 and pF 2.5: mean R² E values are up to 22 (Quinuas, pF 0) and 18 (Laipuna, pF 2.5) times higher, while RMSE E values are up to 1.9 (Quinuas, pF 0) and 1.3 (Laipuna, pF 2.5) times lower than those obtained by grid search. This corresponds to the scatterplots ( Fig. 11 and 12): the largest difference between grid search and differential evolution can be recognized for the pF 0 (Quinuas) and pF 2.5 (Laipuna) models. Additionally, as demonstrated by the scatterplots, the grid search models roughly predict the water retention values' mean, while the models with parameter tuning by differential evolution are able to explain more of the observations' variance. However, the five grid search predictions for each observation (Fig. 9 a-d and 10 a-d), cover a smaller range than the differential evolution predictions. Especially the differential evolution results of the Laipuna pF 0 and pF 0.5 models are 235 characterized by comparatively high variance.
Probably caused by the better adjustment to the modeling problem, differential evolution models are able to explain more of the water retention's variance than the models tuned by grid search. The higher variability of the differential evolution predictions corresponds to the differential evolution tuning parameter values covering a wider range than those achieved by applying grid search (Section 3.2). For Laipuna, the improvement of the differential evolution models with increasing pF 240 values is assumed to be caused by the used predictors. With increasing pF values the influence of PSD on water retention increases. At the same time the influence of BD decreases, which improves model performance as BD measurements were not corrected for stone content and are, therefore, not as good a predictor as they could be. For Quinuas the decreasing predictive performance with increasing pF values can probably be attributed to the lack of further predictors related to the soil matrix. PSD measurements were not conducted due to the prevalence of soils with organic properties in this area. 245 Overall, the predictive power of all differential evolution based Quinuas models and the Laipuna pF 2.5 models are comparable to other studies. Botula et al. (2013) for example obtained R² ranging from 0.32 to 0.68 (pF 0) and from 0.60 to 0.68 (pF 1.5) by using k-nearest neighbor for soil data originating from the Lower Congo. Keshavarzi et al. (2010) used an artificial neural network to predict water retention at different pF values for soils from the Qazvin province in Iran. Haghverdi et al. (2012) used the same machine learning technique on soils from northeastern and northern Iran. While 250 Keshavarzi et al. (2010) gained R 2 of 0.77 (pF 2.5) and 0.72 (pF 4.2), Haghverdi et al. (2012) reached R 2 ranging from 0.81 to 0.95.
Applying existing PTFs on the Laipuna dataset confirmed the good performance of the differential evolution BRT models.
PTFs were selected based on four criteria: (1) developed for tropical soils, (2) similar predictor variables, (3) regression equation provided and (4) included in the peer-reviewed Clarivate Analytics' Web of Science database. Following the 255 approach of Shang (2013), soil texture was previously converted to the USDA classification system by spline interpolation.
Mean RMSE E values of the differential evolution tuned BRT models were between 1.5 times (pF 2.5, Hartemink (2011) andTomasella et al. (2000)) and 8.8 times (pF 1.5, Barros et al. (2013) better ( Table 2). Because of differences in the predictors, it is difficult to find organic PTFs applicable to the Quinuas dataset. An exhaustive literature search revealed only the PTF of Boelter (1969), who related water retention at pF 0 to BD for temperate peat soils in 260 northern Minnesota. Water retention values predicted by the existing PTF differed a lot from measured values. For BD higher than 370 kg m -3 predictions became even negative. While applying differential evolution BRT models resulted in a mean RMSE E of 0.032 (unscaled), applying the Minnesota PTF resulted only in an RMSE of 1.860. The high RMSE value is assumed to be caused by large differences between the Minnesota and Quinuas soils and underlines the necessity of developing water retention PTFs specifically for tropical organic soils. 265 In general, we expect BRT model performance to improve further by removing extreme values in the model input or by using larger datasets. Especially for Quinuas, low water retention values are underrepresented in the dataset. According to Guio Blanco et al. (2018), these values are primarily observed in the lower part of the river valley and include measurements from mineral soils.

Model parameters
The final tuning parameter values obtained by grid search and the differential evolution algorithm are summarised in Fig. 10 (Quinuas) and 11 (Laipuna). The displayed boxplots show the selected 25 tuning parameter values corresponding to the 290 models built by five-fold CV with five repetitions. Grid search compared 625 previously defined parameter vectors, while the differential evolution algorithm tested at least 1000 parameter vectors in the defined ranges of the four-dimensional parameter space. Differences between the two parameter tuning techniques are most distinct for n.trees and shrinkage for both research areas. Neglecting outliers, values obtained by the differential evolution algorithm cover a wider range than those resulting from grid search: while n.trees was in most cases set to the lowest tested value (100) by grid search, the 295 differential evolution algorithm resulted in mean n.trees values ranging from 310 ± 321 (pF 0) to 810 ± 1132 (pF 1.5) for Quinuas and from 727 ± 851 (pF 0) to 1688 ± 1345 (pF 2.5) for Laipuna. Thereby, the mean n.trees values obtained by differential evolution parameter tuning are more than five (Quinuas) and more than ten times (Laipuna) higher than the mean grid search values. Neglecting extreme values, the shrinkage values resulting from the differential evolution algorithm also cover a wider range than the values obtained by the grid search tuning technique. For both areas, the shrinkage values were 300 usually set to 0.001 or 0.01 by grid search, while applying the differential evolution algorithm resulted in mean shrinkage values ranging from 0.040 ± 0.028 (pF 0.5) to 0.047 ± 0.030 (pF 2.5) for Quinuas and from 0.034 ± 0.03 (pF 0) to 0.062 ± 0.027 (pF 2.5) for Laipuna. On average, the differential evolution shrinkage values are approximately 14 (Quinuas) and 17 (Laipuna) times higher than those obtained by grid search. The observed pattern is more complex for the other two tuning parameters: interaction depth and bag fraction. Although the selected parameter ranges differ for most pF values, the median 305 interaction depth values are the same for half of the cases for grid search and tuning by the differential evolution algorithm.
The median of the selected bag fraction is at the upper limit for the Quinuas models that were tuned by the differential evolution algorithm, while grid search resulted in median bag fraction values at the lower limit in two cases. Laipuna bag fraction values do not show this pronounced difference between grid search and tuning by the differential evolution algorithm. 310 The selected tuning parameter values correspond to the differential evolution based models having more predictive power than those adapted by the common grid search approach. Usually higher n.trees values, as received from the differential evolution algorithm, are known to improve model performance (Elith et al., 2008). However, according to the results of Elith et al. (2008), by using more trees the shrinkage parameter gets smaller. The comparatively high differential evolution shrinkage values might be an indication of the n.trees values still being too small. For both areas, the differential evolution 315 values for n.trees and shrinkage, covering a wider range than the grid search results, might be explained by the differential evolution optimization not being completed or being stuck in a local optimum. This corresponds to the high prediction variability of the final differential evolution models (Section 3.2). As n.trees and shrinkage control how precisely the model learns the input data's structure, these parameters are assumed to be more important than interaction depth and bag fraction.
In this case, there might not even be an optimum for the latter two parameters. Especially for Laipuna, this might explain the 320 interaction depth and bag fraction values of both tuning techniques covering the whole range of possible values. For Laipuna the bag fraction differences between differential evolution and grid search tuning remain unexplained. The results of both tuning techniques might be improved by testing more parameter values. For grid search, this can be realized by increasing the number of values to be compared for each tuning parameter. Increasing the number of iterations and starting with larger and thereby more heterogeneous initial populations is expected to do the same for differential evolution. This might also 325 result in less variable differential evolution results. Besides increasing the number of iterations and the number of initial vectors, the risk of getting stuck in a local optimum can also be reduced by changing the parameters "crossover probability" and the "mutation scaling factor" as well as applying another mutation strategy (Das and Suganthan, 2011). To overcome the problem of choosing the right control parameters as well as the mutation strategy, self-adaptive differential evolution algorithms (e.g. Nahvi et al., 2016;Pierezan et al., 2017;Qin et al., 2009), which are able to automatically adjust their 330 settings during the optimisation process, could be applied in future studies. In general, the superiority of differential evolution needs to be verified by applying it to further machine learning algorithms and applications.

Conclusions
We successfully developed new PTFs for two tropical mountain regions. The comparison with ready available PTFs showed their high performance to predict soil water retention for the soils in these areas. This is of particular importance for soil process and hydrological modeling. Whether the two PTFs may also be applied to other areas of similar soils still has to be 345 tested. The developed PTF for the Páramo area provides a novelty since PTFs for tropical organic soils under volcanic influence were unavailable until now. Furthermore, our study presents the first successful application of parameter tuning by differential evolution in PTF development. The comparison with the standard grid search technique revealed the superiority of the differential evolution algorithm and emphasizes the importance of parameter tuning for the successful application of machine learning models. Of course, this finding has to be confirmed by further applications in pedometrics including different types of machine learning algorithms. We, therefore, hope to promote the implementation of optimization for parameter tuning within the pedometrics community by this study.
To sum up, especially for parameter tuning of real-valued machine learning algorithms future applications of the differential evolution algorithm are highly recommended.

Competing interests 355
The authors declare that they have no conflict of interest.