Quantifying soil and critical zone variability in a forested catchment through digital soil mapping

Quantifying catchment-scale soil property variation yields insights into critical zone evolution and function. The objective of this study was to quantify and predict the spatial distribution of soil properties within a high-elevation forested catchment in southern Arizona, USA, using a combined set of digital soil mapping (DSM) and sampling design techniques to quantify catchment-scale soil spatial variability that would inform interpretation of soil-forming processes. The study focused on a 6 ha catchment on granitic parent materials under mixed-conifer forest, with a mean elevation of 2400 m a.s.l, mean annual temperature of 10 C, and mean annual precipitation of ∼ 85 cm yr. The sample design was developed using a unique combination of iterative principal component analysis (iPCA) of environmental covariates derived from remotely sensed imagery and topography, and a conditioned Latin hypercube sampling (cLHS) scheme. Samples were collected by genetic horizon from 24 soil profiles excavated to the depth of refusal and characterized for soil mineral assemblage, geochemical composition, and general soil physical and chemical properties. Soil properties were extrapolated across the entire catchment using a combination of least-squares linear regression between soil properties and selected environmental covariates, and spatial interpolation or regression residual using inverse distance weighting (IDW). Model results indicated that convergent portions of the landscape contained deeper soils, higher clay and carbon content, and greater Na mass loss relative to adjacent slopes and divergent ridgelines. The results of this study indicated that (i) the coupled application of iPCA and cLHS produced a sampling scheme that captured the greater part of catchment-scale soil variability; (ii) application of relatively simple regression models and IDW interpolation of residuals described well the variance in measured soil properties and predicted spatial correlation of soil properties to landscape structure; and (iii) at this scale of observation, 6 ha catchment, topographic covariates explained more variation in soil properties than vegetation covariates. The DSM techniques applied here provide a framework for interpreting catchment-scale variation in critical zone process and evolution. Future work will focus on coupling results from this coupled empirical–statistical approach to output from mechanistic, process-based numerical models of critical zone process and evolution.


Introduction
The spatial complexity of soils presents a significant hurdle to predicting and modeling critical zone (CZ) processes and characteristics, where the CZ is defined as the earth surface system that extends from the top of the canopy down to groundwater (NRC, 2001).Developing robust data-driven methods that provide accurate, reliable, and high-resolution characterization of soil properties is a major challenge to earth scientists and is needed for better understanding and quantification of CZ process and function such as soil erosion, hydrologic cycling, and carbon cycling (NRC, 2010).Here we address this challenge in a forested catchment using a combination of digital soil mapping (DSM) and statistical approaches to quantify soil physical and chemical properties Published by Copernicus Publications on behalf of the European Geosciences Union.
at high spatial resolution (∼ 2 m pixels).The analyses described herein provide one means to unravel the catchmentscale soil complexity that is central to CZ function and evolution.
The coupled use of DSM and statistically based sampling designs has greatly improved the quality and resolution of predicted soil variability (McBratney et al., 2000(McBratney et al., , 2003;;Park and Vlek, 2002;Kempen et al., 2012;Goovaerts, 2000).These approaches build from traditional methods of conceptualizing soil variability (Dokuchaev, 1967;Jenny, 1941) and soil survey (Soil Survey Staff, 1999) using ancillary data, such as remotely sensed imagery and digital elevation data, as spatially extensive measures and proxies of soil-forming factors that may be used to develop predictive models of soil properties and spatial variability (Buchanan et al., 2012;Scull et al., 2005).A core principle of DSM may be derived from the classic statement of soil-forming factors (Jenny, 1941) restated as SCORPAN (McBratney et al., 2003): S = f (c, o, r, p, a, n), where a soil property or soil type (S) is a function of the external factors of climate (c), organisms (o), relief (r), parent material (p), age (a), and its location in space (n).Recent advances in environmental sensing, ancillary data production, and modeling facilitate quantifying these external factors, or what have been termed "environmental covariates", as digital spatial data sets.For example, remotely sensed data provide proxies for organisms and density of vegetative cover in addition to mineralogy of soils and parent materials (Sullivan et al., 2005;Saadat et al., 2008), whereas digital elevation data provide proxies for relief and local-scale variation in climate, namely solar radiation and surface water redistribution (Ziadat, 2005;Moore et al., 1993).
These quantitative environmental covariates may then be used to predict soil property spatial distribution using spatial soil prediction models.For example, remotely sensed reflectance has been quantitatively related to soil properties such as particle size (Dematte and Nanni, 2003;Salisbury and Daria, 1992;Ben-Dor et al., 2002), mineralogy (Ben-Dor et al., 2003;Dematte et al., 2004;Galvao et al., 2008), soluble salts (Howari et al., 2002), organic matter, and extractable iron (Ben-Dor, 2002) in both laboratory and field settings.Similarly, digital terrain models and terrain attributes, such as slope, aspect, surface curvature and roughness, and wetness indices, have been used to predict surface redistribution of water and sediment as well as variation in solar energy inputs to the soil surface (Moore et al., 1991;Irvin et al., 1997;Florinsky, 1998).These models include a wide range of methods, including but not limited to regression analyses, principal component analyses, supervised and unsupervised classification techniques, and geostatistical methods (Eldeiry and Garcia, 2010;Hengl et al., 2007a;McKenzie and Ryan, 1999).
Developing sampling schemes from a set of geospatial data that captures the greater part of landscape variance can improve the efficiency and effectiveness of the sampling and modeling process (Brus and Heuvelink, 2007;Vasat et al., 2010).Here we apply a data-driven approach that combines an iterative principal component analysis (iPCA) data reduction with a conditioned Latin hypercube sampling (cLHS) sampling scheme to develop a sample and model framework for soil property prediction.The iPCA approach selects the covariate data that account for the greatest range of landscape variability, whereas the cLHS approach is used to design a sample set that effectively captures the geographic space and the feature space of covariate layers (Mulder et al., 2013;Hansen et al., 2012;Van Camp and Walraevens, 2009;Minasny and McBratney, 2006).Levi and Rasmussen (2014) applied similar iPCA-cLHS techniques to characterize soillandscape variability and predict soil physical properties for a 6250 ha area in southern Arizona.The iPCA reduced an initial set of 13 environmental covariates to 4 that accounted 95 % of the variance in covariate data.Soils were then sampled in real geographic space based on the cLHS approach.Field data were coupled with covariate data using a regression kriging approach to predict soil physical properties which exhibited good agreement with vector-based soil survey maps and captured internal map unit spatial variability.Regression kriging includes using regression models to extract information from sampled locations using covariate layers and then interpolating model residuals with ordinary kriging (Odeh et al., 1994) or other geostatistical approaches.
The objective of this study was to apply a coupled iPCA-cLHS routine to build robust statistical models for predicting high-resolution soil physical and chemical properties as measures of critical zone structure and function for a forested catchment in southern Arizona, USA.Similar methods have been successfully applied over larger areas with diverse soils, parent materials, relief, and vegetation (Levi and Rasmussen, 2014); however, these techniques have not yet been tested on smaller areas, such as a single first-order catchment with limited variability in soil-forming factors.We hypothesized that (i) the coupled method of iPCA and cLHS modeling would efficiently quantify landscape variability in a small-forested catchment and (ii) that it would facilitate the use of relatively simple statistical methods to predict soil chemical and physical properties throughout the catchment.

Study site
This study was based in Marshall Gulch (MG), a forested catchment located in the Santa Catalina Mountains, just outside of Tucson, AZ (111 • 16 41 W and 33 • 25 46 N) (Fig. 1).This study focused on one small ∼ 6 ha north-facing catchment with an elevation range of 2300-2500 m a.s.l.The catchment was occupied by mixed conifer vegetation, primarily Pinus ponderosa, with lesser amounts of Pseudotsuga menziesii and Abies concolor (Whittaker and Niering, 1968).The climate of MG includes an ustic soil moisture regime (Soil Survey Staff, 1999) that receives an average of ∼ 85-90 cm of annual precipitation split evenly between a combination of cold rain and snow during winter months and convective summer thunderstorm rainfall.Mean annual temperature at MG is 10 • C, with a mesic soil temperature regime (Soil Survey Staff, 1999).The catchment is bounded by ridgelines to the east and west, with very steep slopes, exceeding 45 % slope gradient in some areas, and several small ephemeral drainages that only flow following extreme precipitation events or during snowmelt.
The Santa Catalina Mountains encompass a metamorphic core complex system (Arca et al., 2010) that yields a complex array of bedrock materials proximal to the MG field site, including Tertiary-aged granitic rocks and Paleozoicaged meta-sedimentary (Dickinson, 1992).The 1 : 250 000 geologic map of the area indicates the study catchment is situated on the Wilderness granite suite that consists of an Eocene-aged two-mica granite; however, detailed field investigation indicated the roughly 20 % of the catchment was underlain by a combination of a hornblende-rich amphibolite, in addition to areas underlain by quartzite (Figs. 2 and 3).

Environmental covariates
Covariate layers included remotely sensed 1 m pixel resolution four-band aerial imagery from the National Agriculture Imagery Program (NAIP) collected in June 2010, and topographic covariates derived from a 2 m resolution lidarderived bare earth digital elevation model.Derived covariate layers included NAIP band ratios as well as topographically derived annual solar radiation, slope, regolith depth, and wetness.All covariate layers were clipped using 5 m buffer polygon around the selected catchment, resampled to a 2 m pixel resolution to match the lidar data, and projected to a common datum, NAD83 UTM zone 12N.
The System for Automated Geoscientific Analysis (SAGA) version 2.0.4 (Conrad, 2006) was used to calculate annual solar radiation [W m −2 yr −1 ], slope [degrees], and the SAGA wetness index [unitless].Terrain analysis was performed with parallel processing module using a multiple flow direction algorithm (Freeman, 1991) to compute slope, contributing area, and SAGA wetness index (Boehner et al., 2002).Total incoming solar radiation (direct and diffuse) was calculated with the incoming solar radiation module in SAGA for 1 year on a 14-day time step using (Wilson and Gallant, 2000).Additionally, a modeled soil depth layer was included as a topographic variable.Modeled soil depth was generated and validated previously for this catchment using a geomorphic framework implementing a nonlinear depth-and slope-dependent sediment transport model   with an exponential soil production function (Pelletier and Rasmussen, 2009).

Data reduction
A data-driven iterative principal component analysis (iPCA) was used to determine those layers dominating soillandscape variance based on Nauman (2009) and Levi and Rasmussen (2014).Similar methods have been used with multivariate soil prediction mapping (Hengl et al., 2007b;Vasat et al., 2010) to optimize sample locations and to ensure landscape variability is captured in the sampling scheme.
Here we used iPCA coupled with a factor loading analysis to select the final set of covariate layers used for soil prediction models.All covariate layers were standardized using a z score prior to iPCA: where Z ij is the z score of pixel i in layer j , x ij is the untransformed value of pixel i of layer j , µ j is the mean of layer j , and σ j is the standard deviation of layer j prior to iPCA.The standardized data were grouped into topographic and NAIP indices, and each group was handled separately for the initial step of the data reduction.The iPCA eigenmatrix and eigenvalues were used to calculate loading factors (R kp ) of each input band using the degree of correlation: where a kp is the eigenvector for band k and component p, λ p is pth eigenvalue, and Var k is the variance of band k in the covariance matrix (Jensen, 2005).The absolute values of loading factors for each covariate layer were summed and ranked from greatest to lowest, providing a quantitative metric of the total contribution of each covariate layer to the overall variance of the data set.The number of principal components required to reach 95 % cumulative explained variance in the data set determined the number of covariate layers to retain for subsequent iterations.The covariate layers retained were those with the greatest absolute summed loading factors.This was repeated until all principal components were needed to achieve 95 % of cumulative variance.After processing topographic parameters and NAIP reflectance ratios separately, the final layers from each group were merged and reduced in the same manner.The covariate layers determined in the final PCA included NAIP B3 : B2, NAIP NDVI, solar radiation, SAGA wetness index, slope, and modeled regolith depth.This set of covariates was used for field sample design and modeling of soil properties.

Sample design
A conditioned Latin hypercube sampling (cLHS) scheme was used to develop a sampling design to sample locations that were randomly distributed in geographic space throughout the catchment and that captured the distribution of the selected covariate layers (Minasny and McBratney, 2006).We implemented cLHS using the z-scored layers of the six covariate layers determined from the iPCA to ensure landscape variability was accounted for with the chosen sample sites.We ran the cLHS model using 40 000 iterations to randomly determine sample site locations (n) within the study site using a n = 5, n = 10, n = 15, n = 20, and n = 30.It was concluded that n = 20 best captured the distribution of the environmental covariates using the least number of sample locations based on comparison of sample site covariate statistics (mean, range, skewness, etc.) with the original covariate layers.Four supplemental sample points (sample ID 21-24) were also included from Lybrand et al. (2011).The additional samples were incorporated where it was subjectively determined that the cLHS had missed key landscape positions, such as in the middle of the main drainage flowing out of the catchment and divergent summit positions.

Field methods and sample characterization
A rugged Trimble Yuma outdoor tablet and GPS unit was used to locate the sample sites in the field to sub-meter accuracy.At each of the 24 sample sites, soil pits were dug with a spade to the depth of refusal that coincided with the saprolite-saprock interface.Each pit was described and sampled by genetic horizons following standard protocols and soil morphological and physical properties including color, structure, consistence, root abundance, and rock fragment content were described in the field (Schoeneberger et al., 2002) (Fig. 2).In general, five horizons per pedon were described, resulting in a total of 100 collected samples.Following field sampling, it was determined that five locations were underlain by different parent materials.Specifically, pits 1 and 17 were underlain by quartzite; pits 5, 12, and 20 were underlain by a hornblende-rich amphibolite; and all remaining sample locations were underlain by granite (Fig. 3).Soil bulk density measurements were collected at each pedon using a simple core and hammer method (Blake and Hartge, 1986).Assuming parent material homogeneity within the soil profile, representative samples of the parent material was collected at the saprolite-saprock boundary at each pedon for chemical and mineralogical analyses.All soil samples were air-dried and sieved to isolate the < 2 mm fine-earth fraction and all soil characterization conducted on this fraction unless otherwise stated.Soil pH was measured on all samples at weight to volume ratios of 1 : 2 (soil : water) solution, 1 : 2 (soil : 1 M KCl) solution, and 1 : 4 (soil : 0.02 M CaCl 2 ) solution (Soil Survey Staff, 2004).Soil electrical conductivity (µS cm −1 ) was determined for all samples using a 1 : 2 (soil : water) extract (Burt, 2004).
Soil organic matter content was determined by loss on ignition (Konen et al., 2002) for all collected samples.Samples (20 g) were placed in a furnace at 105 • C for 24 h, weighed, and then placed in a muffle furnace at 360 • C for 2 h, with the change in weight as a proxy for organic matter.Total carbon and nitrogen content (wt/wt %) and stable isotope signature (δ 15 N and δ 13 C) were measured for all samples on a continuous-flow gas-ratio mass spectrometer (Finnigan Delta PlusXL) coupled to an elemental analyzer (Costech) at the University of Arizona, Environmental Isotope Laboratory.The samples did not contain carbonates and it was assumed that total carbon was equivalent to total organic carbon.
Particle size was determined by laser diffraction using a Beckman Coulter LS 13 320 laser diffraction particle size analyzer at the University of Arizona, Center for Environmental Physics and Mineralogy.Following pretreatment to remove organics using NaOCl adjusted to pH 9.5, roughly 0.2 and 0.1 g of sample were weighed and put into tubes and then mixed for 24 h with 5 mL of deionized water using a Thermo Scientific Labquake ® shaker/rotator, followed by the addition of 5 mL of 5 % sodium hexametaphosphate solution for an additional 24 h to ensure dispersion of soil particles prior to particle size analysis.

Soil elemental and mineralogical characterization
Elemental concentrations for major and trace elements were determined for all soil and rock samples by X-ray fluorescence (XRF).Soil and rock samples were ground by ball milling ∼ 3.5 g of sample in a plastic scintillation vial containing three tungsten carbide bearings for 10 min and then pressed into pellets at a pressure of 25 t for 120 s bound with a layer of cellulose wax (3642 cellulose binder -SPEX SamplePrep PrepAid ® ) and analyzed using a polarized energy-dispersive X-ray fluorescence spectrometer (EDXRF -SPECTRO XEPOS, Kleve -Germany) at the University of Arizona, Arizona Laboratory for Emerging Contaminants.
For mineralogical analyses, samples were pretreated to remove organic matter (Jackson, 2005) with a 100 mL solution of 6 % NaOCl adjusted to a pH of 9.5, rinsed with deionized water, centrifuged, dried, and mixed.Mineral phases for soil and rock samples were identified by quantitative Xray diffraction.A known amount of internal standard (corundum) was added to each sample to allow for quantitative interpretation of diffraction peaks.Samples were ground using a McCrone micronizing mill (Eberl, 2003).All sample preparation steps were intended to maximize random orientation and to increase the exposed surface area of the included minerals.Samples were run as random powder mounts, measured from 5 to 65 • 2θ, with a step size of 0.02 • 2θ at the University of Arizona, Center for Environmental Physics and Mineralogy, using a PANalytical X'Pert PRO-MPD X-ray diffraction system (PANalytical, Almelo, AA, the Netherlands) generating Cu-Kα radiation at an accelerating potential of 45 kV and current of 40 mA.The resulting diffractograms were analyzed using Rietveld analyses to identify mineral phases and mineral abundance (Moore and Reynolds, 1997).

Elemental mass transfer and elemental mass flux
Soil chemical denudation was determined using the loss of Na relative to the parent material.Sodium serves as a proxy of plagioclase feldspar weathering in granitic terrain as Na is for the most part biologically inert, and plagioclase minerals are often the first minerals to chemically decompose in granite bedrock weathering to regolith (Brantley and White, 2009).We used a dimensionless mass transfer coefficient, τ , as a measure of soil Na loss and chemical denudation relative to the parent material (Chadwick et al., 1990): where C is the concentration of an immobile element i, in this case the element zirconium, and a mobile element j , here Na, in the parent bedrock p and the weathered soil w to calculate relative elemental loss or gain.Values of τ = 0 indicate no change from parent material, whereas τ > 1 is equivalent to elemental gain, and τ < 1 equivalent to elemental loss.
In addition, the volumetric strain, ε, associated with soil formation was calculated as (Brimhall and Dietrich, 1987): where ρ p is the bulk density of the parent bedrock and ρ w is the bulk density of the soil.The total mass flux of Na [m j,flux ; M L −2 ] was calculated for individual soil horizons (Egli and Fitze, 2000;Heckman and Rasmussen, 2011): where z w is horizon thickness and η w is the fractional volumetric rock fragment content.The summation of m j,flux for all horizons in a single pedon yields the total Na elemental flux from the pedon that has occurred during pedogenesis: where n is the total number of elements of interest, in this case simply Na, and k is the total number of horizons in the profile.

Statistical analyses and parent material spatial modeling
Summary statistics of measured soil properties were performed on all collected soil horizons (n = 103) and for pedon summed values reported on a mass per area basis (n = 24) (Table 1).Simple means comparison of soil physicochemical properties among the different parent materials were conducted using unequal variance t tests given the unequal sample number per parent material: quartzite n = 7, amphibolite n = 10, and granite n = 86, where n is the number of horizons per parent material (Table 2).Analysis of parent rock geochemistry and mineral composition indicated substantial variation of several key parameters among the three parent materials, including Ti : Zr, [Ca + Mg + Fe], and weight percent hornblende and quartz (Table 3).To define the spatial extent of the various parent materials, the lowermost C or Cr horizon values for these parameters from each sample pit were interpolated across the catchment using an inverse distance weighting (IDW) routine.Specifically, the IDW power was set equal to 2, with a circular search radius of 250 m, a maximum of 23 neighbors, and 1 search sector.The interpolated values for each parameter were then exported, standardized, and categorized into three classes using a hierarchical clustering routine using Ward's minimum variance method for computing the distance between clusters (Milligan, 1979).The resulting three clusters summarized the variation in C and Cr horizon geochemistry and mineral composition of the three parent materials and provided a spatial estimation of the extent and location of the different parent materials (Fig. 3).

Regression modeling
Soil prediction models were developed using all 24 sample locations.Target variables selected to model were clay [%], KCl pH, τ Na , soil depth [cm], organic carbon content [kg m −2 ], clay content [kg m −2 ], and Na mass flux [kg m −2 ]; note that modeled soil depth was not used as a predictor in the models for measured soil depth.These target variables were chosen to represent a range of important chemical, physical, and biological soil properties.The variables clay %, KCl pH, and τ Na were modeled as depth-weighted averages for each profile where individual horizon values were weighted according to their relative fraction of total soil and saprolite depth, whereas soil depth, clay content, organic carbon content, and sodium mass flux data were calculated as profile sums.Logit transformations were performed on all values prior to regression modeling (Hengl et al., 2004): where the raw values z were standardized to the min and max of the raw data z + .Logit transformations normalize distributions of non-normal data sets and ensure that predicted soil values remained within the range of the target variable values, making prediction more efficient and statistically robust.
The z + values were then standardized from 0 to 1 (Hengl et al., 2004): where z ++ are the logit values used in soil property prediction models.After the prediction models were constructed and interpolated residuals were added to the regression predictions (see below) using logit z ++ values, predicted z ++ values were back-transformed (Hengl et al., 2004): www.soil-journal.net/1/47/2015/SOIL, 1, 47-64, 2015 for displaying results and to ensure that back-transformed values, z bt , represent the scale of the original data, z.
Two sets of multiple linear regressions were developed using either the z-scored environmental covariate layers or the principal components (PCs) resulting from the final iPCA iteration as the independent variables, and the logittransformed soil data as the dependent variables.The total number of data points for developing regression models was n = 24.This represents a relatively limited number of data points for model development.We set a minimum cutoff of 17 degrees of freedom for each model, with the goal of maximizing model degrees of freedom and minimizing the number of independent variables used in each model.We further limited interaction terms among independent variables to a maximum of two-way interactions to reduce model complexity and the risk of over-fitting the data.For regression models developed using the environmental covariate data we calculated the variance inflation factor (VIF) for each model component to determine whether any possible independent variable colinearity exerted undue influence on model fit and parameter estimation.A VIF value of > 5 indicates colinearity (Freund et al., 2003); all measured VIF values were < 3, indicating colinearity did not affect model fit and parameter estimation (Table 6).
Multivariate stepwise linear regression models were constructed using JMP (JMP v11, SAS Institute Inc., Cary, NC, 2012).Regression modeling was performed using a combination of forward and backward stepwise least-squares regression including all independent variables and their interaction terms.Model and independent variable parameter selection was based on a combination of metrics including maximizing the k-fold cross validation R 2 , where k was set to 24 and therefore equivalent to a leave-one-out cross validation (LOOCV).We also calculated the prediction residual sum of squares (PRESS) statistic (Freund et al., 2003): , where y i is the value at the ith location, and ŷi is the value predicted for the ith location when removed from the model, providing a measure of model fit based on LOOCV.From this parameter, we calculated the PRESS root-mean-square error (PRMSE) as which may be compared to model RMSE; large differences between the two indicates the model is sensitive to the presence/absence of specific observations.In addition to calculating regression model R 2 and P value, we also calculated the PRESS R 2 or where SST is total sums of squares.The R 2 Predicted is similar to model R 2 in that it ranges from 0 to 1, with values of 1 indicating that the model explains 100 % of the variance of the predicted data.These values were calculated for all models, and the final model selection for soil property prediction was based on comparison of these values among regression models using the set of z-scored covariate data or the PC data.
During model selection we also calculated Cook's D for each location to determine whether any point was exerting undue influence on model parameters (Freund et al., 2003).A Cook's D value > 1 indicates a specific point exerts undue influence on model fit.Only one soil variable, Na mass flux [kg Na m −2 ], exhibited locations with Cook's D values > 1 for regression models developed using either the environmental covariates or the PCs.Specifically, one location, pit 17, underlain by quartzite, exerted undue influence on model fit.This point was excluded from model development and parameter estimation for the Na mass flux model.
Finally, we calculated the normalized mean square error (NMSE) to compare the relative goodness of model fits among models with different units and to deal with comparing models developed using logit-transformed variables as dependent variables: where n is the number of observations, p i is the predicted value at location i, o i is the observed value at location i, and s 2 is the variance of the observed samples (Park and Vlek, 2002;Hengl et al., 2004).

Residual spatial analysis
Logit-transformed regression model residuals were spatially interpolated using an optimized IDW algorithm, IDW opt , in ArcGIS 10.0 (ESRI, Redlands, CA) following Molotch et al. (2005).Due to the small size of the study area and only having 24 sample points, kriging was not deemed an appropriate technique for interpolating residuals.The IDW opt routine is a conservative interpolation, where predicted values are not over-or underestimated relative to the maximum or minimum values of the data set, generating a smooth surface with peaks and valleys between data points.The IDW opt option allows for the optimization of the power, search radius, and maximum number of neighbors used in the weighted distance calculation using LOOCV with the goal of minimizing the RMSE of the residual prediction.The IDW opt was performed using a search radius ranging from 75 to 250 m and varying number of neighbors and final parameter selection based on the combination that best minimized RMSE of LOOCV and provided an error mean that approached zero.The interpolated values were then added back to the values predicted using the regression model and the resulting sum back-transformed from logit units to the units of the predicted dependent variable for "final" maps of predicted soil properties.

General soil properties
The observed soil profiles exhibited consistent taxonomic and morphologic soil properties, despite differing parent material and a relatively broad range of soil physicochemical properties (Table 1).Taxonomically, all MG soils were classified as Ustorthents or Haplustolls, with umbric or mollic epipedons and no subsurface diagnostic horizons (Holleran, 2013).The average soil profile included a surface Oi horizon consisting of pine litter and partially decomposed organic materials and several A horizons darkened by organic matter accumulation that transition to multiple AC and C horizons grading towards saprolite and saprock (Fig. 2).Soil depth ranged from 40 to 130 cm, with an average of 72 ± 25 cm.Soils averaged 2.1 ± 1.6 % organic C with enrichment of C in surface horizons.Surface A horizons averaged organic C content of 4.3 ± 1.6 %, which decreased roughly linearly with soil depth to organic C values averaging 0.9 ± 0.4 % for C and Cr horizons.Soil carbon stocks averaged 8.0 ± 3.4 kg m −2 , with the maximum and minimum values of 16.8 and 4.2 kg m −2 located in convergent and divergent portions of the landscape, respectively.Soil pH in general was moderately acidic, with KCl pH values averaging 4.9 ± 1.0, ranging from acidic at KCl pH of 3.0 to a nearneutral pH of 7.0 (Table 1).Higher pH values were generally associated with the organic-matter-rich surface horizons and the convergent portions of the landscape, with the most acidic values in saprolite and saprock horizons.The majority of MG soils were texturally classified as sandy loams, with some loams and sandy clay loams.Clay and sand weight percentages averaged 11 ± 4 and 58 ± 8 %, respectively.Despite substantial variation in parent material composition, none of these soil physicochemical properties exhibited significant variation among the three parent materials (Table 2; all P values > 0.05).Rock fragment abundance increased with depth, from a range of 2-10 % in surface A horizons to > 50 % in the deepest C and Cr horizons.Rock fragment content was significantly greater in the quartzite soil horizons (P < 0.001, Table 2).

Elemental composition and mineral assemblage
Elemental and mineral composition of parent material samples from each profile indicated the catchment was dominantly underlain by granitic materials with small intrusions of quartzite and a mafic, hornblende-enriched amphibolite (Fig. 3).There was large variation in quartz content, ranging from ∼ 2 % by weight in the amphibolite to nearly 80 % in the quartzite (Table 3).The amphibolite was dominantly hornblende, averaging > 70 % hornblende by weight, with ferropargasite as the other dominant amphibole.The mineralogical differences were also expressed in variation in rock geochemical composition, with much greater values for [Ca + Mg + Fe] and Ti : Zr in the amphibolite (Table 3).
The elemental and mineral composition of the MG soils was relatively heterogeneous, following the variation in parent material (Table 2).The 21 profiles on granitic and quartzitic bedrock were relatively uniform in chemical composition, with a limited range of values for each element.In contrast, profiles located on the amphibolite exhibited significant differences in elemental and mineral composition relative to the other soils.Specifically, the granite and quartzite soils exhibited Si concentrations that averaged 32 ± 1.3 g Si kg −1 , relative to the mafic soils that averaged 24 ± 2.9 g Si kg −1 (P < 0.0001).The mafic soils were enriched in Ca, Mg, and Fe, with [Ca + Mg + Fe] values that averaged 17 ± 4 %, compared to the quartzite and granite soils that averaged [Ca + Mg + Fe] of 4.2 ± 0.9 and 2.9 ± 2.8 %, respectively (P < 0.001).Additionally, the soils on amphibolite exhibited Ti : Zr of 159 ± 101 relative to granite and quartzite soils where values were < 14.Quantitative XRD analyses indicated hornblende content of 31 ± 16 % in the amphibolite soils, relative to the granite and quartzite soils that averaged < 1.0 % (P < 0.001).The quartzite soils exhibited a greater concentration of quartz, with an average of 48 ± 13 % compared to the granite soils that averaged 36 ± 5.7 %, and the mafic soils with average quartz concentration of 18 ± 7 % (P < 0.001).
Sodium mass transfer, τ , and total Na mass flux values, m flux,total , indicated Na loss in all soil profiles (Table 1), with the exception of the two profiles on quartzite, pit 17 and pit 1.All the granite and amphibolite soils exhibited similar Na depletion profiles, with τ Na values averages of −0.19 ± 0.07 and −0.19 ± 0.12, respectively (Table 2).This equates to an average of ∼ 20 % loss of Na due to chemical weathering.In contrast, the quartzite profiles averaged > 80 % enrichment in Na.Analysis of parent rock and soil geochemistry in the quartzite profiles indicated that much of the soil material in the quartzite profiles exhibited a geochemical signature more similar to that of soil materials derived from granite rather than the underlying quartzite parent material (Fig. 4); thus Na mass transfer and mass flux calculations for these locations were strongly influenced by the discrepancy in soil and parent material geochemistry used in Eq. ( 3).The quartzite parent material contained little Na, on the order of 0.4 %,   compared to an average of 1.4 % in the soils putatively derived from quartzite.Greater Na concentration of Na in the quartzite profile and the variation in Ti : Zr indicates the addition of granitic and/or mafic materials via colluvial transport from neighboring soils.This supposition is supported in that the relative Na concentration in the quartzite soils was equivalent to the Na concentrations noted in the surrounding granite and mafic soils.

Principal component analysis and regression modeling
The iPCA and factor loading analyses indicated that six environmental covariates accounted for 95 % of the variance in the landscape (Table 4).The selected environmental covariates included solar radiation (Sol_Rad), SAGA wetness index (Wet_Ind), slope, NDVI, NAIP B3 : B2, and numerically modeled soil depth (Mod_Depth) from Pelletier and Rasmussen (2009), with the largest amount of variance attributed to slope, solar radiation, SAGA wetness index, and B3 : B2 (Table 4).
The final set of PCs from the final iPCA were strongly correlated with distinct sets of environmental covariates (Table 5).Specifically, PC1 and PC2, which accounted for 56 % of cumulative variance (Table 4), were strongly correlated to topographic variables where PC1 was most strongly related to modeled soil depth, slope, and wetness index, and PC2 was strongly correlated with wetness index, modeled soil depth, and solar radiation.In contrast, PC3 and PC4 were dominantly weighted towards NDVI and B3 : B2, respectively.PC5 exhibited significant correlation with all of the environmental covariates, whereas PC6 was only significantly correlated to wetness index and slope.Environmental covariates exhibited relatively few significant correlations with each other, with the strongest correlation of r = 0.73 between modeled soil depth and wetness index.
The final set of soil prediction models for the soil variables of KCl pH, clay [%], τ Na , soil depth [cm], carbon and clay stocks [kg m −2 ], and Na mass flux [kg m −2 ] varied in terms of prediction parameters, the relative strength of model fit, and model predictive capability (Table 6).The variables of KCl pH, τ Na , and Na mass flux exhibited the best prediction models using PCs.Specifically, prediction parameters for KCl pH included PC2 and PC4 that were strongly correlated to topographic parameters and vegetation indices (Table 5).Similarly, prediction parameters for Na mass flux included topography-related components PC2 and PC6, and the vegetation-related component PC3.The prediction model for τ Na was comprised of topography-related components of PC1, PC2, and PC6.In contrast, the prediction models for clay %, soil depth, and carbon and clay stocks all exhibited the strongest fits with environmental covariate layers related to topography including wetness index, solar radiation, and modeled soil depth (note that modeled soil depth was excluded from models developed for measured soil depth) (Table 5).
All of the prediction models, except for clay %, included one two-way interaction term and all models had 19-20 degrees of freedom for the model error term.The exception was the model for Na mass flux that only had 17 error term degrees of freedom.As indicated previously, the Na mass flux models were developed excluding pit 17 as Cook's D values indicated this location exerted a large effect on model parameter estimation and model fit.All variance inflation factors were < 5, indicating that colinearity among independent variables was not an issue, even for those models that included both modeled soil depth and wetness index that were significantly correlated.
The best regression models based on model R 2 , NMSE, and R 2 p were the models for clay and carbon stocks, KCl pH, and τ Na (Table 6).These models exhibited the lowest relative NMSE values, ranging from 0.32 to 0.44, and explained between 56 and 68 % of variance in the dependent variable values, and between 42 and 52 % of variance in predicted values based on LOOCV.In comparison, while the models developed for clay %, soil depth, and Na mass flux exhibited a reasonable fit to all dependent variables used in the model, explaining 41 to 46 % of variable variation, they explained a relatively small fraction, only 16 to 21 %, of the variance for predicted variables using LOOCV.The worst model fit was for Na mass flux, with the greatest NMSE of 0.87 and model P value of only 0.028.All models exhibited near-equivalent values for RMSE and PRMSE.

Residual interpolation
Logit-transformed regression model residuals were interpolated using IDW opt .The IDW opt yielded variable power values, with most values equal to 1 and only IDW models for τ Na and carbon stock with power values roughly equivalent to 2 (Table 7).The LOOCV mean errors approached zero for all variables.Further, we observed no consistent correlation of model residuals to landscape position (Fig. 5).The IDW opt results indicated the coupled iPCA and cLHS generated randomly distributed model errors.Interpolated model residuals were then added back to values predicted using the regression models.

Predicted soil properties
The final sets of values from the regression modeling and residual interpolation were back-transformed to the relative scale of each soil variable and expressed as maps of soil physicochemical properties (Fig. 6).The final maps indicated clear spatial variation in soil properties that were largely related to topographic variation, with a general trend of greater depth, clay and carbon content, and Na loss in the convergent drainages.Final predicted value summary statistics were comparable to those of the locations used to develop the prediction models (Table 8).Measured and predicted values generally exhibited similar median and quartile values and summary statistics.The largest deviations between measured and predicted values were observed for τ Na and Na mass flux.This deviation was driven by the area of the catchment underlain by quartzite parent material (Fig. 3) that exhibited positive values for τ Na and Na mass flux (Fig. 5) due to the variation in chemistry between the quartzite rock and the overlying soil material (Fig. 4).

Factors controlling soil property spatial variation
The observed and modeled data indicated that water and energy, represented by the topographically derived variables of wetness index and solar radiation, were central factors for predicting soil properties across the catchment.Solar radiation and wetness index were included directly as environmental covariates or principal components in all of the regression models developed for soil property prediction (Table 6).The significance of solar radiation and wetness index is also apparent in the final PCA, where the dominant eigenvectors of PC1 and PC2, which accounted for a combined 56 % of landscape variability, were strongly correlated to solar radiation and wetness index (Table 4).These findings agree with other studies at this scale of observation that note significant correlations of soil properties to solar radiation, wetness index, and slope (Moore et al., 1993;Gessler et al., 2000;Ashtekar and Owens, 2013).Solar radiation is a primary control on soil water availability, serving as a metric of radiative energy transfer to the soil system that drives soil heating, evaporative transpiration, photosynthesis, and soil water availability (Beaudette and O'Geen, 2009).The degree of soil wetness drives primary production, weathering rates, and biological activity, and has previously been reported as an important topographic index for predicting soil properties at scales ranging from 10 0 to 10 3 ha (Bodaghabadi et al., 2011;Levi and Rasmussen, 2014;Browning and Duniway, 2011).
Remotely sensed NDVI and NAIP bands B3 : B2 characterizing vegetation variability were only included in the models for KCl pH and Na mass flux, and in those cases only indirectly through their contribution to PC3 and PC4.In the final output from the iPCA, vegetation layers only accounted for ∼ 14 % of landscape variance (Table 4).The lack of stronger relationships of soil properties to vegetation was most likely a function of the relative homogeneity of the vegetation assemblage in the observed catchment and the relatively small catchment area, 10 0 ha, encompassed by this study.These results contrast with those of previous DSM studies at larger spatial resolutions spanning areas on the order of 10 4 to 10 6 ha that identified strong relationships between soil variation and remotely sensed vegetation indices (Ballabio et al., 2012;Dobos, 2003).

Soil property variation with topography
The predicted spatial patterns of soil properties indicated strong correspondence to local topography and landscape structure (Fig. 5); however, this was expected given the   amount of landscape variance accounted for by topographic variables and the inclusion of these parameters in all of the derived soil prediction models (Table 5).At the scale of the catchment studied here, variation in relief/topography was the primary variable accounting for soil variability, with minimal variation in other soil-forming parameters, including climate, organisms, landscape age, and parent material.The observed significant variation in parent material appeared to exert minimal control on spatial soil property variation and model residuals, further highlighting the dominant role of topographic control on soil variation at this scale.In particular, the geochemical signature of the soils located on the quartzite parent material indicated mixing with adjacent granitic and mafic parent materials, likely via topographically driven processes of colluviation and bioturbation.All modeled soil properties exhibited strong trends related to the convergent portions of the landscape.Specifically, in the convergent areas we observed deeper soils, greater concentration of clay and clay stocks, increased Na depletion and Na mass flux, and greater soil organic C stocks (Fig. 5); much of this variation may be attributed to topographic controls on water and sediment redistribution.
In terms of water redistribution, the convergent areas of the landscape collect both surface and subsurface flow from adjacent water-shedding landscape positions (Beven and Kirkby, 1979).Wetness index values for the observed catchment ranged from ∼ 0.9 on summit and ridge landscape positions to over 14 in convergent positions.These data suggest that the convergent positions may receive an amount of water up to an order of magnitude greater than other portions of the landscape.Increased hydrologic inputs facilitate greater physical and chemical weathering owing to a combination of (i) greater water available to serve as both reactant in weathering reactions as well as a transport agent to remove soluble weathering products from the reaction zone (White and Brantley, 1995) and (ii) through the input of surface and subsurface waters enriched in dissolved organic matter, including organic acids and ligands, sourced from adjacent landscape positions (e.g., Vazquez-Ortega et al., 2014).We observed maximum Na depletion and Na mass flux, on the order of 30 % Na and −900 kg Na m −2 , in the convergent areas relative to depletion and mass flux values of ∼ 5-10 % and −150 to −250 kg Na m −2 in ridge and summit positions (Fig. 5).Greater Na loss may be attributed to the greater potential for chemical weathering, and likely higher weathering rates of plagioclase feldspar and amphibole in these portions of the catchment.Additionally, an unknown portion of the greater Na depletion and mass loss in the convergent areas may be attributed to the accumulation of previously weathered materials transported to the convergent areas from adjacent portions of the landscape via colluviation and other mass transport processes (Yoo et al., 2007;Yoo and Mudd, 2008).
www.soil-journal.net/1/47/2015/SOIL, 1, 47-64, 2015  Convergent portions of the landscape typically exhibit deeper soils than adjacent ridges as a result of the localized accumulation of soil and sediment produced in upslope positions (Dietrich et al., 2003), consistent with the patterns observed here (Fig. 5).Based on observed soil morphology in convergent areas, these soils represent cumulic profiles with a series of deep A and AC horizons that exhibit some particle size stratification, suggesting individual packets of sediment accumulated from episodic physical deposition events (Dickinson, 1992).Soils located on summit/ridgetop positions were generally shallow and rocky, with morphology indicative of in situ soil production, whereas hillslope soils contained colluvial surface horizons darkened by organic matter over soil derived from bedrock weathering in place (Fig. 2).Soil production rates, i.e., the physical conversion of bedrock to potentially mobile soil material, typically present an inverse relation with soil depth in that rates decrease exponentially with increasing soil depth (Heimsath et al., 1997).For the catchment observed here, this translates into localized variation in soil production, with the lowest potential rates in convergent areas where soil production rates would be limited due to the deep accumulation of sediment from adjacent landscape positions.
The convergent areas also contained a greater magnitude of clay and carbon stocks (Fig. 5e, f), driven in part by the greater soil depth observed in these areas.Clay percentages were also substantially greater in convergent areas, with up to 14 % clay by weight on average, relative to summit and ridgetop positions that contained only ∼ 5-6 % clay on average (Fig. 5a).Thus greater clay stocks in the convergent areas were not only a function of deeper soils but also increased concentration of fine-grained materials.The enrichment of clay in these areas is likely due to a combination of sediment transport, particularly surface wash processes that preferentially transport fine-grained materials (Pelletier, 2008), and enhanced weathering and in situ transformation of primary minerals to secondary phases (Maher, 2010).
The largest carbon stocks, on the order of 17 kg m −2 , were observed in the convergent landscape positions (Fig. 5f).Topographic variation in C stocks may be related to localized variation in primary production and C inputs, greater frequency of soil saturation and anaerobic conditions, and accumulation of C-rich surface material from adjacent landscape positions via physical transport (Berhe and Kleber, 2013;Harden et al., 2008).We did not observe a significant difference in canopy cover and vegetation density in the convergent areas relative to other portions of the catchment, so local variation in primary production likely does not account for increased C stocks.The convergent areas do tend to have longer and more frequent occurrence of saturated conditions (Heidbuchel et al., 2013) such that localized and temporally variably saturation and water table rise may limit aerobic decomposition of C inputs in these areas.However, in contrast to clay percent, the average profile percent C did not exhibit any significant relationship to topography or other environmental covariates that would be expected if localized chemical processes were enhancing C accumulation.Thus, given the cumulic nature of the convergent area soil profiles, and the lack of topographic variation in C percent, enhanced C storage in the convergent areas is most likely the product of physical transport and accumulation of C-rich sediments from upslope positions.

Conclusions
Digital soil mapping methods were applied to model soil property spatial variation in a 6 ha forested catchment in southern Arizona.The process included use of an iterative data reduction technique to isolate those environmental covariates that accounted for the greater part of landscape variance and that were then coupled with a conditioned Latin hypercube sample design focused on capturing the overall numeric and spatial distribution of the selected environmental covariates.The key conclusions drawn from this work include the following: -The coupled iPCA and cLHS routine produced a sampling scheme that captured much of the soil variability in the catchment, demonstrating the utility of this approach for establishing robust sample designs for soil and geomorphic studies.
-Simple linear regression models and IDW interpolation of residuals captured the variance in measured soil properties and predicted spatial correlation of soil properties to landscape structure.
-For the catchment-scale observation studied here, topographic covariates explained more variation in soil www.soil-journal.net/1/47/2015/SOIL, 1, 47-64, 2015 properties than reflectance variables, a function of the relatively small scale of observation and uniform vegetation cover in this forested system.
-The modeled spatial distribution of soil properties provides a framework for interpreting catchment-scale variation in critical zone process and evolution.
Future work will focus on coupling results from this coupled empirical-statistical approach to output from mechanistic, process-based numerical models of critical process and evolution.

Figure 1 . 6 Figure 1 .
Figure 1.Location of study catchment in the (a) Santa Catalina Mountains in southern 3 Arizona, USA, the (b) Marshall Gulch watershed, and (c) sample locations in the study 4 catchment located in Marshall Gulch.5 6

Figure 3 .
Figure 3. Modeled spatial distribution of parent materials in the study catchment and examples of typical rocks sampled from the soil profiles sampled on each parent material. 41

Figure 4 .
Figure 4. Depth profiles of fine earth fraction Ti:Zr in the profiles location on quartzite relative to average Ti:Zr of soils derived from granite in the study catchment.

Figure 4 .
Figure 4. Depth profiles of fine-earth fraction Ti : Zr in the profiles location on quartzite relative to average Ti : Zr of soils derived from granite in the study catchment.

Table 1 .
Summary statistics of soil variables.

Table 2 .
Unequal variance t tests -all horizons with parent material as the main effect.
F ratio and P value are result of unequal variance t tests for each soil variable by parent material.

Table 3 .
Summary of key parent material mineral and geochemical parameters.

Table 4 .
Eigenvectors, variance, and cumulative variance for final PCA.

Table 5 .
Correlation matrix of environmental covariates and principal components.

Table 6 .
Summary statistics of regression models for predicting logit-transformed soil physicochemical properties.
IDW performed with a minimum of 10 neighbors and maximum of 24.Circular search radius.

Table 8 .
Comparison of measured and modeled soil property summary statistics.
n for measured is 24; n for modeled is 20 910.