Continental-scale controls on soil organic carbon across sub-Saharan Africa

Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA Department of Sustainable Agriculture Sciences, Rothamsted Research, Harpenden, UK World Agroforestry Centre (ICRAF), Nairobi, Kenya Department of Live and Environmental Sciences, University of California Merced, Merced, CA, USA 10


Introduction
Soil conservation and sustainable management are crucial to address some of the main challenges humanity is facing, such as 35 climate change, food security, environmental degradation, and loss of soil biodiversity. Assessing the state of soils and their potential responses to climate and land-use change requires carefully designed sampling strategies, combined with systematic analytical and statistical analyses across locations and scale (IPCC 2019). One key component is soil organic carbon (SOC).
Due to its variety of sources, transformations and stabilization mechanisms, SOC is chemically very complex and spatially heterogeneous. This complexity causes significant uncertainties in global climate models (Friedlingstein et al. 2014). It also 40 complicates the extrapolation of SOC to a global scale, using statistical relationships to build robust global SOC products, such as SoilGrids and the Harmonized World Soil Database (Tifafi et al. 2018). To improve our understanding of global C dynamics, it is important to better understand the factors that control SOC stabilization and destabilization in soils from regional to global scales (Blankinship et al. 2018;Heimann and Reichstein 2008).
SOC-stabilizing drivers and processes have been intensively studied over the past several decades. Dokuchaev (1883) and 45 Jenny (1941) shaped the understanding that soil properties are correlated with (independent) variablesthe so-called soilforming factors (eq. 1): (cl, o, r, p, t) (1) where s stands for any type of soil property, such as pH, carbon content, mineralogy, etc., and is determined by the function f' of soil-forming factors: clclimate, oorganisms, rtopography, pparent material, and ttime. This concept is still 50 relevant and forms the basis for many experiments and research attempting to understand SOC storage. However, the importance of the individual factors of equation (1) at different spatiotemporal scales remains unclear (Doetterl et al. 2015;Rasmussen et al. 2018;Wiesmeier et al. 2019). This uncertainty hinders implementation of equation (1) in Earth System models, resulting in a gap between the theoretical understanding of SOM dynamics and our ability to improve terrestrial biogeochemical projections that rely on existing models (Blankinship et al. 2018;Rasmussen et al. 2018;Schmidt et al. 2011). 55 Despite the long history of studying SOC stabilization (Greenland 1965;Oades 1988), there still is increasing demand for data on SOC dynamics at landscape to global scales (Blankinship et al. 2018), especially from sub-tropical and tropical ecosystems.
SOC stabilization is commonly conceptualized as competition between accessibility for microorganisms versus chemical associations with minerals (Oades 1988;Schmidt et al. 2011). These processes are often only considered implicitly by models (Blankinship et al. 2018;Schmidt et al. 2011). Instead, models commonly rely on broader variables such as clay content, which 60 is used as a proxy for sorption and other organo-mineral interactions (Rasmussen et al. 2018;Schmidt et al. 2011). These more generic variables integrate a variety of stabilization processes which can be difficult to disentangle. They can differ in their relative importance and may not adequately capture soil mineralogy and chemistry across different ecosystems and climate zones. Hence, improving the predictive capacity of such models requires not only a better understanding of the factors that control SOC dynamics, but also verification (or falsification) of those new findings in regions that are underrepresented in 65 field studies and models.
For example, Rasmussen et al. (2018) found that exchangeable Ca was correlated with the quantity of SOC in water-limited soils, while Alox was a better predictor of SOC in wet, acidic soils. However, those findings may not be directly transferable to sub-tropical and tropical soils, since they differ greatly in climate, parent material and vegetation (Six et al. 2002b), which usually results in more weathered and older soils compared to those in temperate regions (Feller and Beare 1997). This was 70 illustrated recently in Quesada et al. (2020), where SOC variation in highly weathered forest soils from across the Amazon basin was best explained by clay content, whereas the best explanatory variables for less-weathered soils were Al species, pH and litter quality. Feller and Beare (1997) also found that tropical soils, dominated by low-activity clays (i.e. 1:1 clays), show a strong relationship between SOC and clay + silt content. In addition, Barthès et al. (2008) found that sesquioxides (Al and Fe) play an important role in SOC stabilization for various tropical soils. However, the relationship for high activity clays (i.e. 75 2:1 clays) is less clear and contrasting trends between SOC and clay + silt content have been reported (Feller and Beare 1997;Six et al. 2002a). In terms of SOC distribution across sub-Saharan Africa, Vågen et al. (2016) showed, by using a data set similar to this paper, that SOC content was highest in equatorial and warm temperate climates, where sand content, sum of base concentrations and pH values were low. With regard to land cover, it has been shown for several sites across Africa that forests usually contained the highest amount of SOC, whereas differences between cropland, grassland and shrubland were 80 less distinct (Abegaz et al. 2016;Olorunfemi et al. 2020;Winowiecki et al. 2016a). Cropland cultivation decreased carbon content by 50% compared to forested and semi-natural plots for sites in Tanzania, regardless of sand content and topographic position (Winowiecki et al. 2016b). However, land degradation (i.e. erosion) resulted in SOC concentration decreases in those ecosystems; independent of vegetation cover (Winowiecki et al. 2016a).
To address these diverging explanations of SOC variations on regional scales, we analyzed a comprehensive soil data set 85 collected across the African continent using the Land Degradation Surveillance Framework (Vågen et al. 2010). This data set covers a wide range of climatic and mineralogical conditionsfrom very arid to humid regions, with different pHH2O values, soil texture, weathering status, exchangeable cations and extractable metalsallowing us to test different parameters to explain the variation in SOC content in subtropical and tropical soils across sub-Saharan Africa for two distinctive depth layers (topsoil: 0-20 cm and subsoil: 20-50 cm). Here, we use this continental-scale data set to address the following research questions: 90 1. Which soil properties and climate parameters best explain SOC content variation across sub-Saharan Africa?
We explored the importance of soil texture, exchangeable Ca, oxalate-extractable Al and Fe, soil pHH2O, mean annual temperature, aridity index (PET/MAP), land cover and weathering status to explain variation in SOC content on a continental scale. We expect that oxalate-extractable metals, soil texture and climate will be among the most important predictors of SOC concentration. 95 2. How do geochemical SOC-controlling factors vary between environmentally distinct sub-regions?
Due to the heterogeneity of climate and soil conditions across sub-Saharan Africa, we expect to see different geochemical controls explaining variations in SOC content between regions. For example, we expect exchangeable Ca will be most important in regions that are drier with less weathered and alkaline soils, while oxalate-extractable Al and Fe will mainly be important in humid regions with highly weathered and acidic soils. 100

Study area and data collection
Soil data used in this study were collected during the AfSIS (Africa Soil Information Service) project. In total, 18,257 soil samples were taken from 60 sentinel sites and from two different depths (topsoil: 0-20 cm and subsoil: 20-50 cm). Samples stem from 19 countries across sub-Saharan Africa and were collected between 2009 and 2012, following the well-established 105 Land Degradation Surveillance Framework (Vågen et al. 2010). The sixty sentinel sites (each 100 km²) were stratified across sub-Saharan Africa according to Koeppen-Geiger zones ). Ten 1000 m² plots were randomized within sixteen spatially stratified 1 km² clusters per site (Figure 1). This hierarchical sampling design allows process identification at a continental scale without losing the ability to understand and quantify local heterogeneity (Nave et al. 2021;Vågen et al. 2010). For more details about sampling design and field survey, see Towett et al. (2015), Vågen et al. (2013a), and Winowiecki 110 et al. (2016a).
Our analyses built upon a subset of samples (11% of total, n = 2,002) which were originally selected as reference samples for laboratory measurements. These samples were used to calibrate mid-infrared spectroscopy models (Terhoeven-Urselmans et al. 2010) and to predict properties in the remaining 16,255 soil samples Winowiecki et al. 2017). The calibration subset was chosen to maximize the variation of the spectral data using the Kennard-Stone algorithm (Kennard and 115 Stone 1969). More information about this approach can be found in Terhoeven-Urselmans et al. (2010). This selection strategy results in unequally distributed samples across 51 of the 60 sentinel sites, yet captures the variation of the original data set.

Sample and data processing
Soil material was air-dried and sieved to a particle size <2 mm in the Soil-Plant Spectroscopy Laboratory at the World Agroforestry Centre (ICRAF) in Nairobi, Kenya. All soil properties (except for soil texture, which was measured at ICRAF), were analyzed at Rothamsted Research in Harpenden, U.K. 125 Data for soil organic carbon (SOC; wt-%), pHH20, amorphous oxalate-extractable aluminum (Alox, wt-%) and iron (Feox, wt-%), exchangeable calcium (Caex, cmol + kg -1 ), clay + fine silt content (<8 µm, %), and total element concentrations (in wt-%) of Al, Ca, K, and Na, were selected in order to cover a wide range of soil properties that have been identified to relate to SOC stabilization mechanisms (Oades 1988;Rasmussen et al. 2018), while maximizing the number of samples and minimizing the correlation among variables included in our analysis. 130 SOC was calculated from the difference of total C and inorganic C. The latter was directly measured with a Primacs AIC100 analyzer (Skalar Analytical B.V., Breda, Netherlands) by treating the sample with phosphoric acid and heating it to 135 °C in a closed system. Inorganic C in the sample was converted to CO2 and then measured by Non-Dispersive Infrared Detection (NDIR). Total C was determined with the TruMac Total N and C combustion analyzer (Leco, St. Joseph, Michigan, USA).
Soil pHH20 was performed in a 1:2.5 soil:water suspension. The extraction of Al and Fe with oxalic acid and ammonium oxalate 135 solution was done by shaking the solution for 4 h at 25 °C in the dark. Carbonate-rich samples were pre-treated with ammonium acetate at pH 5.5 to remove any CaCO3. Acid-oxalate extraction in particular dissolves short range-order minerals such as ferrihydrite (Fe), allophane and imogolite (Al), as well as other amorphous and organic Fe and Al minerals (Parfitt and Childs 1988). Hexamine-cobalt trichloride solution was used as extractant to determine Caex. Aqua regia acid digestion was applied for major and trace elements, including Al, Ca, K and Na. Although this method does not give absolute total contents, it does 140 give results sufficiently close to accepted values for different soils (McGrath and Cunliffe 1985). Samples were digested in tubes in time and temperature-controlled heating blocks. All elements were measured with ICP-OES (Optima 7300 DV, Perkin Elmer, Waltham, Massachusetts, USA). Particle size distribution was measured using a Laser Diffraction Particle Size Analyzer (LDPSA) Model LA 950 (Horiba, Kyoto, Japan). Each sample was shaken for 4 min in a 1% sodium hexametaphosphate (calgon) solution with ultrasonic energy before measuring. to disperse aggregates. We used 8 µm as cut-145 off to capture all clay and fine silt particles. Results were comparable to <20 µm (see SI material Figure A1), but <8 µm was selected because it is more relevant to our interest in studying the influence of smaller particles with large surface area on SOC concentration. In addition, particles <8 µm resulted in a reproducible fraction across soil types, unlike using only clay particles <2 µm ( Figure A1). Aluminum, Ca, K and Na concentrations were used to calculate the chemical index of alteration (CIA) after Nesbit and Young (1982), using the following equation: 150 where CaO is the amount incorporated in the silicate fraction. Correction is necessary for samples that contain carbonates and apatite (Nesbit and Young 1982). We adopted an approach introduced by McLennan (1993): The correction assumes that Ca is typically lost more rapidly than Na during weathering. If a soil sample contained inorganic C (Ctotal -Corg; used as a proxy for carbonates and apatite) and the CaO content was greater than that of Na2O in the same sample (n = 476), then the CaO 155 concentration was set to that of Na2O from the same sample (Malick and Ishiga 2016). After applying the correction, no obvious correlation remained between CIA and inorganic C ( Figure A2). The index increases (i.e. more highly-weathered soil) with the loss of Ca 2+ , K + , and Na + .
Samples were removed that contained missing or negative values for one or more of the above-mentioned parameters. In addition, a single sample with extraordinarily high SOC content (>22 wt-%) was excluded. This resulted in a total of 1,601 soil 160 samples (out of the original 2,002 samples) at 45 sentinel sites across 17 countries. Note that due to the sample selection, not all profiles had data from both topsoil and subsoil layers (Table B1).
The remaining soil samples (n = 1,601) were paired (based on longitude and latitude at the profile level) with mean annual temperature (MAT, °C) and mean annual precipitation (MAP, mm) from the WorldClim data set at 30 sec resolution (Fick and Hijmans 2017). Potential annual evapotranspiration (PET, mm) was added from Trabucco and Zomer (2019), who calculated 165 it after the Penman-Monteith method, based on the WorldClim data. Mean annual precipitation and PET were used to calculate an annual aridity index, defined as PET/MAP (Budyko 1974). Values >1 indicate water-limited (dry) regions and ratios <1 point to energy-limited (wet) regions. For the monthly aridity index, we used monthly climate data at the same spatial resolution and from the same data sources.
Land-cover data was used from the collected field data. The land-cover groups were re-classified into four major groups: 170 a) Cropland (including all cultivated plots), b) Forest, c) Grassland and d) Other (including mainly woodland, shrubland, and bushland, but also samples classified as other). Ten missing values were gap-filled from a prototype high resolution Africa land-cover map at 20 m resolution based on one-year of Sentinel-2A observations from December 2015 to December 2016 (http://2016africalandcover20m.esrin.esa.int/).
Due to the lack of precise data products for lithology and soil types in sub-Saharan Africa, we did not include these variables 175 in our analyses. Soils at AfSIS sites ( Figure 1) developed mainly from two parent material types: i) metamorphic and ii) volcanic rocks (Hartmann and Moosdorf 2012;Jones et al. 2013;Schlüter 2008), likely modified throughout the Quaternary.
i) Metamorphic rocks are most commonly found in West Africa, Southern Africa and Madagascar. These regions are characterized by old cratons, except for Madagascar, which is influenced by Mesozoic volcanism (Schlüter 2008). Most of these soils are classified as Ferralsols (WRB soil classification system; Jones et al. 2013). Related AfSIS soils from those 180 regions are usually highly weathered with low pHH2O values. In contrast, soils derived from ii) volcanic rocks are mainly found in the East African Rift System. They are usually younger and less weathered (Buringh 1970). Beyond the influence of volcanic rocks, Ca 2+ rich soils are frequent in East Africa.

Statistical analyses
We used three different statistical approaches, including linear mixed-effects models, regression trees and random forests to 185 determine geochemical and climatic parameters that best explain SOC variation across sub-Saharan Africa. In brief, we used linear mixed-effects models to handle the hierarchal sampling design of the AfSIS data set, whereas regression trees and random forests enabled us to account for non-linearities within the data. More precisely, we used regression trees as a qualitative tool to explore and understand the structure of the data, whereas random forests offered more generalizable models.
All statistical analyses were performed within the R computing environment (Version 4.0.0, R Core Team 2020). The R 190 Markdown file in the SI provides the code to reproduce all our analyses.
Linear mixed-effects modeling was performed by using the nlme R package (Pinheiro et al. 2020) to account for the nested sampling scheme (clusters within sites and two sampling depths within one profile). This allows the intercept of the regression to vary for each site, for each cluster within the same site, and for each sample within the same profile (Harrison et al. 2018).
The variance inflation factor was used to check for multi-collinearity among predictor variables with a threshold of <3.0 (Zuur 195 et al. 2010). To meet linear mixed-effects model assumptions and to standardize variation among variables, all continuous parameters were transformed to a normal distribution using Box-Cox transformation, followed by standardization to a mean of 0 and standard deviation of 1 by using the R package bestNormalize (Peterson and Cavanaugh 2019). The relationship between SOC and the predictors of the original data may not be linear.
To answer our 1 st research question, which soil properties and climate parameters best explain SOC content, we started from 200 a constant null model with siteID/clusterID/plotID as random effects and then extended the model step-wise by fitting the following sequence of fixed effects: MAT, PET/MAP, depth, land cover, clay and fine silt, pHH20, CIA, Mox (Alox + ½Feox), Caex, pHH2O*Mox. The order and selection of fixed effects was pre-defined based on a-priori knowledge out of a larger set of variables (Burnham and Anderson 2002), starting with large-scale climate variables and ending with fine scale physiochemical soil properties. The oxalate-extractable metals Alox and Feox were summed to Mox (Alox + ½Feox) to normalize the atomic mass 205 difference between Al and Fe (Wagai et al. 2020) and to account for their similar behavior over their concentration range ( Figure 5b). The maximum likelihood method and likelihood ratio tests (L.ratio) were applied to evaluate model performance and the statistical significance of the added fixed effects (Table B4- (Table 1). For this model, we did not include plotID as a random effect since each profile only 215 contained one sample in each depth model.
For the 2 nd research question, how geochemical controls on SOC content vary between environmentally distinct sub-regions, we grouped the data based on a) pHH2O, b) wetness, c) weathering, and d) land cover (Table 1). Soil pHH2O and weathering data were grouped with the number of categories chosen to maximize and equalize the number of samples in each category and to correspond with common pHH2O and weathering groups (Nesbit and Young 1982). In order to take seasonality of the 220 sites into account separately, the data were divided into three categories based on the number of wet months (i.e. months with P/PET > 1). Land cover was grouped based on the four pre-defined categories. For each category within each sub-group, we built a linear mixed-effects model as previously described, yet only included the geochemical properties (clay + fine silt, pHH2O, CIA, Mox, Caex, pHH2O*Mox) as fixed effects, since we intended to test if the importance of these predictors changed between environmentally distinct sub-regions (Table 1). When CIA or pHH2O were used to create the categories, they were not 225 included as a fixed effect in the corresponding sub-models. Neutral (6.1-7.5 pHH2O) 398 Alkaline (7.5-9.9 pHH2O) 400 Wetness (Number of wet months (P/PET > 1)) 0 wet months 572 with the same explanatory variables as for the linear mixed-effects models. However, no data transformation was needed due to the non-linearity of the models.
Regression tree analysis was applied to obtain an easily interpretable and non-linear model for the entire data set and for both depth layers (topsoil vs subsoil) that best describes the existing data (Breiman et al. 1984). Since regression trees are known to easily overfit data, we used a grid search to prune the model (Boehmke and Greenwell 2020) according to the minimum 240 number of data points required to attempt a split, and the maximum number of internal nodes between the root node and terminal nodes in order to minimize the cross-validation error (Breiman et al. 1984). The overall performance of the regression tree analysis was tested using five-fold spatial cross-validation (R package: mlr; Bischl et al. 2016). Spatial partitioning was used to split the data into five disjoint subsets, using the coordinates from each sample, and repeating the partitioning 100 times ( Figure A3). This results in a bias-reduced assessment of model performance (Brenning 2012;Lovelace et al. 2019). 245 Absolute values at the bottom of each node indicate the predicted SOC content [wt-%] and the percentage corresponds to the relative number of samples in this node ( Figure A5).
Random forest was used to build more generalized models since it is an ensemble of multiple decorrelated trees. Tuning of the model hyperparameters was done based on spatial tuning (R package: mlr; Bischl et al. 2016;Lovelace et al. 2019). These hyperparameters included the number of predictors used at each split, the minimum number of observations in a terminal node 250 and the fraction of samples used in each tree (Probst et al. 2019). The best hyperparameter combination search was done for the complete data set via a five-fold spatial cross-validation with one repetition. In each of these five spatial partitions, we ran 50 models to find the optimal hyperparameter combination (Lovelace et al. 2019).
Partial dependence plots were used to further explore the relationship between the predicted SOC content and the explanatory variables of the tuned random forest models (R package: pdp; Greenwell 2017). These plots were used to investigate the 255 marginal effect of individual explanatory variables (such as Alox, Caex, etc) on the predicted SOC content (Friedman 2001).
This allowed us to identify thresholds within the data and provided an indication of how important each explanatory variable was to predict SOC concentration across specific value ranges.

Data distribution across sub-Saharan Africa
All soil and climate variables spanned at least one order of magnitude (except MAT and PET), demonstrating the diversity of this continent-wide data set. Based on skewness, kurtosis, histograms, and Shapiro-Wilk-tests (data not shown for the latter two), no variable was normally distributed (Table 2).      Within the pHH2O sub-models, Mox was most important in the strongly acidic model. The opposite was observed for Caex ( Figure 4b), which corresponds to higher concentrations of Caex in neutral and alkaline soils compared with moderately and strongly acidic soils. However, Caex was also found to have a positive relationship with SOC in acidic soils ( Figure 5; Table  305 B2). The direction of the correlation between clay + fine silt and SOC concentration was not consistent across the four pH groups, in contrast to the other fixed effects (Table B2). The alkaline sub-model had the lowest marginal R² of all pHH2O submodels, which suggests that important predictors were missing (Figure 4b). Within the land cover models, the Cropland and grassland models had the highest marginal R² and were both dominated by Mox. The variation explained by Caex was smallest for the forest model, whereas it did not change much for the other three models (Figure 4e).
In summary, in the linear mixed-effects models, Mox was more important in wetter regions, acidic and highly weathered soils, whereas Caex was more important in drier regions, alkaline and less weathered soils. The other fixed effects usually did not 325 explain much of the SOC variation.

Regression tree and random forest
The root mean squared error (RMSE) for the topsoil regression tree was 1.47 wt-% (range: 0.80-3.11 wt-%) and for the subsoil regression tree was 0.67 wt-% (range: 0.44-2.26 wt-%); the relative RMSEs were 0.65% and 0.48%, respectively. In the topsoil regression tree ( Figure A5a) Feox, MAT and PET/MAP were the most important predictors to split and explain variation 330 in SOC concentration. About 23% of the SOC data could be explained by Feox and MAT alone. In general, higher Feox, Alox and Caex values resulted in higher SOC content. This was equally true for the subsoil tree ( Figure A5b). While much of the SOC variation was explained by climate parameters in topsoils, the subsoil regression tree was more dominated by geochemical variables, namely Feox and Alox. About 40% of the subsoil SOC variation could be explained by Feox only. In both trees, clay + fine silt content and land cover poorly predicted SOC. 335 In summary, topsoil and subsoil regression trees contained the same predictors, yet with climate variables playing a larger role in the topsoil regression tree and geochemistry having a larger influence in the subsoil regression tree. Overall, the results showed that the explanatory variables did not differ much between the depth intervals (topsoil vs subsoil), while their magnitude did. Figure 6: Partial dependence plot for each explanatory variable of the random forest models (topsoil and subsoil). X-axes always correspond to the range of the explanatory variable. Arrows indicate splitting points in the regression tree ( Figure A5). Each colored tick mark along the x-axes represents one sample.
The random forest models had a RMSE of 1.31 wt-% and a R² of 0.70 for the topsoil samples, and for the subsoil samples a RMSE of 0.87 wt-% and a R² of 0.72. Based on the partial dependence plots (Figure 6), Alox and Caex were important in 345 predicting SOC over the entire range of each variable (Figure 6a and b). However, in subsoils, the predictive power of Caex was reduced (Figure 6b). We observed a decrease in predicted SOC with increasing soil weathering status (CIA). However, due to the low number of samples with CIA values below 60%, the relationship should be interpreted with caution in this range ( Figure 6c). Clay + fine silt content had almost no effect on SOC, with only a weak positive trend in subsoil samples ( Figure   6d). The relationship between Feox concentration and predicted SOC content varied with Feox concentration. At low 350 concentrations (< 0.25 wt-%), there was a strong positive relationship between predicted SOC content and Feox. For higher concentrations, the predicted SOC content was relatively constant (Figure 6e). MAT correlated negatively over the entire range with predicted SOC concentration (Figure 6f). For PET/MAP, the predicted SOC content declined sharply as PET/MAP increased from 1 to 2 (transition from wet to dry water regimes; Figure 6g). The relationship between pHH2O and predicted SOC content was not strong (Figure 6h). For land cover, there was almost no difference between the classes within the same 355 depth layer; however, topsoils had higher SOC content (2.2 wt-%) compared with the subsoil samples across all land covers (1.5 wt-%; Figure 6i).

Discussion
Climate and geochemical variables are similarly important in explaining SOC variations across sub-Saharan Africa ( Figure 3); in line with findings from a global study (Luo et al. 2021). However, the explanatory power of climate and geochemical 360 variables is not independent of each other, reflecting the overall strong interaction between climate and geochemistry (Doetterl et al. 2015). Since it is likely that, in the long term, climate variables have predominantly indirect effects on SOC dynamics through their influence on soil geochemistry, we focus our discussion on those geochemical variables (Caex, Alox and Feox) that showed the highest explanatory power with respect to SOC content across all models. In addition, we discuss the role of depth, clay + fine silt content, and land cover in explaining SOC variations on a continental scale, since these were identified by other 365 studies to play an important role in SOC dynamics.

Exchangeable Calcium
Strong and positive relationships emerged between Caex and SOC concentration across all models, even though Caex concentration showed strong pHH20 and precipitation dependence ( Figure 5). Typical Ca 2+ sources in soils are from a) weathering of bedrock or surface rock formations, b) decomposition of Ca 2+ -rich organic materials, c) lateral movement of 370 Ca 2+ -rich water, d) atmospheric dust and rain deposition or e) anthropogenic inputs (Likens et al. 1998;Rowley et al. 2018).
Characteristically, Ca 2+ is weathered easily from both primary and secondary minerals (Likens et al. 1998). This usually leads to its accumulation in semi-arid to arid environments that are characterized by low rates of water flow through the soil profile that drives slow weathering rates and high pHH2O values (Figure 4b-d). In such environments, Ca 2+ plays an important role as a cation bridge that facilitates aggregate formation (Rimmer and Greenland 1976;Tisdall and Oades 1982) and bonding of 375 clay minerals to organic matter functional groups because of their divalent charge, relative abundance and modest hydration radius (Likens et al. 1998;Muneer and Oades 1989). However, we found that Caex was not only important in alkaline and lessweathered soils in dry regions, but also in acidic and more-weathered soils under wetter conditions ( Figure 5). It is likely that the main Ca 2+ source in those regions derives from atmospheric deposition (Albani et al. 2015;Goudie and Middleton 2001) and/or biological cycling by plants (Likens et al. 1998). This is supported by the fact that Caex showed a stronger relationship 380 with SOC in topsoil than subsoil layers (Figure 4a and 6b). Since land cover, which is a major driver of C inputs into the soil, did not show a strong relationship with SOC in the models, we speculate that biological cycling of Ca 2+ does not play a major role in explaining the observed differences in SOC content. Yet, further analysis with better proxies for biological Ca 2+ inputs is needed to test this hypothesis. High Ca 2+ concentrations in acidic soils can also be derived from the development of those soils from Ca 2+ -rich parent material which are out-of-equilibrium with modern climate conditions (Slessarev et al. 2016). 385 In conclusion, the important role of Caex in our data set was most pronounced in dry regions, dominated by alkaline and less weathered soils. However, it also played a role in explaining the SOC variation in wetter regions and more acidic soils, which is supporting the overall importance of Caex in stabilizing SOC.

Oxalate extractable Al and Fe
Similar to Caex, short range-order minerals (Mox, Alox and Feox) showed a positive and strong correlation with SOC content 390 across all models. The relationship was strongest in wet regimes, acidic and highly weathered soils (Figure 4b-d and 5b).
Hydrous oxides of Al and Fe are usually highly reactive because of their large specific areas with a high proportion of reactive sites (Parfitt and Childs 1988). This results in the adsorption of organic matter to Fe and Al oxides and the formation of stable soil aggregates (Tisdall and Oades 1982). In humid regions, high rates of mineral weathering may release Fe, Al and Si faster than crystalline minerals can precipitate (Rasmussen et al. 2018). Therefore, Feox and Alox are usually found to be important 395 in SOC stabilization in humid and acidic soils (Eusterhues et al. 2003;Kramer and Chadwick 2018).
In our study, short range-order minerals were also identified to play an important role for SOC stabilization in soils of sub-Saharan Africa. However, even though Alox and Feox showed similar trends in their concentrations (Figure 5b), we observed diverging behavior in their predictive power of SOC in the regression trees ( Figure A5) and the random forests (Figure 6a and   6e). For example, Feox was one of the most important explanatory variables in the regression tree and partial dependence plots, 400 although only within a very narrow range and at low Feox concentrations (Figure 6e), whereas Alox was important over the entire range (Figure 6a). Inagaki et al. (2020) showed that higher amounts of soil organic matter were co-localized with Fe in drier regions compared to sites with higher rainfall, whereas the content of Alox co-localized with organic matter was not affected by precipitation changes. This may be linked to the different oxidation levels of Fe. At higher precipitation levels, Fe oxides can be reduced, resulting in a release of associated SOC to the aqueous phase (Berhe et al. 2012;Chen et al. 2020;405 Thompson et al. 2011). This mechanism is probably responsible for the low correlation between SOC and high Feox concentrations in our data (Figure 6e), pointing to the fact that Feox can act as pedogenic threshold, depending on its oxidation level in the soil system.
In summary, short range order minerals also play an important role in SOC stabilization across sub-Saharan Africa, similar to other regions. However, Alox and Feox do behave differently in explaining SOC content, even though they showed covariance 410 in terms of their concentrations. Since we only have data for acid-oxalate extraction, we cannot speculate further about their diverging behavior in the models.

Depth
For the depth models, predictor differences were small between topsoil (0-20 cm) and subsoil (20-50 cm) samples (Figure 4a and 6). This may reflect the large depth increments for each of the two sampling depths, which may also explain the overall 415 small explanatory power of depth in the linear-mixed effects model (Figure 3a). Since the identified SOC-controlling factors were similar for both depth layers (Figure 4a), differences in SOC content were likely driven by the fact that subsoil samples usually contain less SOC due to lower C inputs at greater depth (Jobbágy and Jackson 2000). Soil erosion at some sites (data not shown) might also dilute differences between the two depth layers, since water and wind can permanently remove surface soil. 420

Clay + fine silt content
Clay + fine silt content (<8 µm) did not emerge as an important predictor of SOC concentration within our different models ( Figure 3, 4 and 5e). This is in contrast to some earlier studies that indicated that total clay content explains a large proportion of SOC storage and stabilization due to the sorption of soil organic matter to surfaces of clay minerals and building of aggregates (Amelung et al. 1998;Kahle et al. 2002). The relationship between SOC and total clay content is used in various 425 models to describe turnover and storage of SOC. However, this simplified correlation may not account for the different stabilization mechanisms related to various clay minerals, e.g. 1:1 vs 2:1 clay minerals (Oades 1988). Past research has yielded contradictory results on whether clay content explains SOC variation in subtropical and tropical soils or not. For example, Bruun et al. (2010) showed for various tropical soils that clay mineralogy, Feox and Alox are better explanatory variables for SOC content than clay content alone (<2 µm). In contrast, Quesada et al. (2020) found a strong relationship between clay and 430 SOC content for highly weathered soils in the Amazon Basin that are dominated by 1:1 clay minerals such as kaolinite, whereas soils in the same system, dominated by 2:1 clay minerals, showed stronger relationship between SOC and Al species. In a comparison between tropical and temperate soils, Six et al. (2002b) found that less C was associated with the clay and silt fraction (<20 µm) in tropical soils than in temperate soils. Even though these studies used various cut-offs to define the clay (<2 µm), clay and fine silt (<8 µm), and clay and silt fraction (<20 µm), they all illustrate that the relationship with SOC can 435 be complex in subtropical and tropical soils.
Due to the broad spatial scale, soils in the AfSIS data set contain different clay minerals (Butler et al. 2020). No clear relationship between clay + fine silt content (<8 µm) and SOC concentration was observed in the models, although the raw data indicate an overall positive trend between clay + fine silt content (<8 µm) and SOC concentration (Figure 2b). This positive relationship does not hold across all sites (Figure 2c and A4). Variable relationships with SOC (Table B2) may explain 440 the low predictive power of clay + fine silt content in this data set. Instead, variables that better capture the different behavior of clay-sized minerals, e.g. Caex, Feox and Alox, are likely more suitable soil parameters to explain the variation of SOC content even in highly weathered soils across sub-Saharan Africa. This is supported by the fact that a clay + fine silt-only model resulted in a very small R² (linear mixed-effects model: 0.01; random forest: 0.12; Table B3).

Land cover 445
The effect of land cover on SOC content was generally small in our models, even in topsoils (Figure 6i). Similar findings were recently encountered in a global study (Luo et al. 2021). One possibility may be that the relatively large 0-20 cm depth interval might dilute differences that could be more marked in the top few centimeters. However, we did observe differences in SOC content across land cover classes, with forests containing the highest amount of SOCespecially in topsoils (Figure 2a).
Croplands had higher SOC content than grasslands, opposite of what is commonly observed in temperate regions (Prout et al. 450 2020).
Another possible explanation for the absence of land cover as an important predictor in our models, is that we lacked the detailed data necessary to disentangle impacts of different practices and land-use history. The land cover class cropland contained a wide variety of cultivated plots while more detailed information about land management practices were missing. This is particularly important since prior research in other regions showed that SOC stock changes in tropical cropland soils 455 may be driven by C inputs (Fujisaki et al. 2018b). Additionally, historical land use may even play a more important role in explaining current stocks compared to recent land use (Vågen et al. 2006). Furthermore, land cover may covary with other parameters (temperature, precipitation, geochemistry) to such a degree that it is not an explanatory variable. This might be the reason why the sub-models grouped by land cover did not show a clear pattern ( Figure 4e). However, the land cover-only models resulted in small R² (linear mixed-effects models: 0.01; random forest: 0.10-460 0.16) which suggests that land cover is a poor predictor for our SOC data at this large spatial scale (Table B3). This may be due to the high variation of SOC content within the different land cover classes (Figure 2a). Land use changes and their impact on soil physico-chemical properties are scale-dependent and likely to be more distinct at smaller scales (Holmes et al. 2004(Holmes et al. , 2005. For example, land management and land degradation (i.e. erosion) are known to impact SOC stocks in regionals scales in sub-Saharan Africa (Winowiecki et al. 2016a). 465 Future studies are needed to better understand the impacts of land management and carbon storage potential in soils across sub-Saharan Africa at different scales (Fujisaki et al. 2018a;Vanlauwe et al. 2015). Overall, our data for sub-Saharan Africa suggests that SOC content on a continental scale is better explained by stabilization potential in soils (climate, geochemistry) than by different aboveground C inputs (vegetation).

Conclusions
We used a continental-scale data set from sub-Saharan Africa to test relationships between SOC content, various soil properties and climate variables in order to address our core research questions:

Which soil properties and climate parameters best explain SOC content variation across sub-Saharan Africa?
Parameters similar to temperate regions are important to explain SOC variation for tropical and subtropical soils under various 475 climate conditions across sub-Saharan Africa; namely Caex, Mox (Alox, and Feox), and PET/MAP. At this large spatial scale, climate and geochemical parameters are equally important and share some of the explained SOC variation. However, land cover and clay + fine silt content did not explain much of the variation in SOC content, in contrast to some findings from other regions and studies.
The selected climatic and geochemical parameters, which can be seen as proxies for most of the soil forming factors, explain 480 about two thirds of SOC variation across sub-Saharan Africa. The remaining third likely reflects those soil forming factors that were not or only poorly represented within our selected variables, namely organisms, relief and time. Given the large spatial scale targeted, it appears unlikely to be able to explain all of the SOC variation measured.

How do geochemical SOC-controlling factors vary between environmentally-distinct sub-regions?
In dry regions with alkaline and less-weathered soils, Caex explained most of the SOC concentration variation, whereas Mox 485 was more important in wetter regions with acidic and highly weathered soils. Still, Caex remained important in acidic and more weathered soils and in wetter regions. Feox as predictor of SOC content was only important at low concentrations in moderately weathered and wet soils. This observed trend leads to the assumption that Feox can play an important role in pedogenic thresholds in various soils across sub-Saharan Africa.
Overall, a combination of PET/MAP, Caex and Mox seems to be an appropriate set of variables to explain SOC-content variation 490 on a continental scale across sub-Saharan Africa. This does not imply that other variables, such as clay + fine silt content and land cover are no good predictors on a regional scale as shown by previous studies. However, the variables identified by this study showed a consistent predictive power of SOC content across various climate regions.
Future studies on large-scale SOC stabilization should consider measuring those soil properties to include them in models.
This would likely improve the predictive capacity of these models and contribute to closing the gap between our theoretical 495 understanding of SOC concentration across large scales and our ability to improve terrestrial biogeochemical projections that rely on existing models.

Code availability
As a R markdown file (pdf) in the supplement materials. 500

Data set availability
The soil properties data set used in this study is available from the authors upon reasonable request and under the following DOI: https://doi.org/10.34725/DVN/66BFOB (Vågen et al. 2021). Field data (i.e. land cover) for the sampling locations can be received from Vågen et al. (2013b). The climate data used (MAT, MAP and PET) can be downloaded from the sources cited: WorldClim: Fick and Hijmans (2017) and Trabucco and Zomer (2019). Land-cover data used for gap-filling can be retrieved from 505 http://2016africalandcover20m.esrin.esa.int/.

Author contribution
Conceptualization of the study for this manuscript was done by SvF, AH, AAB, ST, and SD, with input from EA, SH, SMG, KS, JS, TGV and LW. Data curation, investigation and resources were done and provided by GA, EA, SH, SMG, KS, AS, ET, TGV, EW and LW. The formal analysis, methodology, and visualization for the manuscript was performed by SvF with substantial input from 510 AH, ML, SD and ST as well as feedback from all authors. SvF wrote the initial draft and all authors were involved in the review and editing of the manuscript.

SD and AAB are liaison editors of the special issue Tropical biogeochemistry of soils in the Congo Basin and the African Great
Lakes region and JS is executive editor of the SOIL journal. However, none of them was involved in the review process of this 515 manuscript. All other authors declare that they have no conflict of interest.       Table B3: Summary table of R² for the different models (linear mixed-effects model and random forest) with different explanatory variables (clay and fine silt, land-cover, clay and fine silt + land-cover, full) included for the entire data set. The R² in brackets for the linear-mixed-effects models refer to the conditional R² which include the variation explained by the random effects (siteID/clusterID/plotID).