Additional soil organic carbon storage potential in global croplands

Soil organic carbon sequestration (SOCseq) is considered the most attractive carbon capture technology to partially mitigate climate change. However, there is conflicting evidence regarding the potential of SOCseq, and an estimate of the additional storage potential of existing global cropland is missing. SOCseq is region-specific and conditioned by management, but most global estimates use fixed accumulation rates or time frames. Here, we show how the SOC storage potential and its steady state vary globally depending on climate, land use and soil. Using 83,416 soil observations, we developed a quantile regression neural network that quantifies the SOC variation within soils with similar characteristics. This allows us to identify similar areas that present higher SOC, with the difference representing an additional storage potential. The estimated additional SOC storage potential of 29 to 67 Pg C in the topsoil of global croplands equates to only 2 to 5 years of emissions offsetting and 32% of agriculture’s 92 Pg C historical carbon debt estimate due to conversion from natural ecosystems. Since SOC is temperature-dependent, this potential is likely to reduce by 18% by 2040 due to climate change.

Figure 1. Diagram of a linear quantile regression fitted to the 50th and 75th percentiles (q50 and q75) to explain SOC content, SOC(x), based on a covariate x. The points correspond to different sites, where the lower values of SOC are due to unmeasured limiting factors (such as management). A conventional regression model is usually fitted to the mean (close to "q50"), but a quantile regression is also capable of capturing the response of high-performing sites ("q75"). Since our model includes soil, climate and topography, we hypothesise that the difference is mainly due to management practices.
where BD_organic and BD_mineral are the densities of the organic and mineral fractions, 0.223 and 1.32 respectively, and 1.72 is the factor used to convert from OC to organic matter content. The use of pedotransfer functions and standardisation introduces uncertainties that will be fully accounted for and propagated in a future study, but these are unlikely to change the trends and conclusions of this study.
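The bulk density relation that these constants belong to is not reproduced in this section. As an illustration only, the sketch below assumes a commonly used mixing-model form, BD = 100 / (OM / BD_organic + (100 - OM) / BD_mineral) with OM = 1.72 × OC. This functional form is an assumption, not a confirmed transcription of the equation used in the study, and the function name and example OC values are hypothetical.

```python
import numpy as np

BD_ORGANIC = 0.223  # density of the organic fraction (value from the text)
BD_MINERAL = 1.32   # density of the mineral fraction (value from the text)
OC_TO_OM = 1.72     # factor converting OC to organic matter (from the text)


def bulk_density_from_oc(oc_percent):
    """Estimate bulk density from organic carbon content (% by mass).

    ASSUMPTION: mixing-model form; the source equation is not shown here.
    """
    oc = np.asarray(oc_percent, dtype=float)
    om = OC_TO_OM * oc  # organic matter content, %
    return 100.0 / (om / BD_ORGANIC + (100.0 - om) / BD_MINERAL)


example_oc = np.array([0.0, 2.0, 10.0])  # hypothetical OC values, %
example_bd = bulk_density_from_oc(example_oc)
```

Under this assumed form, a soil with no organic carbon takes the mineral density, and the estimated bulk density decreases as OC increases.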
Land cover information was extracted from the MCD12Q1v6 MODIS product, generated by the Land Processes Distributed Active Archive Center, U.S. Department of the Interior and U.S. Geological Survey (DOI: 10.5067/MODIS/MCD12Q1.006), specifically the IGBP classification (Loveland and Belward, 1997) matching the year of each sample.
From the initial 83,416 samples, 5% were held out as a test dataset. The remaining 95% were split into training and validation datasets using a bootstrapping routine (Efron and Tibshirani, 1993) in order to find the optimal set of hyperparameters. The covariates used as predictors include: a) a digital elevation model (GTOPO30 (USGS EROS, 2020)), provided at 30 arc-second resolution; and b) long-term mean annual temperature (MAT) and total annual rainfall (TAP) derived from information provided by WorldClim (Hijmans et al., 2005), at 30 arc-second resolution. All data layers were resampled to a 500 m grid and standardised using the mean and standard deviation estimated from the training dataset.
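The data partitioning and standardisation described above can be sketched as follows. This is a minimal illustration on synthetic placeholder data. Using the out-of-bag rows of a bootstrap replicate as the validation set is an assumption about the bootstrapping routine, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 1000                     # hypothetical; the study used 83,416
X = rng.normal(size=(n_samples, 3))  # placeholder covariates
y = rng.lognormal(size=n_samples)    # placeholder SOC values

# 1) Hold out 5% of the samples as a test set.
perm = rng.permutation(n_samples)
n_test = int(round(0.05 * n_samples))
test_idx, rest_idx = perm[:n_test], perm[n_test:]

# 2) One bootstrap replicate of the remaining 95%: sample with replacement
#    for training; ASSUMPTION: out-of-bag rows serve as validation data.
train_idx = rng.choice(rest_idx, size=rest_idx.size, replace=True)
val_idx = np.setdiff1d(rest_idx, train_idx)

# 3) Standardise every subset with statistics from the training data only.
mu = X[train_idx].mean(axis=0)
sigma = X[train_idx].std(axis=0)


def standardise(a):
    return (a - mu) / sigma


X_train = standardise(X[train_idx])
X_val = standardise(X[val_idx])
X_test = standardise(X[test_idx])
```

Estimating the mean and standard deviation from the training rows alone keeps information from the validation and test rows out of the preprocessing step.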

Quantile regression neural network model
In this work we used a simple fully-connected, multi-task neural network with three hidden layers of 20 units each and ReLU activation functions. Since we were interested in predicting multiple sections of the SOC distribution simultaneously, the head of the network consisted of five branches of a single unit with linear activation, which correspond to the 25th, 50th, 75th, 90th and 95th percentiles. Multi-task neural networks (i.e. networks that predict multiple targets simultaneously) have shown excellent predictive capability compared with predicting a single target in digital soil mapping, and we refer the reader to Padarian et al. [...] percentiles were included for regularisation (Ruder, 2017). The model was trained for 100 epochs, using a batch size of 32 samples and a learning rate of 0.001. For each percentile, the loss is estimated by:
$$L_\tau = \frac{1}{n}\sum_{i=1}^{n}\max\left(\tau\,(y_i-\hat{y}_i),\ (\tau-1)\,(y_i-\hat{y}_i)\right)$$
as per Koenker and Hallock (2001), where τ is the corresponding percentile, n is the number of training samples, and y_i and ŷ_i are the observed and predicted SOC values. The final, total loss corresponds to the sum of the five individual losses.
Since most global SOC models use a central estimate, such as the mean or median, here we assume the median (50th percentile) as the current state of SOC in the world. Higher quantiles represent situations where better management practices are in place (Fig. 1). These represent locations in the world with similar climate, soil, topography, and land use where higher SOC content values can be observed. To avoid considering very extreme cases and to ensure that the target SOC content was reached in an important proportion of the locations, we used a regression to the 90th percentile as a technical maximum.
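The architecture and loss described in this section can be sketched in plain NumPy as below. This is an illustration under stated assumptions, not the implementation used in the study. The weight initialisation, the number of input covariates and the synthetic batch are hypothetical, and the training loop is omitted because the optimiser is not specified here. Five single-unit linear branches are written as one five-column output layer, which is mathematically equivalent.

```python
import numpy as np

QUANTILES = np.array([0.25, 0.50, 0.75, 0.90, 0.95])


def init_params(n_inputs, rng, hidden=(20, 20, 20)):
    """Random weights for three hidden layers plus five linear heads.

    ASSUMPTION: He-style initialisation, chosen only for illustration.
    """
    sizes = (n_inputs, *hidden, QUANTILES.size)
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """ReLU on the hidden layers, linear activation on the quantile heads."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return h @ w + b  # shape (n, 5): one column per percentile


def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for a single percentile tau in (0, 1)."""
    err = y_true - y_pred
    return np.mean(np.maximum(tau * err, (tau - 1.0) * err))


def total_loss(params, x, y):
    """Sum of the five per-percentile losses."""
    pred = forward(params, x)
    return sum(pinball_loss(y, pred[:, j], t) for j, t in enumerate(QUANTILES))


rng = np.random.default_rng(0)
params = init_params(n_inputs=4, rng=rng)  # 4 covariates is hypothetical
x_demo = rng.normal(size=(32, 4))          # one batch of 32 samples
y_demo = rng.lognormal(size=32)            # placeholder SOC values
demo_loss = total_loss(params, x_demo, y_demo)
```

For τ = 0.9 the loss penalises under-prediction nine times more heavily than over-prediction, which is what pushes that head towards the upper end of the SOC distribution.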

Acknowledging that increasing SOC content is a challenging task and that this technical maximum might not always be achievable, we considered a regression to the 75th percentile as an intermediate, more achievable storage goal.
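Given predictions for the 50th, 75th and 90th percentiles at each location, the additional storage potential follows as a simple difference. The sketch below uses hypothetical predicted values. Clipping negative differences to zero, to guard against quantile predictions that cross, is an assumption and is not described in the text.

```python
import numpy as np

# Hypothetical predicted SOC for four cropland cells (same units throughout).
# Columns: 50th, 75th and 90th percentiles.
pred = np.array([
    [1.0, 1.4, 2.0],
    [2.5, 3.0, 3.8],
    [0.8, 0.7, 1.1],  # crossing quantiles, to show the clipping
    [4.0, 5.5, 7.0],
])
current, practicable, technical = pred.T

# Additional storage potential relative to the current state (median).
# ASSUMPTION: negative differences are clipped to zero.
potential_75 = np.clip(practicable - current, 0.0, None)
potential_90 = np.clip(technical - current, 0.0, None)
```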

Model interpretation
To understand how the different covariates control SOC distribution and to corroborate that our model is capturing sound relationships, we used an approximation of Shapley values (SHAP; Lundberg and Lee, 2017) to estimate the contribution of each covariate to the model predictions. This method is seldom used in soil science, but it has been applied to large-extent digital soil mapping, showing considerable potential for interpreting complex models.
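To make the quantity being approximated concrete, the sketch below computes exact Shapley values for a single prediction by enumerating all coalitions of covariates. This is a didactic substitute, not the SHAP approximation used in the study. It is only feasible for a handful of features, and the toy model, the baseline of zeros and the function names are hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shapley(model, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline values.
    """
    n = x.size
    phi = np.zeros(n)

    def value(subset):
        z = baseline.copy()
        idx = np.array(subset, dtype=int)
        z[idx] = x[idx]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def toy_model(z):
    # Hypothetical model with two linear terms and one interaction.
    return 2.0 * z[0] - z[1] + z[0] * z[2]


x_obs = np.array([1.0, 2.0, 3.0])
base = np.zeros(3)
phi = exact_shapley(toy_model, x_obs, base)
```

The contributions sum to the difference between the prediction and the baseline prediction, and the interaction term is shared equally between the two features involved.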

Future climate projections
To estimate the carbon stocks and additional storage capacity under future climate projections, we ran our model using down- [...]
The SHAP values corroborated that the model captured sensible relationships between the environmental covariates and SOC distribution (Fig. 3a-d). Here, we describe the SOC dependence on environmental factors, but there are also intrinsic edaphic factors that control soil carbon storage. Soil clay content has been recognised as a key factor in SOC stabilisation (Oades, 1988; Schimel et al., 1994) and, ideally, our model should include this significant relationship. We used global soil texture information (Hengl et al., 2017); however, the resulting model did not show the expected spatial global patterns. This is probably due to the current global texture maps not capturing enough local variation. Consequently, we excluded clay content from our model, but we stress the need to add it to local models when good covariates are available.

For all the SOC percentiles, climatic variables have the greatest influence, closely followed by elevation and land use. [...] (Fig. 3a), the model assigns a modest negative contribution to land use. As we move towards higher percentiles, the negative effect of croplands (blue dots in the land use row) increases substantially, clearly differentiating itself from the other two classes (pastures and forest). The contribution of MAT also increases considerably, particularly for the observations with low temperatures, due to the decrease in carbon turnover (Carvalhais et al., 2014).
The dependence of SOC on temperature and precipitation at the global scale has been thoroughly described in the literature.
Our model captured this dependency, showing a clear interaction between both factors. In Fig. 3d

Global additional SOC storage potential
Considering as the additional storage potential the difference between a) the current and most common practices, described by the central tendency of the SOC distribution (50th percentile), and b) the higher ends of the SOC distribution (75th and 90th percentiles) of similar soil under similar climate and defined land use, we estimated its magnitude and spatial distribution at the global scale. Our results show that the soils with the highest additional SOC storage potential are located towards the circumpolar region, which corresponds to areas with high carbon density (Stockmann et al., 2015). Continental climates present [...]
The 4 per mille initiative (Minasny et al., 2017) estimates that increasing SOC stocks by 4‰ yr⁻¹ could offset some fraction of annual CO2 emissions into the atmosphere. There has been a debate on its actual potential, as the assumption was made that all soils of the world would increase their SOC more or less uniformly. Accumulation of SOC, regardless of the rate, can only be achieved for a limited time, as soils have a natural upper limit for carbon storage, which is also limited by management. Using our additional storage potential estimates, we generated global maps simulating a 4‰ yr⁻¹ accumulation rate from the current condition (Fig. 5) and calculated the number of years to reach the practicable and technical maximum. It is important to note that soils will accumulate carbon at different rates, but we used the fixed rate of 4‰ yr⁻¹ because it is currently being used to design policies in many places. Our results showed a large spatial variation in the maximum number of years under the 4 per mille initiative, with median periods of 94 and 216 years to reach the 75th and 90th percentiles, respectively. For both percentiles, the average capture rate is around 0.31 Pg C yr⁻¹, which corresponds to only 3.5% of the C emissions used to estimate the 4‰ rate (8.9 Pg C yr⁻¹) (Minasny et al., 2017).
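The time needed to reach a target stock under a fixed 4‰ yr⁻¹ rate can be sketched as below, assuming a linear accumulation in which each year adds 4‰ of the current (initial) stock. The stock values and the function name are hypothetical, and treating cells that already meet the target as needing zero years is an assumption.

```python
import numpy as np

RATE = 0.004  # 4 per mille per year, relative to the current stock


def years_to_target(current, target, rate=RATE):
    """Years of linear accumulation needed to move from current to target."""
    current = np.asarray(current, dtype=float)
    target = np.asarray(target, dtype=float)
    # ASSUMPTION: cells already at or above the target need zero years.
    gap = np.clip(target - current, 0.0, None)
    return gap / (rate * current)


current_soc = np.array([40.0, 50.0, 60.0])  # hypothetical stocks
target_soc = np.array([50.0, 70.0, 63.0])   # hypothetical targets
years = years_to_target(current_soc, target_soc)
```

Because the annual increment scales with the current stock, the number of years depends only on the ratio of target to current stock: a target 40% above the current stock takes 100 years at this rate.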
In addition to using a fixed accumulation rate, the results presented in Fig. 5 assume a linear accumulation. In reality, soils behave differently, with SOC accumulation diminishing approximately exponentially over time (Minasny et al., 2017; Franzluebbers et al., 2012), but the linear assumption might still give a valid estimate since SOC accumulation rates, in many cases, could be greater than 4‰ (Francaviglia et al., 2019). Our estimates are in line with other studies that report croplands reaching a new, higher SOC concentration equilibrium after over a century (Smith et al., 1996; Soussana et al., 2004).
Regardless of the accumulation rate, the total additional carbon storage potential of the topsoil in croplands is limited. Our total estimates of 29.0 and 66.6 Pg C for the 75th and 90th percentiles are equivalent to only 2 and 5 years of global emissions (49 Pg CO2 eq. yr⁻¹ of greenhouse gases derived from anthropogenic activity (Pachauri et al., 2014)). Our practicable potential (29 Pg C) is close to the 31.2 Pg C historical debt due to agriculture estimated by a recent study (Sanderman et al., 2017), although their estimate is at the lower end of the 21 to 186 Pg C range derived from a) their 62 Pg C estimate for the current SOC stock in croplands and b) the fact that soils of agroecosystems contain 25% to 75% less SOC than their counterparts under natural ecosystems (Lal et al., 2018). Using our model, we estimated a historical carbon debt ranging between 10 and 174 Pg C by replacing all croplands with a range of region-specific natural ecosystems, from non-forest with average (50th percentile) carbon density to forest with high carbon density (90th percentile). If we consider the midpoint of the latter range (92 Pg C), our practicable potential accounts for only 32% of the historical carbon debt due to agriculture (72% for the technical maximum). We only considered croplands in our analysis as these areas have lost more SOC. There is potential for managed grasslands; however, we currently cannot differentiate managed from natural grasslands using satellite imagery at the global level.
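The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that carbon mass is converted to CO2 mass with the molar mass ratio 44/12, which reproduces the reported equivalence but is not stated explicitly in the text. Variable names are hypothetical.

```python
C_TO_CO2 = 44.0 / 12.0         # molar mass ratio CO2 : C (assumed conversion)
ANNUAL_EMISSIONS_CO2EQ = 49.0  # Pg CO2 eq. per year (from the text)


def years_of_emissions(pg_c):
    """Years of global emissions equivalent to a carbon mass in Pg C."""
    return pg_c * C_TO_CO2 / ANNUAL_EMISSIONS_CO2EQ


years_75 = years_of_emissions(29.0)  # practicable potential
years_90 = years_of_emissions(66.6)  # technical maximum

# Debt range implied by a 62 Pg C current stock that is 25% to 75% lower
# than the stock under natural ecosystems.
current_stock = 62.0
debt_low = current_stock / (1.0 - 0.25) - current_stock
debt_high = current_stock / (1.0 - 0.75) - current_stock

# Share of the modelled debt midpoint covered by each potential.
debt_midpoint = (10.0 + 174.0) / 2.0
share_75 = 29.0 / debt_midpoint
share_90 = 66.6 / debt_midpoint
```

Under these assumptions, the two potentials correspond to roughly 2.2 and 5.0 years of emissions, the implied debt range is about 21 to 186 Pg C, and the shares of the 92 Pg C midpoint are about 32% and 72%.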
Compared with previous estimates, our results show a slightly higher additional carbon storage potential for global croplands. A total of 18 to 37 Pg C, under medium and high storage scenarios with accumulation rates of 0.9 and 1.85 Pg C yr⁻¹ and the assumption of reaching a new equilibrium after 20 years, has been reported (Zomer et al., 2017), with estimates based on gridded predictions (Hengl et al., 2017) and considering a uniform sequestration rate for all croplands. A slightly wider range of sequestration potential has been reported (Lal et al., 2018), with a total of 7.63 to 43.25 Pg C over a period of 25 to 50 years, assuming the adoption of region-specific best management practices. According to an extensive review (Fuss et al., 2018), the best estimate of realistic technical potential (close to the median of the minimums of their reviewed studies) is between 20.1 and 46.2 Pg C until 2050. From that year onwards, the accumulation rates could not be maintained due to sink saturation. It is important to remember that our approach estimates the additional storage potential based on real observations, within a similar climatic context, and not on technical accumulation rates of specific management practices. Since our approach is not based on fixed technical accumulation rates, our results are not necessarily constrained to the 20 to 50 year period that most studies consider, which could be another reason for our higher estimates.
Several studies have raised concerns about the barriers to sequestering SOC (Rumpel et al., 2020). One of the main advantages of our study is that we use a large global database and the estimates are based on real-world observations, meaning that a group of locations has already reached the target SOC stocks for a given combination of environmental conditions. However, a current limitation to keep in mind is that our model is based on biophysical factors and does not take into account socio-economic barriers, which disproportionately affect developing countries and can impede the adoption of new management practices.

Effect of climate change
An important point to consider is that the sequestration potential could vary under the future climate. Given the high dependence of SOC on temperature, it is expected that relatively fast global warming will shift most ecosystems toward a lower SOC equilibrium (Fig. 3d-e) and that this effect will be more pronounced in areas with larger SOC concentrations (Fig. 3a-c). These projections have been reported in many studies (Robinson, 2007; Crowther et al., 2016; Melillo et al., 2017) and they are likely to result in reduced sequestration potential.
Utilising CMIP6 downscaled future climate projections for a moderate "business as usual" shared socio-economic pathway (SSP3-7.0), we estimated a mean reduction of 18% in the total sequestration potential in croplands in the next 20 years, from 29.0 Pg C to 23.8 Pg C and from 66.6 Pg C to 54.7 Pg C for the 75th and 90th percentiles, respectively. That estimate does not include the drop in carbon concentration of the current state (50th percentile), which implies an additional loss of 7.4 Pg C.
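As a quick consistency check, the relative reduction can be recomputed from the reported totals. The sketch below only restates the arithmetic with the values given in the text. The variable names are hypothetical.

```python
# Total sequestration potential in Pg C, current vs. projected climate.
potential_now = {"p75": 29.0, "p90": 66.6}
potential_future = {"p75": 23.8, "p90": 54.7}

# Relative reduction for each percentile.
reduction = {k: 1.0 - potential_future[k] / potential_now[k]
             for k in potential_now}

# Absolute loss of potential in Pg C. This excludes the separate 7.4 Pg C
# drop in the current state (50th percentile) mentioned in the text.
lost_potential = {k: potential_now[k] - potential_future[k]
                  for k in potential_now}
```

Both percentiles give a reduction of about 18%, corresponding to roughly 5.2 and 11.9 Pg C of lost potential.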

SOC sequestration still a priority
We have shown that the total amount of additional carbon that croplands can store is relatively modest compared to the sustained emission of greenhouse gases derived from anthropogenic activity. It is unreasonable to expect that a single sector can offset global emissions, especially considering their increasing trend. Nevertheless, incorporating carbon into soils by improving management practices should still be a priority to ensure food security. According to our estimates, agriculture generates a carbon debt, so we need to properly manage croplands to be sustainable and avoid the expansion of agricultural land due to loss of soil productivity.
Agricultural productivity has been directly related to SOC content. If SOC falls below certain critical limits, soil condition declines and so does crop yield. By increasing SOC to its practicable and technical upper limits (75th and 90th percentiles), between 224 and 418 million hectares could be taken above the critical SOC limits of 1.1% and 2% for tropical (Aune and Lal, 1997) and temperate (Loveland and Webb, 2003) areas, respectively.

There are still many knowledge gaps that need to be filled and that could help to improve our current model. As mentioned in previous sections, detailed texture information at the global scale is still required, and that is also applicable to other soil properties that highly correlate with SOC (Rasmussen et al., 2018). Additionally, our approach places management practices into different quantiles of a distribution based on their SOC density, but it is not capable of identifying them. More research is needed to identify region-specific management practices that can enhance soil carbon. Many countries have a national registry of lands (e.g. Land Parcel Identification in Europe (Leo and Lemoine, 2001)), at least for the management of agricultural subsidies, which should be integrated into soil information systems and added to this type of model.