Comment on soil-2021-19

Throughout the paper, can authors use the treatment names as in the original experimental dataset (Wolf, 2018 page 6): CONV = CMT conventional maize-tomato, ORG=OMT organic maize-tomato, and (page 5) WCC – winter cover crop? I understand that Tautges and Chiartas 2019 used the CONV, ORG notation, but a brief explanation would be helpful. The theory of cover crops providing a macropore system for transport of DOC is interesting, but the data do not support this theory (no measurement of porosity, change in bulk density, or changes in soil hydraulic properties). It is appropriate for a discussion, but I might exclude this as a main finding from the abstract.


Introduction:
I appreciate that the abstract and introduction mention soil health, but there is no clear definition or explanation of its importance to the paper. Either remove this term and focus solely on soil carbon and microbial processes, or directly connect soil health and the shallow sampling regimes often associated with it to this "outsized perceived role in ecosystem services". This is a good argument and dataset to support deeper sampling. Authors may also include references summarized by Mobley et al 2015 in their article "Surficial gains and subsoil losses of soil carbon and nitrogen during secondary forest development": Post & Kwon, 2000; West & Post, 2002 review 360 articles on land use change, with only 10% sampling below 30cm. In this paragraph, please clarify at what depth the authors are designating topsoil versus subsoil for this study. This first paragraph of the introduction discusses "longer C residence times" of deep soil C, which requires further explanation.
Overall, the introduction structure can be strengthened by clarifying topic sentences (e.g., specify cover crops L51) and adding updated references. Can you support the Jenny citation with more modern references, even Brady and Weil Nature and Properties of Soils, or USDA technical information "Designations for Horizons and Layers" in Soil Survey Manual - Ch 3 (https://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/ref/?cid=nrcs142p2_054253#designations)? The introduction may flow better using paragraphs separated into chemical, physical and biological controls, or layered as (1) depth; (2) chemistry of C inputs and stabilization at depth; (3) management impacts at depth, specifically cover crops; and (4) management interaction with other factors (microbial).
The introduction touches upon stoichiometry, a critical, highly manipulated factor in managed conventional systems that affects soil C storage. To go further in depth on soil chemistry (e.g., at L40), authors can address changes over time in stoichiometric constraints on decomposition (e.g., see Soong et al 2019 "Microbial carbon limitation: The need for integrating microorganisms into our understanding of ecosystem carbon cycling").
Also, authors can mention higher physical disturbances in surface soils (L55), and the types of management associated with cover crops, such as crimping/rolling. Please also include specific soil type, climate and cropping system when comparing to other studies, otherwise direct comparisons are not particularly informative. Authors can also cite McClelland et al 2020 "Management of cover crops in temperate climates influences soil organic carbon stocks: a meta-analysis", which analyzed soils only down to 30cm.
As for the sampling strategy by depth, can the authors please describe why they separated out into these depths 0-15, "intervening", and the subsurface as 60-100cm? How do these depths compare to the horizons in these two soils? (Looking up the series descriptions, Yolo has A horizons down to 66cm and then C horizons, and Rincon has A down to 20, B 20-100cm. Should the analysis be completed on A and B horizons rather than depth profiles?) How do these depths relate to roots of corn (100cm+), tomato (60cm+) and cover crops (variable)? Please stay consistent with the terms "subsoil" versus "subsurface soil", as depth is a major component of this study. Can authors please justify why 15-60cm is combined into a single sample in 2018, when historical data had an additional delineation? (Is it simply limited time/costs or another reason?) The overarching question and hypothesis require further editing to clearly lead into the results and discussion. There seems to be a disconnect between the main question and the methods of this paper. The main question includes "carbon formation" (does that mean microbially processed C, or stabilized C formation?) and "storage processes" (that obviously includes aggregation, but the carbon content of these size classes was not measured). Also, what is meant by the term "SOC-related indicators"? Does that mean SOC stability or reactivity-related indicators? As written, the hypotheses are just predictions; there is no description of the mechanisms behind the expected results. An interesting hypothesis arises in the discussion around cascade theory; can authors pull that into the introduction? This can provide a way to integrate the study of carbon chemistry (FTIR) and microbial biomarkers that otherwise are not included in the hypotheses.
Finally, I agree with the previous reviewer comment that the treatments CONV (fertilizer), CONV+WCC (fertilizer + cover crops), and ORG (compost + cover crops) do not disentangle the effect of compost. I don't think there is a treatment in the Century Experiment that was maize-tomato plus compost only or fertilizer + compost, but this should be mentioned as a limitation in the study, particularly in the subtraction of FTIR spectra.
This manuscript covers many aspects of deep soil C and management. There is no need to emphasize the complicated factors of global change (L87) at the end of the introduction, unless those are also analyzed over time.

Materials and Methods:
Thank you for a concise description of the site and experiment. I recommend authors also add basic climate data such as climate type, mean annual min and max temperature, mean annual precipitation, and also specific 2018-19 climate data for comparison. Authors write that the 'horizon information' is available from Wolf et al, but I can only find soil chemistry by depth, not the soil description, in that dataset (horizon delineations are online). Can authors add the horizon depths into the methods for both the Yolo and Rincon soils, and key properties such as pH and texture? A table in the materials and methods section could organize all of this soil and climate information for quick reference. This could also include the other key management notes that will impact DOC transport, such as the conversion from furrow to drip in 2014, as well as information from the 2018-2019 season such as crop planting/harvest dates, total irrigation amount, and the anomalous compost application in September 2019. These details can then be incorporated smoothly into the discussion.
The differentiation between the sampling and analysis of the older data and 2018-2019 methods is now clearer. Thank you for the new methods section. However, without hypotheses asking seasonal questions over time, why sample at four time points in a single year? This is particularly unclear as the authors state that a single year of data is not sufficient to look at differences at depth (L81-82) to justify use of the historical data. Perhaps authors can create one or two hypotheses for the 2018-19 season, and others for the long-term effects and historical data.
The use of PLFA and FTIR is not justified from the hypotheses or introduction. The use of these techniques, particularly stress ratios for PLFA, needs to be explained within a wider context in the introduction.
2.7 Please clarify the statement that 9 out of 18 plots were sampled for hydraulic conductivity (those under tomato). Were half of the plots under corn and the other half under tomato during this 2018 sampling? That needs to be included in the methods section. Or are you referring to the full 18 plots of all the Century experimental treatments? Finally, why are 8 dates included for soil moisture content, when soils are sampled only 4 times?
2.8 I have some concern over the use of averaging and subtraction of the spectra. What was the variance between the historic soils of 15-30 and 30-60 cm? What information is provided via subtraction of the conventional plus cover crop from the organic spectra? I am unfamiliar with this subtraction analysis, so I am curious: what information is revealed from subtraction, given that the reflectance intensity does not represent quantity, but rather soil chemical signature?
2.9 Can the authors please describe the details of the ANOVA? Was this a mixed effect model accounting for the block design? Was there an effect of block? (That difference would be interesting to see due to the two soil types.) It would be helpful if the authors state that they checked normality of the data prior to ANOVA. If variability was high for certain metrics (hydraulic conductivity), it seems there may be some outliers. How were those assessed?
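To make the question under 2.9 about a block effect concrete, here is a minimal sketch of the sums-of-squares partition for a randomized complete block design, assuming three treatments with one value per block. It uses a fixed block term as a simpler stand-in for a mixed model with a random block effect, and it is not the authors' analysis. The function name `rcbd_anova` and all SOC values are hypothetical and chosen only for illustration.

```python
def rcbd_anova(data):
    """Two-way ANOVA without interaction for a randomized complete block design.

    data maps each treatment name to a list of values, one per block,
    with blocks in the same order for every treatment.
    Returns F statistics for treatment and block plus degrees of freedom.
    """
    treatments = list(data)
    t = len(treatments)
    b = len(data[treatments[0]])
    if any(len(values) != b for values in data.values()):
        raise ValueError("every treatment needs exactly one value per block")

    grand = sum(sum(values) for values in data.values()) / (t * b)
    trt_means = [sum(data[k]) / b for k in treatments]
    blk_means = [sum(data[k][j] for k in treatments) / t for j in range(b)]

    # Partition the total sum of squares into treatment, block, and error.
    ss_total = sum((x - grand) ** 2 for values in data.values() for x in values)
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_err = ss_total - ss_trt - ss_blk

    df_trt, df_blk = t - 1, b - 1
    df_err = df_trt * df_blk
    ms_err = ss_err / df_err
    return {
        "F_treatment": (ss_trt / df_trt) / ms_err,
        "F_block": (ss_blk / df_blk) / ms_err,
        "df": (df_trt, df_blk, df_err),
    }


# Hypothetical SOC values (one per block); NOT data from the manuscript.
soc = {
    "CONV": [1.0, 1.2, 1.5],
    "CONV+WCC": [1.1, 1.4, 1.6],
    "ORG": [1.5, 1.6, 2.0],
}
result = rcbd_anova(soc)
print(result)
```

Reporting the block F statistic alongside the treatment F statistic would show directly whether the two soil types contribute to the variance.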
The lack of differences in the field may simply be due to low power with only three field replicates. Rather than splitting the data by depth to do comparisons between treatments, can the authors run an analysis that accounts for autocorrelation over depth? On that same note, do authors need to account for repeated measures across sampling dates in 2018-2019 and within the historical data?
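As a rough illustration of the power concern, the sketch below runs an exact two-sided permutation test on two treatments with three replicates each. It assumes an unblocked two-group comparison, which is simpler than the authors' ANOVA, and the values are hypothetical. Even when the two groups are completely separated, only 2 of the 20 possible regroupings are as extreme as the observed one, so this test cannot return a p-value below 0.1.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    n_extreme = 0
    n_perm = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_perm += 1
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_perm


# Hypothetical, completely separated values; NOT data from the manuscript.
conv = [1.0, 1.1, 1.2]
org = [2.0, 2.1, 2.2]
p_min = permutation_p_value(conv, org)
print(p_min)
```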
I appreciate access to the data and code used for this analysis. Thank you for supporting transparency in data analysis.

Results:
3.1 The cumulative inputs over 25 years are useful, but would be more comparable to other studies if averaged per year. These data may also be well suited for a table including all C inputs and nutrient inputs over the 25 year period (transform Fig 1 to Table 1 using Mg/ha/yr). Perhaps with the level of detail from the Century Experiment on all organic inputs, the statistical analysis could incorporate the treatments as continuous variables (amount of mineral/organic N input) rather than categorical variables?
L227 If a result is non-significant, then I would remove any interpretation of 'increase'. You can zoom in on the y-axis and add precipitation and irrigation events. Otherwise, a simple average across time shown as a bar graph or box plot would tell the story more clearly, since the statistical analysis was not over time.
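For the per-year conversion suggested under 3.1, a minimal sketch of the arithmetic follows. The cumulative totals are hypothetical placeholders, not values from Fig 1, and `annualize` is an illustrative helper name.

```python
def annualize(cumulative_mg_per_ha, years):
    """Convert a cumulative input (Mg/ha) into a mean annual rate (Mg/ha/yr)."""
    if years <= 0:
        raise ValueError("years must be positive")
    return cumulative_mg_per_ha / years


# Hypothetical cumulative C inputs over 25 years (Mg C/ha); NOT manuscript values.
cumulative_c = {"CONV": 100.0, "CONV+WCC": 125.0, "ORG": 250.0}
annual_c = {trt: annualize(total, 25) for trt, total in cumulative_c.items()}

for trt, rate in annual_c.items():
    print(f"{trt}: {rate:.1f} Mg C/ha/yr")
```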
L270 Why do authors state "largest seasonal variation" in nutrient data was in June, when only mineral N and DOC were highest in June? S and P were higher in August.
3.7 Authors must introduce microbial stress indicators earlier in the introduction and hypothesis. How does this relate to stoichiometry and soil C stability?

Discussion:
Authors list the key finding of increased SOC and then write what I perceive as the hypothesis of the paper: "that high concentrations of mobile C and essential nutrients for microbial activity provided by the compost, combined with the easier movement of water downward associated with a history of cover-cropping, helped transport the material needed to build C in the subsurface." Having this in the introduction will help to set up the statistical analysis, results, and discussion. However, this hypothesis was not supported by the aggregation data or the hydraulic conductivity data.
Please go into more detail on how no differences in aggregation "rule out" increased pore space as the cause of the increase in water content. What is the alternative explanation? Is this just an issue with statistical power?
L335 This also seems like a great candidate sentence for another hypothesis: "Due to the fact that tillage in all systems would likely eliminate differences among them in the top 30 cm, we would expect any differences in macroporosity and infiltration among treatments to be most affected by those roots that extend below the 30 cm plow layer". This is the first mention of tillage depth. Please specify the depth of disking in the methods, and if this was applied to the conventional fields as well.
L340-345 This paragraph on cascade theory describes why FTIR analysis was necessary. This also should be included, or at least alluded to, in the introduction. This is a really interesting discussion (L350-355), and could also be a good place to bring up the variability in the conductivity data.
L375 Support with values from the results. The nutrient values may all be better represented by tables. Although the graphs show dynamics across the season, I would argue that depth, not season, is the key factor in this analysis.
L380 Consider rewriting this section title, as there was no direct comparison to a compost treatment alone.
L382 Is the microbial processing near the surface based on the FTIR data? Please reference.
L388-L390 This paragraph seems speculative. Please input FTIR data that supports these ideas (C chemistry from this dataset).
L388 What does "high variability of soil C measurements" refer to? Dry combustion measurements of total C are very consistent.

Conclusion:
L406-407: "This was facilitated by increased soil macropores created by cover crop roots leading to higher rates of transport of soluble C". Macropores were not analyzed in this study, and no increases were found in hydraulic conductivity or aggregation. Please clearly delineate quantified results versus hypotheses in this conclusion.