<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">SOIL</journal-id><journal-title-group>
    <journal-title>SOIL</journal-title>
    <abbrev-journal-title abbrev-type="publisher">SOIL</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">SOIL</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">2199-398X</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/soil-12-619-2026</article-id><title-group><article-title>Estimating soil carbon sequestration potential with mid-IR spectroscopy and explainable machine learning</article-title><alt-title>Estimating carbon sequestration potential with mid-IR spectra</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Hu</surname><given-names>Yang</given-names></name>
          <email>yang.hu4@postgrad.curtin.edu.au</email>
        </contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Viscarra Rossel</surname><given-names>Raphael A.</given-names></name>
          <email>r.viscarra-rossel@curtin.edu.au</email>
        <ext-link>https://orcid.org/0000-0003-1540-4748</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Soil &amp; Landscape Science, School of Molecular &amp; Life Sciences, Faculty of Science &amp; Engineering, Curtin University, GPO Box U1987, Perth WA 6845, Australia</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Yang Hu (yang.hu4@postgrad.curtin.edu.au) and Raphael A. Viscarra Rossel (r.viscarra-rossel@curtin.edu.au)</corresp></author-notes><pub-date><day>13</day><month>May</month><year>2026</year></pub-date>
      
      <volume>12</volume>
      <issue>1</issue>
      <fpage>619</fpage><lpage>631</lpage>
      <history>
        <date date-type="received"><day>1</day><month>October</month><year>2025</year></date>
           <date date-type="rev-request"><day>14</day><month>October</month><year>2025</year></date>
           <date date-type="rev-recd"><day>3</day><month>April</month><year>2026</year></date>
           <date date-type="accepted"><day>20</day><month>April</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Yang Hu</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026.html">This article is available from https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026.html</self-uri><self-uri xlink:href="https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026.pdf">The full text article is available as a PDF file from https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e90">Soil carbon sequestration refers to the process of capturing atmospheric carbon through plant photosynthesis and storing it in soil as organic carbon. The primary mechanism for carbon sequestration is the adsorption of organic carbon molecules onto the mineral surfaces of the soil's fine fraction (clay <inline-formula><mml:math id="M1" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt <inline-formula><mml:math id="M2" display="inline"><mml:mo>≤</mml:mo></mml:math></inline-formula> 20 <inline-formula><mml:math id="M3" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m), forming mineral-associated organic carbon (MAOC). Soil has a finite capacity to stabilise and sequester organic carbon, known as carbon saturation capacity, which depends on the proportion of reactive minerals in the soil. The difference between the current MAOC content and the carbon saturation capacity is referred to as the organic carbon saturation deficit (<inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) or sequestration potential. Fourier-transformed (FTIR) mid-infrared (mid-IR) spectroscopy can simultaneously measure soil properties relevant to carbon stabilisation: organic carbon functional groups, clay and iron-oxide mineralogy and particle size. Therefore, we hypothesise that mid-IR spectroscopy can effectively and accurately estimate <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. 
Here, we aim to (i) develop spectroscopic models to estimate the MAOC and <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of 482 Australian topsoil samples, (ii) model MAOC and <inline-formula><mml:math id="M7" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> using mid-IR spectra and an interpretable machine learning  algorithm, and (iii) further interpret the MAOC and <inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> models using SHapley Additive exPlanations (SHAP). Using frontier line analysis, we fitted a function to the upper envelope of the MAOC vs. clay <inline-formula><mml:math id="M9" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt relationship to derive <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. We recorded mid-IR spectra of the samples and used the regression trees method CUBIST to model MAOC content and <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. We interpreted these models by examining the regression trees and using SHAP. 
The models were unbiased and estimated MAOC content with <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.86 and RMSE of 2.77 (g kg soil<sup>−1</sup>), and <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.89 and RMSE of 3.72 (g kg soil<sup>−1</sup>). Model interpretation showed that <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> estimates relied on negative interactions with absorptions from organic matter functional groups and positive interactions with absorptions from clay minerals. Our results demonstrate that mid-IR spectra can effectively estimate MAOC and soil <inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, providing a rapid, cost-effective method for assessing and monitoring this critical soil function.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Australian Research Council</funding-source>
<award-id>DP210100420</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e290">Soil organic carbon (C) sequestration refers to the process by which plants capture atmospheric C through photosynthesis and store it in the soil. The United Nations Framework Convention on Climate Change (UNFCCC) has identified soil C sequestration as a critical, nature-based process for withdrawing atmospheric carbon dioxide (CO<sub>2</sub>) <xref ref-type="bibr" rid="bib1.bibx47" id="paren.1"/>. Soil organic C sequestration also improves soil health, food and nutritional security, water quality, biodiversity, and elemental recycling <xref ref-type="bibr" rid="bib1.bibx26" id="paren.2"/>. Thus, it is crucial to estimate the amount of C stored in soil and how much it could store in the future to advance our scientific understanding of C cycling. This understanding will provide the foundation for land managers to develop practices that enhance C sequestration and for policymakers to formulate climate change adaptation strategies. However, rapidly, cost-effectively, and scientifically estimating the soil C saturation deficit remains challenging.</p>
      <p id="d2e308">Soil C from plants begins as particulate organic C (POC). Over time, soil microorganisms consume this POC, and some of it is broken down into smaller molecules. Some of these molecules are protected from further decomposition through adsorption onto mineral particles, forming mineral-associated organic carbon (MAOC) and providing protection within soil microaggregates <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx40 bib1.bibx5" id="paren.3"/>. Soils with higher silt and clay content have a larger mineral surface area and a greater capacity to adsorb and stabilise C. <xref ref-type="bibr" rid="bib1.bibx16" id="text.4"/> found a positive linear relationship between the proportion of clay and silt (particles <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M21" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m) and the amount of C in this fraction. This relationship has been used to estimate the soil's maximum capacity to stabilise C <xref ref-type="bibr" rid="bib1.bibx16" id="paren.5"/>, referred to as the C saturation capacity (<inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">sat</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>). The difference between actual MAOC content and <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">sat</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is known as the C saturation deficit (<inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) or C sequestration potential.</p>
      <p id="d2e372">Subsequent studies, such as <xref ref-type="bibr" rid="bib1.bibx40" id="text.6"/>, also found a direct relationship between MAOC and the amount of clay and silt in soil, further recognising that this relationship depends on the reactivity of the soil's clay minerals. Many researchers have since used such linear relationships to estimate <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">sat</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. <xref ref-type="bibr" rid="bib1.bibx14" id="text.7"/> found this approach underestimated <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">sat</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and proposed a boundary line method as an alternative, fitting a line to the upper tenth percentile of the data in the MAOC vs. clay and silt relationship. More recently, the relationship has been fitted using quantile regression at the 95th percentile of the data <xref ref-type="bibr" rid="bib1.bibx15" id="paren.8"/>. However, these methods underestimate <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">sat</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> by fitting through the data rather than capturing the maximum values. To address this shortcoming, <xref ref-type="bibr" rid="bib1.bibx51" id="text.9"/> proposed using a bootstrapped frontier lines analysis that fits an envelope to the maximum values of the relationship between MAOC and the soil's fine fraction, thereby preventing underestimation of the soil's C storage capacity and providing uncertainty estimates. 
Additionally, because the maximum attainable C storage under a given environment (<inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) approaches its upper limit asymptotically <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx51" id="paren.10"/>, the frontier line approach better reflects <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> by capturing the asymptotic increase in soil C storage capacity with increasing soil clay and silt content <xref ref-type="bibr" rid="bib1.bibx51" id="paren.11"/>.</p>
      <p id="d2e449">Establishing reliable estimates of <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> using the methods above requires many soil samples with measured MAOC and clay-plus-silt content. Measuring MAOC involves fractionating soil to isolate the C in the <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M33" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m soil fraction and then measuring the organic C content <xref ref-type="bibr" rid="bib1.bibx34" id="paren.12"/>. Fourier-transformed (FTIR) mid-infrared (mid-IR) spectroscopy offers a faster, more cost-effective, and repeatable alternative. It measures soil composition by capturing interactions between mid-IR wavelengths and the vibrations of bonds in soil molecules, providing data on a soil's organic and mineral composition <xref ref-type="bibr" rid="bib1.bibx53" id="paren.13"/>. These spectra have been used to estimate organic and inorganic C, clay, sand and silt contents, cation exchange capacity and other chemical, physical and biological properties through calibration that relates the measured soil properties to their spectra <xref ref-type="bibr" rid="bib1.bibx42" id="paren.14"/>.</p>
      <p id="d2e503">Mid-IR spectra serve as an integrative “molecular fingerprint” of the soil, reflecting its mineralogy, organic matter, and physical properties <xref ref-type="bibr" rid="bib1.bibx50" id="paren.15"/>, which directly determine a soil's biological activity, soil structure and ultimately the ability to sequester C <xref ref-type="bibr" rid="bib1.bibx58" id="paren.16"/>. <xref ref-type="bibr" rid="bib1.bibx4" id="text.17"/> estimated the C saturation deficit (<inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) of New Zealand soils using pedotransfer functions derived from the quantile regression approach, modelling <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> with mid-IR spectra through partial least squares regression (PLSR), showing good predictability. Similarly, <xref ref-type="bibr" rid="bib1.bibx21" id="text.18"/> estimated the <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of Australian soils using a quantile regression approach and modelled it with mid-IR spectra coupled with PLSR, also achieving good predictability. We did not find other research that estimates <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> using soil spectra. 
We hypothesise that mid-IR spectra, combined with explainable machine learning, can be used to estimate soil MAOC content and <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> while also providing insights into how the model uses spectral absorption features to identify the soil constituents important for prediction. Thus, we aimed to: <list list-type="order"><list-item>
      <p id="d2e576">Develop spectroscopic models to estimate the MAOC content and the <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of Australian soils using mid-IR spectra with the regression trees algorithm CUBIST;</p></list-item><list-item>
      <p id="d2e591">Interpret these models by analysing the CUBIST rulesets and SHapley Additive exPlanations (SHAP) values to understand how the absorptions of soil organic and inorganic constituents affected model prediction.</p></list-item></list></p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Methods</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Soil samples</title>
      <p id="d2e609">We used 488 topsoil samples from 275 sites across Australia (Fig. <xref ref-type="fig" rid="F1"/>). The soils were sampled from three depth layers (0–10, 10–20 and 20–30 cm). All soil orders in the Australian soil classification were present, except Anthroposol and Organosol <xref ref-type="bibr" rid="bib1.bibx46" id="paren.19"/>. Kandosols were the most abundant soil type, followed by Tenosols and Calcarosols, Chromosols and Vertosols, while Rudosols, Dermosols, Kurosols, Ferrosols, and Podosols were present in smaller numbers. Three Hydrosols were excluded from further analysis due to the distinct C storage mechanisms in anoxic soils <xref ref-type="bibr" rid="bib1.bibx41" id="paren.20"/>.</p>

      <fig id="F1"><label>Figure 1</label><caption><p id="d2e622">Location of sampling points.</p></caption>
          <graphic xlink:href="https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026-f01.png"/>

        </fig>

      <p id="d2e631">The sampling area spans the main Köppen-Geiger climate zones <xref ref-type="bibr" rid="bib1.bibx6" id="paren.21"/>. Most of the samples were collected from arid hot deserts, with smaller proportions from arid hot steppes and tropical savannahs. Samples were primarily collected from areas with minimal human impact, particularly nature conservation sites, native vegetation grazing lands, and other minimally used areas. Only a small proportion of samples came from production or intensive land use. The vegetation at the sampling sites was diverse, comprising 24 major vegetation groups, with eucalyptus woodlands being the most common <xref ref-type="bibr" rid="bib1.bibx9" id="paren.22"/>. Most samples were taken from native vegetation or natural bare land, with the rest from non-native vegetation or cleared land <xref ref-type="bibr" rid="bib1.bibx1" id="paren.23"/>.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Soil fractionation</title>
      <p id="d2e651">Soil samples were fractionated through physical granulometric separation. The samples were dispersed in deionised water using an ultrasonic probe (Sonics VCX 500 Sonicator, Newtown, Connecticut) with an energy output of 500 J mL<sup>−1</sup> for 200 s <xref ref-type="bibr" rid="bib1.bibx56" id="paren.24"/>. After dispersion, the samples were fractionated using an automated wet sieving apparatus (Analysette 3 Pro, Fritsch GmbH, Idar-Oberstein, Germany) with 250 and 50 <inline-formula><mml:math id="M41" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m sieves. This yielded three size fractions: macroaggregates (2000–250 <inline-formula><mml:math id="M42" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m), microaggregates (250–50 <inline-formula><mml:math id="M43" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m), and the fine fraction (<inline-formula><mml:math id="M44" display="inline"><mml:mo lspace="0mm">≤</mml:mo></mml:math></inline-formula> 50 <inline-formula><mml:math id="M45" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m). The fractionated samples were then oven-dried at 60 °C overnight and ground to approximately <inline-formula><mml:math id="M46" display="inline"><mml:mo>≤</mml:mo></mml:math></inline-formula> 80 <inline-formula><mml:math id="M47" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m, before the organic C content of each size fraction was measured using an elemental analyser (SoliTOC Cube, Elementar Analysensysteme, Hanau, Germany). The organic C content of the fine fraction, representing MAOC, was recorded in grams per kilogram of whole soil.</p>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Mid-IR spectroscopy</title>
      <p id="d2e732">The whole soils (sieved to <inline-formula><mml:math id="M48" display="inline"><mml:mo>≤</mml:mo></mml:math></inline-formula> 2 mm) were air-dried before fine grinding to <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:mo>≈</mml:mo><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M50" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m. The mid-IR spectra of the finely ground samples were measured with a diffuse reflectance infrared Fourier transform (DRIFT) spectrometer (Bruker Invenio HTS-XT, Massachusetts, United States). Spectra were recorded from 4000–450 cm<sup>−1</sup> with a spectral resolution of 4 cm<sup>−1</sup> and measuring 64 scans per sample. The spectrometer was calibrated with a gold standard before measuring each sample plate with 23 samples (Bruker, Massachusetts, United States). Reflectance spectra were recorded in <inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:mi>log⁡</mml:mi><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>R</mml:mi></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula> (apparent absorbance).</p>
<sec id="Ch1.S2.SS3.SSSx1" specific-use="unnumbered">
  <title>Silt <inline-formula><mml:math id="M54" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> clay content</title>
      <p id="d2e814">The silt and clay content of the whole soil was determined using mid-IR spectroscopic modelling with CUBIST <xref ref-type="bibr" rid="bib1.bibx18" id="paren.25"/>. The silt % model has an <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> value of 0.84 with a concordance of 0.92, and the clay % model has an <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> value of 0.90 with a concordance of 0.95. The estimated silt and clay content in % was combined for further analysis.</p>
</sec>
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Frontier lines and calculation of <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></title>
      <p id="d2e862">The MAOC content of samples displayed a log-normal distribution. We performed a log<sub><italic>e</italic></sub> transformation on the MAOC content and removed three outliers that were more than 1.5 times the interquartile range above Q3 or below Q1. We proceeded with the analysis of the remaining 482 samples from 270 sites.</p>
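      <p>The transformation and outlier screen described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name <monospace>iqr_outlier_mask</monospace>, the quartile convention (the "inclusive" method of Python's <monospace>statistics.quantiles</monospace>) and the MAOC values are assumptions made for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import statistics


def iqr_outlier_mask(values, k=1.5):
    """Flag values lying more than k * IQR below Q1 or above Q3."""
    # Assumption: quartiles use the "inclusive" convention; the study
    # does not state which quartile definition was applied.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical MAOC contents (g C per kg soil), not data from the study.
maoc = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5, 9.1, 12.3, 0.001]

# Natural-log transform first, then screen on the log scale.
log_maoc = [math.log(v) for v in maoc]
is_outlier = iqr_outlier_mask(log_maoc)
kept = [v for v, out in zip(maoc, is_outlier) if not out]
```
]]></preformat>
      <p>In this toy example only the last value is flagged, because its logarithm falls far below the lower fence; the screen is applied to the log-transformed values, in line with the text above.</p>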
      <p id="d2e874">We fitted a monotonically increasing and concave frontier line <xref ref-type="bibr" rid="bib1.bibx33" id="paren.26"/> to the relationship between log(MAOC) and clay <inline-formula><mml:math id="M59" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt content of the samples using the smooth, non-parametric frontier line analysis with the R package SNFA <xref ref-type="bibr" rid="bib1.bibx31" id="paren.27"/>. We calculated the <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> following the approach described in <xref ref-type="bibr" rid="bib1.bibx51" id="text.28"/>. Each point on the frontier line represents the maximum attainable amount of MAOC that soil could store for a particular clay and silt content.</p>
      <p id="d2e916">To estimate uncertainty, we performed 100 non-parametric bootstrap resamples to fit the frontier lines, keeping samples from the same site together during resampling to prevent data leakage. We then averaged all 100 frontier-line fits from the bootstraps. The <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> was calculated as the difference between the estimated mean frontier line and the MAOC content. We also computed the uncertainties of our frontier line estimate by calculating the 95 % confidence limits. All values were then back-transformed to their original units for the spectroscopic modelling.</p>
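      <p>The site-grouped bootstrap can be sketched as below, under stated assumptions. The frontier fit itself is replaced by a deliberately crude stand-in (<monospace>toy_frontier</monospace>, a flat upper envelope), which is not the SNFA estimator; the function names, the simple index-based percentile limits, and the choice to back-transform the frontier before subtracting the observed MAOC are all assumptions made for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import random
import statistics


def site_bootstrap(site_ids, rng):
    """Draw sites with replacement; return row indices of all their samples."""
    by_site = {}
    for row, site in enumerate(site_ids):
        by_site.setdefault(site, []).append(row)
    sites = sorted(by_site)
    drawn = [rng.choice(sites) for _ in sites]
    # Every sample of a drawn site enters the resample together.
    return [row for site in drawn for row in by_site[site]]


def toy_frontier(fine, log_maoc):
    """Stand-in for the frontier fit (NOT SNFA): a flat upper envelope."""
    top = max(log_maoc)
    return lambda x: top


def bootstrap_cdef(site_ids, fine, maoc, fit=toy_frontier, n_boot=100, seed=1):
    """Return (lower, mean, upper) deficit estimates for every sample."""
    rng = random.Random(seed)
    log_maoc = [math.log(v) for v in maoc]
    fits = []
    for _ in range(n_boot):
        rows = site_bootstrap(site_ids, rng)
        frontier = fit([fine[r] for r in rows], [log_maoc[r] for r in rows])
        # Evaluate each bootstrap frontier at the original samples.
        fits.append([frontier(x) for x in fine])
    results = []
    for i, observed in enumerate(maoc):
        column = sorted(f[i] for f in fits)
        mean_log = statistics.fmean(column)
        low = column[int(0.025 * (n_boot - 1))]   # approx. 2.5th percentile
        high = column[int(0.975 * (n_boot - 1))]  # approx. 97.5th percentile
        # Assumption: back-transform the frontier, then subtract observed MAOC.
        results.append(tuple(math.exp(v) - observed for v in (low, mean_log, high)))
    return results
```
]]></preformat>
      <p>Any callable with the same signature as <monospace>toy_frontier</monospace> could be passed as <monospace>fit</monospace>, so a monotone, concave frontier estimator can be substituted without changing the resampling logic.</p>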
</sec>
<sec id="Ch1.S2.SS5">
  <label>2.5</label><title>Spectroscopic modelling</title>
      <p id="d2e939">Because mid-IR spectra are highly collinear and contain broad absorption features, we interpolated them to 32 cm<sup>−1</sup> wavenumber intervals to reduce the redundant information passed into the machine learning model <xref ref-type="bibr" rid="bib1.bibx10" id="paren.29"/>. Visual checks confirmed that relevant absorption features remained distinguishable at this resolution. We also checked <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> model performance using spectra interpolated to 8, 16, 24, and 32 cm<sup>−1</sup> resolutions and found no significant difference between these resolutions. Preprocessing consisted of an initial offset correction, in which the minimum spectral value minus 0.01 was subtracted from all measurements so that each spectrum was shifted to a common baseline just above zero, followed by a standard normal variate (SNV) transformation, and a final offset correction to address the baseline shift introduced by the SNV transformation. Spectral regions that were either featureless (4000 to 3746 cm<sup>−1</sup>) or contained distracting features from noise and artefacts from water and CO<sub>2</sub> (2370 to 2082 cm<sup>−1</sup>) were removed before modelling.</p>
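      <p>The preprocessing chain (offset correction, SNV, final offset correction, removal of excluded regions) can be sketched as follows. This is an illustration under assumptions rather than the study's code: the function names are hypothetical, the final offset is assumed to use the same 0.01 margin as the initial one, SNV is assumed to use the sample standard deviation, and the excluded regions are treated as inclusive of their end points.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import statistics

# Regions removed before modelling (cm^-1); end points assumed inclusive.
EXCLUDED = ((3746, 4000), (2082, 2370))


def offset_correct(spectrum, margin=0.01):
    """Subtract (minimum - margin) so the lowest value sits at `margin`."""
    shift = min(spectrum) - margin
    return [a - shift for a in spectrum]


def snv(spectrum):
    """Standard normal variate: centre and scale a spectrum by its own mean and SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (n - 1); an assumed convention
    return [(a - mean) / sd for a in spectrum]


def preprocess(spectrum, margin=0.01):
    """Initial offset, SNV, then a final offset to lift the baseline above zero."""
    # Assumption: the final offset reuses the same margin as the first one.
    return offset_correct(snv(offset_correct(spectrum, margin)), margin)


def keep_band(wavenumber):
    """True if a wavenumber lies outside every excluded region."""
    return not any(low <= wavenumber <= high for low, high in EXCLUDED)


# Hypothetical apparent-absorbance values at four wavenumbers.
wavenumbers = [3800, 3000, 2200, 1000]
absorbance = [0.5, 0.7, 1.2, 0.9]
processed = preprocess(absorbance)
retained = [(w, a) for w, a in zip(wavenumbers, processed) if keep_band(w)]
```
]]></preformat>
      <p>Here the spectrum is preprocessed over its full range before the excluded bands are dropped; whether band removal preceded or followed the SNV step in the study is not stated, so this ordering is also an assumption.</p>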
      <p id="d2e1026">We modelled the MAOC and the estimated <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> with CUBIST. CUBIST is a rule-based regression tree algorithm <xref ref-type="bibr" rid="bib1.bibx35 bib1.bibx57" id="paren.30"/>. CUBIST creates a tree structure whose branches are series of “if-then” conditions, which are then collapsed into rules. Each CUBIST rule corresponds to a subset of the data that satisfies the rule's condition. For each rule, a linear regression model is fitted to the data using relevant predictors <xref ref-type="bibr" rid="bib1.bibx24" id="paren.31"/>. CUBIST balances accurate predictions and model interpretability through its rule-based structure. CUBIST is tuned by two parameters: committees and neighbours. The number of committees specifies the number of committee models (ensemble members) contributing to the final prediction, with more committees typically improving performance but reducing interpretability, and the number of neighbours specifies how many nearest neighbours of a sample CUBIST uses to adjust its rule-based predictions. <xref ref-type="bibr" rid="bib1.bibx49" id="text.32"/> described the method for spectroscopic modelling. Because our goal was to understand which spectral regions influence predictions and how they relate to soil properties, we prioritised model interpretability and used a single committee, avoiding the added complexity of ensemble averaging. We optimised the number of neighbours by testing all values from 0 to 9. Model fitting and validation were carried out using 10-fold cross-validation grouped by site, in which the 270 sampling sites were randomly assigned to 10 folds to ensure that samples from the same site and the three depth layers were kept together within the same fold. 
We assessed the models based on their coefficient of determination (<inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), Lin's concordance correlation coefficient (CCC) <xref ref-type="bibr" rid="bib1.bibx28" id="paren.33"/> and the root mean squared error (RMSE).</p>
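      <p>The site-grouped fold assignment and the three assessment statistics can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names are hypothetical, <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> is assumed to be one minus the ratio of the residual to the total sum of squares, and the CCC is computed with population (1/n) moments.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import random
import statistics


def assign_site_folds(site_ids, n_folds=10, seed=0):
    """Randomly assign whole sites to folds; return one fold index per sample."""
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    fold_of_site = {site: i % n_folds for i, site in enumerate(sites)}
    # All samples (depth layers) of a site inherit the site's fold.
    return [fold_of_site[s] for s in site_ids]


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(statistics.fmean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def r_squared(obs, pred):
    """Coefficient of determination as 1 - SSE / SST (an assumed definition)."""
    mean_obs = statistics.fmean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst


def lin_ccc(obs, pred):
    """Lin's concordance correlation coefficient with population moments."""
    mo, mp = statistics.fmean(obs), statistics.fmean(pred)
    vo, vp = statistics.pvariance(obs), statistics.pvariance(pred)
    cov = statistics.fmean([(o - mo) * (p - mp) for o, p in zip(obs, pred)])
    # Penalises both scatter (cov) and bias (difference in means).
    return 2 * cov / (vo + vp + (mo - mp) ** 2)
```
]]></preformat>
      <p>Unlike the correlation coefficient, the CCC drops below 1 when predictions are systematically offset from the observations, which is why it complements <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and RMSE when checking for bias.</p>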
      <p id="d2e1064">We propagated the uncertainty of the frontier line fitting and the CUBIST modelling. From the 100 frontier line fits made with the bootstraps, we derived the upper and lower 95 % confidence intervals (CI) for the frontier line fit and calculated the upper and lower limit of <inline-formula><mml:math id="M72" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The upper and lower limits of <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> were also modelled with CUBIST following the same method described above.</p>
</sec>
<sec id="Ch1.S2.SS6">
  <label>2.6</label><title>Interpretation</title>
      <p id="d2e1097">To interpret the models, we extracted each CUBIST rule from the MAOC and <inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> models to analyse their rule partitioning. For the MAOC model, we examined the distribution of MAOC values within each rule, while for the <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> model, we analysed the distributions of both MAOC and <inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> values within each rule. For the linear model in each CUBIST rule, we examined the selected wavenumbers, which correspond to specific absorptions of soil constituents, and their coefficients. Furthermore, we calculated SHAP (SHapley Additive exPlanations) values for each sample for each linear model of the CUBIST rules, focusing the analysis on the <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> model (the SHAP results for the MAOC CUBIST model are provided in the Supplement). SHAP values are used to explain the outputs of machine learning models. SHAP is based on game theory <xref ref-type="bibr" rid="bib1.bibx38" id="paren.34"/> and assigns an importance value to each feature for each instance in a model (in our case, to each sample's absorption at a specific wavenumber). While the regression coefficients summarise the average effect of a wavenumber within a given rule, SHAP values provide instance-level attributions that quantify each wavenumber's contribution to the prediction for each individual sample.
Positive SHAP values indicate a positive impact on the prediction, while negative values indicate a negative impact. The magnitude measures the strength of the effect.</p>
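      <p>For a linear model, SHAP values have a closed form when the features are treated as independent: the attribution of a feature for a sample is its coefficient multiplied by the deviation of that sample's feature value from the feature mean. The Python sketch below illustrates this for the linear model of a single rule. It is not the implementation used in this study; the function name is hypothetical, and the independence assumption is a simplification because absorptions at neighbouring wavenumbers are strongly correlated.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def linear_shap(coefs, X_rule, X_background=None):
    """SHAP values of a linear model under assumed feature independence.

    coefs:        array (n_features,), regression coefficients of one rule.
    X_rule:       array (n_samples, n_features), absorbances of the samples
                  covered by that rule at the selected wavenumbers.
    X_background: optional reference data for the feature means;
                  defaults to the samples of the rule itself.
    """
    coefs = np.asarray(coefs, float)
    X_rule = np.asarray(X_rule, float)
    background = X_rule if X_background is None else np.asarray(X_background, float)
    # phi[i, j] = coefficient j times the deviation of sample i from the mean.
    return coefs * (X_rule - background.mean(axis=0))
```
]]></preformat>
      <p>By construction, the attributions of a sample sum to the difference between its prediction and the mean prediction over the reference data, which is the additivity property that makes SHAP values directly comparable with the model output. A wavenumber with a large coefficient can still receive a small SHAP value for a sample whose absorbance is close to the mean, which is why the two summaries are complementary.</p>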
      <p id="d2e1147">All statistical analyses were performed using R <xref ref-type="bibr" rid="bib1.bibx36" id="paren.35"/>.</p>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>The maximum attainable MAOC storage, the MAOC deficit and C sequestration potential</title>
      <p id="d2e1169">Our samples represent a wide geographical area in Australia (Fig. <xref ref-type="fig" rid="F1"/>) with large variations in MAOC content and texture (Table <xref ref-type="table" rid="T1"/>). The MAOC content ranges from 0.27 to 50.04 g kg soil<sup>−1</sup>, while silt content ranges from 0.54 % to 31.81 %, and clay content ranges from 2.34 % to 54.25 % (Table <xref ref-type="table" rid="T1"/>). The frontier line, fitted using all 482 samples, estimates the maximum C that the soils can store in their current environments over the range of clay <inline-formula><mml:math id="M79" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt contents, and is shown with its 95 % confidence interval in Fig. <xref ref-type="fig" rid="F2"/>. The frontier line increases with increasing clay <inline-formula><mml:math id="M80" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt content to around 20 %–45 %, after which the rate of increase slows. The <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> ranges from 5.29 to 45.79 g kg soil<sup>−1</sup> with a mean of 32.76 g kg soil<sup>−1</sup> (Table <xref ref-type="table" rid="T1"/>). The <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> ranges from 0.00 to 45.17 g kg soil<sup>−1</sup> with a mean of 26.31 g kg soil<sup>−1</sup> (Table <xref ref-type="table" rid="T1"/>).</p>

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e1285">Summary statistics.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="9">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Mean</oasis:entry>
         <oasis:entry colname="col3">SD</oasis:entry>
         <oasis:entry colname="col4">Min</oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">0.25</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col6">Median</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">0.75</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col8">Max</oasis:entry>
         <oasis:entry colname="col9">Skew</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Silt %</oasis:entry>
         <oasis:entry colname="col2">10.93</oasis:entry>
         <oasis:entry colname="col3">7.48</oasis:entry>
         <oasis:entry colname="col4">0.54</oasis:entry>
         <oasis:entry colname="col5">4.73</oasis:entry>
         <oasis:entry colname="col6">9.49</oasis:entry>
         <oasis:entry colname="col7">16.31</oasis:entry>
         <oasis:entry colname="col8">31.81</oasis:entry>
         <oasis:entry colname="col9">0.63</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Clay %</oasis:entry>
         <oasis:entry colname="col2">20.79</oasis:entry>
         <oasis:entry colname="col3">11.16</oasis:entry>
         <oasis:entry colname="col4">2.34</oasis:entry>
         <oasis:entry colname="col5">11.84</oasis:entry>
         <oasis:entry colname="col6">18.68</oasis:entry>
         <oasis:entry colname="col7">29.39</oasis:entry>
         <oasis:entry colname="col8">54.25</oasis:entry>
         <oasis:entry colname="col9">0.49</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">MAOC (g kg soil<sup>−1</sup>)</oasis:entry>
         <oasis:entry colname="col2">6.52</oasis:entry>
         <oasis:entry colname="col3">7.32</oasis:entry>
         <oasis:entry colname="col4">0.27</oasis:entry>
         <oasis:entry colname="col5">2.07</oasis:entry>
         <oasis:entry colname="col6">4.17</oasis:entry>
         <oasis:entry colname="col7">7.88</oasis:entry>
         <oasis:entry colname="col8">50.04</oasis:entry>
         <oasis:entry colname="col9">2.79</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (g kg soil<sup>−1</sup>)</oasis:entry>
         <oasis:entry colname="col2">32.76</oasis:entry>
         <oasis:entry colname="col3">10.52</oasis:entry>
         <oasis:entry colname="col4">5.29</oasis:entry>
         <oasis:entry colname="col5">26.84</oasis:entry>
         <oasis:entry colname="col6">36.15</oasis:entry>
         <oasis:entry colname="col7">41.24</oasis:entry>
         <oasis:entry colname="col8">45.79</oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.89</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (g kg soil<sup>−1</sup>)</oasis:entry>
         <oasis:entry colname="col2">26.31</oasis:entry>
         <oasis:entry colname="col3">11.22</oasis:entry>
         <oasis:entry colname="col4">0.00</oasis:entry>
         <oasis:entry colname="col5">19.15</oasis:entry>
         <oasis:entry colname="col6">28.59</oasis:entry>
         <oasis:entry colname="col7">35.65</oasis:entry>
         <oasis:entry colname="col8">45.17</oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.64</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d2e1288">Note: SD <inline-formula><mml:math id="M87" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Standard deviation, Min <inline-formula><mml:math id="M88" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Minimum, <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">0.25</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M90" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Lower quartile (25th percentile), <inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">0.75</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M93" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Upper quartile (75th percentile), Max <inline-formula><mml:math id="M94" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Maximum, Skew <inline-formula><mml:math id="M95" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Skewness.</p></table-wrap-foot></table-wrap>

      <fig id="F2"><label>Figure 2</label><caption><p id="d2e1671">Frontier line and its 95 % confidence interval, fitted using all 482 samples.</p></caption>
          <graphic xlink:href="https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026-f02.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Spectroscopic modelling of MAOC content</title>
      <p id="d2e1688">The CUBIST model predicts MAOC with an RMSE of 2.77 g kg soil<sup>−1</sup>, an <inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.86 and a CCC of 0.91, and is unbiased (Table <xref ref-type="table" rid="T2"/>, Fig. <xref ref-type="fig" rid="F3"/>b). The model partitions the data into four rule sets, corresponding to different MAOC content levels, which increase from Rule 1 to Rule 4 (Fig. <xref ref-type="fig" rid="F3"/>a). Samples in Rule 1 have the least MAOC but are not significantly different from those in Rule 2 (Fig. <xref ref-type="fig" rid="F3"/>a). Rule 3 samples have significantly more MAOC than Rule 1 samples but are not significantly different from Rule 2 samples (Fig. <xref ref-type="fig" rid="F3"/>a). Rule 4 samples have significantly more MAOC than those of all other rules and exhibit the largest spread (Fig. <xref ref-type="fig" rid="F3"/>a).</p>

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e1730">Tuning parameters and model statistics for MAOC and <inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> CUBIST models.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Committees</oasis:entry>
         <oasis:entry colname="col3">Neighbours</oasis:entry>
         <oasis:entry colname="col4">RMSE</oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col6">CCC</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4">(g kg soil<sup>−1</sup>)</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6"/>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">MAOC</oasis:entry>
         <oasis:entry colname="col2">1</oasis:entry>
         <oasis:entry colname="col3">8</oasis:entry>
         <oasis:entry colname="col4">2.77</oasis:entry>
         <oasis:entry colname="col5">0.86</oasis:entry>
         <oasis:entry colname="col6">0.91</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Mean <inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">1</oasis:entry>
         <oasis:entry colname="col3">5</oasis:entry>
         <oasis:entry colname="col4">3.72</oasis:entry>
         <oasis:entry colname="col5">0.89</oasis:entry>
         <oasis:entry colname="col6">0.94</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> upper 95 % CI</oasis:entry>
         <oasis:entry colname="col2">1</oasis:entry>
         <oasis:entry colname="col3">4</oasis:entry>
         <oasis:entry colname="col4">4.13</oasis:entry>
         <oasis:entry colname="col5">0.85</oasis:entry>
         <oasis:entry colname="col6">0.92</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> lower 95 % CI</oasis:entry>
         <oasis:entry colname="col2">1</oasis:entry>
         <oasis:entry colname="col3">9</oasis:entry>
         <oasis:entry colname="col4">3.74</oasis:entry>
         <oasis:entry colname="col5">0.91</oasis:entry>
         <oasis:entry colname="col6">0.95</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d2e1744">Note: RMSE <inline-formula><mml:math id="M108" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Root mean square error, CCC <inline-formula><mml:math id="M109" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Lin's concordance correlation coefficient, CI <inline-formula><mml:math id="M110" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Confidence interval.</p></table-wrap-foot></table-wrap>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e1971">CUBIST model result for MAOC. <bold>(a)</bold> The distribution of MAOC content for each CUBIST rule and Tukey's HSD between each CUBIST rule. <bold>(b)</bold> The correlation between observed and predicted MAOC of the CUBIST model, coloured by CUBIST rules. <bold>(c)</bold> The coefficient of each linear model for each CUBIST rule is plotted over the mean spectra of each CUBIST rule.</p></caption>
          <graphic xlink:href="https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026-f03.png"/>

        </fig>

      <p id="d2e1990">The mean mid-IR spectra of the samples of the four rule sets show overall consistent patterns, with differences in absorption intensities at 3700–3500, 2946–2850, 1986–1794, and 1634–1300 cm<sup>−1</sup> (Fig. <xref ref-type="fig" rid="F3"/>c).</p>
      <p id="d2e2007">Specifically, the mean spectrum of Rule 4 has the highest absorption in the 2946–2850 cm<sup>−1</sup> region associated with organic C (C–H vibrations of alkyl CH<sub>2</sub>), corresponding to having the highest MAOC content (Fig. <xref ref-type="fig" rid="F3"/>a, c).</p>
      <p id="d2e2033">The wavenumbers selected for the linear models of the four rules differ, although there is some overlap. All rules use wavenumbers between 2946 and 2850 cm<sup>−1</sup>, where organic C–H vibrations of alkyl CH<sub>2</sub> groups absorb <xref ref-type="bibr" rid="bib1.bibx32" id="paren.36"/>, though the specific selections vary (Fig. <xref ref-type="fig" rid="F3"/>c), suggesting that the models rely directly on spectral signals from organic C to predict MAOC content. Rule 1 exhibits densely distributed wavenumbers across both these regions with high coefficient values. Rule 3 shows a similarly dense distribution, concentrated primarily in the 2946–2850 cm<sup>−1</sup> region, with the largest coefficient values. Rule 2 displays more sparsely distributed wavenumbers across both regions, while Rule 4 uses only a few select wavenumbers around 2946–2850 cm<sup>−1</sup>.</p>
      <p id="d2e2087">Rules 1, 2, and 3, which have the smallest MAOC values, all use the region between 1986 and 1794 cm<sup>−1</sup>, associated with quartz, whereas Rule 4 does not (Fig. <xref ref-type="fig" rid="F3"/>c). Quartz is chemically inert, carries negligible surface charge, and has a low specific surface area compared to clay minerals. This limits the reactive surface area available for organo-mineral bonding, so quartz is associated with low MAOC content, as found in coarser-textured soils dominated by quartz. Rules 1, 2, and 3 also use the region near 2515 cm<sup>−1</sup>, associated with carbonate <xref ref-type="bibr" rid="bib1.bibx32" id="paren.37"/>, consistent with carbonate-rich soils commonly forming in arid or semi-arid regions with low rainfall and plant productivity, and therefore low organic C.</p>
      <p id="d2e2119">Rule 4 uniquely includes absorptions in the region near 3750 cm<sup>−1</sup>, associated with the hydroxyl stretching vibrations of clay minerals <xref ref-type="bibr" rid="bib1.bibx32" id="paren.38"/>. Clay minerals provide a more reactive mineral matrix that facilitates greater organo-mineral bonding, and the model's use of spectral signals reflecting mineral surface reactivity is particularly informative in soils where mineral-organic associations are most developed. Rule 4 also uses wavenumbers between 1762 and 1634 cm<sup>−1</sup>, associated with the amide C<inline-formula><mml:math id="M127" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula>O bond <xref ref-type="bibr" rid="bib1.bibx55" id="paren.39"/>, as well as wavenumbers around 1154 cm<sup>−1</sup>, which correspond to the SiO<sub>2</sub> lattice <xref ref-type="bibr" rid="bib1.bibx43" id="paren.40"/> and the C-OH stretch of aliphatic O–H <xref ref-type="bibr" rid="bib1.bibx37" id="paren.41"/> (Fig. <xref ref-type="fig" rid="F3"/>c), with Rule 3 also using these latter wavenumbers to a lesser extent. The SiO<sub>2</sub> lattice absorptions indicate the silicate minerals of the soil matrix, while aliphatic C-OH groups indicate polysaccharide-derived and carbohydrate-like organic matter, and amide indicates protein-derived organic matter. Taken together, this suggests that the model draws on both organic-matter and mineral-related absorptions to estimate MAOC content.</p>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Spectroscopic modelling of the organic C deficit (<inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>)</title>
      <p id="d2e2218">The model predicts <inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> with an RMSE of 3.72 g kg soil<sup>−1</sup>, an <inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.89 and a CCC of 0.94, and is unbiased (Table <xref ref-type="table" rid="T2"/>, Fig. <xref ref-type="fig" rid="F4"/>c). The model partitions the data into three rule sets, and the linear models of each CUBIST rule also show good precision (Table <xref ref-type="table" rid="T3"/>).</p>

<table-wrap id="T3"><label>Table 3</label><caption><p id="d2e2265">Model statistics for each linear model of the CUBIST rules in the mean <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> CUBIST model.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">RMSE (g kg soil<sup>−1</sup>)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">CCC</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Rule 1</oasis:entry>
         <oasis:entry colname="col2">5.03</oasis:entry>
         <oasis:entry colname="col3">0.81</oasis:entry>
         <oasis:entry colname="col4">0.90</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Rule 2</oasis:entry>
         <oasis:entry colname="col2">2.25</oasis:entry>
         <oasis:entry colname="col3">0.94</oasis:entry>
         <oasis:entry colname="col4">0.97</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Rule 3</oasis:entry>
         <oasis:entry colname="col2">1.58</oasis:entry>
         <oasis:entry colname="col3">0.90</oasis:entry>
         <oasis:entry colname="col4">0.95</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d2e2279">Note: RMSE <inline-formula><mml:math id="M136" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Root mean square error, CCC <inline-formula><mml:math id="M137" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> Lin's concordance correlation coefficient.</p></table-wrap-foot></table-wrap>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e2399">CUBIST model results for <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, showing the separation of the CUBIST rules: the distribution of <bold>(a)</bold> <inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <bold>(b)</bold> MAOC content for each CUBIST rule, with Tukey's HSD between the CUBIST rules; <bold>(c)</bold> the correlation between observed and predicted <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of the CUBIST model, coloured by CUBIST rule; and <bold>(d)</bold> the coefficients of the linear model of each CUBIST rule plotted over the mean spectrum of each CUBIST rule.</p></caption>
          <graphic xlink:href="https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026-f04.png"/>

        </fig>

      <p id="d2e2454">Rule 1 includes samples with the lowest <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and the highest MAOC content, representing samples with smaller C sequestration potential because they already contain more MAOC (Fig. <xref ref-type="fig" rid="F4"/>a, b). Rule 2 includes samples with intermediate <inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and low MAOC, clay and silt contents, representing coarser-textured soils with more C sequestration potential than samples in Rule 1 because they hold less MAOC (Fig. <xref ref-type="fig" rid="F4"/>a, b). Rule 3 includes samples with high <inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, low MAOC content and the highest clay and silt contents. Because these samples contain the most fine particles, their storage capacity is the largest and they are thus undersaturated with C relative to their potential (Fig. <xref ref-type="fig" rid="F4"/>a, b).</p>
      <p id="d2e2497">The three rule sets show similar overall mean spectral patterns but with distinct differences in absorption intensities at key regions, including 2946–2850 cm<sup>−1</sup>, associated with organic C; 1986–1794 cm<sup>−1</sup>, associated with SiO<sub>2</sub> overtone and combination bands; and 1538–1218 cm<sup>−1</sup>, associated with various organic and mineral absorptions (Fig. <xref ref-type="fig" rid="F4"/>d). The wavenumbers selected for the models in each CUBIST rule are generally consistent, with the magnitude of the coefficients decreasing from Rule 1 to Rule 3 (Fig. <xref ref-type="fig" rid="F4"/>d).</p>
      <p id="d2e2550">In the 2946–2850 cm<sup>−1</sup> region, associated with organic C–H vibrations of alkyl CH<sub>2</sub> groups <xref ref-type="bibr" rid="bib1.bibx32" id="paren.42"/>, Rule 1 shows greater average absorption than Rule 2 and Rule 3, consistent with Rule 1 having the highest MAOC content (Fig. <xref ref-type="fig" rid="F4"/>b, d). All three CUBIST rules use wavenumbers within and near this region with relatively large coefficients, but the coefficient magnitude decreases from Rule 1 to Rule 3 (Fig. <xref ref-type="fig" rid="F4"/>b, d). This pattern suggests that the model predicts <inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> by leveraging the spectral signal of the organic C (MAOC) that already occupies reactive mineral surfaces. Because <inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represents the remaining sequestration potential, more existing MAOC implies less remaining capacity. Thus, Rule 1, with the highest MAOC and the largest organic-region coefficients, has the lowest <inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. Rule 2 has intermediate MAOC and coefficient magnitudes, and Rule 3 has the lowest MAOC, the smallest coefficients, and consequently the highest <inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (Fig. <xref ref-type="fig" rid="F4"/>a, b, d).</p>
      <p id="d2e2628">In the region near 1986–1794 cm<sup>−1</sup>, which is due to the overtones of Si-O vibrations <xref ref-type="bibr" rid="bib1.bibx55" id="paren.43"/>, absorption intensity decreases from Rule 2 to Rule 1 to Rule 3, corresponding to decreasing sand content and increasing clay and silt content (Fig. <xref ref-type="fig" rid="F4"/>d).</p>
      <p id="d2e2648">All three rules have prominent absorption at and near 1634 cm<sup>−1</sup>, which is associated with amide, carboxylate and carboxylic acid <xref ref-type="bibr" rid="bib1.bibx32 bib1.bibx45" id="paren.44"/>, aromatic –C<inline-formula><mml:math id="M158" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula>C– stretch <xref ref-type="bibr" rid="bib1.bibx12" id="paren.45"/>, HO–H stretch <xref ref-type="bibr" rid="bib1.bibx22" id="paren.46"/>, N–H bend, C<inline-formula><mml:math id="M159" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula>O stretch <xref ref-type="bibr" rid="bib1.bibx55" id="paren.47"/> and absorbed water <xref ref-type="bibr" rid="bib1.bibx29" id="paren.48"/> (Fig. <xref ref-type="fig" rid="F4"/>d). This indicates that, as with the 2946–2850 cm<sup>−1</sup> region, all three rules use absorptions of different organic matter types, including polysaccharide-derived, carbohydrate-like, and protein-derived organic C, to predict the extent to which the C saturation capacity is filled.</p>
      <p id="d2e2708">In the fingerprint region (1550–450 cm<sup>−1</sup>), the band assignments are more challenging due to significant overlaps between mineral and organic absorptions <xref ref-type="bibr" rid="bib1.bibx42" id="paren.49"/>. The region from 1538 to 1218 cm<sup>−1</sup>, likely associated with quartz minerals as well as organic matter <xref ref-type="bibr" rid="bib1.bibx55" id="paren.50"/>, is more prominent in Rule 2 and Rule 1, and lower in Rule 3 (Fig. <xref ref-type="fig" rid="F4"/>d). Rule 3 exhibits proportionally larger coefficients for wavenumbers in the fingerprint region because of low organic C content and high fine mineral particle content (Fig. <xref ref-type="fig" rid="F4"/>b, d).</p>
      <p id="d2e2746">The absorption near 2515 cm<sup>−1</sup>, due to carbonates, is more prominent in Rule 3, where <inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is high and MAOC is low. This matches the tendency of carbonate-rich soils to occur in arid or semi-arid regions, where low organic C inputs leave mineral surfaces unsaturated.</p>

      <fig id="F5"><label>Figure 5</label><caption><p id="d2e2774">The correlation between observed and predicted <inline-formula><mml:math id="M165" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of the CUBIST model coloured by CUBIST rules, as well as the observed and predicted <inline-formula><mml:math id="M166" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> estimated from the upper 95 % CI and lower 95 % CI of the frontier line fit. The grey envelopes represent the range of <inline-formula><mml:math id="M167" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> predictions obtained when CUBIST is applied separately to the upper and lower 95 % CI frontier-line estimates, indicating the uncertainty of the frontier-line fit propagating to the <inline-formula><mml:math id="M168" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> predictions.</p></caption>
          <graphic xlink:href="https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026-f05.png"/>

        </fig>

      <p id="d2e2827">The model statistics of the CUBIST models of <inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> estimated from the upper and lower 95 % CI of <inline-formula><mml:math id="M170" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are shown in Table <xref ref-type="table" rid="T2"/>. The model for the <inline-formula><mml:math id="M171" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> estimated with the lower 95 % CI of the frontier line performs better than the model estimated with the upper 95 % CI. This can be attributed to the upper 95 % CI of the frontier line being more uncertain than the lower 95 % CI. Specifically, the upper uncertainty of the frontier line fit is high around 25 % clay <inline-formula><mml:math id="M172" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt content due to the low sample number (Fig. <xref ref-type="fig" rid="F2"/>). The uncertainty of <inline-formula><mml:math id="M173" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> estimated from CUBIST models of <inline-formula><mml:math id="M174" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> calculated from the upper CI and lower CI of the <inline-formula><mml:math id="M175" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is shown in Fig. <xref ref-type="fig" rid="F5"/>.</p>
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title><inline-formula><mml:math id="M176" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> model interpretation with SHAP</title>
      <p id="d2e2929">The SHAP contribution of spectral absorption at each wavenumber for the linear model of each CUBIST rule is shown in Fig. <xref ref-type="fig" rid="F6"/>. The SHAP values are generally consistent with the regression coefficients of the CUBIST rules: large coefficients correspond to strong SHAP contributions (Fig. <xref ref-type="fig" rid="F6"/>). Rule 1 shows strong contributions primarily from organic C features, and Rule 2 displays a similar pattern but with more contributions from the fingerprint region. For Rule 3, contributions are relatively stronger from the double bonds region (including absorption from quartz and the amide region, which overlaps with other absorptions) and from the fingerprint region (Fig. <xref ref-type="fig" rid="F6"/>).</p>
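      <p>The agreement between coefficients and SHAP values is expected for linear models: if features are treated as independent, the SHAP value of a feature for one sample is its coefficient multiplied by the deviation of that feature from its background mean. The following Python sketch illustrates this relationship with synthetic data. The function name, array names and values are hypothetical, and the sketch is not the code used in this study.</p>
<preformat preformat-type="code">
```python
import numpy as np

def linear_shap(coefs, X, background=None):
    """SHAP values for a linear model, assuming independent features.

    phi[i, j] = coefs[j] * (X[i, j] - mean_j), where mean_j is the
    mean of feature j over the background data.
    """
    X = np.asarray(X, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    if background is None:
        background = X
    mean = np.asarray(background, dtype=float).mean(axis=0)
    return coefs * (X - mean)

# Hypothetical example: 4 samples, 3 "wavenumbers" (synthetic values).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
coefs = np.array([2.0, -1.0, 0.0])
intercept = 0.5

phi = linear_shap(coefs, X)
pred = intercept + X @ coefs
base_value = intercept + X.mean(axis=0) @ coefs

# Additivity: the base value plus the SHAP values recovers each prediction.
assert np.allclose(base_value + phi.sum(axis=1), pred)
```
</preformat>
      <p>In this sketch a zero coefficient always yields a zero SHAP value, and the sign of a SHAP value depends on both the sign of the coefficient and whether the absorbance lies above or below the mean.</p>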

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e2940">The mean spectra, key spectral assignment, and the SHAP contribution of the spectral regions used in each linear model of each <inline-formula><mml:math id="M177" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> CUBIST rule. A positive SHAP value indicates a positive contribution to a model with increased absorbance, whereas a negative SHAP value indicates a negative contribution with increased absorbance. The magnitude of SHAP indicates the strength of the contribution. The SHAP values are plotted over the pre-processed spectra of each rule set. The SHAP values are coloured by the normalised absorbance value at each wavenumber, ranging from <inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (lowest absorbance at each wavenumber) to 1 (highest absorbance at each wavenumber). SHAP values of each rule are plotted in different <inline-formula><mml:math id="M179" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>-axes to accommodate differences in magnitude across rules.</p></caption>
          <graphic xlink:href="https://soil.copernicus.org/articles/12/619/2026/soil-12-619-2026-f06.png"/>

        </fig>

      <p id="d2e2977">The SHAP values indicate positive and negative contributions from spectral regions associated with characteristic absorption of clay minerals, organic matter, and quartz (Fig. <xref ref-type="fig" rid="F6"/>). Generally, peaks associated with organic C have a negative model contribution with increasing absorbance, while the troughs have a positive contribution with increasing absorbance (Fig. <xref ref-type="fig" rid="F6"/>). Absorbance in these regions indicates existing MAOC that already occupies reactive mineral surface sites. As mineral binding sites fill, the remaining deficit diminishes. In contrast, absorptions associated with clay minerals and silicates have a positive model contribution, while the troughs have a negative contribution (Fig. <xref ref-type="fig" rid="F6"/>). Absorbance from clay minerals indicates abundant reactive surface area that is available but not yet occupied by organic matter, so its positive SHAP contribution reflects unrealised adsorption capacity. Quartz, however, has negligible reactive surface area: it contributes to coarser texture but not to adsorption capacity, which constrains <inline-formula><mml:math id="M180" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d2e2998">In comparison to SHAP analysis of the MAOC CUBIST model (Fig. S1 in the Supplement), both models draw heavily on the organic C-H region, but the <inline-formula><mml:math id="M181" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> model shows a progressive shift across rules from organic-C-dominated (Rule 1) to mineral-dominated (Rule 3), indicating the model increasingly relies on mineralogy for available surface in soil with higher <inline-formula><mml:math id="M182" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The MAOC model's contributions remain more consistently concentrated in the organic C-H region across all rules, with the fingerprint region playing a lesser role throughout. This reflects that the MAOC model is more driven by the organic C information in the spectra, whereas the <inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> model integrates both organic and mineral information.</p>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Discussion</title>
      <p id="d2e3043">Our findings support the hypothesis that mid-IR spectra, combined with machine learning and enhanced by SHAP analysis for interpretability, can accurately estimate soil MAOC content and <inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (Table <xref ref-type="table" rid="T2"/>) by elucidating the contribution of specific mid-IR absorptions.</p>
      <p id="d2e3059">Our results demonstrate that combining soil spectroscopy with machine learning offers a rapid, cost-effective, and robust method for estimating MAOC and <inline-formula><mml:math id="M185" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The spectroscopic approach enables many more measurements than conventional methods, thereby enhancing our understanding of how MAOC and <inline-formula><mml:math id="M186" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> vary in soil across space and time <xref ref-type="bibr" rid="bib1.bibx3" id="paren.51"/>. This approach could also provide essential data for soil biogeochemical and Earth System models, improving their initialisation, validation and ongoing development <xref ref-type="bibr" rid="bib1.bibx44 bib1.bibx15 bib1.bibx2 bib1.bibx48" id="paren.52"/>. Given that C storage is a key soil function for maintaining soil health <xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx27" id="paren.53"/>, our findings highlight how the current state and potential for C sequestration can be rapidly and cost-effectively measured as part of soil health assessment <xref ref-type="bibr" rid="bib1.bibx54" id="paren.54"/>. This aligns with growing evidence that soil spectra, when combined with machine learning, can model soil functions, going beyond predicting individual soil properties <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx13 bib1.bibx7 bib1.bibx52 bib1.bibx30 bib1.bibx11" id="paren.55"/>.</p>
      <p id="d2e3100">Two other studies estimated soil <inline-formula><mml:math id="M187" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> using mid-IR spectroscopic modelling <xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx4" id="text.56"/>. Unlike these studies, which used quantile regressions to estimate <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, our approach avoids under- or overestimation of <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> by using bootstrapped frontier lines that more accurately capture the relationship between MAOC and clay <inline-formula><mml:math id="M190" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt content <xref ref-type="bibr" rid="bib1.bibx51" id="paren.57"/>. Specifically, the frontier-line approach estimates the upper envelope of the MAOC–(clay <inline-formula><mml:math id="M191" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt) relationship under current environmental conditions. Other approaches that aim to avoid under- or overestimation, e.g. <xref ref-type="bibr" rid="bib1.bibx39" id="text.58"/>, which fit quantile regressions to mineralogically stratified subsets of the data, are effective but still impose a parametric relationship via an internal upper percentile.
In contrast, the frontier line prevents observed values from exceeding the estimated <inline-formula><mml:math id="M192" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">Amax</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and allows it to level off at high clay <inline-formula><mml:math id="M193" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt contents, reflecting diminishing stabilisation due to finite organic inputs. As a result, frontier-line analysis reduces both underestimation (by targeting the upper boundary rather than an internal quantile) and overestimation (by avoiding unconstrained extrapolation at high clay <inline-formula><mml:math id="M194" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt), yielding more realistic estimates of attainable MAOC storage. Additionally, unlike the earlier studies, we characterised two distinct sources of uncertainty: that from the frontier-line fit and that from the cross-validated CUBIST model. The 95 % confidence limits of the frontier-line fit were propagated to <inline-formula><mml:math id="M195" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> predictions by applying CUBIST separately to the <inline-formula><mml:math id="M196" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> estimates derived from the upper and lower CI (Fig. <xref ref-type="fig" rid="F5"/>).</p>
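      <p>The propagation of the frontier-line uncertainty can be illustrated with a minimal Python sketch. It assumes that the deficit is the difference between the attainable maximum and the measured MAOC, floored at zero; both the flooring and all numerical values are assumptions made for illustration, and the function name is hypothetical. Each of the three resulting target vectors would then be modelled separately from the spectra.</p>
<preformat preformat-type="code">
```python
import numpy as np

def c_def_targets(maoc, c_amax, c_amax_lower, c_amax_upper):
    """Return C_def from the fitted frontier and from its 95 % CI bounds.

    Assumes C_def = C_Amax - MAOC, floored at zero (an assumption of
    this sketch, not necessarily the definition used in the study).
    """
    maoc = np.asarray(maoc, dtype=float)

    def deficit(cmax):
        return np.maximum(np.asarray(cmax, dtype=float) - maoc, 0.0)

    return {
        "fit": deficit(c_amax),
        "lower": deficit(c_amax_lower),
        "upper": deficit(c_amax_upper),
    }

# Hypothetical values in g C per kg soil for three samples.
maoc = [5.0, 12.0, 20.0]
targets = c_def_targets(
    maoc,
    c_amax=[15.0, 25.0, 30.0],
    c_amax_lower=[13.0, 22.0, 19.0],
    c_amax_upper=[18.0, 29.0, 34.0],
)
```
</preformat>
      <p>In the third sample the lower bound of the frontier falls below the measured MAOC, so the sketch returns a deficit of zero for the lower-bound target rather than a negative value.</p>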
      <p id="d2e3210">The MAOC and <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> models relied on spectral regions related to organic functional groups, such as the C-H groups near 2900 and 2800 cm<sup>−1</sup> and the C<inline-formula><mml:math id="M199" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula>O stretch near 1725 cm<sup>−1</sup> <xref ref-type="bibr" rid="bib1.bibx55" id="paren.59"/>, and to 1 : 1 and 2 : 1 clay minerals, which provide surfaces for organic matter adsorption. Absorptions for quartz and other minerals in the fingerprint region were also important in the models, but negatively affected the estimates. The <inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> model drew on information on C already present in the soil, which contributed negatively, and on soil mineralogy, which indicates what soil minerals could potentially adsorb and contributed positively.</p>
      <p id="d2e3271">The spectroscopic MAOC and <inline-formula><mml:math id="M202" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> models were developed using CUBIST, which offers good predictability and interpretability, handles non-linearities effectively, and is advantageous compared to linear methods like PLSR. Specifically, CUBIST performs data-driven selection of informative spectral features from the full-spectrum input and also uses contextual information from regions without distinct absorption peaks. CUBIST is therefore advantageous, given that mid-IR spectra are high-dimensional, contain regions that vary in information content, and have peak positions that can shift under varying molecular environments. As a tree-based algorithm, it can be locally interpreted, unlike other algorithms that are limited to global-level interpretation <xref ref-type="bibr" rid="bib1.bibx49" id="paren.60"/>. SHAP values provided additional interpretation, showing not only how strongly each wavenumber contributes to the model but also in which direction an increase or decrease in absorbance affects the prediction, thus identifying which soil constituents (clay minerals, quartz, and organic C) contribute significantly to determining MAOC and <inline-formula><mml:math id="M203" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. Nevertheless, given the heterogeneity of soil composition, overlapping absorptions make it challenging to distinguish molecular vibrations, particularly in the fingerprint region. Like other regression tree methods, CUBIST can be sensitive to strong collinearity, potentially leading to model instability and overfitting <xref ref-type="bibr" rid="bib1.bibx23" id="paren.61"/>.
To minimise the effect of collinearity in our modelling, we interpolated the spectra to a resolution of 32 cm<sup>−1</sup> (see Methods section).</p>
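      <p>Resampling to a coarser spacing reduces the number of strongly correlated neighbouring predictors. The Python sketch below shows one way such a step could be carried out with linear interpolation. The spectral range, the original 2 cm<sup>−1</sup> spacing, the synthetic absorbance values and the function name are assumptions for illustration only; the procedure actually used is described in the Methods section.</p>
<preformat preformat-type="code">
```python
import numpy as np

def resample_spectrum(wavenumbers, absorbance, step=32.0):
    """Linearly interpolate one spectrum onto a regular grid.

    The grid starts at the lowest wavenumber and uses spacing `step`
    (in cm^-1). Input wavenumbers may be in descending order.
    """
    wn = np.asarray(wavenumbers, dtype=float)
    ab = np.asarray(absorbance, dtype=float)
    order = np.argsort(wn)  # np.interp needs increasing x values
    wn, ab = wn[order], ab[order]
    grid = np.arange(wn[0], wn[-1] + 1e-9, step)
    return grid, np.interp(grid, wn, ab)

# Hypothetical spectrum: 4000 to 450 cm^-1 at 2 cm^-1 spacing.
wn = np.arange(4000.0, 448.0, -2.0)
ab = 0.001 * wn  # synthetic linear "absorbance"
grid, ab_coarse = resample_spectrum(wn, ab)
```
</preformat>
      <p>With these assumed settings, 1776 input points are reduced to 111 predictors per spectrum.</p>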
      <p id="d2e3314">This study extends beyond previous research by incorporating samples from various other ecosystems. The samples span Australia's main Köppen-Geiger climate zones, 24 major vegetation groups, and 11 of the 14 Australian soil classification orders <xref ref-type="bibr" rid="bib1.bibx20" id="paren.62"/>. We excluded hydrosols, which have different C-stabilisation dynamics. Future work will include more samples and a broader representation of soils to develop site-specific <inline-formula><mml:math id="M205" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> estimates. Although MAOC was measured in the <inline-formula><mml:math id="M206" display="inline"><mml:mo>≤</mml:mo></mml:math></inline-formula> 50 <inline-formula><mml:math id="M207" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m fraction and clay <inline-formula><mml:math id="M208" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> silt content follows the Australian classification (clay <inline-formula><mml:math id="M209" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M210" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m, silt 2–20 <inline-formula><mml:math id="M211" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m), the practical impact of this mismatch is likely modest given that Australian soils tend to have low silt contents and fine fractions dominated by clay-sized particles.
Nevertheless, future work should align these operational definitions where possible, for example by directly measuring and modelling <inline-formula><mml:math id="M212" display="inline"><mml:mrow><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M213" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m or <inline-formula><mml:math id="M214" display="inline"><mml:mrow><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">53</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M215" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m silt <inline-formula><mml:math id="M216" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> clay fractions, to better reflect the true mineral capacity for C stabilisation.</p>
      <p id="d2e3424">Our method facilitates efficient data acquisition, providing an effective approach to help farmers and land managers gain the insights needed to assess the current state of, and potential for, C sequestration on their land. Identifying regions and soil types where increasing organic C storage is feasible enables more targeted resource allocation and informed decision-making.</p>
      <p id="d2e3427">While our study pertains to Australian soils, the principles of applying laboratory-based mid-IR spectroscopy and machine learning to estimate MAOC and <inline-formula><mml:math id="M217" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are applicable across various land uses, soil types, and climatic conditions. This approach provides high-throughput MAOC and <inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> estimation on sampled soils. Furthermore, such laboratory models can, in the future, underpin and improve the calibration and validation of remote-sensing-based approaches. When used in combination, these methods provide the rapid assessment capability needed to scale soil C initiatives for monitoring soil organic C and its potential contribution to climate adaptation and mitigation targets under the Paris Agreement and the UN Sustainable Development Goals. The method's ability to support large-scale monitoring of C sequestration potential also makes it relevant to soil C credit systems such as the Australian Carbon Credit Units (ACCU) scheme.</p>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions</title>
      <p id="d2e3461">We demonstrated that mid-IR spectroscopy combined with machine learning can effectively estimate soil MAOC content (RMSE <inline-formula><mml:math id="M219" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 2.77 g kg soil<sup>−1</sup>, <inline-formula><mml:math id="M221" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M222" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.86, CCC <inline-formula><mml:math id="M223" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.91) and <inline-formula><mml:math id="M224" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (RMSE <inline-formula><mml:math id="M225" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 3.72 g kg soil<sup>−1</sup>, <inline-formula><mml:math id="M227" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M228" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.89, CCC <inline-formula><mml:math id="M229" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.94). We interpreted the CUBIST models with SHAP, confirming contributions from spectral features related to organic functional groups, clay minerals, and quartz. These features reflect existing soil organic C, soil mineralogy, particle size distribution, and surface area available for C adsorption, which are critical for estimating MAOC and <inline-formula><mml:math id="M230" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">def</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.
Our approach contributes to the analysis of C sequestration potential using mid-IR spectroscopy and machine learning, supporting the development of rapid and cost-effective soil C sequestration assessment and monitoring.</p>
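      <p>For readers who wish to reproduce validation statistics of this kind on their own data, the following Python sketch computes RMSE, R2 and Lin's concordance correlation coefficient (CCC) from observed and predicted values. It assumes that R2 is the coefficient of determination (one minus the ratio of residual to total sum of squares) and that CCC uses population variances; the definitions used in this study may differ, and the function name and example values are hypothetical.</p>
<preformat preformat-type="code">
```python
import numpy as np

def validation_stats(observed, predicted):
    """Return RMSE, R2 (coefficient of determination) and Lin's CCC."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = obs - pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2))
    # Lin's CCC with population (1/n) variances and covariance.
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    bias_sq = (obs.mean() - pred.mean()) ** 2
    ccc = float(2.0 * cov / (obs.var() + pred.var() + bias_sq))
    return {"rmse": rmse, "r2": r2, "ccc": ccc}

# Hypothetical values: perfect predictions and predictions with a constant bias.
perfect = validation_stats([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
biased = validation_stats([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```
</preformat>
      <p>Unlike a correlation coefficient, CCC penalises systematic bias: in the biased example the predictions are perfectly correlated with the observations, yet CCC falls below 1.</p>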
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e3580">The code and dataset will be made available upon reasonable request.</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d2e3583">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/soil-12-619-2026-supplement" xlink:title="pdf">https://doi.org/10.5194/soil-12-619-2026-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e3592">YH: Investigation, methodology, analysis, visualisation and writing. RAVR: Conceptualisation, methodology, writing, editing, supervision and funding acquisition.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e3598">At least one of the (co-)authors is a member of the editorial board of <italic>SOIL</italic>. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e3608">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e3614">We thank Mr. Farid Sepanta for the laboratory analyses of the soils, and Drs. Zefang Shen and Adam Cross for earlier project discussions. We are grateful to the Terrestrial Ecosystem Research Network (TERN) and Dr. Andrew Bissett, who provided us with some of the soil samples used in the work.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e3619">RAVR thanks the Australian Government's Australia-China Science and Research Fund-Joint Research Centres (ACSRF-JRCs) (grant ACSRIV000077) and the Australian Research Council's Discovery Projects scheme (project DP210100420) for funding.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e3625">This paper was edited by Bas van Wesemael and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>ABARES(2022)</label><mixed-citation>ABARES: Land use of Australia 2010–11 to 2015–16, 250 m, CC BY 4.0, Australian Bureau of Agricultural and Resource Economics and Sciences, <ext-link xlink:href="https://doi.org/10.25814/7ygw-4d64" ext-link-type="DOI">10.25814/7ygw-4d64</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Abramoff et al.(2022)</label><mixed-citation>Abramoff, R. Z., Guenet, B., Zhang, H., Georgiou, K., Xu, X., Viscarra Rossel, R. A., Yuan, W., and Ciais, P.: Improved global-scale predictions of soil carbon stocks with Millennial Version 2, Soil Biol. Biochem., 164, 108466, <ext-link xlink:href="https://doi.org/10.1016/j.soilbio.2021.108466" ext-link-type="DOI">10.1016/j.soilbio.2021.108466</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Angers et al.(2011)Angers, Arrouays, Saby, and Walter</label><mixed-citation>Angers, D., Arrouays, D., Saby, N., and Walter, C.: Estimating and mapping the carbon saturation deficit of French agricultural topsoils, Soil Use Manage., 27, 448–452, <ext-link xlink:href="https://doi.org/10.1111/j.1475-2743.2011.00366.x" ext-link-type="DOI">10.1111/j.1475-2743.2011.00366.x</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Baldock et al.(2019)</label><mixed-citation>Baldock, J., McNally, S., Beare, M., Curtin, D., and Hawke, B.: Predicting soil carbon saturation deficit and related properties of New Zealand soils using infrared spectroscopy, Soil Res., 57, 835–844, <ext-link xlink:href="https://doi.org/10.1071/SR19149" ext-link-type="DOI">10.1071/SR19149</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Beare et al.(2014)Beare, McNeill, Curtin, Parfitt, Jones, Dodd, and Sharp</label><mixed-citation>Beare, M., McNeill, S., Curtin, D., Parfitt, R., Jones, H., Dodd, M., and Sharp, J.: Estimating the organic carbon stabilisation capacity and saturation deficit of soils: a New Zealand case study, Biogeochemistry, 120, 71–87, <ext-link xlink:href="https://doi.org/10.1007/s10533-014-9982-1" ext-link-type="DOI">10.1007/s10533-014-9982-1</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Beck et al.(2018)Beck, Zimmermann, McVicar, Vergopolan, Berg, and Wood</label><mixed-citation>Beck, H. E., Zimmermann, N. E., McVicar, T. R., Vergopolan, N., Berg, A., and Wood, E. F.: Present and future Köppen-Geiger climate classification maps at 1-km resolution, Sci. Data, 5, 1–12, <ext-link xlink:href="https://doi.org/10.1038/sdata.2018.214" ext-link-type="DOI">10.1038/sdata.2018.214</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Cécillon et al.(2009)Cécillon, Cassagne, Czarnes, Gros, Vennetier, and Brun</label><mixed-citation>Cécillon, L., Cassagne, N., Czarnes, S., Gros, R., Vennetier, M., and Brun, J.-J.: Predicting soil quality indices with near infrared analysis in a wildfire chronosequence, Sci. Total Environ., 407, 1200–1205, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2008.07.029" ext-link-type="DOI">10.1016/j.scitotenv.2008.07.029</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Cohen et al.(2006)Cohen, Dabral, Graham, Prenger, and Debusk</label><mixed-citation>Cohen, M., Dabral, S., Graham, W. D., Prenger, J., and Debusk, W.: Evaluating ecological condition using soil biogeochemical parameters and near infrared reflectance spectra, Environ. Monitor. Assess., 116, 427–457, <ext-link xlink:href="https://doi.org/10.1007/s10661-006-7664-8" ext-link-type="DOI">10.1007/s10661-006-7664-8</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Commonwealth of Australia(2020)</label><mixed-citation>Commonwealth of Australia: National Vegetation Information System V6.0, <uri>https://erin.maps.arcgis.com/home/item.html?id=1dab9240522d42c5804677bf19ac64af</uri> (last access: 30 April 2026), 2020.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Deiss et al.(2020)Deiss, Margenot, Culman, and Demyan</label><mixed-citation>Deiss, L., Margenot, A. J., Culman, S. W., and Demyan, M. S.: Optimizing acquisition parameters in diffuse reflectance infrared Fourier transform spectroscopy of soils, Soil Sci. Soc. Am. J., 84, 930–948, <ext-link xlink:href="https://doi.org/10.1002/saj2.20028" ext-link-type="DOI">10.1002/saj2.20028</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Deiss et al.(2023)Deiss, Demyan, Fulford, Hurisso, and Culman</label><mixed-citation>Deiss, L., Demyan, M. S., Fulford, A., Hurisso, T., and Culman, S. W.: High-throughput soil health assessment to predict corn agronomic performance, Field Crop. Res., 297, 108930, <ext-link xlink:href="https://doi.org/10.1016/j.fcr.2023.108930" ext-link-type="DOI">10.1016/j.fcr.2023.108930</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Du et al.(2014)Du, Goyne, Miles, and Zhou</label><mixed-citation>Du, C., Goyne, K. W., Miles, R. J., and Zhou, J.: A 1915–2011 microscale record of soil organic matter under wheat cultivation using FTIR-PAS depth-profiling, Agron. Sustain. Dev., 34, 803–811, <ext-link xlink:href="https://doi.org/10.1007/s13593-013-0201-6" ext-link-type="DOI">10.1007/s13593-013-0201-6</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Elliott et al.(2007)Elliott, Worgan, Broadhurst, Draper, and Scullion</label><mixed-citation>Elliott, G. N., Worgan, H., Broadhurst, D., Draper, J., and Scullion, J.: Soil differentiation using fingerprint Fourier transform infrared spectroscopy, chemometrics and genetic algorithm-based feature selection, Soil Biol. Biochem., 39, 2888–2896, <ext-link xlink:href="https://doi.org/10.1016/j.soilbio.2007.05.032" ext-link-type="DOI">10.1016/j.soilbio.2007.05.032</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Feng et al.(2013)Feng, Plante, and Six</label><mixed-citation>Feng, W., Plante, A. F., and Six, J.: Improving estimates of maximal organic carbon stabilization by fine soil particles, Biogeochemistry, 112, 81–93, <ext-link xlink:href="https://doi.org/10.1007/s10533-011-9679-7" ext-link-type="DOI">10.1007/s10533-011-9679-7</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Georgiou et al.(2022)Georgiou, Jackson, Vindušková, Abramoff, Ahlström, Feng, Harden, Pellegrini, Polley, Soong, Riley, and Torn</label><mixed-citation>Georgiou, K., Jackson, R. B., Vindušková, O., Abramoff, R. Z., Ahlström, A., Feng, W., Harden, J. W., Pellegrini, A. F. A., Polley, H. W., Soong, J. L., Riley, W. J., and Torn, M. S.: Global stocks and capacity of mineral-associated soil organic carbon, Nat. Commun., 13, 3797, <ext-link xlink:href="https://doi.org/10.1038/s41467-022-31540-9" ext-link-type="DOI">10.1038/s41467-022-31540-9</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Hassink(1997)</label><mixed-citation>Hassink, J.: The capacity of soils to preserve organic C and N by their association with clay and silt particles, Plant Soil, 191, 77–87, <ext-link xlink:href="https://doi.org/10.1023/A:1004213929699" ext-link-type="DOI">10.1023/A:1004213929699</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Hassink and Whitmore(1997)</label><mixed-citation>Hassink, J. and Whitmore, A. P.: A model of the physical protection of organic matter in soils, Soil Sci. Soc. Am. J., 61, 131–139, <ext-link xlink:href="https://doi.org/10.2136/sssaj1997.03615995006100010020x" ext-link-type="DOI">10.2136/sssaj1997.03615995006100010020x</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Hicks et al.(2015)Hicks, Viscarra Rossel, and Tuomi</label><mixed-citation>Hicks, W., Viscarra Rossel, R., and Tuomi, S.: Developing the Australian mid-infrared spectroscopic database using data from the Australian Soil Resource Information System, Soil Res., 53, 922–931, <ext-link xlink:href="https://doi.org/10.1071/SR15171" ext-link-type="DOI">10.1071/SR15171</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Ingram and Fernandes(2001)</label><mixed-citation>Ingram, J. and Fernandes, E.: Managing carbon sequestration in soils: concepts and terminology, Agr. Ecosyst. Environ., 87, 111–117, <ext-link xlink:href="https://doi.org/10.1016/S0167-8809(01)00145-1" ext-link-type="DOI">10.1016/S0167-8809(01)00145-1</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Isbell and the National Committee on Soil and Terrain(2016)</label><mixed-citation>Isbell, R. and the National Committee on Soil and Terrain: The Australian soil classification, CSIRO publishing, ISBN 9781486314775, <uri>https://www.publishing.csiro.au/book/8016/</uri> (last access: 30 April 2026), 2016.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Karunaratne et al.(2024)Karunaratne, Asanopoulos, Jin, Baldock, Searle, Macdonald, and Macdonald</label><mixed-citation>Karunaratne, S., Asanopoulos, C., Jin, H., Baldock, J., Searle, R., Macdonald, B., and Macdonald, L. M.: Estimating the attainable soil organic carbon deficit in the soil fine fraction to inform feasible storage targets and de-risk carbon farming decisions, Soil Res., 62, <ext-link xlink:href="https://doi.org/10.1071/SR23096" ext-link-type="DOI">10.1071/SR23096</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Kronenberg(1994)</label><mixed-citation>Kronenberg, A. K.: Hydrogen speciation and chemical weakening of quartz, Rev. Mineral. Geochem., 29, 123–176, 1994.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Kuhn and Johnson(2013)</label><mixed-citation>Kuhn, M. and Johnson, K.: Applied predictive modeling, Springer, 1st edn., ISBN 978-1-4614-6848-6, <ext-link xlink:href="https://doi.org/10.1007/978-1-4614-6849-3" ext-link-type="DOI">10.1007/978-1-4614-6849-3</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Kuhn et al.(2012)Kuhn, Weston, Keefer, and Coulter</label><mixed-citation>Kuhn, M., Weston, S., Keefer, C., and Coulter, N.: Cubist models for regression, R package Vignette R package version 0.0, 18, 480, <uri>https://rdrr.io/rforge/Cubist/f/inst/doc/cubist.pdf</uri> (last access: 30 April 2026), 2012.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Lal(2016)</label><mixed-citation>Lal, R.: Soil health and carbon management, Food and Energy Security, 5, 212–222, <ext-link xlink:href="https://doi.org/10.1002/fes3.96" ext-link-type="DOI">10.1002/fes3.96</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Lal et al.(2015)Lal, Negassa, and Lorenz</label><mixed-citation>Lal, R., Negassa, W., and Lorenz, K.: Carbon sequestration in soil, Curr. Opin. Env. Sust., 15, 79–86, <ext-link xlink:href="https://doi.org/10.1016/j.cosust.2015.09.002" ext-link-type="DOI">10.1016/j.cosust.2015.09.002</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Lehmann et al.(2020)Lehmann, Bossio, Kögel-Knabner, and Rillig</label><mixed-citation>Lehmann, J., Bossio, D. A., Kögel-Knabner, I., and Rillig, M. C.: The concept and future prospects of soil health, Nature Reviews Earth &amp; Environment, 1, 544–553, <ext-link xlink:href="https://doi.org/10.1038/s43017-020-0080-8" ext-link-type="DOI">10.1038/s43017-020-0080-8</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Lin(1989)</label><mixed-citation>Lin, L. I.: A concordance correlation coefficient to evaluate reproducibility, Biometrics, 45, 255–268, <uri>https://www.jstor.org/stable/2532051</uri> (last access: 30 April 2026), 1989.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Max and Chapados(2009)</label><mixed-citation>Max, J.-J. and Chapados, C.: Isotope effects in liquid water by infrared spectroscopy. III. H2O and D2O spectra from 6000 to 0 cm-1, J. Chem. Phys., 131, <ext-link xlink:href="https://doi.org/10.1063/1.3258646" ext-link-type="DOI">10.1063/1.3258646</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Maynard and Johnson(2018)</label><mixed-citation>Maynard, J. J. and Johnson, M. G.: Applying fingerprint Fourier transformed infrared spectroscopy and chemometrics to assess soil ecosystem disturbance and recovery, J. Soil Water Conserv., 73, 443–451, <ext-link xlink:href="https://doi.org/10.2489/jswc.73.4.443" ext-link-type="DOI">10.2489/jswc.73.4.443</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>McKenzie(2022)</label><mixed-citation>McKenzie, T.: snfa: Smooth Non-Parametric Frontier Analysis, R package version <inline-formula><mml:math id="M231" display="inline"><mml:mo>≥</mml:mo></mml:math></inline-formula> 3.5.0, <uri>https://cran.r-project.org/web/packages/snfa/snfa.pdf</uri> (last access: 30 April 2026), 2022.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Nguyen et al.(1991)Nguyen, Janik, and Raupach</label><mixed-citation>Nguyen, T., Janik, L. J., and Raupach, M.: Diffuse reflectance infrared Fourier transform (DRIFT) spectroscopy in soil studies, Soil Res., 29, 49–67, <ext-link xlink:href="https://doi.org/10.1071/SR9910049" ext-link-type="DOI">10.1071/SR9910049</ext-link>, 1991.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Parmeter and Racine(2013)</label><mixed-citation>Parmeter, C. F. and Racine, J. S.: Smooth constrained frontier analysis, Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis: Essays in Honor of Halbert L. White Jr., Springer, New York, NY, 463–488, <ext-link xlink:href="https://doi.org/10.1007/978-1-4614-1653-1_18" ext-link-type="DOI">10.1007/978-1-4614-1653-1_18</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Poeplau et al.(2018)Poeplau, Don, Six, Kaiser, Benbi, Chenu, Cotrufo, Derrien, Gioacchini, Grand, Gregorich, Griepentrog, Gunina, Haddix, Kuzyakov, Kühnel, Macdonald, Soong, Trigalet, Vermeire, Rovira, van Wesemael, Wiesmeier, Yeasmin, Yevdokimov, and Nieder</label><mixed-citation>Poeplau, C., Don, A., Six, J., Kaiser, M., Benbi, D., Chenu, C., Cotrufo, M. F., Derrien, D., Gioacchini, P., Grand, S., Gregorich, E., Griepentrog, M., Gunina, A., Haddix, M., Kuzyakov, Y., Kühnel, A., Macdonald, L. M., Soong, J., Trigalet, S., Vermeire, M.-L., Rovira, P., van Wesemael, B., Wiesmeier, M., Yeasmin, S., Yevdokimov, I., and Nieder, R.: Isolating organic carbon fractions with varying turnover rates in temperate agricultural soils – A comprehensive method comparison, Soil Biol. Biochem., 125, 10–26, <ext-link xlink:href="https://doi.org/10.1016/j.soilbio.2018.06.025" ext-link-type="DOI">10.1016/j.soilbio.2018.06.025</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Quinlan(1992)</label><mixed-citation>Quinlan, J. R.: Learning with continuous classes, in: 5th Australian joint conference on artificial intelligence, Vol. 92, 343–348, World Scientific, <ext-link xlink:href="https://doi.org/10.1142/1897" ext-link-type="DOI">10.1142/1897</ext-link>, 1992.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>R Core Team(2024)</label><mixed-citation>R Core Team: R: A language and environment for statistical computing, <uri>https://www.R-project.org/</uri> (last access: 30 April 2026), 2024.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Senesi et al.(2003)Senesi, D'Orazio, and Ricca</label><mixed-citation>Senesi, N., D'Orazio, V., and Ricca, G.: Humic acids in the first generation of EUROSOILS, Geoderma, 116, 325–344, <ext-link xlink:href="https://doi.org/10.1016/S0016-7061(03)00107-1" ext-link-type="DOI">10.1016/S0016-7061(03)00107-1</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Shapley(1953)</label><mixed-citation>Shapley, L. S.: A value for n-person games, Contributions to the Theory of Games, 2, <uri>https://www.rand.org/content/dam/rand/pubs/papers/2021/P295.pdf</uri> (last access: 30 April 2026), 1953.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Shi et al.(2025)</label><mixed-citation>Shi, L., Daly, K., and O'Rourke, S.: Estimating mineral-associated organic carbon saturation and sequestration potential using MIR spectral based local quantile regression, Geoderma, 454, 117181, <ext-link xlink:href="https://doi.org/10.1016/j.geoderma.2025.117181" ext-link-type="DOI">10.1016/j.geoderma.2025.117181</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Six et al.(2002)Six, Conant, Paul, and Paustian</label><mixed-citation>Six, J., Conant, R. T., Paul, E. A., and Paustian, K.: Stabilization mechanisms of soil organic matter: implications for C-saturation of soils, Plant Soil, 241, 155–176, <ext-link xlink:href="https://doi.org/10.1023/A:1016125726789" ext-link-type="DOI">10.1023/A:1016125726789</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Six et al.(2024)Six, Doetterl, Laub, Müller, and Van de Broek</label><mixed-citation>Six, J., Doetterl, S., Laub, M., Müller, C. R., and Van de Broek, M.: The six rights of how and when to test for soil C saturation, SOIL, 10, 275–279, <ext-link xlink:href="https://doi.org/10.5194/soil-10-275-2024" ext-link-type="DOI">10.5194/soil-10-275-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Soriano-Disla et al.(2014)Soriano-Disla, Janik, Viscarra Rossel, Macdonald, and McLaughlin</label><mixed-citation>Soriano-Disla, J. M., Janik, L. J., Viscarra Rossel, R. A., Macdonald, L. M., and McLaughlin, M. J.: The performance of visible, near-, and mid-infrared reflectance spectroscopy for prediction of soil physical, chemical, and biological properties, Appl. Spectrosc. Rev., 49, 139–186, <ext-link xlink:href="https://doi.org/10.1080/05704928.2013.811081" ext-link-type="DOI">10.1080/05704928.2013.811081</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Spitzer and Kleinman(1961)</label><mixed-citation>Spitzer, W. and Kleinman, D.: Infrared lattice bands of quartz, Phys. Rev., 121, 1324, <ext-link xlink:href="https://doi.org/10.1103/PhysRev.121.1324" ext-link-type="DOI">10.1103/PhysRev.121.1324</ext-link>, 1961.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Stewart et al.(2007)Stewart, Paustian, Conant, Plante, and Six</label><mixed-citation>Stewart, C. E., Paustian, K., Conant, R. T., Plante, A. F., and Six, J.: Soil carbon saturation: concept, evidence and evaluation, Biogeochemistry, 86, 19–31, <ext-link xlink:href="https://doi.org/10.1007/s10533-007-9140-0" ext-link-type="DOI">10.1007/s10533-007-9140-0</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Tanykova et al.(2021)Tanykova, Petrova, Kostina, Kozlova, Leushina, and Spasennykh</label><mixed-citation>Tanykova, N., Petrova, Y., Kostina, J., Kozlova, E., Leushina, E., and Spasennykh, M.: Study of organic matter of unconventional reservoirs by IR spectroscopy and IR microscopy, Geosciences, 11, 277, <ext-link xlink:href="https://doi.org/10.3390/geosciences11070277" ext-link-type="DOI">10.3390/geosciences11070277</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Teng et al.(2018)Teng, Viscarra Rossel, Shi, and Behrens</label><mixed-citation>Teng, H., Viscarra Rossel, R. A., Shi, Z., and Behrens, T.: Updating a national soil classification with spectroscopic predictions and digital soil mapping, Catena, 164, 125–134, <ext-link xlink:href="https://doi.org/10.1016/j.catena.2018.01.015" ext-link-type="DOI">10.1016/j.catena.2018.01.015</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>UNFCCC(2019)</label><mixed-citation>UNFCCC: Improved soil carbon, soil health and soil fertility under grassland and cropland as well as integrated systems, including water management: Workshop report by the secretariat, document GE.19-15339(E), <uri>https://unfccc.int/documents/199954</uri> (last access: 30 April 2026), 2019.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Vereecken et al.(2016)Vereecken, Schnepf, Hopmans, Javaux, Or, Roose, Vanderborght, Young, Amelung, Aitkenhead, Allison, Assouline, Baveye, Berli, Brüggemann, Finke, Flury, Gaiser, Govers, Ghezzehei, Hallett, Hendricks Franssen, Heppell, Horn, Huisman, Jacques, Jonard, Kollet, Lafolie, Lamorski, Leitner, McBratney, Minasny, Montzka, Nowak, Pachepsky, Padarian, Romano, Roth, Rothfuss, Rowe, Schwen, Šimůnek, Tiktak, Van Dam, van der Zee, Vogel, Vrugt, Wöhling, and Young</label><mixed-citation>Vereecken, H., Schnepf, A., Hopmans, J. W., Javaux, M., Or, D., Roose, T., Vanderborght, J., Young, M. H., Amelung, W., Aitkenhead, M., Allison, S. D., Assouline, S., Baveye, P., Berli, M., Brüggemann, N., Finke, P., Flury, M., Gaiser, T., Govers, G., Ghezzehei, T., Hallett, P., Hendricks Franssen, H. J., Heppell, J., Horn, R., Huisman, J. A., Jacques, D., Jonard, F., Kollet, S., Lafolie, F., Lamorski, K., Leitner, D., McBratney, A., Minasny, B., Montzka, C., Nowak, W., Pachepsky, Y., Padarian, J., Romano, N., Roth, K., Rothfuss, Y., Rowe, E. C., Schwen, A., Šimůnek, J., Tiktak, A., Van Dam, J., van der Zee, S. E. A. T. M., Vogel, H. J., Vrugt, J. A., Wöhling, T., and Young, I. M.: Modeling soil processes: Review, key challenges, and new perspectives, Vadose Zone J., 15, vzj2015-09, <ext-link xlink:href="https://doi.org/10.2136/vzj2015.09.0131" ext-link-type="DOI">10.2136/vzj2015.09.0131</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Viscarra Rossel and Webster(2012)</label><mixed-citation>Viscarra Rossel, R. and Webster, R.: Predicting soil properties from the Australian soil visible–near infrared spectroscopic database, Eur. J. Soil Sci., 63, 848–860, <ext-link xlink:href="https://doi.org/10.1111/j.1365-2389.2012.01495.x" ext-link-type="DOI">10.1111/j.1365-2389.2012.01495.x</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Viscarra Rossel et al.(2006)Viscarra Rossel, Walvoort, McBratney, Janik, and Skjemstad</label><mixed-citation>Viscarra Rossel, R., Walvoort, D., McBratney, A., Janik, L. J., and Skjemstad, J.: Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties, Geoderma, 131, 59–75, <ext-link xlink:href="https://doi.org/10.1016/j.geoderma.2005.03.007" ext-link-type="DOI">10.1016/j.geoderma.2005.03.007</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Viscarra Rossel et al.(2024)Viscarra Rossel, Webster, Zhang, Shen, Dixon, Wang, and Walden</label><mixed-citation>Viscarra Rossel, R., Webster, R., Zhang, M., Shen, Z., Dixon, K., Wang, Y.-P., and Walden, L.: How much organic carbon could the soil store? The carbon sequestration potential of Australian soil, Glob. Change Biol., 30, e17053, <ext-link xlink:href="https://doi.org/10.1111/gcb.17053" ext-link-type="DOI">10.1111/gcb.17053</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Viscarra Rossel et al.(2010)Viscarra Rossel, Rizzo, Demattê, and Behrens</label><mixed-citation>Viscarra Rossel, R. A., Rizzo, R., Demattê, J. A. M., and Behrens, T.: Spatial Modeling of a Soil Fertility Index using Visible–Near-Infrared Spectra and Terrain Attributes, Soil Sci. Soc. Am. J., 74, 1293–1300, <ext-link xlink:href="https://doi.org/10.2136/sssaj2009.0130" ext-link-type="DOI">10.2136/sssaj2009.0130</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Viscarra Rossel et al.(2022)Viscarra Rossel, Behrens, Ben-Dor, Chabrillat, Demattê, Ge, Gomez, Guerrero, Peng, Ramirez-Lopez, Shi, Stenberg, Webster, Winowiecki, and Shen</label><mixed-citation>Viscarra Rossel, R. A., Behrens, T., Ben-Dor, E., Chabrillat, S., Demattê, J. A. M., Ge, Y., Gomez, C., Guerrero, C., Peng, Y., Ramirez-Lopez, L., Shi, Z., Stenberg, B., Webster, R., Winowiecki, L., and Shen, Z.: Diffuse reflectance spectroscopy for estimating soil properties: A technology for the 21st century, Eur. J. Soil Sci., 73, e13271, <ext-link xlink:href="https://doi.org/10.1111/ejss.13271" ext-link-type="DOI">10.1111/ejss.13271</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Vogel et al.(2019)Vogel, Eberhardt, Franko, Lang, Ließ, Weller, Wiesmeier, and Wollschläger</label><mixed-citation>Vogel, H.-J., Eberhardt, E., Franko, U., Lang, B., Ließ, M., Weller, U., Wiesmeier, M., and Wollschläger, U.: Quantitative evaluation of soil functions: Potential and state, Frontiers in Environmental Science, 7, 463905, <ext-link xlink:href="https://doi.org/10.3389/fenvs.2019.00164" ext-link-type="DOI">10.3389/fenvs.2019.00164</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Volkov et al.(2021)Volkov, Rogova, and Proskurnin</label><mixed-citation>Volkov, D. S., Rogova, O. B., and Proskurnin, M. A.: Organic matter and mineral composition of silicate soils: FTIR comparison study by photoacoustic, diffuse reflectance, and attenuated total reflection modalities, Agronomy, 11, 1879, <ext-link xlink:href="https://doi.org/10.3390/agronomy11091879" ext-link-type="DOI">10.3390/agronomy11091879</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>Walden et al.(2025)Walden, Sepanta, and Viscarra Rossel</label><mixed-citation>Walden, L., Sepanta, F., and Viscarra Rossel, R.: FT-MIR Spectroscopic Analysis of the Organic Carbon Fractions in Australian Mineral Soils, Eur. J. Soil Sci., 76, e70084, <ext-link xlink:href="https://doi.org/10.1111/ejss.70084" ext-link-type="DOI">10.1111/ejss.70084</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Wang and Witten(1997)</label><mixed-citation>Wang, Y. and Witten, I. H.: Inducing model trees for continuous classes, in: Proceedings of the ninth European conference on machine learning, Vol. 9, 128–137, Citeseer, <uri>https://researchcommons.waikato.ac.nz/entities/publication/d6e1955d-92f8-4993-8999-98be1a1c1b59</uri> (last access: 30 April 2026), 1997.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>Wiesmeier et al.(2019)Wiesmeier, Urbanski, Hobley, Lang, von Lützow, Marin-Spiotta, van Wesemael, Rabot, Ließ, Garcia-Franco, Wollschläger, Vogel, and Kögel-Knabner</label><mixed-citation>Wiesmeier, M., Urbanski, L., Hobley, E., Lang, B., von Lützow, M., Marin-Spiotta, E., van Wesemael, B., Rabot, E., Ließ, M., Garcia-Franco, N., Wollschläger, U., Vogel, H.-J., and Kögel-Knabner, I.: Soil organic carbon storage as a key function of soils – A review of drivers and indicators at various scales, Geoderma, 333, 149–162, <ext-link xlink:href="https://doi.org/10.1016/j.geoderma.2018.07.026" ext-link-type="DOI">10.1016/j.geoderma.2018.07.026</ext-link>, 2019.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Estimating soil carbon sequestration potential with mid-IR spectroscopy and explainable machine learning</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>ABARES(2022)</label><mixed-citation>
      
ABARES: Land use of Australia 2010–11 to 2015–16, 250&thinsp;m, CC BY 4.0, Australian Bureau of Agricultural and Resource Economics and Sciences, <a href="https://doi.org/10.25814/7ygw-4d64" target="_blank">https://doi.org/10.25814/7ygw-4d64</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Abramoff et al.(2022)</label><mixed-citation>
      
Abramoff, R. Z., Guenet, B., Zhang, H., Georgiou, K., Xu, X., Viscarra Rossel,
R. A., Yuan, W., and Ciais, P.: Improved global-scale predictions of soil
carbon stocks with Millennial Version 2, Soil Biol. Biochem., 164,
108466, <a href="https://doi.org/10.1016/j.soilbio.2021.108466" target="_blank">https://doi.org/10.1016/j.soilbio.2021.108466</a>,
2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Angers et al.(2011)Angers, Arrouays, Saby, and
Walter</label><mixed-citation>
      
Angers, D., Arrouays, D., Saby, N., and Walter, C.: Estimating and mapping the
carbon saturation deficit of French agricultural topsoils, Soil Use
Manage., 27, 448–452, <a href="https://doi.org/10.1111/j.1475-2743.2011.00366.x" target="_blank">https://doi.org/10.1111/j.1475-2743.2011.00366.x</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Baldock et al.(2019)</label><mixed-citation>
      
Baldock, J., McNally, S., Beare, M., Curtin, D., and Hawke, B.: Predicting soil
carbon saturation deficit and related properties of New Zealand soils using
infrared spectroscopy, Soil Res., 57, 835–844,
<a href="https://doi.org/10.1071/SR19149" target="_blank">https://doi.org/10.1071/SR19149</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Beare et al.(2014)Beare, McNeill, Curtin, Parfitt, Jones, Dodd, and
Sharp</label><mixed-citation>
      
Beare, M., McNeill, S., Curtin, D., Parfitt, R., Jones, H., Dodd, M., and
Sharp, J.: Estimating the organic carbon stabilisation capacity and
saturation deficit of soils: a New Zealand case study, Biogeochemistry, 120,
71–87, <a href="https://doi.org/10.1007/s10533-014-9982-1" target="_blank">https://doi.org/10.1007/s10533-014-9982-1</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Beck et al.(2018)Beck, Zimmermann, McVicar, Vergopolan, Berg, and
Wood</label><mixed-citation>
      
Beck, H. E., Zimmermann, N. E., McVicar, T. R., Vergopolan, N., Berg, A., and
Wood, E. F.: Present and future Köppen-Geiger climate classification maps
at 1-km resolution, Sci. Data, 5, 1–12,
<a href="https://doi.org/10.1038/sdata.2018.214" target="_blank">https://doi.org/10.1038/sdata.2018.214</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Cécillon et al.(2009)Cécillon, Cassagne, Czarnes, Gros,
Vennetier, and Brun</label><mixed-citation>
      
Cécillon, L., Cassagne, N., Czarnes, S., Gros, R., Vennetier, M., and Brun,
J.-J.: Predicting soil quality indices with near infrared analysis in a
wildfire chronosequence, Sci. Total Environ., 407, 1200–1205,
<a href="https://doi.org/10.1016/j.scitotenv.2008.07.029" target="_blank">https://doi.org/10.1016/j.scitotenv.2008.07.029</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Cohen et al.(2006)Cohen, Dabral, Graham, Prenger, and
Debusk</label><mixed-citation>
      
Cohen, M., Dabral, S., Graham, W. D., Prenger, J., and Debusk, W.: Evaluating
ecological condition using soil biogeochemical parameters and near infrared
reflectance spectra, Environ. Monitor. Assess., 116, 427–457,
<a href="https://doi.org/10.1007/s10661-006-7664-8" target="_blank">https://doi.org/10.1007/s10661-006-7664-8</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Commonwealth of Australia(2020)</label><mixed-citation>
      
Commonwealth of Australia: National Vegetation Information System V6.0,
<a href="https://erin.maps.arcgis.com/home/item.html?id=1dab9240522d42c5804677bf19ac64af" target="_blank"/> (last access: 30 April 2026),
2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Deiss et al.(2020)Deiss, Margenot, Culman, and
Demyan</label><mixed-citation>
      
Deiss, L., Margenot, A. J., Culman, S. W., and Demyan, M. S.: Optimizing
acquisition parameters in diffuse reflectance infrared Fourier transform
spectroscopy of soils, Soil Sci. Soc. Am. J., 84, 930–948,
<a href="https://doi.org/10.1002/saj2.20028" target="_blank">https://doi.org/10.1002/saj2.20028</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Deiss et al.(2023)Deiss, Demyan, Fulford, Hurisso, and
Culman</label><mixed-citation>
      
Deiss, L., Demyan, M. S., Fulford, A., Hurisso, T., and Culman, S. W.:
High-throughput soil health assessment to predict corn agronomic performance,
Field Crop. Res., 297, 108930,
<a href="https://doi.org/10.1016/j.fcr.2023.108930" target="_blank">https://doi.org/10.1016/j.fcr.2023.108930</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Du et al.(2014)Du, Goyne, Miles, and Zhou</label><mixed-citation>
      
Du, C., Goyne, K. W., Miles, R. J., and Zhou, J.: A 1915–2011 microscale
record of soil organic matter under wheat cultivation using FTIR-PAS
depth-profiling, Agron. Sustain. Dev., 34, 803–811,
<a href="https://doi.org/10.1007/s13593-013-0201-6" target="_blank">https://doi.org/10.1007/s13593-013-0201-6</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Elliott et al.(2007)Elliott, Worgan, Broadhurst, Draper, and
Scullion</label><mixed-citation>
      
Elliott, G. N., Worgan, H., Broadhurst, D., Draper, J., and Scullion, J.: Soil
differentiation using fingerprint Fourier transform infrared spectroscopy,
chemometrics and genetic algorithm-based feature selection, Soil Biol.   Biochem., 39, 2888–2896,
<a href="https://doi.org/10.1016/j.soilbio.2007.05.032" target="_blank">https://doi.org/10.1016/j.soilbio.2007.05.032</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Feng et al.(2013)Feng, Plante, and Six</label><mixed-citation>
      
Feng, W., Plante, A. F., and Six, J.: Improving estimates of maximal organic
carbon stabilization by fine soil particles, Biogeochemistry, 112, 81–93,
<a href="https://doi.org/10.1007/s10533-011-9679-7" target="_blank">https://doi.org/10.1007/s10533-011-9679-7</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Georgiou et al.(2022)Georgiou, Jackson, Vindušková,
Abramoff, Ahlström, Feng, Harden, Pellegrini, Polley, Soong, Riley, and
Torn</label><mixed-citation>
      
Georgiou, K., Jackson, R. B., Vindušková, O., Abramoff, R. Z.,
Ahlström, A., Feng, W., Harden, J. W., Pellegrini, A. F. A., Polley,
H. W., Soong, J. L., Riley, W. J., and Torn, M. S.: Global stocks and
capacity of mineral-associated soil organic carbon, Nat. Commun.,
13, 3797, <a href="https://doi.org/10.1038/s41467-022-31540-9" target="_blank">https://doi.org/10.1038/s41467-022-31540-9</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Hassink(1997)</label><mixed-citation>
      
Hassink, J.: The capacity of soils to preserve organic C and N by their
association with clay and silt particles, Plant Soil, 191, 77–87,
<a href="https://doi.org/10.1023/A:1004213929699" target="_blank">https://doi.org/10.1023/A:1004213929699</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Hassink and Whitmore(1997)</label><mixed-citation>
      
Hassink, J. and Whitmore, A. P.: A model of the physical protection of organic
matter in soils, Soil Sci. Soc. Am. J., 61, 131–139,
<a href="https://doi.org/10.2136/sssaj1997.03615995006100010020x" target="_blank">https://doi.org/10.2136/sssaj1997.03615995006100010020x</a>,
1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Hicks et al.(2015)Hicks, Viscarra Rossel, and
Tuomi</label><mixed-citation>
      
Hicks, W., Viscarra Rossel, R., and Tuomi, S.: Developing the Australian
mid-infrared spectroscopic database using data from the Australian Soil
Resource Information System, Soil Res., 53, 922–931,
<a href="https://doi.org/10.1071/SR15171" target="_blank">https://doi.org/10.1071/SR15171</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Ingram and Fernandes(2001)</label><mixed-citation>
      
Ingram, J. and Fernandes, E.: Managing carbon sequestration in soils: concepts
and terminology, Agr. Ecosyst. Environ., 87, 111–117,
<a href="https://doi.org/10.1016/S0167-8809(01)00145-1" target="_blank">https://doi.org/10.1016/S0167-8809(01)00145-1</a>, 2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Isbell and the National Committee on Soil and
Terrain(2016)</label><mixed-citation>
      
Isbell, R. and the National Committee on Soil and Terrain: The Australian
soil classification, CSIRO publishing, ISBN 9781486314775,
<a href="https://www.publishing.csiro.au/book/8016/" target="_blank"/> (last access: 30 April 2026), 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Karunaratne et al.(2024)Karunaratne, Asanopoulos, Jin, Baldock,
Searle, Macdonald, and Macdonald</label><mixed-citation>
      
Karunaratne, S., Asanopoulos, C., Jin, H., Baldock, J., Searle, R., Macdonald,
B., and Macdonald, L. M.: Estimating the attainable soil organic carbon
deficit in the soil fine fraction to inform feasible storage targets and
de-risk carbon farming decisions, Soil Res., 62,
<a href="https://doi.org/10.1071/SR23096" target="_blank">https://doi.org/10.1071/SR23096</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Kronenberg(1994)</label><mixed-citation>
      
Kronenberg, A. K.: Hydrogen speciation and chemical weakening of quartz,
Rev. Mineral. Geochem., 29, 123–176, 1994.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Kuhn and Johnson(2013)</label><mixed-citation>
      
Kuhn, M. and Johnson, K.: Applied predictive modeling, Springer, 1st edn., ISBN
978-1-4614-6848-6, <a href="https://doi.org/10.1007/978-1-4614-6849-3" target="_blank">https://doi.org/10.1007/978-1-4614-6849-3</a>,
2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Kuhn et al.(2012)Kuhn, Weston, Keefer, and Coulter</label><mixed-citation>
      
Kuhn, M., Weston, S., Keefer, C., and Coulter, N.: Cubist models for
regression, R package Vignette R package version 0.0, 18, 480,
<a href="https://rdrr.io/rforge/Cubist/f/inst/doc/cubist.pdf" target="_blank"/> (last access: 30 April 2026), 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Lal(2016)</label><mixed-citation>
      
Lal, R.: Soil health and carbon management, Food and Energy Security, 5,
212–222, <a href="https://doi.org/10.1002/fes3.96" target="_blank">https://doi.org/10.1002/fes3.96</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Lal et al.(2015)Lal, Negassa, and Lorenz</label><mixed-citation>
      
Lal, R., Negassa, W., and Lorenz, K.: Carbon sequestration in soil, Curr.
Opin. Env. Sust., 15, 79–86,
<a href="https://doi.org/10.1079/PAVSNNR20083030" target="_blank">https://doi.org/10.1079/PAVSNNR20083030</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Lehmann et al.(2020)Lehmann, Bossio, Kögel-Knabner, and
Rillig</label><mixed-citation>
      
Lehmann, J., Bossio, D. A., Kögel-Knabner, I., and Rillig, M. C.: The
concept and future prospects of soil health, Nature Reviews Earth &amp;
Environment, 1, 544–553,
<a href="https://doi.org/10.1038/s43017-020-0080-8" target="_blank">https://doi.org/10.1038/s43017-020-0080-8</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Lin(1989)</label><mixed-citation>
      
Lin, L. I.: A concordance correlation coefficient to evaluate reproducibility,
Biometrics, 45, 255–268,
<a href="https://www.jstor.org/stable/2532051" target="_blank"/> (last access: 30 April 2026), 1989.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Max and Chapados(2009)</label><mixed-citation>
      
Max, J.-J. and Chapados, C.: Isotope effects in liquid water by infrared
spectroscopy. III. H2O and D2O spectra from 6000 to 0 cm−1, J.
Chem. Phys., 131, <a href="https://doi.org/10.1063/1.3258646" target="_blank">https://doi.org/10.1063/1.3258646</a>,
2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Maynard and Johnson(2018)</label><mixed-citation>
      
Maynard, J. J. and Johnson, M. G.: Applying fingerprint Fourier transformed
infrared spectroscopy and chemometrics to assess soil ecosystem disturbance
and recovery, J. Soil Water Conserv., 73, 443–451,
<a href="https://doi.org/10.2489/jswc.73.4.443" target="_blank">https://doi.org/10.2489/jswc.73.4.443</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>McKenzie(2022)</label><mixed-citation>
      
McKenzie, T.: snfa: Smooth Non-Parametric Frontier Analysis, R package version
 ≥ &thinsp;3.5.0, <a href="https://cran.r-project.org/web/packages/snfa/snfa.pdf" target="_blank">https://cran.r-project.org/web/packages/snfa/snfa.pdf</a> (last access: 30 April 2026), 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Nguyen et al.(1991)Nguyen, Janik, and Raupach</label><mixed-citation>
      
Nguyen, T., Janik, L. J., and Raupach, M.: Diffuse reflectance infrared Fourier
transform (DRIFT) spectroscopy in soil studies, Soil Res., 29, 49–67,
<a href="https://doi.org/10.1071/SR9910049" target="_blank">https://doi.org/10.1071/SR9910049</a>, 1991.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Parmeter and Racine(2013)</label><mixed-citation>
      
Parmeter, C. F. and Racine, J. S.: Smooth constrained frontier analysis, Recent
Advances and Future Directions in Causality, Prediction, and Specification
Analysis: Essays in Honor of Halbert L. White Jr., Springer, New York, NY, 463–488,
<a href="https://doi.org/10.1007/978-1-4614-1653-1_18" target="_blank">https://doi.org/10.1007/978-1-4614-1653-1_18</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Poeplau et al.(2018)Poeplau, Don, Six, Kaiser, Benbi, Chenu, Cotrufo,
Derrien, Gioacchini, Grand, Gregorich, Griepentrog, Gunina, Haddix, Kuzyakov,
Kühnel, Macdonald, Soong, Trigalet, Vermeire, Rovira, van Wesemael,
Wiesmeier, Yeasmin, Yevdokimov, and Nieder</label><mixed-citation>
      
Poeplau, C., Don, A., Six, J., Kaiser, M., Benbi, D., Chenu, C., Cotrufo,
M. F., Derrien, D., Gioacchini, P., Grand, S., Gregorich, E., Griepentrog,
M., Gunina, A., Haddix, M., Kuzyakov, Y., Kühnel, A., Macdonald, L. M.,
Soong, J., Trigalet, S., Vermeire, M.-L., Rovira, P., van Wesemael, B.,
Wiesmeier, M., Yeasmin, S., Yevdokimov, I., and Nieder, R.: Isolating organic
carbon fractions with varying turnover rates in temperate agricultural
soils – A comprehensive method comparison, Soil Biol. Biochem., 125,
10–26, <a href="https://doi.org/10.1016/j.soilbio.2018.06.025" target="_blank">https://doi.org/10.1016/j.soilbio.2018.06.025</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Quinlan(1992)</label><mixed-citation>
      
Quinlan, J. R.: Learning with continuous classes, in: 5th Australian joint
conference on artificial intelligence, Vol. 92, 343–348, World
Scientific, <a href="https://doi.org/10.1142/1897" target="_blank">https://doi.org/10.1142/1897</a>, 1992.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>R Core Team(2024)</label><mixed-citation>
      
R Core Team: R: A language and environment for statistical computing,
<a href="https://www.R-project.org/" target="_blank">https://www.R-project.org/</a> (last access: 30 April 2026), 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Senesi et al.(2003)Senesi, D'Orazio, and Ricca</label><mixed-citation>
      
Senesi, N., D'Orazio, V., and Ricca, G.: Humic acids in the first generation of
EUROSOILS, Geoderma, 116, 325–344,
<a href="https://doi.org/10.1016/S0016-7061(03)00107-1" target="_blank">https://doi.org/10.1016/S0016-7061(03)00107-1</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Shapley(1953)</label><mixed-citation>
      
Shapley, L. S.: A value for n-person games, Contributions to the Theory of
Games, 2, <a href="https://www.rand.org/content/dam/rand/pubs/papers/2021/P295.pdf" target="_blank">https://www.rand.org/content/dam/rand/pubs/papers/2021/P295.pdf</a> (last access: 30 April 2026),
1953.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Shi et al.(2025)</label><mixed-citation>
      
Shi, L., Daly, K., and O'Rourke, S.: Estimating mineral-associated organic
carbon saturation and sequestration potential using MIR spectral based local
quantile regression, Geoderma, 454, 117181,
<a href="https://doi.org/10.1016/j.geoderma.2025.117181" target="_blank">https://doi.org/10.1016/j.geoderma.2025.117181</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Six et al.(2002)Six, Conant, Paul, and
Paustian</label><mixed-citation>
      
Six, J., Conant, R. T., Paul, E. A., and Paustian, K.: Stabilization mechanisms
of soil organic matter: implications for C-saturation of soils, Plant Soil, 241, 155–176, <a href="https://doi.org/10.1023/A:1016125726789" target="_blank">https://doi.org/10.1023/A:1016125726789</a>,
2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Six et al.(2024)Six, Doetterl, Laub, Müller, and Van de
Broek</label><mixed-citation>
      
Six, J., Doetterl, S., Laub, M., Müller, C. R., and Van de Broek, M.: The six rights of how and when to test for soil C saturation, SOIL, 10, 275–279, <a href="https://doi.org/10.5194/soil-10-275-2024" target="_blank">https://doi.org/10.5194/soil-10-275-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Soriano-Disla et al.(2014)Soriano-Disla, Janik, Viscarra Rossel,
Macdonald, and McLaughlin</label><mixed-citation>
      
Soriano-Disla, J. M., Janik, L. J., Viscarra Rossel, R. A., Macdonald, L. M.,
and McLaughlin, M. J.: The performance of visible, near-, and mid-infrared
reflectance spectroscopy for prediction of soil physical, chemical, and
biological properties, Appl. Spectrosc. Rev., 49, 139–186,
<a href="https://doi.org/10.1080/05704928.2013.811081" target="_blank">https://doi.org/10.1080/05704928.2013.811081</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Spitzer and Kleinman(1961)</label><mixed-citation>
      
Spitzer, W. and Kleinman, D.: Infrared lattice bands of quartz, Phys.
Rev., 121, 1324, <a href="https://doi.org/10.1103/PhysRev.121.1324" target="_blank">https://doi.org/10.1103/PhysRev.121.1324</a>,
1961.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Stewart et al.(2007)Stewart, Paustian, Conant, Plante, and
Six</label><mixed-citation>
      
Stewart, C. E., Paustian, K., Conant, R. T., Plante, A. F., and Six, J.: Soil
carbon saturation: concept, evidence and evaluation, Biogeochemistry, 86,
19–31, <a href="https://doi.org/10.1007/s10533-007-9140-0" target="_blank">https://doi.org/10.1007/s10533-007-9140-0</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Tanykova et al.(2021)Tanykova, Petrova, Kostina, Kozlova, Leushina,
and Spasennykh</label><mixed-citation>
      
Tanykova, N., Petrova, Y., Kostina, J., Kozlova, E., Leushina, E., and
Spasennykh, M.: Study of organic matter of unconventional reservoirs by IR
spectroscopy and IR microscopy, Geosciences, 11, 277,
<a href="https://doi.org/10.3390/geosciences11070277" target="_blank">https://doi.org/10.3390/geosciences11070277</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Teng et al.(2018)Teng, Viscarra Rossel, Shi, and
Behrens</label><mixed-citation>
      
Teng, H., Viscarra Rossel, R. A., Shi, Z., and Behrens, T.: Updating a national
soil classification with spectroscopic predictions and digital soil mapping,
Catena, 164, 125–134,
<a href="https://doi.org/10.1016/j.catena.2018.01.015" target="_blank">https://doi.org/10.1016/j.catena.2018.01.015</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>UNFCCC(2019)</label><mixed-citation>
      
UNFCCC: Improved soil carbon, soil health and soil fertility under grassland
and cropland as well as integrated systems, including water management:
Workshop report by the secretariat, document GE.19-15339(E),
<a href="https://unfccc.int/documents/199954" target="_blank">https://unfccc.int/documents/199954</a> (last access: 30 April 2026),
2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Vereecken et al.(2016)Vereecken, Schnepf, Hopmans, Javaux, Or, Roose,
Vanderborght, Young, Amelung, Aitkenhead, Allison, Assouline, Baveye, Berli,
Brüggemann, Finke, Flury, Gaiser, Govers, Ghezzehei, Hallett,
Hendricks Franssen, Heppell, Horn, Huisman, Jacques, Jonard, Kollet, Lafolie,
Lamorski, Leitner, McBratney, Minasny, Montzka, Nowak, Pachepsky, Padarian,
Romano, Roth, Rothfuss, Rowe, Schwen, Šimůnek, Tiktak, Van Dam,
van der Zee, Vogel, Vrugt, Wöhling, and Young</label><mixed-citation>
      
Vereecken, H., Schnepf, A., Hopmans, J. W., Javaux, M., Or, D., Roose, T.,
Vanderborght, J., Young, M. H., Amelung, W., Aitkenhead, M., Allison, S. D.,
Assouline, S., Baveye, P., Berli, M., Brüggemann, N., Finke, P., Flury,
M., Gaiser, T., Govers, G., Ghezzehei, T., Hallett, P., Hendricks Franssen,
H. J., Heppell, J., Horn, R., Huisman, J. A., Jacques, D., Jonard, F.,
Kollet, S., Lafolie, F., Lamorski, K., Leitner, D., McBratney, A., Minasny,
B., Montzka, C., Nowak, W., Pachepsky, Y., Padarian, J., Romano, N., Roth,
K., Rothfuss, Y., Rowe, E. C., Schwen, A., Šimůnek, J., Tiktak,
A., Van Dam, J., van der Zee, S. E. A. T. M., Vogel, H. J., Vrugt, J. A.,
Wöhling, T., and Young, I. M.: Modeling soil processes: Review, key
challenges, and new perspectives, Vadose Zone J., 15, vzj2015.09.0131,
<a href="https://doi.org/10.2136/vzj2015.09.0131" target="_blank">https://doi.org/10.2136/vzj2015.09.0131</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Viscarra Rossel and Webster(2012)</label><mixed-citation>
      
Viscarra Rossel, R. and Webster, R.: Predicting soil properties from the
Australian soil visible–near infrared spectroscopic database, Eur.
J. Soil Sci., 63, 848–860,
<a href="https://doi.org/10.1111/j.1365-2389.2012.01495.x" target="_blank">https://doi.org/10.1111/j.1365-2389.2012.01495.x</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Viscarra Rossel et al.(2006)Viscarra Rossel, Walvoort, McBratney,
Janik, and Skjemstad</label><mixed-citation>
      
Viscarra Rossel, R., Walvoort, D., McBratney, A., Janik, L. J., and Skjemstad,
J.: Visible, near infrared, mid infrared or combined diffuse reflectance
spectroscopy for simultaneous assessment of various soil properties,
Geoderma, 131, 59–75,
<a href="https://doi.org/10.1016/j.geoderma.2005.03.007" target="_blank">https://doi.org/10.1016/j.geoderma.2005.03.007</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Viscarra Rossel et al.(2024)Viscarra Rossel, Webster, Zhang, Shen,
Dixon, Wang, and Walden</label><mixed-citation>
      
Viscarra Rossel, R., Webster, R., Zhang, M., Shen, Z., Dixon, K., Wang, Y.-P.,
and Walden, L.: How much organic carbon could the soil store? The carbon
sequestration potential of Australian soil, Glob. Change Biol., 30,
e17053, <a href="https://doi.org/10.1111/gcb.17053" target="_blank">https://doi.org/10.1111/gcb.17053</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Viscarra Rossel et al.(2010)Viscarra Rossel, Rizzo, Demattê, and
Behrens</label><mixed-citation>
      
Viscarra Rossel, R. A., Rizzo, R., Demattê, J. A. M., and Behrens, T.:
Spatial Modeling of a Soil Fertility Index using Visible–Near-Infrared
Spectra and Terrain Attributes, Soil Sci. Soc. Am. J., 74,
1293–1300, <a href="https://doi.org/10.2136/sssaj2009.0130" target="_blank">https://doi.org/10.2136/sssaj2009.0130</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Viscarra Rossel et al.(2022)Viscarra Rossel, Behrens, Ben-Dor,
Chabrillat, Demattê, Ge, Gomez, Guerrero, Peng, Ramirez-Lopez, Shi,
Stenberg, Webster, Winowiecki, and Shen</label><mixed-citation>
      
Viscarra Rossel, R. A., Behrens, T., Ben-Dor, E., Chabrillat, S.,
Demattê, J. A. M., Ge, Y., Gomez, C., Guerrero, C., Peng, Y.,
Ramirez-Lopez, L., Shi, Z., Stenberg, B., Webster, R., Winowiecki, L., and
Shen, Z.: Diffuse reflectance spectroscopy for estimating soil properties: A
technology for the 21st century, Eur. J. Soil Sci., 73,
e13271, <a href="https://doi.org/10.1111/ejss.13271" target="_blank">https://doi.org/10.1111/ejss.13271</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Vogel et al.(2019)Vogel, Eberhardt, Franko, Lang, Ließ, Weller,
Wiesmeier, and Wollschläger</label><mixed-citation>
      
Vogel, H.-J., Eberhardt, E., Franko, U., Lang, B., Ließ, M., Weller, U.,
Wiesmeier, M., and Wollschläger, U.: Quantitative evaluation of soil
functions: Potential and state, Frontiers in Environmental Science, 7,
164, <a href="https://doi.org/10.3389/fenvs.2019.00164" target="_blank">https://doi.org/10.3389/fenvs.2019.00164</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Volkov et al.(2021)Volkov, Rogova, and
Proskurnin</label><mixed-citation>
      
Volkov, D. S., Rogova, O. B., and Proskurnin, M. A.: Organic matter and mineral
composition of silicate soils: FTIR comparison study by photoacoustic,
diffuse reflectance, and attenuated total reflection modalities, Agronomy,
11, 1879, <a href="https://doi.org/10.3390/agronomy11091879" target="_blank">https://doi.org/10.3390/agronomy11091879</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Walden et al.(2025)Walden, Sepanta, and
Viscarra Rossel</label><mixed-citation>
      
Walden, L., Sepanta, F., and Viscarra Rossel, R.: FT-MIR Spectroscopic Analysis
of the Organic Carbon Fractions in Australian Mineral Soils, Eur. J. Soil Sci., 76, e70084,
<a href="https://doi.org/10.1111/ejss.70084" target="_blank">https://doi.org/10.1111/ejss.70084</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Wang and Witten(1997)</label><mixed-citation>
      
Wang, Y. and Witten, I. H.: Inducing model trees for continuous classes, in:
Proceedings of the ninth European conference on machine learning, Vol. 9,
128–137, Citeseer, <a href="https://researchcommons.waikato.ac.nz/entities/publication/d6e1955d-92f8-4993-8999-98be1a1c1b59" target="_blank">https://researchcommons.waikato.ac.nz/entities/publication/d6e1955d-92f8-4993-8999-98be1a1c1b59</a> (last access: 30 April 2026),
1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Wiesmeier et al.(2019)Wiesmeier, Urbanski, Hobley, Lang, von
Lützow, Marin-Spiotta, van Wesemael, Rabot, Ließ, Garcia-Franco,
Wollschläger, Vogel, and Kögel-Knabner</label><mixed-citation>
      
Wiesmeier, M., Urbanski, L., Hobley, E., Lang, B., von Lützow, M.,
Marin-Spiotta, E., van Wesemael, B., Rabot, E., Ließ, M., Garcia-Franco,
N., Wollschläger, U., Vogel, H.-J., and Kögel-Knabner, I.: Soil organic
carbon storage as a key function of soils – A review of drivers and
indicators at various scales, Geoderma, 333, 149–162,
<a href="https://doi.org/10.1016/j.geoderma.2018.07.026" target="_blank">https://doi.org/10.1016/j.geoderma.2018.07.026</a>, 2019.

    </mixed-citation></ref-html>--></article>
