Articles | Volume 12, issue 1
https://doi.org/10.5194/soil-12-113-2026
https://doi.org/10.5194/soil-12-113-2026
Original research article
 | 
10 Feb 2026
Original research article |  | 10 Feb 2026

Challenges in the use of local data for regional scale mapping of C and N stocks in the continuous permafrost zone at the Yukon Coastal Plain

Julia Wagner, Juliane Wolter, Justine Ramage, Victoria Martin, Andreas Richter, Niek Jesse Speetjens, Jorien E. Vonk, Rachele Lodi, Annett Bartsch, Michael Fritz, Hugues Lantuit, and Gustaf Hugelius
Abstract

Permafrost soils are particularly vulnerable to climate change. To assess and improve estimations of carbon (C) and nitrogen (N) budgets it is necessary to accurately map soil carbon and nitrogen in the permafrost region. In particular, soil organic carbon (SOC) stocks have been predicted and mapped by many studies from local to pan-Arctic scales. Several studies have been carried out at the Canadian Beaufort Sea coast, though no regional maps of terrestrial carbon stocks based on spatial modelling has been conducted yet. This study combines available field data from the Canadian Yukon coastal plain and uses it to map regional SOC and N stocks using the machine learning algorithm random forest and environmental variables based on remote sensing data. We developed models using the data for the entire region and separate models for the coastal mainland area and Qikiqtaruk Herschel Islandand. Each model was used to map SOC and N stocks for its respective area. We assessed the performance of the different random forest models by using crossvalidation. We further assessed model results using the Area of Applicability (AOA) method and the quantile regression forest approach, comparing the results and discussing their implications within the context of both methods. We explore local differences in soil properties and how soil data distribution across the region affects the accuracy of the predictions of SOC and N stocks. The estimated SOC stock for the upper metre is 48.7 ± 6.6 kg m−2 and the N stock 3.03 ± 0.30 kg m−2. The average SOC stocks vary significantly when creating separate models for subsets of the data. Qikiqtaruk Herschel Island is geologically different from the coastal mainland and has on average lower SOC stocks. Including Qikiqtaruk Herschel Island soil data to predict SOC stocks at the mainland has large impact on the results. Differences in N stocks were not as dependent on the location as SOC stocks and rather differences between individual studies occurred. The results of the separate models show 38.0 ± 5.6 kg C m−2 and 2.87 ± 0.34 kg N m−2 for Qikiqtaruk Herschel Island and 52.5 ± 6.3 kg C m−2 and 3.15 ± 0.32 kg N m−2 for the mainland. These estimates refer to the entire study area, without masking out regions outside the Area of Applicability (AOA). Our results indicate that the spatial perspective, whether regional or local (whole area vs. mainland and Qikiqtaruk Herschel Island separately), can affect the patterns and values represented in the resulting maps, highlighting the importance of scale when interpreting SOC and N stock distributions. Using a more densely sampled, region-specific dataset, our study captures finer regional-scale patterns than previous lower-resolution or pan-Arctic analyses.

Share
1 Introduction

The northern Permafrost region, which is approximately 22 % of the earth’s surface (Obu2021), is considered vulnerable to climate change. With air temperatures rising up to four times faster than the global average (Meredith et al.2022; Rantanen et al.2022), permafrost is thawing across the pan-arctic. Between 2017 and 2016 permafrost temperatures in the continuous permafrost zone have increased by 0.39 ± 0.15 °C (Biskaborn et al.2019). The permafrost region contains ca. 1600 Gt of carbon, including ca. 1000 Gt stored in frozen ground, which is twice as much as atmosphere (Strauss et al.2021). The thickening of the active layer, driven by warming, increases carbon availability for decomposition, releasing greenhouse gases like carbon dioxide and methane, which further amplify climate warming. This is called the permafrost-climate feedback (Schuur et al.2015). In addition to gradual thaw, ice-rich permafrost is particularly prone to thermokarst and abrupt thaw events (Turetsky et al.2020) which further accelerate losses of permafrost carbon.

Approximately one third of the earth’s coastlines are Arctic (Lantuit et al.2012) and coastal permafrost areas are especially prone to changes. Average coastal retreat rates of 0.7 myr−1 (Irrgang et al.2018) were detected along the Yukon coast. This represents considerable changes, taking into account that previous studies estimated SOC fluxes of 132 kg C yr−1 per metre of shoreline from land to sea at the Yukon coast (Couture et al.2018b). Therefore, it is crucial to estimate soil organic carbon (SOC) and nitrogen (N) stocks in the coastal lowland area. The study area is characterised by extensive ice-wedge polygon (IWP) terrain, high ground ice content and the occurrence of thermokarst.

Quantification and mapping of SOC using different methods have been carried out on pan-Arctic (Hugelius et al.2014; Mishra et al.2021) and local level (Obu et al.2017; Palmtag et al.2018; Siewert et al.2015, 2016; Siewert2018; Wagner et al.2023). However, few studies have investigated how the shift from local to regional scale affects accuracy and the prediction results.Here, we define local scale as the catchment scale or a smaller, clearly delineated area, and regional scale as an area spanning multiple catchments (Pachepsky and Hill2017). Although the term scale can be used differently across disciplines, the unifying element among most definitions is that scales primarily differ in their level of spatial detail. In our study, the focus is on the regional scale, covering the Yukon coastal plain, which spans multiple catchments and thus captures variability in soil-forming factors beyond gradients controlled by catenary position or within-catchment processes. The availability of intermediate to high resolution datasets for larger areas in recent years (e.g. Bartsch et al.2019a; Widhalm et al.2019) offers the possibility to investigate local and regional SOC and N stocks using the same data and therefore connects the different scales.

As permafrost soil properties and ice content are highly variable, especially in ice-wedge polygon (IWP) terrain (e.g. Siewert et al.2021a), high-resolution studies that represent the variability at the pedon-scale of <2 m (e.g. Wagner et al., 2023), the terrain scale (ranging up to tens of metres) up to landscape scale are needed. The variability at landscape scale can now be represented in pan-Arctic studies such as Mishra et al. (2021) with a resolution of 250 m or better. Studies in between those scales are lacking, but necessary when quantifying regional carbon budgets.

As permafrost soil properties and ice content are highly variable, especially in ice-wedge polygon (IWP) terrain (e.g.,  Siewert et al.2021a), high-resolution studies at the pedon scale (<2 m; e.g. Wagner et al.2023) capture fine-scale variability. At the other end of the spectrum, pan-Arctic studies (e.g.,  Mishra et al.2021) represent variability at coarser resolutions of 250 m, defined as landscape scale (Siewert et al.2021a). However, studies at intermediate, regional scales, bridging the gap between pedon-scale and landscape-scale approaches, are largely lacking, yet they are essential for accurately quantifying regional carbon budgets. While Couture et al. (2018b) studied fluxes from nearshore terrain units, our study is the first for the region that combines all available terrestrial SOC and N data to predict regional SOC and N stocks. Most of the existing SOC and N measurements in the region were collected for local studies and are often difficult to compare across sites or scales. By compiling these datasets and including new, previously unpublished data, our study creates the first consistent regional-scale dataset for the Yukon coastal plain, providing a baseline for regional modeling, mapping and estimation of carbon and nitrogen stocks.

A range of different methods have been used to upscale permafrost SOC stocks, including landcover class upscaling (e.g. Obu et al.2017; Palmtag et al.2018). Machine learning (ML) algorithms are also widely used for the spatial mapping of SOC and N stocks (e.g. Siewert2018; Wagner et al.2023), or for investigating C and N fluxes (Virkkala et al.2024). Machine learning as part of digital soil mapping (DSM) generates model based datasets for soil properties. Based on the “SCORPAN”-principle (McBratney et al.2003) covariates representing the soil forming factors (soils, climate, organisms, terrain, parent material, age, spatial location) are identified and spatial patterns and relationships between those and the target variable are used to predict the target variable to the whole area (Lagacherie2008). Next to geostatistical methods, machine learning methods are the current state of the art, especially recommended to map permafrost soils due to the spatial heterogeneity and the ability of ML to detect non-linear relationships (Siewert et al.2021a). A method that is suitable for evaluating the representativeness of predictive models is the area of applicability (Meyer and Pebesma2021). The AOA method computes a dissimilarity index (DI) by analysing the relationship between the target variable and covariates in prediction areas, accounting for variable importance and location distances. Areas poorly represented by these relationships receive a low DI, and a threshold is used to classify regions into applicable or non-applicable areas. To quantify regional C and N budgets, quantifications of soil C and N stocks are necessary. The aim of this study is to quantify SOC and N stocks and their local-scale variability in the whole lowland area at the Yukon coastal plain. The targeted spatial resolution is 10 m.

To reach this overall aim, the specific objectives of our study are

  1. to compile and integrate research on SOC and N stocks that has been carried out in the study region,

  2. to provide regional spatial datasets of SOC and N stocks and

  3. to assess our predictions via the area of applicability (AOA) method, to quantify the uncertainty with quantile regression forest and to discuss the limitations of the models caused by the local heterogeneity and the inherent spatial clustering of the soil data.

2 Material and methods

2.1 Study area

The study area is located in the continuous permafrost zone (Obu et al.2019) of the Yukon Coastal Plain and comprises the coastal lowland area below 180 m.a.s.l. and Qikiqtaruk Herschel Island (in further description named as Herschel Island) (Fig. 1A). During the Last Glacial Maximum (LGM) the study area was partially glaciated with the Laurentide ice sheet reaching up until west of Herschel Island during the Wisconsinan glaciation (Fritz et al.2011; Rampton1982). Herschel Island represents a unique feature, namely an ice-thrust moraine created by the advance of the Laurentide ice sheet (Fritz et al.2012; Rampton1982). Rampton (1982) mapped the quaternary deposits of the region. The western area, outside of the LGM glacial limit, is composed of mainly lacustrine deposits. Colluvial slopes are found in areas with higher elevation and greater distance from the coast. The area within the former glacial limits is composed by lacustrine and outwash plains. The whole area is transected by several rivers flowing towards the Arctic Ocean, creating alluvial fens, floodplains and stream terraces. Further, some areas are rich in organic material with peat layers of 0.4 to 3.5 m thickness.

The study area is high in ground ice, with an average of 46 % volume of ground material and up to 74 % in some areas (Couture and Pollard2017; Couture et al.2018b). The study area is highly prone to erosion with mean coastal erosion rates of 0.7 m yr−1, with some parts eroding up to 9 m yr−1 (Irrgang et al.2018). On Herschel Island several well-studied retrogressive thaw slumps occur and retreat rates of 0.45 ± 0.48 m yr−1 between 1970 and 2000 (Lantuit and Pollard2008) and 0.68 ± 2.48 m yr−1 between 2000 and 2011 were estimated (Obu et al.2016). The local scale landforms in the study area are determined by permafrost and ground ice with large areas covered by IWP tundra. The formation and geomorphology of these IWP systems has been extensively studied (e.g. Fritz et al.2016; Wolter et al.2016, 2018). Another commonly occurring landform feature are drained thermokarst lake basins (DTLB) which are subject of current studies due to their potential for carbon storage (Wolter et al.2024). The climate is characterised as Tundra climate. Between 1972 and 2000, mean annual temperatures were 9.9 and 11.0 °C at the nearby weather stations Komakuk Beach and Shingle point, respectively. During the summer months (June, July and August) average temperatures of 8.6 °C (±1.7 °C) at Shingle point and 6.0 °C (±1.6 °C) at Komakuk Beach were observed (Government of Canada2024). The active layer depths in the study area reported in the literature range from 30 to up to 50 cm (Siewert et al.2021a; Wagner et al.2023; Wolter et al.2018). The vegetation close to the coast is characterised by tussock and non-tussock sedge, dwarf-shrub, moss tundra and sedge, moss, dwarf-shrub and low shrub wetlands. Herschel Island is defined by erect dwarf-shrub tundra (Walker et al.2005).

2.2 Compilation, integration and processing of soil data and predictor variables

This study focuses on coastal lowland tundra and Herschel Island, the mapping region (Fig. 1A) was selected from the DEM by using areas with an elevation below 180 m. This threshold includes all the area of Herschel Island, while not expanding too far inland where soil data is limited.

https://soil.copernicus.org/articles/12/113/2026/soil-12-113-2026-f01

Figure 1Overview of the study area, locations and sources of the soil data used in this study (A) and zoom-in to the eastern part of Herschel Island (B). The DTLB dataset is a new dataset previously unpublished and included in this study (for more information see Chap. 2.2). The mapping extent is shown in grey on top of the Basemap: ESRI (2024a) | Powered by Esri.

Soil property data were compiled/ combined from multiple published sources (Table 1 and harmonized by converting reported values into the target depth intervals of 0–30 and 30–100 cm using weighted averaging. The laboratory methods differ with some studies using an elemental analyzer (e.g. Elementar vario EL III and Elementar vario MAX C in the study by Obu et al.2017) or an elemental analyzer (CE Instrument EA 1110 elemental analyzer) coupled to an isotope ratio mass spectrometer (Thermo Fischer Scientific Instruments, Delta V Advantage) (Siewert et al.2021a) or coupled to a continuous-flow isotopic ratio mass spectrometer (IRMS, DeltaPlus, Finnigan MAT) (Wagner et al.2023). Further, the timing of the fieldwork differs. Whereas the majority of the studies sampled during the summer (mostly July), the DTLB samples were taken as full frozen cores during spring 2019. These differences in sampling (active layer sampling + coring vs. full frozen coring) have likely affected the results together with the use of different laboratory methods to derive SOC content. We did not account for these differences in the compilation of the dataset. However, permafrost soil research is conducted according to established standardized protocols, facilitating reproducibility and comparability among studies in this region (Ping et al.2013; Palmtag et al.2022) which also ensures samples are handled in a similar way. E.g. the coarse fraction >2 mm was excluded from the samples which was mentioned in all studies except by Ramage et al. (2019). The studies were conducted with different sampling designs and goals, causing spatial clustering of data points in the study area. The studies by Siewert et al. (2021a) and Ramage et al. (2019) both followed a transect approach, the study by Obu et al. (2017) sampled representative sites per ecological unit and Couture et al. (2018a) sampled sites along the coast. Wagner et al. (2023) followed a stratified random sampling approach. The study by Siewert et al. (2021a) focuses on soil properties at a small scale and contains values for subpedons which are soil profiles in 10 cm intervals generated from 1 m wide transects. Such a high level of detail was not necessary in our study. Therefore these values were averaged per pedon.

Obu et al. (2017)Obu et al. (2016)Couture et al. (2018a)(Couture et al.2018a)Ramage et al. (2019)Siewert et al. (2021a)Siewert et al. (2021b)Wagner et al. (2023)Wagner et al. (2024)

Table 1Overview of the datasets, their sources, aim of the study and availability of SOC and N stock data.

Download Print Version | Download XLSX

In addition to published data, new data from DTLBs that were sampled during a field campaign in April 2019 was added to this study. The DTLB data covers drained lake basins between Herschel Island and Kay Point at different distances inland from the coast (0.6–14.0 km) and on various elevations (3–65 m), with one central coring location per basin. We cored 24 frozen sediment cores using a SIPRE permafrost corer during this campaign. We aimed to core at least the top metre of sediment including active layer and permafrost and reached 88–200 cm sediment depth, except for core YC19-DTLB-17 (88 cm) and YC19-DTLB-20 (95 cm). All cores were kept frozen until subsampling and analysis. We described core stratigraphies and subsampled all cores at 5 cm resolution or according to stratigraphic boundaries. The stratum of the drainage event was sampled in 2 cm resolution. We weighed the frozen soil samples and measured their dimensions to calculate sample volume. We then calculated water contents as the difference before and after freeze drying, and calculated dry bulk densities as dry sample weight divided by wet sample volume. We analysed freeze dried ground samples for total organic carbon (TOC) content using an Elementar Soli TOC® cube, and for total nitrogen (TN) content using an Elementar Rapid Max N exceed analyser. All analyses were conducted at laboratories of the Alfred Wegener Institute of Polar and Marine Research in Potsdam, Germany.

The predictor variables used in the study were taken from published datasets at a spatial resolution of 10–20 m (Table 2). All raster datasets were resampled to 10 m and reprojected to the UTM coordinate system (EPSG code 32607). The landcover data (Bartsch et al.2019b) was converted into binary variables and the dominating classes in the area were selected. Classes with fewer than 10 sampling points were excluded; these contained between 1 and 7 points, so the cutoff did not omit any reasonably represented classes. The included classes were “dry to moist tundra, partially barren, prostrate shrubs” (LC_class6), “dry tundra, abundant lichen, prostrate shrubs” (LC_class7), “dry to moist tundra, prostrate to low shrubs” (LC_class9), “moist tundra, abundant moss, prostrate to low shrubs” (LC_class10), “moist tundra, abundant moss, dwarf and low shrubs” (LC_class11) and “moist tundra, low shrubs” (LC_class14), . From the surficial geology (Rampton1982) classes with at least 10 sampling lications were used, the two classes excluded contained only two or three sampling locations. The included classes were Ff (Alluvial fan), Gp (Outwash plain or valley train), L (Lacustrine plain), Mm (Rolling moraine) and Mr (Ice-thrust moraine ridge).

(Porter et al.2018)Bartsch et al. (2024b)Bartsch et al. (2024a)Bartsch et al. (2024a)Bartsch et al. (2024a)Bartsch et al. (2019a)Bartsch et al. (2024a)Widhalm et al. (2019)The Government of Yukon (2025)Rampton (1982)Ehlers et al. (2011)

Table 2Overview of the predictor variables, their sources, spatial resolution, acronym used, and brief description.

Download Print Version | Download XLSX

2.3 Random forest mapping, area of applicability and uncertainty

This study focuses on creating spatial predictions individually for the mainland and Herschel Island (Fig. 1), creating a separate model that combines all data and comparing the results of the models. All analysis were carried out in R studio using R version 4.1.2 (R Development Core Team2021). The random forest algorithm (Liaw and Wiener2002) implemented in the package caret (Kuhn2008, 2022) was used for mapping SOC and N stocks for the whole area as well as with separate models for Herschel Island and mainland the coastal area. Random forest is a decision-tree based algorithm that builds a large amount of trees based on bootstrap samples of the training data and aggregating their results. This process aims to reduce the prediction error and is also called “bagging” (bootstrap and aggregating). The number of trees (ntree), the amount of covariates selected at each split (mtry) and the minimum amount of training data to continue splitting the tree (nodesize) can be manually defined (Breiman1996, 2001).

The “boruta” algorithm (Kursa and Rudnicki2010) was used for feature selection prior to the training of the random forest models. The algorithm was run with random forest, a maximum repetition of 2500 and a maximum of 1500 trees. The Boruta algorithm serves as a wrapper algorithm and assesses the significance of predictors by contrasting their importance with their corresponding shadows, discarding those with notably lowered importance. These shadows mirror the initial variables, featuring randomly shuffled values while maintaining the original distribution and are regenerated in each iteration. The algorithm stops either as soon as reaching the maximum of specified runs or when only the attributes considered important remain (Kursa and Rudnicki2010). Separate random forest models were trained for the depth 0–30 and 30–100 cm for the whole study area including all data. Additional models were trained for Herschel Island and the mainland using the respective data. The reason for creating separate models and predictions was due to Herschel Island being an ice thrust moraine (Rampton1982) and therefore geologically very different from the coastal mainland area. The mtry was chosen by grid search where RMSE was the lowest. Mtry is the amount of variables chosen at each split (Liaw and Wiener2002). The internal validation was set to 10-fold crossvalidation repeated 20 times for all models, except two where leave-one-out crossvalidation was chosen due to a low amount of datapoints. Ntree was set to 500 and nodesize was kept at its default value of 5. We provide the internal validation results: mtry at which the best performing model was selected according to the lowest RMSE, R2 (fraction of %), mean absolute error (MAE), the number of predictors (n pred), the number of samples (n samp) and StDev. We also calculated Lin’s concordance correlation coefficient (CCC) (Lin1989) using the package DescTools (Signorell2025) because, unlike R2 it evaluates both correlation and agreement by accounting for systematic bias, measuring not only how tightly predicted and observed values cluster but also how closely they fall along the 1:1 line of perfect agreement. Additionally, we display for simplicity the first three important variables (short: var) for each model, var1, var2 and var3 (Table 4). From the whole area models we exported predicted values during cross validation and used these to calculate RMSE, R2 and Lin's ccc per area (mainland and Herschel Island).

To further evaluate the prediction, the “area of applicability” – method (Meyer and Pebesma2021) implemented in the package CAST version 0.9.0 (Meyer et al.2024) was applied to each model. The area of applicability (AOA) estimates the area where the application of the predicted model is likely accurate and lies within the prediction error (RMSE). The AOA is based on the dissimilarity index (DI) which is calculated based on the minimum distance to the training data within the multidimensional predictor space, where predictors are weighted with respect to their importance in the model prior to the distance calculation. Then a threshold to the DI is applied which is the maximum DI of the training data obtained via crossvalidation resulting in a binary raster layer with 1 representing the AOA and 0 displaying areas outside the AOA. The complete workflow, including all equations, is documented in the Paper by Meyer and Pebesma (2021). In addition to the AOA we used quantile regression forest as described by Yigini et al. (2018) and also applied by Wagner et al. (2023) to estimate the uncertainty which includes the sensitivity of the model to available data and the uncertainty of the model. We applied this workflow to all models.

3 Results

3.1 Data complilation of regional carbon and nitrogen stocks

The individual studies included in this regional study showed a large variety in their distribution of SOC stocks (Fig. 2). Stocks were generally lower on Herschel Island (Fig. 2) compared to the mainland, mirroring geological differences. All the studies carried out solely on Herschel Island showed lower SOC stock values, with the study by Ramage et al. (2019) having the lowest values in the depth increments 0–30 and 30–100 cm with averages of 10.2 and 16.1 kg m−2 respectively. The study by Siewert et al. (2021a) showed the highest average values with 16.0 and 29.6 kg m−2 and the study by Obu et al. (2017) was in between with average values of 11.9 and 26.8 kg m−2. The studies carried out on the mainland showed average values of 17.6 and 31.2 kg m−2 (DTLB dataset) and 21.5 and 42.6 kg m−2 (Wagner et al.2023). The study by Couture et al. (2018b) used sites on Herschel Island as well as coastal sites on the mainland, however average values were similar when separating the values for Herschel Island and the mainland (Table 3).

https://soil.copernicus.org/articles/12/113/2026/soil-12-113-2026-f02

Figure 2Soil organic carbon (SOC) and nitrogen (N) stocks per depth interval separated into mainland and Herschel Island (A) and per study area (B).

Download

Table 3Mean, median, standard deviation (StDev) and number of datapoints per study used in this study. (The Dataset by Couture et al. (2018) consists of sites from Herschel Island and the mainland and the values are therefore presented for the whole dataset and split per site). NA = not available.

Download Print Version | Download XLSX

Table 4Results of the random forest models. The table shows the internal validation results: mtry at which the best performing model was selected according to the lowest RMSE. R2 (fraction of %), Lin's CCC, mean absolute error (MAE), the number of predictors (n pred), the number of samples (n samp) and StDev are displayed together with the first three important variables for each model (Var1, Var2 and Var3). Additionally the predictions during crossvalidation for the whole area models were saved and RMSE, R2 and Lin's CCC were calculated for the areas mainland and Herschel Island. The table continues on the next page.

Download Print Version | Download XLSX

In contrast, N stocks showed a different pattern with higher N stocks at the sites of the studies by Obu et al. (2017) and Wagner et al. (2023), especially in the depth 30–100 cm compared to the datasets by Ramage et al. (2019) and the DTLB dataset (Fig. 2 and Table 3). After dividing SOC and N values between “mainland” and “Herschel Island”, average SOC stock values were 13.0 kg m−2 in depth 0–30 cm and 23.1 kg m−2 at 30–100 cm for Herschel Island and 20.6, while average SOC stock values for the mainland were 38.9 kg m−2. The N stocks on Herschel Island were 0.8 and 1.59 and 1.16 and 2.39 kg m−2 on the mainland (Fig. 2). The distribution further shows that mean and median values are close (Fig. 2).

3.2 Random Forest mapping and model validation assessment

The models for the whole area have R2 values of 0.38 and 0.26 for the SOC stocks and 0.21 and 0.11 for the N stocks (Table 4). When separating the accuracy measures from the whole area models into the data for the mainland and Herschel Island R2 values are lower with for example 0.16 and 0.12 for the SOC stocks at 0–30 cm depth and 0.008 and 0.14 for 30–100 cm depth. The separate models for the mainland had R2 values of 0.24 and 0.10 for the SOC stocks at 0–30 and 30–100 cm. The models for the SOC stocks at Herschel Island had an R2 value of 0.37 for 0–30 cm and 0.35 for 30–100 cm. The RMSE values of most models were below the standard deviation (StDev) of the distribution of the data values. Most of Lin's ccc values are higher than the R2 of the respective models (Table 4). The models for SOC stocks often performed better regarding R2 and Lin's ccc values. Comparing upper soil 0–30 cm vs. lower soil 30–100 cm, the models for the upper soil performed better (Table 4). The highest values for SOC and N stocks are predicted in the west of the study area and areas close to the coast with lower values with increasing distance from the coast (Fig. 3). The average SOC stocks (Table 5) for the models using all data for the whole area were 17.2 kg m−2 at 0–30 cm and 31.5 kg m−2 at 30–100 cm depth. The N stocks were 0.93 and 2.11 kg m−2 respectively. When both SOC and N stocks, were modelled separately for Herschel Island and the mainland, higher SOC and N stocks were modelled on the mainland and lower SOC and N stocks on Herschel Island. SOC stocks at 0–30 cm depth were 18.9 kg m−2 for the mainland and 12.8 kg m−2 for Herschel Island, when both areas were modelled separately, in contrast to using the all data for the whole area, which resulted in an average of 17.2 kg m−2 (Table 5).

https://soil.copernicus.org/articles/12/113/2026/soil-12-113-2026-f03

Figure 3Prediction of soil organic carbon (SOC) and nitrogen (N) stocks for the whole study area using all data (Basemap: ESRI (2024b) | Powered by Esri).

Table 5Comparison of average soil organic carbon (SOC) and nitrogen (N) stocks for each model prediction depicting the whole area and additionally the AOA of each respective prediction and predictive model (For more descriptive statistic values see Table S2). The average SOC and N stock values from the training points is added to the table for context.

Download Print Version | Download XLSX

The x and y coordinates and the elevation were among the most important predictor variables across the majority of the models (Table 4, Fig. 4). While Fig. 4 shows examples of two variable importance plots, the variable importance for the remaining models is shown by Figs. S1 and S2 in the Supplement. The most important variables for the whole area models are mainly the geographic location represented by the x and y coordinates and the geology. For the mainland models the geographic location, elevation, geology and the LGM extent were among the most important variables. In contrast, for the Herschel Island models the geographic location was important in the depth 0–30 cm but not on the depth 30–100 cm. Here variables such as vegetation height, NDVI and radar backscatter were important.

https://soil.copernicus.org/articles/12/113/2026/soil-12-113-2026-f04

Figure 4Variable importance for the models predicting soil organic carbon (SOC) for the whole area in depth 0–30 and 30–100 cm. For the variable importance for all models see supplement. (variables: DEM = digital elevation model, NDVI = Normalised Difference Vegetation Index, NDWI = Normalised Difference Water Index, NDBI = Normalised Difference Built-up Index, vegetation height, LC_class9 = dry to moist tundra, prostrate to low shrubs, LC_class10 = moist tundra, abundant moss, prostrate to low shrubs, LC_class11 = moist tundra, abundant moss, dwarf and low shrubs, S1 C VV backscatter = Normalised C-VV winter backscatter from Sentinel-1, S1 C HH backscatter = Normalised C-HH winter backscatter from Sentinel-1), ice-thrust moraine ridge, Lacustrine plain, Alluvial fan, and Outwash plain or valley train = Surficial Geology classes, LGM = ice sheet extent at the last glacial maximum.

Download

3.3 Area of Applicability (AOA) and uncertainty with quantile regression forest

The analysis of the DI and AOA showed that areas where models were not reliable are mostly located around streams and lakes, as well as higher elevation areas further away from the coast (Figs. 5 and 6). The analysis also shows that large areas where no samples are available, had a high DI, limiting the applicability of the model. Generally, areas that are too different from the sampled areas had a high DI and are outside the AOA. However, sites with similar sample-predictor relations to the sampled sites, but further away, were still suitable and within the AOA (Fig. 6). The general patterns show that the applicable area was defined within close vicinity to the sampling locations (Fig. 6). In models using all data and the models for the mainland, predicted SOC and N stock values inside the AOA were generally higher (for example 21.1 ± 4.1 kg C m−2 inside the AOA compared to 18.9 ± 2.8 kg C m−2 for the mainland model at 0–30 cm depth; Table 5). For Herschel Island there was no difference between AOA and whole area for the N stock models and and slight differences for the SOC stock models (Table 5).

The results of the quantile regression forest uncertainty (Figs. S5, S6 and S7) showed areas around lakes and streams and areas further away from the sampling locations as having high uncertainty, similar to the AOA analyses. At Herschel Island (Fig. S7) the highest uncertainty is towards the coast, in depressions and near streams. The area on Herschel Island, that has been identified as applicable by the AOA method, has no visible higher uncertainty measured by the quantile regression forest method than the inapplicable area.

https://soil.copernicus.org/articles/12/113/2026/soil-12-113-2026-f05

Figure 5Area of applicability (AOA) and dissimilarity index (DI) for the predictive model of SOC stocks for the whole area. For DI thresholds that determine AOA see Table S1 (Basemap: ESRI (2024b) | Powered by Esri).

https://soil.copernicus.org/articles/12/113/2026/soil-12-113-2026-f06

Figure 6Zoom-in to an area (A, black square) with SOC stock distribution for depth 0–30 cm (B) the respective area of applicability (C) and dissimilarity index (DI). The datapoints are shown with cross symbols (Basemap in A: ESRI (2024a) | Powered by Esri).

4 Discussion

This study provides the most comprehensive compilation of SOC and N data available for the region to date. This dataset serves as a benchmark for future studies, while also highlighting the substantial uncertainties that remain due to limitations in spatial coverage and data distribution.

4.1 Regional soil carbon and nitrogen data reveal heterogeneity with implications for modeling

Our study shows that already at a regional level there is a very high heterogeneity in field data, challenging the predictive ability of models. The full range of the available data in regional, but also in pan-Arctic synthesis must be representative for the entire study area. Data collection and sampling strategies from local studies are often tailored to specific research questions. As a result, some landscape features may be overrepresented while others are underrepresented in the available data, potentially limiting the ability to capture spatial variability in regional analyses. To address this, it is important to evaluate the distribution of the target variable at the sampling locations in the context of the broader landscape. The Area of Applicability (AoA) method provides a way to do this by comparing the feature space of the covariates at sampled sites to the feature space across the entire study area. Areas where the covariate space is not covered by existing samples can be excluded from regional predictions and can also inform the planning of future sampling campaigns to better capture landscape heterogeneity. Since the AoA method evaluates only the feature space of the covariates and does not provide uncertainty estimates, complementary approaches, such as quantile regression forests, are essential for quantifying uncertainty in random forest predictions.

The data shows that SOC and N stocks are lower at Herschel Island than at the mainland (Fig. 2). This is likely due to Herschel Island being an ice thrust moraine (Rampton1982) and therefore being geologically different to the mainland. When looking at the studies individually, large differences occur even when comparing the studies only carried out on Herschel Island (Fig. 2). This is attributed to the different aims of the studies and site selection. While Obu et al. (2017) covered representative sites for different ecological units on Herschel Island, Ramage et al. (2019) sampled hillslopes in three valleys which show significant levels of erosion. Erosive relocation of organic material which explains why SOC and N stocks are lower compared to the other studies. Siewert et al. (2021a) sampled representative sites in three typical ecological units on Herschel Island: hummocky tussock tundra, non-sorted circle tundra and IWP tundra. The selective sampling of specific representative areas was to study multi-scale variation of SOC distribution in specific permafrost landforms. The sites did not undergo significant erosion/ degradation and therefore SOC stocks are higher than at the sites of Ramage et al. (2019) and slightly higher than Obu et al. (2017). The latter covers also moderately and strongly disturbed sites which show lower amounts of SOC are found in the upper soil (0–30 cm) as well as in the whole profile of 1 m. Wagner et al. (2023) studied two typical small coastal catchments with dense sampling density. Particularly the sampling sites at Ptarmigan Bay were located in areas of high organic layer thickness (Rampton1982). The DTLB samples were exclusively from organic rich deposits (peat above lake sediment).

4.2 Modeling and mapping SOC and N stocks: model and prediction results in the context of their limitations and methodological challenges

Our analysis highlights the challenge of scaling from local (pedon) to regional study areas. At the pedon scale, permafrost soils are highly heterogeneous (Siewert et al.2021a) and this variability is still evident at coarser spatial scales. The models for the entire area explain only a small fraction of spatial heterogeneity in SOC and N stocks (R2 for SOC stock 0–30 cm: 0.38, 30–100 cm: 0.26; R2 for N stock 0–30 cm: 0.21, 30–100 cm: 0.11, Table 4). Partitioned per study area the R2 values show that the models perform worse than the respective individually trained models for the mainland and Herschel Island. For example the SOC stocks for 0–30 cm depth have R2 values of 0.16 and 0.12 for the mainland and Herschel Island respectively while the individual models have 0.24 and 0.37. This indicates that individually trained models might handle the local variability better. The models for Herschel Island show higher R2 values for SOC (0.37 and 0.35), which could be due to a combination of factors: (1) lower natural soil variability on Herschel Island, (2) better representation of the soil data, or (3) stronger relationships between the predictors and SOC at this site. For all models Lin's ccc values were higher than R2. This indicates that agreement between predicted and observed values is stronger than indicated by correlation alone, despite the limited model performance. We therefore recommend that users of these maps carefully consider which dataset to use: the full-area predictions provide a consistent regional overview, whereas the individually trained models may better capture local variability and could be more appropriate for studies focused on specific subregions.

For Herschel Island, there was a higher data density (0.8 samples per km2; n=78 for SOC, area approximately 108 km2) than on the mainland (0.02 samples per km2; n=107 for SOC stocks, area approximately 4465 km2). This has likely affected the accuracy of the models and may explain why mainland models have a lower R2 values. It is also possible that the exact data density is less important than the spread and representation of the data in relation to natural variability between landscape types. Several of the studies on Herschel Island included representative sampling of multiple ecological units on the island, ensuring a range of landscape types are well characterised by the data. Models for SOC stocks also outperform those for N stocks, likely reflecting the smaller number of sites with N data available, as several locations only report SOC measurements.

The strong local heterogeneity in SOC and N stocks makes model predictions highly sensitive to the spatial origin of the training data. Including or excluding data from geologically distinct subregions, such as the geologically different Herschel Island compared to the mainland, can substantially alter predicted patterns and underscores the uncertainty of extrapolating beyond the training domain. Figure 7 displays the SOC stock distribution for the mainland area when using only training data from the mainland (Fig. 7(1a) and Fig. 1(1b)), in contrast to training a model that also includes data from Herschel Island (Fig. 7(2a) and (2b)). When using all data, including Herschel Island, SOC stocks are lower, with averages of 17.0 and 31.6 kg C m−2 for 0–30 and 30–100 cm depth, respectively. This is in large contrast to using only data derived from the mainland where SOC stocks are considerably higher, with 18.9 and 33.7 kg C m−2. An opposing pattern is displayed in Fig. 8, where using all data results in higher SOC stocks at Herschel Island in comparison to using Herschel Island data only. The map with the largest SOC stocks is produced when interpolating from mainland data to Herschel Island (Fig. 8(2a) and (2b)).

https://soil.copernicus.org/articles/12/113/2026/soil-12-113-2026-f07

Figure 7Comparison of soil organic carbon (SOC) stock for the coastal mainland region using only the mainland data (1a, 1b) and using all data (2a, 2b) (Basemap: ESRI (2024b) | Powered by Esri).

https://soil.copernicus.org/articles/12/113/2026/soil-12-113-2026-f08

Figure 8Comparison of soil organic carbon (SOC) stocks at Herschel Island using only datapoints on Herschel Island (1a, 1b), predicting (extrapolating) SOC only based on data from the mainland (2a, 2b) and using da for the entire area (3a, 3b) (Basemap: ESRI (2024b) | Powered by Esri).

This analysis of the sensitivity of model predictions to the spatial origin of training data highlights that careful data selection is crucial for studies at a regional scale. Data representation and potential site biases are also likely to be important for pan-Arctic SOC and N assessments. Mishra et al. (2021) carried out spatial predictions for three regions separately: the North American, the Eurasian and the Tibetan Permafrost region. Our study shows that differences in SOC stocks are significant even when comparing adjacent local scale areas. As the exclusion or inclusion of certain data has the potential to significantly influence SOC and N stock predictions, a more detailed differentiation, e.g. through geological units or terrain characteristics, would be recommended. Herschel Island in comparison to the mainland, especially the area in close proximity, is particularly interesting, since soil forming factors such as landscape age, climate and vegetation are very similar, however the parent material differs. The area at Ptarmigan Bay consists mostly of glaciofluvial and rolling moraine deposits (Rampton1982). In contrast Herschel Island consists of massive ground ice later buried by marine sediments (Burn2012). These differences in parent material offer different conditions for soil development as their texture and geochemical composition differ. Studies show that carbon enrichment through cryoturbation differs with grain size leading to higher carbon enrichment in higher coarse silt to very fine sand fractions (Palmtag and Kuhry2018).

An assessment of uncertainty and model representation is crucial in digital soil mapping. Hugelius et al. (2014) used upscaling of SOC stocks based on thematic maps representing the physiography and provide an uncertainty assessment in their study that includes a “representation error” in the regions with lower data density. Similar considerations should be incorporated in pan-Arctic studies that use DSM approaches such as machine learning models. The data used in synthesis studies might be highly clustered and can show a high variability on regional and local level. This phenomenon is discussed by Meyer and Pebesma (2022), mentioning that geographic clusters of the data do not necessarily mean gaps in the feature space, i.e. the sample – predictor relationships may still represent the entire area.

To assess whether the presented models of our study are applicable for the total study area, we used the AOA method (Meyer and Pebesma2021). It is expected that the dry areas of riverbeds and areas around lakes on the mainland are excluded after applying the AOA, because these areas do not contain any sampling sites. Areas with larger distance from the coast were not extensively sampled and therefore DI rises with distance from the coast. The average SOC and N stocks were slightly higher compared to the initial prediction when excluding not applicable areas. This shows that the predictions outside the AOA are at the lower end of the spectrum of predicted values (Table 5). We believe that values that are not represented by AOA are more likely to be underpredicted than overpredicted when using random forest for our study area.

The AOA method focuses on the relationship between covariates and data. We also applied the quantile regression forest to the mainland and Herschel Island models for SOC stocks to assess uncertainty. This method estimates uncertainty from two sources: the model itself and variations in the data. Data uncertainty is assessed by repeatedly applying a random forest model to different datasets, while model uncertainty is calculated using quantile regression forests, and both are combined into a final uncertainty map (Yigini et al.2018). For all models there is no clear trend whether a high uncertainty could correlate with a high dissimilarity index (Fig. S8). The quantile regression forest method focuses on the uncertainty on a pixel basis. Multiple model runs are used to assess the sensitivity of the model to available data and quantile regression forest is used to calculate a probability distribution for the values each pixel can take on. In contrast, the AOA method assesses the applicability based on a certain model and the importance of the variables. To best assess the accuracy of the predictions, we recommend using not exclusively one of the two methods.

The selection of the covariates also significantly impacts the prediction power of models. Many studies in digital soil mapping in general (e.g. Hengl et al.2017) but also pan-Arctic studies (Mishra et al.2021) include parameters derived from digital elevation models such as slope, curvature, aspect, flow direction, flow accumulation, etc. Coastal lowland tundra areas generally have very flat terrain and are close to sea level, which are critical aspects when considering the use of topographic derivates. Parameters such as flow direction and flow accumulation may not represent the complexity of flow patterns in IWP terrain (e.g. Speetjens et al.2024). Most models for the whole area and the mainland consider the geographic location, the geology and elevation among the most important variables, reflecting broad-scale spatial patterns, the influence of substrate and soil-forming materials on carbon and nitrogen content, and gradients in temperature, moisture, and drainage that affect soil properties. On Herschel Island, local vegetation characteristics and surface properties such as vegetation height, NDVI and radar backscatter are important predictors of SOC and N stocks, likely reflecting the relatively uniform geology of the island and the greater influence of fine-scale surface indicators compared to broad-scale spatial variables.

A major limitation of our study is the low explanatory power of the models as indicated by the low R2 and Lin's ccc values explaning only up to half of the spatial variability. in SOC and N stocks. This suggest that to achieve high predictive power across this region, systematic additional soil sampling in addition to the compilation of local studies would be needed. A dataset that represents all landscape types, and predictors that co-vary with soil variability across space are needed. Lastly, a quantification of errors in the dataset spanning from differences of point in time of sampling, sampling errors, differences between devices measuring SOC and N were not quantified and outside the scope of this study.

4.3 Comparing local scale results to regional scale synthesis

By integrating all available SOC and N data for the region, this study establishes a benchmark dataset, representing the most comprehensive regional-scale assessment currently possible. Substantial uncertainties remain, stemming from the uneven spatial distribution of sampling points and the inherent challenges of collecting soil data in remote permafrost landscapes. Previous studies in the region often rely on class-based averages with limited region-specific observations, or, in the case of digital soil mapping, coarser-resolution datasets. Here we present our findings in the context of existing local and regional data, emphasizing that our compilation, which incorporates both published and previously unpublished observations, provides an enhanced regional perspective while acknowledging its limitations.

The two mainland sites Ptarmigan Bay and Komakuk Beach were studied on a high spatial resolution (2 m) (Wagner et al.2023) and SOC stocks were 40 and 66.6 kg C m−2 respectively. The results of our study show 63.7 and 59.7 kg C m−2 respectively with the entire area models and 64.3 and 62.1 kg C m−2 with the mainland only models. For Ptarmigan Bay, (Hugelius et al.2013) suggest 92.8 kg C m−2 and (Mishra and Gautam2022) 63.4 kg C m−2. At Komakuk Beach, (Hugelius et al.2013) estimated 18.1 kg C m−2. Furthermore, this study shows that there are high local differences between different geographic areas when comparing different individual high resolution studies. In contrast, the results of the regional mapping lead to similar results for Komakuk Beach and Ptarmigan Bay. However, the SOC stocks for Komakuk Beach using the model with all data is lower than using the mainland only model. This could be partially explained by the inclusion of data from Herschel Island. Our study included more samples than Obu et al. (2017) and our methods differ (machine learning in contrast to class matching), leading to different average SOC stocks when comparing both studies. Obu et al. (2017) show 34.8 kg C m−2 for the upper metre of soil while our results show 38.0 kg C m−2 with the Herschel Island models (Table S3). Large scale studies seemed to overpredict SOC stocks at Herschel Island. The estimated SOC stocks in the upper metre are according to Hugelius et al. (2013) 55.3 kg C m−2 at Herschel Island and 58.6 kg C m−2 according to Mishra and Gautam (2022). For the mainland, Hugelius et al. (2013) provide an average of 64.5 kg C m−2 and Mishra and Gautam (2022) 52.8 kg C m−2, though the latter does not account for the whole mainland area due to data gaps. Our study shows 52.5 kg C m−2 for the mainland, which lies in between the studies mentioned above (Table S3).

Our study provides the most up-to-date and comprehensive assessment of soil C and N stocks for the region. Our results highlight that geological and landscape differences within the region play a key role in SOC and N distribution, emphasizing the need to consider these factors in regional-scale analyses and future sampling strategies. The dataset by Mishra and Gautam (2022) has a data gap at Komakuk Beach, west of Komakuk Beach and for a large portion of the area where DTLBs were sampled. This shows that the coastal area lacks accurate maps of SOC stocks. Hence, our study contributes in complementing existing studies and provides moderate resolution datasets at regional scale. Furthermore, the fact that local SOC stocks can vary widely between studies needs to be taken into account in future carbon budget studies.

5 Conclusion

Our study combines existing soil data from different studies that have been carried out at the Yukon coastal plain, providing an estimation of regional SOC and N stocks at 10 m spatial resolution for the top metre of soil. Furthermore, our study complements pan-Arctic estimates, especially as coastal areas are not well represented and there are large data gaps in pan-Arctic datasets. The average SOC stock for the upper metre for the entire area is 48.7 ± 6.6 kg m−2 and the N stock 3.03 ± 0.30 kg m−2. Separate models provide for Herschel Island the values of 38.0 ± 5.6 kg C m−2 and 2.87 ± 0.34 kg N m−2 respectively, at the mainland 52.2 ± 6.3 kg C m−2 and 3.15 ± 0.32 kg N m−2. Users should select the model according to the intended scale: the whole area model for regional assessments, and the separate models for site-specific applications.

Our study leads to several recommendations. First, we want to highlight that the usage/exclusion of data might lead to very different model results, especially when extrapolating to areas with differences in geological genesis or soil parent material composition. We therefore recommend that pan-Arctic studies not only subdivide data into geographic regions, but also following landscape history. Second, we recommend the usage of the AOA to assess whether the relation between covariates and target variable is represented across the whole study area, as this approach can also help to determine new sampling sites. This requires additional spatial regional datasets that can be used as covariates that represent the soil forming factors. Third, we recommend using quantile regression forest to estimate the model uncertainty. Overall, we conclude that multi-scale studies are needed for better predictions of global soil C budgets in the wake of climate change, because they can provide a potent bridge between high resolution local studies and moderate resolution pan-Arctic studies. This paper highlights challenges in combining data from existing studies for DSM applications, which should be considered in future research. The use of existing datasets can substantially reduce costs, yet challenges due to e.g. spatial clustering of the data remain. Our findings can inform the design of future sampling campaigns in remote permafrost regions, aiming to maximize efficiency while minimizing expenses.

Data availability

The datasets and soil data will be made publicly available in the Bolin Centre Database: https://bolin.su.se/data/ (last access: 2 February 2026).

Supplement

The supplement related to this article is available online at https://doi.org/10.5194/soil-12-113-2026-supplement.

Author contributions

JW is the lead author and conceptualised the study, did the analysis and wrote a first draft. JWo provided new datapoints that were not previously published and contributed text to the manuscript. AB provided remote sensing datasets and JR contributed soil data from a previous study. All co-authors provided input to writing of the manuscript.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Acknowledgements

The authors would like to thank George Tanski for help with coring during the Yukon Coast 2019 spring field campaign and Justin Lindemann for assistance in AWI laboratories. We thank the Yukon Territorial Government, the Yukon Parks office (Herschel Island Qikiqtaruk Territorial Park), Parks Canada office (Ivvavik National Park) and the Aurora Research Institute and Aurora College (ARI) in Inuvik, NT, for administrative and logistical support for our fieldwork.

Financial support

This study is part of the Nunataryuk project, which received funding under the European Union's Horizon 2020 Research and Innovation Programme under grant agreement no. 773421. JWo was funded by German Research Foundation (DFG) (grant no. DFG WO 2420/2-1). JR received additional funding from the Swedish Academy of Science (Formas) (grant no. FR‐2021/0004). GH received support from the European Union's Horizon Europe Research and Innovation Programme to the ILLUQ project (grant no. 773421) and the Swedish Research Council VR (grant no. 2022-04839).

The publication of this article was funded by the Swedish Research Council, Forte, Formas, and Vinnova.

Review statement

This paper was edited by Bas van Wesemael and reviewed by two anonymous referees.

References

Bartsch, A., Widhalm, B., and Pointner, G.: Vegetation height derived from Sentinel-1 and Sentinel-2 satellite data (2015–2018) for tundra regions, PANGAEA [data set], https://doi.org/10.1594/PANGAEA.897045, 2019a. a, b

Bartsch, A., Widhalm, B., Pointner, G., Ermokhina, K., Leibman, M., and Heim, B.: Landcover derived from Sentinel-1 and Sentinel-2 satellite data (2015–2018) for Subarctic and Arctic environments, PANGAEA [data set], https://doi.org/10.1594/PANGAEA.897916, 2019b. a

Bartsch, A., Efimova, A., Widhalm, B., Muri, X., von Baeckmann, C., Bergstedt, H., Ermokhina, K., Hugelius, G., Heim, B., and Leibman, M.: Circumarctic land cover diversity considering wetness gradients, Hydrology and Earth System Sciences, 28, 2421–2481, https://doi.org/10.5194/hess-28-2421-2024, 2024a. a, b, c, d

Bartsch, A., Khairullin, R., Efimova, A., Widhalm, B., Muri, X., von Baeckmann, C., Bergstedt, H., Ermokhina, K., Hugelius, G., Heim, B., Leibman, M., and Gruber, C.: Circumarctic Landcover Units, Zenodo [data set], https://doi.org/10.5281/zenodo.14235736, 2024b. a

Biskaborn, B. K., Smith, S. L., Noetzli, J., Matthes, H., Vieira, G., Streletskiy, D. A., Schoeneich, P., Romanovsky, V. E., Lewkowicz, A. G., Abramov, A., Allard, M., Boike, J., Cable, W. L., Christiansen, H. H., Delaloye, R., Diekmann, B., Drozdov, D., Etzelmüller, B., Grosse, G., Guglielmin, M., Ingeman-Nielsen, T., Isaksen, K., Ishikawa, M., Johansson, M., Johannsson, H., Joo, A., Kaverin, D., Kholodov, A., Konstantinov, P., Kröger, T., Lambiel, C., Lanckman, J. P., Luo, D., Malkova, G., Meiklejohn, I., Moskalenko, N., Oliva, M., Phillips, M., Ramos, M., Sannel, A. B. K., Sergeev, D., Seybold, C., Skryabin, P., Vasiliev, A., Wu, Q., Yoshikawa, K., Zheleznyak, M., and Lantuit, H.: Permafrost is warming at a global scale, Nature Communications, 10, 1–11, https://doi.org/10.1038/s41467-018-08240-4, 2019. a

Breiman, L.: Bagging predictors, Machine Learning, 24, 123–140, https://doi.org/10.1007/bf00058655, 1996. a

Breiman, L.: Random Forests, Machine Learning, 45, 5–32, https://doi.org/10.1023/A:1010933404324, 2001. a

Burn, C. R.: Herschel Island: Qikiqtaryuk: A Natural & Cultural History of Yukon's Arctic Island, Calgary, AB: University of Calgary Press, Calgary, Alberta, 1st edn., ISBN-13: 978-0988000919, 2012. a

Couture, N., Irrgang, A. M., Pollard, W. H., Lantuit, H., and Fritz, M.: Soil organic carbon (SOC) and sediment contents and fluxes for terrain units along the Yukon Coastal Plain, Canada, including corrections for volumetric ground ice contents, PANGAEA [data set], https://doi.org/10.1594/PANGAEA.884689, 2018a. a, b, c

Couture, N. J. and Pollard, W. H.: A Model for Quantifying Ground‐Ice Volume, Yukon Coast, Western Arctic Canada, Permafrost and Periglacial Processes, 28, 534–542, https://doi.org/10.1002/ppp.1952, 2017. a

Couture, N. J., Irrgang, A., Pollard, W., Lantuit, H., and Fritz, M.: Coastal Erosion of Permafrost Soils Along the Yukon Coastal Plain and Fluxes of Organic Carbon to the Canadian Beaufort Sea, Journal of Geophysical Research: Biogeosciences, https://doi.org/10.1002/2017JG004166, 2018b. a, b, c, d

Ehlers, J., Gibbard, P. L., and Hughes, P. D.: Quaternary Glaciations – Extent and Chronology: a closer look, Elsevier Science, ISBN 9780444535375, 2011. a

ESRI: World terrain base, https://www.arcgis.com/home/item.html?id=33064a20de0c48d2bb61efa8faca93a8 (last access: 4 July 2024), 2024a. a, b

ESRI: World Hillshade, https://www.arcgis.com/home/item.html?id=1b243539f4514b6ba35e7d995890db1d (last access: 5 March 2025), 2024b. a, b, c, d

Fritz, M., Wetterich, S., Meyer, H., Schirrmeister, L., Lantuit, H., and Pollard, W. H.: Origin and characteristics of massive ground ice on Herschel Island (western Canadian Arctic) as revealed by stable water isotope and Hydrochemical signatures, Permafrost and Periglacial Processes, 22, 26–38, https://doi.org/10.1002/ppp.714, 2011. a

Fritz, M., Wetterich, S., Schirrmeister, L., Meyer, H., Lantuit, H., Preusser, F., and Pollard, W. H.: Eastern Beringia and beyond: Late Wisconsinan and Holocene landscape dynamics along the Yukon Coastal Plain, Canada, Palaeogeography, Palaeoclimatology, Palaeoecology, 319–320, 28–45, https://doi.org/10.1016/j.palaeo.2011.12.015, 2012. a

Fritz, M., Wolter, J., Rudaya, N., Palagushkina, O., Nazarova, L., Obu, J., Rethemeyer, J., Lantuit, H., and Wetterich, S.: Holocene ice-wedge polygon development in northern Yukon permafrost peatlands (Canada), Quaternary Science Reviews, 147, 279–297, https://doi.org/10.1016/j.quascirev.2016.02.008, 2016. a

Government of Canada: Canadian Climate Normals and Averages, http://climate.weather.gc.ca/climate_normals/index_e.html (last access: 25 May 2024), 2024. a

Hengl, T., de Jesus, J. M., Heuvelink, G. B. M., Gonzalez, M. R., Kilibarda, M., Blagotić, A., Shangguan, W., Wright, M. N., Geng, X., Bauer-Marschallinger, B., Guevara, M. A., Vargas, R., MacMillan, R. A., Batjes, N. H., Leenaars, J. G. B., Ribeiro, E., Wheeler, I., Mantel, S., and Kempen, B.: SoilGrids250m: Global gridded soil information based on machine learning, PloS one, 12, e0169748, https://doi.org/10.1371/journal.pone.0169748, 2017. a

Hugelius, G., Bockheim, J. G., Camill, P., Elberling, B., Grosse, G., Harden, J. W., Johnson, K., Jorgenson, T., Koven, C. D., Kuhry, P., Michaelson, G., Mishra, U., Palmtag, J., Ping, C.-L., O'Donnell, J., Schirrmeister, L., Schuur, E. A. G., Sheng, Y., Smith, L. C., Strauss, J., and Yu, Z.: A new data set for estimating organic carbon storage to 3 m depth in soils of the northern circumpolar permafrost region, Earth System Science Data, 5, 393–402, https://doi.org/10.5194/essd-5-393-2013, 2013. a, b, c, d

Hugelius, G., Strauss, J., Zubrzycki, S., Harden, J. W., Schuur, E. A. G., Ping, C.-L., Schirrmeister, L., Grosse, G., Michaelson, G. J., Koven, C. D., O'Donnell, J. A., Elberling, B., Mishra, U., Camill, P., Yu, Z., Palmtag, J., and Kuhry, P.: Estimated stocks of circumpolar permafrost carbon with quantified uncertainty ranges and identified data gaps, Biogeosciences, 11, 6573–6593, https://doi.org/10.5194/bg-11-6573-2014, 2014. a, b

Irrgang, A. M., Lantuit, H., Manson, G. K., Günther, F., Grosse, G., and Overduin, P. P.: Variability in Rates of Coastal Change Along the Yukon Coast, 1951 to 2015, Journal of Geophysical Research: Earth Surface, 123, 779–800, https://doi.org/10.1002/2017JF004326, 2018. a, b

Kuhn, M.: Building Predictive Models in R Using the caret Package, Journal of Statistical Software, 28, https://doi.org/10.18637/jss.v028.i05, 2008. a

Kuhn, M.: caret: Classification and Regression Training, R package version 6.0-92, CRAN [code], https://cran.r-project.org/web/packages/caret/ (last access: 2 February 2026), 2022. a

Kursa, M. B. and Rudnicki, W. R.: Feature selection with the boruta package, Journal of Statistical Software, 36, 1–13, https://doi.org/10.18637/jss.v036.i11, 2010. a, b

Lagacherie, P.: Digital soil mapping: A state of the art, Springer Netherlands, 3–14, ISBN 9781402085918, https://doi.org/10.1007/978-1-4020-8592-5_1, 2008. a

Lantuit, H. and Pollard, W. H.: Fifty years of coastal erosion and retrogressive thaw slump activity on Herschel Island, southern Beaufort Sea, Yukon Territory, Canada, Geomorphology, 95, 84–102, https://doi.org/10.1016/j.geomorph.2006.07.040, 2008. a

Lantuit, H., Overduin, P. P., Couture, N., Wetterich, S., Aré, F., Atkinson, D., Brown, J., Cherkashov, G., Drozdov, D., Forbes, L. D., Graves-Gaylord, A., Grigoriev, M., Hubberten, H. W., Jordan, J., Jorgenson, T., Ødegård, R. S., Ogorodov, S., Pollard, W. H., Rachold, V., Sedenko, S., Solomon, S., Steenhuisen, F., Streletskaya, I., and Vasiliev, A.: The Arctic Coastal Dynamics Database: A New Classification Scheme and Statistics on Arctic Permafrost Coastlines, Estuaries and Coasts, 35, 383–400, https://doi.org/10.1007/s12237-010-9362-6, 2012. a

Liaw, A. and Wiener, M.: Classifcation and Regression by randomForest, R News, 2, 18–22, https://cran.r-project.org/doc/Rnews/ (last access: 2 February 2026), 2002. a, b

Lin, L. I.-K.: A Concordance Correlation Coefficient to Evaluate Reproducibility, Biometrics, 45, 255, https://doi.org/10.2307/2532051, 1989. a

McBratney, A. B., Santos, M. L. M., and Minasny, B.: On digital soil mapping, Geoderma, 117, 3–52, https://doi.org/10.1016/S0016-7061(03)00223-4, 2003. a

Meredith, M., Sommerkorn, M., Cassotta, S., Derksen, C., Ekaykin, A., Hollo, A., Kofinas, G., Mackintosh, A., Melbourne-Thomas, J., Muelbert, M. M., Ottersen, G., Pritchard, H., and Schuur, E. A.: Polar Regions, Cambridge University Press, 203–320, https://doi.org/10.1017/9781009157964.005, 2022. a

Meyer, H. and Pebesma, E.: Predicting into unknown space? Estimating the area of applicability of spatial prediction models, Methods in Ecology and Evolution, 12, 1620–1633, https://doi.org/10.1111/2041-210X.13650, 2021. a, b, c, d

Meyer, H. and Pebesma, E.: Machine learning-based global maps of ecological variables and the challenge of assessing them, Nature Communications, 13, 1–4, https://doi.org/10.1038/s41467-022-29838-9, 2022. a

Meyer, H., Ludwig, M., Milà, C., Linnenbrink, J., and Schumacher, F.: The CAST package for training and assessment of spatial prediction models in R, arXiv [preprint], https://doi.org/10.48550/arXiv.2404.06978, 10 April 2024. a

Mishra, U. and Gautam, S.: Spatial heterogeneity and environmental predictors of permafrost region soil organic carbon stocks, DRYAD [data set], https://doi.org/10.7941/D1GD1H, 2022. a, b, c, d

Mishra, U., Hugelius, G., Shelef, E., Yang, Y., Strauss, J., Lupachev, A., Harden, J. W., Jastrow, J. D., Ping, C. L., Riley, W. J., Schuur, E. A. G., Matamala, R., Siewert, M., Nave, L. E., Koven, C. D., Fuchs, M., Palmtag, J., Kuhry, P., Treat, C. C., Zubrzycki, S., Hoffman, F. M., Elberling, B., Camill, P., Veremeeva, A., and Orr, A.: Spatial heterogeneity and environmental predictors of permafrost region soil organic carbon stocks, Science Advances, 7, https://doi.org/10.1126/sciadv.aaz5236, 2021. a, b, c, d, e

Obu, J.: How Much of the Earth's Surface is Underlain by Permafrost?, Journal of Geophysical Research: Earth Surface, 126, 1–5, https://doi.org/10.1029/2021JF006123, 2021. a

Obu, J., Lantuit, H., Myers-Smith, I. H., Heim, B., Wolter, J., and Fritz, M.: Permafrost cores and active layer pits on Herschel Island: link to shape files, PANGAEA [data set], https://doi.org/10.1594/PANGAEA.859661, 2016. a, b

Obu, J., Lantuit, H., Myers-Smith, I., Heim, B., Wolter, J., and Fritz, M.: Effect of Terrain Characteristics on Soil Organic Carbon and Total Nitrogen Stocks in Soils of Herschel Island, Western Canadian Arctic, Permafrost and Periglacial Processes, 28, 92–107, https://doi.org/10.1002/ppp.1881, 2017. a, b, c, d, e, f, g, h, i, j, k

Obu, J., Westermann, S., Bartsch, A., Berdnikov, N., Christiansen, H. H., Dashtseren, A., Delaloye, R., Elberling, B., Etzelmüller, B., Kholodov, A., Khomutov, A., Kääb, A., Leibman, M. O., Lewkowicz, A. G., Panda, S. K., Romanovsky, V., Way, R. G., Westergaard-Nielsen, A., Wu, T., Yamkhin, J., and Zou, D.: Northern Hemisphere permafrost map based on TTOP modelling for 2000–2016 at 1 km2 scale, Earth-Science Reviews, 193, 299–316, https://doi.org/10.1016/j.earscirev.2019.04.023, 2019. a

Pachepsky, Y. and Hill, R. L.: Scale and scaling in soils, Geoderma, 287, 4–30, https://doi.org/10.1016/j.geoderma.2016.08.017, 2017. a

Palmtag, J. and Kuhry, P.: Grain size controls on cryoturbation and soil organic carbon density in permafrost-affected soils, Permafrost and Periglacial Processes, 29, 112–120, https://doi.org/10.1002/ppp.1975, 2018. a

Palmtag, J., Cable, S., Christiansen, H. H., Hugelius, G., and Kuhry, P.: Landform partitioning and estimates of deep storage of soil organic matter in Zackenberg, Greenland, The Cryosphere, 12, 1735–1744, https://doi.org/10.5194/tc-12-1735-2018, 2018. a, b

Palmtag, J., Obu, J., Kuhry, P., Richter, A., Siewert, M. B., Weiss, N., Westermann, S., and Hugelius, G.: A high spatial resolution soil carbon and nitrogen dataset for the northern permafrost region based on circumpolar land cover upscaling, Earth System Science Data, 14, 4095–4110, https://doi.org/10.5194/essd-14-4095-2022, 2022. a

Ping, C., Clark, M. H., Kimble, J. M., Michaelson, G. J., Shur, Y., and Stiles, C. A.: Sampling Protocols for Permafrost‐Affected Soils, Soil Horizons, 54, 13–19, https://doi.org/10.2136/sh12-09-0027, 2013. a

Porter, C., Morin, P., Howat, I., Noh, M., Bates, B., Peterman, K., Keesey, S., Schlenk, M., Gardiner, J., Tomko, K., Willis, M., Kelleher, C., Cloutier, M., Husby, E., Foga, S., Nakamura, H., Platson, M., Wethington, M. J., Williamson, C., Bauer, G., Enos, J., Arnold, G., Kramer, W., Becker, P., Doshi, A., D’Souza, C., Cummens, P., Laurier, F., and Bojesen, M.: ArcticDEM, Harvard Dataverse [data set], https://doi.org/10.7910/DVN/OHHUKH, 2018. a

R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0, http://www.R-project.org (last access: 12 August 2017), 2021. a

Ramage, J. L., Fortier, D., Hugelius, G., Lantuit, H., and Morgenstern, A.: Distribution of carbon and nitrogen along hillslopes in three valleys on Herschel Island, Yukon Territory, Canada, Catena, 178, 132–140, https://doi.org/10.1016/j.catena.2019.02.029, 2019. a, b, c, d, e, f, g

Rampton, V. R.: Surficial geology, Yukon Coastal Plain, Yukon Geological Survey, https://doi.org/10.4095/126954, 1982. a, b, c, d, e, f, g, h, i

Rantanen, M., Karpechko, A. Y., Lipponen, A., Nordling, K., Hyvärinen, O., Ruosteenoja, K., Vihma, T., and Laaksonen, A.: The Arctic has warmed nearly four times faster than the globe since 1979, Communications Earth and Environment, 3, 1–10, https://doi.org/10.1038/s43247-022-00498-3, 2022. a

Schuur, E. A. G., McGuire, A. D., Schädel, C., Grosse, G., Harden, J. W., Hayes, D. J., Hugelius, G., Koven, C. D., Kuhry, P., Lawrence, D. M., Natali, S. M., Olefeldt, D., Romanovsky, V. E., Schaefer, K., Turetsky, M. R., Treat, C. C., and Vonk, J. E.: Climate change and the permafrost carbon feedback, Nature, 520, 171–179, https://doi.org/10.1038/nature14338, 2015. a

Siewert, M. B.: High-resolution digital mapping of soil organic carbon in permafrost terrain using machine learning: a case study in a sub-Arctic peatland environment, Biogeosciences, 15, 1663–1682, https://doi.org/10.5194/bg-15-1663-2018, 2018. a, b

Siewert, M. B., Hanisch, J., Weiss, N., Kuhry, P., Maximov, T. C., and Hugelius, G.: Comparing carbon storage of Siberian tundra and taiga permafrost ecosystems at very high spatial resolution, Journal of Geophysical Research: Biogeosciences, 120, 1973–1994, https://doi.org/10.1002/2015JG002999, 2015. a

Siewert, M. B., Hugelius, G., Heim, B., and Faucherre, S.: Landscape controls and vertical variability of soil organic carbon storage in permafrost-affected soils of the Lena River Delta, Catena, 147, 725–741, https://doi.org/10.1016/j.catena.2016.07.048, 2016. a

Siewert, M. B., Lantuit, H., Richter, A., and Hugelius, G.: Permafrost Causes Unique Fine-Scale Spatial Variability Across Tundra Soils, Global Biogeochemical Cycles, 35, e2020GB006659, https://doi.org/10.1029/2020GB006659, 2021a. a, b, c, d, e, f, g, h, i, j, k, l

Siewert, M. B., Lantuit, H., Richter, A., and Hugelius, G.: Soil organic carbon, active layer depth and visible ground ice content for Herschel Island (Qikiqtaruk), PANGAEA [data set], https://doi.org/10.1594/PANGAEA.927861, 2021b. a

Signorell, A.: DescTools: Tools for Descriptive Statistics, R package version 0.99.60, CRAN [code], https://doi.org/10.32614/CRAN.package.DescTools, 2025. a

Speetjens, N. J., Berghuijs, W. R., Wagner, J., and Vonk, J. E.: Degradation of ice-wedge polygons leads to increased fluxes of water and DOC, Science of The Total Environment, 920, 170931, https://doi.org/10.1016/j.scitotenv.2024.170931, 2024. a

Strauss, J., Abbott, B. W., Hugelius, G., Schuur, E., Treat, C., Fuchs, M., Schädel, C., Ulrich, M., Turetsky, M., Keuschnig, M., Biasi, C., Yang, Y., and Grosse, G.: 9. Permafrost, in: Recarbonizing global soils – A technical manual of recommended management practices, FAO, 127–147, https://www.fao.org/publications/card/en/c/CB6378EN/ (last access: 2 February 2026), 2021. a

The Government of Yukon: Watercourses, Mapdataservice, https://map-data.service.yukon.ca/GeoYukon/Base/Watercourses_1M/ (last access: 3 December 2025), 2025. a

Turetsky, M. R., Abbott, B. W., Jones, M. C., Anthony, K. W., Olefeldt, D., Schuur, E. A. G., Grosse, G., Kuhry, P., Hugelius, G., Koven, C., Lawrence, D. M., Gibson, C., Sannel, A. B. K., and McGuire, A. D.: Carbon release through abrupt permafrost thaw, Nature Geoscience, 13, 138–143, https://doi.org/10.1038/s41561-019-0526-0, 2020. a

Virkkala, A.-M., Niittynen, P., Kemppinen, J., Marushchak, M. E., Voigt, C., Hensgens, G., Kerttula, J., Happonen, K., Tyystjärvi, V., Biasi, C., Hultman, J., Rinne, J., and Luoto, M.: High-resolution spatial patterns and drivers of terrestrial ecosystem carbon dioxide, methane, and nitrous oxide fluxes in the tundra, Biogeosciences, 21, 335–355, https://doi.org/10.5194/bg-21-335-2024, 2024. a

Wagner, J., Martin, V., Speetjens, N. J., A'Campo, W., Durstewitz, L., Lodi, R., Fritz, M., Tanski, G., Vonk, J. E., Richter, A., Bartsch, A., Lantuit, H., and Hugelius, G.: High resolution mapping shows differences in soil carbon and nitrogen stocks in areas of varying landscape history in Canadian lowland tundra, Geoderma, 438, 116652, https://doi.org/10.1016/j.geoderma.2023.116652, 2023. a, b, c, d, e, f, g, h, i, j, k, l

Wagner, J., Martin, V. S., Speetjens, N. J., A'Campo, W., Durstewitz, L., Lodi, R., Fritz, M., Tanski, G., Vonk, J. E., Richter, A., Bartsch, A., Lantuit, H., and Hugelius, G.: High resolution datasets of soil organic carbon and nitrogen stocks in Canadian lowland tundra, Dataset version 1, Bolin Centre Database [data set], https://doi.org/10.17043/wagner-2024-soc-1, 2024. a

Walker, D. A., Reynolds, M. K., Daniëls, F. J., Einarsson, E., Elvebakk, A., Gould, W. A., Katenin, A. E., Kholod, S. S., Markon, C. J., Melnikov, E. S., Moskalenko, N. G., Talbot, S. S., Yurtsev, B. A., Bliss, L. C., Edlund, S. A., Zoltai, S. C., Wilhelm, M., Bay, C., Gudjónsson, G., Moskalenko, N. G., Ananjeva, G. V., Drozdov, D. S., Konchenko, L. A., Korostelev, Y. V., Melnikov, E. S., Ponomareva, O. E., Matveyeva, N. V., Safranova, I. N., Shelkunova, R., Polezhaev, A. N., Johansen, B. E., Maier, H. A., Murray, D. F., Fleming, M. D., Trahan, N. G., Charron, T. M., Lauritzen, S. M., and Vairin, B. A.: The Circumpolar Arctic vegetation map, Journal of Vegetation Science, 16, 267–282, https://doi.org/10.1111/j.1654-1103.2005.tb02365.x, 2005. a

Widhalm, B., Bartsch, A., and Goler, R.: Normalized C-HH backscatter from Sentinel-1 (December, 2014–2017) for selected tundra regions, PANGAEA [data set], https://doi.org/10.1594/PANGAEA.897046, 2019. a, b

Wolter, J., Lantuit, H., Fritz, M., Macias-Fauria, M., Myers-Smith, I., and Herzschuh, U.: Vegetation composition and shrub extent on the Yukon coast, Canada, are strongly linked to ice-wedge polygon degradation, Polar Research, https://doi.org/10.3402/polar.v35.27489, 2016. a

Wolter, J., Lantuit, H., Wetterich, S., Rethemeyer, J., and Fritz, M.: Climatic, geomorphologic and hydrologic perturbations as drivers for mid- to late Holocene development of ice-wedge polygons in the western Canadian Arctic, Permafrost and Periglacial Processes, 29, 164–181, https://doi.org/10.1002/ppp.1977, 2018. a, b

Wolter, J., Jones, B. M., Fuchs, M., Breen, A., Bussmann, I., Koch, B., Lenz, J., Myers-Smith, I. H., Sachs, T., Strauss, J., Nitze, I., and Grosse, G.: Post-drainage vegetation, microtopography and organic matter in Arctic drained lake basins, Environmental Research Letters, 19, 045001, https://doi.org/10.1088/1748-9326/ad2eeb, 2024. a

Yigini, Y., Olmedo, G., Reiter, S., Baritz, R., Viatkin, K., and Vargas, R.: Soil Organic Carbon Mapping Cookbook, 2nd edn., FAO, https://openknowledge.fao.org/items/74551bea-cf4d-40fc-95f6-47b48f77597e (last access: 2 February 2026), 2018. a, b

Download
Short summary
Permafrost soils store vast amounts of organic carbon, key to understanding climate change. This study uses machine learning and combines existing data with new field data to create detailed regional maps of soil carbon and nitrogen stocks for the Yukon coastal plain. The results show how soil properties vary across the landscape highlighting the importance of data selection for accurate predictions. These findings improve carbon storage estimates and may aid regional carbon budget assessments.
Share