An open Soil Structure Library based on X-ray CT data

Soil structure in terms of the spatial arrangement of pores and solids is highly relevant for most physical and biochemical processes in soil. While this has been known for a long time, a scientific approach to quantify soil structural characteristics was also missing for a long time. This was due to its buried nature but also due to its three-dimensional complexity. During the last decades, tools to acquire full 3D images of undisturbed soil became more and more available and a number of powerful software tools were developed to reduce the complexity to a set of meaningful numbers. However, the standardization of soil structure analysis for a better comparability of the results is not well developed and the accessibility of required computing facilities and software is limited. Here, we introduce an open-access Soil Structure Library which offers well-defined soil structure analyses for X-ray CT (computed tomography) data sets uploaded by interested scientists. At the same time, the aim of this library is to serve as an open data source for real pore structures as developed in a wide spectrum of different soil types under different site conditions all over the globe, by making accessible the uploaded binarized 3D images. By combining pore structure metrics with essential soil information requested during upload (e.g., bulk density, texture, organic carbon content), this Soil Structure Library can be harnessed towards data mining and development of soil-structure-based pedotransfer functions. In this paper, we describe the architecture of the Soil Structure Library and the provided metrics. This is complemented by an example of how the database can be used to address new research questions.


Introduction
Soil structure is of central importance for soil functions. Besides its relevance for plant growth, this is also true for the storage and movement of water and solutes inside the soil pore system, for biochemical matter cycling and for soil as a habitat for a myriad of interacting organisms (Dexter, 1988; Rabot et al., 2018).
For a long time, a crucial hurdle in exploring soil structure was that soil is opaque so that soil structural properties were hardly accessible. This was especially true with respect to quantitative analysis as required for any scientific evaluation.
Over the last three decades, the development and increasing availability of X-ray CT scanners have put us in a position to quantify soil structure without disturbance in full three dimensions and with a spatial resolution of a few micrometers or even below. This has generated a wealth of scientific insight, especially with respect to the soil pore structure in relation to water dynamics and solute transport (Wildenschild and Sheppard, 2013; Larsbo et al., 2014; Tracy et al., 2015). More recently, the importance of soil structure for the turnover of organic matter (Kravchenko et al., 2019; Schlüter et al., 2022b) (Juarez et al., 2013; Falconer et al., 2012; Juyal et al., 2019) has also been studied based on 3D images.
Fortunately, the available computing power increased together with the size of the images generated by X-ray CT scanners. However, when it comes to the calculation of distance distributions and connectivity measures, a considerable amount of computing power is required which often exceeds the capacity of standard computers. Another difficulty is the lack of comparability of the results since the detailed algorithms to calculate soil structural attributes such as connectivity or pore size distribution are not always obvious. Hence some standardized analysis would be beneficial to generate results that are comparable among different studies.
The motivation of the Soil Structure Library (https://structurelib.ufz.de/, last access: 22 July 2022) introduced in this paper is to offer some standardized analysis of the 3D pore structure obtained from X-ray CT together with the required computing power. The price we charge for this service is that the analyzed structures are made freely available through our website together with the metadata describing the soil. This should generate substantial benefit both for the data providers, who get standardized analysis for their CT images, and for the wider scientific community, which gets access to a wide range of soil structure data including additional information on the specific climate, land use and soil type.
It should be noted that the provided analysis is limited to the analysis of binary images. This means that the user needs to upload images which are already segmented into pore and solid. We are aware that segmentation is a crucial step in image analysis and there are no objective procedures on how to do it (Baveye et al., 2010; Schlüter et al., 2014). This is why we prefer to leave this step to the data owner who, however, has to upload at least one 2D image of the original grayscale CT image so that the effect of the segmentation process is illustrated and can be understood by others.

General description of the Soil Structure Library
Our Soil Structure Library is open to anybody, with a clear focus on the soil science community. To upload image data, the user has to register and accept the data policy. For each uploaded data set, metadata on soil and site properties are requested. A part of this information is mandatory, another part is optional. Table 1 gives an overview of the metadata.
Although many of the parameters are listed as optional, it is highly recommended to provide a rather complete list as this information will make the data much more valuable for others. The complete content of the database, including the 3D binarized X-ray images, is provided under Creative Commons and thus can be used for further research by the entire scientific community.
The architecture of the library comprises three servers: a web server, a database server and an image processor. The web server hosts the user frontend and manages the user administration, data input, file uploads and the presentation of results and metadata. It is implemented in Django, which integrates data modeling and web service based on Python. Django communicates with the database server, which is configured as a MySQL server. The image processing is triggered as soon as the database server receives new data from the frontend. The data are transferred to the image processor, a Linux workstation, where an ImageJ macro is launched. Upon completion, the results are uploaded to the database server. Simultaneously, the web server sends an email to the submitter to inform them about the completion of the calculations.
All machines except the web server are behind a firewall. The only connection across this firewall is the database connection. All other required connections are realized behind the firewall. The modular structure makes it possible to provide further computing power (i.e., a computing cluster) when needed.

Image files
Three-dimensional binary images of the pore structure can either be uploaded in the popular 3D TIFF format or in the MHD/raw format common in the Insight Segmentation and Registration Toolkit (ITK). Zero values will be distinguished from non-zero values so the actual gray value of the non-zero phase, e.g., 1 or 255, does not matter. It is optional whether zero should be the foreground or background phase (i.e., pore or solid). Also, it is optional to upload a mask to specify a region of interest (ROI) for which the analysis is required. Typical examples would be ROIs for cylindrical soil cores or irregularly shaped soil aggregates. The ROI image can be uploaded in the form of a binary image (TIFF or MHD/raw) with the same 3D dimensions as the uploaded image or in the form of a selection (ROI format) to be created and exported in ImageJ. This selection is a two-dimensional image and will be applied to all slices equally (e.g., a circular selection will result in a cylindrical mask). All files have to be uploaded in one compressed ZIP folder. The name of the ROI has to be provided upon upload. The remaining image file is considered as the pore structure image.
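The conventions described above (any non-zero gray value counts as one phase, and a 2D selection is applied to every slice) can be illustrated with a minimal NumPy sketch. It is not the library's ImageJ macro: the function names, the assumption that slices are stacked along axis 0 and the toy image are all hypothetical and serve illustration only.

```python
import numpy as np

def binarize(image, pores_are_zero=False):
    """Treat every non-zero gray value (e.g. 1 or 255) as the same phase."""
    nonzero = np.asarray(image) != 0
    # Optionally swap the phases if zero encodes the pores (user's choice).
    return ~nonzero if pores_are_zero else nonzero

def extrude_roi(roi_2d, n_slices):
    """Apply one 2D selection to all slices (assumed stacked along axis 0)."""
    roi = np.asarray(roi_2d) != 0
    return np.broadcast_to(roi, (n_slices,) + roi.shape)

def circular_selection(ny, nx):
    """Hypothetical helper: largest centred disc within an ny-by-nx slice."""
    y, x = np.ogrid[:ny, :nx]
    cy, cx = (ny - 1) / 2, (nx - 1) / 2
    radius = min(ny, nx) / 2
    return (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2

# Toy image with gray values 0 and 255 (synthetic, for illustration only).
rng = np.random.default_rng(0)
image = rng.choice([0, 255], size=(4, 9, 9))

pores = binarize(image)
roi = extrude_roi(circular_selection(9, 9), image.shape[0])  # cylindrical mask
porosity_in_roi = pores[roi].mean()  # pore voxel fraction inside the ROI
```

Because the circular selection is repeated unchanged on every slice, the resulting 3D mask is a cylinder, and all metrics would then be evaluated only on voxels where `roi` is true.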
In addition to the 3D image, two-dimensional slices in at least one but preferably in all three principal directions have to be uploaded both in grayscale and binary form in order to provide information on the quality of image segmentation. Since segmentation has to be done beforehand, this is the only form of quality control available in the Soil Structure Library to judge whether outliers are caused by natural variation or by improper segmentation.

Image analysis
The segmented image is processed with a standardized workflow implemented as an ImageJ macro that is executed in the Fiji distribution of ImageJ (Schindelin et al., 2012) and associated plugins. The binary image undergoes several transforms to extract a limited set of meaningful pore metrics (Fig. 1). A complete list of quantitative output with units is summarized in Table 2.

Binary pore structure
A first set of structural properties is directly derived from the binary pore structure (Fig. 1a). These Minkowski functionals $M_{0-3}$ (Vogel et al., 2010; Armstrong et al., 2019) comprise fundamental properties of complex objects like volume ($M_0$), surface area ($M_1$), integral of mean curvature ($M_2$) and integral of total curvature ($M_3$). The meaning of $M_0$ and $M_1$ is obvious. $M_2$ is negative for concave surfaces, as is typical for packing voids in granular media, while it is positive for convex surfaces such as spherical bubbles or cylindrical pores. For cylindrical pores, $M_2$ can be directly related to the length of these pores (Koebernick et al., 2014).
Table 2 (excerpt): percolation, Boolean property (0, 1); connection probability $\Gamma$ [-], ratio between 0 and 1; critical pore diameter $d_{cr}$ [mm], pore diameter at which percolation is lost.
The Euler characteristic $\chi = M_3/4\pi = N - L + O$ is a topological number that sums over all isolated objects $N$ and fully enclosed cavities $O$ and subtracts the number of redundant loops $L$. $O$ is typically negligible as it represents the number of floating particles in the pore space. Therefore, $\chi$ can be interpreted as a connectivity metric that turns negative when the number of connections exceeds the number of isolated objects and vice versa.
Minkowski functionals $M_i$ are extensive properties (meaning they change their value with the size of the image) calculated for the analyzed volume ($V_{ROI}$). We transform these quantities to densities, $m_i = M_i/V_{ROI}$, to account for the fact that the volume of different data sets can be very different. Any metric derived from $m_{0-3}$ is indicated by the subscript $v$, e.g., $\chi_v$, as these are intensive properties (indifferent to image size).

Cluster analysis
As a next step, the binary image is separated into individual pore clusters (Fig. 1b) with the connected-component labeling method in MorphoLibJ (Legland et al., 2016). Two metrics are retrieved from this image: (a) percolation is determined as a Boolean property depending on whether at least one pore cluster is present that connects the top and the bottom of the image and (b) the connection probability, also denoted as the $\Gamma$ indicator, is retrieved from the second moment of the cluster size distribution,
$$\Gamma = \frac{1}{N_n^2} \sum_{i=1}^{N_l} n_i^2,$$
where each pore cluster has a label $l_i$ and a size $n_i$ expressed as a number of voxels. $N_l$ is the total number of pore clusters and $N_n$ is the total number of pore voxels. $\Gamma$ is one when the entire pore space is connected in one big pore cluster, whereas $\Gamma \to 0$ when the pore space is very fragmented.
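The two cluster-based metrics can be sketched in a few lines of Python. This is not the MorphoLibJ implementation used by the library: it uses `scipy.ndimage.label` as a stand-in, assumes that the connection probability is the sum of squared cluster sizes divided by the squared number of pore voxels (consistent with the limits of one and zero stated above), and assumes 6-connectivity and axis 0 as the vertical direction, neither of which is specified in the text.

```python
import numpy as np
from scipy import ndimage

def label_clusters(pores, connectivity=1):
    """Label pore clusters; connectivity=1 means 6-neighbourhood (assumption)."""
    structure = ndimage.generate_binary_structure(3, connectivity)
    return ndimage.label(np.asarray(pores, dtype=bool), structure=structure)

def gamma_connectivity(pores, connectivity=1):
    """Connection probability from the second moment of cluster sizes."""
    labels, n_clusters = label_clusters(pores, connectivity)
    if n_clusters == 0:
        return 0.0
    sizes = np.bincount(labels.ravel())[1:].astype(float)  # drop background
    return float(np.sum(sizes ** 2) / np.sum(sizes) ** 2)

def percolates(pores, axis=0, connectivity=1):
    """True if at least one cluster touches both faces along `axis`."""
    labels, _ = label_clusters(pores, connectivity)
    first = np.take(labels, 0, axis=axis)
    last = np.take(labels, -1, axis=axis)
    return np.intersect1d(first[first > 0], last[last > 0]).size > 0

# Toy example: two separate clusters of two voxels each.
two_clusters = np.array([[[1, 1, 0, 1, 1]]], dtype=bool)
gamma_two = gamma_connectivity(two_clusters)  # (2**2 + 2**2) / 4**2
```

With two equally sized clusters the indicator is 0.5; it approaches one as a single cluster dominates and approaches zero as the pore space breaks up into many small clusters.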

Distance transforms
The next steps involve a Euclidean distance transform of the pore space (Fig. 1c) and the soil matrix (Fig. 1d). This transform determines the shortest distance to the pore surface of each voxel in the foreground and background, respectively. The critical pore diameter $d_{cr}$ is determined by segmenting the transformed pore space at each distance step and using connected-component labeling to check at which pore diameter percolation breaks down. The distance transform of the background is used to compute the contact distance distribution, i.e., the histogram of pore distances within the solid phase $h(c)$, and to derive the average pore distance $c$ from it.
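The search for the critical pore diameter described above can be sketched as follows. This is a simplified stand-in for the ImageJ workflow, not the library's code: it assumes axis 0 is the vertical direction, uses 6-connectivity (the default of `scipy.ndimage.label`), steps through every distinct distance value rather than fixed increments, and takes the diameter as twice the distance value, which is one of several possible discretization conventions.

```python
import numpy as np
from scipy import ndimage

def critical_pore_diameter(pores, voxel_size=1.0, axis=0):
    """Largest diameter at which the thresholded pore space still percolates.

    Returns 0.0 if the pore space does not percolate at all.
    """
    pores = np.asarray(pores, dtype=bool)
    # Distance of every pore voxel to the nearest solid voxel (in voxels).
    edt = ndimage.distance_transform_edt(pores)
    d_cr = 0.0
    for r in np.unique(edt[edt > 0]):  # distance steps in ascending order
        labels, _ = ndimage.label(edt >= r)  # segment at this distance step
        first = np.take(labels, 0, axis=axis)
        last = np.take(labels, -1, axis=axis)
        if np.intersect1d(first[first > 0], last[last > 0]).size == 0:
            break  # percolation is lost at this step
        d_cr = 2.0 * r * voxel_size
    return d_cr

# Toy example: a straight channel, 3 x 3 voxels wide, through a solid block.
block = np.zeros((5, 7, 7), dtype=bool)
block[:, 2:5, 2:5] = True
d_cr_toy = critical_pore_diameter(block, voxel_size=0.02)  # voxel size in mm
```

In the toy channel the largest distance value on the centre line is 2 voxels, so the sketch reports 4 voxels (0.08 mm at 20 µm resolution). The difference from the nominal width of 3 voxels illustrates why the discretization convention should be stated whenever such values are compared.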

Local Thickness
The local pore diameters within the pore space are retrieved with the maximum inscribed sphere method, which is called Local Thickness in Fiji. An average pore diameter $d$ is calculated based on the histogram of the Local Thickness transform. This transform leads to a pore size distribution $V_v(d)$ which can be related to the water distribution as a function of the capillary pressure (i.e., retention characteristic) assuming spherical interfaces between water and air (Vogel et al., 2010). A similar measure is the medial axis transform where the local pore size is projected onto the skeleton of the pore space. This leads to a different pore size distribution since the volume fractions are obtained from their length along the skeleton. Moreover, this medial axis transform, which is called Skeletonize in Fiji, is very time consuming and therefore discarded here. In addition, the Minkowski densities $m_{0-3}$ are calculated as a function of pore diameter, $m_{0-3}(d)$. This results in the cumulative pore size distribution $V_v(d) = m_0(d)$, the distribution of surface density $S_v(d) = m_1(d)$, the distribution of mean curvature density $C_v(d) = m_2(d)$ and the distribution of the Euler number density $\chi_v(d) = m_3(d)/4\pi$.
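The link between pore diameter and capillary pressure mentioned above is commonly expressed through the Young-Laplace relation for a spherical meniscus in a circular capillary. The relation and the parameter values below are not given in this paper; they are stated here as assumptions for illustration (surface tension of water $\sigma \approx 0.0728$ N m$^{-1}$ at 20 °C, contact angle $\theta = 0$).

```latex
% Young-Laplace relation for a capillary of diameter d (assumed, not from this paper)
p_c = \frac{4 \sigma \cos\theta}{d}
% Worked example for d = 0.1 mm = 10^{-4} m:
p_c = \frac{4 \times 0.0728\ \mathrm{N\,m^{-1}} \times 1}{10^{-4}\ \mathrm{m}}
    \approx 2.9 \times 10^{3}\ \mathrm{Pa}
```

Under these assumptions, pores with a diameter of 0.1 mm would drain at a capillary pressure of roughly 2.9 kPa, which corresponds to about 0.3 m of water column.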

Data visualization
The data visualization of both the meta information and the results of the image analysis is implemented with Dash (https://plot.ly/dash, last access: 22 July 2022), an open source library based on a Python framework for building interactive web applications. It builds on various other packages such as Pandas and Numpy for data import and transformation and Plotly for visualization.
An example of the graphical output with Dash is shown in Fig. 2. It is split into a left frame containing drop-down menus and sliders to create queries for selected data sets based on meta information, e.g., bulk densities larger than x, soil type y, etc. The right frame displays quantitative information in a scatter plot. The assignment of numeric properties to the x and y axis can be selected by the user, e.g., $V_v$ and $\Gamma$, respectively, and different colors are assigned to different entities of a query, e.g., different soil types. The ordinate values are also summarized with averages in an additional bar chart. In addition, important meta information like geographical coordinates and texture are displayed for all data sets of the query in interactive maps and ternary diagrams, respectively. Clicking on individual data points opens the data sheet containing all meta information (Table 1) and image analysis results (Table 2) of that data set. Finally, all interactive graphs can be saved as PNG images and all underlying data can be exported as CSV tables.
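A query of this kind can also be reproduced offline on an exported CSV table. The following pandas sketch is only an illustration: the column names and all values are hypothetical placeholders, not the actual export headers or data of the library.

```python
import io
import pandas as pd

# Synthetic stand-in for an exported CSV table (placeholder names and values).
csv_export = io.StringIO(
    "sample,soil_type,bulk_density,porosity,gamma\n"
    "A,Luvisol,1.45,0.08,0.62\n"
    "B,Luvisol,1.30,0.12,0.91\n"
    "C,Chernozem,1.25,0.15,0.95\n"
    "D,Chernozem,1.50,0.04,0.10\n"
)
df = pd.read_csv(csv_export)

# Query: bulk density larger than a threshold x (here 1.28, arbitrary).
subset = df[df["bulk_density"] > 1.28]

# Average of the ordinate value per entity, as in the bar chart summary.
mean_gamma = subset.groupby("soil_type")["gamma"].mean()
```

The same pattern extends to any combination of metadata filters and structural metrics once the real column names of an export are known.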

Mining the Soil Structure Library
The Soil Structure Library can be harnessed in various ways. First, it is a data repository that can be used by scientists to upload their segmented X-ray CT data and make it available to the public. This data availability is becoming more important as, for good reasons, an increasing number of scientific journals have introduced a stricter policy in this respect.
Second, segmented X-ray CT data for a large number of different soils are a valuable source of realistic scenarios which can be used for development and testing of three-dimensional, image-based modeling approaches. Examples of image-based modeling could be water flow and matter transport by convection and dispersion (Blunt et al., 2013; Daly et al., 2015), matter turnover by reaction and diffusion (Pot et al., 2022; Falconer et al., 2015; Zech et al., 2022) or maintenance of biodiversity by habitat modeling (Pot et al., 2022; Portell et al., 2018). The model results can be put in a broader context by regression analysis with the morphological properties of the pore structure and with the uploaded meta-information that characterizes basic soil properties. In the long run, the Soil Structure Library can host a number and variety of pore structures in soil similar to what popular repositories such as the digital rocks portal (Prodanovic et al., 2015) offer for pore structures in rocks, mainly for the petroleum engineering science community.
Third, the Soil Structure Library can be a reference for the suitability of soil as habitat for soil organisms. Structural attributes can be linked to biological activity or the abundance of various species (Hallett et al., 2013; Schlüter et al., 2022a). If such relations are found and can be expressed in the metrics provided, the Structure Library provides the database to identify soil types or soil management practices that are expected to impact the soil biome and its activity in one way or another. This also includes soil processes that emerge from the interplay of pore structure and microbial activity like the formation of anaerobic soil volumes and greenhouse gas formation (Rabot et al., 2015; Kravchenko et al., 2018; Rohe et al., 2021).
Fourth, the Structure Library can be mined in order to deduce general patterns, relationships or tipping points that may exist among structural properties or between basic soil properties and structural properties. A short example shall suffice here to demonstrate such a data mining approach.

A case study on connectivity metrics
Several metrics have been implemented that quantify different aspects of pore connectivity. Percolation represents the existence of a continuous path between image borders, i.e., long-range connectivity between distant locations in the pore space. The critical pore diameter $d_{cr}$ indicates the size of the smallest pore neck in this path. The Euler characteristic $\chi$ reflects the intrinsic connectivity independent of location or distance. It does not provide any information on the length scale of connections, but on the internal number of connections independent of whether they are percolating or not. It has been conjectured that, under certain conditions of structural homogeneity, the number of isolated objects and the number of redundant loops are exactly balanced at the percolation threshold, i.e., $\chi = 0$ (Mecke, 2000; Vogel et al., 2010). The corresponding minimum pore diameter when this balance is reached shall be denoted as $d_{\chi 0}$. The transition in connection probability $\Gamma$ from fully connected to completely fragmented is expected to occur in a similar pore diameter range. It will decrease monotonically when small pores are removed sequentially in a series of increasing minimum pore diameters. Until now, there has been no comprehensive analysis as to (1) whether long-range connectivity and intrinsic connectivity break apart around the same pore diameter and (2) what the remaining pore volume $V_v$ and connection probability $\Gamma$ at these pore diameters ($d_{cr}$ and $d_{\chi 0}$) are.
Figure 2. Visualization of the Gamma connectivity as a function of porosity for all data sets. The plotted metrics can be selected under the "Graphic" tab. A subset of data can be plotted by using the filter option to select specific site characteristics. Hovering over the single data points provides more information on the soil type and a direct link to the complete metadata of the selected sample.
In addition, these connectivity metrics may serve as a fingerprint of the pore structure that can distinguish between pore systems that are generated by different processes. Such an approach will be demonstrated here for a selection of soil samples with identical resolution (20 µm) and similar soil properties (texture, SOM content, climate, etc.) but managed as long-term conventional tillage (CT) or no-tillage (NT). Samples for both treatments originate from different locations (Lucas et al., 2019). All samples without a percolating pore cluster were sorted out beforehand and all isolated pores in the original pore network were removed in order to ensure that the analyzed pore space is well connected and $\chi$ of the entire percolating cluster is negative. The complete data set comprises 104 samples (CT: 34, NT: 70).
It turns out that $d_{\chi 0}$ is much smaller than $d_{cr}$ irrespective of tillage treatment (Fig. 3a-b). This is because $d_{\chi 0}$ indicates at which pore diameter the removal of pore necks due to morphological openings has created as many isolated pores as remaining redundant connections, whereas $d_{cr}$ indicates when the last pore object that still sustains vertical percolation is lost towards the end of this succession of pore removal steps. Both $d_{\chi 0}$ and $d_{cr}$ are significantly higher in pore structures produced by plowing (Fig. 3b). That is, the fragmentation of the soil through mechanical disturbance forms a network of large macropores with higher connectivity. There are occasional outliers for $d_{cr}$ in the NT treatment that represent samples with at least one large continuous biopore from top to bottom that is not refilled by casts or soil fragments. In fact, these outliers even lead to similar $d_{cr}$ averages (crosses in violin plots) despite significantly different populations in terms of rank metrics (Fig. 3b).
The very different pore morphology with and without plowing also manifests itself in the remaining porosity at the minimum pore diameter where $d_{\chi 0}$ and $d_{cr}$ occur (Fig. 3c). The pore structure in undisturbed soil (NT) is dominated by cracks and biopores with elongated, planar or cylindrical shapes. That is, they stretch across long distances with rather small volumes. The pore structure after plowing, in turn, is dominated by more bulky and isotropic packing pores with a lower spatial extent per volume. This is why the pore structure in NT soils needs significantly less porosity to sustain both $d_{\chi 0}$ and $d_{cr}$.
As a consequence, the connection probability $\Gamma$ of the remaining porosity at $d_{\chi 0}$ is larger in CT soils (Fig. 3d) because more of the NT soils have already reached a critical macroporosity range around 0.05-0.1 at which $\Gamma$ decreases sharply. At $d_{cr}$, both NT and CT pore structures are in this critical macroporosity range. As a result, there is a huge variability in $\Gamma$ with fluctuations across the entire possible range. In addition, both treatments exhibit a distinct bimodal distribution of $\Gamma(d_{cr})$ enforced by the non-linear relationship between visible porosity and connection probability.
Finally, it has been conjectured before (Schlüter et al., 2011; Lucas et al., 2019) that biopores produced by fine roots with typical diameters of 0.1-0.2 mm are the main contributor to pore connectivity in no-till samples. This is confirmed with this case study by the fact that (i) $d_{cr}$ falls into this root diameter range and (ii) $\Gamma$ also reaches 0.5 around $d_{cr}$.

Conclusions
With the Soil Structure Library, we have established a free platform for sharing segmented X-ray CT images of pore structures among the soil science community. The library can be used as a conventional data repository to provide access to large 3D image data, which is a service that has not been available until now but is becoming more important with updated data policies of many journals. Likewise, the Soil Structure Library is a rich source of realistic three-dimensional pore structures for image-based modeling on a large range of image resolutions and domain sizes. Access to such image data is appealing especially to scientists with no or limited access to imaging facilities. In a similar vein, the Soil Structure Library offers free, standardized and reproducible soil structure analysis to users who lack the computing infrastructure or expertise for pore structure analysis. The full potential of the Soil Structure Library unfolds, however, when harnessed for data mining and regression analysis with complementary meta-information in order to better understand the relationship between soil structure and soil functions.
Code availability. The routines that are used for calculations are cited in the paper, namely in Sect. 2.2. The used software is ImageJ (Schindelin et al., 2012), with the extensions SoilJ (Koestel, 2018) and MorphoLibJ (Legland et al., 2016). A script that provides the standardized production of the calculated data is available in the Supplement.
Data availability. The data referred to in the text are available via the Soil Structure Library (https://structurelib.ufz.de/, UFZ and BONARES, 2022).