Articles | Volume 5, issue 2
https://doi.org/10.5194/soil-5-177-2019
© Author(s) 2019. This work is distributed under
the Creative Commons Attribution 4.0 License.
Word embeddings for application in geosciences: development, evaluation, and examples of soil-related concepts
José Padarian (corresponding author)
Sydney Institute of Agriculture & School of Life and Environmental Sciences, The University of Sydney, New South Wales, Australia
Ignacio Fuentes
Sydney Institute of Agriculture & School of Life and Environmental Sciences, The University of Sydney, New South Wales, Australia
Related authors
Marliana Tri Widyastuti, Budiman Minasny, José Padarian, Federico Maggi, Matt Aitkenhead, Amélie Beucher, John Connolly, Dian Fiantis, Darren Kidd, Yuxin Ma, Fraser Macfarlane, Ciaran Robb, Rudiyanto, Budi Indra Setiawan, and Muh Taufik
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2024-333, 2024
Preprint under review for ESSD
Short summary
PEATGRIDS is the first dataset of global peat thickness and carbon stock maps at 1 km resolution. The dataset is publicly available on Zenodo to support further analyses and modelling of peatlands across the globe. This work used a random forest machine learning model to provide spatially explicit, pixel-level estimates of peat carbon stock.
Marliana Tri Widyastuti, José Padarian, Budiman Minasny, Mathew Webb, Muh Taufik, and Darren Kidd
EGUsphere, https://doi.org/10.5194/egusphere-2024-2253, 2024
Short summary
This work aims to predict soil water content across a large region at fine spatial and temporal resolution (80 m grids, daily) to support agricultural management. It assesses deep learning models for the spatial prediction of multilevel soil moisture. We address the challenge of mapping soil moisture at field-scale resolution for Tasmania and identify the optimal model for near-real-time monitoring. This contributes to the applicability of deep learning methods in soil science.
José Padarian, Budiman Minasny, Alex B. McBratney, and Pete Smith
SOIL Discuss., https://doi.org/10.5194/soil-2021-73, 2021
Manuscript not accepted for further review
Short summary
Soil organic carbon sequestration is considered an attractive technology to partially mitigate climate change. Here, we show how the SOC storage potential varies globally. The estimated additional SOC storage potential in the topsoil of global croplands (29–67 Pg C) equates to only 2 to 5 years of emissions offsetting and 32 % of agriculture's 92 Pg historical carbon debt. Since SOC is temperature-dependent, this potential is likely to reduce by 18 % by 2040 due to climate change.
José Padarian, Alex B. McBratney, and Budiman Minasny
SOIL, 6, 389–397, https://doi.org/10.5194/soil-6-389-2020, 2020
Short summary
In this paper we introduce the use of game theory to interpret a digital soil mapping (DSM) model and understand the contribution of environmental factors to the prediction of soil organic carbon (SOC) in Chile. The analysis corroborated that the SOC model captures sensible relationships between SOC and climatic and topographical factors. We were also able to represent these relationships spatially (as maps), addressing limitations in how DSM models are currently interpreted.
José Padarian and Alex B. McBratney
SOIL, 6, 89–94, https://doi.org/10.5194/soil-6-89-2020, 2020
Short summary
Data sharing and collaboration are critical to solving large-scale problems. The prevailing soil data-sharing model is centralized and, consequently, requires participants to cede control and governance over their data to the lead party. Here we explore the use of a distributed ledger (blockchain) to solve these issues. We also describe a potential use case: developing a global soil spectral library among multiple international institutions.
José Padarian, Budiman Minasny, and Alex B. McBratney
SOIL, 6, 35–52, https://doi.org/10.5194/soil-6-35-2020, 2020
Short summary
Machine learning (ML) has seen accelerating adoption in soil science, making it difficult to manually review all papers on its application. This paper reviews the application of ML, aided by topic modelling to find patterns in a large collection of publications. The objective is to gain insight into the applications and to discuss research gaps. We identified 12 main topics and found that ML methods usually perform better than traditional ones.
José Padarian, Budiman Minasny, and Alex B. McBratney
SOIL, 5, 79–89, https://doi.org/10.5194/soil-5-79-2019, 2019
Short summary
Digital soil mapping (DSM) has been widely used as a cost-effective method for generating soil maps. DSM models are usually calibrated using point observations and rarely incorporate contextual information about the landscape. Here, we use convolutional neural networks to incorporate spatial context. We used a 3-D stack of covariate images as input to simultaneously predict organic carbon content at multiple depths. In this study, our model reduced the error by 30 % compared with conventional techniques.
Related subject area
Soil and methods
Spatial prediction of organic carbon in German agricultural topsoil using machine learning algorithms
On the benefits of clustering approaches in digital soil mapping: an application example concerning soil texture regionalization
An open Soil Structure Library based on X-ray CT data
Identification of thermal signature and quantification of charcoal in soil using differential scanning calorimetry and benzene polycarboxylic acid (BPCA) markers
Estimating soil fungal abundance and diversity at a macroecological scale with deep learning spectrotransfer functions
An underground, wireless, open-source, low-cost system for monitoring oxygen, temperature, and soil moisture
Estimation of soil properties with mid-infrared soil spectroscopy across yam production landscapes in West Africa
The central African soil spectral library: a new soil infrared repository and a geographical prediction analysis
Developing the Swiss mid-infrared soil spectral library for local estimation and monitoring
Predicting the spatial distribution of soil organic carbon stock in Swedish forests using a group of covariates and site-specific data
Improved calibration of the Green–Ampt infiltration module in the EROSION-2D/3D model using a rainfall-runoff experiment database
Quantifying soil carbon in temperate peatlands using a mid-IR soil spectral library
Are researchers following best storage practices for measuring soil biochemical properties?
Quantifying and correcting for pre-assay CO₂ loss in short-term carbon mineralization assays
The influence of training sample size on the accuracy of deep learning models for the prediction of soil properties with near-infrared spectroscopy data
Game theory interpretation of digital soil mapping convolutional neural networks
Comparing three approaches of spatial disaggregation of legacy soil maps based on the Disaggregation and Harmonisation of Soil Map Units Through Resampled Classification Trees (DSMART) algorithm
Oblique geographic coordinates as covariates for digital soil mapping
Development of pedotransfer functions for water retention in tropical mountain soil landscapes: spotlight on parameter tuning in machine learning
The ¹⁵N gas-flux method to determine N₂ flux: a comparison of different tracer addition approaches
A new model for intra- and inter-institutional soil data sharing
Machine learning and soil sciences: a review aided by machine learning tools
Identification of new microbial functional standards for soil quality assessment
Identifying and quantifying geogenic organic carbon in soils – the case of graphite
Error propagation in spectrometric functions of soil organic carbon
Soil lacquer peel do-it-yourself: simply capturing beauty
Multi-source data integration for soil mapping using deep learning
Using deep learning for digital soil mapping
No silver bullet for digital soil mapping: country-specific soil organic carbon estimates across Latin America
Separation of soil respiration: a site-specific comparison of partition methods
Proximal sensing for soil carbon accounting
Evaluation of digital soil mapping approaches with large sets of environmental covariates
Planning spatial sampling of the soil from an uncertain reconnaissance variogram
Mapping of soil properties at high resolution in Switzerland using boosted geoadditive models
Quantitative imaging of the 3-D distribution of cation adsorption sites in undisturbed soil
Decision support for the selection of reference sites using ¹³⁷Cs as a soil erosion tracer
Soil organic carbon stocks are systematically overestimated by misuse of the parameters bulk density and rock fragment content
The added value of biomarker analysis to the genesis of plaggic Anthrosols; the identification of stable fillings used for the production of plaggic manure
Synchrotron microtomographic quantification of geometrical soil pore characteristics affected by compaction
Pedotransfer functions for Irish soils – estimation of bulk density (ρb) per horizon type
Assessing the performance of a plastic optical fibre turbidity sensor for measuring post-fire erosion from plot to catchment scale
Passive soil heating using an inexpensive infrared mirror design – a proof of concept
The application of terrestrial laser scanner and SfM photogrammetry in measuring erosion and deposition processes in two opposite slopes in a humid badlands area (central Spanish Pyrenees)
Soil surface roughness: comparing old and new measuring methods and application in a soil erosion model
Comparison of spatial association approaches for landscape mapping of soil organic carbon stocks
Eddy covariance for quantifying trace gas fluxes from soils
Ali Sakhaee, Anika Gebauer, Mareike Ließ, and Axel Don
SOIL, 8, 587–604, https://doi.org/10.5194/soil-8-587-2022, 2022
Short summary
As soil carbon has become a key component of climate-smart agriculture, the demand for high-resolution maps has increased drastically. Meanwhile, machine learning algorithms are becoming more widely used and are opening up new solutions in soil mapping. This paper shows which algorithms perform best, how soil inventory data can be most efficiently used for digital soil mapping, and the different available options and methods to derive high-resolution soil carbon data at the large regional scale.
István Dunkl and Mareike Ließ
SOIL, 8, 541–558, https://doi.org/10.5194/soil-8-541-2022, 2022
Short summary
Digital soil mapping (DSM) allows us to regionalize soil properties by relating them to environmental covariates with the help of an empirical model. Legacy soil data provide a valuable basis to generate high-resolution soil maps with DSM. We studied the usefulness of data-clustering methods to tackle potential sampling bias in legacy soil data while applying DSM for soil texture regionalization. Clustering has proved to be useful in various steps of the DSM process.
Ulrich Weller, Lukas Albrecht, Steffen Schlüter, and Hans-Jörg Vogel
SOIL, 8, 507–515, https://doi.org/10.5194/soil-8-507-2022, 2022
Short summary
Soil structure is of central importance for soil functions. It is, however, ill defined. With the increasing availability of X-ray CT scanners, more and more soils are scanned and an undisturbed image of the soil's structure is produced. Often, only a qualitative description is derived from these images. We now provide a web-based Soil Structure Library where these images can be evaluated in a standardized, quantitative way and compared with a worldwide data set.
Brieuc Hardy, Nils Borchard, and Jens Leifeld
SOIL, 8, 451–466, https://doi.org/10.5194/soil-8-451-2022, 2022
Short summary
Soil amendment with artificial black carbon (BC; biomass transformed by incomplete combustion) has the potential to mitigate climate change. Nevertheless, the accurate quantification of BC in soil remains a critical issue. Here, we successfully used dynamic thermal analysis (DTA) to quantify centennial BC in soil. We demonstrate that DTA is largely under-exploited despite providing rapid and low-cost quantitative information over the range of soil organic matter.
Yuanyuan Yang, Zefang Shen, Andrew Bissett, and Raphael A. Viscarra Rossel
SOIL, 8, 223–235, https://doi.org/10.5194/soil-8-223-2022, 2022
Short summary
We present a new method to estimate the relative abundance of the dominant phyla and diversity of fungi in Australian soil. It uses state-of-the-art machine learning with publicly available data on soil and environmental proxies for edaphic, climatic, biotic and topographic factors, and visible–near infrared wavelengths. The estimates could serve to supplement the more expensive molecular approaches towards a better understanding of soil fungal abundance and diversity in agronomy and ecology.
Elad Levintal, Yonatan Ganot, Gail Taylor, Peter Freer-Smith, Kosana Suvocarev, and Helen E. Dahlke
SOIL, 8, 85–97, https://doi.org/10.5194/soil-8-85-2022, 2022
Short summary
Do-it-yourself hardware is a new approach for improving measurement resolution in research. Here we present a new low-cost, wireless underground sensor network for soil monitoring. The data logging, power, and communication components cost USD 150 in total, much less than other available commercial solutions. We provide a complete building guide to reduce technical barriers, which we hope will make the system easier to reproduce and open up new environmental monitoring applications.
Philipp Baumann, Juhwan Lee, Emmanuel Frossard, Laurie Paule Schönholzer, Lucien Diby, Valérie Kouamé Hgaza, Delwende Innocent Kiba, Andrew Sila, Keith Sheperd, and Johan Six
SOIL, 7, 717–731, https://doi.org/10.5194/soil-7-717-2021, 2021
Short summary
This work delivers openly accessible and validated calibrations for diagnosing 26 soil properties based on mid-infrared spectroscopy. These were developed for four regions in Burkina Faso and Côte d'Ivoire, including 80 fields of smallholder farmers. The models can help to site-specifically and cost-efficiently monitor soil quality and fertility constraints to ameliorate soils and yields of yam or other staple crops in the four regions between the humid forest and the northern Guinean savanna.
Laura Summerauer, Philipp Baumann, Leonardo Ramirez-Lopez, Matti Barthel, Marijn Bauters, Benjamin Bukombe, Mario Reichenbach, Pascal Boeckx, Elizabeth Kearsley, Kristof Van Oost, Bernard Vanlauwe, Dieudonné Chiragaga, Aimé Bisimwa Heri-Kazi, Pieter Moonen, Andrew Sila, Keith Shepherd, Basile Bazirake Mujinya, Eric Van Ranst, Geert Baert, Sebastian Doetterl, and Johan Six
SOIL, 7, 693–715, https://doi.org/10.5194/soil-7-693-2021, 2021
Short summary
We present a soil mid-infrared library with over 1800 samples from central Africa in order to facilitate soil analyses of this highly understudied yet critical area. Together with an existing continental library, we demonstrate a regional analysis and geographical extrapolation to predict total carbon and nitrogen. Our results show accurate predictions and highlight the value that the data contribute to existing libraries. Our library is openly available for public use and for expansion.
Philipp Baumann, Anatol Helfenstein, Andreas Gubler, Armin Keller, Reto Giulio Meuli, Daniel Wächter, Juhwan Lee, Raphael Viscarra Rossel, and Johan Six
SOIL, 7, 525–546, https://doi.org/10.5194/soil-7-525-2021, 2021
Short summary
We developed the Swiss mid-infrared spectral library and a statistical model collection across 4374 soil samples with reference measurements of 16 properties. Our library incorporates soil from 1094 grid locations and 71 long-term monitoring sites. This work confirms once again that nationwide spectral libraries with diverse soils can reliably feed information to a fast chemical diagnosis. Our data-driven reduction of the library has the potential to accurately monitor carbon at the plot scale.
Kpade O. L. Hounkpatin, Johan Stendahl, Mattias Lundblad, and Erik Karltun
SOIL, 7, 377–398, https://doi.org/10.5194/soil-7-377-2021, 2021
Short summary
Forests store large amounts of carbon in soils. Implementing suitable measures to improve the sink potential of forest soils would require accurate data on the carbon stored in forest soils and a better understanding of the factors affecting this storage. This study showed that the prediction of soil carbon stock in Swedish forest soils can increase in accuracy when one divides a big region into smaller areas in combination with information collected locally and derived from satellites.
Hana Beitlerová, Jonas Lenz, Jan Devátý, Martin Mistr, Jiří Kapička, Arno Buchholz, Ilona Gerndtová, and Anne Routschek
SOIL, 7, 241–253, https://doi.org/10.5194/soil-7-241-2021, 2021
Short summary
This study presents transfer functions for a calibration parameter of the Green–Ampt infiltration module of the EROSION-2D/3D model, which significantly improve model performance compared with the current state. However, the relationships found between the calibration parameters and soil parameters call into question both the Green–Ampt implementation in the model and the state-of-the-art parametrization method. A new direction for the development of the infiltration module is proposed.
Anatol Helfenstein, Philipp Baumann, Raphael Viscarra Rossel, Andreas Gubler, Stefan Oechslin, and Johan Six
SOIL, 7, 193–215, https://doi.org/10.5194/soil-7-193-2021, 2021
Short summary
In this study, we show that a soil spectral library (SSL) can be used to predict soil carbon at new and very different locations. This finding is important because it requires less time-consuming lab work than calibrating a new model for every local application, while remaining as accurate as, or more accurate than, local models. Furthermore, we show that this method even works for predicting (drained) peat soils, using an SSL composed mostly of mineral soils with much lower carbon contents.
Jennifer M. Rhymes, Irene Cordero, Mathilde Chomel, Jocelyn M. Lavallee, Angela L. Straathof, Deborah Ashworth, Holly Langridge, Marina Semchenko, Franciska T. de Vries, David Johnson, and Richard D. Bardgett
SOIL, 7, 95–106, https://doi.org/10.5194/soil-7-95-2021, 2021
Matthew A. Belanger, Carmella Vizza, G. Philip Robertson, and Sarah S. Roley
SOIL, 7, 47–52, https://doi.org/10.5194/soil-7-47-2021, 2021
Short summary
Soil health is often assessed by rewetting a dry soil and measuring CO₂ production, but the potential bias introduced by differences in initial soil moisture content is unclear. Our study found that wetter soils tended to lose more carbon during drying than drier soils, which affects soil health interpretations. We developed a correction factor to account for initial soil moisture effects, which future studies may benefit from adapting to their own soils.
Wartini Ng, Budiman Minasny, Wanderson de Sousa Mendes, and José Alexandre Melo Demattê
SOIL, 6, 565–578, https://doi.org/10.5194/soil-6-565-2020, 2020
Short summary
The number of samples utilised to create predictive models affected model performance. This research compares the number of samples needed by a deep learning model to outperform the traditional machine learning models using visible near-infrared spectroscopy data for soil properties predictions. The deep learning model was found to outperform machine learning models when the sample size was above 2000.
José Padarian, Alex B. McBratney, and Budiman Minasny
SOIL, 6, 389–397, https://doi.org/10.5194/soil-6-389-2020, 2020
Short summary
In this paper we introduce the use of game theory to interpret a digital soil mapping (DSM) model and understand the contribution of environmental factors to the prediction of soil organic carbon (SOC) in Chile. The analysis corroborated that the SOC model captures sensible relationships between SOC and climatic and topographical factors. We were also able to represent these relationships spatially (as maps), addressing limitations in how DSM models are currently interpreted.
Yosra Ellili-Bargaoui, Brendan Philip Malone, Didier Michot, Budiman Minasny, Sébastien Vincent, Christian Walter, and Blandine Lemercier
SOIL, 6, 371–388, https://doi.org/10.5194/soil-6-371-2020, 2020
Anders Bjørn Møller, Amélie Marie Beucher, Nastaran Pouladi, and Mogens Humlekrog Greve
SOIL, 6, 269–289, https://doi.org/10.5194/soil-6-269-2020, 2020
Short summary
Decision trees have become a widely adopted tool for mapping soil properties in geographic space. However, it is difficult to incorporate spatial relationships into these models. We present a new method that uses, as model covariates, geographic coordinates along several axes tilted at oblique angles. We test this method on four spatial datasets. The results show that the new method is at least as accurate as other proposed alternatives, has a computational advantage, and is flexible and interpretable.
Anika Gebauer, Monja Ellinger, Victor M. Brito Gomez, and Mareike Ließ
SOIL, 6, 215–229, https://doi.org/10.5194/soil-6-215-2020, 2020
Short summary
Pedotransfer functions (PTFs) for soil water retention were developed for two tropical soil landscapes using machine learning. The models corresponding to these PTFs had to be adjusted by tuning their parameters. The standard tuning approach was compared to mathematical optimization. The latter resulted in much better model performance. The PTFs derived are of particular importance for soil process and hydrological models.
Dominika Lewicka-Szczebak and Reinhard Well
SOIL, 6, 145–152, https://doi.org/10.5194/soil-6-145-2020, 2020
Short summary
This study compared experimental strategies for incubating soil samples to determine the N₂ flux. Such experiments require the addition of an isotope tracer, i.e. nitrogen fertilizer enriched in the heavy nitrogen isotope ¹⁵N. Here we compared homogenizing the soil and mixing in the tracer with injecting the tracer into intact soil cores. The results are closely comparable: both techniques lead to similar conclusions about the magnitude of the N₂ flux.
José Padarian and Alex B. McBratney
SOIL, 6, 89–94, https://doi.org/10.5194/soil-6-89-2020, 2020
Short summary
Data sharing and collaboration are critical to solving large-scale problems. The prevailing soil data-sharing model is centralized and, consequently, requires participants to cede control and governance over their data to the lead party. Here we explore the use of a distributed ledger (blockchain) to solve these issues. We also describe a potential use case: developing a global soil spectral library among multiple international institutions.
José Padarian, Budiman Minasny, and Alex B. McBratney
SOIL, 6, 35–52, https://doi.org/10.5194/soil-6-35-2020, 2020
Short summary
Machine learning (ML) has seen accelerating adoption in soil science, making it difficult to manually review all papers on its application. This paper reviews the application of ML, aided by topic modelling to find patterns in a large collection of publications. The objective is to gain insight into the applications and to discuss research gaps. We identified 12 main topics and found that ML methods usually perform better than traditional ones.
Sören Thiele-Bruhn, Michael Schloter, Berndt-Michael Wilke, Lee A. Beaudette, Fabrice Martin-Laurent, Nathalie Cheviron, Christian Mougin, and Jörg Römbke
SOIL, 6, 17–34, https://doi.org/10.5194/soil-6-17-2020, 2020
Short summary
Soil quality depends on the functioning of soil microbiota. Only a few standardized methods are available to assess this functioning and the adverse effects of human activities on it. Promising additional methods that target soil microbial function therefore need to be identified. We discuss (i) molecular methods using qPCR for new endpoints, e.g. in N and P cycling and greenhouse gas emissions, (ii) techniques for measuring fungal enzyme activities, and (iii) field methods for carbon turnover, such as the litter bag test.
Jeroen H. T. Zethof, Martin Leue, Cordula Vogel, Shane W. Stoner, and Karsten Kalbitz
SOIL, 5, 383–398, https://doi.org/10.5194/soil-5-383-2019, 2019
Short summary
A widely overlooked source of carbon (C) in the soil environment is organic C of geogenic origin, e.g. graphite. Appropriate methods are not available to quantify graphite and to differentiate it from other organic and inorganic C sources in soils. Therefore, we examined Fourier transform infrared spectroscopy, thermogravimetric analysis and the smart combustion method for their ability to identify and quantify graphitic C in soils. The smart combustion method showed the most promising results.
Monja Ellinger, Ines Merbach, Ulrike Werban, and Mareike Ließ
SOIL, 5, 275–288, https://doi.org/10.5194/soil-5-275-2019, 2019
Short summary
Vis–NIR spectrometry is often applied to capture soil organic carbon (SOC). This study addresses the impact of the data and modelling choices involved on SOC precision, with a focus on the propagation of input data uncertainties. It emphasizes the need for transparent documentation of the measurement protocol and of the model building and validation procedure. In particular, when Vis–NIR spectrometry is used for soil monitoring, uncertainty propagation becomes essential.
Cathelijne R. Stoof, Jasper H. J. Candel, Laszlo A. G. M. van der Wal, and Gert Peek
SOIL, 5, 159–175, https://doi.org/10.5194/soil-5-159-2019, 2019
Short summary
Teaching and outreach of soils is often done with real-life snapshots of soils and sediments in lacquer or glue peels. While it may seem hard, anyone can make such a peel. Illustrated with handmade drawings and an instructional video, we explain how to capture soils in peels using readily available materials. A new twist to old methods makes this safer, simpler, and more successful, and thus a true DIY (do-it-yourself) activity, highlighting the value and beauty of the ground below our feet.
Alexandre M. J.-C. Wadoux, José Padarian, and Budiman Minasny
SOIL, 5, 107–119, https://doi.org/10.5194/soil-5-107-2019, 2019
José Padarian, Budiman Minasny, and Alex B. McBratney
SOIL, 5, 79–89, https://doi.org/10.5194/soil-5-79-2019, 2019
Short summary
Digital soil mapping (DSM) has been widely used as a cost-effective method for generating soil maps. DSM models are usually calibrated using point observations and rarely incorporate contextual information of the landscape. Here, we use convolutional neural networks to incorporate spatial context. We used a 3-D stack of covariate images as input to simultaneously predict organic carbon content at multiple depths. In this study, our model reduced the error by 30 % compared with conventional techniques.
Mario Guevara, Guillermo Federico Olmedo, Emma Stell, Yusuf Yigini, Yameli Aguilar Duarte, Carlos Arellano Hernández, Gloria E. Arévalo, Carlos Eduardo Arroyo-Cruz, Adriana Bolivar, Sally Bunning, Nelson Bustamante Cañas, Carlos Omar Cruz-Gaistardo, Fabian Davila, Martin Dell Acqua, Arnulfo Encina, Hernán Figueredo Tacona, Fernando Fontes, José Antonio Hernández Herrera, Alejandro Roberto Ibelles Navarro, Veronica Loayza, Alexandra M. Manueles, Fernando Mendoza Jara, Carolina Olivera, Rodrigo Osorio Hermosilla, Gonzalo Pereira, Pablo Prieto, Iván Alexis Ramos, Juan Carlos Rey Brina, Rafael Rivera, Javier Rodríguez-Rodríguez, Ronald Roopnarine, Albán Rosales Ibarra, Kenset Amaury Rosales Riveiro, Guillermo Andrés Schulz, Adrian Spence, Gustavo M. Vasques, Ronald R. Vargas, and Rodrigo Vargas
SOIL, 4, 173–193, https://doi.org/10.5194/soil-4-173-2018, 2018
Short summary
We provide a reproducible multi-modeling approach for SOC mapping across Latin America on a country-specific basis as required by the Global Soil Partnership of the United Nations. We identify key prediction factors for SOC across each country. We compare and test different methods to generate spatially explicit predictions of SOC and conclude that there is no best method on a quantifiable basis.
Louis-Pierre Comeau, Derrick Y. F. Lai, Jane Jinglan Cui, and Jenny Farmer
SOIL, 4, 141–152, https://doi.org/10.5194/soil-4-141-2018, 2018
Short summary
To date, there are still many uncertainties and unknowns regarding soil respiration partitioning procedures. This study compared the suitability and accuracy of five different respiration partitioning methods. A qualitative evaluation table of the partitioning methods with five performance parameters was produced. Overall, no systematically superior or inferior partitioning method was found, and combining two or more methods optimizes assessment reliability.
Jacqueline R. England and Raphael A. Viscarra Rossel
SOIL, 4, 101–122, https://doi.org/10.5194/soil-4-101-2018, 2018
Short summary
Proximal sensing can be used for soil C accounting, but the methods need to be standardized and procedural guidelines developed to ensure proficient measurement and accurate reporting. This is particularly important if there are financial incentives for landholders to adopt practices to sequester C. We review sensing for C accounting and discuss the requirements for the development of new soil C accounting methods based on sensing, including requirements for reporting, auditing and verification.
Madlene Nussbaum, Kay Spiess, Andri Baltensweiler, Urs Grob, Armin Keller, Lucie Greiner, Michael E. Schaepman, and Andreas Papritz
SOIL, 4, 1–22, https://doi.org/10.5194/soil-4-1-2018, 2018
Short summary
This paper presents an extensive evaluation of digital soil mapping (DSM) tools. Recently, large sets of environmental covariates (e.g. from analysis of terrain on multiple scales) have become more common for DSM. Many DSM studies, however, only compared DSM methods using fewer than 30 covariates or tested approaches on few responses. We built DSM models from 300–500 covariates using six approaches that are either popular in DSM or promising for large covariate sets.
R. Murray Lark, Elliott M. Hamilton, Belinda Kaninga, Kakoma K. Maseka, Moola Mutondo, Godfrey M. Sakala, and Michael J. Watts
SOIL, 3, 235–244, https://doi.org/10.5194/soil-3-235-2017, 2017
Short summary
An advantage of geostatistics for mapping soil properties is that, given a statistical model of the variable of interest, we can make a rational decision about how densely to sample so that the map is sufficiently precise. However, uncertainty about the statistical model affects this process. In this paper we show how Bayesian methods can be used to support decision making on sampling with an uncertain model, ensuring that the probability of meeting certain levels of precision is high enough.
Madlene Nussbaum, Lorenz Walthert, Marielle Fraefel, Lucie Greiner, and Andreas Papritz
SOIL, 3, 191–210, https://doi.org/10.5194/soil-3-191-2017, 2017
Short summary
Digital soil mapping (DSM) relates soil property data to environmental data that describe soil-forming factors. With imagery sampled from satellites or terrain analysed at multiple scales, large sets of possible input to DSM are available. We propose a new statistical framework (geoGAM) that selects parsimonious models for DSM and illustrate the application of geoGAM to two study regions. Straightforward interpretation of the modelled effects likely improves end-user acceptance of DSM products.
Hannes Keck, Bjarne W. Strobel, Jon Petter Gustafsson, and John Koestel
SOIL, 3, 177–189, https://doi.org/10.5194/soil-3-177-2017, 2017
Short summary
Several studies have shown that the cation adsorption sites in soils are heterogeneously distributed in space. In many soil system models this knowledge has not yet been included. In our study we proposed a new method to map the 3-D distribution of cation adsorption sites in undisturbed soils. The method is based on three-dimensional X-ray scanning with a contrast agent and image analysis. We are convinced that this approach will strongly aid the development of more realistic soil system models.
Laura Arata, Katrin Meusburger, Alexandra Bürge, Markus Zehringer, Michael E. Ketterer, Lionel Mabit, and Christine Alewell
SOIL, 3, 113–122, https://doi.org/10.5194/soil-3-113-2017, 2017
Christopher Poeplau, Cora Vos, and Axel Don
SOIL, 3, 61–66, https://doi.org/10.5194/soil-3-61-2017, 2017
Short summary
This paper shows that three out of four frequently used methods to calculate soil organic carbon stocks lead to systematic overestimation of those stocks. Stones, which can be assumed to be free of carbon, have to be corrected for in both bulk density and layer thickness. We used data of the German Agricultural Soil Inventory to illustrate the potential bias and suggest a unified and unbiased calculation method for stocks of soil organic carbon, which is the largest terrestrial carbon pool.
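The stone correction described in this summary can be illustrated with a short calculation. The sketch below is a minimal illustration, not the calculation method proposed in the paper: it assumes that SOC concentration and bulk density both refer to the fine soil (< 2 mm), that stones are carbon-free and occupy a known volume fraction of the layer, and the function name, parameter names and example values are hypothetical.

```python
def soc_stock(soc_g_per_kg, bd_fine_g_cm3, depth_cm, stone_vol_frac=0.0):
    """Return the SOC stock of one soil layer in Mg C per ha.

    Hypothetical helper for illustration. Assumes:
      soc_g_per_kg   -- SOC concentration of the fine soil (< 2 mm), g C per kg
      bd_fine_g_cm3  -- fine-soil mass per volume of fine soil, g per cm3
      depth_cm       -- layer thickness, cm
      stone_vol_frac -- volume fraction of (carbon-free) stones, 0 <= f < 1
    """
    if not 0.0 <= stone_vol_frac < 1.0:
        raise ValueError("stone_vol_frac must be in [0, 1)")
    # Stones hold no carbon, so only the stone-free share of the layer counts.
    fine_soil_depth_cm = depth_cm * (1.0 - stone_vol_frac)
    # Unit conversion: (g C/kg) * (g/cm3) * cm * 0.1 = Mg C/ha.
    return soc_g_per_kg * bd_fine_g_cm3 * fine_soil_depth_cm * 0.1


# A 30 cm layer with 20 g/kg SOC, fine-soil bulk density 1.3 g/cm3 and 20 % stones.
corrected = soc_stock(20.0, 1.3, 30.0, stone_vol_frac=0.2)
ignoring_stones = soc_stock(20.0, 1.3, 30.0)
print(round(corrected, 1), round(ignoring_stones, 1))  # → 62.4 78.0
```

In this toy example, ignoring 20 % stone volume overstates the stock by 25 % (78.0 vs. 62.4 Mg C/ha), the kind of systematic overestimation the summary refers to.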
Jan M. van Mourik, Thomas V. Wagner, J. Geert de Boer, and Boris Jansen
SOIL, 2, 299–310, https://doi.org/10.5194/soil-2-299-2016, 2016
Ranjith P. Udawatta, Clark J. Gantzer, Stephen H. Anderson, and Shmuel Assouline
SOIL, 2, 211–220, https://doi.org/10.5194/soil-2-211-2016, 2016
Short summary
Soil compaction degrades soil structure and affects water, heat, and gas exchange as well as root penetration and crop production. The objective of this study was to use X-ray computed microtomography (CMT) techniques to compare differences in geometrical soil pore parameters as influenced by compaction of two different aggregate size classes.
B. Reidy, I. Simo, P. Sills, and R. E. Creamer
SOIL, 2, 25–39, https://doi.org/10.5194/soil-2-25-2016, 2016
Short summary
This study reviews pedotransfer functions from the literature for different soil and horizon types. It uses these formulae to predict bulk density (ρb) per horizon using measured data of other soil properties. These predictions were compared with known ρb per horizon and the functions were recalibrated. These calculations were used to fill missing horizon data in the Irish soil database, which allowed the generation of a ρb map to 50 cm. These ρb data are at horizon level, allowing more accurate estimation of C with depth.
J. J. Keizer, M. A. S. Martins, S. A. Prats, L. F. Santos, D. C. S. Vieira, R. Nogueira, and L. Bilro
SOIL, 1, 641–650, https://doi.org/10.5194/soil-1-641-2015, 2015
Short summary
In this study, a novel plastic optical fibre turbidity sensor was exhaustively tested with a large set of runoff samples, mainly from a recently burnt area. The different types of samples from the distinct study sites revealed without exception an increase in normalized light loss with increasing sediment concentrations that agreed (reasonably) well with a power function. Nevertheless, sensor-based predictions of sediment concentration should ideally involve site-specific calibrations.
C. Rasmussen, R. E. Gallery, and J. S. Fehmi
SOIL, 1, 631–639, https://doi.org/10.5194/soil-1-631-2015, 2015
Short summary
There is a need to understand the response of soil systems to predicted climate warming for modeling soil processes. Current experimental methods for soil warming include expensive and difficult-to-implement active and passive techniques. Here we test a simple, inexpensive in situ passive soil heating approach based on easy-to-construct infrared mirrors that do not require automation or enclosures. Results indicated that the infrared mirrors yielded significant heating and drying of soils.
E. Nadal-Romero, J. Revuelto, P. Errea, and J. I. López-Moreno
SOIL, 1, 561–573, https://doi.org/10.5194/soil-1-561-2015, 2015
Short summary
Geomatic techniques have been routinely applied in erosion studies, providing the opportunity to build high-resolution topographic models. The aim of this study is to assess and compare the functioning of terrestrial laser scanner and close-range photogrammetry techniques to evaluate erosion and deposition processes in a humid badlands area. Our results demonstrated that north slopes experienced more intense and faster dynamics than south slopes, as well as the highest erosion rates.
L. M. Thomsen, J. E. M. Baartman, R. J. Barneveld, T. Starkloff, and J. Stolte
SOIL, 1, 399–410, https://doi.org/10.5194/soil-1-399-2015, 2015
B. A. Miller, S. Koszinski, M. Wehrhan, and M. Sommer
SOIL, 1, 217–233, https://doi.org/10.5194/soil-1-217-2015, 2015
Short summary
There are many different strategies for mapping SOC, among which is to model the variables needed to calculate the SOC stock indirectly or to model the SOC stock directly. The purpose of this research was to compare these two approaches for mapping SOC stocks from multiple linear regression models applied at the landscape scale via spatial association. Although the indirect approach had greater spatial variation and higher R² values, the direct approach had a lower total estimated error.
W. Eugster and L. Merbold
SOIL, 1, 187–205, https://doi.org/10.5194/soil-1-187-2015, 2015
Short summary
The eddy covariance (EC) method has become increasingly popular in soil science. The basic concept of this method and its use in different types of experimental designs in the field are given, and we indicate where progress in advancing and extending the field of applications is made. The greatest strengths of EC measurements in soil science are (1) their uninterrupted continuous measurement of gas concentrations and fluxes and (2) spatial integration over small-scale heterogeneity in the soil.
Cited articles
Baroni, M., Bernardi, R., Do, N.-Q., and Shan, C.-C.: Entailment above the
word level in distributional semantics, in: Proceedings of the 13th
Conference of the European Chapter of the Association for Computational
Linguistics, Association for Computational Linguistics, 23–32, 2012.
Baroni, M., Dinu, G., and Kruszewski, G.: Don't count, predict! A systematic
comparison of context-counting vs. context-predicting semantic vectors, in:
Proceedings of the 52nd Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), Vol. 1, 238–247, 2014.
Baxter, W. and Anjyo, K.-I.: Latent doodle space, in: Computer Graphics
Forum, Wiley Online Library, Vol. 25, 477–485, 2006.
Bengio, Y.: Neural net language models, Scholarpedia, 3, 3881, https://doi.org/10.4249/scholarpedia.3881, 2008.
Bidwell, O. and Hole, F.: Numerical taxonomy and soil classification, Soil
Sci., 97, 58–62, 1964.
Bird, S. and Loper, E.: NLTK: the natural language toolkit, in: Proceedings
of the ACL 2004 on Interactive poster and demonstration sessions,
Association for Computational Linguistics, p. 31, 2004.
Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T.: Enriching Word Vectors
with Subword Information, arXiv preprint arXiv:1607.04606, 2016.
Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R., and Bengio,
S.: Generating sentences from a continuous space, arXiv preprint
arXiv:1511.06349, 2015.
Caté, A., Perozzi, L., Gloaguen, E., and Blouin, M.: Machine learning as a
tool for geologists, The Leading Edge, 36, 215–219, 2017.
Crommelin, R. D. and De Gruijter, J.: Cluster analysis applied to
mineralogical data from the coversand formation in the Netherlands, Tech.
Rep., Stichting voor Bodemkartering Wageningen, 1973.
Davies, M. and Fleiss, J. L.: Measuring agreement for multinomial data,
Biometrics, 1047–1051, 1982.
Doherty, M. E. and Balzer, W. K.: Cognitive feedback, in: Advances in
psychology, Elsevier, Vol. 54, 163–197, 1988.
Duong, L., Kanayama, H., Ma, T., Bird, S., and Cohn, T.: Learning crosslingual
word embeddings without bilingual corpora, arXiv preprint arXiv:1606.09403,
2016.
FAO: FAO/UNESCO Soil Map of the World. Revised legend, with corrections and
updates, World Soil Resources Report, 60, 140 pp., 1988.
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman,
G., and Ruppin, E.: Placing search in context: The concept revisited, ACM
T. Inform. Syst., 20, 116–131, 2002.
Fonseca, F. T., Egenhofer, M. J., Agouris, P., and Câmara, G.: Using
ontologies for integrated geographic information systems, T.
GIS, 6, 231–257, 2002.
Gilbert, S. W.: Model building and a definition of science, J.
Res. Sci. Teach., 28, 73–79, 1991.
Goldstein, J., Mittal, V., Carbonell, J., and Kantrowitz, M.: Multi-document
summarization by sentence extraction, in: Proceedings of the 2000
NAACL-ANLP Workshop on Automatic summarization, Association for
Computational Linguistics, 40–48, 2000.
Heimerl, F. and Gleicher, M.: Interactive analysis of word vector embeddings,
in: Computer Graphics Forum, Wiley Online Library,
Vol. 37, 253–265, 2018.
Hsu, W.-N., Zhang, Y., and Glass, J.: Learning latent representations for
speech generation and transformation, arXiv preprint arXiv:1704.04222, 2017.
Hughes, P. A., McBratney, A. B., Minasny, B., and Campbell, S.: End members,
end points and extragrades in numerical soil classification, Geoderma, 226,
365–375, 2014.
Jain, A., Kulkarni, G., and Shah, V.: Natural language processing,
Int. J. Comput. Sci. Eng., 6, 161–167, 2018.
Kartchner, D., Christensen, T., Humpherys, J., and Wade, S.: Code2vec:
Embedding and clustering medical diagnosis data, in: 2017 IEEE
International Conference on Healthcare Informatics (ICHI),
IEEE, 386–390, 2017.
Lary, D. J., Alavi, A. H., Gandomi, A. H., and Walker, A. L.: Machine learning
in geosciences and remote sensing, Geosci. Front., 7, 3–10, 2016.
LeCun, Y., Bengio, Y., and Hinton, G.: Deep learning, Nature, 521, 436–444, 2015.
Maxwell, A. E., Warner, T. A., and Fang, F.: Implementation of
machine-learning classification in remote sensing: An applied review,
Int. J. Remote Sens., 39, 2784–2817, 2018.
McBratney, A., Mendonça Santos, M. L., and Minasny, B.: On digital soil
mapping, Geoderma, 117, 3–52, 2003.
Mikolov, T., Le, Q. V., and Sutskever, I.: Exploiting similarities among
languages for machine translation, arXiv preprint arXiv:1309.4168,
2013a.
Mikolov, T., Yih, W.-T., and Zweig, G.: Linguistic regularities in continuous
space word representations, in: Proceedings of the 2013 Conference of the
North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, 746–751, 2013c.
Miller, G. A.: WordNet: a lexical database for English, Commun.
ACM, 38, 39–41, 1995.
Mosavi, A., Ozturk, P., and Chau, K.-W.: Flood prediction using machine
learning models: Literature review, Water, 10, 1536, https://doi.org/10.3390/w10111536, 2018.
Nooralahzadeh, F., Øvrelid, L., and Lønning, J. T.: Evaluation of
Domain-specific Word Embeddings using Knowledge Resources, in: Proceedings
of the Eleventh International Conference on Language Resources and Evaluation
(LREC-2018), 1438–1445, 2018.
Nunez-Mir, G. C., Iannone, B. V., Pijanowski, B. C., Kong, N., and Fei, S.:
Automated content analysis: addressing the big literature challenge in
ecology and evolution, Methods Ecol. Evol., 7, 1262–1272,
2016.
Padarian, J. and Fuentes, I.: GeoVec, Word embeddings for application in geosciences: development, evaluation and examples of soil-related concepts, https://doi.org/10.17605/OSF.IO/4UYEQ, last access: 12 July 2019.
Pande, H.: Effective search space reduction for spell correction using
character neural embeddings, in: Proceedings of the 15th Conference of the
European Chapter of the Association for Computational Linguistics: Volume 2,
Short Papers, Vol. 2, 170–174, 2017.
Peckham, S.: The CSDMS standard names: Cross-domain naming conventions for describing
process models, data sets and their associated variables, in: Proceedings of the 7th International
Congress on Environmental Modelling and Software, San Diego, California, 67–74, 2014.
Pedersen, T., Pakhomov, S. V., Patwardhan, S., and Chute, C. G.: Measures of
semantic similarity and relatedness in the biomedical domain, J.
Biomed. Inform., 40, 288–299, 2007.
Rosenberg, A. and Hirschberg, J.: V-measure: A conditional entropy-based
external cluster evaluation measure, in: Proceedings of the 2007 joint
conference on empirical methods in natural language processing and
computational natural language learning (EMNLP-CoNLL), 2007.
Roy, A., Park, Y., and Pan, S.: Learning Domain-Specific Word Embeddings from
Sparse Cybersecurity Texts, arXiv preprint arXiv:1709.07470, 2017.
Rubenstein, H. and Goodenough, J. B.: Contextual correlates of synonymy,
Commun. ACM, 8, 627–633, 1965.
Schnabel, T., Labutov, I., Mimno, D., and Joachims, T.: Evaluation methods for
unsupervised word embeddings, in: Proceedings of the 2015 Conference on
Empirical Methods in Natural Language Processing, 298–307, 2015.
Sneath, P. H. and Sokal, R. R.: Numerical taxonomy, The principles and
practice of numerical classification, 573 pp., 1973.
Suits, D. B.: Use of dummy variables in regression equations, J.
Am. Stat. Assoc., 52, 548–551, 1957.
Turian, J., Ratinov, L., and Bengio, Y.: Word representations: a simple and
general method for semi-supervised learning, in: Proceedings of the 48th
annual meeting of the association for computational linguistics,
Association for Computational Linguistics, 384–394, 2010.
Upchurch, P., Gardner, J. R., Pleiss, G., Pless, R., Snavely, N., Bala, K., and
Weinberger, K. Q.: Deep Feature Interpolation for Image Content Changes,
in: Proceedings of the IEEE conference on computer vision and pattern
recognition, 1, 7064–7073, 2017.
USDA, N.: Keys to soil taxonomy, Soil Survey Staff, Washington, 2010.
Venugopalan, S., Hendricks, L. A., Mooney, R., and Saenko, K.: Improving
LSTM-based video description with linguistic knowledge mined from text,
arXiv preprint arXiv:1604.01729, 2016.
Wang, C. and Blei, D. M.: Collaborative topic modeling for recommending
scientific articles, in: Proceedings of the 17th ACM SIGKDD international
conference on Knowledge discovery and data mining, ACM, 448–456, 2011.
Webster, R.: Quantitative and numerical methods in soil classification
and survey, p. 269, 1977.
Yeh, R., Liu, Z., Goldman, D. B., and Agarwala, A.: Semantic facial expression
editing using autoencoded flow, arXiv preprint arXiv:1611.09961, 2016.
Short summary
A large amount of descriptive information is available in geosciences. Considering the advances in natural language processing, it is possible to rescue this information and transform it into a numerical form (embeddings). We used 280 764 full-text scientific articles to train a language model capable of generating such embeddings. Our domain-specific embeddings (GeoVec) outperformed general-domain embeddings in tasks such as analogies, relatedness, and categorisation, and can be used in novel applications.
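The analogy and relatedness tasks mentioned in this summary are commonly scored with cosine similarity between word vectors, and analogies ("a is to b as c is to ?") are often solved with the vector-offset method. The sketch below illustrates only that generic technique: the five 3-dimensional vectors are made-up toy values, not GeoVec embeddings, and the code is not claimed to be the evaluation procedure used for GeoVec.

```python
import math

# Hypothetical toy vectors for illustration only (not GeoVec embeddings).
VOCAB = {
    "clay":   [1.0, 0.0, 0.2],
    "clayey": [1.0, 1.0, 0.2],
    "sand":   [0.0, 0.1, 1.0],
    "sandy":  [0.0, 1.1, 1.0],
    "silt":   [0.5, 0.0, 0.6],
}


def cosine(u, v):
    """Cosine similarity, a common score for word relatedness."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vocab=VOCAB):
    """Solve 'a is to b as c is to ?' with the vector-offset method.

    Returns the vocabulary word whose vector is most cosine-similar to
    b - a + c, excluding the three query words themselves.
    """
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vocab[w], target))


print(analogy("clay", "clayey", "sand"))  # → sandy
```

Excluding the query words is a common convention, because the offset vector b - a + c often lies closest to one of them.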