Articles | Volume 9, issue 1
https://doi.org/10.5194/soil-9-155-2023
© Author(s) 2023. This work is distributed under the Creative Commons Attribution 4.0 License.
Potential of natural language processing for metadata extraction from environmental scientific publications
Guillaume Blanchy (corresponding author)
Department of Plant, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
Lukas Albrecht
Soil Fertility and Soil Protection, Agroscope, Reckenholzstrasse 191, 8046, Zurich, Switzerland
John Koestel
Soil Fertility and Soil Protection, Agroscope, Reckenholzstrasse 191, 8046, Zurich, Switzerland
Department of Soil and Environment, Institute for Soil and Environment, Swedish University of Agricultural Sciences, Box 7014, 75007 Uppsala, Sweden
Sarah Garré
Department of Plant, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
Short summary
Adapting agricultural practices to future climatic conditions requires us to synthesize the effects of management practices on soil properties with respect to local soil and climate. We showcase different automated text-processing methods to identify topics, extract metadata for building a database and summarize findings from publication abstracts. While human intervention remains essential, these methods show great potential to support evidence synthesis from large numbers of publications.
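The metadata-extraction step named in the summary can be illustrated with a deliberately simple rule-based sketch. It is not the authors' pipeline. The vocabulary `PRACTICE_TERMS`, the unit pattern `QUANTITY`, the `AbstractMetadata` record and the function `extract_metadata` are all hypothetical. They only show how plain keyword and pattern matching can turn an abstract into database-ready fields.

```python
import re
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: category -> surface forms to look for.
# A real study would derive such lists from a thesaurus or from annotated abstracts.
PRACTICE_TERMS = {
    "tillage": ["no-till", "no tillage", "reduced tillage", "conventional tillage"],
    "cover_crop": ["cover crop", "catch crop"],
    "residue": ["residue retention", "straw incorporation"],
}

# A number followed by a unit; the lookahead stops "mm" from matching inside "mmol".
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm|years?)(?!\w)")


@dataclass
class AbstractMetadata:
    practices: set[str] = field(default_factory=set)
    quantities: list[tuple[float, str]] = field(default_factory=list)


def extract_metadata(abstract: str) -> AbstractMetadata:
    """Return practice categories and (value, unit) pairs found in an abstract."""
    text = abstract.lower()
    meta = AbstractMetadata()
    for category, terms in PRACTICE_TERMS.items():
        # Plain substring search: fast, but blind to negation ("no cover crop").
        if any(term in text for term in terms):
            meta.practices.add(category)
    for value, unit in QUANTITY.findall(text):
        meta.quantities.append((float(value), unit))
    return meta


if __name__ == "__main__":
    example = (
        "We compared no-till and conventional tillage over 12 years on a silt loam "
        "receiving 650 mm of annual rainfall; cover crops reduced runoff."
    )
    print(extract_metadata(example))
```

Substring matching flags any abstract that mentions a practice, including negated or incidental mentions. This is one reason such automatically extracted records still need the human review that the summary describes as essential.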