Articles | Volume 5, issue 2
https://doi.org/10.5194/soil-5-177-2019
© Author(s) 2019. This work is distributed under
the Creative Commons Attribution 4.0 License.
Word embeddings for application in geosciences: development, evaluation, and examples of soil-related concepts
José Padarian
CORRESPONDING AUTHOR
Sydney Institute of Agriculture & School of Life and Environmental Sciences, The University of Sydney, New South Wales, Australia
Ignacio Fuentes
Sydney Institute of Agriculture & School of Life and Environmental Sciences, The University of Sydney, New South Wales, Australia
Cited
19 citations as recorded by crossref.
- Evaluating the feasibility of using artificial neural networks to predict lithofacies in complex glacial deposits Z. Hammond & D. Allen https://doi.org/10.1007/s10040-023-02726-2
- Method for Automatic Classification of Full-Text Descriptions of Cores Using Dictionaries A. Antonov et al. https://doi.org/10.3103/S0005105526700081
- 3D lithological mapping of borehole descriptions using word embeddings I. Fuentes et al. https://doi.org/10.1016/j.cageo.2020.104516
- Classification of geological borehole descriptions using a domain adapted large language model H. Ghorbanfekr et al. https://doi.org/10.1016/j.acags.2025.100229
- Machine Learning and Artificial Intelligence Applications in Soil Science B. Minasny & A. McBratney https://doi.org/10.1111/ejss.70093
- Portuguese word embeddings for the oil and gas industry: Development and evaluation D. Gomes et al. https://doi.org/10.1016/j.compind.2020.103347
- Deep learning text classification of borehole logs for regional scale modeling of hydrofacies (Po Plain, N Italy) A. Previati et al. https://doi.org/10.1016/j.ejrh.2024.102157
- Applications of Natural Language Processing to Geoscience Text Data and Prospectivity Modeling C. Lawley et al. https://doi.org/10.1007/s11053-023-10216-1
- Geoscience language models and their intrinsic evaluation C. Lawley et al. https://doi.org/10.1016/j.acags.2022.100084
- Standardization and interpretable analysis of geological database using retrieval-augmented large language model W. Yan et al. https://doi.org/10.1016/j.geoai.2025.100047
- dh2loop 1.0: an open-source Python library for automated processing and classification of geological logs R. Joshi et al. https://doi.org/10.5194/gmd-14-6711-2021
- Can linguistic features extracted from geo-referenced tweets help building function classification in remote sensing? M. Häberle et al. https://doi.org/10.1016/j.isprsjprs.2022.04.006
- Potential of natural language processing for metadata extraction from environmental scientific publications G. Blanchy et al. https://doi.org/10.5194/soil-9-155-2023
- A novel few-shot learning framework for rock images dually driven by data and knowledge Z. Chen et al. https://doi.org/10.1016/j.acags.2024.100155
- Enhancing soil science research with multi-agent artificial intelligence systems B. Minasny et al. https://doi.org/10.3389/fsci.2026.1721295
- Artificial intelligence in soil science A. Wadoux https://doi.org/10.1111/ejss.70080
- Interpreting semi-structured tabular documents with domain-specific knowledge from construction site investigation texts E. Yang et al. https://doi.org/10.1016/j.aei.2026.104796
- Machine Learning-Based Prospective Modeling for Alluvial Gold Mining: A Study Area in Colombia F. Bertaiola et al. https://doi.org/10.1007/s00024-025-03830-y
- Forecasting landslide deformation by integrating domain knowledge into interpretable deep learning considering spatiotemporal correlations Z. Ma & G. Mei https://doi.org/10.1016/j.jrmge.2024.02.034
Latest update: 21 Jun 2026
Short summary
A large amount of descriptive information is available in geosciences. Considering the advances in natural language processing, it is possible to rescue this information and transform it into a numerical form (embeddings). We used 280,764 full-text scientific articles to train a language model capable of generating such embeddings. Our domain-specific embeddings (GeoVec) outperformed general-domain embeddings in tasks such as analogies, relatedness, and categorisation, and can be used in novel applications.
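As a rough illustration of two of the evaluation tasks named in the summary (relatedness and analogies), the sketch below shows how they are commonly computed from word vectors. It is a minimal example under stated assumptions, not the authors' evaluation code: the three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration and are not GeoVec values, and the helper names `cosine` and `analogy` are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# These are NOT GeoVec values; real embeddings have far more dimensions.
TOY_EMBEDDINGS = {
    "clay": [1.0, 0.0, 0.0],
    "claystone": [1.0, 1.0, 0.0],
    "sand": [0.0, 0.0, 1.0],
    "sandstone": [0.0, 1.0, 1.0],
    "silt": [0.5, 0.0, 0.5],
}


def cosine(u, v):
    """Cosine similarity, a common score for relatedness between two words."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' using the vector offset b - a + c."""
    target = [
        y - x + z
        for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # Exclude the query words so the answer is a new word.
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


# Relatedness: with these toy vectors, clay is closer to silt than to sand.
clay_silt = cosine(TOY_EMBEDDINGS["clay"], TOY_EMBEDDINGS["silt"])
clay_sand = cosine(TOY_EMBEDDINGS["clay"], TOY_EMBEDDINGS["sand"])

# Analogy: clay is to claystone as sand is to ?
answer = analogy("clay", "claystone", "sand", TOY_EMBEDDINGS)
```

With these toy vectors the analogy query returns `sandstone`, because the offset from `clay` to `claystone` added to `sand` lands exactly on the `sandstone` vector. In trained embeddings the match is only approximate, which is why the nearest neighbour by cosine similarity is taken.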