Articles | Volume 5, issue 2
SOIL, 5, 177–187, 2019
https://doi.org/10.5194/soil-5-177-2019

Original research article 17 Jul 2019

Word embeddings for application in geosciences: development, evaluation, and examples of soil-related concepts

José Padarian and Ignacio Fuentes

Data sets

GeoVec J. Padarian and I. Fuentes https://doi.org/10.17605/OSF.IO/4UYEQ

Short summary
A large amount of descriptive information is available in geosciences. Given recent advances in natural language processing, it is possible to rescue this information and transform it into a numerical form (embeddings). We used 280 764 full-text scientific articles to train a language model capable of generating such embeddings. Our domain-specific embeddings (GeoVec) outperformed general-domain embeddings in tasks such as analogies, relatedness, and categorisation, and can be used in novel applications.
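
To make the analogy and relatedness tasks concrete, the following is a minimal sketch of how they are commonly computed from word vectors, using cosine similarity and vector arithmetic. The three-dimensional vectors, the word list, and the function names `cosine` and `analogy` are hypothetical and chosen only for illustration; they are not GeoVec values or the authors' code.

```python
import math

# Hypothetical toy vectors (assumption): real embeddings are learned from
# text and have far more dimensions than the three used here.
toy_vectors = {
    "sand":   [1.0, 0.0, 0.0],
    "coarse": [1.0, 1.0, 0.0],
    "clay":   [0.0, 0.0, 1.0],
    "fine":   [0.0, 1.0, 1.0],
    "water":  [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity, a common measure of relatedness between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the word closest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidate answers.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


relatedness = cosine(toy_vectors["sand"], toy_vectors["coarse"])
answer = analogy("sand", "coarse", "clay", toy_vectors)
```

With these toy values, `answer` is the word whose vector best matches `coarse - sand + clay`. Categorisation can be approached in a similar spirit by clustering the vectors and comparing the clusters with known categories.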