Articles | Volume 5, issue 2
https://doi.org/10.5194/soil-5-177-2019
Original research article | 17 Jul 2019

Word embeddings for application in geosciences: development, evaluation, and examples of soil-related concepts

José Padarian and Ignacio Fuentes

Data sets

GeoVec J. Padarian and I. Fuentes https://doi.org/10.17605/OSF.IO/4UYEQ

Short summary
A large amount of descriptive information is available in the geosciences. Given recent advances in natural language processing, it is possible to recover this information and transform it into a numerical form (embeddings). We used 280,764 full-text scientific articles to train a language model capable of generating such embeddings. Our domain-specific embeddings (GeoVec) outperformed general-domain embeddings in tasks such as analogies, relatedness, and categorisation, and can be used in novel applications.