the Creative Commons Attribution 4.0 License.
Reference soil groups map of Ethiopia based on legacy data and machine learning-technique: EthioSoilGrids 1.0
Ashenafi Ali
Teklu Erkossa
Kiflu Gudeta
Wuletawu Abera
Ephrem Mesfin
Terefe Mekete
Mitiku Haile
Wondwosen Haile
Assefa Abegaz
Demeke Tafesse
Gebeyhu Belay
Mekonen Getahun
Sheleme Beyene
Mohamed Assen
Alemayehu Regassa
Yihenew G. Selassie
Solomon Tadesse
Dawit Abebe
Yitbarek Wolde
Nesru Hussien
Abebe Yirdaw
Addisu Mera
Tesema Admas
Feyera Wakoya
Awgachew Legesse
Nigat Tessema
Ayele Abebe
Simret Gebremariam
Yismaw Aregaw
Bizuayehu Abebaw
Damtew Bekele
Eylachew Zewdie
Steffen Schulz
Lulseged Tamene
Eyasu Elias
Download
- Final revised paper (published on 05 Mar 2024)
- Preprint (discussion started on 23 May 2022)
Interactive discussion
Status: closed
-
CC1: 'Comment on egusphere-2022-301', Sileshi W Gudeta, 24 May 2022
Dear Editor,
This is a very useful work and I congratulate the authors for taking the initiative. I have the following concerns, which I believe the authors will address for this work to be useful.
(1) My main concern relates to the discrepancy between the map they produced in Figure 7 and the Soil Atlas of Africa (see Jones et al., 2013), which is currently the authoritative reference material. For their map to be useful, it is important to reconcile with the map and wherever discrepancies exist it will be helpful to explain. Below are some of the discrepancies:
1.1. Cambisols are represented by a small proportion of the area in isolated pockets of Ethiopia according to the Soil Atlas of Africa. On the other hand, in this manuscript Cambisols are the top ranked in Figure 8. The explanation given for this in the manuscript is unsatisfactory.
1.2. Areas bordering Djibouti and Eritrea that are predominantly covered by Leptosols (according to the Soil Atlas of Africa) are now covered by Fluvisols according to this manuscript. Many of these mountainous areas are not expected to have Fluvisols because Fluvisols naturally form in fluvial, lacustrine or marine deposits and periodically flooded areas.
1.3. Areas in eastern and southeastern Ethiopia bordering Somalia that are predominantly covered by Calcisols and Gypsisols (according to the Soil Atlas of Africa) have a continuous cover of Cambisols and some Fluvisols according to this manuscript. That cannot be possible.
1.4. Areas in northwestern Ethiopia bordering Sudan that are predominantly covered by Nitisols, Luvisols and Alisols (according to the Soil Atlas of Africa) have almost a continuous cover of Vertisols according to this manuscript. That also does not make sense given that Vertisols form in depressions and level plains.
1.5. Andosols were shown in Eastern Ethiopia where they are not expected to occur (Andosols are formed from volcanic ejecta) and are common in the Rift Valley. Their occurrence outside is uncharacteristic.
2. The colour coding in the map is really confusing. For example, Acrisols, Cambisols and Leptosols were shown with colours that look alike. For this map to be useful it will be good if it is done with the same colour coding as the Soil Atlas of Africa and the Harmonisation of the soil map of Africa described in Dewitte.
Jones, A., Breuning-Madsen, H., Brossard, M., Dampha, A., Deckers, J., Dewitte, O., Hallett, S., Jones, R., Kilasara, M., Le Roux, P., Micheli, E., Montanarella, L., Spaargaren, O., Tahar, G., Thiombiano, L., Van Ranst, E., Yemefack, M. and Zougmore, R. (Eds.), (2013). Soil Atlas of Africa. European Commission, 176 pp., European Commission Luxembourg. DOI: 10.2788/52319
Dewitte, O., Jones, A., Spaargaren, O., Breuning-Madsen, H., Brossard, M., Dampha, A., Deckers, J., Gallali, T., Hallett, S., Jones, R., Kilasara, M., Le Roux, P., Michéli, E., Montanarella, L., Thiombiano, L., van Ranst, E., Yemefack, M. and Zougmore, R. (2013). Harmonisation of the soil map of Africa at the continental scale. Geoderma 212: 138-153. DOI: 10.1016/j.geoderma.2013.07.007.
My appeal to the authors is to compare the soil profile data used for creating the map with the data used for the Soil Atlas of Africa.
It is also important to check whether imbalances in sample sizes among soil types (e.g., preponderance of Vertisols and fewer Gypsisols) have influenced the analysis.
-
Citation: https://doi.org/10.5194/egusphere-2022-301-CC1
-
AC1: 'Comment on egusphere-2022-301', Ashenafi Ali, 27 Jun 2022
Date: 27 June 2022
Subject: Response to the interactive comment on our manuscript entitled "Reference Soil Groups Map of Ethiopia Based on Legacy Data and Machine Learning Technique: EthioSoilGrids 1.0" by Ashenafi Ali et al.
Dear Editor,
Below, the contents of community comment 1 (CC1) by Sileshi W Gudeta are provided in black text and our responses are marked in blue text.
Dear Sileshi W Gudeta,
Thank you for taking the time to review our manuscript. We will address the comments and revise the paper accordingly.
Dear Editor,
Comment 1. This is a very useful work and I congratulate the authors for taking the initiative.
Response 1: We are grateful for the positive comments indicating that the work is very useful.
Comment 2. I have the following concerns, which I believe the authors will address for this work to be useful.
(1) My main concern relates to the discrepancy between the map they produced in Figure 7 and the Soil Atlas of Africa (see Jones et al., 2013), which is currently the authoritative reference material. For their map to be useful, it is important to reconcile with the map and wherever discrepancies exist it will be helpful to explain.
Response 2: We thank Sileshi W Gudeta for the comments. The following are our responses:
We acknowledge that the Soil Atlas of Africa remains useful for harmonisation and improvement; however, it is too general for diverse soil information users at local levels. It is derived from the Harmonized World Soil Database (HWSD) with expert-based modifications. The HWSD for East Africa, including Ethiopia, combines existing data/maps from the Soil and Terrain (SOTER) database and SOTER-based soil parameter estimates (SOTWIS), and the soil map in SOTER has the following limitations:
- it is based on qualitative (polygon) maps, which were themselves based on previous maps.
- the SOTER soil nomenclature does not meet present demand, since it is based on FAO 1974 and the FAO Soil Map of the World revised legend 1988 (reprinted FAO, 1990).
- since it is at a smaller scale, it depicts only the dominant soil types over large areas and masks important soil units that would have been reported had a larger scale been used. For example, in the HWSD only the major soil type is reported for a given delineation, while up to 9 soil types coexist in each delineation.
- the geographic location of the dominant and associated soil types is not defined, as it is based on a qualitative approach.
Conclusion: The existing spatial soil information of Ethiopia is based either on a conventional/traditional qualitative approach that uses a mental model for extrapolation, or on quantitative/digital soil mapping with a limited number of unevenly distributed profile observations. Currently, we do not have consistent spatial soil type information for Ethiopia, which necessitated the development of EthioSoilGrids 1.0.
On the other hand, the development of the EthioSoilGrids 1.0 is based on the following state-of-the-art techniques and procedures:
- it is based on a rigorous quantitative spatial predictive model (machine learning) that combines information from soil observations with environmental variables/covariates and remote sensing products.
- the mapping of soil types is based on the quantitatively defined probability of occurrence of each reference soil group (RSG) per modelling window (250 meters).
- it is based on a much larger number of soil profile observations than any other soil mapping initiative covering Ethiopia.
- the process of its development involved soil profile-based harmonization and translation to IUSS WRB 2015.
- it followed a hybrid approach, i.e., a combination of digital soil mapping and expert validation of the soil types and their spatial patterns, to generate a consistent and updatable national spatial SoilGrid.
Therefore, given the above differences in approach, scale, data source, etc., one should expect differences between the Soil Atlas of Africa and EthioSoilGrids 1.0. In other words, the latter was developed not to match the former but to provide improved, quality soil information, an objective that was fully achieved. Consequently, we are not surprised that the two products do not coincide, since that was the assumption when the work was initiated. Moreover, this is not the first report on Ethiopian soil information to show such discrepancies with global products; for example, the spatial soil grids covering Ethiopia that are based on digital soil mapping techniques (e.g., SoilGrids, 2017), a similar approach to that followed in preparing EthioSoilGrids 1.0, also showed differences in RSG area coverage.
Comment: Below are some of the discrepancies:
Comment 2.1: Cambisols are represented by a small proportion of the area in isolated pockets of Ethiopia according to the Soil Atlas of Africa. On the other hand, in this manuscript, Cambisols are the top-ranked in Figure 8. The explanation given for this in the manuscript is unsatisfactory.
Response 2.1
That Cambisols are the most abundant RSG is plausible, because Cambisols develop in areas where pedogenetic development is slow, either (i) because of continuous erosion that is in equilibrium with the weathering process, or because continuous erosion and deposition cycles are common. As a result, they cover significant parts of the highlands of Ethiopia on the foot-slopes of undulating mountainous or hilly terrain, where erosion and weathering are in equilibrium or erosion and deposition cycles are common; or (ii) because of low precipitation or weathering-resistant parent materials. In this case, Cambisols occur over large areas of the lowlands of Ethiopia on weathering-resistant calcareous limestone and on colluvial and alluvial deposits, where precipitation is low.
It is worth noting that, in terms of the total number of profile observations per reference soil group (RSG), Cambisols ranked third (n = 2,219), following Vertisols (n = 3,935) and Luvisols (n = 2,229). In fact, in some of the existing conventionally made country-wide legacy soil maps of Ethiopia, Cambisols were reported to cover, e.g., 21% and 16% of the land mass of Ethiopia.
Comment 2.2: Areas bordering Djibouti and Eritrea that are predominantly covered by Leptosols (according to the Soil Atlas of Africa) are now covered by Fluvisols according to this manuscript. Many of these mountainous areas are not expected to have Fluvisols because Fluvisols naturally form in fluvial, lacustrine or marine deposits and periodically flooded areas.
Response 2.2: Yes, as noted by Sileshi W Gudeta, Fluvisols pedogenetically develop on flood plains, riverbanks, and lacustrine deposits. Since the areas bordering Djibouti and the north-eastern lowlands (Afar and Somali lowlands) are under the influence of floods, with frequent deposits from the Awash, Wabishebele and Genale rivers, the predominance of Fluvisols is expected. Note that Leptosols are well represented on the volcanic mountains of Fantale, Boseti Guda and Ziqualla in the Awash valley, the volcanic hills of the Afar lowlands, and the eastern escarpment of the central and north-eastern rift valley, which are situated in these areas.
Comment 2.3: Areas in eastern and south-eastern Ethiopia bordering Somalia that are predominantly covered by Calcisols and Gypsisols (according to the Soil Atlas of Africa) have a continuous cover of Cambisols and some Fluvisols according to this manuscript. That cannot be possible.
Response 2.3: On comments about the formation and distribution of Cambisols and Fluvisols, we addressed the above in responses 2.1 and 2.2.
EthioSoilGrids 1.0 is based on measured point observations collated from these areas, after excluding RSGs with fewer than thirty observations; this includes Gypsisols, which had only 11 profiles. Gypsisols are therefore excluded from the mapping. Regarding Calcisols, as indicated by Sileshi W Gudeta, the probability of occurrence map (Figure C1 of Appendix C) depicts Calcisols occurring dominantly in eastern and south-eastern Ethiopia, bordering Somalia. However, when the relative abundance of RSGs per modelling window is assessed, Calcisols, in terms of area coverage as the dominant soil type depicted in Figure 7, are the 7th most abundant soil in Ethiopia.
By the same token, in polygon-based soil maps such as the Soil Atlas of Africa, mapping a polygon as one soil unit does not mean that the polygon consists 100% of that soil unit; it also contains associated soils that are not depicted as dominant. Further, the geographic locations of both the dominant and the associated soils are not defined, and hence the map does not directly indicate the specific location of each soil type.
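As an illustration of the distinction drawn above between a class's probability of occurrence and its area coverage as the dominant class, the following is a minimal sketch in Python. It is not the authors' workflow: the function names and the probability values are hypothetical and chosen only to show how a class can have a substantial probability in many modelling windows and still be dominant in few of them.

```python
from collections import Counter


def dominant_class(probabilities):
    """Return the class with the highest probability in one modelling window."""
    return max(probabilities, key=probabilities.get)


def area_share(windows):
    """Fraction of windows in which each class is the dominant one."""
    counts = Counter(dominant_class(p) for p in windows)
    total = len(windows)
    return {rsg: n / total for rsg, n in counts.items()}


# Hypothetical per-window class probabilities (NOT values from the paper).
windows = [
    {"Cambisols": 0.45, "Calcisols": 0.40, "Fluvisols": 0.15},
    {"Cambisols": 0.50, "Calcisols": 0.35, "Fluvisols": 0.15},
    {"Cambisols": 0.30, "Calcisols": 0.55, "Fluvisols": 0.15},
    {"Cambisols": 0.20, "Calcisols": 0.30, "Fluvisols": 0.50},
]

shares = area_share(windows)
for rsg, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{rsg}: dominant in {share:.0%} of windows")
```

In this toy example Calcisols have a probability of at least 0.30 in every window but are dominant in only one of the four, which is the kind of effect described for Figure C1 versus Figure 7.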
Comment 2.4: Areas in north-western Ethiopia bordering Sudan that are predominantly covered by Nitisols, Luvisols and Alisols (according to the Soil Atlas of Africa) have almost a continuous cover of Vertisols according to this manuscript. That also does not make sense given that Vertisols form in depressions and level plains.
Response 2.4:
The north-western part of Ethiopia bordering Sudan, from the Tekeze river (Humera area) down to the Baro basin, is dominated by Vertisols, with Luvisols and Nitisols intermingled until these two RSGs become dominant in landscapes a relatively short distance away. The proportion of each soil type varies across the landscape. However, both the quantitative and qualitative assessments in those areas showed good agreement at this level of accuracy, and the occurrence probability of each RSG is reported.
Comment 2.5: Andosols were shown in Eastern Ethiopia where they are not expected to occur (Andosols are formed from volcanic ejecta) and are common in the Rift Valley. Their occurrence outside is uncharacteristic.
Response 2.5:
Andosols are confirmed to occur outside the rift valley, especially in highland volcanic regions in the presence of organic matter. In Ethiopia, Andosols occur along the rift valley and in the highlands, for example on the Bale Mountains, Siemen Mountains (Ras Dashen), Choke Mountain, Abune Yosef Mountain and other mountains of the country. Below are some published references for confirmation:
References:
Assen, M., and Belay, T. 2008. Characteristics and classification of the soils of the plateau of Simen Mountains National Park (SMNP), Ethiopia.
Belay, T. 1995. Morphological, physical and chemical characteristics of Mollic Andosols of Tib Mountains, Central Ethiopian Highlands. SINET: Ethiop. J. Sci. 18 (2): 143–169.
Simane, B., Zaitchik, B.F., and Mutlu, O. 2013. Agroecosystem Analysis of the Choke Mountain Watersheds, Ethiopia. Sustainability 5, no. 2: 592-616. https://doi.org/10.3390/su5020592.
Gebrehiwot, K., Desalegn, T., Woldu, Z., Sebsebe, D., and Ermias, T. 2018. Soil organic carbon stock in Abune Yosef afroalpine and sub-afroalpine vegetation, northern Ethiopia. Ecol Process 7, 6 (2018). https://doi.org/10.1186/s13717-018-0117-9.
In our study, the overall occurrence and relative position of each reference soil group along the toposequence, and its association with other RSGs, agree with previous works and with pedologically expected/established schematic sequences. However, there were cases where an RSG's position along the toposequence and its association with other reference soil groups require further investigation; these were not adequately captured and explained in this study. This might be attributed to the positional accuracy of the legacy point observations, the modelling approach, and most importantly the level of detail and scale/resolution of the environmental variables used in this study. For clarity, we will specify the areas that require explanation arising from the above-stated likely reasons.
Comment 3: The colour coding in the map is confusing. For example, Acrisols, Cambisols and Leptosols were shown with colours that look alike. For this map to be useful it will be good if it is done with the same colour coding as the Soil Atlas of Africa and the Harmonisation of the soil map of Africa described in Dewitte.
Jones, A., Breuning-Madsen, H., Brossard, M., Dampha, A., Deckers, J., Dewitte, O., Hallett, S., Jones, R., Kilasara, M., Le Roux, P., Micheli, E., Montanarella, L., Spaargaren, O., Tahar, G., Thiombiano, L., Van Ranst, E., Yemefack, M. and Zougmore, R. (Eds.), (2013). Soil Atlas of Africa. European Commission, 176 pp., European Commission Luxembourg. DOI: 10.2788/52319
Dewitte, O., Jones, A., Spaargaren, O., Breuning-Madsen, H., Brossard, M., Dampha, A., Deckers, J., Gallali, T., Hallett, S., Jones, R., Kilasara, M., Le Roux, P., Michéli, E., Montanarella, L., Thiombiano, L., van Ranst, E., Yemefack, M. and Zougmore, R. (2013). Harmonisation of the soil map of Africa at the continental scale. Geoderma 212: 138-153. DOI: 10.1016/j.geoderma.2013.07.007.
Response 3:
As commented, we will address the colour coding and ensure distinct contrast among RSGs.
Comment 4: My appeal to the authors is to compare the soil profile data used for creating the map with the data used for the Soil Atlas of Africa.
Response 4:
See the preceding responses!
Comment 5: It is also important to check whether imbalances in sample sizes among soil types (e.g., preponderance of Vertisols and fewer Gypsisols) have influenced the analysis.
Response 5:
Kindly note again that Gypsisols are confirmed to occur based on the point profile observations, but they were excluded from the modelling and are not mapped in the EthioSoilGrids version 1.0 product. However, as acknowledged in lines 441 to 444 of the manuscript, balanced datasets are ideal for modelling and mapping, and an assessment of the effect of uneven class sizes, along with various data treatment (pruning) techniques, is recommended for future studies. The reason is that there are different treatment techniques for unbalanced categorical data, targeting either the majority or the minority classes, which lead to different predicted map accuracies and different overall, producer's and user's accuracies.
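The exclusion rule and the class imbalance discussed in this response can be illustrated with a short Python sketch. The profile counts are those quoted in this discussion (Vertisols 3,935; Luvisols 2,229; Cambisols 2,219; Gypsisols 11) and the threshold of thirty observations is the one stated in Response 2.3; the function names are hypothetical and the sketch covers only these four RSGs, not the full dataset.

```python
MIN_PROFILES = 30  # threshold stated in the response: RSGs with fewer profiles are excluded

# Profile counts quoted in this discussion (a subset of the RSGs only).
profile_counts = {
    "Vertisols": 3935,
    "Luvisols": 2229,
    "Cambisols": 2219,
    "Gypsisols": 11,
}


def split_by_threshold(counts, minimum=MIN_PROFILES):
    """Separate classes that meet the minimum sample size from those that do not."""
    kept = {rsg: n for rsg, n in counts.items() if n >= minimum}
    excluded = {rsg: n for rsg, n in counts.items() if n < minimum}
    return kept, excluded


def imbalance_ratio(counts):
    """Ratio of the largest to the smallest class size (1.0 means perfectly balanced)."""
    return max(counts.values()) / min(counts.values())


kept, excluded = split_by_threshold(profile_counts)
print("Modelled RSGs:", sorted(kept))
print("Excluded RSGs:", sorted(excluded))
print(f"Imbalance ratio among modelled RSGs: {imbalance_ratio(kept):.2f}")
```

A simple ratio like this only describes the imbalance; whether and how to rebalance the majority or minority classes is the open question the response defers to future studies.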
-
CC3: 'Reply on AC1', Sileshi W Gudeta, 10 Jul 2022
I appreciate the effort made by the authors in responding to my comments. However, my intention in my previous comment was not to receive a rebuttal. I still believe the authors have not addressed the main problems, i.e., (1) the mismatch between their map (i.e., the EthioSoilGrids map) and the Soil Atlas of Africa, and (2) the relatively low classification accuracies. In the attached document I have highlighted the various issues. I have also tried to identify reasons why the producer and user accuracies for some reference soil groups (RSGs) are low. I encourage the authors to explore (1) opportunities to include other variables not included in the present analysis, (2) dimension reduction, (3) the use of other cross-validation methods, and (4) the use of an ensemble approach, to see whether overall accuracy could be improved and classification errors reduced for individual RSGs. This way, I hope the classification errors could be reduced and a more refined map could be produced. In the long run, I also encourage the authors to consider a map that shows the qualifiers for each of the major RSGs. For example, identifying the RSG as Calcic Cambisol, Chromic Cambisol, Dystric Cambisol, Eutric Cambisol, ... Vertic Cambisol is more informative than just saying Cambisols.
-
AC2: 'Reply on CC3', Ashenafi Ali, 13 Jul 2022
Dear Sileshi W Gudeta,
Thank you very much. We have considered all comments and we are improving.
Best regards,
Ashenafi Ali and co-authors.
Citation: https://doi.org/10.5194/egusphere-2022-301-AC2
-
CC2: 'Comment on egusphere-2022-301', Yitbarek Walde, 06 Jul 2022
- In the co-author name "Yitbarek Walde", the letter "a" in "Walde" needs to be replaced with "o", i.e. the name should be written as "Wolde".
- In line 22, the name "Oromia Engineering Corporation" has to be changed to "Engineering Corporation of Oromia".
- From line 167 to line 172, the font size and line spacing look different from the other paragraphs.
Citation: https://doi.org/10.5194/egusphere-2022-301-CC2
-
AC3: 'Reply on CC2', Ashenafi Ali, 01 Sep 2022
Dear Yitbarek Wolde,
Thank you very much. All of this will be addressed during the resubmission phase.
Best regards,
Ashenafi Ali and co-authors.
Citation: https://doi.org/10.5194/egusphere-2022-301-AC3
-
CC4: 'Comment on egusphere-2022-301', Fuat Kaya, 10 Sep 2022
Dear Associate Editor,
I have carefully read the study as a voluntary "commenter" on the article “Reference Soil Groups Map of Ethiopia Based on Legacy Data and Machine Learning Technique: EthioSoilGrids 1.0”. Since I am not an official referee, my comments are sincere.
The authors should be commended for their work in Ethiopia, feeling sincerely about the data sharing process. However, the authors have edited this article to produce only one output. I have concerns about research questions. There are many challenges to address in digital soil mapping. And these challenges are voiced by the DSM community. Here's an example: Ten challenges for the future of pedometrics (https://www.sciencedirect.com/science/article/pii/S0016706121002354).
In this regard, I invite the author who does the modelling in this valuable team to also model with the two more widely accepted algorithms used globally in SoilGrids 1.0 and SoilGrids 2.0.
https://soil.copernicus.org/articles/7/217/2021/--SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty------Used
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105992---SoilGrids1km — Global Soil Information Based on Automated Mapping
Specific comments:
Line 1:
As far as we know, this map is not a "conventional" map; it is a "digital" map.
I think "digital" must be added to the title.
Line 35:
Really, honestly, "awesome" work by this team to collaboratively extract and collate the data.
But we (the DSM community and the public) know that SoilGrids versions 1.0 and 2.0 have been released.
Publishing by running a single algorithm here is just to produce an output. There is a need for an approach that addresses current DSM issues. We know that there is something "unknown" in big data, and we will discover the unknown in data with machine learning algorithms. So why only one algorithm? Comparative results are necessary for this study to make accurate inferences for regional results.
Multinomial logistic regression was used for SoilGrids 1.0 and quantile random forests for SoilGrids 2.0.
If reference soil groups are estimated in the field with these algorithms, their outputs will be appreciated by the DSM community at the international level.
Line 70:
In the last part of the introduction, the authors define a brief research purpose/question. In the last paragraph of the Introduction chapter, the authors wrote "... objectives of this study". In this part of the article, I rather expected a clearly formulated research goal. I suggest that the article state precisely what the purpose of the research is, using a statement such as: "The goal of the study/research was ...". When formulating the research goal(s), it would be worth writing what the cognitive (scientific) goal was and what the utilitarian (useful) goal was. Before stating the purpose of the study, it would be worth formulating the research problem. The research problem may constitute a premise to indicate a gap in the current state of knowledge. It is worth writing which current gaps in knowledge the authors would like to fill on the basis of the planned and conducted research.
Line 178:
Is it just "model accuracy"?
How do we evaluate uncertainty?
To evaluate classification-based algorithms that produce probabilistic predictions, I recommend D.G. Rossiter's valuable work: https://www.sciencedirect.com/science/article/pii/S0016706116303901#bb0110
Please check the "confusion index" introduced by Burrough et al. (1997): https://www.sciencedirect.com/science/article/pii/S0016706197000189
Two other sources applied this quantification in different regions, over large and small areas:
https://www.sciencedirect.com/science/article/pii/S0016706116304864
https://www.tandfonline.com/doi/full/10.1080/02571862.2022.2059115
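For readers unfamiliar with the confusion index mentioned in this comment, the following Python sketch shows one common formulation, assumed here to be CI = 1 - (p_max - p_second), where p_max and p_second are the largest and second-largest class probabilities (or memberships) at a location. The function name and the example probabilities are hypothetical; the cited paper should be consulted for the exact definition.

```python
def confusion_index(probabilities):
    """Confusion index, assumed form: 1 - (largest - second-largest probability).

    Values near 1 mean the two most likely classes are almost equally probable
    (high uncertainty); values near 0 mean one class clearly dominates.
    """
    if len(probabilities) < 2:
        raise ValueError("at least two class probabilities are required")
    ranked = sorted(probabilities, reverse=True)
    return 1.0 - (ranked[0] - ranked[1])


# Hypothetical class probabilities for two modelling windows (not values from the paper).
ambiguous_window = [0.45, 0.40, 0.15]   # two classes nearly tied
confident_window = [0.90, 0.05, 0.05]   # one class clearly dominant

print(f"Ambiguous window: CI = {confusion_index(ambiguous_window):.2f}")
print(f"Confident window: CI = {confusion_index(confident_window):.2f}")
```

Mapped per modelling window, such an index would complement a most probable class map by showing where the prediction is least decisive.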
Line 263:
What "reference" soil group did the models predict in areas with these classes? Is there a taxonomic relationship here? Please read the paper titled: Accounting for taxonomic distance in accuracy assessment of soil class predictions
Line 305:
Climate, organisms and topography. If it is related to them, how about summarising it in a sentence?
Line 420, Figure 7:
Very nice map. "Most probable class map" is, I think, the correct phrase.
Citation: https://doi.org/10.5194/egusphere-2022-301-CC4
-
AC4: 'Reply on CC4', Ashenafi Ali, 30 Sep 2022
We thank Fuat Kaya for having an interest in the work and voluntary community review. We respond to the key issues raised as indicated below:
Dear Associate Editor,
I have carefully read the study as a voluntary "commenter" on the article “Reference Soil Groups Map of Ethiopia Based on Legacy Data and Machine Learning Technique: EthioSoilGrids 1.0”. Since I am not an official referee, my comments are sincere.
The authors should be commended for their work in Ethiopia, feeling sincerely about the data sharing process.
Response 1: We are grateful for the positive comments
However, the authors have edited this article to produce only one output. I have concerns about research questions. There are many challenges to address in digital soil mapping. And these challenges are voiced by the DSM community. Here's an example: Ten challenges for the future of pedometrics (https://www.sciencedirect.com/science/article/pii/S0016706121002354).
Response 2: Thank you for bringing this to our attention. We are aware of the publication you indicated and found it helpful.
In this regard, I invite the author, who does the modeling in this valuable team, to model the events globally with two more accepted algorithms in SoilGrids 1.0 and SoilGrids 2.0.
https://soil.copernicus.org/articles/7/217/2021/--SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty------Used
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105992---SoilGrids1km — Global Soil Information Based on Automated Mapping
Response 3: This work took SoilGrids 250m (2017), which succeeded SoilGrids 1km, as its base (https://www.isric.org/explore/soilgrids/faq-soilgrids-2017). As indicated in SoilGrids 2.0 (https://soil.copernicus.org/articles/7/217/2021/), only the numeric soil variables were modelled and mapped there (not the reference soil groups/soil types). We understand that SoilGrids250m (2017) is the framework in which soil type/class modelling and mapping were done using Random Forest (RF), and as shown in lines 178 to 188 of this manuscript, RF was used for EthioSoilGrids 1.0.
Specific comments:
Line 1:
As far as we know, this map is not a "conventional" map; it is a "digital" map.
I think "digital" must be added to the title.
Response 4: It is possible to qualify the map by adding “Digital” to the title. However, digital maps can be generated either through a predictive/digital soil mapping framework or by digitising conventional maps. Therefore, to avoid confusion, we prefer to describe the map as generated from legacy soil data and machine learning techniques, which explicitly indicates that the digital soil mapping approach was followed.
Line 35:
Really, honestly, "awesome" work by this team to collaboratively extract and collate the data.
But we (the DSM community and the public) know that SoilGrids versions 1.0 and 2.0 have been released.
Publishing by running a single algorithm here is just to produce an output. There is a need for an approach that addresses current DSM issues. We know that there is something "unknown" in big data, and we will discover the unknown in data with machine learning algorithms. So why only one algorithm? Comparative results are necessary for this study to make accurate inferences for regional results.
Multinomial logistic regression was used for SoilGrids 1.0 and quantile random forests for SoilGrids 2.0.
If reference soil groups are estimated in the field with these algorithms, their outputs will be appreciated by the DSM community at the international level.
Response 5: Yes, the data extraction and compilation process is something that we are proud of. Regarding the algorithm used, as explained under Response 3, the scope of the work is not to compare algorithms but to develop EthioSoilGrids 1.0 using a selected algorithm.
Line 70:
In the last part of the introduction, the authors define a brief research purpose/question. In the last paragraph of the Introduction chapter, the authors wrote "... objectives of this study". In this part of the article, I rather expected a clearly formulated research goal. I suggest that the article state precisely what the purpose of the research is, using a statement such as: "The goal of the study/research was ...". When formulating the research goal(s), it would be worth writing what the cognitive (scientific) goal was and what the utilitarian (useful) goal was. Before stating the purpose of the study, it would be worth formulating the research problem. The research problem may constitute a premise to indicate a gap in the current state of knowledge. It is worth writing which current gaps in knowledge the authors would like to fill on the basis of the planned and conducted research.
Response 6: Thank you for this specific comment, we will revisit and clear up confusing statements.
Line 178:
Is it just "model accuracy"?
How do we evaluate uncertainty?
To evaluate classification-based algorithms that produce probabilistic predictions, I recommend D.G. Rossiter's valuable work: https://www.sciencedirect.com/science/article/pii/S0016706116303901#bb0110
Please check the "confusion index" introduced by Burrough et al. (1997): https://www.sciencedirect.com/science/article/pii/S0016706197000189
Two other sources applied this quantification in different regions, over large and small areas:
https://www.sciencedirect.com/science/article/pii/S0016706116304864
https://www.tandfonline.com/doi/full/10.1080/02571862.2022.2059115
Response 7: The accuracy assessment method (overall, user's and producer's accuracy) and the uncertainty are indicated in lines 361 to 365. Among the techniques reviewed, we used the most common one, cross-validation, and the 95% confidence interval is indicated accordingly (lines 362 and 363). This is in line with the approach followed by global/regional soil grid development frameworks. However, as you indicated, there are various techniques and issues to consider when selecting an accuracy assessment for soil class modelling, e.g. accounting for taxonomic distance (which itself has different sub-techniques), spatial cross-validation (which is presumed to have limitations), dealing with clustered samples when assessing map accuracy by cross-validation, and dealing with imbalanced data in categorical mapping, which might affect the accuracy of majority and minority classes. We recommend in lines 441 to 444 that future studies consider these issues.
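The three accuracy measures named in this response can be computed from a confusion matrix as in the Python sketch below. The sketch assumes the common convention that rows are mapped (predicted) classes and columns are reference (observed) classes, so that user's accuracy is taken along rows and producer's accuracy along columns. The matrix values and function name are hypothetical and are not results from the manuscript.

```python
def accuracy_metrics(classes, matrix):
    """Overall, user's and producer's accuracy from a square confusion matrix.

    Assumed convention: matrix[i][j] is the number of validation profiles
    mapped as classes[i] whose reference (observed) class is classes[j].
    """
    size = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    overall = correct / total
    users = {}      # correct / all profiles mapped as the class (row total)
    producers = {}  # correct / all profiles observed as the class (column total)
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])
        col_total = sum(matrix[r][i] for r in range(size))
        users[name] = matrix[i][i] / row_total if row_total else float("nan")
        producers[name] = matrix[i][i] / col_total if col_total else float("nan")
    return overall, users, producers


# Hypothetical confusion matrix (NOT results from the manuscript).
classes = ["Vertisols", "Cambisols", "Leptosols"]
matrix = [
    [50, 10, 5],
    [8, 30, 7],
    [2, 10, 28],
]

overall, users, producers = accuracy_metrics(classes, matrix)
print(f"Overall accuracy: {overall:.2f}")
for name in classes:
    print(f"{name}: user's = {users[name]:.2f}, producer's = {producers[name]:.2f}")
```

None of these measures accounts for taxonomic distance or spatial clustering of samples, which are the refinements the response leaves to future studies.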
Line 263:
What "reference" soil group did the models predict in areas with these classes? Is there a taxonomic relationship here? Please read the paper titled: Accounting for taxonomic distance in accuracy assessment of soil class predictions
Response 8: Thank you for the recommendation. The reference soil groups indicated in line 263 were excluded from the modelling, and hence no comparison was made. However, we now have insights on how to include some of the RSGs left unmapped and improve the accuracy of this beta version. As indicated in the confusion matrix, even the soil groups that were modelled and mapped show different accuracy values, and we noticed that some reference soil groups are mapped at the expense of others, which makes it possible to interpret taxonomic relationships.
Line 305:
Climate, Organism and topography. If it is related to them, how would it be to compile it with a sentence?
Response 9: It indicates the relative importance of the predictor variables in determining the spatial distribution of reference soil groups across the landscapes of Ethiopia. It is an effort to go beyond prediction and incorporate model interpretations i.e. extract information on the relationships among variables found by the models. However, as is clearly indicated in various kinds of literature, model interpretations are not straightforward/simple in complex/ensemble models e.g. Wadoux et al. (2022): Beyond prediction: methods for interpreting complex models of soil variation, https://www.sciencedirect.com/science/article/abs/pii/S0016706122002609?via%3Dihub
Line 420, Figure 7:
Very nice map. Most probable class maps, I think, for True phrase
Response 10: We are grateful for the appreciation.
Citation: https://doi.org/10.5194/egusphere-2022-301-AC4
-
AC4: 'Reply on CC4', Ashenafi Ali, 30 Sep 2022
-
CC5: 'Comment on egusphere-2022-301', Skye Wills, 13 Sep 2022
I commend the authors for this large and important effort and I appreciate the chance to review this work. This is a worthy effort that should be published and shared widely. I am very keen to explore the intersection between digital tool and expert knowledge in soil survey. However, reading this manuscript, I found myself with some additional questions and points of clarification needed. At numerous points, information was provided, but out of the order the reader might expect. This is at least partially due to the iterative nature of the project; but I found that some of the results were like part of the methods and some of the results read like conclusions. The repetition of information might cause a reader to skip sections and miss important pieces of information. I think with some additional explanation and minor edits, this paper will be ready for publication.
Please find specific comments by line number:
Line 57: What number of profiles were used in the notable efforts referred to above (soilgrids 1 and 2)? How many of the thousands collected were included. This information would link the two parts of the intro – soil maps and soil profile collection.
Line 59: What do you mean that gridded spatial soil info is hardly available. Do you mean they were inaccessible, hard to use, incomplete? Please be explicit in explaining why the previous products were not adequate.
Line 64: This paragraph makes more sense to me prior to the previous paragraph – to line 59.
Line 71: What do you mean by improved?
Line 121: this is the accuracy of the profile data. Figure 2. What is Data Ecosystem Mapping? Does this include getting the metadata for each profile correct according to the covariates?
Line 152: Are the terrain variables used listed anywhere…………. I see I think this paragraph is confusing as many of the details I was looking for are in the next paragraph. I recommend creating one paragraph or a separate climate and topography paragraph. Please list the DEM derivatives.
Line 176: Did you consider evaluating your covariates for correlation and limiting the number used? Why or why not?
Line 179: this paragraph seems more introductory and not part of explaining your process.
Line 194: Are you saying previous studies have used this technique? I think you could eliminate this sentence.
Line 199: were optimized how? Is there a metric you were evaluating? Does the Caret package give you some sort of evaluation?
Line 202: Did you state how you separated the training and testing sets and what the ‘new’ dataset is. You should define those sets, how they were selected and used.
Line 224: typo ‘-runto’ should have a space ‘-run to’
Line 254: Consider something more definitive and eliminate ‘the results suggest’. I think these are straightforward results that need no wiggle words like ‘suggest’.
Line 255: I am not sure the word ‘museum’ is what I would use here. Perhaps ‘display’ or ‘diversity’ is more appropriate?
Line 268: Is this section not part of the methods? This describes how you collected and evaluated profiles, which is covered earlier.
Line 323: This is a great description of the setting and climate; but I think it might fit better in the methods or introduction.
Figure 6. My preference is to rename the covariates or list the abbreviations in the figure captions. It is cumbersome for the reader to have to toggle between this figure and an appendix.
Line 357: could the low influence of lithology have anything to do with WRB class breaks and how they intersect with the scale of parent material variability?
Line 361: can you take mtry and the comma out of this sentence, does it still mean the same thing?
Line 362: Did you test the accuracy of previous maps or find other reported accuracies of maps from the area (not just general averages)?
Line 375: I am very curious what the accuracy of Global Soil Grids is using your updated soil profiles. Without that information, it is difficult to know how successful this effort using expert knowledge has been.
Line 401: the portion of this paragraph dealing with landscapes/topo-sequences belongs with the paragraph below (line 409) focused on topo-sequences.
Line 426: Are the soil qualities (I think you mean properties) transitional or are the covariates transitional (or both?).
Line 441: I think this is an ‘and’ not a ‘but’. Did you consider adjusting your training dataset for a more balanced set of soil profiles?
Line 445: this paragraph reads very much like a concluding statement, was that the intention?
Line 458 – Section 458. It would be much more powerful to compare the expert evaluation of this map vs. the expert evaluation of previous maps. Was any re-evaluation done after re-running the model? Did the output from the tests change throughout the process? Were the scales used to evaluate by experts useful to the scale of your model?
Citation: https://doi.org/10.5194/egusphere-2022-301-CC5 -
AC5: 'Reply on CC5', Ashenafi Ali, 06 Oct 2022
We thank Skye Wills for taking the time to review our manuscript. We respond to the issues raised as indicated below:
Comment on egusphere-2022-301
Skye Wills Community comment on "Reference Soil Groups Map of Ethiopia Based on Legacy Data and Machine Learning Technique: EthioSoilGrids 1.0" by Ashenafi Ali et al., EGUsphere, https://doi.org/10.5194/egusphere-2022-301-CC5, 2022
I commend the authors for this large and important effort and I appreciate the chance to review this work. This is a worthy effort that should be published and shared widely.
Response 1: Thank you for taking the time to review our manuscript and we are grateful for the positive comments.
I am very keen to explore the intersection between digital tool and expert knowledge in soil survey. However, reading this manuscript, I found myself with some additional questions and points of clarification needed. At numerous points, information was provided, but out of the order the reader might expect. This is at least partially due to the iterative nature of the project; but I found that some of the results were like part of the methods and some of the results read like conclusions. The repetition of information might cause a reader to skip sections and miss important pieces of information. I think with some additional explanation and minor edits, this paper will be ready for publication.
Response 2: Thank you for the comments. We improved issues related to redundancy, mix-up of statements in the methods, results and conclusions in the revised manuscript.
Please find specific comments by line number:
Line 57: What number of profiles were used in the notable efforts referred to above (soilgrids 1 and 2)? How many of the thousands collected were included. This information would link the two parts of the intro – soil maps and soil profile collection.
Response 3: During the legacy data collection campaign, over 20,000 profile data were collated (line 107). However, 14,742 profiles (Fig. 4, lines 265 to 267) were georeferenced with reference soil group naming. Following the exclusion of five reference soil groups from the modelling, only 14,681 profiles (line 112) were used for developing EthioSoilGrids 1.0. In fact, some profile data might have been dropped during the modelling process due to missing values in the corresponding covariate(s), as depicted in the confusion matrix. However, the development of the global soil grids (1 and 2) is based on the Africa soil profile database/global soil profile database, in which only about 1,712 profiles (line 283) covering Ethiopia were used. This soil profile information is included in the development of EthioSoilGrids 1.0.
Line 59: What do you mean that gridded spatial soil info is hardly available. Do you mean they were inaccessible, hard to use, incomplete? Please be explicit in explaining why the previous products were not adequate.
Response 4: We wanted to say that a national quantitative and spatially continuous predicted reference soil group/soil type map does not exist. We admit that “hardly available” is confusing, and in the revised manuscript it is replaced with “does not exist”. We explain why the previous products were not adequate in lines 48 to 69, as you noticed, especially in line 64. Further, we will revisit the statements.
Line 64: This paragraph makes more sense to me prior to the previous paragraph – to line 59.
Response 5: Thank you for this feedback. Your concern regarding line 59 will be addressed as indicated in response 4.
Line 71: What do you mean by improved?
Response 6: We meant that we will develop an improved 250 m soil grid map, which is more accurate than the available global and regional soil grids.
Line 121: this is the accuracy of the profile data. Figure 2. What is Data Ecosystem Mapping? Does this include getting the metadata for each profile correct according to the covariates?
Response 7: The data ecosystem sketch is an effort to summarise the steps involved, from data sourcing to a single standardised database. Data ecosystem mapping is the activity conducted to locate which data are available, including the type of format and the level of completeness. It included getting the metadata of each profile. Harmonization of the coordinate reference system according to the covariates, and of the different soil classification systems, was worked out in the “Standardization phase” of the process.
Line 152: Are the terrain variables used listed anywhere…………. I see I think this paragraph is confusing as many of the details I was looking for are in the next paragraph. I recommend creating one paragraph or a separate climate and topography paragraph. Please list the DEM derivatives.
Response 8: All the variables, including the DEM variables, are listed in Appendix B. We will consider creating separate paragraphs for climate and topography.
Line 176: Did you consider evaluating your covariates for correlation and limiting the number used? Why or why not?
Response 9: We selected covariates representing the soil forming factors based on expert knowledge and a review of the literature. We used near-zero variance analysis to remove variables that do not contribute to the RSG modelling and mapping. We didn’t test covariates for correlation because we opted to include any covariate as long as it contributes to the prediction. This is in line with Helfenstein et al. (2022), who stated that ensemble decision tree models are robust against highly correlated data, and we consider prediction accuracy more important than model interpretability. Based on the suggestion of the reviewer, however, we have explicitly indicated that a correlation analysis of the covariates was not done.
Helfenstein, A., Mulder, V. L., Heuvelink, G. B., & Okx, J. P. (2022). Tier 4 maps of soil pH at 25 m resolution for the Netherlands. Geoderma, 410, 115659. https://doi.org/10.1016/j.geoderma.2021.115659
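As an illustration of the near-zero variance screening mentioned in Response 9, the sketch below flags a covariate when one value dominates and few distinct values occur. The thresholds (a 95/5 frequency ratio and 10% unique values) follow the defaults of caret's `nearZeroVar`; the settings the authors actually used are not stated, and the covariate names and values here are hypothetical.

```python
from collections import Counter


def near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Flag a variable whose most common value dominates and which has few distinct values."""
    counts = Counter(values).most_common()
    if len(counts) == 1:
        return True  # a single distinct value: zero variance
    # Ratio of the most common value's count to the second most common value's count.
    freq_ratio = counts[0][1] / counts[1][1]
    # Distinct values as a percentage of the number of samples.
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique < unique_cut


# Hypothetical covariates sampled at 100 profile locations (illustration only).
covariates = {
    "elevation": [float(i) for i in range(100)],
    "rare_flag": [0] * 99 + [1],
    "constant_layer": [7] * 100,
}
kept = [name for name, vals in covariates.items() if not near_zero_variance(vals)]
```

This screening removes nearly constant layers only; as the response notes, it does not address correlation among the remaining covariates.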
Line 179: this paragraph seems more introductory and not part of explaining your process.
Response 10: Thank you, we revised it accordingly.
Line 194: Are you saying previous studies have used this technique? I think you could eliminate this sentence.
Response 11: Thank you; this is deleted.
Line 199: were optimized how? Is there a metric you were evaluating? Does the Caret package give you some sort of evaluation?
Response 12: The “expand.grid” function was used with the Caret package to create a set of candidate tuning values for training the model. The three tuning parameters for the Ranger method in the Caret package are mtry, splitrule and min.node.size. Generally, this approach tunes the parameters in an automated fashion: it checks all the supplied combinations of tuning parameters and returns the combination with which the model gives the best accuracy.
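The grid search described in Response 12 can be sketched as follows. In R, `expand.grid` builds the table of candidate combinations, which is typically passed to caret's `train` through its `tuneGrid` argument; the Python analogue below is an illustration only. The candidate values are assumptions, and `toy_score` is a placeholder standing in for cross-validated accuracy: it is deliberately constructed to peak at the settings reported in Response 20 and does not come from any model.

```python
from itertools import product


def expand_grid(**params):
    """Return every combination of the supplied candidate values as a list of dicts."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


def best_setting(grid, score):
    """Return the combination with the highest score."""
    return max(grid, key=score)


# Hypothetical candidate values for the three ranger tuning parameters.
grid = expand_grid(
    mtry=[10, 20, 30],
    splitrule=["gini", "extratrees"],
    min_node_size=[1, 5, 10],
)


def toy_score(setting):
    # Placeholder for cross-validated accuracy (an assumption, not a real model).
    return (
        -abs(setting["mtry"] - 20)
        - abs(setting["min_node_size"] - 5)
        + (1 if setting["splitrule"] == "extratrees" else 0)
    )


best = best_setting(grid, toy_score)
```

The search only evaluates the combinations that are supplied, so the quality of the result depends on the candidate values chosen for the grid.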
Line 202: Did you state how you separated the training and testing sets and what the ‘new’ dataset is? You should define those sets, and how they were selected and used.
Response 13: The function “createDataPartition” was used to create balanced splits of the data. As the y argument (response variable) to this function is a factor, the random sampling occurs within each class and preserves the overall class distribution of the data. Overall, 70% of the data were used for training and 30% for testing.
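The stratified 70/30 split described in Response 13 can be sketched as below: sampling is done within each class so that the class proportions are preserved in both sets. This is an illustrative analogue of `createDataPartition`, not a reimplementation of it; the function name `stratified_split`, the seed and the class counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, p=0.7, seed=42):
    """Split indices into train/test sets, sampling a fraction p within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train = []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one profile of every class in the training set.
        n_train = max(1, round(p * len(indices)))
        train.extend(indices[:n_train])
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


# Hypothetical class counts, for illustration only.
labels = ["Vertisols"] * 50 + ["Cambisols"] * 30 + ["Leptosols"] * 20
train_idx, test_idx = stratified_split(labels)
```

Changing `seed` changes which profiles fall in each set but not the per-class proportions, which is consistent with the authors' remark elsewhere in this reply that they tried different set.seed values.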
Line 224: typo ‘-runto’ should have a space ‘-run to’
Response 14: Thank you. Corrected accordingly.
Line 254: Consider something more definitive and eliminate ‘the results suggest’. I think these are straightforward results that need no wiggle words like ‘suggest’.
Response 15: We will correct it as commented.
Line 255: I am not sure the word ‘museum’ is what I would use here. Perhaps ‘display’ or ‘diversity’ is more appropriate?
Response 16: Thank you and revised accordingly.
Line 268: Is this section not part of the methods? This describes how you collected and evaluated profiles, which is covered earlier.
Response 17: In this section, we are describing the spatial density of the new database, which is one of the key results of this work. In doing so, we present these results by comparing with existing and previous databases used for developing similar soil group maps. We think these are appropriate results to be presented in this section. Therefore, we do ask the kind understanding of the reviewer to allow us to maintain this description as it is and where it is.
Line 323: This is a great description of the setting and climate; but I think it might fit better in the methods or introduction. Figure 6. My preference is to rename the covariates or list the abbreviations in the figure captions. It is cumbersome for the reader to have to toggle between this figure and an appendix.
Response 18: In this section, the effort is to explain the different covariates that are important in predicting the soil type. In order of their importance, we tried to explain, based on our experience and existing literature, why these factors are important in defining the soil type. That is why the climate is detailed in this section. Based on your comment, we added the descriptions of the variables to the caption of Figure 6 for easy referencing.
Line 357: could the low influence of lithology have anything to do with WRB class breaks and how they intersect with the scale of parent material variability?
Response 19: It is the relative importance which is low, and may be related to the use of a coarse-scale and less detailed lithology map, which may not sufficiently capture the spatial variability of the parent materials.
Line 361: can you take mtry and the comma out of this sentence, does it still mean the same thing?
Response 20: We revised this for clarity. The settings are mtry = 20, split rule = extra trees and minimum node size = 5. See also Response 12.
Line 362: Did you test the accuracy of previous maps or find other reported accuracies of maps from the area (not just general averages)?
Response 21: We didn’t test the accuracy of previous maps; rather, we used the reported accuracies from published sources.
Line 375: I am very curious what the accuracy of Global Soil Grids is using your updated soil profiles. Without that information, it is difficult to know how successful this effort using expert knowledge has been.
Response 22: Here we wanted to communicate that a qualitative assessment of spatial patterns was not done for SoilGrids 2017, which considers soil type mapping. This is to indicate that similar accuracy might lead to different spatial patterns and hence expert-based qualitative evaluation is of paramount importance.
Line 401: the portion of this paragraph dealing with landscapes/topo-sequences belongs with the paragraph below (line 409) focused on topo-sequences.
Response 23: Thank you for the observation, this is revised accordingly.
Line 426: Are the soil qualities (I think you mean properties) transitional or are the covariates transitional (or both?).
Response 24: Yes, we mean properties. Transitional properties arise from the covariates/soil forming factors, and hence we can say both.
Line 441: I think this is an ‘and’ not a ‘but’. Did you consider adjusting your training dataset for a more balanced set of soil profiles?
Response 25: For randomly sampling and splitting the dataset into training and testing sets, we tried different set.seed values to ensure the inclusion of each RSG in both split sets and to obtain better accuracy. See also Response 13.
Line 445: this paragraph reads very much like a concluding statement, was that the intention?
Response 26: Thank you - we have revised accordingly. Some parts of this paragraph are revised and maintained there. The other descriptions which look like conclusions are taken to the conclusion section.
Line 458 – Section 458. It would be much more powerful to compare the expert evaluation of this map vs. the expert evaluation of previous maps. Was any re-evaluation done after re-running the model? Did the output from the tests change throughout the process? Were the scales used to evaluate by experts useful to the scale of your model?
Response 27: After re-running the model, about ten soil scientists and geospatial experts re-evaluated the output using 20-25 districts. Further, the geospatial and soil experts checked the raster map of the RSGs in a GIS environment to ensure that areas raising no concern before re-running the model were kept the same or that the changes were acceptable. The quality of the input data (profile data, covariates, mask layer) was assessed to improve the overall accuracy. As a general working norm, the experts’ qualitative assessment was set to consider the representation of mappable soil types at the target resolution/scale.
Citation: https://doi.org/10.5194/egusphere-2022-301-AC5
RC1: 'Comment on egusphere-2022-301', Skye Wills, 06 Oct 2022
I commend the authors for this large and important effort and I appreciate the chance to review this work. This is a worthy effort that should be published and shared widely. I am very keen to explore the intersection between digital tool and expert knowledge in soil survey. However, reading this manuscript, I found myself with some additional questions and points of clarification needed. At numerous points, information was provided, but out of the order the reader might expect. This is at least partially due to the iterative nature of the project; but I found that some of the results were like part of the methods and some of the results read like conclusions. The repetition of information might cause a reader to skip sections and miss important pieces of information. I think with some additional explanation and minor edits, this paper will be ready for publication.
Please find specific comments by line number:
Line 57: What number of profiles were used in the notable efforts referred to above (soilgrids 1 and 2)? How many of the thousands collected were included. This information would link the two parts of the intro – soil maps and soil profile collection.
Line 59: What do you mean that gridded spatial soil info is hardly available. Do you mean they were inaccessible, hard to use, incomplete? Please be explicit in explaining why the previous products were not adequate.
Line 64: This paragraph makes more sense to me prior to the previous paragraph – to line 59.
Line 71: What do you mean by improved?
Line 121: this is the accuracy of the profile data. Figure 2. What is Data Ecosystem Mapping? Does this include getting the metadata for each profile correct according to the covariates?
Line 152: Are the terrain variables used listed anywhere…………. I see I think this paragraph is confusing as many of the details I was looking for are in the next paragraph. I recommend creating one paragraph or a separate climate and topography paragraph. Please list the DEM derivatives.
Line 176: Did you consider evaluating your covariates for correlation and limiting the number used? Why or why not?
Line 179: this paragraph seems more introductory and not part of explaining your process.
Line 194: Are you saying previous studies have used this technique? I think you could eliminate this sentence.
Line 199: were optimized how? Is there a metric you were evaluating? Does the Caret package give you some sort of evaluation?
Line 202: Did you state how you separated the training and testing sets and what the ‘new’ dataset is. You should define those sets, how they were selected and used.
Line 224: typo ‘-runto’ should have a space ‘-run to’
Line 254: Consider something more definitive and eliminate ‘the results suggest’. I think these are straightforward results that need no wiggle words like ‘suggest’.
Line 255: I am not sure the word ‘museum’ is what I would use here. Perhaps ‘display’ or ‘diversity’ is more appropriate?
Line 268: Is this section not part of the methods? This describes how you collected and evaluated profiles, which is covered earlier.
Line 323: This is a great description of the setting and climate; but I think it might fit better in the methods or introduction.
Figure 6. My preference is to rename the covariates or list the abbreviations in the figure captions. It is cumbersome for the reader to have to toggle between this figure and an appendix.
Line 357: could the low influence of lithology have anything to do with WRB class breaks and how they intersect with the scale of parent material variability?
Line 361: can you take mtry and the comma out of this sentence, does it still mean the same thing?
Line 362: Did you test the accuracy of previous maps or find other reported accuracies of maps from the area (not just general averages)?
Line 375: I am very curious what the accuracy of Global Soil Grids is using your updated soil profiles. Without that information, it is difficult to know how successful this effort using expert knowledge has been.
Line 401: the portion of this paragraph dealing with landscapes/topo-sequences belongs with the paragraph below (line 409) focused on topo-sequences.
Line 426: Are the soil qualities (I think you mean properties) transitional or are the covariates transitional (or both?).
Line 441: I think this is an ‘and’ not a ‘but’. Did you consider adjusting your training dataset for a more balanced set of soil profiles?
Line 445: this paragraph reads very much like a concluding statement, was that the intention?
Line 458 – Section 458. It would be much more powerful to compare the expert evaluation of this map vs. the expert evaluation of previous maps. Was any re-evaluation done after re-running the model? Did the output from the tests change throughout the process? Were the scales used to evaluate by experts useful to the scale of your model?
Citation: https://doi.org/10.5194/egusphere-2022-301-RC1 -
AC6: 'Reply on RC1', Ashenafi Ali, 06 Oct 2022
We thank Skye Wills (RC 1) for taking the time to review our manuscript. We respond to the issues raised as indicated below:
I commend the authors for this large and important effort and I appreciate the chance to review this work. This is a worthy effort that should be published and shared widely.
Response 1: Thank you for taking the time to review our manuscript and we are grateful for the positive comments.
I am very keen to explore the intersection between digital tool and expert knowledge in soil survey. However, reading this manuscript, I found myself with some additional questions and points of clarification needed. At numerous points, information was provided, but out of the order the reader might expect. This is at least partially due to the iterative nature of the project; but I found that some of the results were like part of the methods and some of the results read like conclusions. The repetition of information might cause a reader to skip sections and miss important pieces of information. I think with some additional explanation and minor edits, this paper will be ready for publication.
Response 2: Thank you for the comments. We improved issues related to redundancy, mix-up of statements in the methods, results and conclusions in the revised manuscript.
Please find specific comments by line number:
Line 57: What number of profiles were used in the notable efforts referred to above (soilgrids 1 and 2)? How many of the thousands collected were included. This information would link the two parts of the intro – soil maps and soil profile collection.
Response 3: During the legacy data collection campaign, over 20,000 profile data were collated (line 107). However, 14,742 profiles (Fig. 4, lines 265 to 267) were georeferenced with reference soil group naming. Following the exclusion of five reference soil groups from the modelling, only 14,681 profiles (line 112) were used for developing EthioSoilGrids 1.0. In fact, some profile data might have been dropped during the modelling process due to missing values in the corresponding covariate(s), as depicted in the confusion matrix. However, the development of the global soil grids (1 and 2) is based on the Africa soil profile database/global soil profile database, in which only about 1,712 profiles (line 283) covering Ethiopia were used. This soil profile information is included in the development of EthioSoilGrids 1.0.
Line 59: What do you mean that gridded spatial soil info is hardly available. Do you mean they were inaccessible, hard to use, incomplete? Please be explicit in explaining why the previous products were not adequate.
Response 4: We wanted to say that a national quantitative and spatially continuous predicted reference soil group/soil type map does not exist. We admit that “hardly available” is confusing, and in the revised manuscript it is replaced with “does not exist”. We explain why the previous products were not adequate in lines 48 to 69, as you noticed, especially in line 64. Further, we will revisit the statements.
Line 64: This paragraph makes more sense to me prior to the previous paragraph – to line 59.
Response 5: Thank you for this feedback. Your concern regarding line 59 will be addressed as indicated in response 4.
Line 71: What do you mean by improved?
Response 6: We meant that we will develop an improved 250 m soil grid map, which is more accurate than the available global and regional soil grids.
Line 121: this is the accuracy of the profile data. Figure 2. What is Data Ecosystem Mapping? Does this include getting the metadata for each profile correct according to the covariates?
Response 7: The data ecosystem sketch is an effort to summarise the steps involved, from data sourcing to a single standardised database. Data ecosystem mapping is the activity conducted to locate which data are available, including the type of format and the level of completeness. It included getting the metadata of each profile. Harmonization of the coordinate reference system according to the covariates, and of the different soil classification systems, was worked out in the “Standardization phase” of the process.
Line 152: Are the terrain variables used listed anywhere…………. I see I think this paragraph is confusing as many of the details I was looking for are in the next paragraph. I recommend creating one paragraph or a separate climate and topography paragraph. Please list the DEM derivatives.
Response 8: All the variables, including the DEM variables, are listed in Appendix B. We will consider creating separate paragraphs for climate and topography.
Line 176: Did you consider evaluating your covariates for correlation and limiting the number used? Why or why not?
Response 9: We selected covariates representing the soil-forming factors based on expert knowledge and a review of the literature. We used near-zero variance analysis to remove variables that do not contribute to the RSG modelling and mapping. We didn’t test covariates for correlation because we opted to include any covariate as long as it contributes to the prediction. This is in line with Helfenstein et al. (2022), who stated that ensemble decision tree models are robust against highly correlated data, and we consider prediction accuracy more important than model interpretability. Based on the suggestion of the reviewer, however, we have explicitly indicated that a correlation analysis of the covariates was not done.
Helfenstein, A., Mulder, V. L., Heuvelink, G. B., & Okx, J. P. (2022). Tier 4 maps of soil pH at 25 m resolution for the Netherlands. Geoderma, 410, 115659. https://doi.org/10.1016/j.geoderma.2021.115659
Line 179: this paragraph seems more introductory and not part of explaining your process.
Response 10: Thank you, we revised it accordingly.
Line 194: Are you saying previous studies have used this technique? I think you could eliminate this sentence.
Response 11: Thank you; this is deleted.
Line 199: were optimized how? Is there a metric you were evaluating? Does the Caret package give you some sort of evaluation?
Response 12: The “expand.grid” function was used with the Caret package to create a set of candidate tuning values for training the model. The three tuning parameters for the Ranger method in the Caret package are mtry, splitrule and min.node.size. Generally, this approach tunes the parameters in an automated fashion: it checks all the supplied combinations of tuning parameters and returns the combination with which the model gives the best accuracy.
Line 202: Did you state how you separated the training and testing sets and what the ‘new’ dataset is? You should define those sets, and how they were selected and used.
Response 13: The function “createDataPartition” was used to create balanced splits of the data. As the y argument (response variable) to this function is a factor, the random sampling occurs within each class and preserves the overall class distribution of the data.
Line 224: typo ‘-runto’ should have a space ‘-run to’
Response 14: Thank you. Corrected accordingly.
Line 254: Consider something more definitive and eliminate ‘the results suggest’. I think these are straightforward results that need no wiggle words like ‘suggest’.
Response 15: We will correct it as commented.
Line 255: I am not sure the word ‘museum’ is what I would use here. Perhaps ‘display’ or ‘diversity’ is more appropriate?
Response 16: Thank you and revised accordingly.
Line 268: Is this section not part of the methods? This describes how you collected and evaluated profiles, which is covered earlier.
Response 17: In this section, we are describing the spatial density of the new database, which is one of the key results of this work. In doing so, we present these results by comparing with existing and previous databases used for developing similar soil group maps. We think these are appropriate results to be presented in this section. Therefore, we do ask the kind understanding of the reviewer to allow us to maintain this description as it is and where it is.
Line 323: This is a great description of the setting and climate; but I think it might fit better in the methods or introduction. Figure 6. My preference is to rename the covariates or list the abbreviations in the figure captions. It is cumbersome for the reader to have to toggle between this figure and an appendix.
Response 18: In this section, we explain the covariates that are important in predicting the soil type. Following their order of importance, we explain, based on our experience and the existing literature, why these factors are important in defining the soil type. That is why the climate is described in detail in this section. Based on your comment, we added descriptions of the variables to the caption of Figure 6 for easy reference.
Line 357: could the low influence of lithology have anything to do with WRB class breaks and how they intersect with the scale of parent material variability?
Response 19: It is the relative importance which is low, and may be related to the use of a coarse-scale and less detailed lithology map, which may not sufficiently capture the spatial variability of the parent materials.
Line 361: can you take mtry and the comma out of this sentence, does it still mean the same thing?
Response 20: We revised this for clarity. The tuned values are mtry = 20, split rule = extra trees and minimum node size = 5. See also Response 12.
Line 362: Did you test the accuracy of previous maps or find other reported accuracies of maps from the area (not just general averages)?
Response 21: We didn’t test the accuracy of previous maps; rather, we used the reported accuracies from published sources.
Line 375: I am very curious what the accuracy of Global Soil Grids is using your updated soil profiles. Without that information, it is difficult to know how successful this effort using expert knowledge has been.
Response 22: Here we wanted to communicate that a qualitative assessment of spatial patterns was not done for SoilGrids 2017, which covers soil type mapping. Maps with similar accuracy might show different spatial patterns, and hence expert-based qualitative evaluation is of paramount importance.
Line 401: the portion of this paragraph dealing with landscapes/topo-sequences belongs with the paragraph below (line 409) focused on topo-sequences.
Response 23: Thank you for the observation, this is revised accordingly.
Line 426: Are the soil qualities (I think you mean properties) transitional or are the covariates transitional (or both?).
Response 24: Yes, we mean properties. The properties are transitional because the covariates/soil-forming factors are transitional, and hence we can say both.
Line 441: I think this is an ‘and’ not a ‘but’. Did you consider adjusting your training dataset for a more balanced set of soil profiles?
Response 25: When randomly sampling and splitting the dataset into training and testing sets, we tried different set.seed values to ensure that each RSG was included in both sets and to obtain better accuracy. See also Response 13.
Line 445: this paragraph reads very much like a concluding statement, was that the intention?
Response 26: Thank you - we have revised accordingly. Some parts of this paragraph are revised and maintained there. The other descriptions which look like conclusions are taken to the conclusion section.
Line 458 – Section 458. It would be much more powerful to compare the expert evaluation of this map vs. the expert evaluation of previous maps. Was any re-evaluation done after re-running the model? Did the output from the tests change throughout the process? Were the scales used to evaluate by experts useful to the scale of your model?
Response 27: After re-running the model, about ten soil scientists and geospatial experts re-evaluated the output using 20-25 districts. Further, the geospatial and soil experts checked the raster map of the RSGs in a GIS environment to ensure that areas with no concerns before the re-run were kept the same or that the changes were acceptable. The quality of the input data (profile data, covariates, mask layer) was assessed to improve the overall accuracy. As a general working norm, the experts’ qualitative assessment was set to consider the representation of mappable soil types at the target resolution/scale.
Citation: https://doi.org/10.5194/egusphere-2022-301-AC6
-
AC7: 'Reply on RC1', Ashenafi Ali, 06 Oct 2022
We thank Skye Wills (RC 1) for taking the time to review our manuscript. We respond to the issues raised as indicated below:
I commend the authors for this large and important effort and I appreciate the chance to review this work. This is a worthy effort that should be published and shared widely.
Response 1: Thank you for taking the time to review our manuscript and we are grateful for the positive comments.
I am very keen to explore the intersection between digital tool and expert knowledge in soil survey. However, reading this manuscript, I found myself with some additional questions and points of clarification needed. At numerous points, information was provided, but out of the order the reader might expect. This is at least partially due to the iterative nature of the project; but I found that some of the results were like part of the methods and some of the results read like conclusions. The repetition of information might cause a reader to skip sections and miss important pieces of information. I think with some additional explanation and minor edits, this paper will be ready for publication.
Response 2: Thank you for the comments. We improved issues related to redundancy, mix-up of statements in the methods, results and conclusions in the revised manuscript.
Please find specific comments by line number:
Line 57: What number of profiles were used in the notable efforts referred to above (soilgrids 1 and 2)? How many of the thousands collected were included. This information would link the two parts of the intro – soil maps and soil profile collection.
Response 3: During the legacy data collection campaign, data for over 20,000 profiles were collated (line 107). However, only 14,742 profiles (Fig. 4, lines 265 to 267) were georeferenced and had a reference soil group name. Following the exclusion of five reference soil groups from the modelling, 14,681 profiles (line 112) were used for developing EthioSoilGrids 1.0. Some profiles might also have been dropped during the modelling process because of missing values in the corresponding covariate(s), as reflected in the confusion matrix. In contrast, the global SoilGrids products (1 and 2) are based on the Africa soil profile database/global soil profile database, in which only about 1,712 profiles (line 283) covering Ethiopia were used. These soil profiles are included in the development of EthioSoilGrids 1.0.
Line 59: What do you mean that gridded spatial soil info is hardly available. Do you mean they were inaccessible, hard to use, incomplete? Please be explicit in explaining why the previous products were not adequate.
Response 4: We wanted to say that a national, quantitative and spatially continuous predicted reference soil group/soil type map does not exist. We admit that “hardly available” is confusing, and in the revised manuscript it is replaced with “does not exist”. We explain why the previous products were not adequate in lines 48 to 69, especially in line 64, as you noticed. Further, we will revisit the statements.
Line 64: This paragraph makes more sense to me prior to the previous paragraph – to line 59.
Response 5: Thank you for this feedback. Your concern regarding line 59 will be addressed as indicated in response 4.
Line 71: What do you mean by improved?
Response 6: By “improved” we mean a 250 m soil grid map that is more accurate than the available global and regional soil grids.
Line 121: this is the accuracy of the profile data. Figure 2. What is Data Ecosystem Mapping? Does this include getting the metadata for each profile correct according to the covariates?
Response 7: The data ecosystem sketch summarises the efforts involved, from data sourcing to a single standardised database. Data ecosystem mapping is the activity conducted to locate which data are available, including the format and the level of completeness. It included getting the metadata of each profile. Harmonization of the coordinate reference system with that of the covariates, and of the different soil classification systems, was carried out in the “Standardization phase” of the process.
Line 152: Are the terrain variables used listed anywhere? I see. I think this paragraph is confusing as many of the details I was looking for are in the next paragraph. I recommend creating one paragraph or a separate climate and topography paragraph. Please list the DEM derivatives.
Response 8: All the variables, including the DEM variables, are listed in Appendix B. We will consider creating separate paragraphs for climate and topography.
Line 176: Did you consider evaluating your covariates for correlation and limiting the number used? Why or why not?
Response 9: We selected covariates representing the soil-forming factors based on expert knowledge and a review of the literature. We used near-zero variance analysis to remove variables that do not contribute to the RSG modelling and mapping. We didn’t test the covariates for correlation because we opted to include any covariate as long as it contributes to the prediction. This is in line with Helfenstein et al. (2022), who stated that ensemble decision tree models are robust against highly correlated data, and we consider prediction accuracy more important than model interpretability. Based on the reviewer’s suggestion, however, we have explicitly indicated that correlation analysis between the covariates was not performed.
Helfenstein, A., Mulder, V. L., Heuvelink, G. B., & Okx, J. P. (2022). Tier 4 maps of soil pH at 25 m resolution for the Netherlands. Geoderma, 410, 115659. https://doi.org/10.1016/j.geoderma.2021.115659
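The near-zero variance screening mentioned in Response 9 can be sketched as follows. This is a plain-Python illustration and not the authors' R code: the function names and the covariate columns are hypothetical, and the two thresholds (a frequency ratio of 95/5 and 10 % unique values) are assumed to mirror the defaults of caret's nearZeroVar.

```python
from collections import Counter


def is_near_zero_variance(values, freq_cut=95 / 5, unique_cut=10.0):
    """Return True if a covariate column carries almost no information."""
    counts = Counter(values).most_common()
    if len(counts) < 2:
        return True  # constant (or empty) column: zero variance
    # Ratio of the most common value's count to the second most common's.
    freq_ratio = counts[0][1] / counts[1][1]
    # Distinct values as a percentage of the number of samples.
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique <= unique_cut


def drop_near_zero_variance(covariates):
    """Keep only the covariate columns that pass the filter."""
    return {name: col for name, col in covariates.items()
            if not is_near_zero_variance(col)}


# Hypothetical covariate columns sampled at 100 profile locations.
covariates = {
    "elevation": list(range(100)),   # varies everywhere: kept
    "water_mask": [0] * 99 + [1],    # almost constant: dropped
    "constant": [7] * 100,           # constant: dropped
}
kept = drop_near_zero_variance(covariates)
print(sorted(kept))  # prints ['elevation']
```

A column is flagged only when both conditions hold, so a covariate with a dominant value but many distinct values is retained.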
Line 179: this paragraph seems more introductory and not part of explaining your process.
Response 10: Thank you, we revised it accordingly.
Line 194: Are you saying previous studies have used this technique? I think you could eliminate this sentence.
Response 11: Thank you; this is deleted.
Line 199: were optimized how? Is there a metric you were evaluating? Does the Caret package give you some sort of evaluation?
Response 12: The base R function “expand.grid” was used with the caret package to build the grid of candidate tuning values for training the model. The three tuning parameters for the ranger method in caret are mtry, splitrule and min.node.size. Passing this grid to caret automates the tuning: every parameter combination in the grid is evaluated, and the combination that gives the best accuracy is returned.
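The tuning described in Response 12 amounts to an exhaustive grid search, which can be sketched in plain Python as follows. This is an illustration, not the authors' R code: expand_grid and grid_search are hypothetical helpers, the candidate values are assumptions, and toy_score is a placeholder for the cross-validated accuracy that caret would compute. The placeholder is deliberately rigged to peak at the values the authors report in Response 20.

```python
from itertools import product


def expand_grid(**params):
    """Return every combination of the candidate values, one dict each."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


def grid_search(grid, score):
    """Evaluate every combination and return the best-scoring one."""
    return max(grid, key=score)


def toy_score(p):
    """Placeholder for cross-validated accuracy; NOT a real result."""
    bonus = 0.5 if p["splitrule"] == "extratrees" else 0.0
    return bonus - abs(p["mtry"] - 20) - abs(p["min_node_size"] - 5)


# Assumed candidate values for the three ranger tuning parameters.
grid = expand_grid(
    mtry=[10, 20, 30],
    splitrule=["gini", "extratrees"],
    min_node_size=[1, 5, 10],
)
best = grid_search(grid, toy_score)
print(len(grid), best)
```

The grid has 3 x 2 x 3 = 18 combinations, so the cost of the search grows multiplicatively with each parameter that is tuned.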
Line 202: Did you state how you separated the training and testing sets and what the ‘new’ dataset is? You should define those sets, and how they were selected and used.
Response 13: The function “createDataPartition” was used to create balanced splits of the data. As the y argument (response variable) to this function is a factor, the random sampling occurs within each class and preserves the overall class distribution of the data.
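The within-class sampling described in Response 13 can be sketched in plain Python as follows. This is an illustration, not the authors' R code: stratified_split and covers_all_classes are hypothetical helpers, the class counts are invented, and the 80/20 proportion is an assumption because the reply does not state the split used.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, p=0.8, seed=0):
    """Split indices into train/test, sampling separately within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        k = round(p * len(idxs))  # training share of this class
        train.extend(idxs[:k])
        test.extend(idxs[k:])
    return sorted(train), sorted(test)


def covers_all_classes(labels, train, test):
    """Check that every class occurs in both the training and testing sets."""
    classes = set(labels)
    return ({labels[i] for i in train} == classes
            and {labels[i] for i in test} == classes)


# Invented class counts for three reference soil groups.
labels = ["Vertisols"] * 50 + ["Cambisols"] * 30 + ["Leptosols"] * 20
train, test = stratified_split(labels, p=0.8, seed=42)
print(Counter(labels[i] for i in test))
print(covers_all_classes(labels, train, test))
```

Because the sampling is done per class, the class proportions in both sets match those of the full dataset regardless of the seed; only the membership of individual profiles changes.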
Line 224: typo ‘-runto’ should have a space ‘-run to’
Response 14: Thank you. Corrected accordingly.
Line 254: Consider something more definitive and eliminate ‘the results suggest’. I think these are straightforward results that need no wiggle words like ‘suggest’.
Response 15: We will correct it as commented.
Line 255: I am not sure the word ‘museum’ is what I would use here. Perhaps ‘display’ or ‘diversity’ is more appropriate?
Response 16: Thank you and revised accordingly.
Line 268: Is this section not part of the methods? This describes how you collected and evaluated profiles, which is covered earlier.
Response 17: In this section, we describe the spatial density of the new database, which is one of the key results of this work. We present these results by comparing them with the existing and previous databases used for developing similar soil group maps. We think these results are appropriately placed in this section. Therefore, we kindly ask the reviewer to allow us to keep this description as it is and where it is.
Line 323: This is a great description of the setting and climate; but I think it might fit better in the methods or introduction. Figure 6. My preference is to rename the covariates or list the abbreviations in the figure captions. It is cumbersome for the reader to have to toggle between this figure and an appendix.
Response 18: In this section, we explain the covariates that are important in predicting the soil type. Following their order of importance, we explain, based on our experience and the existing literature, why these factors are important in defining the soil type. That is why the climate is described in detail in this section. Based on your comment, we added descriptions of the variables to the caption of Figure 6 for easy reference.
Line 357: could the low influence of lithology have anything to do with WRB class breaks and how they intersect with the scale of parent material variability?
Response 19: It is the relative importance which is low, and may be related to the use of a coarse-scale and less detailed lithology map, which may not sufficiently capture the spatial variability of the parent materials.
Line 361: can you take mtry and the comma out of this sentence, does it still mean the same thing?
Response 20: We revised this for clarity. The tuned values are mtry = 20, split rule = extra trees and minimum node size = 5. See also Response 12.
Line 362: Did you test the accuracy of previous maps or find other reported accuracies of maps from the area (not just general averages)?
Response 21: We didn’t test the accuracy of previous maps; rather, we used the reported accuracies from published sources.
Line 375: I am very curious what the accuracy of Global Soil Grids is using your updated soil profiles. Without that information, it is difficult to know how successful this effort using expert knowledge has been.
Response 22: Here we wanted to communicate that a qualitative assessment of spatial patterns was not done for SoilGrids 2017, which covers soil type mapping. Maps with similar accuracy might show different spatial patterns, and hence expert-based qualitative evaluation is of paramount importance.
Line 401: the portion of this paragraph dealing with landscapes/topo-sequences belongs with the paragraph below (line 409) focused on topo-sequences.
Response 23: Thank you for the observation, this is revised accordingly.
Line 426: Are the soil qualities (I think you mean properties) transitional or are the covariates transitional (or both?).
Response 24: Yes, we mean properties. The properties are transitional because the covariates/soil-forming factors are transitional, and hence we can say both.
Line 441: I think this is an ‘and’ not a ‘but’. Did you consider adjusting your training dataset for a more balanced set of soil profiles?
Response 25: When randomly sampling and splitting the dataset into training and testing sets, we tried different set.seed values to ensure that each RSG was included in both sets and to obtain better accuracy. See also Response 13.
Line 445: this paragraph reads very much like a concluding statement, was that the intention?
Response 26: Thank you - we have revised accordingly. Some parts of this paragraph are revised and maintained there. The other descriptions which look like conclusions are taken to the conclusion section.
Line 458 – Section 458. It would be much more powerful to compare the expert evaluation of this map vs. the expert evaluation of previous maps. Was any re-evaluation done after re-running the model? Did the output from the tests change throughout the process? Were the scales used to evaluate by experts useful to the scale of your model?
Response 27: After re-running the model, about ten soil scientists and geospatial experts re-evaluated the output using 20-25 districts. Further, the geospatial and soil experts checked the raster map of the RSGs in a GIS environment to ensure that areas with no concerns before the re-run were kept the same or that the changes were acceptable. The quality of the input data (profile data, covariates, mask layer) was assessed to improve the overall accuracy. As a general working norm, the experts’ qualitative assessment was set to consider the representation of mappable soil types at the target resolution/scale.
Citation: https://doi.org/10.5194/egusphere-2022-301-AC7
-
RC2: 'Comment on egusphere-2022-301', Skye Wills, 06 Oct 2022
I commend the authors for this large and important effort and I appreciate the chance to review this work. This is a worthy effort that should be published and shared widely. I am very keen to explore the intersection between digital tool and expert knowledge in soil survey. However, reading this manuscript, I found myself with some additional questions and points of clarification needed. At numerous points, information was provided, but out of the order the reader might expect. This is at least partially due to the iterative nature of the project; but I found that some of the results were like part of the methods and some of the results read like conclusions. The repetition of information might cause a reader to skip sections and miss important pieces of information. I think with some additional explanation and minor edits, this paper will be ready for publication.
Please find specific comments by line number:
Line 57: What number of profiles were used in the notable efforts referred to above (soilgrids 1 and 2)? How many of the thousands collected were included. This information would link the two parts of the intro – soil maps and soil profile collection.
Line 59: What do you mean that gridded spatial soil info is hardly available. Do you mean they were inaccessible, hard to use, incomplete? Please be explicit explaining why the previous products were not adequate.
Line 64: This paragraph makes more sense to me prior to the previous paragraph – to line 59.
Line 71: What do you mean by improved?
Line 121: this is the accuracy of the profile data. Figure 2. What is Data Ecosystem Mapping? Does this include getting the metadata for each profile correct according to the covariates?
Line 152: Are the terrain variables used listed anywhere? I see. I think this paragraph is confusing as many of the details I was looking for are in the next paragraph. I recommend creating one paragraph or a separate climate and topography paragraph. Please list the DEM derivatives.
Line 176: Did you consider evaluating your covariates for correlation and limiting the number used? Why or why not?
Line 179: this paragraph seems more introductory and not part of explaining your process.
Line 194: Are you saying previous studies have used this technique? I think you could eliminate this sentence.
Line 199: were optimized how? Is there a metric you were evaluating? Does the Caret package give you some sort of evaluation?
Line 202: Did you state how you separated the training and testing sets and what the ‘new’ dataset is. You should define those sets, how they were selected and used.
Line 224: typo ‘-runto’ should have a space ‘-run to’
Line 254: Consider something more definitive and eliminate ‘the results suggest’. I think these are straightforward results that need no wiggle words like ‘suggest’.
Line 255: I am not sure the word ‘museum’ is what I would use here. Perhaps ‘display’ or ‘diversity’ is more appropriate?
Line 268: Is this section not part of the methods? This describes how you collected and evaluated profiles, which is covered earlier.
Line 323: This is a great description of the setting and climate; but I think it might fit better in the methods or introduction.
Figure 6. My preference is to rename the covariates or list the abbreviations in the figure captions. It is cumbersome for the reader to have to toggle between this figure and an appendix.
Line 357: could the low influence of lithology have anything to do with WRB class breaks and how they intersect with the scale of parent material variability?
Line 361: can you take mtry and the comma out of this sentence, does it still mean the same thing?
Line 362: Did you test the accuracy of previous maps or find other reported accuracies of maps from the area (not just general averages)?
Line 375: I am very curious what the accuracy of Global Soil Grids is using your updated soil profiles. Without that information, it is difficult to know how successful this effort using expert knowledge has been.
Line 401: the portion of this paragraph dealing with landscapes/topo-sequences belongs with the paragraph below (line 409) focused on topo-sequences.
Line 426: Are the soil qualities (I think you mean properties) transitional or are the covariates transitional (or both?).
Line 441: I think this is an ‘and’ not a ‘but’. Did you consider adjusting your training dataset for a more balanced set of soil profiles?
Line 445: this paragraph reads very much like a concluding statement, was that the intention?
Line 458 – Section 458. It would be much more powerful to compare the expert evaluation of this map vs. the expert evaluation of previous maps. Was any re-evaluation done after re-running the model? Did the output from the tests change throughout the process? Were the scales used to evaluate by experts useful to the scale of your model?
Citation: https://doi.org/10.5194/egusphere-2022-301-RC2
-
AC8: 'Reply on RC2', Ashenafi Ali, 09 Oct 2022
Dear Skye Wills (RC2),
Kindly refer to our response (AC6) to RC1, as RC1 and RC2 are the same.
Kind regards,
Ashenafi Ali (on behalf of the co-authors)
Citation: https://doi.org/10.5194/egusphere-2022-301-AC8
-
RC3: 'Comment on egusphere-2022-301', Anonymous Referee #2, 12 May 2023
Overall evaluation:
- I feel that the paper is a great effort by the authors to draw together a set of soils data for Ethiopia and improve the spatial resolution of the mapping. I think just pulling together the data set is a big achievement.
- However, I feel the paper lacks a critical evaluation of the results and of the subsequent learning and recommendations that could be made. To do this it needs an assessment of where the modelling worked well and where it didn’t and explanations of why these results may have occurred.
- I think the discussion of the maps with experts is a really useful way of validating the maps and more could be made of the results of these discussions.
- There needs to be a discussion about where results are unexpected/expected and how that links back to figure 5 and the availability of the input soil profile data and covariates in different areas.
- The paper needs to highlight what we can learn from mapping in Ethiopia for mapping in similar landscapes. If this can be added I think it would be a really valuable addition to the DSM literature.
Specific queries:
- Could the resolution of the input data explain why the results may not be as expected in certain areas?
- In the discussion of the confusion matrix (Table 1) the authors could look at where there are large differences between soils pedologically and where a mis-mapping of soils might lead to different management decisions in areas.
- The paper mentions a rerun of the modelling after the workshop. Can the authors explain what was changed to improve the results between the 2 runs and which versions of the runs are presented in this paper?
- I think the paper’s structure needs some thought. Specifically, the results of the validation described in section 2.4.2 need to be part of the results rather than the methods.
Points of clarification:
- Line 59: What is meant by “hardly available”?
- Line 113: What criteria were used to define if a profile is complete and clean?
- Line 223: How were the polygons for review selected?
- Line 233: How are the authors looking to improve the version of the map from the first version?
- Line 247 – 253: Do the number of samples used represent what would be expected in terms of areas of specific soils in Ethiopia or are the input data biased to specific land cover or soil types?
- Line 274-278: Do the authors see a difference in the quality of the results where they had an increased density of input profiles?
- Figure 6: Add an axis label to the X axis
- Line 409-418: The authors need to discuss in more detail the reasons why certain points in the topographic sequences do match other work and where they don’t and offer potential explanations of why.
- Line 428-435: This section assumes that the new soil grids that have been generated are better than the "soil grids" without explaining what insights come from the new modelling and why they are important. It would also be valuable if the authors could offer insight into which of the 3 reasons explains why the results may be different.
- Line 441-444: Is it likely that the data used in this study are biased and can the authors offer a recommendation on what new data might be needed in which areas to improve the results?
- Line 473-479: It is unclear whether the rerun version of the map is what has been presented in the current paper or whether that is something that is to follow. If it isn’t presented can the authors explain why not?
Citation: https://doi.org/10.5194/egusphere-2022-301-RC3
-
AC9: 'Reply on RC3', Ashenafi Ali, 29 May 2023