Abstract:
Digital soil mapping techniques represent a cost-effective method for obtaining detailed information regarding
the spatial distribution of chemical elements in soils. Machine learning (ML) algorithms such as random forest (RF)
models have been developed for classification, pattern recognition and regression tasks; they can model
non-linear relationships across a range of datasets, identify hierarchical relationships, and determine
the importance of predictor variables. In this study, we describe a framework for spatial prediction based
on RF modelling where inverse distance weighted (IDW) predictors are used in conjunction with ancillary
environmental covariates. The model was applied to predict the total concentration (mg kg⁻¹) and assess the
prediction uncertainty of 56 elements, soil pH and organic matter content using 466 soil samples in western
Kenya; the results for iodine (I), selenium (Se), zinc (Zn) and soil pH are highlighted in this work. These elements
were selected because of their contrasting biogeochemical cycles and widespread dietary deficiencies in sub-Saharan
Africa, whilst soil pH is an important parameter controlling soil chemical reactions. Algorithm performance
was evaluated by determining the relative importance of each predictor variable and by examining the model's response using
partial dependence profiles. The accuracy and precision of each RF model were assessed by evaluating out-of-bag
predicted values. The models' R² values ranged from 0.31 to 0.64, whilst Lin's concordance correlation coefficient (CCC)
values ranged from 0.51 to 0.77. The IDW predictor variables had the greatest influence on the predicted distribution
of soil properties in the study area; however, the inclusion of ancillary environmental data improved model performance
for all soil properties. The
results presented in this paper highlight the benefits of ML algorithms that can incorporate multiple layers of
data for spatial prediction, uncertainty assessment and the attribution of variable importance. Further research is
now required to ensure that health practitioners and the agricultural community use the geochemical maps presented
here to assess the relationship between environmental geochemistry, endemic diseases and preventable micronutrient
deficiency.
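Two of the computations named in this abstract can be illustrated with a short sketch: an inverse distance weighted (IDW) estimate that could be supplied to the RF model as an extra predictor, and the R² and Lin's CCC statistics used to score out-of-bag predictions. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the IDW power of 2, the leave-one-out handling at sample sites, the definition of R² as 1 − SS_res/SS_tot, and the toy data are all hypothetical. The RF fit itself is not shown; in Python, scikit-learn's `RandomForestRegressor(oob_score=True)` exposes out-of-bag predictions via its `oob_prediction_` attribute, but the study does not state which software was used.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (easting, northing); hypothetical planar coordinates


def idw_estimate(target: Point, sites: Sequence[Point], values: Sequence[float],
                 power: float = 2.0, exclude_index: Optional[int] = None) -> float:
    """Inverse-distance-weighted estimate of a soil property at `target`.

    `exclude_index` drops one sample (leave-one-out) so that the IDW predictor
    at a training site is not simply that site's own measured value.
    The power of 2 is an assumed default, not a value taken from the study.
    """
    num = 0.0
    den = 0.0
    for i, ((x, y), v) in enumerate(zip(sites, values)):
        if i == exclude_index:
            continue
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # target coincides with a retained sample: use its value
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def lins_ccc(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (population moments, 1/n)."""
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(pred) / n
    var_o = sum((o - mo) ** 2 for o in obs) / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    # Unlike Pearson's r, the (mo - mp)^2 term penalises systematic bias.
    return 2.0 * cov / (var_o + var_p + (mo - mp) ** 2)


if __name__ == "__main__":
    # Toy data (hypothetical): four samples on a unit square.
    sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    values = [5.0, 6.0, 7.0, 8.0]
    # Leave-one-out IDW values, usable as an extra predictor column for the RF.
    idw_feature = [idw_estimate(s, sites, values, exclude_index=i)
                   for i, s in enumerate(sites)]
    print(idw_feature)
    # A constant +1 offset: Pearson r would be 1, but CCC drops to about 0.714.
    obs = [1.0, 2.0, 3.0, 4.0]
    biased = [2.0, 3.0, 4.0, 5.0]
    print(r_squared(obs, biased), lins_ccc(obs, biased))
```

The biased example shows why CCC is reported alongside R²: CCC measures agreement with the 1:1 line rather than only linear association, so a prediction that tracks the observations perfectly but with a constant offset is still penalised.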