Moi University Open Access Repository

Improved balanced random survival forest for the analysis of right censored data: application in determining under five child mortality


dc.contributor.author Wanjiru, Waititu Hellen
dc.date.accessioned 2022-01-18T12:21:05Z
dc.date.available 2022-01-18T12:21:05Z
dc.date.issued 2021
dc.identifier.uri http://ir.mu.ac.ke:8080/jspui/handle/123456789/5704
dc.description.abstract Understanding the determinants of Under-Five Child Mortality (U5CM) is an important area of research. Child mortality remains one of the main challenges facing Low- and Middle-Income Countries (LMIC). The Sustainable Development Goals target of at most 25 deaths per 1000 live births has not been met, despite the many interventions governments have put in place to avert child mortality. There is therefore a great need to understand the determinants of child mortality, especially U5CM. Most studies rely on household surveys such as the Kenya Demographic and Health Survey (KDHS), with KDHS-2014 being the most recent household survey in Kenya. The statistical challenges that come with DHS datasets include high imbalance between comparison classes, high dimensionality, statistical selection of variables, and distributional assumptions, among other factors. Random Survival Forests (RSF) have recently become a popular method for survival data analysis. However, imbalance between the mortality and non-mortality classes and violation of the Proportional Hazards (PH) assumption pose significant challenges to RSF. This is due to its stopping criterion, which is based on a daughter-node constraint that demonstrates bias towards predictors in a large population, and to its use of the log-rank splitting rule, whose optimality is achieved when the PH assumption is satisfied. The main aim of this study was to develop a machine learning algorithm to handle the above-mentioned statistical challenges that come with high-dimensional survey data in identifying the determinants of U5CM. The specific objectives were: to analyze Balanced Random Survival Forests (BRSF) using specified balancing techniques; to analyze BRSF using specified splitting rules; to develop an Improved Balanced Random Survival Forests (IBRSF) model; and finally to apply the BRSF to determine the U5CM. 
The study methodology involved data balancing using four specified external data balancing techniques: Random Under-sampling, Random Over-sampling, Both-sampling, and the Synthetic Minority Oversampling Technique. The balanced data were integrated with RSF for variable selection, and model selection was done using the concordance index to identify the model with the best balancing technique. The BRSF was then analyzed using three specified splitting rules: log-rank, log-rank score and Bs.gradient. Finally, an IBRSF algorithm was developed by integrating balanced data with RSF while using the optimal splitting rule. The study found that the model with the random under-sampling balancing method produced the best fit, with a concordance index of 0.90. The model using the Bs.gradient splitting rule recorded a concordance of 0.87 and was the optimal method when the PH assumption was violated. The final model, the IBRSF model, integrated data balancing using the random under-sampling method with the Bs.gradient rule for splitting the nodes. Based on this model, B7 (age at death of the child) emerged as the strongest determinant of U5CM, with the largest variable importance (VIMP) value of 0.0472. In conclusion, IBRSF produced a good fit to the data and enabled an analysis that addressed all the specified statistical challenges that come with KDHS-type data. The study recommends the use of the IBRSF model for prediction with highly imbalanced right-censored data in situations where the PH assumption is violated. en_US
dc.language.iso en en_US
dc.publisher Moi University en_US
dc.subject Mortality en_US
dc.subject Censored data en_US
dc.title Improved balanced random survival forest for the analysis of right censored data: application in determining under five child mortality en_US
dc.type Thesis en_US
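
The abstract names two steps that a short sketch may make more concrete: random under-sampling of the majority class and model comparison by concordance index. The Python below is only an illustration under stated assumptions and is not the thesis code. The function names, the record layout (dicts with an "event" key, 1 = death observed, 0 = censored) and the choice to balance on the event indicator are hypothetical, and Harrell's estimator is assumed as the concordance index since the abstract does not say which variant was used.

```python
import random


def random_undersample(records, event_key="event", seed=0):
    """Randomly drop majority-class records until both classes have equal size.

    Assumption: classes are defined by the event indicator
    (1 = death observed, 0 = right-censored).
    """
    rng = random.Random(seed)
    events = [r for r in records if r[event_key] == 1]
    censored = [r for r in records if r[event_key] == 0]
    # Order the two classes by size: smaller first, larger second.
    minority, majority = sorted((events, censored), key=len)
    # Keep every minority record and an equally sized random subset of the majority.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter time than subject j. The pair is concordant when the
    model gives i the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

In this sketch a value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, which is the scale on which the reported values of 0.90 and 0.87 would be read.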

