Please use this identifier to cite or link to this item: http://ir.mu.ac.ke:8080/jspui/handle/123456789/5704
Title: Improved balanced random survival forest for the analysis of right censored data: application in determining under five child mortality
Authors: Wanjiru, Waititu Hellen
Keywords: Mortality
Censored data
Issue Date: 2021
Publisher: Moi University
Abstract: Understanding the determinants of Under Five Child Mortality (U5CM) is an important area of research. Child mortality is one of the main challenges affecting Low and Middle Income Countries (LMIC). The Sustainable Development Goals target of at most 25 deaths per 1000 live births has not been met, despite the many interventions governments have put in place to avert child mortality. There is a great need to understand the determinants of child mortality, especially U5CM. Most studies rely on household surveys such as the Kenya Demographic and Health Survey (KDHS), with KDHS 2014 being the most recent household survey in Kenya. The statistical challenges that come with DHS datasets include high imbalance between comparison classes, high dimensionality, statistical selection of variables, and distributional assumptions, among other factors. Random Survival Forests (RSF) have recently become a popular method for survival data analysis. However, imbalance between the mortality and non-mortality classes and violation of the Proportional Hazards (PH) assumption pose significant challenges to RSF. This is because its stopping criterion, based on a daughter node constraint, demonstrates bias towards predictors in a large population, and because it uses the log-rank splitting rule, whose optimality is achieved only when the PH assumption is satisfied. The main aim of this study was to develop a machine learning algorithm to handle the above-mentioned statistical challenges that come with high dimensional survey data in identifying the determinants of U5CM. The specific objectives were: to analyze Balanced Random Survival Forests (BRSF) using specified balancing techniques; to analyze BRSF using specified splitting rules; to develop an Improved Balanced Random Survival Forests (IBRSF) model; and finally to apply the BRSF to determine the U5CM.
The study methodology involved data balancing using four specified external data balancing techniques: Random Under-sampling, Random Over-sampling, Both-sampling, and the Synthetic Minority Oversampling Technique. The balanced data was integrated with RSF for variable selection, and model selection was done using the concordance index to identify the model with the best balancing technique. The BRSF was then analyzed using three specified splitting rules: log-rank, log-rank score and Bs.gradient. Finally, an IBRSF algorithm was developed by integrating balanced data with RSF while using the optimal splitting rule. The study found that the model with the random under-sampling balancing method produced the best fit, with a concordance index of 0.90. The model using the Bs.gradient splitting rule recorded a concordance of 0.87 and was the most optimal method when the PH assumption was violated. The final model, the IBRSF model, integrated data balancing using the random under-sampling method with the Bs.gradient rule for splitting the nodes. Based on this model, B7 (age at death of the child) emerged as the strongest determinant of U5CM, with the largest variable importance (VIMP) value of 0.0472. In conclusion, IBRSF produced a good fit to the data and enabled data analysis that solved all the specified statistical challenges that come with KDHS-type data. The study recommends the use of the IBRSF model for prediction of highly imbalanced right censored data in situations where the PH assumption is violated.
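As an illustration of two ingredients named in the abstract, the following is a minimal Python sketch of random under-sampling and of a concordance index for right censored data. It is not the thesis code: the function names, the `is_event` callback, and the use of Harrell's pairwise definition of concordance (with ties in observed time ignored) are assumptions made for this example.

```python
import random


def random_undersample(records, is_event, seed=0):
    """Randomly drop records from the larger class until both classes match in size.

    `is_event` is a hypothetical callback returning True for the mortality class.
    """
    rng = random.Random(seed)
    events = [r for r in records if is_event(r)]
    non_events = [r for r in records if not is_event(r)]
    # The shorter list is the minority class; the longer one is sampled down.
    minority, majority = sorted((events, non_events), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def concordance_index(times, events, risk_scores):
    """Harrell-style C: share of comparable pairs where the higher-risk subject fails first.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]; pairs with tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy, made-up data: 3 deaths among 20 children.
    toy = [{"died": d} for d in [1] * 3 + [0] * 17]
    balanced = random_undersample(toy, lambda r: r["died"] == 1)
    print(len(balanced), sum(r["died"] for r in balanced))
    print(concordance_index([2, 5, 7, 9], [1, 1, 0, 1], [0.9, 0.6, 0.4, 0.1]))
```

A value of 1.0 means the predicted risks order every comparable pair correctly and 0.5 corresponds to random ordering, which is the scale on which the reported concordance values of 0.90 and 0.87 are read.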
URI: http://ir.mu.ac.ke:8080/jspui/handle/123456789/5704
Appears in Collections: School of Aerospace

Files in This Item:
File | Description | Size | Format
WAITITU HELLEN WANJIRU 2021.pdf | | 3.14 MB | Adobe PDF

