Please use this identifier to cite or link to this item: http://ir.mu.ac.ke:8080/jspui/handle/123456789/5704
Full metadata record
DC Field | Value | Language
dc.contributor.author | Wanjiru, Waititu Hellen | -
dc.date.accessioned | 2022-01-18T12:21:05Z | -
dc.date.available | 2022-01-18T12:21:05Z | -
dc.date.issued | 2021 | -
dc.identifier.uri | http://ir.mu.ac.ke:8080/jspui/handle/123456789/5704 | -
dc.description.abstract | Understanding the determinants of Under Five Child Mortality (U5CM) is an important area of research. Child mortality is one of the main challenges affecting Low and Middle Income Countries (LMIC). The Sustainable Development Goals target of at most 25 deaths per 1000 live births has not been met, despite the many interventions governments have put in place to avert child mortality. There is a great need to understand the determinants of child mortality, especially U5CM. Most studies rely on household surveys such as the Kenya Demographic and Health Survey (KDHS), with KDHS 2014 being the most recent household survey in Kenya. The statistical challenges that come with DHS datasets include high imbalance between comparison classes, high dimensionality, statistical selection of variables, and distributional assumptions, among other factors. Random Survival Forests (RSF) have recently become a popular method for survival data analysis. However, imbalance between the mortality and non-mortality classes and violation of the Proportional Hazards (PH) assumption pose significant challenges to RSF. This is because its stopping criterion is based on a daughter node constraint, which is biased towards predictors in a large population, and because it uses the log-rank splitting rule, whose optimality is achieved only when the PH assumption is satisfied. The main aim of this study was to develop a machine learning algorithm that handles the above-mentioned statistical challenges of high dimensional survey data when identifying the determinants of U5CM. The specific objectives were: to analyze Balanced Random Survival Forests (BRSF) using specified balancing techniques; to analyze BRSF using specified splitting rules; to develop an Improved Balanced Random Survival Forests (IBRSF) model; and finally, to apply the BRSF to determine U5CM. The study methodology involved data balancing using four specified external data balancing techniques: Random Under-sampling, Random Over-sampling, Both-sampling, and the Synthetic Minority Oversampling Technique. The balanced data was integrated with RSF for variable selection, and model selection was done using the concordance index to identify the model with the best balancing technique. The BRSF was then analyzed using three specified splitting rules: the log-rank, log-rank score, and Bs.gradient splitting rules. Finally, an IBRSF algorithm was developed by integrating balanced data with RSF while using the optimal splitting rule. The study found that the model with the random under-sampling balancing method produced the best fit, with a concordance index of 0.90. The model using the Bs.gradient splitting rule recorded a concordance of 0.87 and was the best-performing rule when the PH assumption was violated. The final model, the IBRSF model, integrated data balancing using the random under-sampling method with the Bs.gradient rule for splitting the nodes. Based on this model, B7 (age at death of the child) emerged as the strongest determinant of U5CM, with the largest variable importance (VIMP) value of 0.0472. In conclusion, IBRSF produced a good fit to the data and enabled an analysis that addressed all the specified statistical challenges that come with KDHS-type data. The study recommends the use of the IBRSF model for prediction with highly imbalanced right censored data in situations where the PH assumption is violated. | en_US
dc.language.iso | en | en_US
dc.publisher | Moi University | en_US
dc.subject | Mortality | en_US
dc.subject | Censored data | en_US
dc.title | Improved balanced random survival forest for the analysis of right censored data: application in determining under five child mortality | en_US
dc.type | Thesis | en_US
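
Two of the steps named in the abstract, random under-sampling of the majority class and model comparison by concordance index, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the record layout (`time`, `event`), and the toy values are assumptions made for illustration, and the concordance function is the standard Harrell's C for right censored data, which may differ in detail from the variant used in the study.

```python
import random


def random_undersample(records, event_key="event", seed=0):
    """Randomly drop majority-class records until both classes are the same size.

    `records` is a list of dicts; `event_key` marks death (1) or censoring (0).
    The record layout is a hypothetical one chosen for this sketch.
    """
    rng = random.Random(seed)
    events = [r for r in records if r[event_key] == 1]
    censored = [r for r in records if r[event_key] == 0]
    # Whichever class is smaller is kept whole; the larger one is sampled down.
    minority, majority = sorted((events, censored), key=len)
    return minority + rng.sample(majority, len(minority))


def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter time than subject j. Tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy data (invented for illustration): 3 deaths among 13 children.
toy = [{"time": t, "event": 1} for t in (2, 5, 9)]
toy += [{"time": 60, "event": 0} for _ in range(10)]
balanced = random_undersample(toy)

# Toy risk scores that rank the subjects perfectly against their times.
c_index = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1])
```

In a workflow like the one the abstract describes, the balanced sample would be passed to a survival forest, and the concordance index computed on its predicted risk would be used to compare balancing techniques and splitting rules.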
Appears in Collections: School of Aerospace

Files in This Item:
File | Description | Size | Format
WAITITU HELLEN WANJIRU 2021.pdf | | 3.14 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.