Using machine learning to predict the risk severity of late effects of childhood cancer survivors

Date
2024-12
Publisher
Stellenbosch : Stellenbosch University
Abstract
With improvements in childhood cancer treatment, the number of survivors is continuously increasing. Childhood cancer survivors have a significant risk of developing late effects due to the underlying cancer or the treatment received. These late effects can affect any organ and may influence the survivors’ health-related quality of life from a young age. It is important to identify the risk severity of late effects and diagnose them as soon as possible to plan appropriate long-term follow-up care, manage late effects early, and potentially improve the health-related quality of life of these survivors. In low- and middle-income countries, there is often limited access to routine screening for healthcare problems, which makes it challenging to provide adequate long-term follow-up care for childhood cancer survivors. This limited access to healthcare creates an opportunity to develop alternative techniques for predicting the risk severity of these late effects. This study utilised two datasets: a South African childhood cancer survivor cohort (comprising haematological and solid cancers) and a North American childhood rhabdomyosarcoma survivor cohort. For both datasets, data analysis was performed, and three clustering algorithms and six classification algorithms were applied to select the best strategy for predicting the risk severity of late effects. The clustering was necessary because the target feature required for the supervised machine learning algorithms was not a single obvious feature already present in the dataset. Therefore, to comprehensively report the extent to which late effects manifested among childhood cancer survivors, the features related to the grade and number of late effects were utilised during the clustering. The indices of these newly created clusters formed the target feature for the supervised machine learning algorithms. Five performance metrics were used to evaluate the respective classification models.
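As a rough illustration of how cluster indices can serve as a target feature, the sketch below groups survivors by the grade and number of their late effects and uses the resulting cluster index as a low- or high-risk label. It is a minimal sketch under stated assumptions: the abstract does not name the three clustering algorithms, so a small two-cluster k-means is used here purely as a stand-in, and the survivor records, the function name `two_means`, and the choice of two features are hypothetical.

```python
from math import dist


def two_means(points, iterations=20):
    """Tiny 2-cluster k-means (illustrative stand-in, not the thesis method).

    Returns one cluster index (0 or 1) per point.
    """
    # Deterministic start: the points with the smallest and largest feature sums,
    # so cluster 0 begins at the mildest record and cluster 1 at the most severe.
    centroids = [min(points, key=sum), max(points, key=sum)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min((0, 1), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical survivors: (highest late-effect grade, number of late effects).
survivors = [(1, 1), (0, 0), (1, 2), (4, 6), (3, 5), (4, 7)]

# The cluster index becomes the target feature for a supervised classifier.
risk_labels = two_means(survivors)
print(risk_labels)
```

In this sketch, index 0 corresponds to the cluster seeded from the mildest record and index 1 to the cluster seeded from the most severe one; a classifier would then be trained to predict this index from clinical features such as treatment and age.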
For both the South African and North American cohorts, the gradient boosting model yielded the most promising results across the selected performance metrics of the classification algorithms. The gradient boosting model of the South African cohort identified anthracycline dose, radiotherapy dose, age at study visit, treatment modalities, body mass index, and age at diagnosis as the most important features for predicting the risk severity of late effects. The gradient boosting model of the North American rhabdomyosarcoma cohort identified age at follow-up, age at diagnosis, neck radiotherapy, participants’ educational status, and head radiotherapy as the most important features for these predictions. Risk stratification into low- or high-risk categories may assist with long-term follow-up care planning for childhood cancer survivors. Predicted risk severity of late effects can help to provide more intensive follow-up to survivors with a higher risk of late effects, while reducing the burden that following up survivors with a lower risk of complications places on the healthcare system. Furthermore, since age at diagnosis, age at follow-up, and treatment modalities (including radiotherapy and chemotherapy) were identified in both cohorts, these risk factors may be important to incorporate when planning and managing appropriate long-term follow-up care for childhood cancer survivors, to potentially improve their health-related quality of life.
Description
Thesis (MEng)--Stellenbosch University, 2024.