Transcriptomic profile based cancer disease prediction and patient survival time differentiation

Date
2018-12
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT : Cancer is an abnormal growth of cells that may be caused by gene mutations, which alter how cells function, mainly in how they grow and divide. Cancer cells are regulated by complex interactions mediated by a group of proteins and miRNAs that are expressed and repressed. Transcriptomic technologies such as RNA sequencing (RNA-seq) now make it possible to profile thousands of genes at once and build a global picture of cell function. This study applies a statistical approach, Significance Analysis of Microarrays (SAM), to identify genes that are differentially expressed in breast cancer patients. Genes with scores greater than a threshold are deemed potentially significant. The genes identified as significantly different serve two purposes. First, the study uses them to predict breast cancer with three machine learning algorithms: random forests, artificial neural networks and support vector machines. Second, the clinical details of patients are combined with these genes to build a survival model that predicts the probability of survival and the risk of the event (death) in breast cancer patients. Using The Cancer Genome Atlas (TCGA) as the primary data source, SAM reported 23 genes as significantly different. Further investigation revealed that these 23 genes are involved in tumour suppression, angiogenesis, cell growth factor, tumourigenesis, cell proliferation, tumour progression and tumour necrosis activities. In predicting breast cancer, 10 of the 23 genes contribute significantly to the model. Finally, the log-logistic distribution was found to best describe the survival time of breast cancer patients, and the survival model revealed that the expression levels of six genes influence the survival probability of a breast cancer patient.
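The SAM score mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly published form of the statistic, a difference in group means divided by a pooled standard error plus a small constant `s0`. The function name, the sample values and the choice of `s0` are hypothetical and are not taken from the study; the permutation step that SAM uses to set the threshold is left out.

```python
import math


def sam_statistic(group_a, group_b, s0=0.0):
    """Relative difference d = (mean_a - mean_b) / (s + s0) for one gene.

    group_a, group_b: expression values of the gene in two sample groups
    (for example tumour and normal). s0 is a small positive constant that
    damps scores of genes with very low variance; here it is simply passed
    in by the caller (an assumption, SAM normally estimates it from data).
    """
    n1, n2 = len(group_a), len(group_b)
    mean_a = sum(group_a) / n1
    mean_b = sum(group_b) / n2
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum(
        (x - mean_b) ** 2 for x in group_b
    )
    # Gene-specific scatter (pooled standard error of the mean difference).
    s = math.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (mean_a - mean_b) / (s + s0)


# Hypothetical expression values for a single gene.
tumour = [5.0, 6.0, 7.0]
normal = [1.0, 2.0, 3.0]
d_plain = sam_statistic(tumour, normal)
d_damped = sam_statistic(tumour, normal, s0=1.0)
```

A gene would then be flagged as potentially significant when its score exceeds the chosen threshold; a larger `s0` shrinks every score toward zero.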
AFRIKAANS ABSTRACT (English translation) : Cancer is an abnormal growth of cells that may be caused by mutations in genes, which in turn change how cells function, mainly in how they grow and divide. Cancer cells are regulated by complex interactions mediated by a group of proteins and miRNAs that are expressed and repressed. With the help of transcriptomic technologies such as RNA sequencing (RNA-seq), it is now possible to profile thousands of genes simultaneously to create a global picture of cell function. Here, the study uses a statistical approach called Significance Analysis of Microarrays (SAM) to identify genes that are differentially expressed in breast cancer patients. Genes with scores greater than a threshold are considered potentially significant. Next, the study uses these significant genes to predict breast cancer with three machine learning algorithms: random forests, artificial neural networks and support vector machines. Finally, the clinical details of patients and the significant genes are combined to build a survival model that predicts the probability of survival and the risk of the event in breast cancer patients. The event for this study is death. Using The Cancer Genome Atlas (TCGA) as the primary data for the study, SAM reported 23 genes as significantly different. Further investigation showed that these 23 genes are involved in tumour suppression, angiogenesis, cell growth factor, tumourigenesis, cell proliferation, tumour progression and tumour necrosis activities. In predicting breast cancer, 10 of the 23 genes contribute significantly to the model. Finally, the log-logistic distribution was found to best describe the survival time of breast cancer patients. In addition, the survival model revealed that the expression levels of six genes influence the survival probability of a breast cancer patient. The survival model further showed that breast cancer patients are likely to face a greater risk of the event, but that after 3243.38 days their risk of the event may gradually decrease.
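A risk that first rises and later declines, as reported after 3243.38 days, is what a log-logistic model produces when its shape parameter exceeds 1. The sketch below uses the standard log-logistic survival and hazard functions to show where such a turning point comes from. The scale `alpha` and shape `beta` values are hypothetical, chosen only for illustration; they are not the fitted parameters of the study.

```python
import math


def loglogistic_survival(t, alpha, beta):
    """S(t) = 1 / (1 + (t / alpha) ** beta), for t >= 0."""
    return 1.0 / (1.0 + (t / alpha) ** beta)


def loglogistic_hazard(t, alpha, beta):
    """h(t) = (beta / t) * u / (1 + u) with u = (t / alpha) ** beta, t > 0."""
    u = (t / alpha) ** beta
    return (beta / t) * u / (1.0 + u)


def hazard_peak_time(alpha, beta):
    """Time at which the hazard is largest: alpha * (beta - 1) ** (1 / beta).

    Only defined for beta > 1; for beta <= 1 the hazard never increases.
    """
    if beta <= 1.0:
        raise ValueError("the hazard has an interior peak only when beta > 1")
    return alpha * (beta - 1.0) ** (1.0 / beta)


# Hypothetical parameters (time in days), not estimates from the study.
alpha, beta = 2000.0, 2.0
peak = hazard_peak_time(alpha, beta)
hazard_before = loglogistic_hazard(0.5 * peak, alpha, beta)
hazard_at_peak = loglogistic_hazard(peak, alpha, beta)
hazard_after = loglogistic_hazard(2.0 * peak, alpha, beta)
```

With these illustrative values the hazard is lower both before and after `peak` than at `peak` itself, which is the rise-then-decline pattern described above. The scale `alpha` is also the median survival time, since `S(alpha) = 0.5`.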
Keywords
Cancer Genome Atlas (TCGA), Breast -- Cancer -- Research, RNA–sequencing, Survival analysis (Biometry), UCTD