Browsing by Author "Madzime, Ruvarashe Joylyne"
Development of containerized pipelines for the reproducible analysis of amplicon-, shotgun metagenomic- and metatranscriptomic data.
(Stellenbosch : Stellenbosch University, 2024-03) Madzime, Ruvarashe Joylyne; Tromp, Gerard; Sanko, Tomasz; Stellenbosch University. Faculty of Medicine and Health Sciences. Dept. of Biomedical Sciences. Molecular Biology and Human Genetics.
ENGLISH ABSTRACT: Advances in next-generation sequencing technologies have enabled the investigation of microbial genetic material directly from a biological specimen without the need for culturing. This has propelled the field of metagenomics and set techniques like amplicon, shotgun metagenomic and metatranscriptomic sequencing at the forefront of investigating complex microbial communities. Data sets from these techniques are very large, often gigabytes (GB) in size, and frequently need to be analysed on high-performance clusters and servers. These computational requirements introduce the problem of variation between compute environments, which leads to irreproducibility. The data are high-dimensional and compositional, and there are specific algorithms that address these qualities of the data. However, the software is updated regularly, so multiple versions of the same algorithm are in circulation, and this too leads to irreproducibility, thereby affecting the integrity of science. This project addresses computational irreproducibility through the development of three independent computational pipelines, designed to be used on Unix/Linux-based clusters and servers. I developed a pipeline implementing QIIME2 for analysing amplicon sequence data. For metatranscriptomic data, I developed a pipeline implementing Trinity and its utility workflow for de novo assembly of transcripts and differential expression analysis. I developed a pipeline for analysing shotgun metagenomic data that implements multiple algorithms: metaSPAdes, MaxBin2, Prokka, BLAST and DeepARG.
I packaged all the algorithms in separate Singularity containers for version control and consistency of the execution environment. All three pipelines were developed and launched using the Nextflow workflow management system. Using the respective data for each pipeline, the pipelines ran in an automated manner on a local university server and a PBSPro cluster. All three independent containerized pipelines were successfully implemented. Future work will include multi-stage development of containers, robust validation of the pipelines, and adding features such as optional software algorithms to the pipelines.