Understanding genetic diversity and rapid drug resistance prediction in mycobacterium tuberculosis from whole-genome sequence and other epidemiological data

Babirye, Sandra Ruth

dc.contributor.author	Babirye, Sandra Ruth
dc.date.accessioned	2024-11-06T13:35:40Z
dc.date.available	2024-11-06T13:35:40Z
dc.date.issued	2024-06
dc.identifier.citation	Babirye, S.R. (2024). Understanding genetic diversity and rapid drug resistance prediction in mycobacterium tuberculosis from whole-genome sequence and other epidemiological data (Unpublished master's dissertation). Makerere University, Kampala, Uganda.	en_US
dc.identifier.uri	http://hdl.handle.net/10570/13654
dc.description	A dissertation submitted to the Directorate of Research and Graduate Training in partial fulfillment of the requirements for the award of the degree of Master of Science in Bioinformatics of Makerere University	en_US
dc.description.abstract	Tuberculosis (TB) remains one of the major global health problems, with an estimated 1.6 million deaths worldwide. The availability of whole-genome sequence (WGS) data offers a good avenue for understanding genetic diversity and drug resistance (DR) mutations. We aimed to investigate the genetic diversity and relatedness of Mycobacterium tuberculosis (MTB) isolates among individuals with different CD4 cell counts and to leverage machine learning (ML) algorithms in predicting DR using WGS and epidemiological data from Uganda. Methods: This was a cross-sectional study utilizing 226 WGS samples of MTB isolates in Uganda between 2013 and 2023. Associated patient demographic data and phenotypic drug information were obtained. We used TB-Profiler for lineage and drug resistance prediction, and Snippy for variant calling and annotation. Phylogenetic analysis was performed on the core genome alignment file in MEGA. For ML model development, we split the data into training (80%) and testing (20%) datasets. The SMOTE technique was applied to handle class imbalance. We evaluated various ML algorithms, including random forest (RF), logistic regression (LR) and boosting classifiers such as AdaBoost, CatBoost, Gradient Boosting and XGBoost, for prediction of drug resistance to the antibiotics rifampicin, ethambutol, isoniazid and streptomycin. Key metrics such as recall, precision, the receiver operating characteristic (ROC) curve and the Matthews correlation coefficient (MCC) were used to assess the performance of the models. Results: Across the 203 MTB isolates, we observed 5 distinct phylogenetic lineages (L1-4, L3&L4), with L4 being the most prevalent at 149/203 (73.40%), followed by L3 at 46/203 (22.66%), among others. The most common sub-lineage was L4.6.1.1/Uganda II. There was a statistical association between MTB lineages and CD4 cell count group (low or high).
Overall, all ML algorithms were able to predict drug resistance; however, the boosting classifiers had the highest AUC values. Age, sex and HIV status proved to be significant features, in addition to the SNP positions, for ML model development. Conclusion: Our findings on the circulating lineages, sub-lineages and drug resistance profiles play a crucial role in understanding the genetic diversity of MTB. Additionally, our ML approach can robustly predict drug resistance and also inform on the underlying gene mutations while utilizing both WGS (SNP) and epidemiological data.	en_US
dc.language.iso	en	en_US
dc.publisher	Makerere University	en_US
dc.subject	Genetic diversity	en_US
dc.subject	Rapid drug resistance	en_US
dc.subject	Mycobacterium tuberculosis	en_US
dc.subject	Whole-genome sequence	en_US
dc.subject	Epidemiological data	en_US
dc.title	Understanding genetic diversity and rapid drug resistance prediction in mycobacterium tuberculosis from whole-genome sequence and other epidemiological data	en_US
dc.type	Thesis	en_US
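
The abstract names recall, precision and the Matthews correlation coefficient (MCC) as evaluation metrics. As a minimal sketch of how these are computed for a binary resistant/susceptible prediction, the following uses only the Python standard library. The labels `y_true` and `y_pred` are hypothetical toy values, not data from the dissertation, and the function names are illustrative assumptions rather than the author's code.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = resistant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision(tp, fp):
    """Fraction of predicted-resistant isolates that are truly resistant."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of truly resistant isolates that the model detects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels (assumption): 3 resistant and 7 susceptible isolates.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"precision={precision(tp, fp):.3f}")
print(f"recall={recall(tp, fn):.3f}")
print(f"MCC={mcc(tp, fp, tn, fn):.3f}")
```

MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside precision and recall when resistant isolates are the minority class.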

Files in this item

Name: Babirye-chs-msbt-2024.pdf
Size: 1.487Mb
Format: PDF
Description: Master's dissertation

This item appears in the following Collection(s)

School of Bio-Medical Sciences (Bio-Medical) Collections