Show simple item record

dc.contributor.author        Suuna, Conrad
dc.date.accessioned          2024-03-20T08:17:46Z
dc.date.available            2024-03-20T08:17:46Z
dc.date.issued               2023
dc.identifier.uri            http://hdl.handle.net/10570/13185
dc.description.abstract      Recent research highlights gender bias in machine translation (MT) models as a significant concern, characterised by gender-stereotyped and discriminatory translations. Given the wide application of MT across domains, addressing this issue is crucial. While various approaches have been explored to mitigate gender bias in MT models, further understanding and better solutions are still needed. Our research treated gender bias as a domain adaptation problem. To achieve this, we artificially crafted a parallel dataset of 446 occupation sentences in the format "the [occupation] finished [his/her] work" and used it to debias the AI-Lab-Makerere/lg_en (original) model. We also collected and annotated data for six Personally Identifiable Information (PII) entity types: userid, person, location, norp, org, and date. This data was used to develop, evaluate, and compare six named entity recognition models for PII anonymisation. Afro-xlmr-base outperformed the other models, with a 0.81 F1 score and 95% accuracy. We integrated this model into the Microsoft Presidio pipeline and used it to sanitise the gender bias test data for the MT model. We then debiased the original model by fine-tuning it on the occupation dataset, tuning hyperparameters and applying Knowledge Distillation to control catastrophic forgetting. Evaluated on the sanitised test set, the final distilled model translated gendered sentences better, with BLEU and Translation Gender Bias Index scores higher by +0.3 and +0.27 respectively. These results suggest that our approach is a promising technique for mitigating gender bias in MT while requiring less data collection.   en_US
dc.description.sponsorship   Lacuna Fund, AI Lab   en_US
dc.language.iso              en   en_US
dc.publisher                 Makerere University   en_US
dc.subject                   Gender bias   en_US
dc.subject                   Machine translation   en_US
dc.subject                   Natural language processing   en_US
dc.subject                   Machine learning   en_US
dc.title                     Gender bias mitigation and evaluation in Luganda to English machine translation models   en_US
dc.type                      Thesis   en_US
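
The abstract above describes wrapping the best-performing NER model in the Microsoft Presidio pipeline to anonymise PII before evaluating gender bias. Below is a minimal sketch of one way to do that with presidio-analyzer and a Hugging Face token-classification pipeline; it is illustrative rather than the thesis implementation, and the checkpoint name "afro-xlmr-base-pii", the example sentence, and the exact spelling of the entity labels are assumptions.

# Minimal sketch: plug a fine-tuned Hugging Face NER checkpoint into Microsoft
# Presidio as a custom recognizer for the six PII entity types in the abstract.
# "afro-xlmr-base-pii" is a hypothetical checkpoint name; the default
# AnalyzerEngine also expects a spaCy English model to be installed.
from presidio_analyzer import AnalyzerEngine, EntityRecognizer, RecognizerResult
from presidio_anonymizer import AnonymizerEngine
from transformers import pipeline

PII_ENTITIES = ["USERID", "PERSON", "LOCATION", "NORP", "ORG", "DATE"]

class HfPiiRecognizer(EntityRecognizer):
    """Exposes a transformers token-classification model to Presidio."""

    def __init__(self, model_id):
        super().__init__(supported_entities=PII_ENTITIES, name="hf_pii_recognizer")
        # aggregation_strategy="simple" merges word pieces into whole entity spans
        self.ner = pipeline("token-classification", model=model_id,
                            aggregation_strategy="simple")

    def load(self):
        pass  # the model is loaded in __init__

    def analyze(self, text, entities, nlp_artifacts=None):
        results = []
        for pred in self.ner(text):
            # map the model's labels to Presidio entity names here if they differ
            label = pred["entity_group"].upper()
            if label in self.supported_entities:
                results.append(RecognizerResult(entity_type=label,
                                                start=pred["start"],
                                                end=pred["end"],
                                                score=float(pred["score"])))
        return results

analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(HfPiiRecognizer("afro-xlmr-base-pii"))  # hypothetical model id
anonymizer = AnonymizerEngine()

text = "Nakato finished her work at Makerere University on 12 May 2023."
findings = analyzer.analyze(text=text, language="en", entities=PII_ENTITIES)
print(anonymizer.anonymize(text=text, analyzer_results=findings).text)

In this setup the recogniser's predictions are handed to Presidio's AnonymizerEngine, which by default replaces each detected span with its entity-type placeholder, so the sanitised test set no longer carries raw names, locations, or dates.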
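
The abstract also describes debiasing the original model by fine-tuning it on the 446 occupation sentences while applying Knowledge Distillation to control catastrophic forgetting. The sketch below shows one common way to combine the fine-tuning loss with a distillation term against the frozen original model; the checkpoint id, the loss weight ALPHA, the TEMPERATURE, and the training pair are assumptions, and the checkpoint is assumed to load as a standard seq2seq model.

# Minimal sketch (assumed setup, not the thesis code): fine-tune a student copy
# of the original Luganda-English model on the occupation sentences while a
# frozen teacher copy regularises it via a KL distillation term, limiting
# catastrophic forgetting of the general translation domain.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "AI-Lab-Makerere/lg_en"   # original model as named in the abstract (spelling assumed)
ALPHA, TEMPERATURE = 0.5, 2.0        # assumed hyperparameters

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
student = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
teacher = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

def distillation_step(src_texts, tgt_texts):
    # For real batches, pad label tokens should be set to -100 (e.g. via
    # DataCollatorForSeq2Seq) so they are ignored by the cross-entropy loss.
    batch = tokenizer(src_texts, text_target=tgt_texts,
                      return_tensors="pt", padding=True, truncation=True)
    out = student(**batch)                        # cross-entropy on the occupation data
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits  # frozen original model, same inputs
    # KL divergence between softened student and teacher distributions (knowledge distillation)
    kd = F.kl_div(F.log_softmax(out.logits / TEMPERATURE, dim=-1),
                  F.softmax(teacher_logits / TEMPERATURE, dim=-1),
                  reduction="batchmean") * TEMPERATURE ** 2
    loss = ALPHA * out.loss + (1 - ALPHA) * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical pair in the abstract's template "the [occupation] finished [his/her] work"
distillation_step(["<Luganda source sentence>"], ["the teacher finished her work"])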

