Publications
|
![]() |
Gabriel Kiarie, Lorna Mugambi, Jason Kabi, Ciira wa Maina. 2025 IST-Africa Conference (IST-Africa), May 2025. Paper. Class Insecta makes up about 40% of terrestrial animal biomass. Because of their biomass and remarkable adaptability to environmental stresses such as temperature changes and predators, insects can serve as indicator species of ecosystem health when closely monitored. This paper details the deployment of an autonomous insect monitoring system, and the development and implementation of an image-processing-based insect-count algorithm. The Autonomous Monitoring of Insect (AMI) system was developed by the UK Centre for Ecology and Hydrology and deployed at the Dedan Kimathi University Wildlife Conservancy in Kenya. The insect-count algorithm counted an average of 10 insects per night over two months, with peaks between 8:00 pm and midnight. Initial challenges included images with overlapping insects, which affected accuracy and highlighted the need for more refined count algorithms. Future work will expand to species classification and behavioural analysis, particularly of moths. By providing continuous and autonomous data, the AMI system offers a scalable tool for assessing biodiversity trends that inform ecosystem management in Kenya. |
![]() |
Lorna Mugambi, Ciira wa Maina, Liesl Zühlke. Journal of Imaging, March 2025. Paper. Rheumatic heart disease (RHD) poses a significant global health challenge, necessitating improved diagnostic tools. This study investigated the use of self-supervised multi-task learning for automated echocardiographic analysis, aiming to predict echocardiographic views, diagnose RHD conditions, and determine severity. We compared two prominent self-supervised learning (SSL) methods: DINOv2, a vision-transformer-based approach known for capturing implicit features, and simple contrastive learning representation (SimCLR), a ResNet-based contrastive learning method recognised for its simplicity and effectiveness. Both models were pre-trained on a large, unlabelled echocardiogram dataset and fine-tuned on a smaller, labelled subset. DINOv2 achieved accuracies of 92% for view classification, 98% for condition detection, and 99% for severity assessment. SimCLR also performed well, achieving accuracies of 99% for view classification, 92% for condition detection, and 96% for severity assessment. Embedding visualisations, using both Uniform Manifold Approximation and Projection (UMAP) and t-distributed Stochastic Neighbor Embedding (t-SNE), revealed distinct clusters for all tasks in both models, indicating that the models effectively captured the discriminative features of the echocardiograms. This study demonstrates the potential of self-supervised multi-task learning for automated echocardiogram analysis, offering a scalable and efficient approach to improving RHD diagnosis, especially in resource-limited settings. |
![]() |
Antony M. Gitau, Victor Ruto, Yuri Njathi, Lorna Mugambi, Victory A. Sitati, Austin Kaburia. Computing in Cardiology Conference, December 2024. Paper. Despite the rise of digital electrocardiogram (ECG) technology, paper-based ECGs continue to be prevalent, especially in underrepresented and underserved communities. This paper presents the DSAIL team's participation in the George B. Moody PhysioNet Challenge 2024 to develop an open-source algorithm for classifying ECG images. We fine-tuned a pre-trained InceptionV3 model on the PTB-XL dataset, comprising 21,799 12-lead ECG recordings, supplemented with synthetic ECG images from the ECG-Image-Kit. The model was trained using 80% of these images, reserving 20% for validation. We chose the InceptionV3 architecture for its ability to capture both local and global features, which is crucial given the inherent variability in ECG image patterns. The model achieved a validation macro F-measure score of 0.429 on a dataset accessible only to the organizers, securing 6th place on the official classification leaderboard. However, the algorithm struggled with mobile phone images of stained, deteriorated, and cleaned ECGs, yielding a low F-score of 0.08. In contrast, it performed significantly better on color scans of clean and deteriorated paper ECGs, achieving an F-score of 0.5. Although further improvements are necessary, neural network-based algorithms show promise for enhancing access to ECG-based diagnosis and cardiac care. |
![]() |
Lorna Mugambi, Gabriel Kiarie, Jason Kabi, Ciira wa Maina, Suvodeep Mazumdar. Workshop Report: Research Ideas and Outcomes, October 2024. Paper. The DSAIL-GeJuSTA Data Science Education Workshop was a joint initiative by the Centre for Data Science and Artificial Intelligence (DSAIL) and Gender Justice in STEM Research in Africa (GeJuSTA). GeJuSTA is a programme funded by the International Development Research Centre (IDRC) that is working towards increasing the representation of women in STEM. The workshop was held on 9 November 2023, during the 7th DeKUT International Conference on Science, Technology, Innovation and Entrepreneurship (STI&E) at Dedan Kimathi University of Technology (DeKUT). The conference ran from 8-10 November 2023. The event convened 31 participants from diverse backgrounds: data-science educators, industry participants using data science, researchers who use data science, and students in a range of courses, including engineering and pharmacy. The primary focus of the workshop was to hold a discussion with the attendees and share practices around designing data-science curricula, strategies for achieving gender equity in data-science education, addressing new technological challenges in education, and fostering multidisciplinary approaches to data-science education. This report encapsulates the collective vision of the workshop participants, whose contributions have set the stage for progressive strides in data-science education. |
![]() |
Gabriel Kiarie, Lorna Mugambi, Jason Kabi, Ciira wa Maina. 7th DeKUT International Conference on Science, Technology, Innovation and Entrepreneurship, November 2023. Paper. Girls and women have consistently been underrepresented in most Science, Technology, Engineering, and Mathematics (STEM) professions, necessitating research. There is a need to define and execute measures and policies to help reduce this gap. The Centre for Data Science and Artificial Intelligence (DSAIL), in collaboration with Gender Justice in STEM Research in Africa (GeJuSTA), is conducting studies to analyse the representation of women in STEM in Africa. The study will be used to guide the development of policies and curricula aimed at closing the gap in women's representation in STEM. The methods used in this study are: analysing the genders of staff members in STEM faculties at African universities; analysing the genders of authors of STEM papers from African universities; and conducting a literature review to evaluate existing measures put in place to encourage and enable women to join STEM professions. Preliminary results show that women are underrepresented in STEM fields in Africa. |
![]() |
Gabriel Kiarie, Jason Kabi, Lorna Mugambi, Ciira wa Maina. 2023 IEEE AFRICON, September. Paper. Machine learning is being adopted in many walks of life to solve various problems. This is being driven by the development of robust machine learning algorithms and the availability of large datasets and low-cost computation resources. Some machine learning applications require devices to be deployed off-grid for data collection and processing. Such applications require systems that can operate autonomously during their deployment. This paper presents how some open-source boards have been leveraged for off-grid data collection and machine learning. Advances in technology have led to the development of low-cost, low-power open-source boards that can be interfaced with a wide array of sensors for data collection and can perform computation. These boards are finding wide application in data collection and machine learning initiatives. A wide array of open-source boards exists in the market. The boards can generally be divided into microcontrollers, single-board computers and field-programmable gate arrays. These boards differ in processing capabilities, power consumption, and communication interfaces and features. For off-grid data collection and machine learning tasks, resources such as power and network connectivity are limited in most cases. These factors should be considered when choosing boards for off-grid deployment. The boards chosen should optimise the use of these resources while meeting the processing requirements of the tasks at hand. |
![]() |
Yuri Njathi, Lians Wanjiku, Lorna Mugambi, Jason Kabi, Gabriel Kiarie, Ciira wa Maina. 2023 IEEE AFRICON, September. Paper. Using camera traps to acquire wildlife images is becoming more common within conservancies. The information provided by these camera traps enhances understanding of wildlife behaviour and population patterns. Detecting and counting the animals present in each captured image yields valuable information that can be used to guide conservation efforts. Manual annotation of these wildlife images is a tedious, painful process. It is becoming more common to use tools that either use AI to annotate camera trap datasets or use AI to aid in annotation. These AI tools are usually trained on species endemic to a particular region. The ability to fine-tune such models to species endemic to one's own region is important, as it saves much of the time conservationists spend manually looking through misclassified images. In this paper, we present a case study where we used a YOLOv5 object detection model trained to detect the presence and count the number of impala and other animals from a dataset collected by researchers at the Dedan Kimathi University of Technology Conservancy. We analyze the AI's performance with respect to a manually annotated dataset. The model was able to annotate 72% of the dataset at a human level of accuracy. The work here shows promise for reducing the time spent labelling camera trap images by leveraging the presence of particular species to auto-annotate a majority of the dataset. |
![]() |
Lorna Mugambi, Ciira wa Maina, Liesl Zühlke. 2022 IST-Africa Conference (IST-Africa), May 2022. Paper. Rheumatic Heart Disease is a cardiovascular disease highly prevalent in developing countries partially because of inadequate healthcare infrastructure to treat Group A streptococcus pharyngitis and thereafter diagnose and document every case of Acute Rheumatic Fever, the immune-mediated antecedent of rheumatic heart disease. Secondary antibiotic treatment with penicillin injections after a diagnosis of Acute Rheumatic Fever and Rheumatic Heart Disease is used to prevent further attacks of Strep A, preferably prior to any heart valve damage. Echocardiographic screening for early detection of Rheumatic Heart Disease has been proposed as a method to improve outcomes but it is time-consuming, costly and few people are skilled enough to reach a correct diagnosis. Machine Learning is an emerging tool in analysing medical images; our aim is to automate the screening process of diagnosing rheumatic heart disease. In this paper, we present a web application to be used to label echocardiography data. These labelled data can then be used to develop machine learning models that can classify echocardiographic views of the heart and damaged valves from the echocardiograms. |
![]() |
Lorna Mugambi, Gabriel Kiarie, Jason Kabi, Ciira wa Maina. Mendeley Data, March 2022. Dataset. This dataset contains camera trap images of wildlife species from a conservancy in Kenya, together with their annotations. The camera traps are based on the Raspberry Pi 2, Raspberry Pi Zero and OpenMV Cam H7 devices, and were deployed in the conservancy from June 2021 to December 2021. The dataset has 6 categories of grazing mammals: Burchell's zebra, Defassa waterbuck, bushbuck, common warthog, impala and Sykes' monkey.