Covid-19 Prediction Using Enhanced KNN Imputation for Data Pre-Processing

Abstract
The Coronavirus Disease 2019 (COVID-19) pandemic has heavily affected health care systems all over the world, and rapid detection of COVID-19 has become a top priority for limiting its spread. Although Reverse Transcription-Polymerase Chain Reaction (RT-PCR) remains the method of choice for COVID-19 detection, blood test data are increasingly being explored for predictive modelling. In this study, we investigated the effectiveness of machine learning models for detecting COVID-19 from blood test data, with particular emphasis on a pre-processing step based on Enhanced K-Nearest Neighbours (KNN) Imputation. Enhanced KNN Imputation was used to obtain a more robust and precise imputation of missing values in blood test datasets. Support Vector Machine – Recursive Feature Elimination (SVM-RFE) was then used to select the most significant features. Finally, we trained five machine learning classifiers on both traditionally imputed data and Enhanced KNN imputed data. In the experiments with all features, the Random Forest model outperformed the other classifiers on the dataset imputed with Enhanced KNN Imputation, reaching an accuracy of 80%. When the same methodology was repeated with the GENDER feature excluded, the SVM model achieved an accuracy of 84%. These results suggest that combining Enhanced KNN Imputation with machine learning could be a valuable tool for COVID-19 detection, potentially supporting faster and more accurate diagnosis.
Keywords: COVID-19, Machine learning, KNN imputation, Pre-processing, Random forest, SVM.
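
To make the imputation step concrete, the following is a minimal sketch of standard KNN imputation only. The abstract does not describe how the Enhanced variant differs from the baseline, so this is not the authors' method. The function names `nan_euclidean` and `knn_impute`, the choice of k, and the example values are all hypothetical and chosen purely for illustration. Each missing value is filled with the mean of that feature over the k nearest rows in which it is observed, where distance is computed over the features both rows share.

```python
import math


def nan_euclidean(a, b):
    """Euclidean distance over features observed in both rows.

    The squared distance is rescaled by (total features / shared features)
    so that rows with few shared features are not unfairly favoured.
    Returns infinity when the rows share no observed feature.
    """
    shared = [(x, y) for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=2):
    """Return a copy of rows with each NaN replaced by a neighbour mean.

    Assumption: plain (unweighted) KNN imputation, not the Enhanced variant
    named in the abstract. Values that have no usable donor stay NaN.
    """
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Donors are other rows where feature j is observed.
            donors = []
            for m, other in enumerate(rows):
                if m == i or math.isnan(other[j]):
                    continue
                distance = nan_euclidean(row, other)
                if distance != math.inf:
                    donors.append((distance, other[j]))
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Hypothetical values, not taken from the study's dataset.
example = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, math.nan],  # third feature is missing
    [1.1, 2.1, 3.2],
    [9.0, 9.0, 9.0],       # distant row, should not act as a donor
]
filled = knn_impute(example, k=2)
print(filled[1])
```

With k=2, the missing value in the second row is taken from the first and third rows, which are the closest on the two shared features, while the distant fourth row is ignored. In practice the features would normally be scaled before computing distances, since blood test parameters are measured in different units.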

Author(s): Hari Priya N*, Rajeswari S
Volume: 5 Issue: 1 Pages: 714-728
DOI: https://doi.org/10.47857/irjms.2024.v05i01.0345