HANDLING MISSING VALUES IN NUMERIC DATASET USING MACHINE LEARNING TECHNIQUES: A REVIEW
Keywords:
Data mining, Data classification, K-Nearest Neighbor, Decision Tree, Support Vector Machine, Naïve Bayes
Abstract
In data mining, pre-processing is essential to ensure the quality of the final product. Pre-processing tasks include data preparation, cleaning, integration, transformation, reduction, and discretization. Missing values are a common problem that arises during data cleaning in many research fields. Filling in missing values, eliminating noise, and removing inconsistencies are therefore important steps in data preparation. This paper reviews several classification methods, including their benefits and shortcomings. These methods are used in a variety of industries, including internet marketing, healthcare, social networking, finance, and insurance. The paper compares the accuracy of data imputation for machine learning classifiers, namely Bayesian Networks, Decision Trees, K-Nearest Neighbors (KNN), and Support Vector Machines. Based on the findings, the Bayesian classifier appears to provide the most promising results of the classifiers compared.