Handling Missing Values in Numeric Dataset Using Machine Learning Techniques: A Review

Authors

  • Kamaljeet Kaur
  • Er. Amrit Kaur
  • Dr. Navjot Kaur

Keywords

Data mining, Data classification, K-Nearest Neighbor, Decision Tree, Support Vector Machine, Naïve Bayes.

Abstract

Data pre-processing is an essential part of data mining because it ensures the quality of the final product. Pre-processing tasks include data preparation, cleaning, integration, transformation, reduction, and discretization. Missing values are a common problem that arises during data cleaning in many research fields. Filling in missing values, eliminating noise, and removing inconsistencies are therefore important steps in preparing the data. This paper reviews several classification methods, including their benefits and shortcomings. Classification is used in a variety of industries, including internet marketing, healthcare, social networking, finance, and insurance. The paper compares the accuracy of data imputation for machine learning classifiers such as Bayesian Networks, Decision Trees, K-Nearest Neighbors (KNN), and Support Vector Machines. Based on the findings, the Bayesian approach appears to provide the most promising results when compared with the other classifiers.
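
As an illustration of one technique named in the abstract, the following is a minimal sketch of KNN-based imputation for a numeric dataset. It is not the procedure evaluated in the paper: the function name knn_impute, the choice of k, the root-mean-square distance over jointly observed columns, and the toy data are all assumptions made for this example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column in the k nearest rows.

    Hypothetical helper for illustration only. Distance is the root mean
    squared difference over the columns observed in both rows.
    """
    imputed = [list(r) for r in rows]  # copy so the input stays unchanged
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor row must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [
                    (a, b)
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue  # no common columns, distance is undefined
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda c: c[0])
                nearest = candidates[:k]
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy numeric dataset (assumed values); None marks a missing entry.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [1.2, 1.9, 3.2],
]
filled = knn_impute(data, k=2)
print(filled[1])
```

In this toy data the second row is closest to the first and fourth rows, so its missing third value is filled with the mean of their third values. A production pipeline would more likely rely on an established library implementation and would scale the features before computing distances.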

Published

2024-02-28

How to Cite

Kamaljeet Kaur, Er. Amrit Kaur, & Dr. Navjot Kaur. (2024). Handling Missing Values in Numeric Dataset Using Machine Learning Techniques: A Review. Journal Punjab Academy of Sciences, 23, 217–227. Retrieved from http://jpas.in/index.php/home/article/view/70