DOI: 10.5176/978-981-08-6308-1_62
Authors: Marvin L. Brown, Chien-Hua Mike Lin
Abstract:
The purpose of this paper is to investigate the impact of missing data and data imputation on the data mining phase of knowledge discovery when neural networks employing a sigmoid (S-shaped) transfer function are used. While studies have been conducted independently in the areas of knowledge discovery, missing data, and data imputation, only a few have integrated all three dimensions. This research explores the impact of increasing levels of data missingness in KDD (knowledge discovery in databases) models that use neural networks as the data mining algorithm and that contain varying numbers of cases. Four of the most commonly used data imputation methods (Case Deletion, Mean Substitution, Regression Imputation, and Multiple Imputation) are evaluated for their effectiveness in dealing with data missingness.
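The abstract names four imputation methods but does not describe how they were implemented. The following is a minimal sketch, not the authors' procedure, of what each method does. It assumes a hypothetical toy dataset with one fully observed predictor `x` and one target `y` that is made missing completely at random (MCAR) at increasing rates. All function names are hypothetical. "Multiple imputation" here is simplified to stochastic regression imputation repeated `m` times and pooled by averaging (Rubin's rule point estimate). A full MI procedure would also draw the regression coefficients from their posterior. In the study's setting, each completed dataset would then be passed to the neural network rather than summarized by a mean.

```python
import random
import statistics
from typing import List, Optional, Tuple

Row = Tuple[float, Optional[float]]  # (x, y); y is None when missing


def ampute_mcar(rows: List[Tuple[float, float]], rate: float, seed: int = 0) -> List[Row]:
    """Hypothetical helper: set y to None completely at random with probability `rate`."""
    rng = random.Random(seed)
    return [(x, None if rng.random() < rate else y) for x, y in rows]


def case_deletion(rows: List[Row]) -> List[Tuple[float, float]]:
    """Listwise deletion: drop every case that has a missing value."""
    return [(x, y) for x, y in rows if y is not None]


def mean_substitution(rows: List[Row]) -> List[Tuple[float, float]]:
    """Replace each missing y with the mean of the observed y values."""
    mean_y = statistics.fmean(y for _, y in rows if y is not None)
    return [(x, mean_y if y is None else y) for x, y in rows]


def _fit_ols(pairs: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Ordinary least squares fit of y = a + b*x; also returns the residual SD."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    resid_sd = statistics.stdev([y - (a + b * x) for x, y in pairs])
    return a, b, resid_sd


def regression_imputation(rows: List[Row]) -> List[Tuple[float, float]]:
    """Replace each missing y with its OLS prediction from x (fit on complete cases)."""
    a, b, _ = _fit_ols(case_deletion(rows))
    return [(x, a + b * x if y is None else y) for x, y in rows]


def multiple_imputation(rows: List[Row], m: int = 5, seed: int = 0) -> List[List[Tuple[float, float]]]:
    """Simplified MI: m completions, each adding Gaussian residual noise to the
    regression prediction. Coefficient uncertainty is NOT propagated here."""
    rng = random.Random(seed)
    a, b, sd = _fit_ols(case_deletion(rows))
    return [
        [(x, a + b * x + rng.gauss(0.0, sd) if y is None else y) for x, y in rows]
        for _ in range(m)
    ]


def pooled_mean(datasets: List[List[Tuple[float, float]]]) -> float:
    """Rubin's rule point estimate: average the per-dataset estimates of mean(y)."""
    return statistics.fmean(statistics.fmean(y for _, y in d) for d in datasets)


if __name__ == "__main__":
    rng = random.Random(42)
    full = [(float(i), 2.0 * i + 1.0 + rng.gauss(0.0, 1.0)) for i in range(200)]
    true_mean = statistics.fmean(y for _, y in full)
    for rate in (0.1, 0.3, 0.5):  # increasing levels of missingness
        data = ampute_mcar(full, rate, seed=1)
        print(
            f"rate={rate:.1f}",
            f"deletion n={len(case_deletion(data))}",
            f"mean-sub={statistics.fmean(y for _, y in mean_substitution(data)):.2f}",
            f"regression={statistics.fmean(y for _, y in regression_imputation(data)):.2f}",
            f"MI={pooled_mean(multiple_imputation(data)):.2f}",
            f"true={true_mean:.2f}",
        )
```

Case deletion shrinks the sample as the missingness rate grows. Mean substitution keeps every case but pulls imputed values toward the center and ignores `x`. Regression imputation uses `x` but places imputed points exactly on the fitted line. The noise added in the multiple-imputation variant is meant to restore some of the variability that single regression imputation removes.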
