Authors: Chuen-Min Huang, Yi-Hua Li
Although the effectiveness of the Hierarchical Agglomerative Clustering (HAC) method is well recognized, its limitations in processing large data sets cost it its advantage when efficiency is a serious concern. In this study, we propose a Revised Hierarchical Agglomerative Clustering (RHAC) method that adopts a K-way notion to reduce tree height and time complexity. Latent Semantic Analysis (LSA) is used to improve clustering precision. Three major experiments are conducted: dimension reduction, average-link distance, and precision comparison. Our study shows that RHAC achieves a precision higher than 0.99 and an entropy lower than 0.003. Applying LSA also has a positive effect on cluster precision. We further find that data set size does not affect the run-time efficiency of RHAC. Overall, RHAC performs better than HAC.
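As a rough illustration of the two baseline ingredients named above, here is a hypothetical sketch. It is not the authors' RHAC. It projects a term-document matrix into a k-dimensional latent space with a truncated SVD, which is the usual way LSA is computed, and then runs plain average-link HAC on the resulting document vectors. The function names `lsa_embed` and `average_link_hac`, the Euclidean distance, and the toy matrix are all assumptions. The K-way scheme that distinguishes RHAC is not reproduced here.

```python
import numpy as np


def lsa_embed(term_doc, k):
    """Hypothetical LSA step: truncated SVD of a (n_terms, n_docs) matrix.

    Returns an (n_docs, k) array with one latent-space vector per document.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values and their right singular vectors.
    return (np.diag(s[:k]) @ vt[:k]).T


def average_link_hac(points, n_clusters):
    """Naive average-link agglomerative clustering (plain binary HAC, not RHAC)."""
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all documents.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average link: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a (b > a, so index a stays valid).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy example (made-up data): documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
term_doc = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
], dtype=float)
docs = lsa_embed(term_doc, k=2)
groups = average_link_hac(docs, n_clusters=2)
print(sorted(sorted(g) for g in groups))
```

Because every merge rescans all pairs of clusters, this naive HAC grows at least cubically with the number of documents, which is the kind of cost the abstract says RHAC is designed to reduce.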