Authors: Xiaoping Fan, Chao Wu and Yike Guo
The maximal information coefficient (MIC) is a novel measure of the strength of bivariate associations. Because the original algorithm for computing MIC has high computational complexity, the authors of MIC proposed a much more efficient approximation algorithm. However, we found that the approximation algorithm underestimates MIC values for certain data sets. To resolve this problem, we developed two hybrid algorithms that better balance efficiency and accuracy. We also implemented the original algorithm on the Hadoop and Spark frameworks to improve its efficiency. We tested and analyzed both solutions thoroughly. This work provides new options for MIC calculation that can be chosen according to specific accuracy and efficiency requirements.
Keywords: Maximal information coefficient (MIC); Approximation algorithm; Original algorithm; Hybrid algorithm; Hadoop; Spark