Authors: Md. Manzoor Murshed and Dr. Suely Oliveira
Database privacy, or computer disclosure control, is the publishing of anonymized data about individuals in such a way that sensitive information about them cannot be revealed. Although data utility for the research community is the main reason for publishing data, the consequences of privacy violations make data privacy an important and urgent research topic. The major challenge is to store sensitive personal information in public databases in a manner that balances society's needs against a guarantee of privacy for the individuals whose data are in the database. Sweeney proposed the k-anonymity principle, which hides every individual in a group of at least k records that are indistinguishable with respect to their non-sensitive attributes, so that linking the records to other databases to identify someone becomes difficult. k-anonymization must not only ensure anonymity but should also minimize the information loss caused by the generalization and suppression used to achieve it. Clustering similar records and anonymizing them together helps minimize this information loss. We proposed MOKA, an efficient clustering algorithm for k-anonymization that tries to minimize information loss while preserving data quality for data mining and other related research. The main idea behind the algorithm is to place similar and logically related records in the same cluster as far as possible, which naturally leads to less information loss during generalization. In this research we propose a genetic algorithm to improve the quality of the clusters generated by our MOKA algorithm.
Keywords: k-anonymity; Clustering; Genetic algorithm; Information Hiding; Computer disclosure control; Data Security.
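As a small illustration of the k-anonymity principle summarized in the abstract, the following Python sketch checks whether a table is k-anonymous with respect to a set of non-sensitive (quasi-identifier) attributes, and shows how generalizing one attribute can make it so. This is not the MOKA algorithm or the proposed genetic algorithm; the function names `is_k_anonymous` and `generalize_age`, the fixed-width age ranges, and the sample records are all assumptions made only for this example.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that appears in the table is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(records, width):
    """Replace each exact age with a range of the given width.
    Hypothetical generalization scheme, chosen only for illustration."""
    generalized = []
    for r in records:
        low = (r["age"] // width) * width
        generalized.append({**r, "age": f"{low}-{low + width - 1}"})
    return generalized


# Made-up sample data: "age" and "zip" are quasi-identifiers,
# "disease" is the sensitive attribute.
records = [
    {"age": 23, "zip": "52240", "disease": "flu"},
    {"age": 27, "zip": "52240", "disease": "asthma"},
    {"age": 31, "zip": "52240", "disease": "flu"},
    {"age": 38, "zip": "52240", "disease": "diabetes"},
]

quasi = ["age", "zip"]
raw_ok = is_k_anonymous(records, quasi, 2)            # every exact age is unique
generalized = generalize_age(records, 10)             # ages become 10-year ranges
generalized_ok = is_k_anonymous(generalized, quasi, 2)
print(raw_ok, generalized_ok)
```

In this toy table, generalizing ages into ten-year ranges produces two groups of two records each, so the generalized table satisfies 2-anonymity while the raw table does not. Wider ranges would yield larger groups at the cost of more information loss, which is the trade-off that clustering similar records together is meant to reduce.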