Authors: Mark Kevin Salloway, Xiaodong Deng, Shih Ling Kao and Chuen Seng Tan
With growing interest in the secondary use of medical data for clinical monitoring and public health research, longitudinal datasets, in which the same unique identifier is repeated across multiple rows, are becoming increasingly common. Unfortunately, such datasets do not fit well with a simplistic approach to data masking; they require more advanced procedures to ensure that each repeated unmasked identifier is replaced appropriately with its masked counterpart. In this paper, we describe a plug-and-play tool for masking such datasets, which lowers the barrier to exchanging datasets with proper safeguards in place. The platform developed for masking can be extended with data analytic capabilities and features in the future.
Keywords: privacy protection; masking; data sharing; public health