Authors: Revathy Padmanaban and Rajeswari Mukesh
Abstract: Hadoop has become one of the key players in offering data analytics and data processing support for any organization that handles different shades of data management. Given Hadoop's current security offerings, companies are concerned about building a single large cluster and onboarding multiple projects onto the same shared Hadoop cluster. Security vulnerabilities and privacy invasion by malicious attackers or internal users are the main points of concern in any Hadoop implementation. In particular, various types of security vulnerabilities arise from the way data is placed in a Hadoop cluster. When sensitive information is accessed by an unauthorized user or misused by an authorized one, privacy can be compromised. In this paper, we address the placement of data across distributed DataNodes in a secure way by considering the sensitivity and security of the underlying data. Our data placement strategy aims to adaptively distribute the data across the cluster using advanced machine learning techniques to realize more secure data and infrastructure. The data placement strategy discussed in this paper is highly extensible and scalable to suit different kinds of sensitivity and security requirements.
Keywords: Big Data; Hadoop; Security Measures; Data Block Placement; Sensitive Data Placement; Multi-tenancy in Hadoop