A Survey of Data Anonymization Techniques for Privacy-Preserving Mining in Bigdata
- SASTRA Deemed University, India
Abstract
The big data era is seeing data grow along multiple dimensions, commonly summarized as the 4Vs (Volume, Velocity, Variety, Veracity). When inferring information from data, care must be taken not to reveal the identity of the data owner, since doing so breaches the owner's privacy rights. Information can leak at the point of data collection, in the data storage area, during distribution of the data to data users/miners, and finally through published results. Cross-matching these stages with the 4Vs of big data (a list that is still growing) poses a huge challenge: how to extract the maximum possible information without compromising the privacy of the data owner. The original data should be anonymized at one or more of the above-mentioned stages before they are handed over for mining. This work surveys the various anonymization techniques used to transform the data so that the privacy of the data owner is not compromised. In addition, the sample data drawn should resemble and represent the original dataset in as many dimensions as possible. The results of the various methodologies have been analyzed and the observations are presented.
DOI: https://doi.org/10.3844/jcssp.2020.194.201
Copyright: © 2020 Helen Wilfred Raj and Santhi Balachandran. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Privacy-Preserving
- Anonymization
- Perturbation
- Generalization
- Dimensionality Reduction