Review Article Open Access

A Survey of Data Anonymization Techniques for Privacy-Preserving Mining in Bigdata

Helen Wilfred Raj1 and Santhi Balachandran1
  • 1 SASTRA Deemed University, India

Abstract

The big data era is seeing data grow along several dimensions, commonly expressed as the 4Vs (Volume, Velocity, Variety, Veracity). When inferring information from data, care should be taken not to reveal the identity of the data owner, since doing so breaches the owner's privacy rights. Information can leak at the data collection point, in the data storage area, during the distribution of data to data users/miners, and finally through published results. Cross-matching these points with the 4Vs of big data (a list that is still growing) poses a major challenge: how to extract the maximum possible information without compromising the privacy of the data owner. The original data should be anonymized at one or more of the above-mentioned stages before the data are handed over for mining. This work surveys the anonymization techniques used to transform data so that the privacy of the data owner is not compromised. In addition, the sample data drawn should resemble and represent the original dataset in as many dimensions as possible. The results of the various methodologies are analyzed and the observations are presented.

Journal of Computer Science
Volume 16 No. 2, 2020, 194-201

DOI: https://doi.org/10.3844/jcssp.2020.194.201

Submitted On: 15 July 2019
Published On: 31 December 2019

How to Cite: Raj, H. W. & Balachandran, S. (2020). A Survey of Data Anonymization Techniques for Privacy-Preserving Mining in Bigdata. Journal of Computer Science, 16(2), 194-201. https://doi.org/10.3844/jcssp.2020.194.201

Keywords

  • Privacy-Preserving
  • Anonymization
  • Perturbation
  • Generalization
  • Dimensionality Reduction