A Cluster Feature-Based Incremental Clustering Approach to Mixed Data
Abstract
Problem statement: The main objective of this study is to develop an incremental clustering algorithm that can handle numerical as well as categorical attributes in a given dataset. The authors have previously reported a cluster feature-based algorithm, CFICA, that can handle only numerical data. Approach: Since many real-life data mining applications work with datasets that contain both numeric and categorical attributes, the earlier algorithm needs to be modified to handle such mixed datasets. The core idea is to propose a new distance measure based on automatically generated weightage and apply it to incremental clustering algorithms. The incremental data points are handled in two phases. In the first phase, the k-means clustering algorithm is employed for initial clustering of the static database. In the second phase, the designed distance measure is used to find the appropriate cluster for each incremental data point. The combination of the two has proved to be more effective in handling mixed datasets. Clustering accuracy, clustering error and computational time of the proposed approach have been evaluated with different k values and thresholds. Varying the threshold values yielded better accuracy for different datasets. Results: The clustering error in this approach was reduced considerably with different k values and thresholds. Conclusion: The results confirm the efficiency of the proposed approach in handling real mixed datasets composed of numerical and categorical attributes only.
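The second phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' CFICA extension: the abstract does not give the distance formula, so the sketch assumes a weighted sum of Euclidean distance on numeric attributes and a mismatch ratio on categorical attributes, with fixed weights `w_num` and `w_cat` standing in for the automatically generated weightage. Opening a new cluster when the threshold is exceeded is likewise an assumption, and all function names are hypothetical. The first phase (k-means on the static database) is omitted; the initial clusters are taken as given.

```python
from collections import Counter


def mixed_distance(point, centroid, num_idx, cat_idx, w_num=0.5, w_cat=0.5):
    """Assumed mixed distance: weighted Euclidean part plus categorical mismatch ratio."""
    num = sum((point[i] - centroid[i]) ** 2 for i in num_idx) ** 0.5
    cat = sum(point[i] != centroid[i] for i in cat_idx) / max(len(cat_idx), 1)
    return w_num * num + w_cat * cat


def summarize(members, num_idx, cat_idx):
    """Cluster representative: mean of numeric attributes, mode of categorical ones."""
    centroid = list(members[0])
    for i in num_idx:
        centroid[i] = sum(m[i] for m in members) / len(members)
    for i in cat_idx:
        centroid[i] = Counter(m[i] for m in members).most_common(1)[0][0]
    return centroid


def assign_incremental(point, clusters, num_idx, cat_idx, threshold):
    """Place one incoming point; returns the index of the cluster it ends up in."""
    best, best_d = None, float("inf")
    for k, cluster in enumerate(clusters):
        d = mixed_distance(point, cluster["centroid"], num_idx, cat_idx)
        if d < best_d:
            best, best_d = k, d
    # Assumption: a point farther than the threshold from every cluster starts a new one.
    if best is None or best_d > threshold:
        clusters.append({"members": [point], "centroid": list(point)})
        return len(clusters) - 1
    clusters[best]["members"].append(point)
    clusters[best]["centroid"] = summarize(clusters[best]["members"], num_idx, cat_idx)
    return best
```

Each cluster here is a dictionary with a `members` list and a `centroid`; `num_idx` and `cat_idx` list the positions of the numeric and categorical attributes in a record. In practice the numeric attributes would usually be normalized first so that the two parts of the distance are on comparable scales.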
DOI: https://doi.org/10.3844/jcssp.2011.1875.1880
Copyright: © 2011 A. M. Sowjanya and M. Shashi. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Data mining
- cluster feature
- centroid
- farthest neighbor points
- mixed attributes
- numerical attributes
- categorical attributes
- incremental clustering
- k-means