High Dimensional Data Clustering with Hub Based DEC

Ghatage Trupti B.
Takmare Sachin B.

Abstract

Clustering is an important topic in fields such as machine learning and data mining. Many real applications involve very high-dimensional data, in which many dimensions are unhelpful or may even degrade the performance of subsequent clustering algorithms. One way to deal with this problem is to apply dimensionality reduction first and clustering afterwards. However, clustering performance improves when the requirements of clustering are taken into account during dimensionality reduction, and vice versa. Discriminative Embedded Clustering (DEC) is an algorithm that combines clustering and subspace learning in this way. Hubness is the tendency of high-dimensional data to contain hubs, that is, points that occur frequently in the nearest-neighbor lists of other points. Hubs are situated near cluster centers; major hubs can therefore be used as cluster prototypes or to guide the search for centroid-based cluster configurations. Using hubness for clustering leads to improvements over centroid-based approaches. In this paper we propose a system for clustering high-dimensional data using the discriminative embedding method with hub-based clustering.
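To make the notion of hubness concrete, the following minimal sketch counts how often each point appears among the k nearest neighbors of other points (the k-occurrence count, often written N_k) and picks the highest-scoring points as candidate cluster prototypes. It is only an illustration under stated assumptions, not the system proposed in the paper: the function names `k_occurrences` and `top_hubs`, the Euclidean distance, the synthetic Gaussian data, and the values of `k` and `n_hubs` are all hypothetical choices made for the example.

```python
import numpy as np


def k_occurrences(X, k):
    """Return, for each point, how many k-NN lists of other points contain it."""
    # Pairwise squared Euclidean distances (assumed metric for this sketch).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is not counted as its own neighbor
    knn = np.argsort(d2, axis=1)[:, :k]  # indices of each point's k nearest neighbors
    return np.bincount(knn.ravel(), minlength=X.shape[0])


def top_hubs(X, k, n_hubs):
    """Return indices of the n_hubs points with the largest k-occurrence counts."""
    counts = k_occurrences(X, k)
    return np.argsort(-counts, kind="stable")[:n_hubs]


# Synthetic high-dimensional data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

counts = k_occurrences(X, k=10)
hubs = top_hubs(X, k=10, n_hubs=3)
print("largest k-occurrence count:", counts.max())
print("candidate prototype indices:", hubs)
```

In a hub-based scheme, the points returned by `top_hubs` could serve as initial prototypes in place of randomly chosen centroids; how the prototypes are then combined with the learned subspace is specific to the proposed system and is not reproduced here.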

Article Details

How to Cite
[1]
Ghatage Trupti B. and Takmare Sachin B., “High Dimensional Data Clustering with Hub Based DEC”, Int. J. Comput. Eng. Res. Trends, vol. 3, no. 2, pp. 62–66, Feb. 2016.
Section
Research Articles
