High Dimensional Data Clustering with Hub Based DEC
Abstract
Clustering is an important topic in fields such as machine learning and data mining. Many real applications involve very high-dimensional data, and many of the dimensions are unhelpful or may even degrade the performance of subsequent clustering algorithms. One common way to deal with this problem is to apply dimensionality reduction first and then cluster in the reduced space. However, if the requirements of clustering are taken into account during dimensionality reduction, and vice versa, clustering performance improves. Discriminative Embedded Clustering (DEC) is an algorithm that combines clustering and subspace learning. Hubness is the tendency of high-dimensional data to contain hubs: points that appear in the k-nearest-neighbor lists of many other points. Because hubs tend to lie near cluster centers, major hubs can be used as cluster prototypes or to guide centroid-based clustering, and using hubness in this way leads to improvements over purely centroid-based approaches. In this paper we propose a system for clustering high-dimensional data that combines the discriminative embedding of DEC with hub-based clustering.
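To make the combination concrete, here is a minimal NumPy sketch under stated assumptions; it is not the published DEC algorithm, which has its own joint objective. As a stand-in for the subspace-learning step, it alternates a Linear Discriminant Analysis (LDA) projection computed from the current labels with a K-hubs-style update, in which each cluster's prototype is its member with the highest k-occurrence N_k(x), the number of k-nearest-neighbor lists that contain x. The function names (`k_occurrence`, `lda_projection`, `farthest_first`, `assign`, `hub_dec`), the farthest-first seeding, and the parameters `k`, `dim` and `reg` are all hypothetical choices made for illustration.

```python
import numpy as np


def k_occurrence(X, k):
    """N_k(x): how many times each row of X appears in the k-NN lists of the other rows."""
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared Euclidean distances
    np.fill_diagonal(d, np.inf)                     # a point is not its own neighbor
    knn = np.argsort(d, axis=1)[:, :k]              # indices of each row's k nearest neighbors
    return np.bincount(knn.ravel(), minlength=X.shape[0])


def lda_projection(X, labels, dim, reg=1e-3):
    """Discriminative subspace from current labels: solve S_b w = lam * S_w w (LDA stand-in)."""
    n, d = X.shape
    mu = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for j in np.unique(labels):
        Xj = X[labels == j]
        mj = Xj.mean(axis=0)
        Sw += (Xj - mj).T @ (Xj - mj)               # within-cluster scatter
        diff = (mj - mu)[:, None]
        Sb += Xj.shape[0] * diff @ diff.T           # between-cluster scatter
    # Hypothetical ridge term keeps S_w positive definite when n is small relative to d.
    Sw += reg * (np.trace(Sw) / d + 1e-12) * np.eye(d)
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))    # S_w = L L^T
    vals, vecs = np.linalg.eigh(Linv @ Sb @ Linv.T)  # symmetric form of the generalized problem
    top = vecs[:, np.argsort(vals)[::-1][:dim]]     # leading eigenvectors
    return Linv.T @ top                             # map back: w = L^{-T} v, shape (d, dim)


def farthest_first(X, c):
    """Deterministic seeding: start at row 0, then repeatedly add the point farthest from the chosen ones."""
    idx = [0]
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, c):
        nxt = int(np.argmax(dist))
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(idx)


def assign(Z, center_idx):
    """Label each row of Z with the index of its nearest prototype row."""
    d = np.linalg.norm(Z[:, None, :] - Z[center_idx][None, :, :], axis=2)
    return np.argmin(d, axis=1)


def hub_dec(X, c, k=5, dim=None, n_iter=20):
    """Alternate a discriminative projection with hub-based (K-hubs-style) prototypes."""
    dim = c - 1 if dim is None else dim             # S_b has rank at most c - 1
    labels = assign(X, farthest_first(X, c))
    W = None
    for _ in range(n_iter):
        W = lda_projection(X, labels, dim)          # subspace step, driven by the clustering
        Z = (X - X.mean(axis=0)) @ W
        hub = k_occurrence(Z, k)                    # hubness measured in the learned subspace
        centers = []
        for j in range(c):
            members = np.flatnonzero(labels == j)
            if members.size == 0:                   # empty cluster: use the largest unused hub
                members = np.setdiff1d(np.arange(len(X)), centers)
            centers.append(int(members[np.argmax(hub[members])]))  # major hub = prototype
        new_labels = assign(Z, np.array(centers))   # clustering step, driven by the subspace
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, W


# Usage on synthetic data: three well-separated Gaussian blobs in 50 dimensions.
rng = np.random.default_rng(0)
means = 30.0 * np.eye(50)[:3]
X = np.vstack([m + rng.standard_normal((30, 50)) for m in means])
truth = np.repeat(np.arange(3), 30)
labels, W = hub_dec(X, c=3, k=5)
```

The design choice being illustrated is the coupling the abstract describes: the projection is recomputed from the current clusters, and the prototypes are chosen by hubness inside that projection rather than as centroids. Prototypes are always actual data points, which is what distinguishes a K-hubs-style update from k-means.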

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.