Clustering high-dimensional data derived from Feature Selection Algorithm
Mohammad Raziuddin, T. Venkata Ramana
Affiliations: Professor & HOD, CSE, SLC's IET
Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to
many thousands of dimensions. Such high-dimensional data spaces are often encountered in areas such as
medicine, where DNA microarray technology can produce a large number of measurements at once, and in the
clustering of text documents, where, if a word-frequency vector is used, the number of dimensions equals the size
of the vocabulary. Feature selection is the process of identifying a subset of the most useful features that produces
results compatible with those of the original, entire set of features. A feature selection algorithm may be evaluated
from both the efficiency and the effectiveness points of view. Efficiency concerns the time required to find a subset
of features, whereas effectiveness relates to the quality of that subset. Based on these criteria, a fast
clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated. Because features in
different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of
producing a subset of useful and independent features.
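The clustering-based strategy described in the abstract can be illustrated with a minimal sketch. It is not the FAST algorithm itself: the abstract does not specify the dependence measure or the clustering procedure, so the sketch assumes absolute Pearson correlation as the dependence measure and a greedy grouping of features around their most relevant member. The function names and the `threshold` and `min_relevance` parameters are hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(
    columns: Sequence[Sequence[float]],
    target: Sequence[float],
    threshold: float = 0.8,      # assumed cut-off for "same cluster"
    min_relevance: float = 0.1,  # assumed cut-off for "irrelevant feature"
) -> List[int]:
    """Return indices of one representative feature per cluster.

    Assumption: |Pearson correlation| stands in for whatever dependence
    measure the paper uses; the grouping is a simple greedy pass.
    """
    relevance = [abs(pearson(col, target)) for col in columns]

    # Drop features that are barely related to the target, then visit the
    # rest from most to least relevant.
    order = sorted(
        (j for j in range(len(columns)) if relevance[j] >= min_relevance),
        key=lambda j: relevance[j],
        reverse=True,
    )

    clusters: List[List[int]] = []  # first index in each cluster is its representative
    for j in order:
        for cluster in clusters:
            representative = cluster[0]
            if abs(pearson(columns[j], columns[representative])) >= threshold:
                cluster.append(j)  # redundant with an already chosen feature
                break
        else:
            clusters.append([j])  # starts a new, relatively independent cluster

    return sorted(cluster[0] for cluster in clusters)


# Toy data: feature 1 nearly duplicates feature 0, feature 3 is constant.
target = [1, 2, 3, 4, 5, 6]
columns = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 13],
    [1, -1, 1, -1, 1, -1],
    [3, 3, 3, 3, 3, 3],
]
selected = select_features(columns, target)
print(selected)  # [0, 2]
```

Because features are visited in decreasing order of relevance, the first member of each cluster is also its most relevant one, so keeping only that member yields a subset whose features are both useful and weakly dependent on one another.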
Mohammad Raziuddin, T. Venkata Ramana. "Clustering high-dimensional data derived from Feature Selection Algorithm". International Journal of Computer Engineering In Research Trends (IJCERT), ISSN: 2349-7084, Vol. 2, Issue 09, pp. 525-530, September 2015. URL: https://ijcert.org/ems/ijcert_papers/V2I901.pdf