PROGRESSIVE DUPLICATE DETECTION


Mr. BETKAR AKSHAY SURESH
Mrs. N. SUJATHA

Abstract

Duplicate detection is one of the difficult problems faced in applications such as personal details management, customer relationship management, and data mining. This survey covers various duplicate record detection techniques for both small and large datasets. To detect duplicates in less execution time and without degrading the quality of the dataset, methods such as Progressive Blocking and the Progressive Sorted Neighborhood Method (PSNM) are used. PSNM is used in this model to detect duplicates in a parallel approach. The Progressive Blocking algorithm works on large datasets, where finding duplicates requires a great deal of time. These algorithms are used to improve the duplicate detection system. Using them, efficiency can be doubled compared with conventional duplicate detection. Several data analysis methods are considered here, with different approaches to duplicate detection.
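The progressive sorted neighborhood idea named in the abstract can be illustrated with a minimal sketch. It assumes records are plain strings, sorts them by a key, and compares records at sort-order distance 1 first, then 2, and so on, so that the most likely duplicates are reported early. The function names, the difflib-based matcher, the 0.8 threshold, the window size, and the sample records are all illustrative assumptions and not the authors' implementation.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Hypothetical matcher: case-insensitive character similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def progressive_sorted_neighborhood(records, key=str.lower, max_window=4, match=similar):
    """Yield index pairs of likely duplicates, closest sort-order neighbors first."""
    # Sort record indices by the sorting key (assumed here: the lowercased string).
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    # Distance 1 compares direct neighbors; larger distances come later,
    # so stopping early still returns the most promising pairs.
    for distance in range(1, max_window):
        for pos in range(len(order) - distance):
            i, j = order[pos], order[pos + distance]
            if match(records[i], records[j]):
                yield tuple(sorted((i, j)))


# Illustrative sample data (not from the article).
records = ["John Smith", "Maria Garcia", "Jon Smith", "Marie Garcia", "Peter Wong"]
pairs = list(progressive_sorted_neighborhood(records))
print(pairs)
```

Because the function is a generator, a caller with a limited time budget can stop consuming pairs at any point and keep whatever has been found so far, which is the sense in which the detection is progressive.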

Article Details

How to Cite
[1]
Mr. BETKAR AKSHAY SURESH and Mrs. N. SUJATHA, “PROGRESSIVE DUPLICATE DETECTION”, Int. J. Comput. Eng. Res. Trends, vol. 3, no. 6, pp. 284–288, Jun. 2016.
Section
Research Articles
