PROGRESSIVE DUPLICATE DETECTION
Abstract
Duplicate detection is one of the difficult problems faced in applications such as personal details management, customer relationship management, and data mining. This survey covers duplicate record detection techniques for both small and large datasets. To detect duplicates in less execution time and without degrading dataset quality, techniques such as Progressive Blocking and the Progressive Sorted Neighborhood Method (PSNM) are used. PSNM is used in this model to identify duplicates in a parallel approach. The Progressive Blocking algorithm targets large datasets, where finding duplicates otherwise takes a very long time. Both algorithms are used to improve duplicate detection systems, and with them the efficiency can be doubled compared with conventional duplicate detection. Several data analysis methods are considered here, along with various approaches to duplicate detection.
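As an illustration of the progressive idea behind PSNM, the following minimal Python sketch sorts the records by a key and then compares sort-neighbors at increasing rank distance, so that the most promising pairs are reported first. The function names, the `max_distance` and `threshold` parameters, and the string similarity measure from `difflib` are assumptions made for this example; they are not taken from the surveyed algorithms, and the sketch omits parallel execution and the other refinements a real implementation would need.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (illustrative choice of measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def progressive_sorted_neighborhood(records, key=str.lower,
                                    max_distance=2, threshold=0.85):
    """Yield index pairs (i, j), i < j, of likely duplicates.

    Records are sorted by `key`; pairs at rank distance 1 are compared
    first, then distance 2, and so on up to `max_distance` (hypothetical
    parameter names). Stopping early still returns the closest pairs.
    """
    # Positions of the records in sort order, without moving the records.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    for distance in range(1, max_distance + 1):
        for pos in range(len(order) - distance):
            i, j = order[pos], order[pos + distance]
            if similarity(records[i], records[j]) >= threshold:
                yield (min(i, j), max(i, j))


records = ["John Smith", "Jon Smith", "Alice Brown",
           "Carol White", "Bob Stone", "John Smith"]
duplicates = list(progressive_sorted_neighborhood(records))
print(duplicates)
```

Because the function is a generator, a caller with a limited time budget can stop consuming results at any point and keep the pairs found so far, which is the property that distinguishes a progressive approach from a conventional fixed-window pass.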

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.