Context Based XML Data and Diversification for Keyword Search Queries
Abstract
In a keyword search, the user enters candidate keywords, a search algorithm executes the corresponding query against the target dataset, and the result is returned as the output of that algorithm. This process assumes that the user enters meaningful keywords. When the keywords are confusing, ambiguous, or too short and vague, the search returns irrelevant results. Search algorithms also rely on exact matching, which yields irrelevant results when the input query or its keywords are flawed. This system addresses that problem. By considering each keyword together with its relevant context in the XML data, the search is performed through automatic diversification of the XML keyword query. The user thus receives an analytical result set based on the context of the search keywords. For greater efficiency and to handle big data, the Hadoop platform is used. Baseline and efficient algorithms are proposed to incrementally compute the top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates must be the most relevant to the given query, and they must cover the maximal number of distinct results. An evaluation on real and synthetic data sets demonstrates the effectiveness of the diversification model and the efficiency of the algorithms.
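The two selection criteria can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper. It is a simple greedy heuristic and assumes that each query candidate already has a relevance score and a set of result identifiers. The function name `select_diversified_queries`, the scoring rule (relevance multiplied by the number of newly covered results), and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input shape: candidate query -> (relevance score, result ids).
Candidates = Dict[str, Tuple[float, Set[str]]]


def select_diversified_queries(candidates: Candidates, k: int) -> List[str]:
    """Greedily pick up to k query candidates.

    Each step picks the candidate with the highest value of
    relevance * (number of its results not covered by earlier picks).
    This scoring rule is an assumption for illustration only.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = dict(candidates)

    def gain(query: str) -> float:
        relevance, results = remaining[query]
        return relevance * len(results - covered)

    while remaining and len(selected) < k:
        # Sorting first makes ties resolve deterministically by query text.
        best = max(sorted(remaining), key=gain)
        if gain(best) <= 0:
            break  # No remaining candidate contributes a new result.
        selected.append(best)
        covered |= remaining.pop(best)[1]
    return selected


# Made-up candidates for an ambiguous keyword query such as "database query".
sample: Candidates = {
    "database query processing": (0.9, {"r1", "r2", "r3"}),
    "database query optimization": (0.8, {"r2", "r3"}),
    "xml query language": (0.6, {"r4", "r5"}),
    "query log": (0.3, {"r1"}),
}

print(select_diversified_queries(sample, 2))
# -> ['database query processing', 'xml query language']
```

In this made-up example the second pick is not the second most relevant candidate, because its results are already covered by the first pick. A less relevant candidate that contributes distinct results is preferred instead.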
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.