Query Aware Determinization of Uncertain Objects
Abstract
This paper considers the problem of determinizing probabilistic data so that such data can be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis methods such as entity resolution, information extraction, and speech processing. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end application built on the deterministic data. We explore this determinization problem in the context of two different data processing tasks: selection queries and triggers. Approaches commonly used for determinization, such as thresholding or top-1 selection, lead to suboptimal performance for such applications. Instead, we develop a query-aware strategy and demonstrate its advantages over existing solutions through a comprehensive empirical evaluation on real and synthetic datasets.
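To make the two baseline approaches named in the abstract concrete, the following is a minimal Python sketch of thresholding and top-1 determinization applied to a single uncertain object. It is an illustration only, not the paper's algorithm: the object model (a dictionary mapping each candidate tag to an independent probability), the function names, the threshold value, and the sample data are all assumptions introduced here.

```python
from typing import Dict, Set


def threshold_determinize(tag_probs: Dict[str, float], tau: float) -> Set[str]:
    """Keep every candidate tag whose probability is at least tau."""
    return {tag for tag, p in tag_probs.items() if p >= tau}


def top1_determinize(tag_probs: Dict[str, float]) -> Set[str]:
    """Keep only the single most probable candidate tag."""
    if not tag_probs:
        return set()
    best = max(tag_probs, key=tag_probs.get)
    return {best}


def matches(det_tags: Set[str], query_terms: Set[str]) -> bool:
    """Conjunctive selection query: every query term must be a stored tag."""
    return query_terms <= det_tags


# Hypothetical uncertain object: candidate tags with assumed probabilities.
image = {"beach": 0.55, "sunset": 0.45, "dog": 0.10}

by_threshold = threshold_determinize(image, tau=0.4)
by_top1 = top1_determinize(image)

# A query for "sunset" finds the object under thresholding but not under top-1.
print(matches(by_threshold, {"sunset"}))
print(matches(by_top1, {"sunset"}))
```

Neither baseline looks at the queries the stored data will serve: the cutoff and the top-1 rule are fixed in advance. A query-aware strategy would instead choose the deterministic tag set with the expected query workload in mind.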
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.