Utilization Of Cross-Terms To Enhance The Language Model For Information Retrieval

Huda Mohammed Barakat
Maizatul Akmar Ismail
Sri Devi Ravana

Abstract

Traditional retrieval models were effective in the early stages of the Web; however, given the huge amount of information available on the Web today, further optimization is required to improve how well these models extract the most relevant information. Exploiting term proximity is one of the techniques that many researchers have introduced for this purpose. It assumes that the words in a user query are correlated, so the proximity between them should be considered in the matching process. Density-based proximity is an effective type of term proximity measure that has still not been fully explored in retrieval models. In this paper, we investigate the application of a recent density-based measure called Cross-Terms, which has achieved significant scores when applied to the effective BM25 retrieval model. We applied cross-terms to another effective retrieval model, the Language Modeling Approach. The performance of the enhanced language model was measured and evaluated through several experiments and metrics. Experimental results show that the cross-terms measure improved the performance of the basic language model on all the evaluation metrics applied. The improvement reached +4% with the MAP metric and +8% with the P@5 and P@20 metrics.
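
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how precision at rank k (P@k) and mean average precision (MAP) are conventionally computed from a ranked result list and a set of relevance judgments. It is not the authors' evaluation code; the function names, document identifiers, and example data are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears.

    Relevant documents that are never retrieved contribute zero, because the
    sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of per-query average precision over (ranking, relevant set) pairs."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical query: five retrieved documents, three judged relevant.
example_ranking = ["d3", "d1", "d7", "d2", "d9"]
example_relevant = {"d1", "d2", "d5"}

p_at_5 = precision_at_k(example_ranking, example_relevant, 5)
ap = average_precision(example_ranking, example_relevant)
```

In this made-up example, two of the five retrieved documents are relevant, so P@5 is 2/5. The relevant documents appear at ranks 2 and 4, giving precisions of 1/2 and 2/4, and dividing their sum by the three judged-relevant documents yields an average precision of 1/3.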


Article Details

How to Cite
Barakat, H. M., Ismail, M. A., & Ravana, S. D. (2013). Utilization Of Cross-Terms To Enhance The Language Model For Information Retrieval. Malaysian Journal of Computer Science, 26(3), 196–210. Retrieved from https://ejournal.um.edu.my/index.php/MJCS/article/view/6772
Section
Articles