Relevance Judgments Exclusive of Human Assessors in Large Scale Information Retrieval Evaluation Experimentation

Prabha Rajagopal
Sri Devi Ravana
Maizatul Akmar Ismail

Abstract

Inconsistent judgments by different human assessors compromise the reliability of the relevance judgments generated for large-scale test collections. This study investigates an automated method that creates a comparable set of relevance judgments (pseudo relevance judgments), eliminating the human effort and errors introduced when relevance judgments are created manually. Traditionally, the systems participating in TREC are measured using a chosen metric and ranked according to their performance scores. To generate these scores, the documents retrieved by each system for each topic are matched against the set of relevance judgments (often assessed by humans). In this study, the number of occurrences of each document per topic across the various runs is used, under the assumption that the more often a document occurs, the more likely it is to be relevant. The study proposes a method with a pool depth of 100 and a cutoff percentage of >35% that could provide an alternative way of generating consistent relevance judgments without the involvement of human assessors.
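
The occurrence-counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `pseudo_relevance_judgments`, the nested-dictionary layout of the runs, and the reading of the cutoff as "the share of runs that retrieved the document within the pool depth" are all assumptions made for illustration; the abstract does not spell out how the percentage is computed.

```python
from collections import Counter


def pseudo_relevance_judgments(runs, pool_depth=100, cutoff=0.35):
    """Build pseudo relevance judgments from document occurrence counts.

    runs: dict mapping run_id -> dict mapping topic_id -> ranked list of
          document ids (hypothetical layout, chosen for this sketch).
    pool_depth: only the top `pool_depth` documents of each run are pooled.
    cutoff: a document is judged relevant when the fraction of runs that
            retrieved it is strictly greater than this value (assumed
            interpretation of the ">35%" cutoff).
    """
    occurrences = {}            # topic_id -> Counter of document occurrences
    runs_per_topic = Counter()  # topic_id -> number of runs covering the topic

    for rankings_by_topic in runs.values():
        for topic, ranked_docs in rankings_by_topic.items():
            runs_per_topic[topic] += 1
            # Use a set so a document counts at most once per run.
            pooled = set(ranked_docs[:pool_depth])
            occurrences.setdefault(topic, Counter()).update(pooled)

    judgments = {}
    for topic, doc_counts in occurrences.items():
        n_runs = runs_per_topic[topic]
        judgments[topic] = {
            doc for doc, count in doc_counts.items()
            if count / n_runs > cutoff
        }
    return judgments


# Toy example with three hypothetical runs for a single topic.
example_runs = {
    "runA": {"T1": ["d1", "d2", "d3"]},
    "runB": {"T1": ["d1", "d2"]},
    "runC": {"T1": ["d1", "d4"]},
}
example_judgments = pseudo_relevance_judgments(example_runs)
```

In this toy example, `d1` appears in all three runs and `d2` in two of them, so both exceed the cutoff, while `d3` and `d4` each appear in only one of three runs (about 33%) and are left out of the pseudo relevance judgments.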

Article Details

How to Cite
Rajagopal, P., Ravana, S. D., & Ismail, M. A. (2014). Relevance Judgments Exclusive of Human Assessors in Large Scale Information Retrieval Evaluation Experimentation. Malaysian Journal of Computer Science, 27(2), 80–94. Retrieved from https://ejournal.um.edu.my/index.php/MJCS/article/view/6795
Section
Articles