2012-12-13

Practice Final Solution for Question 2.

Originally Posted By: prasad
Group Name:

Prasad Deshpande
Mangesh Dahale
Kiruthika Sivaraman

Q.2
Pseudo Relevance Feedback:
1. In PRF, the IR system simply assumes that all of the top k documents retrieved by the initial query are relevant.
2. Terms are then selected from these documents and added to the original query, and the expanded query is used to generate the final retrieval result.

For example, the top k results for the initial query may include both documents about the enforcement of laws regarding dogs and documents about the use of dogs in law enforcement. Adding the expansion term "pooper" makes the intended meaning clearer, since the law often requires owners to clean up after their dogs.
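
The two PRF steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the function name expand_query, the toy documents, the toy query, and the choice to pick expansion terms by raw frequency in the top k documents are all assumptions made for the example (real systems usually score candidate terms more carefully and filter stopwords).

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=2, num_expansion_terms=1):
    """Pseudo-relevance feedback sketch (hypothetical helper).

    Assumes the top k documents of the initial ranking are relevant,
    counts the terms in them that are not already in the query, and
    appends the most frequent ones to the query.
    """
    counts = Counter()
    for doc in ranked_docs[:k]:
        for term in doc.lower().split():
            if term not in query_terms:
                counts[term] += 1
    expansion = [term for term, _ in counts.most_common(num_expansion_terms)]
    return list(query_terms) + expansion


# Invented toy data: an initial ranking for a toy query about dogs and law.
ranked_docs = [
    "pooper scooper law dogs",
    "dogs pooper law fines",
    "police dogs training",
]
expanded = expand_query(["dogs", "law"], ranked_docs, k=2, num_expansion_terms=1)
print(expanded)
```

The expanded query would then be run against the collection again to produce the final ranking.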

BM25F:

1. The idea behind BM25F is to split each document in the collection into separate fields, e.g. title and body.
2. We then calculate a BM25 score for just the title words and, similarly, a BM25 score for just the body text, and take the weighted sum of those scores.
3. This idea is implemented by computing an adjusted term frequency that depends on the field in which a term occurs. This adjusted term frequency f'(t,d,s) is

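f'(t,d,s) = f(t,d,s) / ((1 - bs) + bs * (ld,s / ls))
where
f(t,d,s) = number of occurrences of term t in field s of document d
ld,s = length of field s in document d
ls = average length of field s across all documents
bs = field-specific length normalization parameter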
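
As a quick check of the adjusted term frequency, here is a small Python sketch with the denominator grouped as (1 - bs) + bs * (ld,s / ls). The function name adjusted_tf and the sample numbers are assumptions made for illustration only.

```python
def adjusted_tf(f_tds, l_ds, l_s, b_s):
    """Field-adjusted term frequency f'(t,d,s).

    f_tds: occurrences of term t in field s of document d
    l_ds:  length of field s in document d
    l_s:   average length of field s across all documents
    b_s:   field-specific parameter (assumed to lie in [0, 1])
    """
    return f_tds / ((1.0 - b_s) + b_s * (l_ds / l_s))


# A field twice as long as average, with b_s = 0.5:
# denominator = 0.5 + 0.5 * 2 = 1.5, so 3 occurrences count as 2.0
print(adjusted_tf(3, 10, 5, 0.5))
```

With b_s = 0 the field length has no effect, and a field of exactly average length leaves the raw frequency unchanged for any b_s.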

Below is the formula for calculating the BM25F score, where f'(t,d) is the weighted sum of f'(t,d,s) over the fields s.
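
One common way to write the BM25F score is sketched below in Python. Treat it as an assumption rather than the exact formula from the course: the field weights v_s, the parameter k1, the IDF form log(N / N_t), and the function name bm25f_score are not given in this post. The sketch combines the adjusted per-field frequencies into f'(t,d) = sum over s of v_s * f'(t,d,s), then applies the usual BM25 saturation without further length normalization, since the lengths were already handled per field.

```python
import math


def bm25f_score(query_terms, adjusted_tfs, field_weights, doc_freq, num_docs, k1=1.2):
    """BM25F score of one document (sketch under the stated assumptions).

    adjusted_tfs:  {term: {field: f'(t,d,s)}} for this document
    field_weights: {field: v_s}, hypothetical per-field weights
    doc_freq:      {term: N_t}, number of documents containing the term
    num_docs:      N, number of documents in the collection
    """
    score = 0.0
    for term in query_terms:
        per_field = adjusted_tfs.get(term, {})
        # Weighted sum of the field-adjusted term frequencies.
        f_td = sum(field_weights[s] * tf for s, tf in per_field.items())
        if f_td == 0 or term not in doc_freq:
            continue  # term absent from the document or the collection
        idf = math.log(num_docs / doc_freq[term])
        score += idf * f_td * (k1 + 1) / (k1 + f_td)
    return score


# Invented toy numbers: title matches weighted twice as heavily as body matches.
toy_score = bm25f_score(
    ["dogs"],
    {"dogs": {"title": 1.0, "body": 2.0}},
    {"title": 2.0, "body": 1.0},
    {"dogs": 10},
    100,
    k1=1.0,
)
print(toy_score)
```

Here f'(t,d) = 2.0 * 1.0 + 1.0 * 2.0 = 4.0, so the score is log(10) * 4 * 2 / (1 + 4) = 1.6 * log(10).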