-- Practice Final Answers
Originally Posted By: dhruvjalota
Question 9: Explain one method to estimate P1 and one method to estimate P2 in the divergence-from-randomness approach to coming up with a relevance measure.
Answer:
The basic starting formula for DFR is the following: $(1 - P_2) \cdot (-\log P_1)$
$P_1$ represents the probability that a random document $d$ contains exactly $f_{t,d}$ occurrences of $t$.
The second factor, $P_2$, is related to eliteness and compensates for the rapid growth of $-\log P_1$ as $f_{t,d}$ increases. A document is said to be elite for the term $t$ when it is "about" the topic associated with the term.
To determine the relevance of a document to a query with this model, we calculate $\sum_{t \in q} q_t \cdot (1 - P_{2,t}) \cdot (-\log P_{1,t})$, where $P_{1,t}$ and $P_{2,t}$ are the $P_1$ and $P_2$ associated with the particular term $t$.
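To estimate $P_1$, we count the number of ways the $l_t$ occurrences of $t$ in the collection can be distributed among the $N$ documents. This is given by the binomial coefficient: $\binom{N + l_t - 1}{l_t} = \frac{(N + l_t - 1)!}{(N - 1)!\, l_t!}$ (*)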
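To compute $P_1$, suppose $d$ is found to contain $f_{t,d}$ occurrences of $t$. The remaining $l_t - f_{t,d}$ occurrences must then be distributed among the other $N - 1$ documents.
Using the binomial coefficient again, the number of ways to do this is:
$\binom{(N-1) + (l_t - f_{t,d}) - 1}{l_t - f_{t,d}} = \frac{((N-1) + (l_t - f_{t,d}) - 1)!}{(N-2)!\, (l_t - f_{t,d})!}$ (**)
We can estimate $P_1$ as the ratio (**)/(*).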
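The ratio (**)/(*) can be checked numerically. The following is a minimal sketch, not part of the original answer; the function name `p1_exact` and its parameter names are made up for illustration, and it assumes N is the number of documents and lt is the total number of occurrences of t in the collection.

```python
from fractions import Fraction
from math import comb


def p1_exact(n_docs: int, l_t: int, f_td: int) -> Fraction:
    """Exact ratio (**)/(*) for a document holding f_td occurrences of t.

    n_docs: number of documents N (assumed >= 2)
    l_t:    total occurrences of t in the collection (assumed meaning of lt)
    f_td:   occurrences of t in the document d
    """
    if n_docs < 2:
        raise ValueError("need at least two documents")
    if not 0 <= f_td <= l_t:
        return Fraction(0)  # d cannot hold more occurrences than exist
    # (*): ways to spread l_t occurrences over N documents
    total = comb(n_docs + l_t - 1, l_t)
    # (**): ways to spread the remaining occurrences over the other N-1 documents
    rest = comb((n_docs - 1) + (l_t - f_td) - 1, l_t - f_td)
    return Fraction(rest, total)
```

As a sanity check, summing `p1_exact(N, lt, f)` over f = 0..lt gives exactly 1, which is what we expect from a probability distribution over the possible values of f_t,d.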
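Simplifying this ratio (an approximation that holds when $N$ is large) gives the following estimates for $P_1$ and $-\log P_1$:
$P_1 = \left(\frac{1}{1 + l_t/N}\right) \left(\frac{l_t/N}{1 + l_t/N}\right)^{f_{t,d}}$
and
$-\log P_1 = \log\left(1 + \frac{l_t}{N}\right) + f_{t,d} \cdot \log\left(1 + \frac{N}{l_t}\right)$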
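To see that the two expressions agree, here is a small sketch that evaluates the closed-form estimate of P1 and the formula for -logP1 separately. The function names are hypothetical, and base-2 logarithms are an assumption, since the base is not stated above.

```python
import math


def p1_approx(n_docs: int, l_t: int, f_td: int) -> float:
    """Closed-form estimate of P1, with lam standing for lt/N."""
    lam = l_t / n_docs
    return (1 / (1 + lam)) * (lam / (1 + lam)) ** f_td


def neg_log_p1(n_docs: int, l_t: int, f_td: int) -> float:
    """-log P1 = log(1 + lt/N) + f_td * log(1 + N/lt); base 2 is assumed."""
    return math.log2(1 + l_t / n_docs) + f_td * math.log2(1 + n_docs / l_t)
```

Taking `-math.log2(p1_approx(N, lt, f))` should match `neg_log_p1(N, lt, f)` up to floating-point error, because log((1 + lt/N)/(lt/N)) = log(1 + N/lt).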
For $P_2$, using Laplace's law of succession:
estimate $P_2 = \frac{f_{t,d}}{f_{t,d} + 1}$
Using this, and noting that $1 - P_2 = \frac{1}{f_{t,d} + 1}$, estimate $(1 - P_2) \cdot (-\log P_1)$ =
$\frac{\log(1 + l_t/N) + f_{t,d} \cdot \log(1 + N/l_t)}{f_{t,d} + 1}$
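Putting the pieces together, a rough sketch of the resulting relevance measure might look like the following. It multiplies (1-P2) with P2 = f_t,d/(f_t,d+1) by the -logP1 estimate, then sums over the query terms weighted by q_t. The function names and the tuple layout are hypothetical; base-2 logarithms and reading q_t as the weight of t in the query are assumptions, not something stated in the answer.

```python
import math
from typing import Iterable, Tuple


def dfr_term_score(n_docs: int, l_t: int, f_td: int) -> float:
    """(1 - P2) * (-log P1) for one term, using the estimates above."""
    if n_docs <= 0 or l_t <= 0 or f_td < 0:
        raise ValueError("expected n_docs > 0, l_t > 0, f_td >= 0")
    neg_log_p1 = (math.log2(1 + l_t / n_docs)
                  + f_td * math.log2(1 + n_docs / l_t))
    p2 = f_td / (f_td + 1)  # law of succession
    return (1 - p2) * neg_log_p1


def dfr_score(n_docs: int, terms: Iterable[Tuple[float, int, int]]) -> float:
    """Sum of q_t * (1 - P2,t) * (-log P1,t) over the query terms.

    Each entry of `terms` is (q_t, l_t, f_td) for one query term t.
    """
    return sum(q_t * dfr_term_score(n_docs, l_t, f_td)
               for q_t, l_t, f_td in terms)
```

For example, `dfr_score(1000, [(1, 50, 3), (1, 200, 1)])` scores a document against a two-term query in a hypothetical 1000-document collection. Note that the whole sum log(1+lt/N) + f_t,d log(1+N/lt) is divided by f_t,d + 1, not just its first term.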