2012-12-10

Practice Final Solutions for Questions 8 and 1.

Originally Posted By: ShaileshPadave
Team Members:
Shailesh Padave
Sweta Shah
Sanya Valsan
Harini Padmanaban

Solution for Problem 8)
Traffic Rank is a document ranking technique based on the web traffic that flows through each page.
Pij is the proportion of all web traffic that follows the link from page i to page j.
Pij = 0 when there is no link from i to j.
The traffic rank of page j is ∑(over i) Pij, the total traffic flowing into j.
The three constraints are:
1. Pij >= 0
2. ∑(over i,j) Pij = 1
3. ∑(over i) Pij − ∑(over i) Pji = 0 for every page j, i.e., the flow into each page equals the flow out of it.
The Pij are chosen to maximize the entropy −∑(over i,j) Pij · log Pij subject to all three constraints above.
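The formulation leaves open how the Pij are actually found. Setting the gradient of the Lagrangian to zero suggests that the maximizer has the form Pij ∝ x_i / x_j on every link, for some positive per-page factors x_i (exponentials of the Lagrange multipliers of the flow-conservation constraints). Under that reading, the problem reduces to matrix balancing: pick the x_i so that every page's in-flow equals its out-flow, then normalize. The sketch below does this with a simple Osborne-style balancing iteration. It is a sketch under stated assumptions, not the method from the original solution. It assumes pages are numbered 0..n-1, there are no self-links, and the link graph is strongly connected, so a strictly positive balanced flow exists. The helper names `traffic_rank` and `entropy` are hypothetical.

```python
import math


def traffic_rank(edges, n, max_iters=10_000, tol=1e-10):
    """Maximum-entropy traffic flow via matrix balancing (hypothetical helper).

    Assumes pages are numbered 0..n-1, `edges` lists links (i, j) with i != j,
    and the link graph is strongly connected.
    Returns (p, rank): p maps each link (i, j) to Pij, rank[j] = sum over i of Pij.
    """
    x = [1.0] * n  # per-page scaling factors; Pij is proportional to x[i] / x[j]
    for _ in range(max_iters):
        max_change = 0.0
        for j in range(n):
            # Unnormalized flow out of and into page j under the current factors.
            out_flow = sum(x[a] / x[b] for a, b in edges if a == j)
            in_flow = sum(x[a] / x[b] for a, b in edges if b == j)
            # Scaling x[j] by t multiplies out_flow by t and divides in_flow by t,
            # so this t makes page j's in-flow equal its out-flow (constraint 3).
            t = math.sqrt(in_flow / out_flow)
            x[j] *= t
            max_change = max(max_change, abs(t - 1.0))
        if max_change < tol:
            break

    weights = {(i, j): x[i] / x[j] for i, j in edges}
    total = sum(weights.values())
    p = {link: w / total for link, w in weights.items()}  # constraint 2: sum Pij = 1

    rank = [0.0] * n
    for (i, j), pij in p.items():
        rank[j] += pij  # traffic rank of j = total flow into j
    return p, rank


def entropy(p):
    """Entropy -sum Pij * log Pij over links with Pij > 0."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)


if __name__ == "__main__":
    links = [(0, 1), (1, 0), (1, 2), (2, 0)]  # small strongly connected example
    p, rank = traffic_rank(links, 3)
    print("Pij:", {k: round(v, 4) for k, v in p.items()})
    print("traffic rank:", [round(r, 4) for r in rank])
    print("entropy:", round(entropy(p), 4))
```

For this example graph, flow conservation at page 0 forces pages 0 and 1 to receive the same traffic rank. Page 2, whose only incoming link is 1→2, gets less. Balancing is used here because the stationarity conditions give the solution a closed form up to the x_i. A generic constrained optimizer applied to the entropy objective would be an alternative.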

Solution for Problem 1)
For REBUILD, the time is given by
Time(REBUILD) = c·d(new) = c·(d(old) − d(delete) + d(insert))
where:
c = indexing time per document
d(new) = number of documents in the new index
d(old) = number of documents in the old index
d(delete) = number of deleted documents
d(insert) = number of inserted documents
Let
d(old) = 100, d(delete) = 80, d(insert) = 2.
Using the equation for REBUILD,
c·d(new) = c·(100 − 80 + 2)
= 22c
For REMERGE, the time is given by
Time(REMERGE) = c·d(insert) + (c/4)·(d(old) + d(insert))
= c·2 + (c/4)·(100 + 2)
= 2c + 25.5c
= 27.5c
Thus REBUILD (22c) is faster than REMERGE (27.5c) for these numbers.
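The two cost formulas can be checked with a few lines of code. Evaluating the REMERGE formula gives 2c + 25.5c = 27.5c. The helper names `rebuild_time` and `remerge_time` are hypothetical, and c defaults to 1 so the results read as multiples of c. This is a minimal sketch of the formulas as stated, not a model of a real indexer.

```python
def rebuild_time(d_old, d_delete, d_insert, c=1.0):
    """REBUILD re-indexes every surviving document: c * d_new."""
    d_new = d_old - d_delete + d_insert
    return c * d_new


def remerge_time(d_old, d_delete, d_insert, c=1.0):
    """REMERGE indexes only the inserted documents, then merges.

    The merge is charged c/4 per document in d_old + d_insert, as in the
    formula above. d_delete is unused because garbage-collection overhead
    for postings of deleted documents is ignored.
    """
    return c * d_insert + (c / 4) * (d_old + d_insert)


if __name__ == "__main__":
    d_old, d_delete, d_insert = 100, 80, 2
    r = rebuild_time(d_old, d_delete, d_insert)
    m = remerge_time(d_old, d_delete, d_insert)
    print(f"REBUILD: {r}c, REMERGE: {m}c")
    print("REBUILD is cheaper" if r < m else "REMERGE is cheaper")
```

With few deletions the comparison flips. For example, with d(delete) = 10, REBUILD costs 92c while REMERGE still costs 27.5c.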
Assumptions and observations:
When a large fraction of the old documents has been deleted, REBUILD is better than REMERGE.
This analysis does not take into account REMERGE's overhead for garbage-collecting postings that refer to the deleted documents.
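To make "many deletions" precise, compare the two cost formulas directly. This assumes c > 0 and ignores garbage-collection overhead, as stated above:

```latex
\begin{aligned}
T_{\text{rebuild}} < T_{\text{remerge}}
&\iff c\,(d_{\text{old}} - d_{\text{delete}} + d_{\text{insert}}) < c\,d_{\text{insert}} + \tfrac{c}{4}\,(d_{\text{old}} + d_{\text{insert}}) \\
&\iff d_{\text{old}} - d_{\text{delete}} < \tfrac{1}{4}\,(d_{\text{old}} + d_{\text{insert}}) \\
&\iff d_{\text{delete}} > \tfrac{3\,d_{\text{old}} - d_{\text{insert}}}{4}
\end{aligned}
```

With d(old) = 100 and d(insert) = 2, the threshold is (300 − 2)/4 = 74.5. Since d(delete) = 80 exceeds it, REBUILD is the cheaper option for the numbers above.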