2012-11-05

MidTerm 2 Review Question 4 .

Originally Posted By: Govind
Team Members: Govind Kalyankar, Akash patel, seungbeom ma
Merge Based indexing algorithm:

buildIndex mergeBased (inputTokenizer, memoryLimit) ≡ //Merge-based indexing algorithm, creating a set of independent sub-indices
// The final index is generated by combining the sub-indices via a multi-way merge operation.
1 n ← 0 // initialize the number of index partitions
2 position ← 0
3 memoryConsumption ← 0
4 while inputTokenizer.hasNext() do
5 T ← inputTokenizer.getNext()
6 obtain dictionary entry for T; create new entry if necessary
7 append new posting position to T’s postings list
8 position ← position + 1
9 memoryConsumption ← memoryConsumption + 1
10 if memoryConsumption ≥ memoryLimit then
11 createIndexPartition()
12 if memoryConsumption > 0 then
13 createIndexPartition()
14 merge index partitions I0 . . . In−1, resulting in the final on-disk index Ifinal
15 return
createIndexPartition () ≡
16 create empty on-disk inverted file In
17 sort in-memory dictionary entries in lexicographical order
18 for each term T in the dictionary do
19 add T’s postings list to In
20 delete all in-memory postings lists
21 reset the in-memory dictionary
22 memoryConsumption ← 0
23 n ← n + 1
24 return


mergeIndexPartitions (hI0, . . . , In−1i) ≡ //Merging a set of n index partitions I0 . . . In−1 into an index Ifinal. This is the final step
//in merge-based index construction.
1 create empty inverted file Ifinal
2 for k ← 0 to n − 1 do
3 open index partition Ik for sequential processing
4 currentIndex ← 0
5 while currentIndex 6= nil do
6 currentIndex ← nil
7 for k ← 0 to n − 1 do
8 if Ik still has terms left then
9 if (currentIndex = nil) ∨ (Ik.currentTerm Govind — Mon Nov 05, 2012 10:01 pm <hr>
'''Originally Posted By: Govind''' Team Members: Govind Kalyankar, Akash patel, seungbeom ma<br>Merge Based indexing algorithm:<br><br>buildIndex mergeBased (inputTokenizer, memoryLimit) &equiv; //Merge-based indexing algorithm, creating a set of independent sub-indices <br> // The final index is generated by combining the sub-indices via a multi-way merge operation.<br>1 n &larr; 0 // initialize the number of index partitions<br>2 position &larr; 0<br>3 memoryConsumption &larr; 0<br>4 while inputTokenizer.hasNext() do<br>5 T &larr; inputTokenizer.getNext()<br>6 obtain dictionary entry for T; create new entry if necessary<br>7 append new posting position to T&rsquo;s postings list<br>8 position &larr; position + 1<br>9 memoryConsumption &larr; memoryConsumption + 1<br>10 if memoryConsumption &ge; memoryLimit then<br>11 createIndexPartition()<br>12 if memoryConsumption &gt; 0 then<br>13 createIndexPartition()<br>14 merge index partitions I0 . . . In&minus;1, resulting in the final on-disk index Ifinal<br>15 return<br>createIndexPartition () &equiv;<br>16 create empty on-disk inverted file In<br>17 sort in-memory dictionary entries in lexicographical order<br>18 for each term T in the dictionary do<br>19 add T&rsquo;s postings list to In<br>20 delete all in-memory postings lists<br>21 reset the in-memory dictionary<br>22 memoryConsumption &larr; 0<br>23 n &larr; n + 1<br>24 return<br><br><br>mergeIndexPartitions (hI0, . . . , In&minus;1i) &equiv; //Merging a set of n index partitions I0 . . . In&minus;1 into an index Ifinal. This is the final step<br> //in merge-based index construction.<br>1 create empty inverted file Ifinal<br>2 for k &larr; 0 to n &minus; 1 do<br>3 open index partition Ik for sequential processing<br>4 currentIndex &larr; 0<br>5 while currentIndex 6= nil do<br>6 currentIndex &larr; nil<br>7 for k &larr; 0 to n &minus; 1 do<br>8 if Ik still has terms left then<br>9 if (currentIndex = nil) &or; (Ik.currentTerm Govind &mdash; Mon Nov 05, 2012 10:01 pm <hr>
X