PHP Search Engine

2012-11-05

Midterm 2 Review Question 4

Originally Posted By: Govind

Team Members: Govind Kalyankar, Akash Patel, Seungbeom Ma

Merge-based indexing algorithm:

buildIndex_mergeBased (inputTokenizer, memoryLimit) ≡
    // Merge-based indexing algorithm, creating a set of independent sub-indices.
    // The final index is generated by combining the sub-indices via a multi-way merge operation.
    n ← 0    // initialize the number of index partitions
    position ← 0
    memoryConsumption ← 0
    while inputTokenizer.hasNext() do
        T ← inputTokenizer.getNext()
        obtain dictionary entry for T; create new entry if necessary
        append new posting position to T’s postings list
        position ← position + 1
        memoryConsumption ← memoryConsumption + 1
        if memoryConsumption ≥ memoryLimit then
            createIndexPartition()
    if memoryConsumption > 0 then
        createIndexPartition()
    merge index partitions I_0 ... I_{n−1}, resulting in the final on-disk index I_final
    return

createIndexPartition () ≡
    create empty on-disk inverted file I_n
    sort in-memory dictionary entries in lexicographical order
    for each term T in the dictionary do
        add T’s postings list to I_n
    delete all in-memory postings lists
    reset the in-memory dictionary
    memoryConsumption ← 0
    n ← n + 1
    return
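
To make the partitioning step concrete, here is a minimal Python sketch of the two routines above. It is an illustration under stated assumptions, not the course's reference implementation: the name `build_partitions` is hypothetical, partitions are kept as in-memory lists standing in for the on-disk files, memory is counted as one unit per posting as in the pseudocode, and the final merge is left out.

```python
from collections import defaultdict


def build_partitions(tokens, memory_limit):
    """Return a list of partitions, each a term-sorted list of (term, postings)."""
    partitions = []                 # assumption: stands in for on-disk files I_0 ... I_{n-1}
    dictionary = defaultdict(list)  # in-memory dictionary: term -> postings list
    memory_consumption = 0

    def create_index_partition():
        nonlocal memory_consumption
        # Sort terms lexicographically and "write" them out as one partition.
        partitions.append([(t, dictionary[t]) for t in sorted(dictionary)])
        dictionary.clear()          # drop the postings lists and reset the dictionary
        memory_consumption = 0

    for position, term in enumerate(tokens):
        dictionary[term].append(position)   # append the new posting to T's list
        memory_consumption += 1             # one unit per posting, as in the pseudocode
        if memory_consumption >= memory_limit:
            create_index_partition()
    if memory_consumption > 0:              # flush whatever is left in memory
        create_index_partition()
    return partitions


parts = build_partitions("to be or not to be".split(), memory_limit=4)
print(parts[0])  # [('be', [1]), ('not', [3]), ('or', [2]), ('to', [0])]
print(parts[1])  # [('be', [5]), ('to', [4])]
```

With a limit of 4 postings, the six-token input is split into two partitions, and a term such as "be" appears in both. That is why a merge step is needed afterwards.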

mergeIndexPartitions (⟨I_0, ..., I_{n−1}⟩) ≡
    // Merging a set of n index partitions I_0 ... I_{n−1} into an index I_final.
    // This is the final step in merge-based index construction.
    create empty inverted file I_final
    for k ← 0 to n − 1 do
        open index partition I_k for sequential processing
    currentIndex ← 0
    while currentIndex ≠ nil do
        currentIndex ← nil
        for k ← 0 to n − 1 do
            if I_k still has terms left then
                if (currentIndex = nil) ∨ (I_k.currentTerm …

Govind - Mon Nov 05, 2012 10:01 pm
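
The merge pseudocode above breaks off partway through the comparison, so the following Python sketch fills in the rest with an assumption: on each pass it picks the partition whose current term is lexicographically smallest (lowest-numbered partition on ties), appends that term's postings to the final index, and advances that partition. The names `merge_index_partitions`, `cursors`, and `final` are hypothetical, and lists stand in for the on-disk files.

```python
def merge_index_partitions(partitions):
    """Merge term-sorted partitions into one term-sorted list of (term, postings)."""
    cursors = [0] * len(partitions)   # next unread entry in each partition
    final = []                        # assumption: stands in for the on-disk index I_final
    while True:
        current_index = None
        current_term = None
        for k, part in enumerate(partitions):
            if cursors[k] < len(part):                # I_k still has terms left
                term = part[cursors[k]][0]
                # Assumed tie-break: strict "<" keeps the lowest-numbered partition,
                # so postings for one term stay in increasing position order.
                if current_index is None or term < current_term:
                    current_index, current_term = k, term
        if current_index is None:     # every partition is exhausted
            break
        postings = partitions[current_index][cursors[current_index]][1]
        if final and final[-1][0] == current_term:
            final[-1][1].extend(postings)             # term already seen in an earlier partition
        else:
            final.append((current_term, list(postings)))
        cursors[current_index] += 1   # advance this partition to its next term
    return final


partitions = [
    [("be", [1]), ("not", [3]), ("or", [2]), ("to", [0])],
    [("be", [5]), ("to", [4])],
]
print(merge_index_partitions(partitions))
# [('be', [1, 5]), ('not', [3]), ('or', [2]), ('to', [0, 4])]
```

The linear scan over all partitions mirrors the loop structure of the pseudocode. For a large number of partitions, a priority queue (for example Python's `heapq`) would find the smallest term faster, but that is a design variation and not something the post describes.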
