2012-12-10

Practice Final Solution for Question 5.

Originally Posted By: Akash_Patel
Team Members: Govind Kalyankar, Akash Patel, Seungbeom Ma

A) Document Partitioning:
In a document-partitioned index, each index server is responsible for a subset of the documents.

Query Processing in Document Partitioning:
There is a main server (which the book calls a receptionist) that, when presented with a query, forwards it to each partition server. Each of these n servers then computes the top k query results on its partition and sends them back to the receptionist, which merges the results and selects the top m.
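The steps above can be sketched roughly as follows. This is only an illustration, not the book's code: the names score_partition and receptionist, the in-memory dictionaries standing in for partition servers, and the scoring by summed term frequency are all assumptions made for the example.

```python
import heapq


def score_partition(partition, query_terms, k):
    """Return the top-k (score, doc_id) pairs for one partition.

    `partition` maps doc_id -> {term: term frequency}. The score is a plain
    sum of term frequencies, which is a placeholder for a real ranking
    function (assumption for illustration only).
    """
    scored = []
    for doc_id, term_freqs in partition.items():
        score = sum(term_freqs.get(term, 0) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def receptionist(partitions, query_terms, k, m):
    """Forward the query to every partition, then merge and keep the top m."""
    candidates = []
    for partition in partitions:
        # In a real system these calls would go out to n servers in parallel.
        candidates.extend(score_partition(partition, query_terms, k))
    return heapq.nlargest(m, candidates)


# Two hypothetical partition servers, each holding a subset of the documents.
partitions = [
    {"d1": {"index": 2, "server": 1}, "d2": {"query": 1}},
    {"d3": {"server": 1}, "d4": {"index": 3, "query": 2}},
]
top_results = receptionist(partitions, ["index", "query"], k=2, m=2)
print(top_results)
```

Note that every partition does work for every query, since any of them might hold a top-scoring document.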

B) Term Partitioning:
Document partitioning works best when the index data on the individual nodes can be stored in main memory or on SSD.
Every node still has to process every term of every query, so adding more machines doesn't reduce the minimum amount of work each machine must do per query. This is a bottleneck for the document partitioning approach.
Term partitioning addresses this problem by splitting the collection into sets of terms and assigning each of these sets to a node.

Query Processing in Term Partitioning:
Suppose a query q contains terms t1, ..., tq. The receptionist forwards the query to node v(t1), which is responsible for term t1.
After creating a set of document score accumulators from t1's posting list, v(t1) forwards the query, along with the accumulator set, to node v(t2), which continues the process, and so on.
Finally, v(tq) sends the final accumulator set to the receptionist, where the top m results are selected.
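
The accumulator hand-off described above might look something like the sketch below. It is a simplified single-process simulation, not the book's implementation: the names process_term and term_partitioned_query, the assignment dictionary playing the role of v(t), and the use of raw term frequency as the score contribution are all assumptions for illustration.

```python
import heapq


def process_term(postings, accumulators):
    """Add one term's contribution to the accumulator set and return it.

    `postings` maps doc_id -> term frequency for a single term. Using the raw
    frequency as the score contribution is a placeholder (assumption).
    """
    for doc_id, tf in postings.items():
        accumulators[doc_id] = accumulators.get(doc_id, 0) + tf
    return accumulators


def term_partitioned_query(nodes, assignment, query_terms, m):
    """Pass the accumulator set from v(t1) to v(tq), then select the top m.

    `nodes[i]` maps term -> postings for the terms owned by node i, and
    `assignment[term]` gives the index of the node responsible for that term.
    """
    accumulators = {}
    for term in query_terms:
        if term not in assignment:
            continue  # no node holds this term, so it contributes nothing
        node = nodes[assignment[term]]
        # In a real system this step is a network hop to the next node.
        accumulators = process_term(node.get(term, {}), accumulators)
    # The last node returns the accumulators; the receptionist ranks them.
    ranked = ((score, doc_id) for doc_id, score in accumulators.items())
    return heapq.nlargest(m, ranked)


# Two hypothetical nodes, each owning the posting lists of a set of terms.
nodes = [
    {"index": {"d1": 2, "d4": 3}},
    {"query": {"d2": 1, "d4": 2}},
]
assignment = {"index": 0, "query": 1}
top_results = term_partitioned_query(nodes, assignment, ["index", "query"], m=2)
print(top_results)
```

Here only the nodes that own a query term do any work, at the cost of shipping the accumulator set from node to node.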