2011-11-02

Practice Midterm #2 Answers.

Originally Posted By: jnewth

Problem 4: With regard to hash-table chaining, what are the move-to-front and insert-at-back heuristics?


This is described in the book on page 121 and in the notes of October 17, 2011.

We are building a dictionary that maps terms to posting lists. These are heuristics for improving hash-chain lookup. With a hash table, multiple terms will likely hash to the same location; to resolve this, we chain the colliding entries together. To find a term's posting list, you first hash the term to find the right chain, then walk along the chain until you reach the entry for the term you are looking for. To speed this up, we want to place more frequently searched terms near the start of the chain.

Two methods of doing this are Insert-at-back and Move-to-front.

Insert-at-back relies on the observation that, when building the dictionary, a term seen early during construction is likely a frequent term in the corpus and therefore likely to be searched for frequently; a term not seen until late in construction probably is not. Since we want frequently searched terms near the front of the chain and infrequent ones at the back, we always insert new terms at the back.

Move-to-front suggests that, when a term is looked up (for example, we have a new posting for a term and want to add it to that term's posting list), we hash the term, find the chain, search along the chain until we find the term's entry, and then add the posting to that term's posting list. Then we move the entry we just found to the front of its chain. You can imagine that if this runs for a while, terms that are frequently searched for are constantly moved toward the front, and infrequently searched-for terms gradually drift to the back.
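As a rough sketch of both heuristics (the bucket count, names, and chain representation here are all made up for illustration):

```python
# Toy hash-table chains: each chain is a list of (term, posting_list) entries.

NUM_BUCKETS = 8  # tiny table, just for illustration

def make_table():
    return [[] for _ in range(NUM_BUCKETS)]

def bucket(term):
    return hash(term) % NUM_BUCKETS

def insert_at_back(table, term):
    """First occurrence of a term goes to the back of its chain."""
    chain = table[bucket(term)]
    for entry in chain:
        if entry[0] == term:
            return entry
    entry = (term, [])
    chain.append(entry)  # later (likely rarer) terms end up behind earlier ones
    return entry

def lookup_move_to_front(table, term):
    """Find the term's entry and move it to the front of its chain."""
    chain = table[bucket(term)]
    for i, entry in enumerate(chain):
        if entry[0] == term:
            chain.insert(0, chain.pop(i))  # frequent terms drift forward
            return entry
    return None
```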

-- Practice Midterm #2 Answers
Originally Posted By: jnewth

#10. What is a gamma code?


This is a prefix-free encoding of posting list gaps (as opposed to absolute locations or offsets from the start) described on page 193 of the book and in the notes of October 31, 2011.

The idea is that every integer k >= 1 is specified as a string of the form:

[selector] [body]

where the selector is a string of 0's followed by a single 1. The length of this string is the number of bits required to encode k. For example, if k = 5, in base 2 this is 101. We need 3 bits to encode this, so our selector is 001.

The body is the binary representation of k (with one minor change). If k = 5, in base 2 this is 101.

The book makes the observation that the leading bit of the body for every k is 1, so we can drop it from the encoding.

Therefore, k = 5 is written as 001 01.

The length of a gamma code formed in this manner is 2*floor( log2(k) ) + 1.
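A minimal sketch of gamma encoding and decoding, assuming bit strings are represented as Python strings of '0' and '1':

```python
def gamma_encode(k):
    """Elias gamma code of an integer k >= 1, as a bit string."""
    assert k >= 1
    body = bin(k)[2:]           # binary representation of k, e.g. 5 -> '101'
    n = len(body)               # number of bits needed to encode k
    selector = '0' * (n - 1) + '1'
    return selector + body[1:]  # the leading 1 of the body is implicit, so drop it

def gamma_decode(bits):
    """Decode a single gamma-coded integer from a bit string."""
    n = bits.index('1') + 1          # selector length = bit length of k
    body = '1' + bits[n:n + n - 1]   # restore the implicit leading 1
    return int(body, 2)
```

For k = 5 this produces '00101', and the total length is 2*floor(log2(k)) + 1 bits, matching the formula above.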

-- Practice Midterm #2 Answers
Originally Posted By: sayali
Q5. Give the formula for BM25. How can binary heaps be used to speed up document-at-a-time query processing?

1. Formula for BM25
Score_BM25(q, d) = Σ_{t ∈ q} IDF(t) · TF_BM25(t, d), where
IDF(t) = log(N / N_t), and
TF_BM25(t, d) = f_{t,d} · (k1 + 1) / (f_{t,d} + k1 · ((1 - b) + b · (l_d / l_avg)))
Formula from link - http://www.cs.sjsu.edu/faculty/pollett/ ... ml#%286%29
where q = t1, t2, ..., tn; N = total number of documents; N_t = number of documents containing t;
f_{t,d} = frequency of t in document d; k1 = 1.2 and b = 0.75;
l_d = length of document d; l_avg = average length of documents in the collection.
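A sketch of the scoring function under the definitions above (the function and parameter names are my own; doc_tf and df are assumed to be precomputed dictionaries):

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_len, N, df, k1=1.2, b=0.75):
    """Score one document against a query with BM25.

    doc_tf: dict term -> f_{t,d} for this document.
    df:     dict term -> N_t (document frequency of the term).
    """
    score = 0.0
    for t in query_terms:
        f = doc_tf.get(t, 0)
        if f == 0 or t not in df:
            continue  # term contributes nothing if absent from doc or corpus
        idf = math.log(N / df[t])
        tf = f * (k1 + 1) / (f + k1 * ((1 - b) + b * (doc_len / avg_len)))
        score += idf * tf
    return score
```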

2. In the document-at-a-time query processing algorithm, two binary heaps can be used as follows:
i. One heap manages the query terms and, for each term t, keeps track of the next document that contains t.
ii. The other maintains the set of the top k search results seen so far.
The heap-based algorithm has complexity O(N_q · log n + N_q · log k),
which is less than the O(m·n + m·log m) of the algorithm without heaps.
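A toy sketch of the two-heap idea (posting lists here are in-memory lists of (docid, score) pairs, so this only illustrates the control flow, not the disk layout; all names are my own):

```python
import heapq

def top_k_docs(postings, k):
    """Document-at-a-time sketch. postings: term -> sorted list of
    (docid, score_contribution). One heap orders the query terms by
    their next docid; the other keeps the k best results seen so far."""
    # heap 1: (next docid, term, position in that term's posting list)
    term_heap = [(plist[0][0], t, 0) for t, plist in postings.items() if plist]
    heapq.heapify(term_heap)
    results = []  # heap 2: min-heap of (score, docid), size <= k
    while term_heap:
        doc = term_heap[0][0]
        score = 0.0
        # pop every term whose next posting belongs to this document
        while term_heap and term_heap[0][0] == doc:
            _, t, i = heapq.heappop(term_heap)
            score += postings[t][i][1]
            if i + 1 < len(postings[t]):
                heapq.heappush(term_heap, (postings[t][i + 1][0], t, i + 1))
        if len(results) < k:
            heapq.heappush(results, (score, doc))
        elif score > results[0][0]:
            heapq.heapreplace(results, (score, doc))  # evict the current worst
    return sorted(results, reverse=True)  # best first
```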
2011-11-04

-- Practice Midterm #2 Answers
Originally Posted By: sheetalg

Q 6: What are some advantages of computing the disjunction of the query terms as opposed to the conjunction? What are some advantages of computing the conjunction over the disjunction?


Advantages of disjunctive over conjunctive
1. Better recall, because the fraction of relevant documents that appear in the result set is higher than in the conjunctive retrieval model. For example, if 2 out of 3 query terms appear in a relevant document, that document will appear in the search results of the disjunctive model, but the conjunctive model will never return it.
2. Longer queries do not worsen the search results.

Advantages of conjunctive over disjunctive
1. Faster query processing, because there are fewer documents to be scored and ranked. But this comes at the cost of lower recall.
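A tiny illustration of the two candidate sets, using made-up posting lists of document IDs:

```python
# Hypothetical three-term query and per-term posting lists (doc IDs).
postings = {
    "information": {1, 2, 3, 5},
    "retrieval":   {2, 3, 5, 8},
    "systems":     {3, 7},
}

disjunctive = set.union(*postings.values())         # any term matches
conjunctive = set.intersection(*postings.values())  # all terms must match

# A relevant document containing only 2 of the 3 terms (e.g. doc 2 or 5)
# is reachable under disjunction but never returned under conjunction,
# while conjunction leaves far fewer documents to score and rank.
```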

-- Practice Midterm #2 Answers
Originally Posted By: shailbenq

What is dictionary interleaving? What does the dictionary look like when dictionary interleaving is used?


-- It is a method for storing the inverted index in which all entries are stored on disk, with each dictionary entry placed directly before its posting list. In addition, some of the dictionary entries are kept in memory with pointers to their locations on disk.

In memory:
Align -> disk offset 409345; Alignment -> disk offset 450990; Aligns -> disk offset 560989

On disk:
Align {1205, 1234, ...} Aligned {920, 948, ...} Alignment {1102, 1690, ...} Aligns {2394, 6784, ...}
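A sketch of the in-memory lookup step, assuming the in-memory entries are a sorted list of (term, disk_offset) pairs (the offsets are hypothetical):

```python
import bisect

def find_on_disk(term, sparse):
    """sparse: sorted list of (term, disk_offset) for the dictionary
    entries kept in memory. Returns the disk offset at which to start
    scanning forward for `term`."""
    terms = [t for t, _ in sparse]
    i = bisect.bisect_right(terms, term) - 1  # last in-memory term <= `term`
    if i < 0:
        return None  # term sorts before everything in the dictionary
    return sparse[i][1]

sparse = [("Align", 409345), ("Alignment", 450990), ("Aligns", 560989)]
```

For a term like "Aligned" that has no in-memory entry, this returns the offset of "Align", and the search then scans forward through the interleaved entries on disk.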
2011-11-06

-- Practice Midterm #2 Answers
Originally Posted By: chhawe
Can the others please post the remaining answers?

Thanks
Chinmay

-- Practice Midterm #2 Answers
Originally Posted By: sctice

Problem 2: What is a per-term index? How does it improve posting list look-up speed over binary search?


A per-term index is a disk-based data structure used to locate postings. It is a subset of a term's postings that is small enough to fit in memory, and it's stored in the header of the term's posting list. To find a particular posting p, the per-term index is loaded into memory and searched for a candidate range that contains p. That range is then loaded from the full posting list and searched in memory to find p. This procedure improves look-up speed over a binary search on the full posting list by avoiding random accesses on disk, which are much more expensive than random accesses in memory. Using a per-term index, data can be read sequentially from disk, and it's only necessary to read two contiguous blocks from disk in order to find a particular posting: first the per-term index, and then the candidate range from the full posting list.
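A sketch of that look-up procedure, assuming the per-term index holds every B-th posting and read_block(i) performs one sequential read of the i-th block of B postings (both names are my own):

```python
import bisect

def find_posting(target, per_term_index, read_block):
    """Find the first posting >= target.

    per_term_index: in-memory list of every B-th posting of the term.
    read_block(i):  returns the i-th block of B postings from disk.
    """
    # search the in-memory index for the candidate block
    i = bisect.bisect_right(per_term_index, target) - 1
    if i < 0:
        i = 0
    block = read_block(i)      # one sequential disk read
    j = bisect.bisect_left(block, target)
    if j < len(block):
        return block[j]
    nxt = read_block(i + 1)    # target falls past this block's last posting
    return nxt[0] if nxt else None
```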
2011-11-07

-- Practice Midterm #2 Answers
Originally Posted By: Amruta

Q7. What is an accumulator in query processing? How does accumulator pruning work?


An accumulator is a data structure entry (e.g. an array element) that stores the score accumulated so far for a matching document. Its value is updated as each posting is processed.

In most cases, the full set of accumulators is too large to fit in memory, so we impose an upper limit on the number of accumulators that can be created. We then want to make sure that the documents most likely to make it into the top k results are the ones assigned an accumulator.
To do this, a threshold value is defined and recomputed after every u postings. If a posting's score exceeds the threshold, an accumulator is assigned to its document; if it is lower, no accumulator is assigned.
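A rough sketch of the pruning loop (all names and the particular threshold rule are made up for illustration; real systems choose the threshold more carefully):

```python
def prune_accumulators(postings_stream, max_acc, u=4):
    """postings_stream yields (docid, score_contribution) pairs; at most
    max_acc accumulators may exist. Every u postings the threshold is
    recomputed from the current accumulator scores."""
    acc = {}          # docid -> accumulated score
    threshold = 0.0
    for n, (doc, s) in enumerate(postings_stream, 1):
        if doc in acc:
            acc[doc] += s       # existing accumulators are always updated
        elif len(acc) < max_acc and s > threshold:
            acc[doc] = s        # new accumulator only above the threshold
        if n % u == 0 and acc:
            # raise the bar: half of the median accumulated score (toy rule)
            threshold = sorted(acc.values())[len(acc) // 2] * 0.5
    return acc
```

Note the tradeoff this makes visible: once the accumulator limit is reached, a document whose postings arrive late may never get an accumulator even if its score would have been high.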