-- Clarification re: Queue Servers
Queue servers manage both URLs and portions of the index. The name server keeps track of which queue servers are present for a given crawl. To keep a crawl consistent (i.e., working), you should not change the number of queue servers after the crawl starts. You can change the number of fetchers, though. Fetchers learn which queue servers exist by talking to the name server. When a fetcher discovers new URLs, it computes the host of each URL, hashes that host, and sends the URL to a queue server chosen by the hash, with different queue servers responsible for different hash ranges. Because the URLs to download for a given host are always handled by the same queue server, robots.txt files will be obeyed, including things like crawl-delay.
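To make the host-hashing step concrete, here is a minimal sketch in Python. It is only an illustration, not the crawler's actual code: the function names `host_of` and `queue_server_for`, the use of MD5, and the equal-width split of a 32-bit hash space are all assumptions made for the example.

```python
import hashlib
from urllib.parse import urlsplit

HASH_SPACE = 2 ** 32  # assumed size of the hash space (hypothetical choice)


def host_of(url):
    """Return the lowercased host part of a URL ('' if there is none)."""
    return (urlsplit(url).hostname or "").lower()


def queue_server_for(url, num_queue_servers):
    """Pick a queue server index for a URL based on a hash of its host.

    The hash space is cut into num_queue_servers equal-width ranges and
    each range belongs to one server. MD5 is an assumption for this
    sketch; any stable hash would serve the same purpose.
    """
    digest = hashlib.md5(host_of(url).encode("utf-8")).digest()
    value = int.from_bytes(digest[:4], "big")  # 0 <= value < HASH_SPACE
    return value * num_queue_servers // HASH_SPACE


# Two URLs on the same host land on the same queue server.
a = queue_server_for("http://example.com/page1", 4)
b = queue_server_for("http://example.com/dir/page2?x=1", 4)
print(a == b)
```

Only the host feeds the hash, so every URL on a host goes to one server, and that server can apply the host's robots.txt rules and crawl-delay. In this sketch, changing `num_queue_servers` moves the range boundaries, so a host could start mapping to a different server, which is one way to picture why the count should stay fixed during a crawl.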