2014-06-05

No Statistics or results.

Originally Posted By: Steve Rider
Hi there.
I have probably done something stupid, but I am having a problem with Yioop. I installed it following the instructions on Ubuntu 12.04 LTS, using the yioop-v0.98-182-ged707da package.

I have set up a customised crawl pointing to a single website. It "appears" to have done the crawl, as the 'Currently Processing' section shows it has visited 163 sites with 3096 extracted URLs. When I stop the crawl and click on Statistics, it reports that it downloaded 163 pages etc., but everything else is 0, and when I do a search I get no results back. I have tried site:all but nothing is returned.

I have rebuilt and reinstalled the software and even tried an earlier version of 0.98, but still no success. Actually, I tell a lie: it did work once and I got results and statistics, but when I tried again on another site I got nothing, and when I pointed it back at the working site, nothing again. I have tried both the GUI and the command line, and it only worked once.

Any help would be great.
Cheers
Steve.

-- No Statistics or results
Can you describe what you did when you stopped your crawl, and what messages
you saw as it was stopping? Similarly, what did you do after you clicked Statistics
for a particular crawl in Manage Crawls? The most common error
is to shut down the fetcher or queue server prematurely, before the final index is built.
If you didn't close the index properly, that might also screw up the statistics. If your
crawl was larger (at least 50 thousand pages) and it got messed up, you could use
php arc_tool.php reindex bundle_name
or
php arc_tool.php rebuild bundle_name
to fix a mis-closed index. Hope this helps.
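Both commands need index data that has actually been written to disk; a bundle that was never flushed contains only its directory structure. The Python sketch below is a quick pre-check under that assumption: it reports whether a bundle directory holds at least one non-empty regular file. It is not part of Yioop, the function name `bundle_has_data` is hypothetical, and the default path is only assembled from the paths mentioned in this thread, so adjust it for your install.

```python
import os
import sys


def bundle_has_data(bundle_dir: str) -> bool:
    """Return True if bundle_dir contains at least one non-empty regular file.

    Heuristic (assumption, not Yioop code): a bundle whose contents were never
    flushed to disk holds only its directory skeleton, so no regular file in it
    will have a size greater than zero.
    """
    if not os.path.isdir(bundle_dir):
        return False
    for root, _dirs, files in os.walk(bundle_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and os.path.getsize(path) > 0:
                return True
    return False


if __name__ == "__main__":
    # Hypothetical example path; pass your own bundle directory as an argument.
    default = "/var/www/yioop_data/cache/IndexData1402386713"
    target = sys.argv[1] if len(sys.argv) > 1 else default
    if bundle_has_data(target):
        print(f"{target}: data on disk, reindex/rebuild may be worth trying")
    else:
        print(f"{target}: no flushed data, re-running the crawl is likely simpler")
```

An empty result does not prove the index is unrecoverable, and a non-empty one does not prove `arc_tool.php` will accept it; it only rules out the "directories but no data" case cheaply.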

-- No Statistics or results
Originally Posted By: Steve Rider
Thank you for the quick reply.

I am now getting statistics and results. It is still hit and miss, but it's now more hits than misses.
To get it working I need to delete everything from the cache folder before I do another crawl, and I can only have one crawl, which isn't a major problem. I also think my problem was clicking Stop Crawl too many times; manually refreshing the page caused it to break too. If I just click Stop Crawl once and leave it, it seems to work better.

When I tried to run both of those tools, they reported that they couldn't find the bundle name in archives, and when I checked that folder it was empty. Is this correct?

Thanks again for your help.
2014-06-09

-- No Statistics or results
Can you give exactly what you typed?
2014-06-11

-- No Statistics or results
Originally Posted By: Steve Rider
Sorry about the delay.

I typed "php arc_tool.php reindex IndexData1402386713" and got an error saying "IndexData1402386713 ... is not an IndexArchiveBundle so cannot be re-index."

So I thought OK, I'll just type "php arc_tool.php reindex bundle_name", and I got the error "/var/www/yioop_data//archives/bundle_name ... is not an IndexArchiveBundle so cannot be re-index"

IndexData1402386713 is a folder in the /yioop_data/cache folder. I checked the contents of the archives folder, since that was the path in the other error, and it is empty.

Thanks again for your help and support
2014-06-15

-- No Statistics or results
I agree this is a little confusing. To do a reindex, Yioop needs to have flushed the contents
of the crawl index to disk at least once. Flushing to disk happens:
(1) when you correctly close an index
(2) every 50 thousand or so URLs (sometimes fewer, depending on memory)
Even on one machine you can index 50 thousand pages in a few hours, so if your index is smaller than
that, it is probably easier to just redo the crawl. In any case, because your index bundle hadn't reached 50 thousand
pages in size, only the directory structure was present on disk, with no data (the data was still in RAM). That's why you couldn't
reindex it.
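To make the flushing rule above concrete, here is a toy Python model of an index that accumulates documents in RAM and writes them out either when a threshold is reached or when the index is closed. It is an illustration only, not Yioop's implementation: the class `ToyIndexBuffer` and its methods are hypothetical, and the fixed 50,000 count simplifies the real behaviour, which can flush earlier depending on memory.

```python
class ToyIndexBuffer:
    """Toy model (not Yioop code) of an index that lives in RAM until flushed."""

    def __init__(self, flush_threshold: int = 50_000):
        self.flush_threshold = flush_threshold
        self.in_memory = []   # documents indexed but not yet written to disk
        self.on_disk = []     # stand-in for data written into the bundle
        self.flush_count = 0

    def add(self, doc: str) -> None:
        self.in_memory.append(doc)
        # Periodic flush once enough documents have accumulated.
        if len(self.in_memory) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if self.in_memory:
            self.on_disk.extend(self.in_memory)
            self.in_memory.clear()
            self.flush_count += 1

    def close(self) -> None:
        # Closing the index properly forces a final flush of what is in RAM.
        self.flush()

    def can_reindex(self) -> bool:
        # A reindex needs at least one flush to have reached disk.
        return self.flush_count > 0


# A 163-page crawl that is stopped without a clean close never flushes.
interrupted = ToyIndexBuffer()
for i in range(163):
    interrupted.add(f"page-{i}")
print(interrupted.can_reindex())  # False

# The same crawl closed properly writes everything out.
closed = ToyIndexBuffer()
for i in range(163):
    closed.add(f"page-{i}")
closed.close()
print(closed.can_reindex())  # True
```

The model shows why a small crawl that is not closed cleanly leaves nothing to reindex: the only trigger it could have hit is the final flush on close.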

Hope this helps,
Chris

-- No Statistics or results
Originally Posted By: Steve Rider
Hi Chris.. Thank you again for your reply and helping me with this issue.

Cheers
Steve