2013-06-06

indexing problem.

Originally Posted By: s1c
Hi,

I followed the instructions from this website and installed Yioop on my Debian machine.
I want to crawl my MediaWiki website, which is on a second machine (both are on the local network).
So I left "Disallowed Sites/Sites with Quotas" empty.
I added http://my_wiki/ to the seed sites, then clicked "save option" and "back".
I typed the name of my crawl, wiki_crawl, and clicked "start new crawl".
After 20 minutes I had:
Code:
Timestamp: 1370530279
Time started: Thu, 06 Jun 2013 07:51:19 -0700
Indexer Peak Memory: 9143248
Scheduler Peak Memory: 489335216
Fetcher Peak Memory: 9607232
Web App Peak Memory: 8253296
Visited Urls/Hour: 0.00
Visited Urls Count: 1
Total Urls Seen: 1
Most Recent Fetcher: 0-localhost-1370441701 @ Thu, 06 Jun 2013 07:51:48 -0700
Most Recent Urls
http://my_wiki/robots.txt

My robots.txt looks like this:
Code:
User-Agent: *
Disallow:

Where could the problem be?

-- indexing problem
I need to know some background to help you...

(1) Which version of Yioop are you using? 0.941?
(2) Which instructions did you follow? Under the Ubuntu instructions
http://www.seekquarry.com/?c=main&p=install#ubuntu
there are some remarks on what is needed to get Yioop to work with
Debian. Did you follow those?
(3) Are you directly crawling your MediaWiki site or are you trying
to index a MediaWiki XML dump of the site?

Best,
Chris

-- indexing problem
Originally Posted By: s1c
cpollett wrote:
I need to know some background to help you...

(1) Which version of Yioop are you using? 0.941?
(2) Which instructions did you follow? Under the Ubuntu instructions
http://www.seekquarry.com/?c=main&p=install#ubuntu
there are some remarks on what is needed to get Yioop to work with
Debian. Did you follow those?
(3) Are you directly crawling your MediaWiki site or are you trying
to index a MediaWiki XML dump of the site?

Best,
Chris


1) Yioop 0.941
2) I proceeded exactly according to the instructions on the page you linked.
On Debian squeeze the installation was successful. On Debian wheezy (the latest release) I couldn't configure Yioop: the page hangs when I try to add a machine and I have to restart Apache. After the restart, clicking "Manage Machines" hangs again.
So I'm using Debian squeeze (a slightly older release).
3) I'm crawling my MediaWiki site directly.
2013-06-11

-- indexing problem
Originally Posted By: s1c
Any ideas what could be wrong?
2013-06-13

-- indexing problem
So it is working on the slightly older version of Debian, just not the latest? I can add this to my debugging queue
before doing my next release.

Best,
Chris
2013-12-07

-- indexing problem
Originally Posted By: andreab82
Same problem here :-/

I have just installed the latest version (0.98) on Debian wheezy following http://www.seekquarry.com/?c=main&p=install#ubuntu
I haven't done point 3 (Suhosin hardening patch) since the Yioop version is greater than 0.941.

As soon as I add a machine it simply hangs...

-- indexing problem
Okay. I will try to set up a version of Debian wheezy some time over the weekend. Just to understand: it is dying when you use Manage Machines, Add Machine? Does it still crawl from the command line?

Best,
Chris

-- indexing problem
Originally Posted By: andreab82
Hey, thanks for the quick answer.

Sorry, I'm pretty new to Yioop. How do you crawl from the command line?
If you tell me the command I can run it (either tomorrow or Monday, as I'm leaving for the weekend).

Anyway, here is a simple script I wrote to install Yioop v0.98 on a clean Debian/Ubuntu 12.04 VM.

It might be useful to someone.

-----------------------------------------------------------------------------------------------------------------------------------

#!/bin/bash

# wget and unzip are used below, so install them as well
apt-get -y install curl wget unzip apache2 php5 php5-cli php5-sqlite php5-curl php5-gd

# raise PHP's POST limit (php.ini writes sizes as 8M / 32M, not 8MB / 32MB)
sed -i 's/post_max_size = 8M/post_max_size = 32M/g' /etc/php5/apache2/php.ini
sed -i 's/post_max_size = 8M/post_max_size = 32M/g' /etc/php5/cli/php.ini

# not strictly necessary: serve Yioop as the default site
sed -i 's/DocumentRoot \/var\/www/DocumentRoot \/var\/www\/yioop/g' /etc/apache2/sites-available/default
sed -i 's/Directory \/var\/www\//Directory \/var\/www\/yioop\//g' /etc/apache2/sites-available/default

# download and unpack the v0.98 archive
wget -O yioop.zip 'http://www.seekquarry.com/viewgit/?a=archive&p=yioop&h=bf48e34ffa1fc707b13107a4981e4ef9c6952048&hb=f5fe1a90318da6cf242484f15bb3661b02d64fca&t=zip'

unzip yioop.zip
rm yioop.zip
mv yioop-v0.98 /var/www/yioop/

# work directory for Yioop's data, writable by the web server
mkdir /var/www/yioop_data
chmod 777 /var/www/yioop_data

chmod 777 /var/www/yioop/configs/config.php

# start with an empty local configuration file
echo "" > /var/www/yioop/configs/local_config.php

chown -R www-data:www-data /var/www/yioop*

service apache2 restart

----------------------------------------------------------------------------------------------------
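The sed substitutions in the script above only change anything if the pattern matches the existing php.ini line exactly (stock php.ini files usually write the size as 8M), so it can be worth checking the result afterwards. A minimal sketch of such a check; the helper name get_post_max_size is made up for illustration and the two paths are simply the ones used in the script:

```shell
#!/bin/bash
# Hypothetical helper (not part of Yioop or PHP): print the value assigned to
# post_max_size in a php.ini-style file. Commented-out lines starting with ';'
# are ignored; if the setting appears more than once, the last one is printed.
get_post_max_size() {
    sed -n 's/^[[:space:]]*post_max_size[[:space:]]*=[[:space:]]*\([^[:space:];]*\).*/\1/p' "$1" | tail -n 1
}

# Report the setting for both php.ini files touched by the install script,
# skipping any file that does not exist on this machine.
for ini in /etc/php5/apache2/php.ini /etc/php5/cli/php.ini; do
    if [ -f "$ini" ]; then
        echo "$ini: post_max_size = $(get_post_max_size "$ini")"
    fi
done
```

If either file still reports the old value, the substitution did not match and the line has to be edited by hand.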

-- indexing problem
Open two shell windows. In one, switch to the Yioop bin directory and type:
php queue_server.php terminal
In the other, switch to the Yioop bin directory and type:
php fetcher.php terminal
Then go to Manage Crawls in the web interface and start a crawl.

Best,
Chris
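
For anyone who prefers a single shell over two windows, the steps above can be sketched roughly as follows. This is only a sketch under assumptions: YIOOP_BIN and DRY_RUN are made-up variable names, the default path assumes Yioop was unpacked to /var/www/yioop as in the install script earlier in the thread, and the log files under /tmp are an arbitrary choice.

```shell
#!/bin/bash
# Assumed location of the Yioop bin directory; override YIOOP_BIN if yours differs.
YIOOP_BIN="${YIOOP_BIN:-/var/www/yioop/bin}"

# Run one Yioop script in "terminal" mode from the bin directory.
# With DRY_RUN=1 the command is only printed; otherwise it is started in the
# background and its output goes to a log file named after the script.
run_in_terminal_mode() {
    local script="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cd $YIOOP_BIN && php $script terminal"
    else
        (cd "$YIOOP_BIN" && php "$script" terminal) > "/tmp/${script%.php}.log" 2>&1 &
    fi
}

# Start the queue server first, then the fetcher, as in the steps above.
start_crawl_daemons() {
    run_in_terminal_mode queue_server.php
    run_in_terminal_mode fetcher.php
}
```

Calling start_crawl_daemons and then starting a crawl from Manage Crawls should be equivalent to the two-window approach; following the two log files with tail -f shows the same messages the two windows would.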
2013-12-08

-- indexing problem
I've recreated this problem. It seems it is the same problem as listed in the thread:
viewtopic.php?f=1&t=1414
Commenting out the line listed there can serve as a temporary but not robust fix.

Best,
Chris