2016-09-11

Crawler Halting.

Hi there...
First off, very excited to be testing Yioop software for a possible project I've been wanting to revisit.
The dev server I'm testing with is a Xeon D-1521 @ 2.4 GHz, 32 GB RAM and 2x480 GB SSD drives in RAID 1, running Debian 8 64-bit.
Whatever I load into the Crawl section starts by gathering robots.txt files from the URLs and then just seemingly stops processing. I thought at first I was doing something wrong between "Allowed To Crawl Sites" and "Seed Sites", but I'm pretty sure I understand those definitions. I removed my custom entries, went back to the "Yioop Defaults" and figured I'd use that for a test crawl for a bit to see things working. Unfortunately that doesn't work for me either.
Is this a configuration issue or something with PHP?
Thanks for any assistance or thoughts ... appreciate it.
Paul

-- Crawler Halting
Hi Paul,
Thanks for your interest in Yioop. Just on the off chance that this is an issue that has already been fixed in the code, can you try getting the most recent source from
 https://www.seekquarry.com/viewgit/?a=summary&p=yioop
and running the same test? I am hoping to get out the next stable version of Yioop by the end of the month, and I have fixed a number of issues from the 3.4.0 version.
Best,
Chris
(Edited: 2016-09-11)

-- Crawler Halting
Thanks very much ... I downloaded the latest Git version and seem to be making progress with a test crawl - at least the scheduler is hitting 100% CPU on one of the cores while it sorts out what looks to be a bunch of sitemap.xml files it has discovered so far. Will continue testing and let you know if I run into anything else ... many thanks!

-- Crawler Halting
Ok so the crawl seemed to work - I aborted it and decided to restrict it to some URLs I'm interested in. When I update the "options" for the Crawl and only enter a few domains (limiting the crawl to one domain and seeding it with one domain), I run into another issue that I don't believe I encountered previously. The problem is that when I save these changes I'm logged out and have to log back in - tried it several times.
If you have any ideas I'd appreciate it - thanks! Paul
(Edited: 2016-09-11)

-- Crawler Halting
Hmm. Okay, I'll try to recreate that problem. If I can, I'll try to fix the problem in the next day or so.
Best,
Chris
2016-09-14

-- Crawler Halting
Thanks again ... just curious if you were able to replicate the issue I saw? :)

-- Crawler Halting
Hey Paul,
I am having trouble re-creating this issue. I tested going to Manage Crawls, clicking options, editing the seed info, and clicking save. It doesn't log me out though; it seems to behave normally. Is this what you were doing? I might need more info to re-create this. What OS and Apache config are you using?
Best,
Chris

-- Crawler Halting
Thanks Chris ...
Ok... let me dig into this further on my side then to make sure it's not something else. It was set up on Debian 8 64-bit with a pretty standard Apache configuration. Unfortunately it's in my dev environment and I removed it for something else in the meantime ... I'll rebuild it and test again ... also, on the off chance it's browser related or something, I'll try from another browser (I typically use Safari on Mac).
2016-09-15

-- Crawler Halting
Ok so I built a new environment using Ubuntu 16.04 LTS ... realized after installing that I should probably have stuck with Debian 8.x because the latest Ubuntu introduces PHP 7 and other stuff.
But, regardless, it's a brand new install and I get the same issue when modifying the crawl. Tested with Safari and also Firefox under Mac and IE under Windows - in case it's a browser issue.
I didn't have the problem using the previous build - the non-Git version available for download on the main website. Although with that version of course I ran into that weird bug where it wouldn't crawl at all.
Any thoughts?
Thanks, Paul

-- Crawler Halting
Hi Paul,
I have a couple of machines running Ubuntu 16.04 LTS and I just tried to recreate it on them without success. I don't suppose you could make a short mp4 screen capture video of what you are doing and drag and drop it into the discussion so I can see what's going on?
Best, Chris