2018-05-22

Which robot (bot) file is responsible for downloading web pages with cURL, and how can Yioop be modified so that it crawls the web continuously instead of sporadically?

Hello and thank you for your answer.
1 - I would now like to know the path of the robot PHP file in the Yioop script. That is, I would like to know exactly where the robot file that downloads web pages with cURL is located.
2 - My second question is why Yioop does not explore web pages continuously (because, as mentioned here (https://www.yioop.com/bot), Yioop only explores the web sporadically). Do you think existing search engines such as Google work the same way, exploring the web sporadically, or do they explore it continuously? If you think Google works by continually exploring the web, please tell me how to modify Yioop so that it also explores the web in exactly the way Google does.
Thank you.
2018-05-30

-- Which robot (bot) file is responsible for downloading web pages with cURL, and how can Yioop be modified so that it crawls the web continuously instead of sporadically?
The main program that downloads web pages is:
src/executables/Fetcher.php
The code that actually does the downloading is in:
src/library/FetchUrl.php (the getPages and getPage methods).
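To give a sense of the pattern involved, here is a minimal, self-contained sketch of downloading a batch of pages in parallel with PHP's curl_multi interface, which is the kind of cURL facility such batch fetching builds on. This is not Yioop's actual code; the function name fetchPages and the user-agent string are made up for illustration.

```php
<?php
// Simplified sketch of parallel page downloading with curl_multi.
// Not Yioop's actual implementation; fetchPages is a hypothetical name.
function fetchPages(array $urls, int $timeout = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,      // return body rather than printing it
            CURLOPT_FOLLOWLOCATION => true,      // follow redirects
            CURLOPT_TIMEOUT => $timeout,
            CURLOPT_USERAGENT => "ExampleBot/0.1", // a crawler should identify itself
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }
    // Drive all transfers until every handle has finished.
    do {
        $status = curl_multi_exec($multi, $active);
        if ($active) {
            curl_multi_select($multi); // wait for network activity, don't busy-loop
        }
    } while ($active && $status == CURLM_OK);
    // Collect the downloaded bodies, keyed by URL.
    $pages = [];
    foreach ($handles as $url => $ch) {
        $pages[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
    return $pages;
}
```

The point of the multi interface is that all the requests are in flight at once, so a batch of slow servers costs roughly one timeout rather than one timeout per URL.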
Yioop, as it is written, starts crawling when you start a crawl under Manage Crawls, and stops crawling when you click stop or it has exhausted all the urls it could find that match your search criteria. Up until 2015, I did progressively longer and longer crawls, trying to improve the crawling code. By the end of 2015, I had a billion page crawl. I wanted to then both improve the code to make the indexing faster and also wait until SSD prices came down, so that I could hold a billion pages all on SSD, so that the search results would be usably fast. Since then, I have been doing smaller scale crawls on findcan.ca to test improvements to how the software scrapes data from individual pages. As SSD prices are not dropping as fast as I'd like, my next goal for a future version of Yioop is to improve index compression. I agree that a continuous crawling set up could be cool.
Best,
Chris