Old Yioop Free Code Entry

This is the entry Yioop had on Free Code before that site's demise on June 18, 2014. The entry was last updated in December 2013.


Yioop! is a PHP search engine. It can be configured either as a general-purpose search engine for the whole Web or to provide search results for a set of URLs or domains. Yioop! can crawl pages or directly index archives such as ARC and WARC. It supports indexing several file formats, including HTML, Atom, PDF, DOC, PPT, RTF, RSS, XML, SVG, PNG, JPG, BMP, GIF, and sitemaps. The Yioop! crawler can be deployed on one or many machines. It supports one or more to-crawl scheduler processes, as well as multiple fetchers and mirrors. Crawling respects robots.txt, including Crawl-delay. Yioop! crawls are stored in a Web archive format that is easy to move around, so crawling can be done on one machine and the results deployed elsewhere. Yioop! supports mixing of crawls. Yioop! comes with a search front end that can be localized as desired using a GUI. This GUI supports RTL languages, and crawls can also be managed through it. Yioop! can be configured in a straightforward manner to use file caching or memcache if available.
Tags: Open Source, search engine, Indexing
Licenses: GPLv3
Operating Systems: Linux, Windows, Mac OS X
Implementation: Apache, gd, SQLite, curl, PHP, MySQL, PDO
Translations: English, Japanese, Korean, Vietnamese, French, Arabic, Persian
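
As a rough illustration of what respecting robots.txt and Crawl-delay involves, here is a minimal sketch using Python's standard urllib.robotparser module. It is not Yioop!'s PHP implementation; the robots.txt contents, the "ExampleBot" agent name, the example.com URLs, and the fetch_policy helper are all assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def fetch_policy(robots_txt, agent, url):
    """Return (allowed, delay) for one URL under the given robots.txt text.

    allowed: True if the agent may fetch the URL.
    delay:   the Crawl-delay in seconds, or None if none is declared.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/notes.html"):
        allowed, delay = fetch_policy(ROBOTS_TXT, "ExampleBot", url)
        print(url, "allowed" if allowed else "disallowed", "delay:", delay)
```

A polite fetcher would skip disallowed URLs and wait at least the reported delay between requests to the same host.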


Release Notes: This version improves crawl stability and has been used in a page crawl of 1/3 billion pages. The indexing plugin API was improved to allow plugins to have configure screens....
     24 Jul 2013 18:21
Release Notes: This version includes a new hybrid inverted index/suffix tree indexing scheme that should make calculating search results from future crawls faster (doesn't affect old crawls)....
     05 Apr 2013 03:19
Release Notes: This release adds a simple language called Page Rules for controlling how data is extracted from webpages during the summary creation phase of indexing. It also adds the...
     05 Jan 2013 02:08
Release Notes: This release supports materializing as new indexes query-based combinations (crawl mixes) of old search indexes. This should make query performance of crawl mixes much...
     17 Sep 2012 09:06
Release Notes: This release adds an activity to manage search media sources. For now, one can add Video and RSS sources. When configured, RSS feeds download hourly and are integrated into...
     26 Jun 2012 07:57
Release Notes: This release adds image and video subsearch abilities and improves the formatting of Yioop on smart phones. Exact phrase searching is now done using only posting list information,...
     03 May 2012 20:23
Release Notes: This release adds initial support for word suggestions as a user types in queries. The bigramming used to speed common two word queries now works with n word grams. N...
     Version 0.84
     19 Mar 2012 23:30
Release Notes: The crawler now has its own DNS caching mechanism independent of cURL's. Yioop now has a detection mechanism for when websites are becoming congested. The user can also...
     03 Feb 2012 19:31
Release Notes: This release improved scalability by allowing multiple machines to maintain portions of the "to crawl next" queue. Query processing can also be split amongst machines, with...
     07 Dec 2011 19:36
Release Notes: This version supports starting, stopping, and viewing log files of the queue server and fetchers from a Web interface. One can now inject new URLs into an active crawl via...
Release Notes: Character n-grams are now supported for many languages that did not have a stemmer. Language detection was improved and better UTF-8 preparation was provided for downloads....
     01 Oct 2011 20:32
Release Notes: This version adds a function API to get search results out of Yioop! It also improves the Open RSS Responses that Yioop! generates and allows them to contain images. The...
     09 Sep 2011 17:53
Release Notes: This release adds support for if: conditions in crawl mixes and for general searching. It improves the crawl status UI. It improves the performance of negation in queries....
     15 Aug 2011 20:20
Release Notes: This version adds support for crawling xlsx, pptx, and epub. The HTML processor now supports the base tag. The code for the front-end (not crawler) has been changed to work...
     30 Jul 2011 23:38
Release Notes: This version moves Yioop! from using a bag of words index to a positional index. Proximity scores are calculated when a multi-word query is done. The host of an inlink is...
     19 May 2011 21:15
Release Notes: This version expands the capabilities of the crawl mixing. Now you can take a query and specify that the first result should come from a particular crawl such as an open...
     28 Jan 2011 02:15
Release Notes: This version provides preliminary support for archive crawling of arc, media wiki, and open directory RDF files. It allows re-crawls of previously created Yioop! WebArchives....