-- Conjunctive Query Issue
First off, turning on caching of the whole page has no effect on what search results you get. The two values Max Page Summary Length in Bytes and Byte Range to Download under Page Options have the largest effect on what is indexed. At query time, Yioop tries to detect whether the query, or parts of it, are phrases that were stored as single terms in the index. If it finds sufficient results of this type, it doesn't bother to do a conjunctive query for that phrase; it just looks up the whole phrase as a term. The idea was to make queries faster when the index is very big (hundreds of millions of downloaded pages on a spinning hard drive). I agree this can yield weird results and is unnecessary for smallish crawls of tens of millions of pages, or if you have everything on SSD or in RAM. I just added a flag so you can turn this off and get pure conjunctive query behavior if desired. To do this, check out the version of Yioop in the git repository and add:
nsdefine('USE_CHECK_FOR_PHRASE_QUERIES', false);
to your src/configs/LocalConfig.php file. If you don't have such a file, just create it. For example, the file might contain:
<?php
namespace seekquarry\yioop\configs;
nsdefine('USE_CHECK_FOR_PHRASE_QUERIES', false);
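To make the difference between the two behaviors concrete, here is a rough, self-contained sketch. It is not Yioop's actual code: the toy index, the toySearch function and its $minPhraseResults parameter are made up for illustration, and the real check that USE_CHECK_FOR_PHRASE_QUERIES controls is more involved. It only shows why a phrase stored as a single term can hide pages that a pure conjunctive query would return.

```php
<?php
// Toy inverted index (hypothetical data): term => list of document ids.
// The phrase "new york" was also stored as a single term, but only for
// document 1; document 2 contains both words without the phrase term.
$index = [
    'new' => [1, 2, 3],
    'york' => [1, 2],
    'new york' => [1],
];

/**
 * Hypothetical helper, not part of Yioop. Returns the ids of documents
 * matching $query. With $checkPhrases on, a query that exists as a single
 * term with enough postings short-circuits the conjunctive lookup.
 */
function toySearch(array $index, string $query, bool $checkPhrases,
    int $minPhraseResults = 1): array
{
    $query = strtolower(trim($query));
    $phrasePostings = $index[$query] ?? [];
    if ($checkPhrases && count($phrasePostings) >= $minPhraseResults) {
        // Phrase shortcut: treat the whole query as one term.
        return $phrasePostings;
    }
    // Pure conjunctive query: intersect the postings of every word.
    $words = preg_split('/\s+/', $query, -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return [];
    }
    $result = null;
    foreach ($words as $word) {
        $postings = $index[$word] ?? [];
        $result = ($result === null) ? $postings
            : array_values(array_intersect($result, $postings));
    }
    return $result;
}

echo json_encode(toySearch($index, 'new york', true)), "\n";  // [1]
echo json_encode(toySearch($index, 'new york', false)), "\n"; // [1,2]
```

With the shortcut on, document 2 never shows up even though it contains both words. Turning the check off trades some speed for the full intersection.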
(Edited: 2019-04-19)