(Edited: 2019-06-29)
Today I am pleased to announce Version 6 of Yioop Software!
It's available for download from:
[[https://www.seekquarry.com/p/Downloads|Seekquarry Downloads]].
=What's New in Version 6=
*'''Crawler and Search Engine'''
** Trending keywords are now available under the More and Tools links.
** Support for multiple simultaneous crawls by assigning machines to channels and then scheduling crawls to those channels.
** Support for general repeating crawls. These crawls have a repeat frequency and two indexes, one for searching and one for crawling, and Yioop automatically switches between the two every repeat period.
** Support for crawling to a fixed depth directly, rather than by using a regex in the allowed sites to crawl.
** A dropdown lets admins control how Yioop follows robots.txt files.
** Under Page Options, you can now test how a page will be processed, supplying it by URI, File Upload, or Direct Input.
** Safe search check box added to Settings and enabled by default.
** Fixes issues with HTTP/2 crawling on Linux.
** Improves Mirror server handling.
** Removes Memcache support as a cache option for search results.
*'''Indexing and Library Functionality'''
** Width, height, EXIF, and XMP metadata are now indexed for images, and the meta words media:image-small, media:image-medium, and media:image-large have been added.
** Improved language and safe-website detection. The mul locale tag is now also supported.
** Adds a stopWordsRemover method to the Tokenizer class of every supported locale.
** New LinearAlgebra class makes term vector manipulations easier, both for summarizers and when using Yioop as a library under Composer.
** All summarizers rewritten. Each summarizer now scores every sentence before adding it to the summary, and this score is also used in ranking search results.
** A Test link added to Search Sources makes it easy to check whether a source is being downloaded correctly.
** Adds a new Scrape Podcast search source for downloading podcasts to wiki pages.
** The order in which Web Scrapers are applied is now determined by a priority field.
** Web Scrapers can now extract fields such as THUMB_URL and other meta words, for example video duration. This replaces functionality that was previously served, poorly, by video search sources.
** Removes video search sources.
** Adds a Library class with an init method to make it easier to initialize Yioop when used with Composer.
** Under Page Options, a toggle now controls whether phrase extraction, rather than just term extraction, is always done. In most circumstances, not using phrase extraction gives faster and better indexing.
** Dictionary info is no longer stored twice, once in IndexShard and once in IndexDictionary, making indexes smaller.
** Cached pages are now stored in the same object as their summary, allowing more compression when whole pages are cached.
** Removes materialized metas and largely unused thesaurus functionality.
*'''Group and Wiki System'''
** Adds a seen-media indicator to media lists, which users can reset.
** Improved inter-group links.
** If a wiki URL has 360 in its path, Yioop checks for 360-degree images and adds an Enter VR button to view them.
** The media updater now has a job that periodically downloads podcasts to a wiki page.
** Time zone, cookie name, and session token are now set under Security rather than Appearance, and the time before autologout is now controllable by admins using a dropdown.
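The repeating-crawl item above describes two indexes that trade roles every repeat period. Below is a minimal sketch of that swap in Python, assuming a simple clock-driven check. The class name `RepeatingCrawl`, its fields, and the `maybe_swap` method are hypothetical illustrations, not Yioop's API (Yioop itself is written in PHP).

```python
from dataclasses import dataclass, field


@dataclass
class RepeatingCrawl:
    """Hypothetical double-buffered index pair for a repeating crawl."""
    repeat_period: float                 # seconds between role swaps
    last_swap: float = 0.0               # time of the most recent swap
    indexes: list = field(default_factory=lambda: ["index_a", "index_b"])
    search_slot: int = 0                 # which index currently serves searches

    @property
    def search_index(self) -> str:
        return self.indexes[self.search_slot]

    @property
    def crawl_index(self) -> str:
        # The other index is the one being (re)built by the crawler.
        return self.indexes[1 - self.search_slot]

    def maybe_swap(self, now: float) -> bool:
        """Swap roles once a full repeat period has elapsed; return True if swapped."""
        if now - self.last_swap >= self.repeat_period:
            self.search_slot = 1 - self.search_slot
            self.last_swap = now
            return True
        return False
```

In this sketch, at each swap the freshly crawled index becomes searchable and the previous search index is handed back to the crawler. Searches therefore never hit a half-built index.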
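The fixed-depth crawling item above replaces a regex over allowed sites with a depth limit. Below is a minimal breadth-first sketch of that idea in Python. The function `crawl_to_depth` and the `get_links` callback (a stand-in for fetching a page and extracting its links) are hypothetical and are not Yioop functions. The depth convention, with seed URLs at depth 0, is also an assumption.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def crawl_to_depth(seeds: Iterable[str],
                   get_links: Callable[[str], Iterable[str]],
                   max_depth: int) -> Dict[str, int]:
    """Breadth-first crawl returning each reached URL with its link depth.

    Seeds are depth 0; links found on a page at depth d get depth d + 1,
    and a page's links are only followed while d < max_depth.
    """
    depth: Dict[str, int] = {}
    queue = deque()
    for url in seeds:
        if url not in depth:
            depth[url] = 0
            queue.append(url)
    while queue:
        url = queue.popleft()
        if depth[url] >= max_depth:
            continue  # at the depth limit: keep the page, don't follow its links
        for link in get_links(url):
            if link not in depth:  # first visit gives the shortest depth in BFS
                depth[link] = depth[url] + 1
                queue.append(link)
    return depth
```

For example, with the link graph `{"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"]}` and `max_depth=2`, the crawl reaches `a` through `e` but never `f`.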
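The summarizer item above says each sentence is scored before it is added to the summary. The sketch below shows one common way to do that: stop words are removed, then each sentence is scored by the cosine similarity between its term vector and the whole document's term vector. This technique was chosen for illustration and is not necessarily what Yioop's summarizers compute. The function names and the tiny stop-word list are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stop-word list; a real tokenizer would use a per-locale list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "for", "on"}


def terms(text: str) -> List[str]:
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def summarize(sentences: List[str], k: int) -> List[Tuple[str, float]]:
    """Score every sentence against the whole document; keep the top k in original order."""
    doc_vector = Counter(t for s in sentences for t in terms(s))
    scored = [(i, s, cosine(Counter(terms(s)), doc_vector))
              for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda x: x[2], reverse=True)[:k]
    # Re-sort the chosen sentences by position so the summary reads in order.
    return [(s, score) for _, s, score in sorted(top)]
```

Keeping each sentence's score alongside it mirrors the changelog's note that the score is reused later. In the sketch, the caller could pass it on to a ranking step.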