amaltsev/XAO-Indexer

NAME
    XAO::Indexer -- Full text data indexing for XAO::FS

SYNOPSIS
     # Assumes $cgi is a CGI object and $odb a connected XAO::FS database
     my $keywords=$cgi->param('keywords');
     my $cn_index=$odb->fetch('/Indexes/customer_names');
     my $sr=$cn_index->search_by_string('name',$keywords);

DESCRIPTION
    XAO Indexer lets you build an optimised external index for collections
    of data stored in a XAO::FS database and then perform keyword-based
    searches on them.

    It is used with great success on collections of millions of records on
    some sites; probably the most notable is <http://ISBNdb.com/>, where it
    powers all the searches.

PROBLEM & SOLUTION
    Searches are limited to keywords only, but they can find several
    keywords in a specific sequence, or several keywords that all belong to
    one item of the indexed collection even when they occur in different
    properties of different objects.

    To perform the same kind of search directly in SQL on just two
    properties of an object and two keywords, an expression similar to the
    following is required:

     ( (property1 match keyword1) and (property1 match keyword2) ) or
     ( (property1 match keyword1) and (property2 match keyword2) ) or
     ( (property2 match keyword1) and (property2 match keyword2) ) or
     ( (property2 match keyword1) and (property1 match keyword2) )

    With more keywords and properties the expression grows very quickly and
    becomes too big to be handled efficiently by an SQL server, and in some
    cases probably too big even to be parsed normally.

    In addition, SQL databases usually do not optimise such keyword
    searches, and these searches frequently involve full table scans.
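
    To see how quickly this grows, the following Perl sketch generates
    such an expression for arbitrary lists of properties and keywords
    (naive_clauses() is a hypothetical helper, not part of XAO Indexer).
    Each clause assigns one property to every keyword, so P properties and
    K keywords produce P**K clauses of K terms each; for two properties
    and two keywords it yields the same four clauses as above, in a
    slightly different order.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the clauses of the OR-of-ANDs condition that
# matches every keyword in any of the given properties. Each clause assigns
# one property to each keyword, so P properties and K keywords give P**K.
sub naive_clauses {
    my ($properties, $keywords) = @_;
    my @partial = ([]);
    for my $kw (@$keywords) {
        # Extend every partial clause with each possible property for $kw.
        @partial = map {
            my $clause = $_;
            map { [ @$clause, "($_ match $kw)" ] } @$properties;
        } @partial;
    }
    return map { '( ' . join(' and ', @$_) . ' )' } @partial;
}

# The two-property, two-keyword case from the text.
print join(" or\n", naive_clauses([qw(property1 property2)],
                                  [qw(keyword1 keyword2)])), "\n";

# Five properties and four keywords already need 5**4 clauses.
my @clauses = naive_clauses([map { 'property' . $_ } 1 .. 5],
                            [map { 'keyword' . $_ } 1 .. 4]);
print scalar(@clauses), "\n";    # 625
```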

    XAO Indexer solves this problem by pre-building a specially formatted
    index table that holds the results for each keyword. As an additional
    benefit, it can return results pre-sorted by some (possibly computed)
    criteria without any performance impact at search time.
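
    The idea behind a pre-built keyword index can be illustrated with a
    small inverted index in Perl. This is only a sketch under stated
    assumptions, not XAO Indexer's actual table layout or API: the sample
    records, the rank field and the build_index()/search_index() helpers
    are all hypothetical. Records are visited in rank order while
    building, so every keyword's ID list is already sorted and searching
    never has to sort; word positions are stored so that a query can also
    require keywords to appear in sequence.

```perl
use strict;
use warnings;

# Hypothetical sample collection: id => fields, plus a precomputed rank
# used as the sort order. Not real XAO::FS data.
my %records = (
    1 => { name => 'John Smith',   city => 'Boston',   rank => 3 },
    2 => { name => 'Jane Smith',   city => 'New York', rank => 1 },
    3 => { name => 'Smith John',   city => 'Chicago',  rank => 2 },
    4 => { name => 'Mary Johnson', city => 'Boston',   rank => 4 },
);

# Build step (done once, ahead of searches). Records are visited in rank
# order, so every keyword's id list comes out already sorted.
sub build_index {
    my ($recs, $fields, $sort_key) = @_;
    my (%ids, %positions);
    for my $id (sort { $recs->{$a}{$sort_key} <=> $recs->{$b}{$sort_key} }
                keys %$recs) {
        my $pos = 0;
        for my $field (@$fields) {
            for my $word (grep { length } split /\W+/, lc $recs->{$id}{$field}) {
                push @{ $ids{$word} }, $id unless $positions{$word}{$id};
                push @{ $positions{$word}{$id} }, $pos++;
            }
            $pos++;    # gap so that a sequence cannot span two fields
        }
    }
    return { ids => \%ids, positions => \%positions };
}

# Search step: every keyword must occur in the record; with $in_sequence
# they must also be adjacent and in the given order.
sub search_index {
    my ($index, $query, $in_sequence) = @_;
    my @words = grep { length } split /\W+/, lc $query;
    return () if !@words || grep { !$index->{ids}{$_} } @words;
    my $pos = $index->{positions};
    my @found;
    # Walking one pre-sorted list keeps the rank order: no sort needed here.
    ID: for my $id (@{ $index->{ids}{ $words[0] } }) {
        for my $w (@words[1 .. $#words]) {
            next ID unless $pos->{$w}{$id};
        }
        if ($in_sequence) {
            my $ok = 0;
            START: for my $p (@{ $pos->{ $words[0] }{$id} }) {
                for my $i (1 .. $#words) {
                    next START
                        unless grep { $_ == $p + $i } @{ $pos->{ $words[$i] }{$id} };
                }
                $ok = 1;
                last START;
            }
            next ID unless $ok;
        }
        push @found, $id;
    }
    return @found;
}

my $index = build_index(\%records, [qw(name city)], 'rank');
print join(',', search_index($index, 'smith')), "\n";           # 2,3,1
print join(',', search_index($index, 'john smith')), "\n";      # 3,1
print join(',', search_index($index, 'john smith', 1)), "\n";   # 1
print join(',', search_index($index, 'smith boston')), "\n";    # 1
```

    The last query finds keywords that sit in different fields of the same
    record. A more efficient version would walk the shortest keyword list
    instead of the first one; the result order would not change, because
    all lists share the same rank order.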

    Note, however, that XAO Indexer is not integrated in any way with the
    collection it indexes. The index has to be maintained and updated
    manually, and it can return IDs of objects that no longer exist in the
    database.
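
    Since the index may lag behind the data, code that displays search
    results may want to drop IDs that have since been deleted. A minimal
    Perl sketch, assuming a hypothetical $exists callback that checks
    whether an ID is still present in the source collection (how that
    check is done depends on your application); filtering with grep keeps
    the index's pre-sorted order intact:

```perl
use strict;
use warnings;

# Keeps only the IDs that still exist, preserving their order.
# $exists is a hypothetical callback supplied by the application.
sub live_ids {
    my ($ids, $exists) = @_;
    return grep { $exists->($_) } @$ids;
}

# Hypothetical data: the index still lists id 7, which was deleted.
my @from_index = (2, 7, 3, 1);
my %current    = map { $_ => 1 } (1, 2, 3);
my @shown      = live_ids(\@from_index, sub { exists $current{ $_[0] } });
print join(',', @shown), "\n";    # 2,3,1
```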

    Rebuilding an index can take significant time depending on the content
    of the source collection. In our tests it takes approximately 5 minutes
    to build an index over 60,000 records with 5 to 50 fields per record,
    spread over 3 or more related objects (products, categories and
    specifications).

  STRUCTURE
    XAO::Indexer is a stub module that only holds the common documentation
    you are reading now. The actual functionality is provided by:

    XAO::DO::Data::Index
        This is a XAO::FS Hash object that is stored in some container in
        your database, usually /Indexes. It provides wrapper methods for all
        indexing functionality; see XAO::DO::Data::Index for details.

        This is the object your code will interact with most of the time,
        for example:

         my $keywords=$cgi->param('keywords');
         my $cn_index=$odb->fetch('/Indexes/customer_names');
         my $sr=$cn_index->search_by_string('name',$keywords);

    XAO::DO::Indexer::Base
        This is the core of XAO Indexer: a base class for indexers specific
        to a particular data collection. Usually it is enough to override
        just a few of its methods: analyze_object(), get_collection() and
        get_orderings(). See XAO::DO::Indexer::Base for details.

    xao-indexer script
        Provides command-line functions to create, update and delete
        indexes. It also provides a simple search function, intended mainly
        for debugging.
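
    The override-a-few-methods design of XAO::DO::Indexer::Base resembles
    the template-method pattern: a generic driver in the base class calls
    collection-specific hooks supplied by a derived class. The
    self-contained Perl sketch below illustrates only that pattern.
    Sketch::IndexerBase and Sketch::CustomerNames are hypothetical
    packages, the hooks merely borrow the names get_collection() and
    analyze_object(), and their arguments and return values are invented
    here; get_orderings() is left out for brevity. Consult
    XAO::DO::Indexer::Base for the real interface.

```perl
use strict;
use warnings;

# Hypothetical base class: NOT the real XAO::DO::Indexer::Base API.
package Sketch::IndexerBase;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Generic driver: fetches objects, extracts keywords, groups ids by keyword.
sub build {
    my $self = shift;
    my %index;
    my $coll = $self->get_collection;
    for my $id (sort { $a <=> $b } keys %$coll) {
        my %seen;
        for my $word ($self->analyze_object($coll->{$id})) {
            push @{ $index{$word} }, $id unless $seen{$word}++;
        }
    }
    return \%index;
}

# Hooks that a derived class must provide (invented signatures).
sub get_collection { die ref(shift) . " must override get_collection\n" }
sub analyze_object { die ref(shift) . " must override analyze_object\n" }

# Hypothetical collection-specific indexer: only the hooks are implemented.
package Sketch::CustomerNames;
use parent -norequire, 'Sketch::IndexerBase';

sub get_collection {
    my $self = shift;
    return $self->{data};
}

sub analyze_object {
    my ($self, $obj) = @_;
    return grep { length } split /\W+/, lc $obj->{name};
}

package main;

my $indexer = Sketch::CustomerNames->new(
    data => { 1 => { name => 'John Smith' }, 2 => { name => 'Jane Smith' } },
);
my $index = $indexer->build;
print join(',', @{ $index->{smith} }), "\n";    # 1,2
```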

AUTHORS
    Copyright (c) 2005 Andrew Maltsev

    Copyright (c) 2003-2004 Andrew Maltsev, XAO Inc.

    <[email protected]> -- http://ejelta.com/xao/

SEE ALSO
    Recommended reading: XAO::DO::Data::Index, XAO::DO::Indexer::Base,
    XAO::FS, XAO::Web.
