Provided by: liblucy-perl_0.3.3-8_amd64

NAME

       Lucy::Index::Indexer - Build inverted indexes.

SYNOPSIS

           my $indexer = Lucy::Index::Indexer->new(
               schema => $schema,
               index  => '/path/to/index',
               create => 1,
           );
           while ( my ( $title, $content ) = each %source_docs ) {
               $indexer->add_doc({
                   title   => $title,
                   content => $content,
               });
           }
           $indexer->commit;

DESCRIPTION

       The Indexer class is Apache Lucy's primary tool for managing the content of inverted indexes, which may
       later be searched using IndexSearcher.

       In general, only one Indexer at a time may write to an index safely.  If a write lock cannot be secured,
       new() will throw an exception.
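
       The locking behavior described above can be sketched as follows.  This is only an illustration: the
       one-field schema and the in-memory RAMFolder are assumptions chosen to keep the example self-contained,
       and the second new() call may block for the length of the write lock timeout before it fails.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Store::RAMFolder;

# Hypothetical one-field schema and in-memory index, for illustration only.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'title', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

# The first Indexer secures the write lock for this index.
my $first = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $folder,
    create => 1,
);

# While $first is still open, a second writer on the same index cannot
# secure the lock.  Trap the exception instead of letting it end the program.
my $second = eval {
    Lucy::Index::Indexer->new( schema => $schema, index => $folder );
};
my $lock_error = $@;
print "Second Indexer refused: $lock_error\n" unless $second;

# The original writer is unaffected and can finish its session.
$first->add_doc( { title => 'only writer' } );
$first->commit;
```

       Wrapping new() in eval lets a writer application retry later or report the conflict, rather than dying
       on the exception.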

       If an index is located on a shared volume, each writer application must identify itself by supplying an
       IndexManager with a unique "host" id to Indexer's constructor; otherwise, index corruption will occur.
       See Lucy::Docs::FileLocking for a detailed discussion.
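
       A minimal sketch of supplying such an IndexManager follows.  It assumes that the machine's hostname is
       unique among the writers sharing the volume; the temporary directory is only a stand-in for a real
       shared location, and the one-field schema is hypothetical.

```perl
use strict;
use warnings;
use Sys::Hostname qw( hostname );
use File::Temp qw( tempdir );
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::IndexManager;
use Lucy::Index::Indexer;

# Stand-in for a directory on a shared volume (hypothetical location).
my $index_path = tempdir( CLEANUP => 1 );

# Hypothetical one-field schema, for illustration only.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'title', type => Lucy::Plan::StringType->new );

# Use this machine's hostname as the unique "host" id for this writer.
my $hostname = hostname() or die "Can't get unique hostname";
my $manager  = Lucy::Index::IndexManager->new( host => $hostname );

# Pass the manager to the constructor so the id is attached to this writer.
my $indexer = Lucy::Index::Indexer->new(
    schema  => $schema,
    index   => $index_path,
    create  => 1,
    manager => $manager,
);
$indexer->add_doc( { title => "written from $hostname" } );
$indexer->commit;
```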

       Note: at present, delete_by_term() and delete_by_query() only affect documents that were previously
       committed to the index, not documents added during the current indexing session but not yet committed.
       This may change in a future update.
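
       The following sketch illustrates this limitation under the behavior documented here.  The "id" field,
       its values and the in-memory RAMFolder are assumptions made for the example.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;
use Lucy::Store::RAMFolder;

# Hypothetical schema with a single exact-match "id" field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

# Session 1: commit a document with id "a".
my $session1 = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $folder,
    create => 1,
);
$session1->add_doc( { id => 'a' } );
$session1->commit;

# Session 2: add a document with id "b", then request deletion of both ids.
# Only "a" was committed before this session began.
my $session2 = Lucy::Index::Indexer->new( index => $folder );
$session2->add_doc( { id => 'b' } );
$session2->delete_by_term( field => 'id', term => $_ ) for qw( a b );
$session2->commit;

# Count the surviving documents for each id.
my $searcher = Lucy::Search::IndexSearcher->new( index => $folder );
my %hits;
for my $id (qw( a b )) {
    my $query = Lucy::Search::TermQuery->new( field => 'id', term => $id );
    $hits{$id} = $searcher->hits( query => $query )->total_hits;
}
printf "id %s: %d hit(s)\n", $_, $hits{$_} for sort keys %hits;
```

       Under the documented behavior, the document with id "a" is removed while the one with id "b", added in
       the same session as the delete, remains in the index.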

CONSTRUCTORS

   new( [labeled params] )
           my $indexer = Lucy::Index::Indexer->new(
               schema   => $schema,             # required at index creation
               index    => '/path/to/index',    # required
               create   => 1,                   # default: 0
               truncate => 1,                   # default: 0
               manager  => $manager             # default: created internally
           );

       •   schema - A Schema.  Required when the index is being created; if not supplied, it will be extracted
           from the index folder.

       •   index - Either a filepath to an index or a Folder.

       •   create - If true and the index directory does not exist, attempt to create it.

       •   truncate - If true, proceed with the intention of discarding all previous indexing data.  The old
           data will remain intact and visible until commit() succeeds.

       •   manager - An IndexManager.
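
       As a rough sketch of how these parameters interact, the example below creates an index, then reopens it
       without a schema and with truncate enabled.  The field name, document values, the count_title() helper
       and the in-memory RAMFolder are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;
use Lucy::Store::RAMFolder;

my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'title', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

# Hypothetical helper: count documents whose title equals $title,
# using a freshly opened searcher each time.
sub count_title {
    my $title    = shift;
    my $searcher = Lucy::Search::IndexSearcher->new( index => $folder );
    my $query = Lucy::Search::TermQuery->new( field => 'title', term => $title );
    return $searcher->hits( query => $query )->total_hits;
}

# Creating the index: a schema is required.
my $creator = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $folder,
    create => 1,
);
$creator->add_doc( { title => 'old' } );
$creator->commit;

# Rebuilding: the schema is omitted and read back from the index folder;
# truncate declares the intention to discard the previous data.
my $rebuilder = Lucy::Index::Indexer->new(
    index    => $folder,
    truncate => 1,
);
$rebuilder->add_doc( { title => 'new' } );

my $old_before_commit = count_title('old');    # old data still visible
$rebuilder->commit;
my $old_after_commit = count_title('old');     # old data discarded
my $new_after_commit = count_title('new');
```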

METHODS

   add_doc(...)
           $indexer->add_doc($doc);
           $indexer->add_doc( { field_name => $field_value } );
           $indexer->add_doc(
               doc   => { field_name => $field_value },
               boost => 2.5,         # default: 1.0
           );

       Add a document to the index.  Accepts either a single argument or labeled params.

       •   doc - Either a Lucy::Document::Doc object, or a hashref (which will be attached to a
           Lucy::Document::Doc object internally).

       •   boost - A floating point weight which affects how this document scores.
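
       The three calling styles can be exercised together in a short, self-contained sketch.  The schema, the
       field names and the use of an in-memory RAMFolder are assumptions for the sake of the example.

```perl
use strict;
use warnings;
use Lucy::Analysis::PolyAnalyzer;
use Lucy::Document::Doc;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Store::RAMFolder;

# Hypothetical schema: an exact-match title and an analyzed content field.
my $schema   = Lucy::Plan::Schema->new;
my $analyzer = Lucy::Analysis::PolyAnalyzer->new( language => 'en' );
$schema->spec_field( name => 'title', type => Lucy::Plan::StringType->new );
$schema->spec_field(
    name => 'content',
    type => Lucy::Plan::FullTextType->new( analyzer => $analyzer ),
);

my $folder  = Lucy::Store::RAMFolder->new;
my $indexer = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $folder,
    create => 1,
);

# 1. Single argument: a plain hashref of field values.
$indexer->add_doc( { title => 'hashref', content => 'added as a hashref' } );

# 2. Single argument: a Lucy::Document::Doc object.
my $doc = Lucy::Document::Doc->new(
    fields => { title => 'object', content => 'added as a Doc object' },
);
$indexer->add_doc($doc);

# 3. Labeled params: a document plus a boost greater than the default 1.0.
$indexer->add_doc(
    doc   => { title => 'boosted', content => 'added with a boost' },
    boost => 2.5,
);

$indexer->commit;

# doc_max reports how many documents the index now holds.
my $doc_count = Lucy::Search::IndexSearcher->new( index => $folder )->doc_max;
print "Indexed $doc_count documents\n";
```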

   add_index(index)
       Absorb an existing index into this one.  The two indexes must have matching Schemas.

       •   index - Either an index path name or a Folder.
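
       A minimal sketch of merging two indexes follows.  Both are built from the same hypothetical one-field
       schema so that their Schemas match; build_index() is a helper invented for this example, and the
       indexes live in RAMFolders purely for convenience.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Store::RAMFolder;

my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'title', type => Lucy::Plan::StringType->new );

# Hypothetical helper: build a small in-memory index from a list of titles.
sub build_index {
    my @titles  = @_;
    my $folder  = Lucy::Store::RAMFolder->new;
    my $indexer = Lucy::Index::Indexer->new(
        schema => $schema,
        index  => $folder,
        create => 1,
    );
    $indexer->add_doc( { title => $_ } ) for @titles;
    $indexer->commit;
    return $folder;
}

my $main_index  = build_index('main doc');
my $other_index = build_index('other doc');

# Absorb the second index into the first; the change takes effect on commit.
my $merger = Lucy::Index::Indexer->new( index => $main_index );
$merger->add_index($other_index);
$merger->commit;

my $total = Lucy::Search::IndexSearcher->new( index => $main_index )->doc_max;
print "Main index now holds $total documents\n";
```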

   optimize()
       Optimize the index for search-time performance.  This may take a while, as it can involve rewriting large
       amounts of data.

   commit()
       Commit any changes made to the index.  Until this is called, none of the changes made during an indexing
       session are permanent.

       Calling commit() invalidates the Indexer, so if you want to make more changes you'll need a new one.

   prepare_commit()
       Perform the expensive setup for commit() in advance, so that commit() completes quickly.  (If
       prepare_commit() is not called explicitly by the user, commit() will call it internally.)
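
       One possible use of the two-step form is to do the slow work first and finalize only once some other
       condition is met.  The sketch below assumes a hypothetical one-field schema and an in-memory RAMFolder;
       the "other work" is represented only by a comment.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Store::RAMFolder;

my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'title', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

my $indexer = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $folder,
    create => 1,
);
$indexer->add_doc( { title => $_ } ) for qw( first second );

# Step 1: perform the expensive setup ahead of time.
$indexer->prepare_commit;

# ... other work that should finish before the index changes become
# permanent could go here (hypothetical, e.g. updating another data store).

# Step 2: finalize.  The expensive part has already been done.
$indexer->commit;

my $doc_count = Lucy::Search::IndexSearcher->new( index => $folder )->doc_max;
print "Committed $doc_count documents\n";
```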

   delete_by_term( [labeled params] )
       Mark documents which contain the supplied term as deleted, so that they will be excluded from search
       results and eventually removed altogether.  The change is not apparent to search apps until after
       commit() succeeds.

       •   field - The name of an indexed field.  (If the field is not specified as "indexed", an error will occur.)

       •   term - The term which identifies docs to be marked as deleted.  If "field" is associated with an
           Analyzer, "term" will be processed automatically (so don't pre-process it yourself).
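
       A common way to use delete_by_term() is to replace a document identified by a unique key.  The sketch
       below assumes a hypothetical schema in which "id" is an exact-match StringType field (so no Analyzer is
       involved), and uses an in-memory RAMFolder.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;
use Lucy::Store::RAMFolder;

# Hypothetical schema: a unique "id" key plus a "title".
my $schema = Lucy::Plan::Schema->new;
my $type   = Lucy::Plan::StringType->new;
$schema->spec_field( name => 'id',    type => $type );
$schema->spec_field( name => 'title', type => $type );
my $folder = Lucy::Store::RAMFolder->new;

# Session 1: commit the original version of document 42.
my $session1 = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $folder,
    create => 1,
);
$session1->add_doc( { id => '42', title => 'draft' } );
$session1->commit;

# Session 2: mark the committed version as deleted, then add its replacement.
my $session2 = Lucy::Index::Indexer->new( index => $folder );
$session2->delete_by_term( field => 'id', term => '42' );
$session2->add_doc( { id => '42', title => 'final' } );
$session2->commit;

# After the commit, a search on the key finds only the replacement.
my $searcher = Lucy::Search::IndexSearcher->new( index => $folder );
my $query = Lucy::Search::TermQuery->new( field => 'id', term => '42' );
my $hits  = $searcher->hits( query => $query );
my $total = $hits->total_hits;
my $hit   = $hits->next;
my $title = $hit ? $hit->{title} : undef;
```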

   delete_by_query(query)
       Mark documents which match the supplied Query as deleted.

       •   query - A Query.
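
       As an illustration, the sketch below deletes every document in one category by passing a TermQuery.
       The "category" field, its values, the count_category() helper and the in-memory RAMFolder are
       assumptions made for the example.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Index::Indexer;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;
use Lucy::Store::RAMFolder;

my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'category', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

# Hypothetical helper: count documents in a category with a fresh searcher.
sub count_category {
    my $category = shift;
    my $searcher = Lucy::Search::IndexSearcher->new( index => $folder );
    my $query    = Lucy::Search::TermQuery->new(
        field => 'category',
        term  => $category,
    );
    return $searcher->hits( query => $query )->total_hits;
}

# Session 1: commit three documents in two categories.
my $session1 = Lucy::Index::Indexer->new(
    schema => $schema,
    index  => $folder,
    create => 1,
);
$session1->add_doc( { category => $_ } ) for qw( ham spam ham );
$session1->commit;

# Session 2: mark every committed document matching the query as deleted.
my $spam_query = Lucy::Search::TermQuery->new(
    field => 'category',
    term  => 'spam',
);
my $session2 = Lucy::Index::Indexer->new( index => $folder );
$session2->delete_by_query($spam_query);
$session2->commit;

my $ham_left  = count_category('ham');
my $spam_left = count_category('spam');
print "ham: $ham_left, spam: $spam_left\n";
```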

INHERITANCE

       Lucy::Index::Indexer isa Lucy::Object::Obj.