Provided by: sphinxsearch_2.2.11-8build1_amd64

NAME

       indexer - Sphinxsearch fulltext index generator

SYNOPSIS

       indexer [--config CONFIGFILE] [--rotate] [--noprogress | --quiet] [--all | INDEX | ...]

       indexer --buildstops OUTPUTFILE COUNT [--config CONFIGFILE] [--noprogress | --quiet]
               [--all | INDEX | ...]

       indexer --merge MAIN_INDEX DELTA_INDEX [--config CONFIGFILE] [--rotate] [--noprogress |
               --quiet]

DESCRIPTION

       Sphinx is a collection of programs that aim to provide high quality fulltext search.

       indexer is the first of the two principal tools in Sphinx. Invoked either directly from
       the command line or as part of a larger script, indexer is solely responsible for
       gathering the data that will be searchable.

       The calling syntax for indexer is as follows:

           $ indexer [OPTIONS] [indexname1 [indexname2 [...]]]

       Essentially, you list the possible indexes (which you later make available to search) in
       sphinx.conf; when calling indexer, you must at a minimum tell it which index (or indexes)
       to build.

       If sphinx.conf contained details on 2 indexes, mybigindex and mysmallindex, you could do
       the following:

           $ indexer mybigindex
           $ indexer mysmallindex mybigindex

       As part of the configuration file, sphinx.conf, you specify one or more indexes for your
       data. You might call indexer to reindex one of them ad hoc, or you can tell it to process
       all indexes. You are not limited to calling just one or all at once; you can always pick
       some combination of the available indexes.
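
       To see which index names are available to pass on the command line, the index sections
       of the configuration file can be listed. The following is a minimal sketch and not part
       of indexer: list_indexes is a hypothetical helper, and it assumes each index is declared
       on its own line as 'index NAME' or 'index NAME : PARENT'.

```shell
# Hypothetical helper: print the index names declared in a sphinx.conf.
# Assumes declarations of the form "index NAME" or "index NAME : PARENT".
list_indexes() {
    awk '$1 == "index" && NF >= 2 {
        name = $2
        sub(/[:{].*$/, "", name)    # drop a trailing ":PARENT" or "{"
        if (name != "") print name
    }' "$1"
}
```

       For instance, list_indexes /home/myuser/sphinx.conf would print one index name per line,
       any of which could then be given to indexer.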

OPTIONS

       The majority of the options for indexer are given in the configuration file. However,
       some options might need to be specified on the command line as well, because they affect
       how the indexing operation is performed. These options are:

       --all
           Tells indexer to update every index listed in sphinx.conf, so that you do not have to
           list individual indexes. This is useful in small configurations, or in cron-type or
           maintenance jobs where the entire index set gets rebuilt each day, or week, or
           whatever period is best.

           Example usage:

               $ indexer --config /home/myuser/sphinx.conf --all

       --buildstops outfile.txt NUM
           Reviews the index source, as if it were indexing the data, and produces a list of the
           terms that are being indexed. In other words, it produces a list of all the searchable
           terms that are becoming part of the index. Note: it does not update the index in
           question; it simply processes the data 'as if' it were indexing, including running
           queries defined with sql_query_pre or sql_query_post. outfile.txt will contain the
           list of words, one per line, sorted by frequency with the most frequent first, and NUM
           specifies the maximum number of words that will be listed; if NUM is large enough to
           encompass every word in the index, only as many words as the index contains will be
           returned. Such a dictionary list could be used for client application features around
           "Did you mean..." functionality, usually in conjunction with --buildfreqs, below.

           Example:

               $ indexer myindex --buildstops word_freq.txt 1000

           This would produce a file in the current directory, word_freq.txt, with the 1,000
           most common words in 'myindex', ordered by most common first. Note that when multiple
           indexes or --all are specified, the file will pertain to the last index indexed (i.e.
           the last one listed in the configuration file).

       --buildfreqs
           Used together with --buildstops (and ignored if --buildstops is not specified). As
           --buildstops provides the list of words used within the index, --buildfreqs adds the
           quantity present in the index, which is useful in establishing whether certain words
           should be considered stopwords because they are too prevalent. It will also help with
           developing "Did you mean..." features, where you can see how much more common a given
           word is compared to another, similar one.

           Example:

               $ indexer myindex --buildstops word_freq.txt 1000 --buildfreqs

           This would produce word_freq.txt as above; however, each word would be followed by
           the number of times it occurred in the index in question.
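
           As a rough illustration of the stopword use case, the frequency file can be filtered
           for words at or above some count. This is only a sketch: stopword_candidates is a
           hypothetical helper, and it assumes each line of the file holds a word followed by
           its count, separated by whitespace.

```shell
# Hypothetical helper: print words whose count is at least MIN_COUNT.
# Assumes each line of the file is "WORD COUNT", separated by whitespace.
stopword_candidates() {
    freq_file=$1
    min_count=$2
    awk -v min="$min_count" 'NF >= 2 && $2 + 0 >= min + 0 { print $1 }' "$freq_file"
}
```

           For instance, stopword_candidates word_freq.txt 10000 would list the words that
           occur at least 10,000 times; the threshold is arbitrary and depends on the data.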

       --config CONFIGFILE, -c CONFIGFILE
           Use the given file as configuration. Normally, indexer will look for sphinx.conf in
           the installation directory (e.g. /usr/local/sphinx/etc/sphinx.conf if installed into
           /usr/local/sphinx), followed by the current directory you are in when calling indexer
           from the shell. This option is most useful in shared environments where the binary
           files are installed somewhere like /usr/local/sphinx/ but you want to give users the
           ability to make their own custom Sphinx set-ups, or if you want to run multiple
           instances on a single server. In cases like those you could allow them to create their
           own sphinx.conf files and pass them to indexer with this option.

           For example:

               $ indexer --config /home/myuser/sphinx.conf myindex
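
           In a shared environment, the same idea can be applied to several users in one pass.
           The sketch below is an assumption about how such a set-up might look, not a
           documented workflow: reindex_user_configs is a hypothetical helper, the
           HOME/sphinx.conf layout is assumed, and the INDEXER variable exists only so that a
           different command can be substituted for a dry run.

```shell
# INDEXER can be overridden (e.g. INDEXER=echo for a dry run).
INDEXER=${INDEXER:-indexer}

# Hypothetical helper: run indexer once for each directory that has a sphinx.conf.
# The DIR/sphinx.conf layout and the use of --all are assumptions.
reindex_user_configs() {
    status=0
    for home_dir in "$@"; do
        conf="$home_dir/sphinx.conf"
        [ -f "$conf" ] || continue      # skip directories without a config
        "$INDEXER" --config "$conf" --all || status=1
    done
    return "$status"
}
```

           For instance, reindex_user_configs /home/alice /home/bob (placeholder paths) would
           rebuild every index defined in each user's own configuration file.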

       --dump-rows FILE
           Dumps rows fetched by SQL source(s) into the specified file, in a MySQL compatible
           syntax. The resulting dumps are the exact representation of the data as received by
           indexer and help to reproduce indexing-time issues.

       --merge DST-INDEX SRC-INDEX
           Physically merges two indexes together. For example, if you have a main+delta scheme,
           where the main index rarely changes but the delta index is rebuilt frequently,
           --merge would be used to combine the two. The operation moves from right to left: the
           contents of SRC-INDEX get examined and physically combined with the contents of
           DST-INDEX, and the result is left in DST-INDEX. In pseudo-code, it might be expressed
           as: DST-INDEX += SRC-INDEX

           An example:

               $ indexer --merge main delta --rotate

           In the above example, where main is the master, rarely modified index, and delta is
           the more frequently modified one, you might use the above to call indexer to combine
           the contents of the delta into the main index and rotate the indexes.

       --merge-dst-range ATTR MIN MAX
           Applies the given range filter while merging. Specifically, as the merge is applied
           to the destination index (as part of --merge; the option is ignored if --merge is not
           specified), indexer will also filter the documents ending up in the destination
           index, and only documents that pass the given filter will end up in the final index.
           This could be used, for example, in an index where there is a 'deleted' attribute,
           where 0 means 'not deleted'. Such an index could be merged with:

               $ indexer --merge main delta --merge-dst-range deleted 0 0

           Any documents marked as deleted (value 1) would be removed from the newly-merged
           destination index. It can be added several times to the command line, to add
           successive filters to the merge, all of which must be met in order for a document to
           become part of the final index.
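
           The filtered merge can be wrapped for repeated use, for example in a maintenance
           script. This is a minimal sketch: merge_without_deleted is a hypothetical helper, the
           'deleted' attribute is taken from the example above, and the INDEXER variable exists
           only so that a different command can be substituted for a dry run.

```shell
# INDEXER can be overridden (e.g. INDEXER=echo for a dry run).
INDEXER=${INDEXER:-indexer}

# Hypothetical helper: merge a delta index into a main index, keeping only
# documents whose 'deleted' attribute is 0, then rotate the result.
merge_without_deleted() {
    main_index=$1
    delta_index=$2
    "$INDEXER" --merge "$main_index" "$delta_index" \
        --merge-dst-range deleted 0 0 --rotate
}
```

           For instance, merge_without_deleted main delta would run the merge shown above with
           --rotate added.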

       --merge-killlists, --merge-klists
           Used together with --merge. Usually, when merging, indexer uses the kill-list of the
           source index (i.e., the one being merged into the destination) as a filter to wipe
           out the matching documents from the destination index. At the same time, the
           kill-list of the destination itself isn't touched at all. When using
           --merge-killlists (or its shorter form --merge-klists), indexer will not filter the
           dst-index documents with the src-index kill-list; instead it will merge the two
           kill-lists together, so the final index will have a kill-list that combines both.

       --noprogress
           Don't display progress details as they occur; instead, the final status details (such
           as documents indexed, speed of indexing and so on) are only reported at completion of
           indexing. In instances where the script is not being run on a console (or 'tty'), this
           will be on by default.

           Example usage:

               $ indexer --rotate --all --noprogress

       --print-queries
           Prints out SQL queries that indexer sends to the database, along with SQL connection
           and disconnection events. That is useful to diagnose and fix problems with SQL
           sources.

       --quiet
           Tells indexer not to output anything, unless there is an error. Again, most used for
           cron-type, or other script jobs where the output is irrelevant or unnecessary, except
           in the event of some kind of error.

           Example usage:

               $ indexer --rotate --all --quiet

       --rotate
           Used for rotating indexes. Unless you have the situation where you can take the search
           function offline without troubling users, you will almost certainly need to keep
           search running whilst indexing new documents.  --rotate creates a second index,
           parallel to the first (in the same place, simply including .new in the filenames).
           Once complete, indexer notifies searchd via sending the SIGHUP signal, and searchd
           will attempt to rename the indexes (renaming the existing ones to include .old and
           renaming the .new to replace them), and then start serving from the newer files.
           Depending on the setting of seamless_rotate, there may be a slight delay in being able
           to search the newer indexes.

           Example usage:

               $ indexer --rotate --all
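
           For cron-type jobs, --rotate is often combined with --all and --quiet. The following
           is a minimal sketch of such a job: scheduled_reindex is a hypothetical helper, the
           configuration path is a placeholder, and the error handling assumes that indexer
           exits with a non-zero status on failure.

```shell
# INDEXER can be overridden (e.g. INDEXER=echo for a dry run).
INDEXER=${INDEXER:-indexer}

# Hypothetical helper: rebuild every index quietly and rotate the results.
# Assumes a non-zero exit status from indexer signals failure.
scheduled_reindex() {
    conf=$1
    if "$INDEXER" --config "$conf" --rotate --all --quiet; then
        return 0
    fi
    echo "indexer failed for $conf" >&2
    return 1
}
```

           For instance, a nightly job could call scheduled_reindex /home/myuser/sphinx.conf and
           rely on the message on standard error to report problems.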

       --sighup-each
           Useful when you are rebuilding many big indexes and want each one rotated into
           searchd as soon as possible. With --sighup-each, indexer will send a SIGHUP signal to
           searchd after successfully completing the work on each index. (The default behavior
           is to send a single SIGHUP after all the indexes have been built.)

       --verbose
           Guarantees that every row that caused problems indexing (duplicate, zero, or missing
           document ID; or file field IO issues; etc) will be reported. By default, this option
           is off, and problem summaries may be reported instead.

AUTHOR

       Andrey Aksenoff (shodan@sphinxsearch.com). This manual page was written by Alexey
       Vinogradov (klirichek@sphinxsearch.com), using the one written by Christian Hofstaedtler
       (ch+debian-packages@zeha.at) for the Debian system (but may be used by others). Permission
       is granted to copy, distribute and/or modify this document under the terms of the GNU
       General Public License, Version 2 or any later version published by the Free Software
       Foundation.

       On Debian systems, the complete text of the GNU General Public License can be found in
       /usr/share/common-licenses/GPL.

SEE ALSO

       searchd(1), search(1), indextool(1), spelldump(1)

       Sphinx and its programs are documented fully by the Sphinx reference manual available in
       /usr/share/doc/sphinxsearch.