Provided by: recollcmd_1.32.5-1ubuntu1_amd64

NAME

       recollindex - indexing command for the Recoll full text search system

SYNOPSIS

       recollindex -h
       recollindex [ -c <cfdir>] [ -z|-Z ] [ -k ] [ --diagsfile <diagpath> ]
       recollindex [ -c <cfdir>] -m [ -w <secs>] [ -D ] [ -x ] [ -C ] [ -n|-k ]
       recollindex [ -c <cfdir>] -i [ -Z -k -f -P ] [<path [path ...]>]
       recollindex [ -c <cfdir>] -r [ -Z -K -e -f ] [ -p pattern ] <dirpath>
       recollindex [ -c <cfdir>] -e [<path [path ...]>]
       recollindex [ -c <cfdir>] -l|-S|-E
       recollindex [ -c <cfdir>] -s <lang>
       recollindex [ -c <cfdir>] --webcache-compact
       recollindex [ -c <cfdir>] --webcache-burst <destdir>
       recollindex [ -c <cfdir>] --notindexed [path [path ...]]

DESCRIPTION

       The recollindex command is the Recoll indexer.

       As indexing can sometimes take a long time, the command can be interrupted by  sending  an
       interrupt  (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may elapse before the
       process exits, because it needs to properly flush and close the index. This  can  also  be
       done from the recoll GUI (menu entry: File/Stop_Indexing). After such an interruption, the
       index will be somewhat inconsistent because some operations which are  normally  performed
       at  the  end  of  the  indexing pass will have been skipped (for example, the stemming and
       spelling databases will be nonexistent or out of date). You just need to restart indexing
       at  a  later  time  to  restore consistency. The indexing will restart at the interruption
       point (the full file tree will be traversed,  but  files  that  were  indexed  up  to  the
       interruption and for which the index is still up to date will not need to be reindexed).

       The  -c  option  specifies  the  configuration  directory  name, overriding the default or
       $RECOLL_CONFDIR.

       There are several modes of operation.

       The normal mode  will  index  the  set  of  files  described  in  the  configuration  file
       recoll.conf.   This  will  incrementally update the database with files that changed since
       the last run. If option -z is given, the database  will  be  erased  before  starting.  If
       option  -Z  is  given, the database will not be reset, but all files will be considered as
       needing reindexing (in place reset).
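
       The three variants of the normal mode can be sketched as a small shell wrapper. The
       function name run_index and its mode keywords are illustrative assumptions; only the
       -c, -z and -Z options come from the text above.

```shell
# run_index MODE [CONFDIR]: hypothetical wrapper around the normal indexing mode.
# MODE is one of: update (incremental), reset (-z), inplace (-Z).
run_index() {
    mode=$1
    confdir=${2:-}
    # Start from an empty argument list and add -c only when a configuration
    # directory was given; otherwise recollindex uses its default or
    # $RECOLL_CONFDIR.
    set --
    if [ -n "$confdir" ]; then
        set -- -c "$confdir"
    fi
    case $mode in
        update)  ;;                    # incremental update, no extra option
        reset)   set -- "$@" -z ;;     # erase the database first
        inplace) set -- "$@" -Z ;;     # keep the database, reindex all files
        *) echo "run_index: unknown mode: $mode" >&2; return 2 ;;
    esac
    recollindex "$@"
}

# Possible usage (the directory is an example):
#   run_index inplace "$HOME/.recoll-work"
```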

       As of version 1.21, recollindex usually does not reprocess files which previously
       failed to index (for example because of a missing helper program). If option -k is given,
       recollindex will try again to process all failed files. Please note that  recollindex  may
       also  decide  to  retry  failed  files  if  the  auxiliary  checking script defined by the
       "checkneedretryindexscript" configuration variable indicates that this should happen.

       If option --diagsfile is given, the file at the given path will be truncated and indexing
       diagnostics  will  be  written  to  it.  Each line in the file will have a diagnostic type
       (reason for the file not to be indexed), the file path, and a possible additional piece of
       information,  which  can  be  the  MIME type or the archive internal path depending on the
       issue. The following diagnostic types are currently defined:

              Skipped : the path matches an element of skippedPaths or skippedNames.

              NoContentSuffix : the file name suffix is found in the noContentSuffixes list.

              MissingHelper : a helper program is missing.

              Error : general error (see the log).

              NoHandler : no handler is defined for the MIME type.

              ExcludedMime : the MIME type is part of the excludedmimetypes list.

              NotIncludedMime : the onlymimetypes list is not empty and the MIME type is not
              in it.
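
       A diagnostics file can be summarized per type with standard tools. The sketch below
       assumes that the diagnostic type is the first whitespace-delimited field of each line,
       which the description above does not spell out; diag_summary and the file path are
       illustrative names.

```shell
# diag_summary FILE: print the number of diagnostic lines per type.
# Assumption: the type is the first whitespace-delimited field on each line.
diag_summary() {
    # Skip empty lines, count occurrences of the first field, then sort
    # the "count type" pairs by type name.
    awk 'NF { count[$1]++ } END { for (t in count) print count[t], t }' "$1" |
        sort -k2
}

# Possible usage (the path is an example):
#   recollindex --diagsfile /tmp/recoll-diags.txt
#   diag_summary /tmp/recoll-diags.txt
```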

       If  option  -m  is  given, recollindex is started for real time monitoring, using the file
       system monitoring package it was configured for (either fam, gamin, or inotify). This mode
       must have been explicitly configured when building the package; it is not available by
       default. The program will normally detach from  the  controlling  terminal  and  become  a
       daemon.  If option -D is given, it will stay in the foreground. Option -w <seconds> can be
       used to specify that the program should sleep  for  the  specified  time  before  indexing
       begins.  The  default  value is 60. The daemon normally monitors the X11 session and exits
       when it is reset.  Option -x disables this X11 session monitoring (daemon will stay  alive
       even  if  it  cannot  connect  to the X11 server). You need to use this too if you use the
       daemon without an X11 context. You can use option -n to skip the initial incremental pass
       which  is  normally  performed  before  monitoring starts. Once monitoring is started, the
       daemon normally monitors the configuration and restarts from scratch if a change is  made.
       You can disable this with option -C.
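
       As an illustration, a session without X11 (for example on a server) might start the
       monitor as sketched below. The wrapper name start_monitor is an assumption, and the
       sketch presumes a recollindex built with monitoring support.

```shell
# start_monitor [DELAY]: run the real time indexer in the foreground.
#   -m  real time monitoring
#   -D  stay in the foreground instead of becoming a daemon
#   -x  do not monitor the X11 session (needed without an X11 context)
#   -w  seconds to sleep before indexing begins (60 is the documented default)
start_monitor() {
    recollindex -m -D -x -w "${1:-60}"
}

# Possible usage: start indexing after 10 seconds instead of 60.
#   start_monitor 10
```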

       recollindex  -i  will  index  individual  files  into the database. The stem expansion and
       aspell databases will not be updated.  The  skippedPaths  and  skippedNames  configuration
       variables  will  be  used,  so that some files may be skipped. You can tell recollindex to
       ignore skippedPaths and skippedNames by setting the -f option. This  allows  fully  custom
       file  selection  for  a  given  subtree,  for  which  you  would  add the top directory to
       skippedPaths, and use any custom tool to generate the file list (e.g. a tool from a source
       code control system). When run this way, the indexer normally does not perform the deleted
       files purge pass, because it cannot be sure to have seen all the existing files.  You  can
       force a purge pass with -P.
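
       The custom selection described above can be sketched as a filter which turns a list of
       relative paths into absolute ones and hands them to recollindex -i -f on stdin. The
       function name index_listed is hypothetical, and feeding it from git ls-files is only
       one possible source for the list.

```shell
# index_listed TOPDIR: read paths relative to TOPDIR on stdin, one per line,
# and index exactly those files, ignoring skippedPaths and skippedNames (-f).
index_listed() {
    topdir=${1%/}    # drop a trailing slash, if any
    while IFS= read -r rel; do
        printf '%s/%s\n' "$topdir" "$rel"
    done | recollindex -i -f
}

# Possible usage, assuming /home/me/project is a git work tree which was
# added to skippedPaths:
#   git -C /home/me/project ls-files | index_listed /home/me/project
```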

       recollindex  -e will erase data for individual files from the database. The stem expansion
       databases will not be updated.

       Options -i and -e can be combined. This will first perform the purge, then the indexing.

       With options -i or -e, if no file names are given on the command line, they will be read
       from stdin, so that you could for example run:

       find /path/to/dir -print | recollindex -e -i

       to  force  the  reindexing  of a directory tree (which has to exist inside the file system
       area defined by topdirs in recoll.conf). You could mostly accomplish the same thing with

       find /path/to/dir -print | recollindex -Z -i

       The latter will perform a less thorough job of purging stale sub-documents though.

       recollindex -r mostly works like -i, but the parameter is a single directory, which will
       be  recursively  updated.  This mostly does nothing more than find topdir | recollindex -i
       but it may be more convenient to use when  started  from  another  program.  This  retries
       failed files by default; use option -K to change this. One or multiple -p options can be used
       to set shell-type selection patterns (e.g.: *.pdf).
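
       For instance, a recursive update restricted to some file types could be wrapped as
       follows. The function name reindex_tree is an assumption; the patterns must be quoted
       by the caller so that the shell passes them to recollindex unexpanded.

```shell
# reindex_tree DIR [PATTERN...]: recursively update DIR, restricted to the
# given shell-type patterns (one -p option is emitted per pattern).
reindex_tree() {
    dir=$1
    shift
    for pat in "$@"; do
        # Move each pattern to the end of the argument list, preceded by -p.
        set -- "$@" -p "$pat"
        shift
    done
    recollindex -r "$@" "$dir"
}

# Possible usage (the directory is an example):
#   reindex_tree /home/me/docs '*.pdf' '*.odt'
```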

       recollindex -l will list the names of available language stemmers.

       recollindex -s will build the stem expansion database for a given language, which  may  or
       may  not be part of the list in the configuration file. If the language is not part of the
       configuration, the stem expansion database will be deleted at the end of the  next  normal
       indexing run. You can get the list of stemmer names from the recollindex -l command. Note
       that this is mostly for experimental use; the normal way to add a stemming language is to
       set  it in the configuration, either by editing "recoll.conf" or by using the GUI indexing
       configuration dialog.
       At the time of this writing, the following  languages  are  recognized  (out  of  Xapian's
       stem.h):

       •      danish

       •      dutch

       •      english: Martin Porter's 2002 revision of his stemmer

       •      english_lovins: Lovins' stemmer

       •      english_porter: Porter's stemmer as described in his 1980 paper

       •      finnish

       •      french

       •      german

       •      italian

       •      norwegian

       •      portuguese

       •      russian

       •      spanish

       •      swedish
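
       A script can check a language against the output of recollindex -l before asking for a
       stem database. This sketch assumes that -l prints the stemmer names as words separated
       by spaces, tabs or newlines, which is not specified here; build_stemdb is a
       hypothetical helper.

```shell
# build_stemdb LANG: build the stem expansion database for LANG, but only
# if recollindex -l lists it.
# Assumption: stemmer names appear as whitespace-separated words in the output.
build_stemdb() {
    lang=$1
    # Put every word on its own line, then look for an exact whole-line match.
    if recollindex -l | tr -s ' \t' '\n\n' | grep -qxF -- "$lang"; then
        recollindex -s "$lang"
    else
        echo "build_stemdb: no stemmer named $lang" >&2
        return 1
    fi
}

# Possible usage:
#   build_stemdb swedish
```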

       recollindex  -S will rebuild the phonetic/orthographic index. This feature uses the aspell
       package, which must be installed on the system.

       recollindex -E will check that topdirs and the other relevant paths set in the
       configuration file exist (to help catch typos).

       recollindex  --webcache-compact  will  recover  the  space wasted by erased page instances
       inside the Web cache. It may temporarily need to use twice the disk space used by the  Web
       cache.

       recollindex  --webcache-burst  <destdir>  will  extract  all entries from the Web cache to
       files created inside <destdir>. Each cache entry is extracted as two files, for  the  data
       and metadata.

       recollindex --notindexed [path [path ...]]  will check each path and print out those which
       are absent from the index (with an "ABSENT" prefix), or caused an indexing error (with  an
       "ERROR"  prefix).  If  no paths are given on the command line, the command will read them,
       one per line, from stdin.
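
       The output lends itself to simple filtering. The sketch below relies only on the
       "ABSENT" and "ERROR" prefixes mentioned above and passes the rest of each line through
       untouched; the function name notindexed_report is illustrative.

```shell
# notindexed_report: read paths on stdin, one per line, print how many are
# reported as absent or in error, then print the reported lines themselves.
notindexed_report() {
    # recollindex reads the path list from the function's stdin.
    report=$(recollindex --notindexed)
    absent=$(printf '%s\n' "$report" | grep -c '^ABSENT')
    errors=$(printf '%s\n' "$report" | grep -c '^ERROR')
    echo "absent: $absent, errors: $errors"
    printf '%s\n' "$report" | grep -E '^(ABSENT|ERROR)'
    return 0    # grep finding nothing is not a failure here
}

# Possible usage:
#   find /path/to/dir -type f -print | notindexed_report
```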

SEE ALSO

       recoll(1) recoll.conf(5)

                                          8 January 2006                           RECOLLINDEX(1)