Provided by: recoll_1.23.7-1_amd64

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

       This file defines the index configuration for the Recoll full-text search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the common file may be overridden
       by setting it in the personal configuration file, by default: $HOME/.recoll/recoll.conf

       Please note that while we try to keep this manual page reasonably up to date, it will
       frequently lag the current state of the software. The best source of information about the
       configuration is the set of comments in the system-wide configuration file.

       A short extract of the file might look as follows:

              # Space-separated list of directories to index.
              topdirs =  ~/docs /usr/share/doc

              [~/somedirectory-with-utf8-txt-files]
              defaultcharset = utf-8

       There are three kinds of lines:

              •      Comment or empty

              •      Parameter assignment

              •      Section definition

       Empty lines or lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree. Some of the parameters
       used for indexing are looked up hierarchically from the more to the less specific. Not all
       parameters can be meaningfully redefined; this is specified for each in the next section.

       The  tilde  character  (~)  is  expanded  in  file  names  to  the name of the user's home
       directory.

       Where values are lists, white space is used for separation,  and  elements  with  embedded
       spaces can be quoted with double-quotes.
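
       As a sketch of the quoting rule, a list value containing a directory name with an
       embedded space could be written as follows (the directory names are hypothetical):

              # Two elements: ~/docs and "~/My Documents"
              topdirs = ~/docs "~/My Documents"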

OPTIONS

       topdirs = string
              Space-separated list of files or directories to recursively index. Defaults to ~
              (indexes $HOME). You can use symbolic links in the list. They will be followed,
              independently of the value of the followLinks variable.

       skippedNames = string
              Files  and  directories  which  should  be  ignored.  White space separated list of
              wildcard patterns (simple ones, not paths, must contain  no  /  ),  which  will  be
              tested  against  file  and  directory names.  The list in the default configuration
              does not exclude hidden directories (names beginning with a dot), which means  that
              it may index quite a few things that you do not want. On the other hand, email user
              agents like Thunderbird usually store  messages  in  hidden  directories,  and  you
              probably   want   this   indexed.  One  possible  solution  is  to  have  '.*'   in
              'skippedNames', and add things like '~/.thunderbird' '~/.evolution'  to  'topdirs'.
              Not  even  the  file  names  are  indexed  for  patterns  in  this  list,  see  the
              'noContentSuffixes' variable for an alternative approach  which  indexes  the  file
              names. Can be redefined for any subtree.
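
              A minimal sketch of the approach suggested above, assuming that the assignment
              replaces the default list (the patterns other than '.*' are placeholders for
              whatever else you want to skip):

              # Skip hidden files and directories, plus two illustrative patterns
              skippedNames = .* *~ *.tmp
              # Name the hidden mail directories explicitly so that they are indexed
              topdirs = ~ ~/.thunderbird ~/.evolution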

       noContentSuffixes = string
              List  of  name  endings (not necessarily dot-separated suffixes) for which we don't
              try MIME type identification, and don't uncompress or index content. Only the names
              will  be  indexed.  This complements the now obsoleted recoll_noindex list from the
              mimemap file, which will go away in a future release  (the  move  from  mimemap  to
              recoll.conf  allows  editing  the  list  through  the  GUI). This is different from
              skippedNames because these are name ending matches only  (not  wildcard  patterns),
              and  the  file  name  itself  gets  indexed  normally.  This  can  be redefined for
              subdirectories.
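
              For illustration, the following hypothetical setting would index only the names
              of disk images and backup files. The endings are examples, not defaults:

              noContentSuffixes = .iso .img .bak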

       skippedPaths = string
              Paths we should not go into.  Space-separated  list  of  wildcard  expressions  for
              filesystem paths. Can contain files and directories. The database and configuration
              directories  will  automatically  be  added.  The  expressions  are  matched  using
              'fnmatch(3)'  with  the  FNM_PATHNAME  flag  set  by  default.  This means that '/'
              characters must be matched explicitly. You can set 'skippedPathsFnmPathname'  to  0
              to   disable   the   use   of  FNM_PATHNAME  (meaning  that  '/*/dir3'  will  match
              '/dir1/dir2/dir3').  The default value contains the usual mount point for removable
              media  to  remind you that it is a bad idea to have Recoll work on these (esp. with
              the monitor: media gets indexed  on  mount,  all  data  gets  erased  on  unmount).
              Explicitly adding '/media/xxx' to the topdirs will override this.

       skippedPathsFnmPathname = bool
              Set to 0 to override use of FNM_PATHNAME for matching skipped paths.
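
              As a sketch of the difference (the paths are hypothetical): with the default
              setting, the '*' in the second pattern below cannot match a '/', so the pattern
              matches '/dir1/dir3' but not '/dir1/dir2/dir3'.

              skippedPaths = /media /*/dir3

              With FNM_PATHNAME disabled, the same pattern also matches '/dir1/dir2/dir3':

              skippedPaths = /media /*/dir3
              skippedPathsFnmPathname = 0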

       daemSkippedPaths = string
              skippedPaths  equivalent  specific to real time indexing. This enables having parts
              of the tree which are initially indexed but not monitored. If  daemSkippedPaths  is
              not set, the daemon uses skippedPaths.

       zipSkippedNames = string
              Space-separated  list  of  wildcard  expressions  for  names that should be ignored
              inside zip archives. This is used directly by the zip handler, and has  a  function
              similar   to   skippedNames,   but   works  independently.  Can  be  redefined  for
              subdirectories.     Supported     by     recoll     1.20     and     newer.     See
              https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members

       followLinks = bool
              Follow  symbolic  links during indexing. The default is to ignore symbolic links to
              avoid multiple indexing of linked files. No effort is  made  to  avoid  duplication
              when  this  option  is set to true. This option can be set individually for each of
              the 'topdirs' members by using sections. It cannot be changed below the 'topdirs'
              level. Links in the 'topdirs' list itself are always followed.
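
              A sketch of enabling the option for a single 'topdirs' member through a section
              (the directory names are hypothetical):

              topdirs = ~/docs ~/projects

              [~/projects]
              followLinks = 1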

       indexedmimetypes = string
              Restrictive  list  of  indexed  mime  types.  Normally  not  set (in which case all
              supported types are indexed). If it is set, only the types from the list will  have
              their  contents  indexed.  The names will be indexed anyway if indexallfilenames is
              set (default). MIME type names should be  taken  from  the  mimemap  file.  Can  be
              redefined for subtrees.

       excludedmimetypes = string
              List  of  excluded  MIME  types.  Lets you exclude some types from indexing. Can be
              redefined for subtrees.
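
              For example, one type could be excluded everywhere while a subtree is restricted
              to a few types. The directory name is hypothetical, and the type names are
              assumed to be present in your mimemap file:

              excludedmimetypes = image/jpeg

              [~/papers]
              indexedmimetypes = application/pdf text/plain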

       compressedfilemaxkbs = int
              Size limit for compressed files.  We  need  to  decompress  these  in  a  temporary
              directory for identification, which can be wasteful in some cases. Limit the waste.
              Negative means no limit. 0 results in no processing of any compressed file. Default
              50 MB.

       textfilemaxmbs = int
              Size limit for text files. Mostly for skipping monster logs. Default 20 MB.
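
              A sketch with illustrative values. Note the different units implied by the
              parameter names (kilobytes for compressed files, megabytes for text files):

              # Do not decompress files larger than about 20 MB
              compressedfilemaxkbs = 20000
              # Skip text files larger than 50 MB
              textfilemaxmbs = 50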

       indexallfilenames = bool
              Index the file names of unprocessed files. This indexes the names of files whose
              contents we do not index because of an excluded or unsupported MIME type.

       usesystemfilecommand = bool
              Use a system command for file MIME type guessing as a final step in file type
              identification. This is generally useful, but will usually cause the indexing of
              many bogus 'text' files. See 'systemfilecommand' for the command used.

       systemfilecommand = string
              Command used to guess MIME types if the internal methods fail. This should be a
              "file -i" workalike. The file path will be added as a last parameter to the
              command line. 'xdg-mime' works better than the traditional 'file' command, and is
              now the configured default (with a hard-coded fallback to 'file').
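
              A minimal sketch, assuming that the traditional 'file' command is installed and
              preferred over the default:

              usesystemfilecommand = 1
              systemfilecommand = file -i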

       processwebqueue = bool
              Decide  if  we process the Web queue. The queue is a directory where the Recoll Web
              browser plugins create the copies of visited pages.

       textfilepagekbs = int
              Page size for text files. If this is set, text/plain files will be divided into
              documents of approximately this size. This reduces memory usage at index time and
              helps with loading data in the preview window at query time. It is particularly
              useful with very big files, such as application or system logs. Also see
              textfilemaxmbs and compressedfilemaxkbs.
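
              For example, to split large text files into documents of about 1 MB each (the
              value is illustrative):

              textfilepagekbs = 1000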

       membermaxkbs = int
              Size limit for archive members. This is passed to the filters in the environment as
              RECOLL_FILTER_MAXMEMBERKB.

       indexStripChars = bool
              Decide  if  we store character case and diacritics in the index. If we do, searches
              sensitive to case and diacritics can be performed, but the index  will  be  bigger,
              and  some  marginal weirdness may sometimes occur. The default is a stripped index.
              When  using  multiple  indexes  for  a  search,  this  parameter  must  be  defined
              identically for all. Changing the value implies an index reset.

       nonumbers = bool
              Decides if terms will be generated for numbers. For example "123", "1.5e6" and
              "192.168.1.4" would not be indexed if nonumbers is set ("value123" would still be).
              Numbers are often quite interesting to search for, and this should probably not be
              set except for special situations, e.g. scientific documents with huge amounts of
              numbers in them, where setting nonumbers will reduce the index size. This can only
              be set for a whole index, not for a subtree.

       dehyphenate = bool
              Determines if we index 'coworker' also when the input is 'co-worker'.  This is  new
              in  version  1.22,  and on by default. Setting the variable to off allows restoring
              the previous behaviour.

       nocjk = bool
              Decides if specific East Asian (Chinese, Japanese, Korean) character/word splitting
              is turned off. This will save a small amount of CPU if you have no CJK documents.
              If your document base does include such text but you are not interested in
              searching it, setting nocjk may be a significant time and space saver.

       cjkngramlen = int
              This  lets  you  adjust the size of n-grams used for indexing CJK text. The default
              value of 2 is probably appropriate in most cases. A value of  3  would  allow  more
              precision and efficiency on longer words, but the index will be approximately twice
              as large.

       indexstemminglanguages = string
              Languages for which to create stemming expansion data. Stemmer names can  be  found
              by executing 'recollindex -l', or this can also be set from a list in the GUI.
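
              A sketch, assuming that 'english' and 'french' appear in the output of
              'recollindex -l' on your system:

              indexstemminglanguages = english french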

       defaultcharset = string
              Default  character set. This is used for files which do not contain a character set
              definition (e.g.: text/plain). Values found inside files, e.g. a 'charset'  tag  in
              HTML  documents, will override it. If this is not set, the default character set is
              the one defined by the NLS environment ($LC_ALL, $LC_CTYPE, $LANG),  or  ultimately
              iso-8859-1  (cp-1252 in fact).  If for some reason you want a general default which
              does not match your LANG and  is  not  8859-1,  use  this  variable.  This  can  be
              redefined for any sub-directory.

       unac_except_trans = string
              A  list  of  characters,  encoded  in UTF-8, which should be handled specially when
              converting text to unaccented lowercase. For example, in Swedish, the letter a with
              diaeresis  has  full alphabet citizenship and should not be turned into an a.  Each
              element in the space-separated list has the special character as first element  and
              the translation following. The handling of both the lowercase and upper-case
              versions of a character should be specified, as membership in the list turns off
              both standard accent and case processing. The value is global and affects both
              indexing and querying. Examples:

              Swedish:

              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå

              German:

              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              In French, you probably want to decompose oe and ae, and nobody would type a German
              ß:

              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              The default for all languages, until someone protests, follows. These
              decompositions are not performed by unac, but it is unlikely that someone would
              type the composed forms in a search.

              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

       maildefcharset = string
              Overrides  the  default  character  set for email messages which don't specify one.
              This is mainly useful for readpst (libpst) dumps, which are utf-8 but  do  not  say
              so.

       localfields = string
              Set fields on all files (usually of a specific fs area). The syntax is the usual
              one: name = value ; attr1 = val1 ; [...]. The value is empty, so the list needs an
              initial semi-colon. This is useful, e.g., for setting the rclaptg field for
              application selection inside mimeview.
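
              A sketch following the syntax described above. The directory and the field
              value 'someapp' are hypothetical:

              [~/someapp/data]
              localfields = ; rclaptg = someapp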

       testmodifusemtime = bool
              Use mtime instead of ctime to test if a file has been modified. The time is used in
              addition to the size, which is always used. Setting this can reduce re-indexing on
              systems where extended attributes are used (by some other application), but not
              indexed, because changing extended attributes only affects ctime. Notes:

              •      This may prevent detection of change in some marginal file rename cases (the
                     target would need to have the same size and mtime).

              •      You should probably also set noxattrfields to 1 in this case, unless you
                     still prefer to perform xattr indexing, for example if the local file update
                     pattern makes it of value (in general, there is a risk that pure extended
                     attribute updates without file modification go undetected).

              Perform a full index reset after changing this.

       noxattrfields = bool
              Disable  extended  attributes conversion to metadata fields. This probably needs to
              be set if testmodifusemtime is set.
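
              The combination suggested above can be sketched as:

              testmodifusemtime = 1
              noxattrfields = 1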

       metadatacmds = string
              Define commands to gather external metadata, e.g. tmsu tags.  There can be  several
              entries,  separated  by  semi-colons,  each defining which field name the data goes
              into and the command to use. Don't forget the initial  semi-colon.  All  the  field
              names  must be different. You can use aliases in the "field" file if necessary.  As
              a not too pretty hack conceded  to  convenience,  any  field  name  beginning  with
              "rclmulti"  will  be taken as an indication that the command returns multiple field
              values inside a text blob formatted as a recoll configuration  file  ("fieldname  =
              fieldvalue" lines). The rclmultixx name will be ignored, and field names and values
              will be parsed from the data. Example:

              metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
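
              As an illustration of the rclmulti convention, the hypothetical cmdOutputsConf
              command from the example would be expected to print "fieldname = fieldvalue"
              lines such as the following (field names and values are made up):

              author = Jane Doe
              project = annual-report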

       cachedir = dfn
              Top  directory  for  Recoll  data.  Recoll  data  directories  are normally located
              relative    to    the    configuration    directory    (e.g.    ~/.recoll/xapiandb,
              ~/.recoll/mboxcache).  If  'cachedir'  is set, the directories are stored under the
              specified value instead (e.g. if cachedir is  ~/.cache/recoll,  the  default  dbdir
              would be ~/.cache/recoll/xapiandb).  This affects dbdir, webcachedir, mboxcachedir,
              aspellDicDir, which can still be individually specified to override cachedir. Note
              that if you have multiple configurations, each must have a different cachedir:
              there is no automatic computation of a subpath under cachedir.
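
              A sketch for two configurations, each with its own cachedir (the paths are
              hypothetical). In the recoll.conf of the first configuration:

              cachedir = ~/.cache/recoll-main

              In the recoll.conf of the second configuration:

              cachedir = ~/.cache/recoll-work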

       maxfsoccuppc = int
              Maximum file system occupation  over  which  we  stop  indexing.  The  value  is  a
              percentage,  corresponding  to  what  the  "Capacity"  df  output column shows. The
              default value is 0, meaning no checking.

       dbdir = dfn
              Xapian database directory location. This will be created on first indexing. If the
              value is not an absolute path, it will be interpreted as relative to cachedir if
              set, or the configuration directory (-c argument or $RECOLL_CONFDIR). If nothing
              is specified, the default is then ~/.recoll/xapiandb/.

       idxstatusfile = fn
              Name  of  the  scratch  file where the indexer process updates its status. Default:
              idxstatus.txt inside the configuration directory.

       mboxcachedir = dfn
              Directory location for storing mbox message offsets cache files. This  is  normally
              'mboxcache'  under  cachedir if set, or else under the configuration directory, but
              it may be useful to share a directory between different configurations.

       mboxcacheminmbs = int
              Minimum mbox file size over which we cache the offsets. There is really no sense in
              caching offsets for small files. The default is 5 MB.

       webcachedir = dfn
              Directory where we store the archived web pages. This is only used by the web
              history indexing code. Default: cachedir/webcache if cachedir is set, else
              $RECOLL_CONFDIR/webcache.

       webcachemaxmbs = int
              Maximum  size  in  MB  of  the  Web  archive.  This is only used by the web history
              indexing code.  Default: 40 MB.  Reducing the size will not physically truncate the
              file.

       webqueuedir = fn
              The  path  to  the  Web  indexing  queue.  This  is  hard-coded  in  the  plugin as
              ~/.recollweb/ToIndex so there should be no need or possibility to change it.

       aspellDicDir = dfn
              Aspell   dictionary   storage   directory   location.   The    aspell    dictionary
              (aspdict.(lang).rws)  is  normally stored in the directory specified by cachedir if
              set, or under the configuration directory.

       filtersdir = dfn
              Directory location for executable input handlers. If RECOLL_FILTERSDIR  is  set  in
              the  environment,  we use it instead. Defaults to $prefix/share/recoll/filters. Can
              be redefined for subdirectories.

       iconsdir = dfn
              Directory location for icons. The only reason to change this would be if  you  want
              to    change    the   icons   displayed   in   the   result   list.   Defaults   to
              $prefix/share/recoll/images

       idxflushmb = int
              Threshold (megabytes of new data) where we flush from memory to disk index. Setting
              this  allows  some  control  over memory usage by the indexer process. A value of 0
              means no explicit flushing, which  lets  Xapian  perform  its  own  thing,  meaning
              flushing  every  $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as
              memory usage depends on average document size, not only document count, the  Xapian
              approach is not very useful, and you should let Recoll manage the flushes. The
              default value of idxflushmb is 10 MB, and may be a bit low. If you are looking  for
              maximum  speed,  you  may  want  to experiment with values between 20 and 80. In my
              experience, values beyond 100 are always counterproductive. If you find  otherwise,
              please drop me a note.
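
              For example, to try a value inside the range suggested above:

              idxflushmb = 50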

       filtermaxseconds = int
              Maximum external filter execution time in seconds. Default 1200 (20 minutes). Set
              to 0 for no limit. This is mainly to avoid infinite loops in postscript files
              (loop.ps).

       filtermaxmbytes = int
              Maximum virtual  memory  space  for  filter  processes  (setrlimit(RLIMIT_AS)),  in
              megabytes.  Note that this includes any mapped libs (there is no reliable Linux way
              to limit the data space only), so we need to be a bit generous here. Anything over
              2000 will be ignored on 32-bit machines.

       thrQSizes = string
              Stage  input  queues configuration. There are three internal queues in the indexing
              pipeline stages (file  data  extraction,  terms  generation,  index  update).  This
              parameter  defines  the  queue  depths  for each stage (three integer values). If a
              value of -1 is given for a given stage, no queue is used, and the thread will go on
              performing the next stage. In practice, deep queues have not been shown to increase
              performance. Default: a value of 0 for the first  queue  tells  Recoll  to  perform
              autoconfiguration  based  on the detected number of CPUs (no need for the two other
              values in this case).  Use thrQSizes = -1 -1 -1 to disable multithreading entirely.

       thrTCounts = string
              Number of threads used for each indexing stage. The three stages are file data
              extraction, terms generation, and index update. The use of the counts is also
              controlled by some special values in thrQSizes: if the first queue depth is 0,  all
              counts  are  ignored  (autoconfigured); if a value of -1 is used for a queue depth,
              the corresponding thread count is ignored. It makes no sense to use a  value  other
              than  1 for the last stage because updating the Xapian index is necessarily single-
              threaded (and protected by a mutex).
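
              A sketch of an explicit configuration. The values are illustrative and should be
              adapted to the machine; only the last thread count of 1 follows from the text
              above:

              thrQSizes = 2 2 2
              thrTCounts = 4 2 1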

       loglevel = int
              Log file verbosity 1-6. A value of 2 will print only errors and  warnings.  3  will
              print information like document updates, 4 is quite verbose and 6 very verbose.

       logfilename = fn
              Log file destination. Use 'stderr' (default) to write to the console.

       idxloglevel = int
              Override loglevel for the indexer.

       idxlogfilename = fn
              Override logfilename for the indexer.

       daemloglevel = int
              Override  loglevel  for  the  indexer  in real time mode. The default is to use the
              idx... values if set, else the log... values.

       daemlogfilename = fn
              Override logfilename for the indexer in real time mode. The default is to  use  the
              idx... values if set, else the log... values.
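
              A sketch combining these parameters (the file names are hypothetical). By
              default, only errors and warnings are logged, to the console. The indexer logs
              more detail to a file, and the real time indexer uses its own file with the
              indexer's log level:

              loglevel = 2
              logfilename = stderr
              idxloglevel = 3
              idxlogfilename = /tmp/recollindex.log
              daemlogfilename = /tmp/recollmonitor.log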

       idxrundir = dfn
              Indexing  process  current  directory. The input handlers sometimes leave temporary
              files in the current directory, so it makes sense to have recollindex chdir to some
              temporary  directory.  If the value is empty, the current directory is not changed.
              If the value is (literal) tmp, we  use  the  temporary  directory  as  set  by  the
              environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an absolute path
              to a directory, we go there.
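
              For example, to have recollindex change to the temporary directory defined by the
              environment:

              idxrundir = tmp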

       checkneedretryindexscript = fn
              Script used to heuristically check  if  we  need  to  retry  indexing  files  which
              previously  failed.   The  default script checks the modified dates on /usr/bin and
              /usr/local/bin. A relative path will be looked up in the filters dirs, then in  the
              path. Use an absolute path to do otherwise.

       recollhelperpath = string
              Additional  places  to  search for helper executables. This is only used on Windows
              for now.

       idxabsmlen = int
              Length of abstracts we store while indexing. Recoll stores  an  abstract  for  each
              indexed  file.  The text can come from an actual 'abstract' section in the document
              or will just be the beginning of the document. It is stored in the index so that it
              can  be  displayed  inside the result lists without decoding the original file. The
              idxabsmlen parameter defines the size of the stored abstract. The default value  is
              250 bytes. The search interface gives you the choice to display this stored text or
              a synthetic abstract built by extracting text  around  the  search  terms.  If  you
              always  prefer  the synthetic abstract, you can reduce this value and save a little
              space.

       idxmetastoredlen = int
              Truncation length of stored metadata fields. This does  not  affect  indexing  (the
              whole  field  is processed anyway), just the amount of data stored in the index for
              the purpose of displaying fields inside result lists or previews. The default value
              is 150 bytes which may be too low if you have custom fields.

       aspellLanguage = string
              Language  definitions  to  use  when creating the aspell dictionary. The value must
              match a set of aspell language definition files. You can type "aspell dicts" to
              see a list. The default, if this is not set, is to use the NLS environment to guess
              the value.

       aspellAddCreateParam = string
              Additional option and parameter to aspell dictionary creation command. Some aspell
              packages may need an additional option (e.g. on Debian Jessie:
              --local-data-dir=/usr/lib/aspell). See Debian bug 772415.
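
              The Debian Jessie case mentioned above would be written as:

              aspellAddCreateParam = --local-data-dir=/usr/lib/aspell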

       aspellKeepStderr = bool
              Set this to have a look at aspell dictionary  creation  errors.  There  are  always
              many, so this is mostly for debugging.

       noaspell = bool
              Disable  aspell  use.  The  aspell  dictionary  generation  takes  time,  and  some
              combinations of aspell  version,  language,  and  local  terms,  result  in  aspell
              crashing, so it sometimes makes sense to just disable the thing.

       monauxinterval = int
              Auxiliary  database  update  interval.  The  real  time  indexer  only  updates the
              auxiliary databases (stemdb, aspell) periodically, because it would be  too  costly
              to do it for every document change. The default period is one hour.

       monixinterval = int
              Minimum interval (seconds) between processings of the indexing queue. The real time
              indexer does not  process  each  event  when  it  comes  in,  but  lets  the  queue
              accumulate,  to  diminish  overhead  and to aggregate multiple events affecting the
              same file. Default: 30 seconds.

       mondelaypatterns = string
              Timing parameters for the real time indexing. Definitions for  files  which  get  a
              longer  delay  before  reindexing is allowed. This is for fast-changing files, that
              should only be reindexed once in a while. A list of wildcardPattern:seconds  pairs.
              The patterns are matched with fnmatch(pattern, path, 0). You can quote entries
              containing white space with double quotes (quote the whole entry, not the pattern).
              The default is empty. Example:

              mondelaypatterns = *.log:20 "*with spaces.*:30"

       monioniceclass = int
              ionice class for the real time indexing process, on platforms where this is
              supported. The default value is 3.

       monioniceclassdata = string
              ionice class parameter for the real time indexing process, on platforms where this
              is supported. The default is empty.

       autodiacsens = bool
              Auto-trigger diacritics sensitivity (raw index only). If the index is not stripped
              (see indexStripChars), decide if we automatically trigger diacritics sensitivity
              if the search term has accented characters (not in unac_except_trans). Otherwise
              you need to use the query language and the "D" modifier to specify diacritics
              sensitivity. Default is no.

       autocasesens = bool
              Auto-trigger case sensitivity (raw index only). If the index is not stripped (see
              indexStripChars), decide if we automatically trigger character case sensitivity if
              the search term has upper-case characters in any but the first position. Otherwise
              you need to use the query language and the "C" modifier to specify character-case
              sensitivity. Default is yes.
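
              A sketch of a configuration using a raw index with both automatic triggers
              enabled. As noted for indexStripChars, changing that value implies an index
              reset:

              indexStripChars = 0
              autodiacsens = 1
              autocasesens = 1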

       maxTermExpand = int
              Maximum query expansion count for a single term (e.g.: when using wildcards).  This
              only  affects  queries,  not indexing. We used to not limit this at all (except for
              filenames where the limit was too low at 1000), but it is unreasonable with  a  big
              index. Default 10000.

       maxXapianClauses = int
              Maximum  number  of  clauses  we  add  to  a single Xapian query. This only affects
              queries, not indexing.  In  some  cases,  the  result  of  term  expansion  can  be
              multiplicative, and we want to avoid eating all the memory. Default 50000.

       snippetMaxPosWalk = int
              Maximum number of positions we walk while populating a snippet for the result list.
              The default of 1,000,000 may be insufficient for very big documents. The
              consequence would be snippets with possibly meaning-altering missing words.

       pdfocr = bool
              Attempt  OCR  of  PDF files with no text content if both tesseract and pdftoppm are
              installed. The default is off because OCR is so very slow.

       pdfattach = bool
              Enable PDF attachment  extraction  by  executing  pdftk  (if  available).  This  is
              normally  disabled,  because  it  does slow down PDF indexing a bit even if not one
              attachment is ever found.

       mhmboxquirks = string
              Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this for the directory
              where the email mbox files are stored.

SEE ALSO

       recollindex(1) recoll(1)

                                         14 November 2012                          RECOLL.CONF(5)