Provided by: recollcmd_1.36.1-1build2_amd64 bug

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

       This file defines the index configuration for the Recoll full-text search system.

       The  system-wide  configuration  file  is normally located inside /usr/[local]/share/recoll/examples. Any
       parameter set in the common file may be overridden by setting it in the personal configuration  file,  by
       default: $HOME/.recoll/recoll.conf

       Please  note  while  I  try  to  keep  this manual page reasonably up to date, it will frequently lag the
       current state of the software. The best source of information about the configuration are the comments in
       the  system-wide configuration file or the user manual which you can access from the recoll GUI help menu
       or on the recoll web site.

       A short extract of the file might look as follows:

              # Space-separated list of directories to index.
              topdirs =  ~/docs /usr/share/doc

              [~/somedirectory-with-utf8-txt-files]
              defaultcharset = utf-8

       There are three kinds of lines:

              •      Comment or empty

              •      Parameter affectation

              •      Section definition

       Empty lines or lines beginning with # are ignored.

       Affectation lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree.  Some  of  the  parameters  used  for
       indexing  are  looked  up  hierarchically  from  the more to the less specific. Not all parameters can be
       meaningfully redefined, this is specified for each in the next section.

       The tilde character (~) is expanded in file names to the name of the user's home directory.

       Where values are lists, white space is used for separation, and elements  with  embedded  spaces  can  be
       quoted with double-quotes.

OPTIONS

       topdirs = string
              Space-separated  list  of files or directories to recursively index. Default to ~ (indexes $HOME).
              You can use symbolic links in the list, they will be followed, independently of the value  of  the
              followLinks variable.

       monitordirs = string
              Space-separated  list  of  files or directories to monitor for updates. When running the real-time
              indexer, this allows monitoring only a subset of the whole indexed  area.  The  elements  must  be
              included in the tree defined by the 'topdirs' members.

       skippedNames = string
              Files  and  directories  which should be ignored.  White space separated list of wildcard patterns
              (simple ones, not paths, must contain no / ), which will be  tested  against  file  and  directory
              names.  The list in the default configuration does not exclude hidden directories (names beginning
              with a dot), which means that it may index quite a few things that you do not want. On  the  other
              hand,  email  user  agents  like Thunderbird usually store messages in hidden directories, and you
              probably want this indexed. One possible solution is to  have  ".*"  in  "skippedNames",  and  add
              things like "~/.thunderbird" "~/.evolution" to "topdirs".  Not even the file names are indexed for
              patterns in this list, see the "noContentSuffixes" variable  for  an  alternative  approach  which
              indexes the file names. Can be redefined for any subtree.

       skippedNames- = string
              List of name endings to remove from the default skippedNames list.

       skippedNames+ = string
              List of name endings to add to the default skippedNames list.

       onlyNames = string
              Regular  file  name  filter  patterns  If this is set, only the file names not in skippedNames and
              matching one of the patterns will be considered for indexing. Can be redefined per  subtree.  Does
              not apply to directories.

       noContentSuffixes = string
              List  of  name  endings  (not necessarily dot-separated suffixes) for which we don't try MIME type
              identification, and don't uncompress or index content.  Only  the  names  will  be  indexed.  This
              complements  the  now obsoleted recoll_noindex list from the mimemap file, which will go away in a
              future release (the move from mimemap to recoll.conf allows editing the  list  through  the  GUI).
              This  is  different  from  skippedNames  because  these are name ending matches only (not wildcard
              patterns),  and  the  file  name  itself  gets  indexed  normally.  This  can  be  redefined   for
              subdirectories.

       noContentSuffixes- = string
              List of name endings to remove from the default noContentSuffixes list.

       noContentSuffixes+ = string
              List of name endings to add to the default noContentSuffixes list.

       skippedPaths = string
              Absolute  paths  we  should not go into. Space-separated list of wildcard expressions for absolute
              filesystem paths. Must be defined at the top level of the configuration file, not in a subsection.
              Can  contain  files and directories. The database and configuration directories will automatically
              be added. The expressions are matched  using  'fnmatch(3)'  with  the  FNM_PATHNAME  flag  set  by
              default.   This   means   that   '/'   characters   must   be  matched  explicitly.  You  can  set
              'skippedPathsFnmPathname' to 0 to disable the use of FNM_PATHNAME  (meaning  that  '/*/dir3'  will
              match  '/dir1/dir2/dir3'). The default value contains the usual mount point for removable media to
              remind you that it is a bad idea to have Recoll work on these (esp. with the monitor:  media  gets
              indexed  on  mount,  all  data  gets  erased  on  unmount).  Explicitly adding '/media/xxx' to the
              'topdirs' variable will override this.

       skippedPathsFnmPathname = bool
              Set to 0 to override use of FNM_PATHNAME for matching skipped paths.

       nowalkfn = string
              File name which will cause its parent directory to be skipped. Any  directory  containing  a  file
              with this name will be skipped as if it was part of the skippedPaths list. Ex: .recoll-noindex

       daemSkippedPaths = string
              skippedPaths  equivalent  specific  to  real  time indexing. This enables having parts of the tree
              which are initially indexed but not monitored. If daemSkippedPaths is not  set,  the  daemon  uses
              skippedPaths.

       zipUseSkippedNames = bool
              Use skippedNames inside Zip archives. Fetched directly by the rclzip.py handler. Skip the patterns
              defined  by  skippedNames  inside  Zip  archives.  Can  be  redefined  for  subdirectories.    See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

       zipSkippedNames = string
              Space-separated list of wildcard expressions for names that should be ignored inside zip archives.
              This is used directly by the zip  handler.  If  zipUseSkippedNames  is  not  set,  zipSkippedNames
              defines  the  patterns  to be skipped inside archives. If zipUseSkippedNames is set, the two lists
              are    concatenated    and    used.    Can    be    redefined     for     subdirectories.      See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

       followLinks = bool
              Follow  symbolic  links during indexing. The default is to ignore symbolic links to avoid multiple
              indexing of linked files. No effort is made to avoid duplication when this option is set to  true.
              This  option  can  be set individually for each of the 'topdirs' members by using sections. It can
              not be changed below the 'topdirs' level. Links in the 'topdirs' list itself are always followed.

       indexedmimetypes = string
              Restrictive list of indexed mime types. Normally not set (in which case all  supported  types  are
              indexed).  If  it is set, only the types from the list will have their contents indexed. The names
              will be indexed anyway if indexallfilenames is set (default). MIME type names should be taken from
              the  mimemap file (the values may be different from xdg-mime or file -i output in some cases). Can
              be redefined for subtrees.

       excludedmimetypes = string
              List of excluded MIME types. Lets you exclude some types from indexing. MIME type names should  be
              taken  from  the mimemap file (the values may be different from xdg-mime or file -i output in some
              cases) Can be redefined for subtrees.

       nomd5types = string
              Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and  can
              be  very  expensive  to  compute on multimedia or other big files. This list lets you turn off md5
              computation for selected types. It is global (no redefinition for subtrees).  At  the  moment,  it
              only  has  an  effect  for  external handlers (exec and execm). The file types can be specified by
              listing either MIME types (e.g. audio/mpeg) or handler names (e.g. rclaudio.py).

       compressedfilemaxkbs = int
              Size limit for compressed files. We  need  to  decompress  these  in  a  temporary  directory  for
              identification,  which  can be wasteful in some cases. Limit the waste. Negative means no limit. 0
              results in no processing of any compressed file. Default 50 MB.

       textfilemaxmbs = int
              Size limit for text files. Mostly for skipping monster logs. Default 20 MB.

       indexallfilenames = bool
              Index the file names of unprocessed files Index the names of files the contents of which we  don't
              index because of an excluded or unsupported MIME type.

       usesystemfilecommand = bool
              Use  a system command for file MIME type guessing as a final step in file type identification This
              is generally useful, but will  usually  cause  the  indexing  of  many  bogus  'text'  files.  See
              'systemfilecommand' for the command used.

       systemfilecommand = string
              Command  used  to  guess  MIME  types  if  the  internal  methods fails This should be a "file -i"
              workalike.  The file path will be added as a last parameter to the command line. "xdg-mime"  works
              better  than  the traditional "file" command, and is now the configured default (with a hard-coded
              fallback to "file")

       processwebqueue = bool
              Decide if we process the Web queue. The queue is a directory where the Recoll Web browser  plugins
              create the copies of visited pages.

       textfilepagekbs = int
              Page  size  for  text  files.  If  this is set, text/plain files will be divided into documents of
              approximately this size. Will reduce memory usage at index time and help with loading data in  the
              preview  window  at  query  time.  Particularly useful with very big files, such as application or
              system logs. Also see textfilemaxmbs and compressedfilemaxkbs.

       membermaxkbs = int
              Size  limit  for  archive  members.  This  is  passed  to  the  filters  in  the  environment   as
              RECOLL_FILTER_MAXMEMBERKB.

       indexStripChars = bool
              Decide  if  we  store  character case and diacritics in the index. If we do, searches sensitive to
              case and diacritics can be performed, but the index will be bigger, and  some  marginal  weirdness
              may  sometimes  occur.  The default is a stripped index. When using multiple indexes for a search,
              this parameter must be defined identically for all. Changing the value implies an index reset.

       indexStoreDocText = bool
              Decide if we store the documents' text content in the index. Storing the  text  allows  extracting
              snippets  from  it at query time, instead of building them from index position data.  Newer Xapian
              index formats have rendered our use of positions list unacceptably slow in some  cases.  The  last
              Xapian  index  format with good performance for the old method is Chert, which is default for 1.2,
              still supported but not default in 1.4 and will be dropped in 1.6.  The stored  document  text  is
              translated  from  its  original  format  to  UTF-8  plain  text,  but  not stripped of upper-case,
              diacritics, or punctuation signs. Storing it increases the index size  by  10-20%  typically,  but
              also  allows  for  nicer  snippets, so it may be worth enabling it even if not strictly needed for
              performance if you can afford the space.  The variable only has an effect when creating an  index,
              meaning  that  the  xapiandb  directory must not exist yet. Its exact effect depends on the Xapian
              version.  For Xapian 1.4, if the variable is set to 0, the Chert format will be used, and the text
              will  not  be  stored.  If the variable is 1, Glass will be used, and the text stored.  For Xapian
              1.2, and for versions after 1.5 and newer, the  index  format  is  always  the  default,  but  the
              variable  controls  if  the text is stored or not, and the abstract generation method. With Xapian
              1.5 and later, and the variable set to 0, abstract generation may be very slow, but  this  setting
              may still be useful to save space if you do not use abstract generation at all.

       nonumbers = bool
              Decides if terms will be generated for numbers. For example "123", "1.5e6", 192.168.1.4, would not
              be indexed if nonumbers is set ("value123" would still be). Numbers are often quite interesting to
              search  for,  and  this  should  probably not be set except for special situations, ie, scientific
              documents with huge amounts of numbers in them, where setting  nonumbers  will  reduce  the  index
              size. This can only be set for a whole index, not for a subtree.

       dehyphenate = bool
              Determines if we index 'coworker' also when the input is 'co-worker'. This is new in version 1.22,
              and on by default. Setting the variable to off allows restoring the previous behaviour.

       backslashasletter = bool
              Process backslash as normal letter. This may make sense for people wanting to index  TeX  commands
              as such but is not of much general use.

       underscoreasletter = bool
              Process  underscore  as  normal  letter.  This makes sense in so many cases that one wonders if it
              should not be the default.

       maxtermlength = int
              Maximum term length. Words longer than this will be discarded.  The default is 40 and used  to  be
              hard-coded, but it can now be adjusted. You need an index reset if you change the value.

       nocjk = bool
              Decides  if specific East Asian (Chinese Korean Japanese) characters/word splitting is turned off.
              This will save a small amount of CPU if you have no CJK documents.  If  your  document  base  does
              include  such  text but you are not interested in searching it, setting nocjk may be a significant
              time and space saver.

       cjkngramlen = int
              This lets you adjust the size of n-grams used for indexing CJK text. The default  value  of  2  is
              probably  appropriate  in  most  cases.  A value of 3 would allow more precision and efficiency on
              longer words, but the index will be approximately twice as large.

       indexstemminglanguages = string
              Languages for which to create stemming expansion data. Stemmer names can  be  found  by  executing
              'recollindex  -l',  or  this  can also be set from a list in the GUI. The values are full language
              names, e.g. english, french...

       defaultcharset = string
              Default character set. This is used for files which do not  contain  a  character  set  definition
              (e.g.:  text/plain).  Values  found  inside  files,  e.g.  a 'charset' tag in HTML documents, will
              override it. If this is not set, the  default  character  set  is  the  one  defined  by  the  NLS
              environment  ($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact).  If for some
              reason you want a general default which does not match your LANG  and  is  not  8859-1,  use  this
              variable. This can be redefined for any sub-directory.

       unac_except_trans = string
              A  list of characters, encoded in UTF-8, which should be handled specially when converting text to
              unaccented lowercase. For example, in Swedish, the letter  a  with  diaeresis  has  full  alphabet
              citizenship  and should not be turned into an a.  Each element in the space-separated list has the
              special character as first element and  the  translation  following.  The  handling  of  both  the
              lowercase  and upper-case versions of a character should be specified, as appartenance to the list
              will turn-off both standard accent and case processing. The  value  is  global  and  affects  both
              indexing and querying.  Examples:
              Swedish:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå
              German:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl
              French: you probably want to decompose oe and ae and nobody would type a German ß
              unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl
              The  default  for  all  until  someone protests follows. These decompositions are not performed by
              unac, but it is unlikely that someone would type the composed forms in a search.
              unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl

       maildefcharset = string
              Overrides the default character set for email messages which don't specify  one.  This  is  mainly
              useful for readpst (libpst) dumps, which are utf-8 but do not say so.

       localfields = string
              Set fields on all files (usually of a specific fs area). Syntax is the usual: name = value ; attr1
              = val1 ; [...]  value is empty so this needs an initial semi-colon.  This  is  useful,  e.g.,  for
              setting the rclaptg field for application selection inside mimeview.

       testmodifusemtime = bool
              Use  mtime  instead  of ctime to test if a file has been modified. The time is used in addition to
              the size, which is always used.  Setting this can reduce re-indexing  on  systems  where  extended
              attributes  are  used  (by  some  other  application),  but not indexed, because changing extended
              attributes only affects ctime.  Notes: - This may prevent detection of  change  in  some  marginal
              file  rename cases (the target would need to have the same size and mtime).  - You should probably
              also set noxattrfields to 1 in this case, except if you still prefer to  perform  xattr  indexing,
              for example if the local file update pattern makes it of value (as in general, there is a risk for
              pure extended attributes updates without file modification to go undetected). Perform a full index
              reset after changing this.

       noxattrfields = bool
              Disable  extended  attributes  conversion  to  metadata  fields.  This probably needs to be set if
              testmodifusemtime is set.

       metadatacmds = string
              Define commands to gather external metadata, e.g.  tmsu  tags.   There  can  be  several  entries,
              separated  by  semi-colons,  each  defining which field name the data goes into and the command to
              use. Don't forget the initial semi-colon. All the field names  must  be  different.  You  can  use
              aliases  in  the "field" file if necessary.  As a not too pretty hack conceded to convenience, any
              field name beginning with "rclmulti" will be taken as  an  indication  that  the  command  returns
              multiple  field  values  inside a text blob formatted as a recoll configuration file ("fieldname =
              fieldvalue" lines). The rclmultixx name will be ignored, and field names and values will be parsed
              from the data.  Example: metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f

       cachedir = dfn
              Top  directory  for  Recoll  data.  Recoll  data  directories are normally located relative to the
              configuration directory (e.g. ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set,  the
              directories are stored under the specified value instead (e.g. if cachedir is ~/.cache/recoll, the
              default dbdir would be ~/.cache/recoll/xapiandb).  This affects dbdir, webcachedir,  mboxcachedir,
              aspellDicDir,  which  can  still be individually specified to override cachedir.  Note that if you
              have multiple configurations,  each  must  have  a  different  cachedir,  there  is  no  automatic
              computation of a subpath under cachedir.

       maxfsoccuppc = int
              Maximum  file  system  occupation  over  which  we  stop  indexing.  The  value  is  a percentage,
              corresponding to what the "Capacity" df output column shows. The default value is  0,  meaning  no
              checking.

       dbdir = dfn
              Xapian database directory location. This will be created on first indexing. If the value is not an
              absolute path, it will be interpreted as  relative  to  cachedir  if  set,  or  the  configuration
              directory  (-c  argument  or  $RECOLL_CONFDIR).   If  nothing  is  specified,  the default is then
              ~/.recoll/xapiandb/

       idxstatusfile = fn
              Name of the scratch file where the indexer process  updates  its  status.  Default:  idxstatus.txt
              inside the configuration directory.

       mboxcachedir = dfn
              Directory  location  for  storing  mbox  message offsets cache files. This is normally 'mboxcache'
              under cachedir if set, or else under the configuration directory, but it may be useful to share  a
              directory between different configurations.

       mboxcacheminmbs = int
              Minimum  mbox  file  size  over  which  we  cache the offsets. There is really no sense in caching
              offsets for small files. The default is 5 MB.

       mboxmaxmsgmbs = int
              Maximum mbox member message size in megabytes. Size over which we assume that the mbox  format  is
              bad or we misinterpreted it, at which point we just stop processing the file.

       webcachedir = dfn
              Directory  where  we  store  the archived web pages. This is only used by the web history indexing
              code Default: cachedir/webcache if cachedir is set, else $RECOLL_CONFDIR/webcache

       webcachemaxmbs = int
              Maximum size in MB of the Web archive. This is  only  used  by  the  web  history  indexing  code.
              Default: 40 MB.  Reducing the size will not physically truncate the file.

       webqueuedir = fn
              The  path  to  the  Web  indexing  queue.  This  used  to  be  hard-coded  in  the  old  plugin as
              ~/.recollweb/ToIndex so there would be no need or possibility to change it, but the  WebExtensions
              plugin  now  downloads  the  files  to  the  user  Downloads directory, and a script moves them to
              webqueuedir. The script reads this value from the config so it has become possible to change it.

       webdownloadsdir = fn
              The path to browser downloads directory. This is where the new browser  add-on  extension  has  to
              create the files. They are then moved by a script to webqueuedir.

       aspellDicDir = dfn
              Aspell  dictionary  storage  directory  location.  The  aspell  dictionary (aspdict.(lang).rws) is
              normally stored in the directory  specified  by  cachedir  if  set,  or  under  the  configuration
              directory.

       filtersdir = dfn
              Directory  location for executable input handlers. If RECOLL_FILTERSDIR is set in the environment,
              we use it instead. Defaults to $prefix/share/recoll/filters. Can be redefined for subdirectories.

       iconsdir = dfn
              Directory location for icons. The only reason to change this would be if you want  to  change  the
              icons displayed in the result list. Defaults to $prefix/share/recoll/images

       idxflushmb = int
              Threshold  (megabytes  of  new data) where we flush from memory to disk index. Setting this allows
              some control over memory usage by the indexer process. A value of 0 means  no  explicit  flushing,
              which  lets Xapian perform its own thing, meaning flushing every $XAPIAN_FLUSH_THRESHOLD documents
              created, modified or deleted: as memory usage depends on average document size, not only  document
              count,  the  Xapian  approach is is not very useful, and you should let Recoll manage the flushes.
              The program compiled value is 0. The configured default value (from this file) is now 50  MB,  and
              should  be  ok  in  many  cases.   You  can set it as low as 10 to conserve memory, but if you are
              looking for maximum speed, you may want to experiment with  values  between  20  and  200.  In  my
              experience, values beyond this are always counterproductive. If you find otherwise, please drop me
              a note.

       filtermaxseconds = int
              Maximum external filter execution time in seconds. Default 1200 (20mn). Set to  0  for  no  limit.
              This is mainly to avoid infinite loops in postscript files (loop.ps)

       filtermaxmbytes = int
              Maximum  virtual memory space for filter processes (setrlimit(RLIMIT_AS)), in megabytes. Note that
              this includes any mapped libs (there is no reliable Linux way to limit the data space only), so we
              need  to  be  a  bit  generous  here.  Anything over 2000 will be ignored on 32 bits machines. The
              previous default value of 2000 would  prevent  java  pdftk  to  work  when  executed  from  Python
              rclpdf.py.

       thrQSizes = string
              Stage  input queues configuration. There are three internal queues in the indexing pipeline stages
              (file data extraction, terms generation, index update). This parameter defines  the  queue  depths
              for  each  stage  (three integer values). If a value of -1 is given for a given stage, no queue is
              used, and the thread will go on performing the next stage. In practise, deep queues have not  been
              shown  to  increase performance. Default: a value of 0 for the first queue tells Recoll to perform
              autoconfiguration based on the detected number of CPUs (no need for the two other values  in  this
              case).  Use thrQSizes = -1 -1 -1 to disable multithreading entirely.

       thrTCounts = string
              Number  of threads used for each indexing stage. The three stages are: file data extraction, terms
              generation, index update). The use of the counts is also controlled  by  some  special  values  in
              thrQSizes:  if  the first queue depth is 0, all counts are ignored (autoconfigured); if a value of
              -1 is used for a queue depth, the corresponding thread count is ignored. It makes no sense to  use
              a  value  other than 1 for the last stage because updating the Xapian index is necessarily single-
              threaded (and protected by a mutex).

       loglevel = int
              Log file verbosity 1-6. A value of 2 will print only errors and warnings. 3 will print information
              like document updates, 4 is quite verbose and 6 very verbose.

       logfilename = fn
              Log file destination. Use 'stderr' (default) to write to the console.

       idxloglevel = int
              Override loglevel for the indexer.

       idxlogfilename = fn
              Override logfilename for the indexer.

       daemloglevel = int
              Override  loglevel  for  the indexer in real time mode. The default is to use the idx... values if
              set, else the log... values.

       daemlogfilename = fn
              Override logfilename for the indexer in real time mode. The default is to use the idx... values if
              set, else the log... values.

       pyloglevel = int
              Override loglevel for the python module.

       pylogfilename = fn
              Override logfilename for the python module.

       orgidxconfdir = dfn
              Original  location  of the configuration directory. This is used exclusively for movable datasets.
              Locating the configuration directory inside the  directory  tree  makes  it  possible  to  provide
              automatic  query  time  path translations once the data set has moved (for example, because it has
              been mounted on another location).

       curidxconfdir = dfn
              Current location of the configuration directory. Complement orgidxconfdir  for  movable  datasets.
              This  should  be  used  if the configuration directory has been copied from the dataset to another
              location, either because the dataset is readonly and an r/w copy is desired,  or  for  performance
              reasons.  This  records  the  original  moved  location  before  copy,  to  allow path translation
              computations.  For example if a dataset originally indexed as  '/home/me/mydata/config'  has  been
              mounted  to  '/media/me/mydata', and the GUI is running from a copied configuration, orgidxconfdir
              would be '/home/me/mydata/config', and curidxconfdir (as set in the copied configuration) would be
              '/media/me/mydata/config'.

       idxrundir = dfn
              Indexing  process  current  directory.  The  input handlers sometimes leave temporary files in the
              current directory, so it makes sense to have recollindex chdir to some temporary directory. If the
              value  is  empty,  the current directory is not changed. If the value is (literal) tmp, we use the
              temporary directory as set by the environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the  value
              is an absolute path to a directory, we go there.

       checkneedretryindexscript = fn
              Script  used  to  heuristically  check if we need to retry indexing files which previously failed.
              The default script checks the modified dates on /usr/bin and /usr/local/bin. A relative path  will
              be looked up in the filters dirs, then in the path. Use an absolute path to do otherwise.

       recollhelperpath = string
              Additional places to search for helper executables. This is only used on Windows for now.

       idxabsmlen = int
              Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file.  The
              text can come from an actual 'abstract' section in the document or will just be the  beginning  of
              the  document.  It  is  stored  in  the  index so that it can be displayed inside the result lists
              without decoding the original file. The idxabsmlen  parameter  defines  the  size  of  the  stored
              abstract.  The  default  value  is 250 bytes. The search interface gives you the choice to display
              this stored text or a synthetic abstract built by extracting text around the search terms. If  you
              always prefer the synthetic abstract, you can reduce this value and save a little space.

       idxmetastoredlen = int
              Truncation  length  of  stored  metadata fields. This does not affect indexing (the whole field is
              processed anyway), just the amount of data stored in the  index  for  the  purpose  of  displaying
              fields inside result lists or previews. The default value is 150 bytes which may be too low if you
              have custom fields.

       idxtexttruncatelen = int
              Truncation length for all document texts. Only index the  beginning  of  documents.  This  is  not
              recommended  except  if  you are sure that the interesting keywords are at the top and have severe
              disk space issues.

       aspellLanguage = string
              Language definitions to use when creating the aspell dictionary. The value must  match  a  set  of
              aspell language definition files. You can type "aspell dicts" to see a list The default if this is
              not set is to use the NLS environment to guess the value. The values  are  the  2-letter  language
              codes (e.g. 'en', 'fr'...)

       aspellAddCreateParam = string
              Additional  option  and  parameter to aspell dictionary creation command. Some aspell packages may
              need an additional option (e.g. on Debian Jessie:  --local-data-dir=/usr/lib/aspell).  See  Debian
              bug 772415.

       aspellKeepStderr = bool
              Set  this  to  have a look at aspell dictionary creation errors. There are always many, so this is
              mostly for debugging.

       noaspell = bool
              Disable aspell use. The aspell dictionary generation takes time, and some combinations  of  aspell
              version, language, and local terms, result in aspell crashing, so it sometimes makes sense to just
              disable the thing.

       monauxinterval = int
              Auxiliary database update interval. The real time indexer only  updates  the  auxiliary  databases
              (stemdb,  aspell) periodically, because it would be too costly to do it for every document change.
              The default period is one hour.

       monixinterval = int
              Minimum interval (seconds) between processings of the indexing queue. The real time  indexer  does
              not  process  each event when it comes in, but lets the queue accumulate, to diminish overhead and
              to aggregate multiple events affecting the same file. Default 30 S.

       mondelaypatterns = string
              Timing parameters for the real time indexing. Definitions for  files  which  get  a  longer  delay
              before  reindexing is allowed. This is for fast-changing files, that should only be reindexed once
              in  a  while.  A  list  of  wildcardPattern:seconds  pairs.  The   patterns   are   matched   with
              fnmatch(pattern,  path,  0) You can quote entries containing white space with double quotes (quote
              the whole entry, not the pattern). The default is empty.   Example:  mondelaypatterns  =  *.log:20
              "*with spaces.*:30"

       idxniceprio = int
              "nice"  process  priority  for  the indexing processes. Default: 19 (lowest) Appeared with 1.26.5.
              Prior versions were fixed at 19.

       monioniceclass = int
              ionice class for the indexing process. Despite the misleading name, and on platforms where this is
              supported,  this  affects  all  indexing  processes,  not  only the real time/monitoring ones. The
              default value is 3 (use lowest "Idle" priority).

       monioniceclassdata = string
              ionice class level parameter if the class supports it. The default is empty, as the default "Idle"
              class has no levels.

       autodiacsens = bool
              auto-trigger  diacritics  sensitivity (raw index only). IF the index is not stripped, decide if we
              automatically trigger diacritics sensitivity if the search term has accented  characters  (not  in
              unac_except_trans).  Else  you  need  to  use  the  query language and the "D" modifier to specify
              diacritics sensitivity. Default is no.

       autocasesens = bool
              auto-trigger  case  sensitivity  (raw  index  only).  IF  the   index   is   not   stripped   (see
              indexStripChars), decide if we automatically trigger character case sensitivity if the search term
              has upper-case characters in any but the first position. Else you need to use the  query  language
              and the "C" modifier to specify character-case sensitivity. Default is yes.

       maxTermExpand = int
              Maximum  query  expansion  count for a single term (e.g.: when using wildcards). This only affects
              queries, not indexing. We used to not limit this at all (except for filenames where the limit  was
              too low at 1000), but it is unreasonable with a big index. Default 10000.

       maxXapianClauses = int
              Maximum  number  of  clauses  we  add  to  a  single  Xapian query. This only affects queries, not
              indexing. In some cases, the result of term expansion can be multiplicative, and we want to  avoid
              eating all the memory. Default 50000.

       snippetMaxPosWalk = int
              Maximum number of positions we walk while populating a snippet for the result list. The default of
              1,000,000 may be insufficient for very big documents,  the  consequence  would  be  snippets  with
              possibly meaning-altering missing words.

       pdfocr = bool
              Attempt  OCR of PDF files with no text content. This can be defined in subdirectories. The default
              is off because OCR is so very slow.

       pdfattach = bool
              Enable PDF attachment extraction by executing pdftk (if available).  This  is  normally  disabled,
              because it does slow down PDF indexing a bit even if not one attachment is ever found.

       pdfextrameta = string
              Extract  text from selected XMP metadata tags. This is a space-separated list of qualified XMP tag
              names. Each element can also include a translation to a Recoll field  name,  separated  by  a  '|'
              character.  If  the  second element is absent, the tag name is used as the Recoll field names. You
              will also need to add specifications to the "fields" file to direct processing  of  the  extracted
              data.

       pdfextrametafix = fn
              Define  name  of  XMP  field  editing  script.  This defines the name of a script to be loaded for
              editing XMP field values. The script should define a 'MetaFixer' class  with  a  metafix()  method
              which  will be called with the qualified tag name and value of each selected field, for editing or
              erasing. A new instance is created for each document, so that the object can keep state for,  e.g.
              eliminating duplicate values.

       ocrprogs = string
              OCR modules to try. The top OCR script will try to load the corresponding modules in order and use
              the first which reports being capable of performing OCR on the input file. Modules  for  tesseract
              (tesseract)   and  ABBYY  FineReader  (abbyy)  are  present  in  the  standard  distribution.  For
              compatibility with the previous version, if this is not defined  at  all,  the  default  value  is
              "tesseract".  Use  an  explicit  empty  value  if  needed.  A  value of "abbyy tesseract" will try
              everything.

       ocrcachedir = dfn
              Location for caching OCR data. The default if this is empty or undefined is to  store  the  cached
              OCR data under $RECOLL_CONFDIR/ocrcache.

       tesseractlang = string
              Language  to  assume for tesseract OCR. Important for improving the OCR accuracy. This can also be
              set  through  the  contents  of  a  file  in  the   currently   processed   directory.   See   the
              rclocrtesseract.py script. Example values: eng, fra... See the tesseract documentation.

       tesseractcmd = fn
              Path  for the tesseract command. Do not quote. This is mostly useful on Windows, or for specifying
              a     non-default     tesseract     command.     E.g.     on     Windows.      tesseractcmd      =
              C:/Program Files (x86)/Tesseract-OCR/tesseract.exe

       abbyylang = string
              Language  to  assume for abbyy OCR. Important for improving the OCR accuracy. This can also be set
              through the contents of a file in  the  currently  processed  directory.  See  the  rclocrabbyy.py
              script. Typical values: English, French... See the ABBYY documentation.

       abbyycmd = fn
              Path for the abbyy command The ABBY directory is usually not in the path, so you should set this.

       mhmboxquirks = string
              Enable thunderbird/mozilla-seamonkey mbox format quirks Set this for the directory where the email
              mbox files are stored.

SEE ALSO

       recollindex(1) recoll(1)

                                                14 November 2012                                  RECOLL.CONF(5)