Provided by: recoll_1.21.5-1_amd64

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

       This file defines the index configuration for the Recoll full-text search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the common file may be overridden
       by setting it in the personal configuration file, by default $HOME/.recoll/recoll.conf.

       Please note that while we try to keep this manual page reasonably up to date, it will
       frequently lag behind the current state of the software. The best source of information
       about the configuration is the comments in the system-wide configuration file.

       A short extract of the file might look as follows:

              # Space-separated list of directories to index.
              topdirs =  ~/docs /usr/share/doc

              [~/somedirectory-with-utf8-txt-files]
              defaultcharset = utf-8

       There are three kinds of lines:

              •      Comment or empty

              •      Parameter assignment

              •      Section definition

       Empty lines or lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree. Some of the parameters
       used for indexing are looked up hierarchically from the more to the less specific. Not all
       parameters can be meaningfully redefined; this is specified for each in the next section.

       The  tilde  character  (~)  is  expanded  in  file  names  to  the name of the user's home
       directory.

       Where values are lists, white space is used for separation,  and  elements  with  embedded
       spaces can be quoted with double-quotes.
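
       As an illustrative sketch of these rules (the directory names are hypothetical and the
       values are only examples), a comment, a list with a quoted element, and a section might
       be combined as follows:

              # The second entry contains a space, so it is double-quoted.
              topdirs = ~/docs "/home/me/My Documents"

              # Section: redefines a parameter for this subtree only.
              [~/docs/legacy]
              defaultcharset = iso-8859-1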

OPTIONS

       topdirs = directories
              Specifies the list of directories to index (recursively).

       skippedNames = patterns
              A space-separated list of patterns for names of files or directories that should be
              completely ignored. The list defined in the default file is:

              *~ #* bin CVS  Cache caughtspam  tmp

              The list can be redefined for subdirectories, but is only actually changed for the
              top-level ones in topdirs.

       skippedPaths = patterns
              A space-separated list of patterns for paths the indexer should not descend into.
              Together with topdirs, this allows pruning the indexed tree to one's heart's
              content. daemSkippedPaths can be used to define a specific value for the real-time
              indexing monitor.
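
              A minimal sketch of how these parameters might be combined (all paths and patterns
              below are hypothetical examples, not defaults):

              topdirs = /home/me/docs /home/me/projects
              # Ignore any file or directory with these names.
              skippedNames = *~ #* CVS tmp
              # Do not descend into these paths.
              skippedPaths = /home/me/projects/build /home/me/docs/archive*
              # Separate value for the real-time indexing monitor.
              daemSkippedPaths = /home/me/projects/build /home/me/docs/scratch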

       skippedPathsFnmPathname = 0/1
              The values in the *skippedPaths variables are matched by default  with  fnmatch(3),
              with  the  FNM_PATHNAME  and  FNM_LEADING_DIR flags. This means that '/' characters
              must be matched explicitly. You can set skippedPathsFnmPathname to 0 to disable the
              use of FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
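
              For instance, taking the pattern from the explanation above, the following sketch
              would skip /dir1/dir2/dir3, because '*' is then allowed to match '/':

              skippedPaths = /*/dir3
              skippedPathsFnmPathname = 0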

       followLinks = boolean
              Specifies  if the indexer should follow symbolic links while walking the file tree.
              The default is to ignore symbolic links to avoid multiple indexing of linked files.
              No effort is made to avoid duplication when this option is set to true. This option
              can be set individually for each of the topdirs members by using sections. It
              cannot be changed below the topdirs level.
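
              A sketch of a per-directory setting (the directory names are hypothetical):

              topdirs = /home/me/docs /home/me/shared
              followLinks = 0

              # Follow symbolic links only under this topdirs member.
              [/home/me/shared]
              followLinks = 1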

       indexedmimetypes = list
              Recoll  normally  indexes  any  file which it knows how to read. This list lets you
              restrict the indexed mime types to what you specify. If the variable is unspecified
              or the list empty (the default), all supported types are processed.
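
              For example, to index only HTML and PDF documents (an illustrative choice of
              types):

              indexedmimetypes = text/html application/pdf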

       compressedfilemaxkbs = value
              Size  limit  for compressed (.gz or .bz2) files. These need to be decompressed in a
              temporary  directory  for  identification,  which   can   be   very   wasteful   if
              'uninteresting' big compressed files are present.  Negative means no limit, 0 means
              no processing of any compressed file. Defaults to -1.

       textfilemaxmbs = value
              Maximum size for text files. Very big text files are often uninteresting logs. Set
              to -1 to disable the limit (default: 20 MB).

       textfilepagekbs = value
              If this is set to a value other than -1, text files will be indexed as multiple
              documents of the given page size. This may be useful if you do want to index very
              big text files, as it will both reduce memory usage at index time and help with
              loading data to the preview window. A size of a few megabytes would seem reasonable
              (default: 1000, i.e. 1 MB).

       membermaxkbs = value in kilobytes
              This  defines  the  maximum  size  for  an  archive  member (zip, tar or rar at the
              moment). Bigger entries will be skipped. Current default: 50000 (50 MB).
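
              As a sketch, the size limits described above might be set together as follows. The
              compressedfilemaxkbs value is an arbitrary illustration; the other three values
              repeat the defaults stated above:

              # Skip compressed files larger than about 20 MB.
              compressedfilemaxkbs = 20000
              # Skip text files larger than 20 MB.
              textfilemaxmbs = 20
              # Index large text files as 1 MB pages.
              textfilepagekbs = 1000
              # Skip archive members larger than 50 MB.
              membermaxkbs = 50000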

       indexallfilenames = boolean
              Recoll indexes file names into a special section of the database to allow specific
              file name searches using wildcards. This parameter decides if file name indexing
              is performed only for files with mime types that would qualify them for  full  text
              indexing, or for all files inside the selected subtrees, independent of mime type.

       usesystemfilecommand = boolean
              Decide  if  we  use  the file -i system command as a final step for determining the
              mime type for a file (the main procedure uses suffix associations as defined in the
              mimemap file). This can be useful for files with suffixless names, but it will also
              cause the indexing of many bogus "text" files.

       processbeaglequeue = 0/1
              If this is set, process the directory where Beagle Web browser plugins copy visited
              pages  for indexing. Of course, Beagle MUST NOT be running, else things will behave
              strangely.

       beaglequeuedir = directorypath
              The path to the Beagle indexing queue. This is hard-coded in the Beagle  plugin  as
              ~/.beagle/ToIndex so there should be no need to change it.

       indexStripChars = 0/1
              Decide  if  we strip characters of diacritics and convert them to lower-case before
              terms are indexed. If we don't, searches sensitive to case and  diacritics  can  be
              performed,  but the index will be bigger, and some marginal weirdness may sometimes
              occur. The default is a stripped index (indexStripChars = 1) for  now.  When  using
              multiple  indexes for a search, this parameter must be defined identically for all.
              Changing the value implies an index reset.

       maxTermExpand = value
              Maximum expansion count for a single term (e.g.: when using wildcards). The default
              of  10000  is reasonable and will avoid queries that appear frozen while the engine
              is walking the term list.

       maxXapianClauses = value
              Maximum number of elementary clauses we can add to a single Xapian query.  In  some
              cases,  the  result  of  term expansion can be multiplicative, and we want to avoid
              using excessive memory. The default of 100 000 should be both high enough  in  most
              cases and compatible with current typical hardware configurations.

       nonumbers = 0/1
              If this is set to true, no terms will be generated for numbers. For example "123",
              "1.5e6" or "192.168.1.4" would not be indexed ("value123" would still be). Numbers
              are often quite interesting to search for, and this should probably not be set
              except in special situations, i.e. scientific documents with huge amounts of
              numbers in them. This can only be set for a whole index, not for a subtree.

       nocjk = boolean
              If this is set to true, specific East Asian (Chinese, Korean, Japanese)
              character/word splitting is turned off. This will save a small amount of CPU time
              if you have no CJK documents. If your document base does include such text but you
              are not interested in searching it, setting nocjk may be a significant time and
              space saver.

       cjkngramlen = value
              This lets you adjust the size of n-grams used for indexing CJK  text.  The  default
              value  of  2  is  probably appropriate in most cases. A value of 3 would allow more
              precision and efficiency on longer words, but the index will be approximately twice
              as large.

       indexstemminglanguages = languages
              A  list  of  languages  for  which  the stem expansion databases will be built. See
              recollindex(1) for possible values.
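
              A sketch of how the term-generation parameters above might appear together. The
              language names are an assumption made for illustration; check recollindex(1) for
              the values actually accepted:

              # Keep indexing numbers and CJK text, with 2-character n-grams.
              nonumbers = 0
              nocjk = 0
              cjkngramlen = 2
              # Build stem expansion databases for these languages (assumed names).
              indexstemminglanguages = english french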

       defaultcharset = charset
              The name of the character set used for files that do not contain a character set
              definition (i.e. plain text files). This can be redefined for any subdirectory.

       unac_except_trans = list of utf-8 groups
              This  is  a list of characters, encoded in UTF-8, which should be handled specially
              when converting text to unaccented lowercase. For example, in Swedish,  the  letter
              "a  with  diaeresis" has full alphabet citizenship and should not be turned into an
              a.
              Each element in the space-separated list has the special character as first element
              and the translation following. The handling of both the lowercase and uppercase
              versions of a character should be specified, as membership in the list will turn
              off both standard accent and case processing.
              Note that the translation is not limited to a single character.
              This parameter cannot be redefined for subdirectories: it is global, because there
              is no way to do otherwise when querying. If you have document sets which would need
              different values, you will have to index and query them separately.
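
              As a sketch for the Swedish case mentioned above (the exact list is an assumption,
              to be adapted to your language), each letter is mapped to its lowercase form with
              the diacritic kept:

              unac_except_trans = åå Åå ää Ää öö Öö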

       maildefcharset = charactersetname
              This  can  be  used  to  define  the  default  character set specifically for email
              messages which don't specify it. This is mainly useful for readpst (libpst)  dumps,
              which are utf-8 but do not say so.

       localfields = fieldname = value:...
              This allows setting fields for all documents under a given directory. Typical usage
              would be to set an "rclaptg" field, to be used in mimeview to select a specific
              viewer. If several fields are to be set, they should be separated with a colon
              (':') character (which there is currently no way to escape). For example:
              localfields = rclaptg=gnus:other = val, then select the specific viewer with
              mimetype|tag=... in mimeview.
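
              In recoll.conf, such a setting would typically be placed in a section, for example
              (the directory name is hypothetical):

              [/home/me/mail/gnus]
              localfields = rclaptg=gnus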

       dbdir = directory
              The name of the Xapian database directory. It will be created if  needed  when  the
              database  is  initialized.  If  this  is not an absolute pathname, it will be taken
              relative to the configuration directory.

       idxstatusfile = file path
              The name of the scratch file where the indexer process updates its status. Default:
              idxstatus.txt inside the configuration directory.

       maxfsoccuppc = percentnumber
              Maximum  file system occupation before we stop indexing. The value is a percentage,
              corresponding to what the "Capacity" df output column shows.  The default value  is
              0, meaning no checking.
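
              A sketch with illustrative values (the directory name is hypothetical):

              # Relative path: taken relative to the configuration directory.
              dbdir = myindex
              # Stop indexing when the file system is 90% full.
              maxfsoccuppc = 90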

       mboxcachedir = directory path
              The  directory  where  mbox  message offsets cache files are held. This is normally
              $RECOLL_CONFDIR/mboxcache, but it may  be  useful  to  share  a  directory  between
              different configurations.

       mboxcacheminmbs = value in megabytes
              The  minimum  mbox  file  size  over which we cache the offsets. There is really no
              sense in caching offsets for small files. The default is 5 MB.

       webcachedir = directory path
              This is only used by the Beagle web browser plugin indexing code, and defines where
              the cache for visited pages will live. Default: $RECOLL_CONFDIR/webcache

       webcachemaxmbs = value in megabytes
              This  is  only used by the Beagle web browser plugin indexing code, and defines the
              maximum size for the web page cache. Default: 40 MB.

       idxflushmb = megabytes
              Threshold (megabytes of new text data) where we flush from memory  to  disk  index.
              Setting  this  can  help  control  memory  usage.  A  value  of 0 means no explicit
              flushing, letting Xapian use  its  own  default,  which  is  flushing  every  10000
              documents (or XAPIAN_FLUSH_THRESHOLD), meaning that memory usage depends on average
              document size. The default value is 10.

       autodiacsens = 0/1
              If the index is not stripped, decide if we automatically trigger diacritics
              sensitivity if the search term has accented characters (not in unac_except_trans).
              Otherwise you need to use the query language and the D modifier to specify
              diacritics sensitivity. Default is no.

       autocasesens = 0/1
              If the index is not stripped, decide if we automatically trigger character case
              sensitivity if the search term has uppercase characters in any but the first
              position. Otherwise you need to use the query language and the C modifier to
              specify character-case sensitivity. Default is yes.
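
              As a sketch, an unstripped index with both automatic sensitivities enabled might
              be configured as follows (illustrative values; remember that changing
              indexStripChars implies an index reset):

              indexStripChars = 0
              autodiacsens = 1
              autocasesens = 1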

       loglevel = value
              Verbosity level for recoll and recollindex. A value of  4  lists  quite  a  lot  of
              debug/information  messages.  3  lists  only  errors.   daemloglevel can be used to
              specify a different value for the real-time indexing daemon.

       logfilename = file
              Where the messages should go. 'stderr' can be used as a special value.
              daemlogfilename can be used to specify a different value for the real-time indexing
              daemon.
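
              A sketch of a logging setup (the log file path is hypothetical):

              # Interactive and batch runs: errors only, on stderr.
              loglevel = 3
              logfilename = stderr
              # Real-time indexing daemon: more verbose, to a file.
              daemloglevel = 4
              daemlogfilename = /tmp/recollindex-daemon.log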

       mondelaypatterns = list of patterns
              This allows specifying wildcard path patterns (processed with fnmatch(3) with flags
              set to 0), to match files which change too often and for which a delay should be
              observed before re-indexing. This is a space-separated list, each entry being a
              pattern and a time in seconds, separated by a colon. You can use double quotes if a
              path entry contains white space. Example:

              mondelaypatterns = *.log:20 "this one has spaces*:10"

       monixinterval = value in seconds
              Minimum interval (seconds) for processing the indexing queue. The real-time monitor
              does not process each event when it comes in, but will wait this time for the queue
              to accumulate, to reduce overhead and to aggregate multiple events affecting the
              same file. Default: 30 seconds.

       monauxinterval = value in seconds
              Period  (in  seconds)  at which the real time monitor will regenerate the auxiliary
              databases (spelling, stemming) if needed. The default is one hour.

       monioniceclass, monioniceclassdata
              These allow defining the ionice class and data used by the indexer  (default  class
              3, no data).
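
              A sketch of real-time monitor settings. The monixinterval value is illustrative;
              the other two lines repeat the defaults stated above:

              # Process the indexing queue at most once a minute.
              monixinterval = 60
              # Regenerate the auxiliary databases every hour.
              monauxinterval = 3600
              monioniceclass = 3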

       filtermaxseconds = value in seconds
              Maximum filter execution time, after which it is aborted. Some PostScript programs
              just loop...

       filtersdir = directory
              A directory to search for the external filter scripts used to index some  types  of
              files.  The  value  should  not be changed, except if you want to modify one of the
              default scripts. The value can be redefined for any subdirectory.

       iconsdir = directory
              The name of the directory where recoll result list icons are stored. You can change
              this if you want different images.

       idxabsmlen = value
              Recoll  stores  an abstract for each indexed file inside the database. The text can
              come from an actual 'abstract'  section  in  the  document  or  will  just  be  the
              beginning  of  the  document. It is stored in the index so that it can be displayed
              inside the  result  lists  without  decoding  the  original  file.  The  idxabsmlen
              parameter  defines the size of the stored abstract. The default value is 250 bytes.
              The search interface gives you  the  choice  to  display  this  stored  text  or  a
              synthetic  abstract built by extracting text around the search terms. If you always
              prefer the synthetic abstract, you can reduce this value and save a little space.

       aspellLanguage = lang
              Language definitions to use when creating the aspell dictionary.   The  value  must
              match  a  set  of aspell language definition files. You can type "aspell config" to
              see where these are installed (look for data-dir). The default if the  variable  is
              not set is to use your desktop national language environment to guess the value.

       noaspell = boolean
              If  this  is  set, the aspell dictionary generation is turned off. Useful for cases
              where you don't need the functionality  or  when  it  is  unusable  because  aspell
              crashes during dictionary generation.

       mhmboxquirks = flags
              This allows defining location-related quirks for the mailbox handler. Currently
              only the tbird flag is defined, and it should be set for directories which hold
              Thunderbird data, as their folder format is weird.
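
              Since the flag applies to directories, it would typically be set in a section, for
              example (the directory name is hypothetical):

              [/home/me/.thunderbird]
              mhmboxquirks = tbird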

SEE ALSO

       recollindex(1) recoll(1)

                                         14 November 2012                          RECOLL.CONF(5)