Ubuntu Manpage: recoll.conf - main personal configuration file for Recoll

Provided by: recollcmd_1.41.1-1.1build1_amd64

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

This file defines the index configuration for the Recoll full-text search system.

The system-wide configuration file is normally located inside /usr/[local]/share/recoll/examples. Any
parameter set in the common file may be overridden by setting it in the specific index configuration
file, by default: $HOME/.recoll/recoll.conf

All recoll commands will accept a -c option or use the $RECOLL_CONFDIR environment variable to specify a
non-default index configuration directory.

A short extract of the file might look as follows:

# Space-separated list of directories to index.
topdirs = ~/docs /usr/share/doc

[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8

There are three kinds of lines:

• Comment or empty.

• Parameter assignment.

• Section definition.

Empty lines or lines beginning with # are ignored.

Assignment lines are in the form 'name = value'. In the following description, they also have a type,
which is mostly indicative. The two non-obvious ones are 'fn': file path, and 'dfn': directory path.

Section lines allow redefining a parameter for a directory subtree. Some of the parameters used for
indexing are looked up hierarchically from the more to the less specific. Not all parameters can be
meaningfully redefined; this is specified for each in the next section.

The tilde character (~) is expanded in file names to the name of the user's home directory.

Some 'string' values are lists, which is only indicated by their description. In this case white space is
used for separation, and elements with embedded spaces can be quoted with double-quotes.
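
As an illustration of the quoting rule, the following line defines a list of three elements, the
second of which contains a space. The directory names are hypothetical.

```ini
topdirs = ~/docs "/home/me/My Documents" /usr/share/doc
```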

OPTIONS

topdirs = string
Space-separated list of files or directories to recursively index. Symbolic links in the list are
followed, independently of the value of the followLinks variable. The default value is ~ :
recursively index $HOME.

monitordirs = string
Space-separated list of files or directories to monitor for updates. When running the real-time
indexer, this allows monitoring only a subset of the whole indexed area. The elements must be
included in the tree defined by the 'topdirs' members.
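
For instance, assuming that two hypothetical directories are indexed, the real-time indexer could
be limited to watching only the first one:

```ini
topdirs = ~/docs ~/archive
# Only this part of the indexed area is monitored for updates.
monitordirs = ~/docs
```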

skippedNames = string
File and directory names which should be ignored. White space separated list of wildcard patterns
(simple ones, not paths, must contain no

Have a look at the default configuration for the initial value, some entries may not suit your
situation. The easiest way to see it is through the GUI Index configuration "local parameters"
panel.

The list in the default configuration does not exclude hidden directories (names beginning with a
dot), which means that it may index quite a few things that you do not want. On the other hand,
email user agents like Thunderbird usually store messages in hidden directories, and you probably
want this indexed. One possible solution is to have ".*" in "skippedNames", and add things like
"~/.thunderbird" "~/.evolution" to "topdirs".

Not even the file names are indexed for patterns in this list, see the "noContentSuffixes"
variable for an alternative approach which indexes the file names. Can be redefined for any
subtree.

skippedNames- = string
List of name patterns to remove from the default skippedNames list. Allows modifying the list in
the local configuration without copying it.

skippedNames+ = string
List of name patterns to add to the default skippedNames list. Allows modifying the list in the
local configuration without copying it.
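
A minimal sketch of the approach suggested for hidden directories under skippedNames, using the
additive forms so that the default list does not have to be copied. The *.bak pattern is only an
illustration, and the skippedNames- line assumes that the default list contains a 'tmp' entry,
which should be checked against the default configuration.

```ini
# Skip all hidden names, plus (hypothetical) backup files.
skippedNames+ = .* *.bak
# Assumption: 'tmp' is part of the default list.
skippedNames- = tmp
# Explicitly index the mail stores kept in hidden directories.
topdirs = ~ ~/.thunderbird ~/.evolution
```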

onlyNames = string
Regular file name filter patterns. This is normally empty. If set, only the file names not in
skippedNames and matching one of the patterns will be considered for indexing. Can be redefined
per subtree. Does not apply to directories.

noContentSuffixes = string
List of name endings (not necessarily dot-separated suffixes) for which we don't try MIME type
identification, and don't uncompress or index content. Only the names will be indexed. This
complements the now obsoleted recoll_noindex list from the mimemap file, which will go away in a
future release (the move from mimemap to recoll.conf allows editing the list through the GUI).
This is different from skippedNames because these are name ending matches only (not wildcard
patterns), and the file name itself gets indexed normally. This can be redefined for
subdirectories.

noContentSuffixes- = string
List of name endings to remove from the default noContentSuffixes list.

noContentSuffixes+ = string
List of name endings to add to the default noContentSuffixes list.
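
For example, the following would probably be a reasonable way to index only the names of disk
image files. The endings are hypothetical and may already be part of the default list.

```ini
# Names are indexed, content is neither identified nor uncompressed.
noContentSuffixes+ = .iso .img
```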

skippedPaths = string
Absolute paths we should not go into. Space-separated list of wildcard expressions for absolute
filesystem paths (for files or directories). The variable must be defined at the top level of the
configuration file, not in a subsection.

Any value in the list must be textually consistent with the values in topdirs, no attempts are
made to resolve symbolic links. In practice, if, as is frequently the case, /home is a link to
/usr/home, your default topdirs will have a single entry '~' which will be translated to
'/usr/home/yourlogin'. In this case, any skippedPaths entry should start with
'/usr/home/yourlogin', not with '/home/yourlogin'.

The index and configuration directories will automatically be added to the list.

The expressions are matched using 'fnmatch(3)' with the FNM_PATHNAME flag set by default. This
means that '/' characters must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0
to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match '/dir1/dir2/dir3').

The default value contains the usual mount point for removable media to remind you that it is in
most cases a bad idea to have Recoll work on these. Explicitly adding '/media/xxx' to the
'topdirs' variable will override this.

skippedPathsFnmPathname = bool
Set to 0 to override use of FNM_PATHNAME for matching skipped paths.
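
The effect of FNM_PATHNAME can be illustrated with the pattern quoted in the skippedPaths
description. This is only a sketch: assigning skippedPaths in this way replaces the default value.

```ini
skippedPaths = /*/dir3
# Default behaviour: '*' does not match '/', so /dir1/dir2/dir3 is not skipped.
skippedPathsFnmPathname = 1
# With the following setting instead, /*/dir3 would also match /dir1/dir2/dir3.
#skippedPathsFnmPathname = 0
```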

nowalkfn = string
File name which will cause its parent directory to be skipped. Any directory containing a file
with this name will be skipped as if it was part of the skippedPaths list. Ex: .recoll-noindex

daemSkippedPaths = string
skippedPaths equivalent specific to real time indexing. This enables having parts of the tree
which are initially indexed but not monitored. If daemSkippedPaths is not set, the daemon uses
skippedPaths.

zipUseSkippedNames = bool
Use skippedNames inside Zip archives. Fetched directly by the rclzip.py handler. Skip the patterns
defined by skippedNames inside Zip archives. Can be redefined for subdirectories. See
https://www.recoll.org/faqsandhowtos/FilteringOutZipArchiveMembers.html

zipSkippedNames = string
Space-separated list of wildcard expressions for names that should be ignored inside zip archives.
This is used directly by the zip handler. If zipUseSkippedNames is not set, zipSkippedNames
defines the patterns to be skipped inside archives. If zipUseSkippedNames is set, the two lists
are concatenated and used. Can be redefined for subdirectories. See
https://www.recoll.org/faqsandhowtos/FilteringOutZipArchiveMembers.html
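
As a sketch of how the two Zip parameters combine, the following applies the regular skippedNames
patterns inside archives and adds a few more. The patterns are hypothetical.

```ini
# Apply the regular skippedNames patterns inside Zip archives...
zipUseSkippedNames = 1
# ...and also skip these archive members.
zipSkippedNames = *.class *.o
```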

followLinks = bool
Follow symbolic links during indexing. The default is to ignore symbolic links to avoid multiple
indexing of linked files. No effort is made to avoid duplication when this option is set to true.
This option can be set individually for each of the 'topdirs' members by using sections. It
cannot be changed below the 'topdirs' level. Links in the 'topdirs' list itself are always followed.
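
A possible per-member setting might look as follows, with hypothetical directory names. The
section is placed last because the lines following a section definition apply to that section.

```ini
topdirs = ~/docs ~/shared
# Ignore symbolic links in general (the default)...
followLinks = 0
# ...but follow them under one of the topdirs members.
[~/shared]
followLinks = 1
```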

indexedmimetypes = string
Restrictive list of indexed MIME types. Normally not set (in which case all supported types are
indexed). If it is set, only the types from the list will have their contents indexed. The names
will be indexed anyway if indexallfilenames is set (default). MIME type names should be taken from
the mimemap file (the values may be different from xdg-mime or file -i output in some cases). Can
be redefined for subtrees.

excludedmimetypes = string
List of excluded MIME types. Lets you exclude some types from indexing. MIME type names should be
taken from the mimemap file (the values may be different from xdg-mime or file -i output in some
cases). Can be redefined for subtrees.
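
The following sketch excludes one type globally and restricts a hypothetical mail directory to two
types. The MIME type names are assumptions and should be checked against the mimemap file.

```ini
# Never index the content of JPEG images.
excludedmimetypes = image/jpeg
# Hypothetical mail directory: only index messages and plain text.
[~/mail]
indexedmimetypes = message/rfc822 text/plain
```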

nomd5types = string
MIME types for which we don't compute a md5 hash. md5 checksums are used only for deduplicating
results, and can be very expensive to compute on multimedia or other big files. This list lets you
turn off md5 computation for selected types. It is global (no redefinition for subtrees). At the
moment, it only has an effect for external handlers (exec and execm). The file types can be
specified by listing either MIME types (e.g. audio/mpeg) or handler names (e.g. rclaudio.py).

compressedfilemaxkbs = int
Size limit for compressed files. We need to decompress these in a temporary directory for
identification, which can be wasteful in some cases. Limit the waste. Negative means no limit. 0
results in no processing of any compressed file. Default 100 MB.

textfilemaxmbs = int
Size limit for text files. Mostly for skipping monster logs. Default 20 MB. Use a value of -1 to
disable.

textfilepagekbs = int
Page size for text files. If this is set, text/plain files will be divided into documents of
approximately this size. This will reduce memory usage at index time and help with loading data in
the preview window at query time. Particularly useful with very big files, such as application or
system logs. Also see textfilemaxmbs and compressedfilemaxkbs.
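
The three size parameters could be combined as follows. The values are arbitrary examples, chosen
only to show the units implied by each parameter name.

```ini
# Do not decompress files bigger than about 50 MB (value in KB).
compressedfilemaxkbs = 50000
# Skip text files bigger than 50 MB.
textfilemaxmbs = 50
# Split big text files into documents of about 1 MB (value in KB).
textfilepagekbs = 1000
```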

textunknownasplain = bool
Process unknown text/xxx files as text/plain. Allows indexing misc. text files identified as
text/whatever by 'file' or 'xdg-mime' without having to explicitly set config entries for them.
This works fine for indexing (although it will also cause processing of a lot of useless files),
but the documents indexed this way will be opened by the desktop viewer, even if text/plain has a
specific editor.

indexallfilenames = bool
Index the file names of unprocessed files. Index the names of files the contents of which we don't
index because of an excluded or unsupported MIME type.

usesystemfilecommand = bool
Use a system mechanism as last resort to guess a MIME type. Depending on platform and version, a
compile-time configuration will decide if this actually executes a command or uses libmagic. This
last-resort identification (if the suffix-based one failed) is generally useful, but will cause
the indexing of many bogus extension-less 'text' files. Also see 'systemfilecommand'.

systemfilecommand = string
Command to use for guessing the MIME type if the internal methods fail. This is ignored on Windows
or with Recoll 1.38+ if compiled with libmagic enabled (the default). Otherwise, this should be a
"file -i" workalike. The file path will be added as a last parameter to the command line.
"xdg-mime" works better than the traditional "file" command, and is now the configured default
(with a hard-coded fallback to "file").

processwebqueue = bool
Decide if we process the Web queue. The queue is a directory where the Recoll Web browser plugins
create the copies of visited pages.

membermaxkbs = int
Size limit for archive members. This is passed to the MIME handlers in the environment as
RECOLL_FILTER_MAXMEMBERKB.

indexStripChars = bool
Decide if we store character case and diacritics in the index. If we do, searches sensitive to
case and diacritics can be performed, but the index will be bigger, and some marginal weirdness
may sometimes occur. The default is a stripped index. When using multiple indexes for a search,
this parameter must be defined identically for all. Changing the value implies an index reset.

indexStoreDocText = bool
Decide if we store the documents' text content in the index. Storing the text allows extracting
snippets from it at query time, instead of building them from index position data.

Newer Xapian index formats have rendered our use of positions list unacceptably slow in some
cases. The last Xapian index format with good performance for the old method is Chert, which is
default for 1.2, still supported but not default in 1.4 and will be dropped in 1.6.

The stored document text is translated from its original format to UTF-8 plain text, but not
stripped of upper-case, diacritics, or punctuation signs. Storing it increases the index size by
10-20% typically, but also allows for nicer snippets, so it may be worth enabling it even if not
strictly needed for performance if you can afford the space.

The variable only has an effect when creating an index, meaning that the xapiandb directory must
not exist yet. Its exact effect depends on the Xapian version.

For Xapian 1.4, if the variable is set to 0, we used to use the Chert format and not store the
text. If the variable was 1, Glass was used, and the text stored. We don't do this any more:
storing the text has proved to be the much better option, and dropping this possibility simplifies
the code.

So now, the index format for a new index is always the default, but the variable still controls if
the text is stored or not, and the abstract generation method. With Xapian 1.4 and later, and the
variable set to 0, abstract generation may be very slow, but this setting may still be useful to
save space if you do not use abstract generation at all, by using the appropriate setting in the
GUI, and/or avoiding the Python API or recollq options which would trigger it.

nonumbers = bool
Decides if terms will be generated for numbers. For example "123", "1.5e6", "192.168.1.4", would
not be indexed if nonumbers is set ("value123" would still be). Numbers are often quite
interesting to search for, and this should probably not be set except for special situations,
i.e. scientific documents with huge amounts of numbers in them, where setting nonumbers will
reduce the index size. This can only be set for a whole index, not for a subtree.

notermpositions = bool
Do not store term positions. Term positions allow for phrase and proximity searches, but make the
index much bigger. In some special circumstances, you may want to dispense with them.

dehyphenate = bool
Determines if we index 'coworker' also when the input is 'co-worker'. This is new in version 1.22,
and on by default. Setting the variable to off allows restoring the previous behaviour.

indexedpunctuation = string
String of UTF-8 punctuation characters to be indexed as words. The resulting terms will then be
searchable and, for example, by setting the parameter to "%€" (without the double quotes), you
would be able to search separately for "100%" or "100€". Note that "100%" or "100 %" would be
indexed in the same way, the characters are their own word separators.

backslashasletter = bool
Process backslash as a normal letter. This may make sense for people wanting to index TeX commands
as such but is not of much general use.

underscoreasletter = bool
Process underscore as normal letter. This makes sense in so many cases that one wonders if it
should not be the default.

maxtermlength = int
Maximum term length in Unicode characters. Words longer than this will be discarded. The default
is 40 and used to be hard-coded, but it can now be adjusted. You may need an index reset if you
change the value.
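
A short sketch combining some of the term generation parameters described above. The choices are
illustrative only, and changing them on an existing index may require an index reset.

```ini
# Index the percent and euro signs as words.
indexedpunctuation = %€
# Process underscore as a normal letter.
underscoreasletter = 1
# Keep the default maximum term length.
maxtermlength = 40
```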

nocjk = bool
Decides if specific East Asian (Chinese Korean Japanese) characters/word splitting is turned off.
This will save a small amount of CPU if you have no CJK documents. If your document base does
include such text but you are not interested in searching it, setting nocjk may be a significant
time and space saver.

cjkngramlen = int
This lets you adjust the size of n-grams used for indexing CJK text. The default value of 2 is
probably appropriate in most cases. A value of 3 would allow more precision and efficiency on
longer words, but the index will be approximately twice as large.

hangultagger = string
External tokenizer for Korean Hangul. This allows using a language-specific processor for
extracting terms from Korean text, instead of the generic n-gram term generator. See
https://www.recoll.org/pages/recoll-korean.html for instructions.

chinesetagger = string
External tokenizer for Chinese. This allows using the language specific Jieba tokenizer for
extracting meaningful terms from Chinese text, instead of the generic n-gram term generator. See
https://www.recoll.org/pages/recoll-chinese.html for instructions.

indexstemminglanguages = string
Languages for which to create stemming expansion data. Stemmer names can be found by executing
'recollindex -l', or this can also be set from a list in the GUI. The values are full language
names, e.g. english, french...

defaultcharset = string
Default character set. This is used for files which do not contain a character set definition
(e.g.: text/plain). Values found inside files, e.g. a 'charset' tag in HTML documents, will
override it. If this is not set, the default character set is the one defined by the NLS
environment ($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact). If for some
reason you want a general default which does not match your LANG and is not 8859-1, use this
variable. This can be redefined for any sub-directory.

unac_except_trans = string
A list of characters, encoded in UTF-8, which should be handled specially when converting text to
unaccented lowercase. For example, in Swedish, the letter a with diaeresis has full alphabet
citizenship and should not be turned into an a. Each element in the space-separated list has the
special character as first element and the translation following. The handling of both the
lowercase and upper-case versions of a character should be specified, as membership in the list
will turn off both standard accent and case processing. The value is global and affects both
indexing and querying. We also convert a few confusing Unicode characters (quotes, hyphen) to
their ASCII equivalent to avoid "invisible" search failures.

Examples:

Swedish:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå ’' ❜' ʼ' ‐-

German:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl ’' ❜' ʼ' ‐-

[...] a German ß
unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl ’' ❜' ʼ' ‐-

[...] are not performed by unac, but it is unlikely that someone would type the composed forms in
a search.
unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl ’' ❜' ʼ' ‐-

maildefcharset = string
Overrides the default character set for email messages which don't specify one. This is mainly
useful for readpst (libpst) dumps, which are utf-8 but do not say so.

localfields = string
Set fields on all files (usually of a specific fs area). The syntax is the usual one: name =
value ; attr1 = val1 ; [...]. Here the value is empty, so the line needs an initial semi-colon.
This is useful, e.g., for setting the rclaptg field for application selection inside mimeview.

testmodifusemtime = bool
Use mtime instead of ctime to test if a file has been modified. The time is used in addition to
the size, which is always used. Setting this can reduce re-indexing on systems where extended
attributes are used (by some other application), but not indexed, because changing extended
attributes only affects ctime. Notes:

- This may prevent detection of change in some marginal file rename cases (the target would need
to have the same size and mtime).

- You should probably also set noxattrfields to 1 in this case, except if you still prefer to
perform xattr indexing, for example if the local file update pattern makes it of value (as in
general, there is a risk for pure extended attributes updates without file modification to go
undetected).

Perform a full index reset after changing this.

noxattrfields = bool
Disable extended attributes conversion to metadata fields. This probably needs to be set if
testmodifusemtime is set.

metadatacmds = string
Define commands to gather external metadata, e.g. tmsu tags. There can be several entries,
separated by semi-colons, each defining which field name the data goes into and the command to
use. Don't forget the initial semi-colon. All the field names must be different. You can use
aliases in the "field" file if necessary. As a not too pretty hack conceded to convenience, any
field name beginning with "rclmulti" will be taken as an indication that the command returns
multiple field values inside a text blob formatted as a recoll configuration file ("fieldname =
fieldvalue" lines). The rclmultixx name will be ignored, and field names and values will be parsed
from the data. Example:

metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f

cachedir = dfn
Top directory for Recoll data. Recoll data directories are normally located relative to the
configuration directory (e.g. ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set, the
directories are stored under the specified value instead (e.g. if cachedir is ~/.cache/recoll, the
default dbdir would be ~/.cache/recoll/xapiandb). This affects dbdir, webcachedir, mboxcachedir,
aspellDicDir, which can still be individually specified to override cachedir. Note that if you
have multiple configurations, each must have a different cachedir, there is no automatic
computation of a subpath under cachedir.
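
Using the value from the description, a configuration which moves all Recoll data out of the
configuration directory might look as follows. The commented dbdir path is hypothetical.

```ini
cachedir = ~/.cache/recoll
# With no other setting, the index is then created in ~/.cache/recoll/xapiandb.
# dbdir can still be set individually to override this:
#dbdir = /data/recoll/xapiandb
```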

maxfsoccuppc = int
Maximum file system occupation over which we stop indexing. The value is a percentage,
corresponding to what the "Capacity" df output column shows. The default value is 0, meaning no
checking. This parameter is only checked when the indexer starts; it will not change the behaviour
of a running process.

dbdir = dfn
Xapian database directory location. This will be created on first indexing. If the value is not an
absolute path, it will be interpreted as relative to cachedir if set, or the configuration
directory (-c argument or $RECOLL_CONFDIR). If nothing is specified, the default is then
~/.recoll/xapiandb/

idxstatusfile = fn
Name of the scratch file where the indexer process updates its status. Default: idxstatus.txt
inside the configuration directory.

mboxcachedir = dfn
Directory location for storing mbox message offsets cache files. This is normally 'mboxcache'
under cachedir if set, or else under the configuration directory, but it may be useful to share a
directory between different configurations.

mboxcacheminmbs = int
Minimum mbox file size over which we cache the offsets. There is really no sense in caching
offsets for small files. The default is 5 MB.

mboxmaxmsgmbs = int
Maximum mbox member message size in megabytes. Size over which we assume that the mbox format is
bad or we misinterpreted it, at which point we just stop processing the file.

webcachedir = dfn
Directory where we store the archived web pages after they are processed. This is only used by the
Web history indexing code. Note that this is different from webdownloadsdir which tells the
indexer where the web pages are stored by the browser, before they are indexed and stored into
webcachedir. Default: cachedir/webcache if cachedir is set, else $RECOLL_CONFDIR/webcache

webcachemaxmbs = int
Maximum size in MB of the Web archive. This is only used by the web history indexing code.
Default: 40 MB. Reducing the size will not physically truncate the file.

webqueuedir = fn
The path to the Web indexing queue. This used to be hard-coded in the old plugin as
~/.recollweb/ToIndex so there would be no need or possibility to change it, but the WebExtensions
plugin now downloads the files to the user Downloads directory, and a script moves them to
webqueuedir. The script reads this value from the config so it has become possible to change it.

webdownloadsdir = fn
The path to the browser add-on download directory. This tells the indexer where the Web browser
add-on stores the web page data. The data is then moved by a script to webqueuedir, then
processed, and finally stored in webcachedir for future previews.

webcachekeepinterval = string
Page recycle interval. By default, only one instance of a URL is kept in the cache. This can be
changed by setting this to a value determining at what frequency we keep multiple instances
('day', 'week', 'month', entries.

aspellDicDir = dfn
Aspell dictionary storage directory location. The aspell dictionary (aspdict.(lang).rws) is
normally stored in the directory specified by cachedir if set, or under the configuration
directory.

filtersdir = dfn
Directory location for executable input handlers. If RECOLL_FILTERSDIR is set in the environment,
we use it instead. Defaults to $prefix/share/recoll/filters. Can be redefined for subdirectories.

iconsdir = dfn
Directory location for icons. The only reason to change this would be if you want to change the
icons displayed in the result list. Defaults to $prefix/share/recoll/images

idxflushmb = int
Threshold (megabytes of new data) where we flush from memory to disk index. Setting this allows
some control over memory usage by the indexer process. A value of 0 means no explicit flushing,
which lets Xapian perform its own thing, meaning flushing every $XAPIAN_FLUSH_THRESHOLD documents
created, modified or deleted: as memory usage depends on average document size, not only document
count, the Xapian approach is not very useful, and you should let Recoll manage the flushes.
The program compiled value is 0. The configured default value (from this file) is now 50 MB, and
should be ok in many cases. You can set it as low as 10 to conserve memory, but if you are
looking for maximum speed, you may want to experiment with values between 20 and 200. In my
experience, values beyond this are always counterproductive. If you find otherwise, please drop me
a note.

filtermaxseconds = int
Maximum external filter execution time in seconds. Default 1200 (20 min). Set to 0 for no limit.
This is mainly to avoid infinite loops in postscript files (loop.ps).

filtermaxmbytes = int
Maximum virtual memory space for filter processes (setrlimit(RLIMIT_AS)), in megabytes. Note that
this includes any mapped libs (there is no reliable Linux way to limit the data space only), so we
need to be a bit generous here. Anything over 2000 will be ignored on 32-bit machines. The high
default value is needed because of java-based handlers (pdftk) which need a lot of VM (most of it
text), esp. pdftk when executed from Python rclpdf.py. You can use a much lower value if you don't
need Java.
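
As a sketch, the indexer resource limits could be adjusted as follows. The values are arbitrary
examples within the ranges discussed above, and the last one assumes that no Java-based handler is
needed.

```ini
# Flush to disk after every 100 MB of new data.
idxflushmb = 100
# Limit external filter execution time to 10 minutes.
filtermaxseconds = 600
# Lower the virtual memory limit for filter processes (megabytes).
filtermaxmbytes = 1000
```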

thrQSizes = string
Task queue depths for each stage and threading configuration control. There are three internal
queues in the indexing pipeline stages (file data extraction, terms generation, index update).
This parameter defines the queue depths for each stage (three integer values). In practice, deep
queues have not been shown to increase performance. The first value is also used to control
threading autoconfiguration or disabling multithreading. If the first queue depth is set to 0
Recoll will set the queue depths and thread counts based on the detected number of CPUs. The
arbitrarily chosen values are as follows (depth, nthread):

• 1 CPU: no threading.

• Less than 4 CPUs: (2, 2), (2, 2), (2, 1).

• Less than 6 CPUs: (2, 4), (2, 2), (2, 1).

• Otherwise: (2, 5), (2, 3), (2, 1).

If the first queue depth is set to -1, multithreading will be disabled entirely. The second and
third values are ignored in both these cases.

thrTCounts = string
Number of threads used for each indexing stage. If the first entry in thrQSizes is not 0 or -1,
these three values define the number of threads used for each stage (file data extraction, term
generation, index update). It makes no sense to use a value other than 1 for the last stage
because updating the Xapian index is necessarily single-threaded (and protected by a mutex).
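
An explicit threading configuration might look like the following sketch. The thread counts are
arbitrary examples, and the commented lines show the two special values of the first queue depth.

```ini
# Queue depth 2 for each stage, with 4, 2 and 1 threads respectively.
thrQSizes = 2 2 2
thrTCounts = 4 2 1
# Let Recoll choose according to the number of CPUs:
#thrQSizes = 0 0 0
# Disable multithreading entirely:
#thrQSizes = -1 -1 -1
```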

thrTmpDbCnt = int
Number of temporary indexes used during incremental or full indexing. If not set to zero, this
defines how many temporary indexes we use during indexing. These temporary indexes are merged
into the main one at the end of the operation. Using multiple indexes and a final merge can
significantly improve indexing performance when the single-threaded Xapian index updates become a
bottleneck. How useful this is depends on the type of input and CPU. See the manual for more
details.

loglevel = int
Log file verbosity 1-6. A value of 2 will print only errors and warnings. 3 will print information
like document updates, 4 is quite verbose and 6 very verbose.

logfilename = fn
Log file destination. Use 'stderr' (default) to write to the console.

idxloglevel = int
Override loglevel for the indexer.

idxlogfilename = fn
Override logfilename for the indexer.

helperlogfilename = fn
Destination file for external helpers standard error output. The external program error output is
left alone by default, e.g. going to the terminal when the recoll[index] program is executed from
the command line. Use /dev/null or a file inside a non-existent directory to completely suppress
the output.

daemloglevel = int
Override loglevel for the indexer in real time mode. The default is to use the idx... values if
set, else the log... values.

daemlogfilename = fn
Override logfilename for the indexer in real time mode. The default is to use the idx... values if
set, else the log... values.

pyloglevel = int
Override loglevel for the python module.

pylogfilename = fn
Override logfilename for the python module.
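
The override chain can be sketched as follows: a general setting, and a more verbose one for the
indexer only. The log file path is hypothetical.

```ini
# General setting: errors and warnings, written to the console.
loglevel = 2
logfilename = stderr
# More detail for the indexer, in a separate file.
idxloglevel = 4
idxlogfilename = /tmp/recollindex.log
```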

idxnoautopurge = bool
Do not purge data for deleted or inaccessible files. This can be overridden by recollindex command
line options and may be useful if some parts of the document set may predictably be inaccessible
at times, so that you would only run the purge after making sure that everything is there.

orgidxconfdir = dfn
Original location of the configuration directory. This is used exclusively for movable datasets.
Locating the configuration directory inside the directory tree makes it possible to provide
automatic query time path translations once the data set has moved (for example, because it has
been mounted on another location).

curidxconfdir = dfn
Current location of the configuration directory. Complement orgidxconfdir for movable datasets.
This should be used if the configuration directory has been copied from the dataset to another
location, either because the dataset is readonly and an r/w copy is desired, or for performance
reasons. This records the original moved location before copy, to allow path translation
computations. For example if a dataset originally indexed as '/home/me/mydata/config' has been
mounted to '/media/me/mydata', and the GUI is running from a copied configuration, orgidxconfdir
would be '/home/me/mydata/config', and curidxconfdir (as set in the copied configuration) would be

idxrundir = dfn
Indexing process current directory. The input handlers sometimes leave temporary files in the
current directory, so it makes sense to have recollindex chdir to some temporary directory. If the
value is empty, the current directory is not changed. If the value is (literal) tmp, we use the
temporary directory as set by the environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value
is an absolute path to a directory, we go there.
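
For example, the literal value described above would be set as follows:

```ini
# Run the indexer from the temporary directory (RECOLL_TMPDIR, else TMPDIR, else /tmp).
idxrundir = tmp
```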

checkneedretryindexscript = fn
Script used to heuristically check if we need to retry indexing files which previously failed.
The default script checks the modified dates on /usr/bin and /usr/local/bin. A relative path will
be looked up in the filters dirs, then in the path. Use an absolute path to do otherwise.

recollhelperpath = string
Additional places to search for helper executables. This is used, e.g., on Windows by the Python
code, and on Mac OS by the bundled recoll.app (because I could find no reliable way to tell
launchd to set the PATH). The example below is for Windows. Use ':' as entry separator for Mac and
Ux-like systems, ';' is for Windows only.

idxabsmlen = int
Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file. The
text can come from an actual 'abstract' section in the document or will just be the beginning of
the document. It is stored in the index so that it can be displayed inside the result lists
without decoding the original file. The idxabsmlen parameter defines the size of the stored
abstract. The default value is 250 bytes. The search interface gives you the choice to display
this stored text or a synthetic abstract built by extracting text around the search terms. If you
always prefer the synthetic abstract, you can reduce this value and save a little space.

idxmetastoredlen = int
Truncation length of stored metadata fields. This does not affect indexing (the whole field is
processed anyway), just the amount of data stored in the index for the purpose of displaying
fields inside result lists or previews. The default value is 150 bytes, which may be too low if
you have custom fields.

idxtexttruncatelen = int
Truncation length for all document texts. Only index the beginning of documents. This is not
recommended unless you are sure that the interesting keywords are at the top and you have severe
disk space issues.

idxsynonyms = fn
Name of the index-time synonyms file. This is only used to issue multi-word single terms for
multi-word synonyms so that phrase and proximity searches work for them (ex: applejack "apple
jack"). The feature will only have an effect for querying if the query-time and index-time synonym
files are the same.

idxniceprio = int
"nice" process priority for the indexing processes. Default: 19 (lowest). Appeared with 1.26.5.
Prior versions were fixed at 19.

noaspell = bool
Disable aspell use. The aspell dictionary generation takes time, and some combinations of aspell
version, language, and local terms result in aspell crashing, so it sometimes makes sense to just
disable the thing.

aspellLanguage = string
Language definitions to use when creating the aspell dictionary. The value must match a set of
aspell language definition files. You can type "aspell dicts" to see a list. The default if this is
not set is to use the NLS environment to guess the value. The values are the 2-letter language
codes (e.g. 'en', 'fr'...).

aspellAddCreateParam = string
Additional option and parameter to aspell dictionary creation command. Some aspell packages may
need an additional option (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian
bug 772415.

aspellKeepStderr = bool
Set this to have a look at aspell dictionary creation errors. There are always many, so this is
mostly for debugging.

monauxinterval = int
Auxiliary database update interval. The real time indexer only updates the auxiliary databases
(stemdb, aspell) periodically, because it would be too costly to do it for every document change.
The default period is one hour.

monixinterval = int
Minimum interval (seconds) between processings of the indexing queue. The real time indexer does
not process each event when it comes in, but lets the queue accumulate, to diminish overhead and
to aggregate multiple events affecting the same file. Default: 30 seconds.

mondelaypatterns = string
Timing parameters for the real time indexing. Definitions for files which get a longer delay
before reindexing is allowed. This is for fast-changing files, that should only be reindexed once
in a while. A list of wildcardPattern:seconds pairs. The patterns are matched with
fnmatch(pattern, path, 0). You can quote entries containing white space with double quotes (quote
the whole entry, not the pattern). The default is empty. Example: mondelaypatterns = *.log:20
"*with spaces.*:30"

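As an illustration of the mondelaypatterns value format, here is a minimal Python sketch that splits
the list and looks up the delay for a path. It is not the Recoll implementation: the function names
are hypothetical, Python's fnmatch.fnmatchcase only approximates the C fnmatch(pattern, path, 0)
call, and returning the first matching entry is an assumption.

```python
import fnmatch
import shlex


def parse_mondelaypatterns(value):
    """Split a mondelaypatterns value into (pattern, seconds) pairs."""
    pairs = []
    # shlex.split honours double quotes around whole entries.
    for entry in shlex.split(value):
        # The delay follows the last ':' of the entry.
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs


def delay_for(path, pairs):
    """Return the delay of the first pattern matching path, or None."""
    for pattern, seconds in pairs:
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return None
```

With the example value above, parse_mondelaypatterns('*.log:20 "*with spaces.*:30"') yields the two
pairs ('*.log', 20) and ('*with spaces.*', 30).
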
monioniceclass = int
ionice class for the indexing process. Despite the misleading name, and on platforms where this is
supported, this affects all indexing processes, not only the real time/monitoring ones. The
default value is 3 (use lowest "Idle" priority).

monioniceclassdata = string
ionice class level parameter if the class supports it. The default is empty, as the default "Idle"
class has no levels.

autodiacsens = bool
Auto-trigger diacritics sensitivity (raw index only). If the index is not stripped, decide if we
automatically trigger diacritics sensitivity if the search term has accented characters (not in
unac_except_trans). Else you need to use the query language and the "D" modifier to specify
diacritics sensitivity. Default is no.

autocasesens = bool
Auto-trigger case sensitivity (raw index only). If the index is not stripped (see
indexStripChars), decide if we automatically trigger character case sensitivity if the search term
has upper-case characters in any but the first position. Else you need to use the query language
and the "C" modifier to specify character-case sensitivity. Default is yes.
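
The "upper-case characters in any but the first position" rule can be pictured with the following
Python sketch. The function name is hypothetical and the test only illustrates the rule as stated
here; the actual query code may treat some characters differently.

```python
def triggers_case_sensitivity(term):
    """True if the term has an upper-case character after its first position."""
    return any(ch.isupper() for ch in term[1:])
```

Under this rule a capitalized word such as Paris stays case-insensitive, while iPhone or NASA would
trigger a case-sensitive search.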

maxTermExpand = int
Maximum query expansion count for a single term (e.g.: when using wildcards). This only affects
queries, not indexing. We used to not limit this at all (except for filenames where the limit was
too low at 1000), but it is unreasonable with a big index. Default 10000.

maxXapianClauses = int
Maximum number of clauses we add to a single Xapian query. This only affects queries, not
indexing. In some cases, the result of term expansion can be multiplicative, and we want to avoid
eating all the memory. Default 50000.

snippetMaxPosWalk = int
Maximum number of positions we walk while populating a snippet for the result list. The default of
1,000,000 may be insufficient for very big documents. The consequence would be snippets with
possibly meaning-altering missing words.

thumbnailercmd = string
Command to use for generating thumbnails. If set, this should be a path to a command or script
followed by its constant arguments. Four arguments will be appended before execution: the document
URL, MIME type, target icon SIZE (e.g. 128), and output file PATH. The command should generate a
thumbnail from these values. E.g. if the MIME is video, a script could use: ffmpegthumbnailer
-iURL -oPATH -sSIZE.
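
A thumbnailer script could map the four appended arguments to the ffmpegthumbnailer call mentioned
above roughly as follows. This is a sketch only: the function name is hypothetical, and returning
None for non-video MIME types is an assumption about how a script might decline to act.

```python
def build_thumbnail_command(args):
    """Build the thumbnailer command from the four appended values.

    args ends with: document URL, MIME type, target icon SIZE, output file PATH.
    """
    url, mime, size, path = args[-4:]
    if mime.startswith("video/"):
        # Same option layout as in the text: -iURL -oPATH -sSIZE.
        return ["ffmpegthumbnailer", "-i" + url, "-o" + path, "-s" + str(size)]
    # Assumption: this sketch handles video documents only.
    return None
```

A wrapper script would typically call this with sys.argv[1:] and hand a non-None result to
subprocess.run().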

stemexpandphrases = bool
Default to applying stem expansion to phrase terms. Recoll normally does not apply stem expansion
to terms inside phrase searches. Setting this parameter will change the default behaviour to
expanding terms inside phrases. If set, you can use an 'l' modifier to disable expansion for a
specific instance.

autoSpellRarityThreshold = int
Inverse of the ratio of term occurrence to total db terms over which we look for spell neighbours
for automatic query expansion. When a term is very uncommon, we may (depending on user choice) look
for spelling variations which would be more common and possibly add them to the query.

autoSpellSelectionThreshold = int
Ratio of spell neighbour frequency over user input term frequency beyond which we include the
neighbour in the query. When a term has been selected for spelling expansion because of its
rarity, we only include spelling neighbours which are more common by this ratio.
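
One possible reading of how autoSpellRarityThreshold and autoSpellSelectionThreshold combine is
sketched below in Python. The function names and the sample counts are hypothetical, and the
comparisons are an interpretation of the two descriptions above, not code taken from Recoll.

```python
def is_rare(term_freq, total_terms, rarity_threshold):
    """True if total_terms / term_freq exceeds the rarity threshold."""
    # Written as a product to avoid dividing by a zero frequency.
    return total_terms > rarity_threshold * term_freq


def select_neighbours(term_freq, neighbour_freqs, selection_threshold):
    """Keep neighbours more common than the term by the selection ratio."""
    return [
        word
        for word, freq in neighbour_freqs.items()
        if freq > selection_threshold * term_freq
    ]


# Hypothetical counts: the user term occurs 5 times in an index of 10,000,000 terms.
if is_rare(5, 10_000_000, 200_000):
    extra_terms = select_neighbours(5, {"recoll": 400, "recol": 8}, 20)
```

With these made-up numbers the term counts as rare, and only the neighbour occurring 400 times is
common enough to be added to the query.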

kioshowsubdocs = bool
Show embedded document results in KDE dolphin/kio and krunner. Embedded documents may clutter the
results and are not always easily usable from the kio or krunner environment. Setting this
variable will restrict the results to standalone documents.

pdfocr = bool
Attempt OCR of PDF files with no text content. This can be defined in subdirectories. The default
is off because OCR is so very slow.

pdfoutline = bool
Extract outlines and bookmarks from PDF documents (needs pdftohtml). This is not enabled by
default because it is rarely needed, and the extra command takes a little time.

pdfattach = bool
Enable PDF attachment extraction by executing pdftk (if available). This is normally disabled,
because it does slow down PDF indexing a bit even if not one attachment is ever found.

pdfextrameta = string
Extract text from selected XMP metadata tags. This is a space-separated list of qualified XMP tag
names. Each element can also include a translation to a Recoll field name, separated by a '|'
character. If the second element is absent, the tag name is used as the Recoll field name. You
will also need to add specifications to the "fields" file to direct processing of the extracted
data.

pdfextrametafix = fn
Define name of XMP field editing script. This defines the name of a script to be loaded for
editing XMP field values. The script should define a 'MetaFixer' class with a metafix() method
which will be called with the qualified tag name and value of each selected field, for editing or
erasing. A new instance is created for each document, so that the object can keep state for, e.g.
eliminating duplicate values.
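
A minimal sketch of such a script is shown below. The class and method names come from the
description above; the parameter names, the 'bibtex:pages' tag and the convention that returning an
empty string erases a value are assumptions made for illustration.

```python
class MetaFixer(object):
    """Edit XMP field values; one instance is created per document."""

    def __init__(self):
        # Per-document state, used here to drop duplicate values.
        self.seen = set()

    def metafix(self, tagname, value):
        """Return the edited value for the qualified tag name."""
        value = value.strip()
        if tagname == "bibtex:pages":
            # Example edit: normalize page ranges such as 10--20.
            value = value.replace("--", "-")
        key = (tagname, value)
        if key in self.seen:
            # Assumption: an empty string erases the duplicate.
            return ""
        self.seen.add(key)
        return value
```

Because a new instance is created for each document, the set of seen values starts empty for every
file.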

ocrprogs = string
OCR modules to try. The top OCR script will try to load the corresponding modules in order and use
the first which reports being capable of performing OCR on the input file. Modules for tesseract
(tesseract) and ABBYY FineReader (abbyy) are present in the standard distribution. For
compatibility with the previous version, if this is not defined at all, the default value is
"tesseract". Use an explicit empty value if needed. A value of "abbyy tesseract" will try
everything.

ocrcachedir = dfn
Location for caching OCR data. The default if this is empty or undefined is to store the cached
OCR data under $RECOLL_CONFDIR/ocrcache.

tesseractlang = string
Language to assume for tesseract OCR. Important for improving the OCR accuracy. This can also be
set through the contents of a file in the currently processed directory. See the
rclocrtesseract.py script. Example values: eng, fra... See the tesseract documentation.

tesseractcmd = fn
Path for the tesseract command. Do not quote. This is mostly useful on Windows, or for specifying
a non-default tesseract command. E.g. on Windows:
tesseractcmd = C:/ProgramFiles(x86)/Tesseract-OCR/tesseract.exe

abbyylang = string
Language to assume for abbyy OCR. Important for improving the OCR accuracy. This can also be set
through the contents of a file in the currently processed directory. See the rclocrabbyy.py
script. Typical values: English, French... See the ABBYY documentation.

abbyyocrcmd = fn
Path for the abbyy command. The ABBYY directory is usually not in the path, so you should set this.

speechtotext = string
Activate speech to text conversion. The only possible value at the moment is "whisper" for using
the OpenAI whisper program.

sttmodel = string
Name of the whisper model.

sttdevice = string
Name of the device to be used by whisper.

orgmodesubdocs = bool
Index org-mode level 1 sections as separate sub-documents. This is the default. If set to false,
org-mode files will be indexed as plain text.

mhmboxquirks = string
Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this for the directory where the email
mbox files are stored.
