Ubuntu Manpage: recoll.conf - main personal configuration file for Recoll

Provided by: recollcmd_1.26.3-1build1_amd64

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

This file defines the index configuration for the Recoll full-text search system.

The system-wide configuration file is normally located inside
/usr/[local]/share/recoll/examples. Any parameter set in the common file may be overridden
by setting it in the personal configuration file, by default: $HOME/.recoll/recoll.conf

Please note that while I try to keep this manual page reasonably up to date, it will
frequently lag behind the current state of the software. The best sources of information
about the configuration are the comments in the system-wide configuration file and the user
manual, which you can access from the recoll GUI help menu or on the recoll web site.

A short extract of the file might look as follows:

# Space-separated list of directories to index.
topdirs = ~/docs /usr/share/doc

[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8

There are three kinds of lines:

• Comment or empty

• Parameter assignment

• Section definition

Empty lines or lines beginning with # are ignored.

Assignment lines are in the form 'name = value'.

Section lines allow redefining a parameter for a directory subtree. Some of the parameters
used for indexing are looked up hierarchically from the more to the less specific. Not all
parameters can be meaningfully redefined; this is specified for each in the next section.

The tilde character (~) is expanded in file names to the name of the user's home
directory.

Where values are lists, white space is used for separation, and elements with embedded
spaces can be quoted with double-quotes.
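
As a sketch of these rules, the following fragment combines a list containing a quoted
element with a section definition. The directory names are hypothetical and chosen only
for illustration:

```
# The second element contains a space, so it is enclosed in double quotes.
topdirs = ~/docs "~/My Documents" /usr/share/doc

# Section: the value below applies to this subtree only (hypothetical path).
[~/docs/legacy]
defaultcharset = iso-8859-1
```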

OPTIONS

topdirs = string
Space-separated list of files or directories to recursively index. Defaults to ~
(indexes $HOME). You can use symbolic links in the list; they will be followed
independently of the value of the followLinks variable.

monitordirs = string
Space-separated list of files or directories to monitor for updates. When running
the real-time indexer, this allows monitoring only a subset of the whole indexed
area. The elements must be included in the tree defined by the 'topdirs' members.

skippedNames = string
Files and directories which should be ignored. White space separated list of
wildcard patterns (simple ones, not paths, must contain no / ), which will be
tested against file and directory names. The list in the default configuration
does not exclude hidden directories (names beginning with a dot), which means that
it may index quite a few things that you do not want. On the other hand, email user
agents like Thunderbird usually store messages in hidden directories, and you
probably want this indexed. One possible solution is to have ".*" in
"skippedNames", and add things like "~/.thunderbird" "~/.evolution" to "topdirs".
Not even the file names are indexed for patterns in this list, see the
"noContentSuffixes" variable for an alternative approach which indexes the file
names. Can be redefined for any subtree.

skippedNames- = string
List of name endings to remove from the default skippedNames list.

skippedNames+ = string
List of name endings to add to the default skippedNames list.
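
The approach suggested under skippedNames for hidden directories might be written as
follows. This is only a sketch, assuming that the home directory is the main indexed
tree; the mail client directories are the ones named in the description above:

```
# Add a pattern for hidden files and directories to the default list...
skippedNames+ = .*
# ...and list the hidden directories that should still be indexed explicitly.
topdirs = ~ ~/.thunderbird ~/.evolution
```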

noContentSuffixes = string
List of name endings (not necessarily dot-separated suffixes) for which we don't
try MIME type identification, and don't uncompress or index content. Only the names
will be indexed. This complements the now obsoleted recoll_noindex list from the
mimemap file, which will go away in a future release (the move from mimemap to
recoll.conf allows editing the list through the GUI). This is different from
skippedNames because these are name ending matches only (not wildcard patterns),
and the file name itself gets indexed normally. This can be redefined for
subdirectories.

noContentSuffixes- = string
List of name endings to remove from the default noContentSuffixes list.

noContentSuffixes+ = string
List of name endings to add to the default noContentSuffixes list.

skippedPaths = string
Absolute paths we should not go into. Space-separated list of wildcard expressions
for absolute filesystem paths. Must be defined at the top level of the
configuration file, not in a subsection. Can contain files and directories. The
database and configuration directories will automatically be added. The expressions
are matched using 'fnmatch(3)' with the FNM_PATHNAME flag set by default. This
means that '/' characters must be matched explicitly. You can set
'skippedPathsFnmPathname' to 0 to disable the use of FNM_PATHNAME (meaning that
'/*/dir3' will match '/dir1/dir2/dir3'). The default value contains the usual mount
point for removable media to remind you that it is a bad idea to have Recoll work
on these (esp. with the monitor: media gets indexed on mount, all data gets erased
on unmount). Explicitly adding '/media/xxx' to the 'topdirs' variable will override
this.

skippedPathsFnmPathname = bool
Set to 0 to override use of FNM_PATHNAME for matching skipped paths.
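
To illustrate the effect of FNM_PATHNAME, here is a minimal sketch using hypothetical
paths. With the flag in effect, '*' does not match a '/' character, so the second
pattern covers exactly one directory level:

```
# Hypothetical values. The second pattern matches /home/alice/.cache but not
# /home/alice/work/.cache, because '*' cannot match across a '/'.
skippedPaths = /media /home/*/.cache
# 1 keeps the default FNM_PATHNAME behaviour; 0 would let '*' match '/' as well.
skippedPathsFnmPathname = 1
```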

nowalkfn = string
File name which will cause its parent directory to be skipped. Any directory
containing a file with this name will be skipped as if it was part of the
skippedPaths list. Ex: .recoll-noindex

daemSkippedPaths = string
skippedPaths equivalent specific to real time indexing. This enables having parts
of the tree which are initially indexed but not monitored. If daemSkippedPaths is
not set, the daemon uses skippedPaths.

zipUseSkippedNames = bool
Use skippedNames inside Zip archives. Fetched directly by the rclzip handler. Skip
the patterns defined by skippedNames inside Zip archives. Can be redefined for
subdirectories. See
https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

zipSkippedNames = string
Space-separated list of wildcard expressions for names that should be ignored
inside zip archives. This is used directly by the zip handler. If
zipUseSkippedNames is not set, zipSkippedNames defines the patterns to be skipped
inside archives. If zipUseSkippedNames is set, the two lists are concatenated and
used. Can be redefined for subdirectories. See
https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html
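
A possible combination of the two Zip parameters is sketched below. The patterns are
hypothetical examples, not defaults:

```
# Apply the regular skippedNames patterns to archive members as well...
zipUseSkippedNames = 1
# ...and additionally skip these member names inside Zip archives.
zipSkippedNames = *.class *.o
```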

followLinks = bool
Follow symbolic links during indexing. The default is to ignore symbolic links to
avoid multiple indexing of linked files. No effort is made to avoid duplication
when this option is set to true. This option can be set individually for each of
the 'topdirs' members by using sections. It cannot be changed below the 'topdirs'
level. Links in the 'topdirs' list itself are always followed.
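
Setting followLinks for a single 'topdirs' member could look like the following sketch,
in which both directory names are hypothetical:

```
topdirs = ~/docs ~/projects
# Default for all topdirs members: do not follow symbolic links.
followLinks = 0

# Follow links under this topdirs member only.
[~/projects]
followLinks = 1
```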

indexedmimetypes = string
Restrictive list of indexed mime types. Normally not set (in which case all
supported types are indexed). If it is set, only the types from the list will have
their contents indexed. The names will be indexed anyway if indexallfilenames is
set (default). MIME type names should be taken from the mimemap file (the values
may be different from xdg-mime or file -i output in some cases). Can be redefined
for subtrees.

excludedmimetypes = string
List of excluded MIME types. Lets you exclude some types from indexing. MIME type
names should be taken from the mimemap file (the values may be different from
xdg-mime or file -i output in some cases). Can be redefined for subtrees.
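
The two MIME type lists might be used as in the sketch below. The type names are
assumed to match entries in the mimemap file and should be checked against it; the
subtree name is hypothetical:

```
# Do not index the content of MP3 files anywhere.
excludedmimetypes = audio/mpeg

# In this subtree, index content only for plain text and PDF files.
[~/archive]
indexedmimetypes = text/plain application/pdf
```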

nomd5types = string
Don't compute md5 for these types. md5 checksums are used only for deduplicating
results, and can be very expensive to compute on multimedia or other big files.
This list lets you turn off md5 computation for selected types. It is global (no
redefinition for subtrees). At the moment, it only has an effect for external
handlers (exec and execm). The file types can be specified by listing either MIME
types (e.g. audio/mpeg) or handler names (e.g. rclaudio).

compressedfilemaxkbs = int
Size limit for compressed files. We need to decompress these in a temporary
directory for identification, which can be wasteful in some cases. Limit the waste.
Negative means no limit. 0 results in no processing of any compressed file. Default
50 MB.

textfilemaxmbs = int
Size limit for text files. Mostly for skipping monster logs. Default 20 MB.

indexallfilenames = bool
Index the file names of unprocessed files. Index the names of files the contents of
which we don't index because of an excluded or unsupported MIME type.

usesystemfilecommand = bool
Use a system command for file MIME type guessing as a final step in file type
identification. This is generally useful, but will usually cause the indexing of
many bogus 'text' files. See 'systemfilecommand' for the command used.

systemfilecommand = string
Command used to guess MIME types if the internal methods fail. This should be a
"file -i" workalike. The file path will be added as a last parameter to the
command line. "xdg-mime" works better than the traditional "file" command, and is
now the configured default (with a hard-coded fallback to "file").

processwebqueue = bool
Decide if we process the Web queue. The queue is a directory where the Recoll Web
browser plugins create the copies of visited pages.

textfilepagekbs = int
Page size for text files. If this is set, text/plain files will be divided into
documents of approximately this size. Will reduce memory usage at index time and
help with loading data in the preview window at query time. Particularly useful
with very big files, such as application or system logs. Also see textfilemaxmbs
and compressedfilemaxkbs.
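
The three size-related parameters could be combined as in this sketch. The values are
illustrative, not recommendations:

```
# Do not process compressed files larger than about 20 MB (value in kilobytes).
compressedfilemaxkbs = 20000
# Skip text files larger than 50 MB...
textfilemaxmbs = 50
# ...and split the remaining ones into documents of roughly 1000 KB each.
textfilepagekbs = 1000
```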

membermaxkbs = int
Size limit for archive members. This is passed to the filters in the environment as
RECOLL_FILTER_MAXMEMBERKB.

indexStripChars = bool
Decide if we store character case and diacritics in the index. If we do, searches
sensitive to case and diacritics can be performed, but the index will be bigger,
and some marginal weirdness may sometimes occur. The default is a stripped index.
When using multiple indexes for a search, this parameter must be defined
identically for all. Changing the value implies an index reset.

indexStoreDocText = bool
Decide if we store the documents' text content in the index. Storing the text
allows extracting snippets from it at query time, instead of building them from
index position data. Newer Xapian index formats have rendered our use of positions
list unacceptably slow in some cases. The last Xapian index format with good
performance for the old method is Chert, which is default for 1.2, still supported
but not default in 1.4 and will be dropped in 1.6. The stored document text is
translated from its original format to UTF-8 plain text, but not stripped of upper-
case, diacritics, or punctuation signs. Storing it increases the index size by
10-20% typically, but also allows for nicer snippets, so it may be worth enabling
it even if not strictly needed for performance if you can afford the space. The
variable only has an effect when creating an index, meaning that the xapiandb
directory must not exist yet. Its exact effect depends on the Xapian version. For
Xapian 1.4, if the variable is set to 0, the Chert format will be used, and the
text will not be stored. If the variable is 1, Glass will be used, and the text
stored. For Xapian 1.2, and for versions 1.5 and newer, the index format is
always the default, but the variable controls if the text is stored or not, and the
abstract generation method. With Xapian 1.5 and later, and the variable set to 0,
abstract generation may be very slow, but this setting may still be useful to save
space if you do not use abstract generation at all.

nonumbers = bool
Decides if terms will be generated for numbers. For example "123", "1.5e6",
192.168.1.4, would not be indexed if nonumbers is set ("value123" would still be).
Numbers are often quite interesting to search for, and this should probably not be
set except for special situations, i.e., scientific documents with huge amounts of
numbers in them, where setting nonumbers will reduce the index size. This can only
be set for a whole index, not for a subtree.

dehyphenate = bool
Determines if we index 'coworker' also when the input is 'co-worker'. This is new in
version 1.22, and on by default. Setting the variable to off allows restoring the
previous behaviour.

backslashasletter = bool
Process backslash as a normal letter. This may make sense for people wanting to index
TeX commands as such but is not of much general use.

maxtermlength = int
Maximum term length. Words longer than this will be discarded. The default is 40
and used to be hard-coded, but it can now be adjusted. You need an index reset if
you change the value.

nocjk = bool
Decides if specific East Asian (Chinese Korean Japanese) characters/word splitting
is turned off. This will save a small amount of CPU if you have no CJK documents.
If your document base does include such text but you are not interested in
searching it, setting nocjk may be a significant time and space saver.

cjkngramlen = int
This lets you adjust the size of n-grams used for indexing CJK text. The default
value of 2 is probably appropriate in most cases. A value of 3 would allow more
precision and efficiency on longer words, but the index will be approximately twice
as large.

indexstemminglanguages = string
Languages for which to create stemming expansion data. Stemmer names can be found
by executing 'recollindex -l', or this can also be set from a list in the GUI.

defaultcharset = string
Default character set. This is used for files which do not contain a character set
definition (e.g.: text/plain). Values found inside files, e.g. a 'charset' tag in
HTML documents, will override it. If this is not set, the default character set is
the one defined by the NLS environment ($LC_ALL, $LC_CTYPE, $LANG), or ultimately
iso-8859-1 (cp-1252 in fact). If for some reason you want a general default which
does not match your LANG and is not 8859-1, use this variable. This can be
redefined for any sub-directory.

unac_except_trans = string
A list of characters, encoded in UTF-8, which should be handled specially when
converting text to unaccented lowercase. For example, in Swedish, the letter a with
diaeresis has full alphabet citizenship and should not be turned into an a. Each
element in the space-separated list has the special character as first element and
the translation following. The handling of both the lowercase and upper-case
versions of a character should be specified, as membership in the list will turn
off both standard accent and case processing. The value is global and affects both
indexing and querying. Examples:
Swedish:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå
German:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
In French, you probably want to decompose oe and ae, and nobody would type a German ß:
unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
The default follows. These decompositions are not performed by unac, but it is unlikely
that someone would type the composed forms in a search:
unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

maildefcharset = string
Overrides the default character set for email messages which don't specify one.
This is mainly useful for readpst (libpst) dumps, which are utf-8 but do not say
so.

localfields = string
Set fields on all files (usually of a specific fs area). Syntax is the usual: name
= value ; attr1 = val1 ; [...]. The value is empty, so this needs an initial semi-colon.
This is useful, e.g., for setting the rclaptg field for application selection
inside mimeview.
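
Following the syntax described above, a localfields entry for one directory might look
like the sketch below. The directory and the 'gnus' tag value are hypothetical:

```
# Tag every document under this subtree for application selection in mimeview.
# Note the initial semi-colon, needed because the value part is empty.
[~/mail/gnus]
localfields = ; rclaptg = gnus
```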

testmodifusemtime = bool
Use mtime instead of ctime to test if a file has been modified. The time is used in
addition to the size, which is always used. Setting this can reduce re-indexing on
systems where extended attributes are used (by some other application), but not
indexed, because changing extended attributes only affects ctime. Notes:
• This may prevent detection of change in some marginal file rename cases (the target
would need to have the same size and mtime).
• You should probably also set noxattrfields to 1 in this case, except if you still
prefer to perform xattr indexing, for example if the local file update pattern makes it
of value (as in general, there is a risk for pure extended attributes updates without
file modification to go undetected).
Perform a full index reset after changing this.

noxattrfields = bool
Disable extended attributes conversion to metadata fields. This probably needs to
be set if testmodifusemtime is set.

metadatacmds = string
Define commands to gather external metadata, e.g. tmsu tags. There can be several
entries, separated by semi-colons, each defining which field name the data goes
into and the command to use. Don't forget the initial semi-colon. All the field
names must be different. You can use aliases in the "field" file if necessary. As
a not too pretty hack conceded to convenience, any field name beginning with
"rclmulti" will be taken as an indication that the command returns multiple field
values inside a text blob formatted as a recoll configuration file ("fieldname =
fieldvalue" lines). The rclmultixx name will be ignored, and field names and values
will be parsed from the data. Example: metadatacmds = ; tags = tmsu tags %f;
rclmulti1 = cmdOutputsConf %f

cachedir = dfn
Top directory for Recoll data. Recoll data directories are normally located
relative to the configuration directory (e.g. ~/.recoll/xapiandb,
~/.recoll/mboxcache). If 'cachedir' is set, the directories are stored under the
specified value instead (e.g. if cachedir is ~/.cache/recoll, the default dbdir
would be ~/.cache/recoll/xapiandb). This affects dbdir, webcachedir, mboxcachedir,
aspellDicDir, which can still be individually specified to override cachedir. Note
that if you have multiple configurations, each must have a different cachedir,
there is no automatic computation of a subpath under cachedir.
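
As an illustration, the fragment below moves the Recoll data under the cachedir value
used as an example above and overrides one of the dependent directories. The
mboxcachedir path is hypothetical:

```
# Keep index data out of the configuration directory.
cachedir = ~/.cache/recoll
# dbdir is not set here, so it would default to ~/.cache/recoll/xapiandb.
# Individually specified directories still override cachedir:
mboxcachedir = ~/.cache/recoll-shared/mboxcache
```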

maxfsoccuppc = int
Maximum file system occupation over which we stop indexing. The value is a
percentage, corresponding to what the "Capacity" df output column shows. The
default value is 0, meaning no checking.

dbdir = dfn
Xapian database directory location. This will be created on first indexing. If the
value is not an absolute path, it will be interpreted as relative to cachedir if
set, or the configuration directory (-c argument or $RECOLL_CONFDIR). If nothing
is specified, the default is then ~/.recoll/xapiandb/

idxstatusfile = fn
Name of the scratch file where the indexer process updates its status. Default:
idxstatus.txt inside the configuration directory.

mboxcachedir = dfn
Directory location for storing mbox message offsets cache files. This is normally
'mboxcache' under cachedir if set, or else under the configuration directory, but
it may be useful to share a directory between different configurations.

mboxcacheminmbs = int
Minimum mbox file size over which we cache the offsets. There is really no sense in
caching offsets for small files. The default is 5 MB.

webcachedir = dfn
Directory where we store the archived web pages. This is only used by the web
history indexing code. Default: cachedir/webcache if cachedir is set, else
$RECOLL_CONFDIR/webcache.

webcachemaxmbs = int
Maximum size in MB of the Web archive. This is only used by the web history
indexing code. Default: 40 MB. Reducing the size will not physically truncate the
file.

webqueuedir = fn
The path to the Web indexing queue. This used to be hard-coded in the old plugin as
~/.recollweb/ToIndex so there would be no need or possibility to change it, but the
WebExtensions plugin now downloads the files to the user Downloads directory, and a
script moves them to webqueuedir. The script reads this value from the config so it
has become possible to change it.

webdownloadsdir = fn
The path to browser downloads directory. This is where the new browser add-on
extension has to create the files. They are then moved by a script to webqueuedir.

aspellDicDir = dfn
Aspell dictionary storage directory location. The aspell dictionary
(aspdict.(lang).rws) is normally stored in the directory specified by cachedir if
set, or under the configuration directory.

filtersdir = dfn
Directory location for executable input handlers. If RECOLL_FILTERSDIR is set in
the environment, we use it instead. Defaults to $prefix/share/recoll/filters. Can
be redefined for subdirectories.

iconsdir = dfn
Directory location for icons. The only reason to change this would be if you want
to change the icons displayed in the result list. Defaults to
$prefix/share/recoll/images

idxflushmb = int
Threshold (megabytes of new data) where we flush from memory to disk index. Setting
this allows some control over memory usage by the indexer process. A value of 0
means no explicit flushing, which lets Xapian perform its own thing, meaning
flushing every $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as
memory usage depends on average document size, not only document count, the Xapian
approach is not very useful, and you should let Recoll manage the flushes. The
program compiled value is 0. The configured default value (from this file) is now
50 MB, and should be ok in many cases. You can set it as low as 10 to conserve
memory, but if you are looking for maximum speed, you may want to experiment with
values between 20 and 200. In my experience, values beyond this are always
counterproductive. If you find otherwise, please drop me a note.

filtermaxseconds = int
Maximum external filter execution time in seconds. Default 1200 (20 min). Set to 0
for no limit. This is mainly to avoid infinite loops in postscript files (loop.ps).

filtermaxmbytes = int
Maximum virtual memory space for filter processes (setrlimit(RLIMIT_AS)), in
megabytes. Note that this includes any mapped libs (there is no reliable Linux way
to limit the data space only), so we need to be a bit generous here. Anything over
2000 will be ignored on 32-bit machines.

thrQSizes = string
Stage input queues configuration. There are three internal queues in the indexing
pipeline stages (file data extraction, terms generation, index update). This
parameter defines the queue depths for each stage (three integer values). If a
value of -1 is given for a given stage, no queue is used, and the thread will go on
performing the next stage. In practice, deep queues have not been shown to increase
performance. Default: a value of 0 for the first queue tells Recoll to perform
autoconfiguration based on the detected number of CPUs (no need for the two other
values in this case). Use thrQSizes = -1 -1 -1 to disable multithreading entirely.

thrTCounts = string
Number of threads used for each indexing stage. The three stages are: file data
extraction, terms generation, index update. The use of the counts is also
controlled by some special values in thrQSizes: if the first queue depth is 0, all
counts are ignored (autoconfigured); if a value of -1 is used for a queue depth,
the corresponding thread count is ignored. It makes no sense to use a value other
than 1 for the last stage because updating the Xapian index is necessarily single-
threaded (and protected by a mutex).
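
A manual threading configuration could look like the following sketch. The numbers are
illustrative only and would need tuning for a given machine:

```
# Queue depth of 2 for each of the three stages. A first value of 0 would
# request autoconfiguration instead.
thrQSizes = 2 2 2
# 4 threads for file data extraction, 2 for terms generation, 1 for the index update.
thrTCounts = 4 2 1
```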

loglevel = int
Log file verbosity 1-6. A value of 2 will print only errors and warnings. 3 will
print information like document updates, 4 is quite verbose and 6 very verbose.

logfilename = fn
Log file destination. Use 'stderr' (default) to write to the console.

idxloglevel = int
Override loglevel for the indexer.

idxlogfilename = fn
Override logfilename for the indexer.

daemloglevel = int
Override loglevel for the indexer in real time mode. The default is to use the
idx... values if set, else the log... values.

daemlogfilename = fn
Override logfilename for the indexer in real time mode. The default is to use the
idx... values if set, else the log... values.
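
The fallback order between the three groups of logging parameters can be sketched as
follows. The log file path is hypothetical:

```
# General setting: errors and warnings only, written to the console.
loglevel = 2
logfilename = stderr
# The indexer logs document updates to its own file.
idxloglevel = 3
idxlogfilename = /tmp/recollindex.log
# daemloglevel and daemlogfilename are not set, so the real time indexer
# uses the idx... values above.
```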

orgidxconfdir = dfn
Original location of the configuration directory. This is used exclusively for
movable datasets. Locating the configuration directory inside the directory tree
makes it possible to provide automatic query time path translations once the data
set has moved (for example, because it has been mounted on another location).

curidxconfdir = dfn
Current location of the configuration directory. Complement orgidxconfdir for
movable datasets. This should be used if the configuration directory has been
copied from the dataset to another location, either because the dataset is readonly
and an r/w copy is desired, or for performance reasons. This records the original
moved location before copy, to allow path translation computations. For example if
a dataset originally indexed as '/home/me/mydata/config' has been mounted to
'/media/me/mydata', and the GUI is running from a copied configuration,
orgidxconfdir would be '/home/me/mydata/config', and curidxconfdir (as set in the
copied configuration) would be

idxrundir = dfn
Indexing process current directory. The input handlers sometimes leave temporary
files in the current directory, so it makes sense to have recollindex chdir to some
temporary directory. If the value is empty, the current directory is not changed.
If the value is (literal) tmp, we use the temporary directory as set by the
environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an absolute path
to a directory, we go there.

checkneedretryindexscript = fn
Script used to heuristically check if we need to retry indexing files which
previously failed. The default script checks the modified dates on /usr/bin and
/usr/local/bin. A relative path will be looked up in the filters dirs, then in the
path. Use an absolute path to do otherwise.

recollhelperpath = string
Additional places to search for helper executables. This is only used on Windows
for now.

idxabsmlen = int
Length of abstracts we store while indexing. Recoll stores an abstract for each
indexed file. The text can come from an actual 'abstract' section in the document
or will just be the beginning of the document. It is stored in the index so that it
can be displayed inside the result lists without decoding the original file. The
idxabsmlen parameter defines the size of the stored abstract. The default value is
250 bytes. The search interface gives you the choice to display this stored text or
a synthetic abstract built by extracting text around the search terms. If you
always prefer the synthetic abstract, you can reduce this value and save a little
space.

idxmetastoredlen = int
Truncation length of stored metadata fields. This does not affect indexing (the
whole field is processed anyway), just the amount of data stored in the index for
the purpose of displaying fields inside result lists or previews. The default value
is 150 bytes which may be too low if you have custom fields.

idxtexttruncatelen = int
Truncation length for all document texts. Only index the beginning of documents.
This is not recommended except if you are sure that the interesting keywords are at
the top and have severe disk space issues.

aspellLanguage = string
Language definitions to use when creating the aspell dictionary. The value must
match a set of aspell language definition files. You can type "aspell dicts" to
see a list. The default if this is not set is to use the NLS environment to guess
the value.

aspellAddCreateParam = string
Additional option and parameter to aspell dictionary creation command. Some aspell
packages may need an additional option (e.g. on Debian Jessie:
--local-data-dir=/usr/lib/aspell). See Debian bug 772415.

aspellKeepStderr = bool
Set this to have a look at aspell dictionary creation errors. There are always
many, so this is mostly for debugging.

noaspell = bool
Disable aspell use. The aspell dictionary generation takes time, and some
combinations of aspell version, language, and local terms, result in aspell
crashing, so it sometimes makes sense to just disable the thing.

monauxinterval = int
Auxiliary database update interval. The real time indexer only updates the
auxiliary databases (stemdb, aspell) periodically, because it would be too costly
to do it for every document change. The default period is one hour.

monixinterval = int
Minimum interval (seconds) between processings of the indexing queue. The real time
indexer does not process each event when it comes in, but lets the queue
accumulate, to diminish overhead and to aggregate multiple events affecting the
same file. Default: 30 s.

mondelaypatterns = string
Timing parameters for the real time indexing. Definitions for files which get a
longer delay before reindexing is allowed. This is for fast-changing files, that
should only be reindexed once in a while. A list of wildcardPattern:seconds pairs.
The patterns are matched with fnmatch(pattern, path, 0). You can quote entries
containing white space with double quotes (quote the whole entry, not the pattern).
The default is empty. Example: mondelaypatterns = *.log:20 "*with spaces.*:30"

monioniceclass = int
ionice class for the real time indexing process, on platforms where this is
supported. The default value is 3.

monioniceclassdata = string
ionice class parameter for the real time indexing process, on platforms where this
is supported. The default is empty.

autodiacsens = bool
Auto-trigger diacritics sensitivity (raw index only). If the index is not stripped,
decide if we automatically trigger diacritics sensitivity if the search term has
accented characters (not in unac_except_trans). Else you need to use the query
language and the "D" modifier to specify diacritics sensitivity. Default is no.

autocasesens = bool
Auto-trigger case sensitivity (raw index only). If the index is not stripped (see
indexStripChars), decide if we automatically trigger character case sensitivity if
the search term has upper-case characters in any but the first position. Else you
need to use the query language and the "C" modifier to specify character-case
sensitivity. Default is yes.
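
Both automatic sensitivity settings depend on a raw (unstripped) index. A configuration
enabling them might look like this sketch:

```
# Keep case and diacritics in the index (changing this implies an index reset).
indexStripChars = 0
# A search term with accented characters triggers diacritics sensitivity.
autodiacsens = 1
# A term with upper-case characters beyond the first position triggers case sensitivity.
autocasesens = 1
```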

maxTermExpand = int
Maximum query expansion count for a single term (e.g.: when using wildcards). This
only affects queries, not indexing. We used to not limit this at all (except for
filenames where the limit was too low at 1000), but it is unreasonable with a big
index. Default 10000.

maxXapianClauses = int
Maximum number of clauses we add to a single Xapian query. This only affects
queries, not indexing. In some cases, the result of term expansion can be
multiplicative, and we want to avoid eating all the memory. Default 50000.

snippetMaxPosWalk = int
Maximum number of positions we walk while populating a snippet for the result list.
The default of 1,000,000 may be insufficient for very big documents; the
consequence would be snippets with possibly meaning-altering missing words.

pdfocr = bool
Attempt OCR of PDF files with no text content if both tesseract and pdftoppm are
installed. The default is off because OCR is so very slow.

pdfocrlang = string
Language to assume for PDF OCR. This is very important for having a reasonable rate
of errors with tesseract. This can also be set through a configuration variable or
directory-local parameters. See the rclpdf.py script.

pdfattach = bool
Enable PDF attachment extraction by executing pdftk (if available). This is
normally disabled, because it does slow down PDF indexing a bit even if not one
attachment is ever found.
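
The PDF processing options above might be enabled together as in the sketch below. The
language value is an assumption: 'eng' is the tesseract code for English and should be
replaced by the code matching your documents:

```
# Try OCR on PDF files without text content (needs tesseract and pdftoppm).
pdfocr = 1
# Language assumed for OCR.
pdfocrlang = eng
# Also extract attachments with pdftk, if it is installed.
pdfattach = 1
```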

pdfextrameta = string
Extract text from selected XMP metadata tags. This is a space-separated list of
qualified XMP tag names. Each element can also include a translation to a Recoll
field name, separated by a '|' character. If the second element is absent, the tag
name is used as the Recoll field names. You will also need to add specifications to
the "fields" file to direct processing of the extracted data.
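
A pdfextrameta value following this syntax could look like the sketch below. The XMP tag
names are hypothetical and only illustrate the two accepted forms:

```
# First element: tag bibtex:location stored in the Recoll field 'location'.
# Second element: no translation, so the tag name is used as the field name.
pdfextrameta = bibtex:location|location bibtex:booktitle
```

The field names would still need matching entries in the "fields" file.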

pdfextrametafix = fn
Define name of XMP field editing script. This defines the name of a script to be
loaded for editing XMP field values. The script should define a 'MetaFixer' class
with a metafix() method which will be called with the qualified tag name and value
of each selected field, for editing or erasing. A new instance is created for each
document, so that the object can keep state for, e.g. eliminating duplicate values.

mhmboxquirks = string
Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this for the directory
where the email mbox files are stored.
