Ubuntu Manpage: recoll.conf - main personal configuration file for Recoll

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

This file defines the index configuration for the Recoll full-text search system.

The system-wide configuration file is normally located inside
/usr/[local]/share/recoll/examples. Any parameter set in the common file may be overridden
by setting it in the personal configuration file, which is by default
$HOME/.recoll/recoll.conf.

Please note that while we try to keep this manual page reasonably up to date, it will
frequently lag behind the current state of the software. The best source of information
about the configuration is the comments in the system-wide configuration file.

A short extract of the file might look as follows:

# Space-separated list of directories to index.
topdirs = ~/docs /usr/share/doc

[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8

There are three kinds of lines:

• Comment or empty

• Parameter assignment

• Section definition

Empty lines or lines beginning with # are ignored.

Assignment lines are in the form 'name = value'.

Section lines allow redefining a parameter for a directory subtree. Some of the parameters
used for indexing are looked up hierarchically from the more to the less specific. Not all
parameters can be meaningfully redefined; this is specified for each parameter in the next section.

The tilde character (~) is expanded in file names to the name of the user's home
directory.

Where values are lists, white space is used for separation, and elements with embedded
spaces can be quoted with double-quotes.
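
As an illustration of the rules above, the following sketch combines comments, a list
value with a quoted element, and a section that overrides a parameter for one subtree.
The directory names and the charset are hypothetical.

```ini
# Comment lines start with '#'.
# List value: the second element contains a space, so it is double-quoted.
topdirs = ~/docs "/data/shared files"

# Section: the assignment below applies only to this subtree.
[~/docs/legacy]
defaultcharset = iso-8859-1
```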

OPTIONS

topdirs = directories
Specifies the list of directories to index (recursively).

skippedNames = patterns
A space-separated list of patterns for names of files or directories that should be
completely ignored. The list defined in the default file is:

*~ #* bin CVS Cache caughtspam tmp

The list can be redefined for subdirectories, but is only actually changed for the
top level ones in topdirs.
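
A minimal sketch of extending the list, assuming that an assignment replaces the default
value rather than adding to it (so the default patterns are repeated). The two extra
patterns at the end are illustrative only.

```ini
# Default patterns, followed by two illustrative additions.
skippedNames = *~ #* bin CVS Cache caughtspam tmp *.o node_modules
```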

skippedPaths = patterns
A space-separated list of patterns for paths the indexer should not descend into.
Together with topdirs, this allows pruning the indexed tree to one's heart's content.
daemSkippedPaths can be used to define a specific value for the real time indexing
monitor.

skippedPathsFnmPathname = 0/1
The values in the *skippedPaths variables are matched by default with fnmatch(3),
with the FNM_PATHNAME and FNM_LEADING_DIR flags. This means that '/' characters
must be matched explicitly. You can set skippedPathsFnmPathname to 0 to disable the
use of FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
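
The following sketch shows how these two parameters might be combined. All paths are
hypothetical.

```ini
topdirs = /home/me/docs
# Do not descend into these subtrees.
skippedPaths = /home/me/docs/archive /home/*/.cache
# With the default (FNM_PATHNAME in effect), '*' does not match '/':
# /*/dir3 matches /dir1/dir3 but not /dir1/dir2/dir3.
# Uncomment the next line to let '*' match across '/' as well.
#skippedPathsFnmPathname = 0
```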

followLinks = boolean
Specifies if the indexer should follow symbolic links while walking the file tree.
The default is to ignore symbolic links to avoid multiple indexing of linked files.
No effort is made to avoid duplication when this option is set to true. This option
can be set individually for each of the topdirs members by using sections. It
cannot be changed below the topdirs level.
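
For instance, link following could be enabled for a single topdirs member with a section,
as in this sketch (the directory names are hypothetical):

```ini
topdirs = /home/me/docs /home/me/projects

# Follow symbolic links only under this topdirs member.
[/home/me/projects]
followLinks = 1
```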

indexedmimetypes = list
Recoll normally indexes any file which it knows how to read. This list lets you
restrict the indexed mime types to what you specify. If the variable is unspecified
or the list empty (the default), all supported types are processed.
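
A sketch of a restricted configuration, with an illustrative choice of types:

```ini
# Index only plain text, HTML and PDF documents.
indexedmimetypes = text/plain text/html application/pdf
```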

compressedfilemaxkbs = value
Size limit for compressed (.gz or .bz2) files. These need to be decompressed in a
temporary directory for identification, which can be very wasteful if
'uninteresting' big compressed files are present. Negative means no limit, 0 means
no processing of any compressed file. Defaults to -1.

textfilemaxmbs = value
Maximum size for text files. Very big text files are often uninteresting logs. Set
to -1 to disable the limit (default: 20 MB).

textfilepagekbs = value
If this is set to other than -1, text files will be indexed as multiple documents
of the given page size. This may be useful if you do want to index very big text
files as it will both reduce memory usage at index time and help with loading data
to the preview window. A size of a few megabytes would seem reasonable (default:
1000, i.e. 1 MB).
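
If very big text files do need to be indexed, the size limit and the page size can be
adjusted together. The values in this sketch are illustrative only, not recommendations.

```ini
# Raise the text file size limit to 100 MB.
textfilemaxmbs = 100
# Index big text files as multiple documents of 2000 KB each.
textfilepagekbs = 2000
```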

membermaxkbs = value in kilobytes
This defines the maximum size for an archive member (zip, tar or rar at the
moment). Bigger entries will be skipped. Current default: 50000 (50 MB).

indexallfilenames = boolean
Recoll indexes file names into a special section of the database to allow specific
file name searches using wild cards. This parameter decides if file name indexing
is performed only for files with mime types that would qualify them for full text
indexing, or for all files inside the selected subtrees, independent of mime type.

usesystemfilecommand = boolean
Decide if we use the file -i system command as a final step for determining the
mime type for a file (the main procedure uses suffix associations as defined in the
mimemap file). This can be useful for files with suffixless names, but it will also
cause the indexing of many bogus "text" files.

processbeaglequeue = 0/1
If this is set, process the directory where Beagle Web browser plugins copy visited
pages for indexing. Of course, Beagle MUST NOT be running, else things will behave
strangely.

beaglequeuedir = directorypath
The path to the Beagle indexing queue. This is hard-coded in the Beagle plugin as
~/.beagle/ToIndex so there should be no need to change it.

indexStripChars = 0/1
Decide if we strip characters of diacritics and convert them to lower-case before
terms are indexed. If we don't, searches sensitive to case and diacritics can be
performed, but the index will be bigger, and some marginal weirdness may sometimes
occur. The default is a stripped index (indexStripChars = 1) for now. When using
multiple indexes for a search, this parameter must be defined identically for all.
Changing the value implies an index reset.

maxTermExpand = value
Maximum expansion count for a single term (e.g.: when using wildcards). The default
of 10000 is reasonable and will avoid queries that appear frozen while the engine
is walking the term list.

maxXapianClauses = value
Maximum number of elementary clauses we can add to a single Xapian query. In some
cases, the result of term expansion can be multiplicative, and we want to avoid
using excessive memory. The default of 100 000 should be both high enough in most
cases and compatible with current typical hardware configurations.

nonumbers = 0/1
If this is set to true, no terms will be generated for numbers. For example "123",
"1.5e6" and "192.168.1.4" would not be indexed ("value123" would still be). Numbers are
often quite interesting to search for, and this should probably not be set except
for special situations, e.g. scientific documents with huge amounts of numbers in
them. This can only be set for a whole index, not for a subtree.

nocjk = boolean
If this is set to true, specific East Asian (Chinese, Japanese, Korean) character/word
splitting is turned off. This will save a small amount of CPU time if you have no CJK
documents. If your document base does include such text but you are not interested
in searching it, setting nocjk may be a significant time and space saver.

cjkngramlen = value
This lets you adjust the size of n-grams used for indexing CJK text. The default
value of 2 is probably appropriate in most cases. A value of 3 would allow more
precision and efficiency on longer words, but the index will be approximately twice
as large.

indexstemminglanguages = languages
A list of languages for which the stem expansion databases will be built. See
recollindex(1) for possible values.

defaultcharset = charset
The name of the character set used for files that do not contain a character set
definition (e.g. plain text files). This can be redefined for any subdirectory.

unac_except_trans = list of utf-8 groups
This is a list of characters, encoded in UTF-8, which should be handled specially
when converting text to unaccented lowercase. For example, in Swedish, the letter
"a with diaeresis" has full alphabet citizenship and should not be turned into an
a.
Each element in the space-separated list has the special character as first element
and the translation following. The handling of both the lower-case and upper-case
versions of a character should be specified, as membership in the list turns off
both standard accent and case processing.
Note that the translation is not limited to a single character.
This parameter cannot be redefined for subdirectories, it is global, because there
is no way to do otherwise when querying. If you have document sets which would need
different values, you will have to index and query them separately.
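
As a sketch for the Swedish case mentioned above, each element below starts with the
special character and continues with its translation. The lower-case letters translate
to themselves (keeping the diacritic), and the upper-case ones to their lower-case form.
The exact set of letters is an illustrative choice.

```ini
# Keep å, ä and ö as distinct letters; fold upper case onto lower case.
unac_except_trans = åå Åå ää Ää öö Öö
```

A multi-character translation follows the same pattern: an element such as ßss would
translate the single character ß to the two characters ss.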

maildefcharset = charactersetname
This can be used to define the default character set specifically for email
messages which don't specify it. This is mainly useful for readpst (libpst) dumps,
which are utf-8 but do not say so.

localfields = fieldname = value:...
This allows setting fields for all documents under a given directory. Typical usage
would be to set an "rclaptg" field, to be used in mimeview to select a specific
viewer. If several fields are to be set, they should be separated with a colon
(':') character (which there is currently no way to escape). For example: localfields =
rclaptg=gnus:other = val, then select the specific viewer with mimetype|tag=... in
mimeview.
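
Since the fields apply to a directory, the assignment would typically live in a section.
A sketch, with a hypothetical mail directory and the field values from the example above:

```ini
# Set two fields on every document under this directory.
[/home/me/Mail]
localfields = rclaptg=gnus:other=val
```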

dbdir = directory
The name of the Xapian database directory. It will be created if needed when the
database is initialized. If this is not an absolute pathname, it will be taken
relative to the configuration directory.

idxstatusfile = file path
The name of the scratch file where the indexer process updates its status. Default:
idxstatus.txt inside the configuration directory.

maxfsoccuppc = percentnumber
Maximum file system occupation before we stop indexing. The value is a percentage,
corresponding to what the "Capacity" df output column shows. The default value is
0, meaning no checking.

mboxcachedir = directory path
The directory where mbox message offsets cache files are held. This is normally
$RECOLL_CONFDIR/mboxcache, but it may be useful to share a directory between
different configurations.

mboxcacheminmbs = value in megabytes
The minimum mbox file size over which we cache the offsets. There is really no
sense in caching offsets for small files. The default is 5 MB.

webcachedir = directory path
This is only used by the Beagle web browser plugin indexing code, and defines where
the cache for visited pages will live. Default: $RECOLL_CONFDIR/webcache

webcachemaxmbs = value in megabytes
This is only used by the Beagle web browser plugin indexing code, and defines the
maximum size for the web page cache. Default: 40 MB.

idxflushmb = megabytes
Threshold (megabytes of new text data) where we flush from memory to disk index.
Setting this can help control memory usage. A value of 0 means no explicit
flushing, letting Xapian use its own default, which is flushing every 10000
documents (or XAPIAN_FLUSH_THRESHOLD), meaning that memory usage depends on average
document size. The default value is 10.

autodiacsens = 0/1
If the index is not stripped, decide if we automatically trigger diacritics
sensitivity if the search term has accented characters (not in unac_except_trans).
Else you need to use the query language and the D modifier to specify diacritics
sensitivity. Default is no.

autocasesens = 0/1
If the index is not stripped, decide if we automatically trigger character case
sensitivity if the search term has upper-case characters in any but the first
position. Else you need to use the query language and the C modifier to specify
character-case sensitivity. Default is yes.
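
Both parameters only have an effect on an unstripped index, so a configuration using
them would look roughly like this sketch. Remember that changing indexStripChars implies
an index reset.

```ini
# Raw (unstripped) index: needed for the two settings below to matter.
indexStripChars = 0
# An accented search term triggers diacritics sensitivity.
autodiacsens = 1
# Upper-case characters past the first position trigger case sensitivity.
autocasesens = 1
```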

loglevel = value
Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of
debug/information messages. 3 lists only errors. daemloglevel can be used to
specify a different value for the real-time indexing daemon.

logfilename = file
Where the messages should go. 'stderr' can be used as a special value.
daemlogfilename can be used to specify a different value for the real-time indexing
daemon.
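
A sketch combining the logging parameters, with separate settings for the real-time
indexing daemon. The log file path is hypothetical.

```ini
# recoll and recollindex: errors only, written to stderr.
loglevel = 3
logfilename = stderr
# Real-time indexing daemon: more verbose, written to a file.
daemloglevel = 4
daemlogfilename = /tmp/recollindex-daemon.log
```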

mondelaypatterns = list of patterns
This allows specifying wildcard path patterns (processed with fnmatch(3) with flags set to 0),
to match files which change too often and for which a delay should be observed
before re-indexing. This is a space-separated list, each entry being a pattern and
a time in seconds, separated by a colon. You can use double quotes if a path entry
contains white space. Example:

mondelaypatterns = *.log:20 "this one has spaces*:10"

monixinterval = value in seconds
Minimum interval (seconds) for processing the indexing queue. The real time monitor
does not process each event when it comes in, but will wait this time for the queue
to accumulate to diminish overhead and in order to aggregate multiple events to the
same file. Default: 30 seconds.

monauxinterval = value in seconds
Period (in seconds) at which the real time monitor will regenerate the auxiliary
databases (spelling, stemming) if needed. The default is one hour.

monioniceclass, monioniceclassdata
These allow defining the ionice class and data used by the indexer (default class
3, no data).

filtermaxseconds = value in seconds
Maximum filter execution time, after which it is aborted. Some PostScript programs
just loop...

filtersdir = directory
A directory to search for the external filter scripts used to index some types of
files. The value should not be changed, except if you want to modify one of the
default scripts. The value can be redefined for any subdirectory.

iconsdir = directory
The name of the directory where recoll result list icons are stored. You can change
this if you want different images.

idxabsmlen = value
Recoll stores an abstract for each indexed file inside the database. The text can
come from an actual 'abstract' section in the document or will just be the
beginning of the document. It is stored in the index so that it can be displayed
inside the result lists without decoding the original file. The idxabsmlen
parameter defines the size of the stored abstract. The default value is 250 bytes.
The search interface gives you the choice to display this stored text or a
synthetic abstract built by extracting text around the search terms. If you always
prefer the synthetic abstract, you can reduce this value and save a little space.

aspellLanguage = lang
Language definitions to use when creating the aspell dictionary. The value must
match a set of aspell language definition files. You can type "aspell config" to
see where these are installed (look for data-dir). The default if the variable is
not set is to use your desktop national language environment to guess the value.

noaspell = boolean
If this is set, the aspell dictionary generation is turned off. Useful for cases
where you don't need the functionality or when it is unusable because aspell
crashes during dictionary generation.

mhmboxquirks = flags
This allows defining location-related quirks for the mailbox handler. Currently
only the tbird flag is defined, and it should be set for directories which hold
Thunderbird data, as their folder format is weird.
