Provided by: maildir-utils_1.6.10-1_amd64

NAME

       mu-index - index e-mail messages stored in Maildirs

SYNOPSIS

       mu index [options]

DESCRIPTION

       mu  index  is  the mu command for scanning the contents of Maildir directories and storing
       the results in a Xapian database. The data can then be queried using mu-find(1).

       Note that before the first time you run mu index, you must run mu init to  initialize  the
       database.

       mu index understands Maildirs as defined by Daniel Bernstein for qmail(7). In addition, it
       understands recursive Maildirs (Maildirs within Maildirs) and Maildir++. It can also deal
       with VFAT-based Maildirs, which use '!' or ';' as the separator instead of ':'.
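
       The separator handling can be illustrated with a small shell sketch. It assumes the
       usual Maildir naming convention, in which a message's flags follow a "2," marker after
       the separator. The helper name maildir_flags is hypothetical and is not part of mu.

```shell
# Hypothetical helper (not part of mu): print the flags of a Maildir message
# file name, accepting ':' as well as the VFAT-friendly '!' and ';' separators.
maildir_flags() {
    # Drop any leading directories, then keep the letters after "<separator>2,".
    printf '%s\n' "${1##*/}" | sed -n 's/.*[!:;]2,\([A-Za-z]*\)$/\1/p'
}

maildir_flags 'cur/1355693433.host:2,RS'   # prints RS
maildir_flags 'cur/1355693433.host!2,S'    # prints S
```

       A file name without such an info part produces no output at all.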

       E-mail messages which are not stored in something resembling a maildir leaf-directory (cur
       and new) are ignored, as are the cache directories for notmuch  and  gnus,  and  any  dot-
       directory.
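
       As a rough illustration of the leaf-directory rule, the following sketch lists only the
       files that sit directly inside a cur or new directory. It is an approximation for
       exploring your own mail tree, not mu's actual scanning code: the function name
       list_candidates is hypothetical, and the extra exclusions (cache directories and
       dot-directories) are deliberately left out.

```shell
# Hypothetical helper (not part of mu): list files located directly inside
# a "cur" or "new" directory somewhere below the given root.
list_candidates() {
    find "$1" -type f | grep -E '/(cur|new)/[^/]+$' | sort
}

# Build a throw-away Maildir to try it on.
root=$(mktemp -d)
mkdir -p "$root/Inbox/cur" "$root/Inbox/new" "$root/Inbox/tmp"
touch "$root/Inbox/cur/msg1" "$root/Inbox/new/msg2" \
      "$root/Inbox/tmp/partial" "$root/Inbox/stray.eml"

list_candidates "$root"    # lists msg1 and msg2 only
rm -rf "$root"
```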

       Starting  with  mu  1.5.x,  symlinks  are  followed,  and  can  be  spread  over  multiple
       filesystems;  however  note  that  moving  files  around  is  much  faster  when  multiple
       filesystems are not involved.

       If  there is a file called .noindex in a directory, the contents of that directory and all
       of its subdirectories will be ignored. This can be useful to exclude  certain  directories
       from the indexing process, for example directories with spam-messages.

       If there is a file called .noupdate in a directory, the contents of that directory and all
       of its subdirectories will be ignored, unless we do a full rebuild (with mu init). This
       can be useful to speed things up if you have some maildirs that never change. Note that you
       can still search for these messages; this only affects updating the database. .noupdate is
       ignored when you start indexing with an empty database (such as directly after mu init).
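
       Both marker files are plain empty files, so touch(1) is enough to create them. The
       sketch below sets up a temporary example tree and reports how each directory would be
       treated according to the rules above. The helper has_marker and the directory names
       are hypothetical; this mimics the documented behavior and is not how mu implements it.

```shell
# Hypothetical helper (not part of mu): succeed if directory $1, or any of its
# parents up to and including the root $3, contains the marker file $2.
has_marker() {
    dir=$1
    while :; do
        [ -e "$dir/$2" ] && return 0
        case $dir in "$3"|/|.) return 1 ;; esac
        dir=$(dirname "$dir")
    done
}

root=$(mktemp -d)
mkdir -p "$root/Spam/cur" "$root/Archive/2012/cur" "$root/Inbox/cur"
touch "$root/Spam/.noindex"        # never index Spam or anything below it
touch "$root/Archive/.noupdate"    # skip Archive when updating an existing database

for d in "$root/Spam/cur" "$root/Archive/2012/cur" "$root/Inbox/cur"; do
    if has_marker "$d" .noindex "$root"; then
        echo "${d#"$root"/}: always ignored"
    elif has_marker "$d" .noupdate "$root"; then
        echo "${d#"$root"/}: ignored on updates"
    else
        echo "${d#"$root"/}: indexed"
    fi
done
rm -rf "$root"
```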

       There is also the --lazy-check option, which can greatly speed up indexing; see below for
       details.

       The  first  run  of  mu  index  may  take a few minutes if you have a lot of mail (tens of
       thousands of messages).  Fortunately, such a full scan needs to be done only  once;  after
       that  it  suffices  to  index  the  changes,  which  goes  much  faster.  See the 'Note on
       performance (i,ii,iii)' below for more information.

       The optional 'phase two' of the indexing-process is  the  removal  of  messages  from  the
       database  for  which there is no longer a corresponding file in the Maildir. If you do not
       want this, you can use -n, --nocleanup.

       When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM (e.g., when you press
       Ctrl-C during the indexing process), it tries to shut down gracefully: it tries to save and
       commit data, close the database, and so on. If it receives another signal (e.g., when
       pressing Ctrl-C once more), mu index will terminate immediately.

OPTIONS

       Note that some of the general options are described in the mu(1) man page and not here, as
       they apply to multiple mu commands.

       --lazy-check
              in lazy-check mode, mu does not consider messages for which the time-stamp (ctime)
              of the directory they reside in has not changed since the previous indexing run.
              This is much faster than the non-lazy check, but won't update messages that have
              changed (rather than having been added or removed), since merely editing a message
              does not update the directory time-stamp. Of course, you can run mu index
              occasionally without --lazy-check, to pick up such messages.
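
              The effect on directory time-stamps can be observed without mu. This sketch
              assumes GNU stat(1), where "-c %Z" prints the ctime in seconds; the file names
              are made up. Appending to a message in place leaves the directory's ctime
              alone, while adding a message changes it.

```shell
# Demonstration only; assumes GNU coreutils stat ("-c %Z" = ctime in seconds).
dir=$(mktemp -d)
printf 'Subject: first\n\nhello\n' > "$dir/msg1"
before=$(stat -c %Z "$dir")

sleep 1                                  # make any ctime change visible
printf 'edited in place\n' >> "$dir/msg1"
after_edit=$(stat -c %Z "$dir")          # directory entry list is untouched

printf 'Subject: second\n\nhi\n' > "$dir/msg2"
after_add=$(stat -c %Z "$dir")           # a new entry updates the directory

echo "before=$before after_edit=$after_edit after_add=$after_add"
rm -rf "$dir"
```

              Tools that save by writing a temporary file and renaming it do modify the
              directory, so such edits would still be noticed in lazy-check mode.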

       --nocleanup
              disables the database cleanup that mu does by default after indexing.

   A note on performance (i)
       As  a  non-scientific  benchmark,  a  simple test on the author's machine (a Thinkpad X61s
       laptop using Linux 2.6.35 and an ext3 file  system)  with  no  existing  database,  and  a
       maildir with 27273 messages:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        66,65s user 6,05s system 27% cpu 4:24,20 total
       (about 103 messages per second)
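
       The rate follows from dividing the message count by the elapsed wall-clock time, here
       4:24,20 or 264.20 seconds. A small sketch of that arithmetic, with a hypothetical
       helper named rate:

```shell
# Hypothetical helper: messages per second, rounded to a whole number.
# LC_ALL=C makes awk read "264.20" with a decimal point in any locale.
rate() {
    LC_ALL=C awk -v n="$1" -v s="$2" 'BEGIN { printf "%.0f\n", n / s }'
}

rate 27273 264.20    # prints 103
```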

       A  second  run,  which is the more typical use case when there is a database already, goes
       much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,48s user 0,76s system 10% cpu 11,796 total
       (more than 56818 messages per second of user CPU time; about 2312 per second of
       wall-clock time)

       Note that each test flushes the caches first; a more common use case might be  to  run  mu
       index when new mail has arrived; the cache may stay quite 'warm' in that case:

        $ time mu index --quiet
        0,33s user 0,40s system 80% cpu 0,905 total
       which is more than 30000 messages per second.

   A note on performance (ii)
       As of June 2012, we did the same non-scientific benchmark, this time with an Intel
       i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with  22589  messages.  We  start
       without an existing database.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        27,79s user 2,17s system 48% cpu 1:01,47 total
       (about 813 messages per second of user CPU time; about 367 per second of wall-clock time)

       A  second  run,  which is the more typical use case when there is a database already, goes
       much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,13s user 0,30s system 19% cpu 2,162 total
       (more than 173000 messages per second of user CPU time; about 10448 per second of
       wall-clock time)

   A note on performance (iii)
       As of July 2016, we did the same non-scientific benchmark, again with the Intel i5-2500
       CPU @ 3.30GHz, an ext4 file system. This time, the maildir contains 72525 messages.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        40,34s user 2,56s system 64% cpu 1:06,17 total
       (about 1099 messages per second).

       As shown, mu has been getting faster with each release, even with relatively expensive new
       features such as text-normalization (for case-insensitive/accent-insensitive matching). The
       profiles are dominated by operations in the Xapian database now.

FILES

       mu stores logs of its operations and queries in <muhome>/mu.log (by default, this is
       ~/.cache/mu/mu.log). Upon startup, mu checks the size of this log file. If it exceeds 1
       MB, the file is moved to ~/.cache/mu/mu.log.old, overwriting any existing file of that
       name, and mu starts with an empty log file. This scheme allows for continued use of mu
       without the need for any manual maintenance of log files.
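
       The rotation scheme described above can be sketched in a few lines of shell. This is an
       illustration only, not mu's code: the function name rotate_log is hypothetical, and
       reading "1 MB" as 1000000 bytes is an assumption.

```shell
# Hypothetical sketch of the scheme: if the log is larger than the limit,
# move it to <log>.old (replacing any older copy) and start an empty log.
rotate_log() {
    log=$1
    limit=${2:-1000000}              # assumption: "1 MB" read as 1000000 bytes
    [ -f "$log" ] || return 0
    size=$(($(wc -c < "$log")))      # arithmetic expansion strips padding
    if [ "$size" -gt "$limit" ]; then
        mv -f "$log" "$log.old"
        : > "$log"
    fi
}

# Try it on a throw-away file with a small limit.
tmp=$(mktemp -d)
head -c 2048 /dev/zero > "$tmp/mu.log"
rotate_log "$tmp/mu.log" 1024
ls "$tmp"                            # mu.log (now empty) and mu.log.old
rm -rf "$tmp"
```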

ENVIRONMENT

       mu  index  uses MAILDIR to find the user's Maildir if it has not been specified explicitly
       with --maildir=<maildir>. If MAILDIR is not set, mu index will try ~/Maildir.
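
       The lookup order described here (explicit value, then MAILDIR, then ~/Maildir) can be
       mirrored in a script with ordinary parameter expansion. The function name
       resolve_maildir is hypothetical. Note that this sketch also treats an empty MAILDIR as
       unset, which is an assumption rather than documented mu behavior.

```shell
# Hypothetical helper: print the Maildir that would be used, given an
# optional explicit path as the first argument.
resolve_maildir() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "${MAILDIR:-$HOME/Maildir}"
    fi
}

resolve_maildir                  # $MAILDIR if set, otherwise $HOME/Maildir
resolve_maildir /srv/mail/me     # an explicit path always wins
```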

RETURN VALUE

       mu index returns 0 upon successful completion; any other number greater than 0 signals
       an error.
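
       In scripts, the exit status can be checked in the usual way. The wrapper below is a
       minimal sketch with a hypothetical name, report_index; it runs whatever command it is
       given, so it can be tried without mu installed.

```shell
# Hypothetical wrapper: run the given command and report on its exit status.
report_index() {
    if "$@"; then
        echo "index completed"
    else
        status=$?
        echo "index failed with exit code $status" >&2
        return "$status"
    fi
}

# Typical use, if mu is installed:
#   report_index mu index --quiet
report_index true
```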

BUGS

       Please report bugs if you find them: https://github.com/djcb/mu/issues

AUTHOR

       Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>

SEE ALSO

       maildir(5), mu(1), mu-init(1), mu-find(1), mu-cfind(1)