Provided by: maildir-utils_1.8.14-1_amd64 bug

NAME

       mu_index - index e-mail messages stored in Maildirs

SYNOPSIS

       mu index [options]

DESCRIPTION

       mu  index  is  the mu command for scanning the contents of Maildir directories and storing
       the results in a Xapian database. The data can then be queried using mu-find(1).

       Before the first time you run mu index, you must run mu init to initialize the database.

       index understands Maildirs as defined by Daniel Bernstein for qmail(7).  In  addition,  it
       understands  recursive  Maildirs  (Maildirs  within Maildirs), Maildir++. It can also deal
       with VFAT-based Maildirs which use '!'  or ';' as the separators instead of ':'.

       E-mail messages which are not stored in something resembling a maildir leaf-directory (cur
       and  new)  are  ignored,  as  are the cache directories for notmuch and gnus, and any dot-
       directory.

       Starting  with  mu  1.5.x,  symlinks  are  followed,  and  can  be  spread  over  multiple
       filesystems;  however  note  that  moving  files  around  is  much  faster  when  multiple
       filesystems are not involved.

       If there is a file called .noindex in a directory, the contents of that directory and  all
       of  its  subdirectories will be ignored. This can be useful to exclude certain directories
       from the indexing process, for example directories with spam-messages.

       If there is a file called .noupdate in a directory, the contents of that directory and all
       of  its  subdirectories  will be ignored, unless we do a full rebuild (with mu init). This
       can be useful to speed up things you have some maildirs that never change. Note  that  you
       can still search for these messages, this only affects updating the database. .noupdate is
       ignored when you start indexing with an empty database (such as directly after mu init.

       There also the --lazy-check which can greatly speed up indexing; see below for details.

       The first run of mu index may take a few minutes if you  have  a  lot  of  mail  (tens  of
       thousands  of  messages).  Fortunately, such a full scan needs to be done only once; after
       that it suffices to index  the  changes,  which  goes  much  faster.   See  the  'Note  on
       performance (i,ii,iii)' below for more information.

       The  optional  'phase  two'  of  the  indexing-process is the removal of messages from the
       database for which there is no longer a corresponding file in the Maildir.  If you do  not
       want this, you can use -n, --nocleanup.

       When  mu  index catches one of the signals SIGINT, SIGHUP or SIGTERM (e.g., when you press
       Ctrl-C during the indexing process), it tries to shutdown gracefully; it tries to save and
       commit  data,  and  close  the  database  etc.  If  it receives another signal (e.g., when
       pressing Ctrl-C once more), mu index will terminate immediately.

OPTIONS

       Some of the general options are described in the mu(1) man-page  and  not  here,  as  they
       apply to multiple mu commands.

       --lazy-check
              in  lazy-check mode, mu does not consider messages for which the time-stamp (ctime)
              of the directory they reside in has not changed since the  previous  indexing  run.
              This  is  much  faster than the non-lazy check, but won't update messages that have
              change (rather than having been added or removed), since merely editing  a  message
              does  not  update  the  directory  time-stamp.  Of  course,  you  can  run mu-index
              occasionally without --lazy-check, to pick up such messages.

       --nocleanup
              disables the database cleanup that mu does by default after indexing.

   A note on performance (i)
       As a non-scientific benchmark, a simple test on the  author's  machine  (a  Thinkpad  X61s
       laptop  using  Linux  2.6.35  and  an  ext3  file system) with no existing database, and a
       maildir with 27273 messages:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        66,65s user 6,05s system 27% cpu 4:24,20 total
       (about 103 messages per second)

       A second run, which is the more typical use case when there is a  database  already,  goes
       much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,48s user 0,76s system 10% cpu 11,796 total
       (more than 56818 messages per second)

       Note  that  each  test flushes the caches first; a more common use case might be to run mu
       index when new mail has arrived; the cache may stay quite 'warm' in that case:

        $ time mu index --quiet
        0,33s user 0,40s system 80% cpu 0,905 total
       which is more than 30000 messages per second.

   A note on performance (ii)
       As per June 2012, we did the same  non-scientific  benchmark,  this  time  with  an  Intel
       i5-2500  CPU  @  3.30GHz,  an ext4 file system and a maildir with 22589 messages. We start
       without an existing database.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        27,79s user 2,17s system 48% cpu 1:01,47 total
       (about 813 messages per second)

       A second run, which is the more typical use case when there is a  database  already,  goes
       much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,13s user 0,30s system 19% cpu 2,162 total
       (more than 173000 messages per second)

   A note on performance (iii)
       As  per  July 2016, we did the same non-scientific benchmark, again with the Intel i5-2500
       CPU @ 3.30GHz, an ext4 file system. This time, the maildir contains 72525 messages.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        40,34s user 2,56s system 64% cpu 1:06,17 total
       (about 1099 messages per second).

   A note on performance (iv)
       A few years later and its June 2022. There's a lot more  happening  during  indexing,  but
       indexing  became  multi-threaded  and  machines are faster; e.g. this is with an AMD Ryzen
       Threadripper 1950X (32) @ 3.399GHz.

       The instructions are a little different since we have a proper repeatable  benchmark  now.
       After building,

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
       % THREAD_NUM=4 build/lib/tests/bench-indexer -m perf
       # random seed: R02Sf5c50e4851ec51adaf301e0e054bd52b
       1..1
       # Start of bench tests
       # Start of indexer tests
       indexed 5000 messages in 20 maildirs in 3763ms; 752 μs/message; 1328 messages/s (4 thread(s))
       ok 1 /bench/indexer/4-cores
       # End of indexer tests
       # End of bench tests

       Things  are  again  a  little  faster,  even  though  the index does a lot more now (text-
       normalizatian, and pre-generating message-sexps). A faster machine helps, too!

RETURN VALUE

       mu index return 0 upon successful completion; any other number signals an error.

BUGS

       Please report bugs if you find any: https://github.com/djcb/mu/issues

AUTHOR

       Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>

SEE ALSO

       maildir(5), mu(1), mu-init(1), mu-find(1), mu-cfind(1)