Provided by: maildir-utils_1.12.7-1_amd64

NAME

       mu-index - index e-mail messages stored in Maildirs

SYNOPSIS

       mu [COMMON-OPTIONS] index

DESCRIPTION

       mu  index  is  the mu command for scanning the contents of Maildir directories and storing
       the results in a Xapian database. The data can then be queried using mu-find(1).

       Before the first time you run mu index, you must run mu init to initialize the database.
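
       A first run might look like the following sketch. The helper name mu_first_run and the
       ~/Maildir location are assumptions for illustration; --maildir is the mu init option that
       names the top-level Maildir (see mu-init(1)).

```shell
# Hypothetical helper: initialize the database, then run the first index.
# MAILDIR is an assumed location; change it to wherever your mail lives.
MAILDIR="${MAILDIR:-$HOME/Maildir}"

mu_first_run() {
    # mu init only needs to succeed once; mu index runs only if it did.
    mu init --maildir="$MAILDIR" && mu index
}
```

       Later runs only need mu index; the database does not have to be initialized again.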

       index understands Maildirs as defined by Daniel Bernstein for qmail(7). In addition, it
       understands recursive Maildirs (Maildirs within Maildirs) and Maildir++. It also supports
       VFAT-based Maildirs, which use ! or ; as the separator instead of :.
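
       To make the separator variants concrete, here is a small sketch that extracts the flag
       letters from a Maildir file name. It is not mu's parser; maildir_flags is a hypothetical
       helper, and it assumes the usual "2," info prefix after the separator.

```shell
# Sketch only (not mu's parser): print the flag letters of a Maildir file name.
# The info part is "<sep>2,<flags>", where <sep> is ':' or, on VFAT, '!' or ';'.
maildir_flags() {
    # ${1##*/} strips any leading directory part before matching.
    printf '%s\n' "${1##*/}" | sed -n 's/.*[!;:]2,\([A-Za-z]*\)$/\1/p'
}
```

       For example, maildir_flags 'cur/1700000000.abc.host!2,RS' prints RS, and a name without an
       info part prints nothing.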

       E-mail messages which are not stored in something resembling a maildir leaf-directory
       (cur and new) are ignored, as are the cache directories for notmuch and gnus, and any
       dot-directory.

       Symlinks are followed, and the  directories  can  be  spread  over  multiple  filesystems;
       however  note  that  moving  files around is much faster when multiple filesystems are not
       involved. Be careful to avoid self-referential symlinks!

       If there is a file called .noindex in a directory, the contents of that directory and  all
       of  its  subdirectories will be ignored. This can be useful to exclude certain directories
       from the indexing process, for example directories with spam-messages.

       If there is a file called .noupdate in a directory, the contents of that directory and all
       of its subdirectories will be ignored. This can be useful to speed things up if you have
       some maildirs that never change.

       .noupdate does not affect already-indexed messages: you can still search for them.
       .noupdate is ignored when you start indexing with an empty database (such as directly
       after mu init).
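
       Both marker files are ordinary empty files, so creating them is a one-liner. The helper
       names and the example paths below are assumptions for illustration only.

```shell
# Hypothetical helpers: drop the marker files described above into a maildir.
exclude_from_index() { touch -- "$1/.noindex"; }    # never index this tree
freeze_maildir()     { touch -- "$1/.noupdate"; }   # skip this tree on regular runs
```

       For example, exclude_from_index ~/Maildir/Spam keeps a spam folder out of the database,
       while freeze_maildir ~/Maildir/Archive/2015 skips an archive that no longer changes.
       Removing the file again restores the normal behavior on the next run.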

       There is also the option --lazy-check, which can greatly speed up indexing; see below for
       details.

       The first run of mu index may take a few minutes if you have a lot of mail (tens of
       thousands of messages). Fortunately, such a full scan needs to be done only once; after
       that it suffices to index the changes, which goes much faster. See the PERFORMANCE
       section below for more information.

       The optional `phase two' of the indexing-process is  the  removal  of  messages  from  the
       database  for which there is no longer a corresponding file in the Maildir.  If you do not
       want this, you can use -n, --nocleanup.

       When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM (e.g., when you press
       Ctrl-C during the indexing process), it attempts to shut down gracefully: it tries to save
       and commit data, close the database, and so on. If it receives another signal (e.g., when
       you press Ctrl-C once more), mu index terminates immediately.
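
       The two-stage shutdown can be pictured with a shell sketch. This is not mu's code: the
       names on_signal, stop_requested and process_items are hypothetical, and the loop is only a
       stand-in for the real indexing work.

```shell
# Sketch only: mimics the two-stage shutdown described above; not mu's code.
stop_requested=0

on_signal() {
    if [ "$stop_requested" -eq 1 ]; then
        # A second signal arrived while already stopping: give up at once.
        echo "second signal: terminating immediately" >&2
        exit 1
    fi
    stop_requested=1
    echo "signal caught: finishing up, then committing" >&2
}
trap on_signal INT HUP TERM

# Stand-in for the indexing loop: handle items until a stop is requested,
# then always "commit" what has been done so far.
process_items() {
    for item in "$@"; do
        [ "$stop_requested" -eq 1 ] && break
        echo "indexed $item"
    done
    echo "committed"
}
```

       The design point is that the first signal only sets a flag that the work loop checks, so
       data can still be saved; only a repeated signal ends the process right away.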

INDEX OPTIONS

   --lazy-check
       In lazy-check mode, mu does not consider messages for which the time-stamp (ctime) of the
       directory they reside in has not changed since the previous indexing run. This is much
       faster than the non-lazy check, but won't update messages that have changed (rather than
       having been added or removed), since merely editing a message does not update the
       directory time-stamp. Of course, you can run mu index occasionally without --lazy-check,
       to pick up such messages.
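
       The directory check behind lazy mode can be pictured with the following sketch. It is not
       mu's implementation; dir_changed_since is a hypothetical helper, and the stat -c %Z call
       assumes GNU coreutils.

```shell
# Sketch only, assuming GNU coreutils `stat`; this is not mu's code.
# Succeeds if the directory's ctime (epoch seconds) is newer than $2.
dir_changed_since() {
    dir_ctime=$(stat -c %Z -- "$1") || return 2
    [ "$dir_ctime" -gt "$2" ]
}
```

       Appending to an existing message file leaves the ctime of its directory untouched, which
       is why a lazy run does not notice edited messages, whereas adding or removing a file in
       the directory does update it.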

   --nocleanup
       Disable the database cleanup that mu does by default after indexing.

   --reindex
       Perform a complete reindexing of all the messages in the maildir.

   --muhome
       Use a non-default directory to store and read the  database,  write  the  logs,  etc.   By
       default,  mu uses the XDG Base Directory Specification (e.g. on GNU/Linux this defaults to
       ~/.cache/mu and ~/.config/mu). Earlier versions  of  mu  defaulted  to  ~/.mu,  which  now
       requires --muhome=~/.mu.

       The environment variable MUHOME can be used as an alternative to --muhome. If both are
       given, --muhome takes precedence.
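
       As an illustration, a wrapper for an old-style ~/.mu setup might look like this. The
       function name mu_index_legacy is hypothetical. The path is written with "$HOME" because a
       shell such as bash does not expand ~ in the middle of an option word like --muhome=~/.mu;
       the sketch does not rely on mu expanding it either.

```shell
# Hypothetical wrapper: run mu index against an old-style ~/.mu directory.
# "$HOME" is spelled out so the shell passes a full path to --muhome.
mu_index_legacy() {
    mu index --muhome="$HOME/.mu" "$@"
}
```

       Setting MUHOME="$HOME/.mu" in the environment and running plain mu index is the
       alternative described above.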

COMMON OPTIONS

   -d, --debug
       Makes mu generate extra debug information, useful for debugging the program itself.  Debug
       information goes to the standard logging location; see mu(1).

   -q, --quiet
       Causes  mu  not  to  output  informational  messages  and progress information to standard
       output, but only to the log file. Error messages will still be  sent  to  standard  error.
       Note  that  mu index is much faster with --quiet, so it is recommended you use this option
       when using mu from scripts etc.

   --log-stderr
       Causes mu to output log messages to standard error, in addition to sending them to the
       standard logging location.

   --nocolor
       Do not use ANSI colors. The environment variable NO_COLOR can be used as an alternative to
       --nocolor.

   -V, --version
       Prints mu version and copyright information.

   -h, --help
       Lists the various command line options.

ENCRYPTION

       mu index does not decrypt messages, and only the metadata (such as headers)  of  encrypted
       messages  makes  it to the database. mu view and mu4e can decrypt messages, but those work
       with the message directly and the information is not added to the database.

PERFORMANCE

   indexing in ancient times (2009?)
       As a non-scientific benchmark, a simple test on the  author's  machine  (a  Thinkpad  X61s
       laptop  using  Linux  2.6.35  and  an  ext3  file system) with no existing database, and a
       maildir with 27273 messages:

              $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
              $ time mu index --quiet
              66,65s user 6,05s system 27% cpu 4:24,20 total

       (about 103 messages per second)
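
       The figure above is the message count divided by the elapsed time, with 4:24,20 being
       264.20 seconds. A small helper (msgs_per_sec is a hypothetical name) makes it easy to
       repeat the arithmetic for your own runs; it expects seconds with a decimal point.

```shell
# Sketch: whole messages per second for a message count and elapsed seconds.
msgs_per_sec() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%d\n", n / s }'
}
```

       For example, msgs_per_sec 27273 264.20 prints 103.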

       A second run, which is the more typical use case when there is a  database  already,  goes
       much faster:

              $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
              $ time mu index --quiet
              0,48s user 0,76s system 10% cpu 11,796 total

       (more than 56818 messages per second relative to the 0,48s of user CPU time; about 2312
       per second relative to the 11,796s total)

       Note  that  each  test flushes the caches first; a more common use case might be to run mu
       index when new mail has arrived; the cache may stay quite `warm' in that case:

              $ time mu index --quiet
              0,33s user 0,40s system 80% cpu 0,905 total

       which is more than 30000 messages per second.

   indexing in 2012
       As of June 2012, we did the same non-scientific benchmark, this time with an Intel
       i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with 22589 messages. We start
       without an existing database.

              $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
              $ time mu index --quiet
              27,79s user 2,17s system 48% cpu 1:01,47 total

       (about 813 messages per second relative to the 27,79s of user CPU time; about 367 per
       second relative to the 1:01,47 total)

       A second run, which is the more typical use case when there is a  database  already,  goes
       much faster:

              $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
              $ time mu index --quiet
              0,13s user 0,30s system 19% cpu 2,162 total

       (more than 173000 messages per second relative to the 0,13s of user CPU time; about 10448
       per second relative to the 2,162s total)

   indexing in 2016
       As of July 2016, we did the same non-scientific benchmark, again with the Intel i5-2500
       CPU @ 3.30GHz and an ext4 file system. This time, the maildir contains 72525 messages.

              $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
              $ time mu index --quiet
              40,34s user 2,56s system 64% cpu 1:06,17 total

       (about 1099 messages per second).

   indexing in 2022
       A few years later and it is June 2022. There's a lot more happening during  indexing,  but
       indexing  became  multi-threaded  and  machines are faster; e.g. this is with an AMD Ryzen
       Threadripper 1950X (16 cores) @ 3.399GHz.

       The instructions are a little different since we have a proper repeatable  benchmark  now.
       After building,

              $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
              $ THREAD_NUM=4 build/lib/tests/bench-indexer -m perf
              # random seed: R02Sf5c50e4851ec51adaf301e0e054bd52b
              1..1
              # Start of bench tests
              # Start of indexer tests
              indexed 5000 messages in 20 maildirs in 3763ms; 752 μs/message; 1328 messages/s (4 thread(s))
              ok 1 /bench/indexer/4-cores
              # End of indexer tests
              # End of bench tests

       Things are again a little faster, even though the index does a lot more now
       (text-normalization, and pre-generating message-sexps). A faster machine helps, too!

   recent releases
       Indexing the same 93000-message mail corpus with the last few releases:

                ┌──────────────────────────────────────────────────────────────────────┐
                │      release   time (sec)   notes                                    │
                ├──────────────────────────────────────────────────────────────────────┤
                │          1.4   160s                                                  │
                │          1.6   178s                                                  │
                │          1.8   97s                                                   │
                │         1.10   120s         adds html indexing, sexp-caching         │
                │1.11 (master)   96s          adds language-guessing, batch-size=50000 │
                └──────────────────────────────────────────────────────────────────────┘

       Quite some variation!

       Over time new features / refactoring can change the timings quite a bit. At least for now,
       the latest code is both the fastest and the most featureful!

EXIT CODE

       This command returns 0 upon successful completion, or a non-zero exit code otherwise.

       0.  success

       2.  no matches found. Try a different query

       11. database schema mismatch. You need to re-initialize mu, see mu-init(1)

       19. failed to acquire lock. Some other program has exclusive access to the mu database

       99. caught an exception
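
       In scripts, the exit code can be turned into a readable message. The following sketch
       simply maps the codes listed above; explain_mu_exit is a hypothetical helper name.

```shell
# Sketch: translate the exit codes listed above into short messages.
explain_mu_exit() {
    case $1 in
        0)  echo "success" ;;
        2)  echo "no matches found" ;;
        11) echo "database schema mismatch; re-initialize with mu init" ;;
        19) echo "failed to acquire lock" ;;
        99) echo "caught an exception" ;;
        *)  echo "unspecified error ($1)" ;;
    esac
}
```

       A typical use would be to run mu index --quiet and then call explain_mu_exit "$?" to log
       the outcome.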

REPORTING BUGS

       Please report bugs at https://github.com/djcb/mu/issues.

AUTHOR

       Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>

COPYRIGHT

       This manpage is part of mu 1.12.7.

       Copyright  ©  2008-2024  Dirk-Jan  C.  Binnema. License GPLv3+: GNU GPL version 3 or later
       https://gnu.org/licenses/gpl.html. This is free software:  you  are  free  to  change  and
       redistribute it. There is NO WARRANTY, to the extent permitted by law.

SEE ALSO

       maildir(5), mu(1), mu-init(1), mu-find(1), mu-cfind(1)

                                                                                      MU INDEX(1)