Provided by: maildir-utils_1.0-6_amd64


NAME
       mu index - index e-mail messages stored in Maildirs


SYNOPSIS
       mu index [options]


DESCRIPTION
       mu index is the mu command for scanning the contents of Maildir directories and storing
       the results in a Xapian database. The data can then be queried using mu-find(1).

       index understands Maildirs as defined by Daniel Bernstein for qmail(7). In addition, it
       understands recursive Maildirs (Maildirs within Maildirs) and Maildir++. It can also deal
       with VFAT-based Maildirs, which use '!' as the separator instead of ':'.
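
       A Maildir message file name normally ends in an info suffix such as ':2,S', where the
       letters after '2,' are the message flags. As a minimal sketch of the separator handling
       described above (a hypothetical POSIX shell helper, not mu's own code), the function
       below extracts those flags while accepting either ':' or the VFAT-style '!':

```shell
# maildir_flags: hypothetical helper, not part of mu.
# Prints the flag letters of a Maildir file name, accepting either the
# standard ':' separator or the VFAT-style '!' separator.
maildir_flags() {
  name=${1##*/}                         # strip any leading directories
  case $name in
    *:2,*)    flags=${name##*:2,} ;;
    *'!2,'*)  flags=${name##*'!2,'} ;;  # quoted to avoid history expansion
    *)        flags= ;;                 # no info suffix: no flags
  esac
  printf '%s\n' "$flags"
}
```

       For example, maildir_flags 'cur/1234.host!2,FS' prints FS.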

       E-mail messages which are not stored in something resembling a maildir leaf-directory (cur
       and new) are ignored, as are the cache directories for notmuch and gnus, and any
       dot-directories.

       Symlinks are not followed.
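
       The Maildir layout from qmail uses three subdirectories, tmp, new and cur; mu index reads
       messages from new and cur. A minimal sketch of a hypothetical helper (not part of mu)
       that creates such a leaf directory:

```shell
# make_maildir: hypothetical helper, not part of mu.
# Creates the tmp/new/cur layout of a Maildir at path $1.
make_maildir() {
  for sub in tmp new cur; do
    mkdir -p "$1/$sub" || return 1
  done
}
```

       For example, make_maildir ~/Maildir/archive creates a folder below ~/Maildir that mu
       index can pick up on its next run.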

       If there is a file called .noindex in a directory, the contents of that directory and  all
       of  its  subdirectories will be ignored. This can be useful to exclude certain directories
       from the indexing process, for example directories with spam-messages.
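
       Only the existence of the file matters, so creating an empty one is enough. A minimal
       sketch with a hypothetical helper name:

```shell
# exclude_from_index: hypothetical helper, not part of mu.
# Marks directory $1 (and everything below it) to be skipped by mu index.
exclude_from_index() {
  touch "$1/.noindex"
}
```

       For example, exclude_from_index ~/Maildir/spam (the folder name is only an example).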

       If there is a file called .noupdate in a directory, the contents of that directory and all
       of its subdirectories will be ignored, unless we do a full rebuild (with --rebuild). This
       can be useful to speed things up if you have some maildirs that never change. Note that you
       can still search for these messages; this only affects updating the database.
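
       Since .noindex and .noupdate markers are easy to forget, it can help to list them from
       time to time. A minimal sketch using find(1); the function name is hypothetical:

```shell
# list_index_markers: hypothetical helper, not part of mu.
# Prints every .noindex and .noupdate marker file below maildir root $1.
list_index_markers() {
  find "$1" -type f \( -name .noindex -o -name .noupdate \) -print
}
```

       For example, list_index_markers "${MAILDIR:-$HOME/Maildir}".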

       There is also the --lazy-check option, which can greatly speed up indexing; see below for details.

       The  first  run  of  mu  index  may  take a few minutes if you have a lot of mail (tens of
       thousands of messages).  Fortunately, such a full scan needs to be done only  once;  after
       that  it  suffices  to  index  the  changes,  which  goes  much  faster.  See the 'Note on
       performance (i,ii,iii)' below for more information.

       The optional 'phase two' of the indexing-process is  the  removal  of  messages  from  the
       database  for  which there is no longer a corresponding file in the Maildir. If you do not
       want this, you can use -n, --nocleanup.
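
       Conceptually, this phase checks, for each message path recorded in the database, whether
       the file still exists. The sketch below illustrates that check on a plain list of paths,
       one per line on standard input; it does not read mu's database, and it is not how mu
       implements the cleanup. The function name is hypothetical.

```shell
# stale_paths: illustration only, not part of mu.
# Reads message paths (one per line) on standard input and prints those that
# no longer exist on disk, i.e. the entries a cleanup phase would remove.
stale_paths() {
  while IFS= read -r path; do
    [ -e "$path" ] || printf '%s\n' "$path"
  done
}
```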

       When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM (e.g., when you press
       Ctrl-C during the indexing process), it tries to shut down gracefully: it tries to save and
       commit data, close the database, and so on. If it receives another signal (e.g., when
       pressing Ctrl-C once more), mu index will terminate immediately.
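
       Because SIGTERM triggers this graceful shutdown, an unattended run can be bounded in time
       with timeout(1) from GNU coreutils, which sends SIGTERM when the limit expires and, with
       -k, SIGKILL if the process is still running after a further grace period. A minimal
       sketch; the function name, the limits and the MU override are assumptions for
       illustration:

```shell
# index_with_limit: hypothetical wrapper, assumes GNU coreutils timeout(1).
# Sends SIGTERM after $1 seconds (default 600) so mu index can shut down
# gracefully, then SIGKILL if it is still running 30 seconds later.
# MU may be overridden (e.g. MU=echo) for a dry run.
index_with_limit() {
  timeout -k 30 "${1:-600}" ${MU:-mu} index --quiet
}
```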


OPTIONS
       Note that some of the general options are described in the mu(1) man page and not here, as
       they apply to multiple mu commands.

       -m, --maildir=<maildir>
              starts searching at <maildir>. By default, mu uses whatever the MAILDIR environment
              variable  is  set  to; if it is not set, it tries ~/Maildir. See the note on mixing
              sub-maildirs below.

       --my-address=<my-email-address>
              specifies that some e-mail address is 'my-address' (--my-address can be used
              multiple times). This is used by mu cfind: any e-mail address found in the
              address fields of a message which also has <my-email-address> in one of its address
              fields is considered a personal e-mail address. This allows you, for example, to
              filter out (mu cfind --personal) addresses which were merely seen in mailing list
              messages.

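              Because --my-address can be given several times, a small wrapper can turn a list
              of addresses into one option each. A minimal sketch in POSIX shell; the function
              name and the MU override (set MU=echo to print the command instead of running
              it) are hypothetical, not part of mu:

```shell
# index_as_me: hypothetical wrapper around mu index, not part of mu.
# Each argument is an e-mail address, passed on as its own --my-address option.
index_as_me() {
  for addr do
    shift                              # drop the original argument...
    set -- "$@" "--my-address=$addr"   # ...and re-append it as an option
  done
  ${MU:-mu} index "$@"
}
```

              For example, index_as_me me@example.com me@work.example.org (both addresses are
              placeholders).
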
       --lazy-check
              in lazy-check mode, mu does not consider messages for which the time-stamp (ctime)
              of the directory they reside in has not changed since the previous indexing run.
              This is much faster than the non-lazy check, but won't update messages that have
              changed (rather than having been added or removed), since merely editing a message
              does not update the directory time-stamp. Of course, you can run mu index
              occasionally without --lazy-check, to pick up such messages.
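
              The time-stamp test behind lazy-check mode can be illustrated with find(1): list
              the cur and new directories whose status-change time (ctime) is newer than a stamp
              file touched at the end of the previous run. This is only a sketch of the idea
              under that assumption, not mu's implementation; the function name is hypothetical,
              and -cnewer is a GNU/BSD find extension rather than POSIX.

```shell
# changed_dirs: illustration of the lazy-check idea, not part of mu.
# $1: maildir root; $2: stamp file whose modification time marks the last run.
# Prints the cur/new directories whose ctime is newer than the stamp, i.e.
# the directories a lazy check would still look into.
changed_dirs() {
  find "$1" -type d \( -name cur -o -name new \) -cnewer "$2" -print
}
```

              For example, changed_dirs ~/Maildir ~/.mu/last-index, where the stamp file name is
              a placeholder.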

       -n, --nocleanup
              disables the database cleanup that mu does by default after indexing.

       --rebuild
              clear all messages from the database before indexing. --rebuild guarantees that
              after the indexing has finished, there are no 'old' messages in the database
              anymore, which is not true with --reindex when indexing only a part of the messages
              (using --maildir). For this reason, it is necessary to run mu index --rebuild when
              there is an upgrade in the database format. mu index will issue a warning about
              this.

              automatically  use -y, --empty when mu notices that the database version is not up-
              to-date. This option is for use in cron scripts and the like, so they won't require
              any user interaction, even when mu introduces a new database version.

       --xbatchsize=<batch size>
              set  the  maximum  number of messages to process in a single Xapian transaction. In
              practice, this option is only useful if you find that mu is running out  of  memory
              while  indexing;  in  that  case, you can set the batch size to (for example) 1000,
              which will reduce memory consumption, but also substantially reduce the indexing
              speed.

       --max-msg-size=<max msg size>
              set  the  maximum  size  (in bytes) for messages. The default maximum (currently at
              500 MB) should be enough in most cases, but if you encounter warnings from mu about
              ignoring messages because they are too big, you may want to increase this. Note
              that the reason for having a maximum size is that big messages require  big  memory
              allocations, which may lead to problems.
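
              To check in advance which messages a given limit would affect, find(1) can list
              message files above a size in bytes (the c suffix of -size counts bytes). A
              minimal sketch; the function name is hypothetical and not part of mu:

```shell
# big_messages: hypothetical helper, not part of mu.
# Prints message files in cur/ or new/ below maildir root $1 that are larger
# than $2 bytes.
big_messages() {
  find "$1" -type f \( -path '*/cur/*' -o -path '*/new/*' \) -size +"$2"c -print
}
```

              For example, big_messages ~/Maildir 100000000 lists messages larger than 100
              million bytes.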

              NOTE:  It  is not recommended to mix maildirs and sub-maildirs within the hierarchy
              in  the  same  database;  for  example,  it's  better  not  to  index   both   with
              --maildir=~/MyMaildir and --maildir=~/MyMaildir/foo, as this may lead to unexpected
              results when searching with the 'maildir:' search parameter (see below).

   A note on performance (i)
       As a non-scientific benchmark, a simple test on the  author's  machine  (a  Thinkpad  X61s
       laptop  using  Linux  2.6.35  and  an  ext3  file system) with no existing database, and a
       maildir with 27273 messages:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        66,65s user 6,05s system 27% cpu 4:24,20 total
       (about 103 messages per second)

       A second run, which is the more typical use case when there is a  database  already,  goes
       much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,48s user 0,76s system 10% cpu 11,796 total
       (more than 56818 messages per second)

       Note that each test flushes the caches first. A more common use case might be to run mu
       index when new mail has arrived, in which case the cache may stay quite 'warm':

        $ time mu index --quiet
        0,33s user 0,40s system 80% cpu 0,905 total
       which is more than 30000 messages per second.

   A note on performance (ii)
       As of June 2012, we did the same non-scientific benchmark, this time with an Intel
       i5-2500  CPU  @  3.30GHz,  an ext4 file system and a maildir with 22589 messages. We start
       without an existing database.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        27,79s user 2,17s system 48% cpu 1:01,47 total
       (about 813 messages per second)

       A second run, which is the more typical use case when there is a  database  already,  goes
       much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,13s user 0,30s system 19% cpu 2,162 total
       (more than 173000 messages per second)

   A note on performance (iii)
       As of July 2016, we did the same non-scientific benchmark, again with the Intel i5-2500
       CPU @ 3.30GHz and an ext4 file system. This time, the maildir contains 72525 messages.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        40,34s user 2,56s system 64% cpu 1:06,17 total
       (about 1099 messages per second).

       As shown, mu has been getting faster with each release, even with relatively expensive new
       features such as text-normalization (for case-insensitive/accent-insensitive matching). The
       profiles are dominated by operations in the Xapian database now.
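
       The rates quoted in these notes are the message count divided by a time taken from the
       output of time. A small awk(1) helper reproduces such figures; the function name is
       hypothetical:

```shell
# rate: hypothetical helper; prints messages per second, rounded to an integer.
# $1: number of messages; $2: elapsed seconds (use a dot as the decimal mark).
rate() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.0f\n", n / s }'
}
```

       For the first run in (i), 4:24,20 is 264.20 seconds, and rate 27273 264.20 prints 103,
       as stated. The figure for the second run in (i) ('more than 56818') appears to
       correspond to the user time (rate 27273 0.48 prints 56819); based on the total time of
       11.796 seconds, the rate is about 2312 messages per second.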


FILES
       By default, mu index stores its message database in ~/.mu/xapian; the database has an
       embedded  version  number, and mu will automatically update it when it notices a different
       version. This allows for automatic updating of mu-versions, without the need to clear  out
       any old databases.

       However, note that versions of mu before 0.7 used a different scheme, which put the
       database in ~/.mu/xapian-<version>. These older databases can safely be deleted.  Starting
       from version 0.7, this manual cleanup should no longer be needed.
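
       A minimal sketch for finding such leftover databases before deleting them; the function
       name is hypothetical, and -maxdepth is a GNU/BSD find extension:

```shell
# old_mu_dbs: hypothetical helper, not part of mu.
# Lists pre-0.7 style database directories (xapian-<version>) directly below
# the mu home directory $1; the current xapian directory itself is not matched.
old_mu_dbs() {
  find "$1" -maxdepth 1 -type d -name 'xapian-*' -print
}
```

       For example, old_mu_dbs ~/.mu; review the output before removing anything.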

       mu  stores  logs  of  its  operations  and queries in <muhome>/mu.log (by default, this is
       ~/.mu/mu.log). Upon startup, mu checks the size of this log file. If it exceeds 1  MB,  it
       will  be  moved to ~/.mu/mu.log.old, overwriting any existing file of that name, and start
       with an empty log file. This scheme allows for continued use of mu without  the  need  for
       any manual maintenance of log files.
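
       The rotation rule can be expressed as a short shell function, for instance to apply the
       same scheme to other logs. This only mirrors the behaviour described above and is not
       mu's own code; the function name is hypothetical, and reading 1 MB as 1048576 bytes is
       an assumption:

```shell
# rotate_log: sketch of the described scheme, not mu's own code.
# If file $1 is larger than 1 MB (assumed to be 1048576 bytes), move it to
# $1.old, replacing any previous .old file, and start an empty log.
rotate_log() {
  [ -f "$1" ] || return 0
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt 1048576 ]; then
    mv -f "$1" "$1.old" && : > "$1"
  fi
}
```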


ENVIRONMENT
       mu index uses MAILDIR to find the user's Maildir if it has not been specified explicitly
       with --maildir=<maildir>. If MAILDIR is not set, mu index will try ~/Maildir.
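
       The same lookup, written as a hypothetical shell helper (not part of mu). Note that
       ${MAILDIR:-...} also falls back when MAILDIR is set but empty; whether mu treats an
       empty MAILDIR the same way is an assumption here.

```shell
# default_maildir: hypothetical helper mirroring the lookup described above:
# use $MAILDIR if it is set and non-empty, otherwise ~/Maildir.
default_maildir() {
  printf '%s\n' "${MAILDIR:-$HOME/Maildir}"
}
```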


RETURN VALUE
       mu index returns 0 upon successful completion; any value greater than 0 signals an error.
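
       In scripts, the exit status can be checked in the usual way. A minimal sketch; the
       function name and the MU override (e.g. MU=true or MU=false for testing) are
       hypothetical:

```shell
# run_index: hypothetical wrapper that reports a failed mu index run on stderr
# and passes the exit status on to the caller.
run_index() {
  ${MU:-mu} index --quiet
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "mu index failed with exit status $status" >&2
  fi
  return "$status"
}
```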


BUGS
       Please report bugs if you find them:


AUTHOR
       Dirk-Jan C. Binnema


SEE ALSO
       maildir(5) mu(1) mu-find(1) mu-cfind(1)