Provided by: public-inbox_1.8.0-1_all

NAME

       public-inbox-v1-format - git repository and tree description (aka "ssoma")

DESCRIPTION

       WARNING: this does NOT describe the scalable v2 format used by public-inbox.  Use of ssoma
       is not recommended for new installations due to scalability problems.

       ssoma uses a git repository to store each email as a git blob.  The tree filename of the
       blob is based on the SHA-1 hexdigest of the first Message-ID header.  A commit is made for
       each message delivered.  The commit SHA-1 identifier is used by ssoma clients to track
       synchronization state.

PATHNAMES IN TREES

       A Message-ID may be extremely long and also contain slashes, so using them as a path name
       is challenging.  Instead we use the SHA-1 hexdigest of the Message-ID (excluding the
       leading "<" and trailing ">") to generate a path name.  Leading and trailing white space
       in the Message-ID header is ignored for hashing.

       A message with Message-ID of: <20131106023245.GA20224@dcvr.yhbt.net>

       Would be stored as: f2/8c6cfd2b0a65f994c3e1be266105413b3d3f63

       Thus it is easy to look up the contents of a message matching a given Message-ID.
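
       The path derivation described above can be sketched in Python.  This is a minimal
       illustration, not code from ssoma or public-inbox: the function name v1_tree_path is
       hypothetical, and encoding the Message-ID as UTF-8 before hashing is an assumption.

```python
import hashlib


def v1_tree_path(message_id: str) -> str:
    """Return the 2/38 tree path for a Message-ID header value."""
    # Leading and trailing white space is ignored for hashing.
    mid = message_id.strip()
    # Exclude the leading "<" and trailing ">" if they are present.
    if mid.startswith("<"):
        mid = mid[1:]
    if mid.endswith(">"):
        mid = mid[:-1]
    # Assumption: the Message-ID is hashed as UTF-8 encoded bytes.
    digest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    # First two hex digits name the directory, the remaining 38 the file.
    return digest[:2] + "/" + digest[2:]


if __name__ == "__main__":
    print(v1_tree_path("<20131106023245.GA20224@dcvr.yhbt.net>"))
```

       The resulting path can be combined with a revision to read a message without a working
       tree, for example with "git cat-file blob HEAD:$path" run against the bare repository.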

MESSAGE-ID CONFLICTS

       public-inbox v1 repositories currently do not resolve conflicting Message-IDs or messages
       with multiple Message-IDs.

HEADERS

       The Message-ID header is required.  "Bytes", "Lines" and "Content-Length" headers are
       stripped and not allowed, as they can interfere with further processing.  When using ssoma
       with public-inbox-mda, the "Status" mbox header is also stripped as that header makes no
       sense in a public archive.
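
       As a rough illustration of these header rules, the following Python sketch uses the
       standard email package.  The function name scrub_headers and its for_public_inbox_mda
       parameter are hypothetical, and the real MDA may differ in how it rejects messages.

```python
from email import message_from_bytes
from email.message import Message

# Headers that are stripped and not allowed in stored messages.
STRIPPED = ("Bytes", "Lines", "Content-Length")


def scrub_headers(raw: bytes, for_public_inbox_mda: bool = True) -> Message:
    """Parse a raw message and drop the headers described above."""
    msg = message_from_bytes(raw)
    # The Message-ID header is required.
    if not str(msg.get("Message-ID", "")).strip():
        raise ValueError("Message-ID header is required")
    for name in STRIPPED:
        # Deleting removes every occurrence and is a no-op if absent.
        del msg[name]
    if for_public_inbox_mda:
        # The "Status" mbox header makes no sense in a public archive.
        del msg["Status"]
    return msg
```

       Header names are matched case-insensitively by the email package, so "content-length"
       and "Content-Length" are treated alike.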

LOCKING

       flock(2) locking exclusively locks the empty $GIT_DIR/ssoma.lock file for all non-atomic
       operations.
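
       A minimal sketch of this locking scheme, assuming a POSIX system where Python's fcntl
       module is available.  The context manager name ssoma_lock is hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def ssoma_lock(git_dir: str):
    """Hold an exclusive flock(2) on $GIT_DIR/ssoma.lock for the duration."""
    path = os.path.join(git_dir, "ssoma.lock")
    # Create the lock file on first use; it stays empty and is never written.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        # Blocks until every other holder has released the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor would also release the lock.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

       A writer would wrap each non-atomic operation in "with ssoma_lock(git_dir):" so that
       concurrent processes serialize on the same file.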

EXAMPLE INPUT FLOW (SERVER-SIDE MDA)

       1. Message is delivered to a mail transport agent (MTA)

       1a. (optional) reject/discard spam; this should run before ssoma-mda

       1b. (optional) reject/strip unwanted attachments

       ssoma-mda handles all remaining steps once invoked.

       2. Mail transport agent invokes ssoma-mda

       3. reads message via stdin, extracting Message-ID

       4. acquires exclusive flock lock on $GIT_DIR/ssoma.lock

       5. creates or updates the blob at the associated 2/38 SHA-1 path

       6. updates the index and commits

       7. releases $GIT_DIR/ssoma.lock
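
       Steps 3 through 7 can be sketched with git plumbing commands driven from Python.  This is
       an illustration under stated assumptions, not the ssoma-mda implementation: the names
       deliver, git and BRANCH are hypothetical, the target branch is assumed to be
       refs/heads/master, the committer identity is a placeholder, and the Message-ID is assumed
       to be hashed as UTF-8.  It requires a POSIX system with git installed.

```python
import fcntl
import hashlib
import os
import subprocess
from email import message_from_bytes

BRANCH = "refs/heads/master"  # assumption: branch that receives deliveries


def git(git_dir, *args, stdin=None, check=True):
    """Run git against a bare repository and return its stripped stdout."""
    env = dict(
        os.environ,
        GIT_DIR=git_dir,
        # Use the dedicated index, not $GIT_DIR/index.
        GIT_INDEX_FILE=os.path.join(git_dir, "ssoma.index"),
        # Placeholder identity so commit-tree works without user config.
        GIT_AUTHOR_NAME="mda", GIT_AUTHOR_EMAIL="mda@example.com",
        GIT_COMMITTER_NAME="mda", GIT_COMMITTER_EMAIL="mda@example.com",
    )
    res = subprocess.run(["git", *args], input=stdin, env=env,
                         stdout=subprocess.PIPE, check=check)
    return res.stdout.decode().strip()


def deliver(git_dir, raw):
    """Store one raw message (bytes) and return (tree path, commit id)."""
    git_dir = os.path.abspath(git_dir)
    # Step 3: read the message and extract the first Message-ID.
    mid = message_from_bytes(raw)["Message-ID"]
    if mid is None:
        raise ValueError("Message-ID header is required")
    mid = str(mid).strip()
    if mid.startswith("<"):
        mid = mid[1:]
    if mid.endswith(">"):
        mid = mid[:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    path = digest[:2] + "/" + digest[2:]

    # Step 4: acquire the exclusive lock on $GIT_DIR/ssoma.lock.
    lock_fd = os.open(os.path.join(git_dir, "ssoma.lock"),
                      os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)
        # Empty output means the branch does not exist yet.
        parent = git(git_dir, "rev-parse", "-q", "--verify", BRANCH,
                     check=False)
        index = os.path.join(git_dir, "ssoma.index")
        if parent and not os.path.exists(index):
            # Rebuild a missing index from the current branch tip.
            git(git_dir, "read-tree", parent)
        # Step 5: create or update the blob at the 2/38 path.
        blob = git(git_dir, "hash-object", "-w", "--stdin", stdin=raw)
        git(git_dir, "update-index", "--add", "--cacheinfo",
            "100644", blob, path)
        # Step 6: write the tree from the index and commit it.
        tree = git(git_dir, "write-tree")
        args = ["commit-tree", tree, "-m", "deliver " + path]
        if parent:
            args += ["-p", parent]
        commit = git(git_dir, *args)
        git(git_dir, "update-ref", BRANCH, commit)
        return path, commit
    finally:
        # Step 7: release the lock.
        fcntl.flock(lock_fd, fcntl.LOCK_UN)
        os.close(lock_fd)
```

       Because every delivery produces one commit, the commit id returned here is the kind of
       value a client could record to track synchronization state.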

       ssoma-mda can also be used as an inotify(7) trigger to monitor maildirs, and the ability
       to monitor IMAP mailboxes using IDLE will be available in the future.

GIT REPOSITORIES (SERVERS)

       ssoma uses bare git repositories on both servers and clients.

       Using the git-init(1) command with --bare is the recommended method of creating a git
       repository on a server:

               git init --bare /path/to/wherever/you/want.git

       There are no standardized paths for servers; administrators make all the choices regarding
       git repository locations.

       Special files in $GIT_DIR on the server:

       $GIT_DIR/ssoma.lock
           An empty file for flock(2) locking.  This is necessary to ensure the index and commits
           are updated consistently and multiple processes running MDA do not step on each other.

       $GIT_DIR/public-inbox/msgmap.sqlite3
           SQLite3 database maintaining a stable mapping of Message-IDs to NNTP article numbers.
           Used by public-inbox-nntpd(1) and created and updated by public-inbox-index(1).

           Users of the PublicInbox::WWW interface will find it useful for attempting recovery
           from copy-paste truncations of URLs containing long Message-IDs.

           Automatically updated by public-inbox-mda(1), public-inbox-learn(1) and
           public-inbox-watch(1).

           Losing or damaging this file will cause synchronization problems for NNTP clients.
           This file is expected to be stable and require no updates to its schema.

           Requires DBD::SQLite.

       $GIT_DIR/public-inbox/xapian$N/
           Xapian database for search indices in the PSGI web UI.

           $N is the value of PublicInbox::Search::SCHEMA_VERSION, and installations may have
           parallel versions on disk during upgrades or to roll-back upgrades.

           This is created and updated by public-inbox-index(1).

           Automatically updated by public-inbox-mda(1), public-inbox-learn(1) and
           public-inbox-watch(1).

           This directory can always be regenerated with public-inbox-index(1) if it is lost or
           damaged.  There is no need to back it up unless the CPU/memory cost of regenerating it
           outweighs the storage/transfer cost.

           Since SCHEMA_VERSION 15 and the development of the v2 format, the "overview" DB also
           exists in the xapian directory for v1 repositories.  See "OVERVIEW DB" in
           public-inbox-v2-format(5).

           Our use of the "OVERVIEW DB" requires Xapian document IDs to remain stable.  Using the
           public-inbox-compact(1) and public-inbox-xcpdb(1) wrappers is recommended over the tools
           provided by Xapian.

           This directory is large, often two to three times the size of the objects stored in a
           packed git repository.

       $GIT_DIR/ssoma.index
           This file is no longer used or created by public-inbox, but it is updated if it exists
           to remain compatible with ssoma installations.

           A git index file used for MDA updates.  The normal git index (in $GIT_DIR/index) is
           not used at all as there is typically no working tree.
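
       To illustrate what a stable mapping of Message-IDs to NNTP article numbers means for the
       msgmap.sqlite3 entry above, here is a small Python sketch using the standard sqlite3
       module.  The table layout and the function names are assumptions made for illustration;
       they are not taken from the schema public-inbox actually uses.

```python
import sqlite3


def open_msgmap(path=":memory:"):
    """Open (or create) an illustrative Message-ID to number mapping."""
    db = sqlite3.connect(path)
    # Hypothetical schema: AUTOINCREMENT never reuses a number.
    db.execute("CREATE TABLE IF NOT EXISTS msgmap ("
               "num INTEGER PRIMARY KEY AUTOINCREMENT, "
               "mid TEXT NOT NULL UNIQUE)")
    return db


def article_number(db, mid):
    """Return the number for mid, assigning the next one if it is new."""
    row = db.execute("SELECT num FROM msgmap WHERE mid = ?",
                     (mid,)).fetchone()
    if row:
        return row[0]  # existing mappings never change
    with db:  # commit the insert as one transaction
        cur = db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
    return cur.lastrowid


def complete_truncated(db, prefix):
    """List stored Message-IDs that start with a possibly truncated one."""
    # substr() avoids treating "%" or "_" in a Message-ID as wildcards.
    rows = db.execute("SELECT mid FROM msgmap "
                      "WHERE substr(mid, 1, ?) = ? ORDER BY num",
                      (len(prefix), prefix))
    return [r[0] for r in rows]
```

       The prefix lookup shows why such a mapping helps with copy-paste truncations: a partial
       Message-ID from a URL can be matched against the complete ones already stored.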

       Each client $GIT_DIR may have multiple mbox/maildir/command targets.  It is possible for a
       client to extract the mail stored in the git repository to multiple mboxes for
       compatibility with a variety of different tools.

CAVEATS

       It is NOT recommended to check out the working directory of a git repository, as it may
       contain many files (one per message).

       It is impossible to completely expunge messages, even spam, as git retains full history.
       Projects may (with adequate notice) cycle to new repositories/branches with history
       cleaned up via git-filter-repo(1) or git-filter-branch(1).  This is up to the
       administrators.

COPYRIGHT

       Copyright 2013-2021 all contributors <mailto:meta@public-inbox.org>

       License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>

SEE ALSO

       gitrepository-layout(5), ssoma(1)