Provided by: inn2_2.7.3~20241006-1_amd64

NAME

       storage.conf - Configuration file for storage manager

DESCRIPTION

       The file <pathetc>/storage.conf contains the rules to be used in assigning articles to
       different storage methods.  These rules determine where incoming articles will be stored.

       The storage manager is a unified interface between INN and a variety of different storage
       methods, allowing the news administrator to choose between different storage methods with
       different trade-offs (or even use several at the same time for different newsgroups, or
       articles of different sizes).  The rest of INN need not care what type of storage method
       was used for a given article; the storage manager will figure this out automatically when
       that article is retrieved via the storage API.  Note that you may also want to see the
       options provided in inn.conf(5) regarding article storage.

       The storage.conf file consists of a series of storage method entries.  Blank lines and
       lines beginning with a number sign ("#") are ignored.  The maximum number of characters in
       each line is 255.  The order of entries in this file is important; see below.

       Each entry specifies a storage method and a set of rules.  Articles which match all of the
       rules of a storage method entry will be stored using that storage method; if an article
       matches multiple storage method entries, the first one will be used.  Each entry is
       formatted as follows:

           method <methodname> {
               newsgroups: <wildmat>
               class: <storage_class>
               size: <minsize>[,<maxsize>]
               expires: <mintime>[,<maxtime>]
               options: <options>
               exactmatch: <bool>
               filtered: <bool>
               path: <wildmat>
           }

       If spaces or tabs are included in a value, that value must be enclosed in double quotes
       ("").  If either a number sign ("#") or a double quote is meant to be included verbatim
       in a value, it should be escaped with "\".

       <methodname> is the name of a storage method to use for articles which match the rules of
       this entry.  The currently available storage methods are:

           cnfs
           timecaf
           timehash
           tradspool
           trash

       See the "STORAGE METHODS" section below for more details.

       The meanings of the keys in each storage method entry are as follows:

       newsgroups: <wildmat>
           What newsgroups are stored using this storage method.  <wildmat> is a uwildmat pattern
           which is matched against the newsgroups an article is posted to.  If storeonxref in
           inn.conf is true, this pattern will be matched against the newsgroup names in the Xref
           header field body; otherwise, it will be matched against the newsgroup names in the
           Newsgroups header field body (see inn.conf(5) for discussion of the differences
           between these possibilities).  Poison wildmat expressions (expressions starting with
           "@") are allowed and can be used to exclude certain group patterns: articles
           crossposted to poisoned newsgroups will not be stored using this storage method.  The
           <wildmat> pattern is matched in order.

           There is no default newsgroups pattern; if an entry should match all newsgroups, use
           an explicit "newsgroups: *".

       class: <storage_class>
           An identifier for this storage method entry.  <storage_class> should be a number
           between 0 and 255.  It should be unique across all of the entries in this file.  It is
           mainly used for specifying expiration times by storage class as described in
           expire.ctl(5); "timehash" and "timecaf" will also set the top-level directory in which
           articles accepted by this storage class are stored.  Storage classes can, for
           instance, be numbered sequentially in storage.conf.

           The assignment of a particular number to a storage class is arbitrary but permanent
           (since it is used in storage tokens).  As a matter of fact, an article is assigned a
           storage class depending on the storage rules in effect at the time of its arrival.
           This identifier will not change even if storage.conf is modified afterwards and the
           same article would have been assigned another storage class, had it been received
           after that change.  The article is still perfectly valid and retrievable.  The only
           difference will be for expiration with groupbaseexpiry set to false in inn.conf: the
           rules in expire.ctl apply to the storage class assigned to articles at their arrival.

       size: <minsize>[,<maxsize>]
           A range of article sizes (in bytes) which should be stored using this storage method.
           If <maxsize> is "0" or not given, the upper size of articles is limited only by
           maxartsize in inn.conf.  The size: field is optional and may be omitted entirely if
           you want articles of any size to be stored in this storage method (if, of course,
           these articles fulfill all the other requirements of this storage method entry).  By
           default, <minsize> is set to "0".
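
           As a sketch of how this key combines with entry ordering, the two entries below send
           articles up to roughly 50 KB to one storage class and larger ones to another.  The
           class numbers and the choice of "timehash" are assumptions made for the example:

               method timehash {
                   class: 10
                   newsgroups: *
                   size: 0,50000
               }
               method timehash {
                   class: 11
                   newsgroups: *
               }

           Because the file is scanned in order, the second entry only receives articles that
           fall outside the size range of the first one.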

       expires: <mintime>[,<maxtime>]
           A range of article expiration times which should be stored using this storage method.
           Be careful; this is less useful than it may appear at first.  This is based only on
           the Expires header field of the article, not on any local expiration policies or
           anything in expire.ctl!  If <mintime> is non-zero, then this entry will not match any
           article without an Expires header field.  This key is therefore only really useful for
           assigning articles with requested longer expire times to a separate storage method.
           Articles only match if the time until expiration (that is to say, the amount of time
           into the future that the Expires header field of the article requests that it remain
           around) falls in the interval specified by <mintime> and <maxtime>.

           The format of these parameters is "0d0h0m0s" (days, hours, minutes, and seconds into
           the future).  If <maxtime> is "0s" or is not specified, there is no upper bound on
           expire times falling into this entry (note that this key has no effect on when the
           article will actually be expired, but only on whether or not the article will be
           stored using this storage method).  This field is also optional and may be omitted
           entirely if you do not want to store articles according to their Expires header field,
           if any.

           A <mintime> value greater than "0s" implies that this storage method won't match any
           article without an Expires header field.
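
           For illustration, the following entry would only match articles whose Expires header
           field requests at least 30 more days of retention; articles without an Expires header
           field would be left for later entries.  The class number and the storage method are
           assumptions made for the example:

               method timehash {
                   class: 20
                   newsgroups: *
                   expires: 30d0h0m0s
               }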

       options: <options>
           This key is for passing special options to storage methods that require them
           (currently only "cnfs").  See the "STORAGE METHODS" section below for a description of
           its use.

       exactmatch: <bool>
           If this key is set to true, all the newsgroups in the Newsgroups header field body (or
           Xref if storeonxref in inn.conf is true) of incoming articles must match the newsgroups
           pattern for this entry to apply.  (Normally, any non-zero number of matching
           newsgroups is sufficient, provided no newsgroup matches a poison wildmat as described
           above.)  This is a boolean value; "true", "yes" and "on" are usable to enable this
           key.  The case of these values is not significant.  The default is false.
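
           A minimal sketch, assuming a local internal.* hierarchy: with the entry below, an
           article crossposted to internal.test and misc.test would not be stored by this entry,
           since misc.test does not match the pattern, and would be considered by later entries
           instead.  The group names and class number are hypothetical:

               method tradspool {
                   class: 1
                   newsgroups: internal.*
                   exactmatch: true
               }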

       filtered: <bool>
           If this key is set to true, the article must have been rejected by any enabled article
           filters (Perl or Python) for innd.  This also requires that dontrejectfiltered is set
           to true in inn.conf.  Filtered articles are usually stored in a small CNFS buffer, or
           another storage method with a rather tight expiration policy.  This is a boolean
           value; "true", "yes" and "on" are usable to enable this key.  The case of these values
           is not significant.  The default is false.

           If all the storage classes have this key set to false, filtered articles are stored in
           the same storage class as accepted articles.  It is only when at least one storage
           class has this key set to true that filtered articles and accepted articles are no
           longer stored mixed together in any storage class.
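
           As a sketch, the entry below keeps filtered articles in their own CNFS metacycbuff.
           The metacycbuff name "FILTERED" and the class number are hypothetical; the metacycbuff
           would have to be defined in cycbuff.conf, and dontrejectfiltered must be set to true
           in inn.conf:

               method cnfs {
                   class: 30
                   newsgroups: *
                   filtered: true
                   options: FILTERED
               }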

       path: <wildmat>
           What articles by their Path header field are stored using this storage method.
           <wildmat> is a uwildmat pattern which is matched against the Path header field body of
           articles, which records the sites an article has passed through or was posted at.  Poison
           wildmat expressions (expressions starting with "@") are allowed and can be used to
           exclude certain path patterns (in lieu of expressions starting with "!" that are not
           considered negated because "!" has a special meaning in Path header fields).  The
           <wildmat> pattern is matched in order.

           A typical use case might be to store articles from a spammy site in a small CNFS
           buffer to avoid overall retention impacts:

               path: "*!spam-site.example.com!not-for-mail"

           The default is to match all articles.
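
           A complete entry built around that pattern might look like the following, where the
           metacycbuff name "SPAM" and the class number are assumptions.  It would need to appear
           before any more general entry that the same articles also match:

               method cnfs {
                   class: 40
                   newsgroups: *
                   path: "*!spam-site.example.com!not-for-mail"
                   options: SPAM
               }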

       If an article matches all of the constraints of an entry, it is stored via that storage
       method and is associated with that <storage_class>.  This file is scanned in order and the
       first matching entry is used to store the article.

       If an article does not match any entry, either by being posted to a newsgroup which does
       not match any of the <wildmat> patterns or by being outside the size and expires ranges of
       all entries whose newsgroups pattern it does match, the article is not stored and is
       rejected by innd.  When this happens, the error message:

           cant store article: no matching entry in storage.conf

       is logged to syslog.  If you want to silently drop articles matching certain newsgroup
       patterns or size or expires ranges, assign them to the "trash" storage method rather than
       having them not match any storage method entry.
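
       For example, to discard articles in a hierarchy rather than have innd reject them, an
       entry along these lines could be placed ahead of the other entries (the hierarchy and
       the class number are chosen purely for illustration):

           method trash {
               class: 0
               newsgroups: alt.binaries.*
           }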

STORAGE METHODS

       Currently, there are five storage methods available.  Each method has its pros and cons;
       you can choose any mixture of them as is suitable for your environment.  Note that each
       method has an attribute EXPENSIVESTAT which indicates whether checking the existence of an
       article is expensive or not.  This is used to run expireover(8).

       cnfs
           The "cnfs" storage method stores articles in large cyclic buffers (CNFS stands for
           Cyclic News File System).  Articles are stored in CNFS buffers in arrival order, and
           when the buffer fills, it wraps around to the beginning and stores new articles over
           the top of the oldest articles in the buffer.  The expire time of articles stored in
           CNFS buffers is therefore entirely determined by how long it takes the buffer to wrap
           around, which depends on how quickly data is being stored in it.  (This method is
           therefore said to have self-expire functionality.  It also means that when an article
           is cancelled, its space is not reclaimed until the cycbuff rolls over and that part
           of the buffer starts being reused.)  EXPENSIVESTAT is false for this method.

           CNFS has its own configuration file, cycbuff.conf, which describes some subtleties to
           the basic description given above.  Storage method entries for the "cnfs" storage
           method must have an options: field specifying the metacycbuff into which articles
           matching that entry should be stored; see cycbuff.conf(5) for details on metacycbuffs.

           Advantages: By far the fastest of all storage methods (except for "trash"), since it
           eliminates the overhead of dealing with a file system and creating new files.  Unlike
           all other storage methods, it does not require manual article expiration.  With CNFS,
           the server will never throttle itself due to a full spool disk, and groups are
           restricted to just the buffer files given so that they can never use more than the
           amount of disk space allocated to them.

           Disadvantages: Article retention times are more difficult to control because old
           articles are overwritten automatically.  Attacks on Usenet, such as flooding or
           massive amounts of spam, can result in wanted articles expiring much faster than
           intended (with no warning).

       timecaf
           This method stores multiple articles in one file, whose name is based on the article's
           arrival time and the storage class.  The file name will be:

               <patharticles>/timecaf-nn/bb/aacc.CF

           where "nn" is the hexadecimal value of <storage_class>, "bb" and "aacc" are the
           hexadecimal components of the arrival time, and "CF" is a hardcoded extension.  (The
           arrival time, in seconds since the epoch, is converted to hexadecimal and interpreted
           as "0xaabbccdd", with "aa", "bb", and "cc" used to build the path.)  This method does
           not have self-expire functionality (meaning expire has to run periodically to delete
           old articles, as well as cancelled articles if immediatecancel is not set to true in
           inn.conf).  EXPENSIVESTAT is false for this method.
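
           As a worked example, take an arrival time of 1700000000 seconds since the epoch, which
           is 0x6553f100 in hexadecimal, so that "aa" is 65, "bb" is 53, "cc" is f1 and "dd" is
           00.  Assuming storage class 4 and two-digit, lower-case hexadecimal fields (the exact
           formatting is an assumption), the file would be:

               <patharticles>/timecaf-04/53/65f1.CF

           The "dd" byte is not part of the name, so all articles arriving within the same 256
           seconds share one file.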

           A given CAF file contains all the articles received during a time frame of 4 minutes
           or so (256 seconds), and is limited to 262,144 articles and about 3.5 GB.  This is
           enough for normal operations.  The only caveat arises when batches of articles are fed
           at high speed between two servers; you will then want to keep the feed below that
           number of articles for the time frame during which a CAF file stores newly arrived
           articles.

           Advantages: It is roughly four times faster than "timehash" for article writes, since
           much of the file system overhead is bypassed, while still retaining the same fine
           control over article retention time.

           Disadvantages: Using this method means giving up all but the most careful manual
           fiddling with the article spool; in this respect, it resembles "cnfs".  As one of the
           newer and least widely used storage types, "timecaf" has not been as thoroughly tested
           as the other methods.

       timehash
           This method is very similar to "timecaf" except that each article is stored in a
           separate file.  The name of the file for a given article will be:

               <patharticles>/time-nn/bb/cc/yyyy-aadd

           where "nn" is the hexadecimal value of <storage_class>, "yyyy" is a hexadecimal
           sequence number, and "bb", "cc", and "aadd" are components of the arrival time in
           hexadecimal (the arrival time is interpreted as documented above under "timecaf").
           This method does not have self-expire functionality.  Cancelled articles are removed
           immediately.  EXPENSIVESTAT is true for this method.
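
           As a worked example, with the arrival time split as documented under "timecaf", an
           article arriving at 1700000000 seconds since the epoch (0x6553f100 in hexadecimal,
           giving "aa" = 65, "bb" = 53, "cc" = f1 and "dd" = 00) and stored in class 4 would be
           written to:

               <patharticles>/time-04/53/f1/yyyy-6500

           where "yyyy" stands for the sequence number, left unspecified here.  The two-digit,
           lower-case formatting of each field is an assumption.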

           Advantages: Heavy traffic groups do not cause bottlenecks, and a fine control of
           article retention time is still possible.

           Disadvantages: The ability to easily find all articles in a given newsgroup and
           manually fiddle with the article spool is lost, and INN still suffers from speed
           degradation due to file system overhead (creating and deleting individual files is a
           slow operation).

       tradspool
           Traditional spool, or "tradspool", is the traditional news article storage format.
           Each article is stored in an individual text file named:

               <patharticles>/news/group/name/nnnnn

           where "news/group/name" is the name of the newsgroup to which the article was posted
           with each period changed to a slash, and "nnnnn" is the sequence number of the article
           in that newsgroup.  For crossposted articles, the article is linked into each
           newsgroup to which it is crossposted (using either hard or symbolic links).  This is
           the way versions of INN prior to 2.0 stored all articles, as well as being the article
           storage format used by C News and earlier news systems.  This method does not have
           self-expire functionality.  Cancelled articles are removed immediately.  EXPENSIVESTAT
           is true for this method.

           Advantages: It is widely used and well-understood; it can read article spools written
           by older versions of INN and it is compatible with all third-party INN add-ons.  This
           storage mechanism provides easy and direct access to the articles stored on the
           server, makes writing programs that fiddle with the news spool very easy, gives fine
           control over article retention times, and comes with the scanspool support utility to
           perform sanity checks.

           Disadvantages: It takes a very fast file system and I/O system to keep up with current
           Usenet traffic volumes due to file system overhead.  Groups with heavy traffic tend to
           create a bottleneck because of inefficiencies in storing large numbers of article
           files in a single directory.  It requires a nightly expire program to delete old
           articles out of the news spool, a process that can slow down the server for several
           hours or more.

       trash
           This method silently discards all articles stored in it.  Its only real uses are for
           testing and for silently discarding articles matching a particular storage method
           entry (for whatever reason).  Articles stored in this method take up no disk space and
           can never be retrieved, so this method has self-expire functionality of a sort.
           EXPENSIVESTAT is false for this method.

EXAMPLES

       The following sample storage.conf file would store all articles posted to alt.binaries.*
       in the "BINARIES" CNFS metacycbuff, all articles over roughly 50 KB in any other hierarchy
       in the "LARGE" CNFS metacycbuff, all other articles in alt.* in one timehash class, and
       all other articles in any newsgroups in a second timehash class, except for the internal.*
       hierarchy which is stored in traditional spool format.

           method tradspool {
               class: 1
               newsgroups: internal.*
           }
           method cnfs {
               class: 2
               newsgroups: alt.binaries.*
               options: BINARIES
           }
           method cnfs {
               class: 3
               newsgroups: *
               size: 50000
               options: LARGE
           }
           method timehash {
               class: 4
               newsgroups: alt.*
           }
           method timehash {
               class: 5
               newsgroups: *
           }

       Notice that the last storage method entry will catch everything.  This is a good habit to
       get into; make sure that you have at least one catch-all entry just in case something you
       did not expect falls through the cracks.  Notice also that the special rule for the
       internal.* hierarchy is first, so it will catch even articles crossposted to
       alt.binaries.* or over 50 KB in size.

       As for poison wildmat expressions, if for instance an article is crossposted to both
       misc.foo and misc.bar, the pattern:

           misc.*,!misc.bar

       will match that article whereas the pattern:

           misc.*,@misc.bar

       will not match that article.  An article posted only to misc.bar will fail to match either
       pattern.

       Usually, high-volume groups and groups whose articles do not need to be kept around very
       long (binaries groups, *.jobs*, news.lists.filters, etc.) are stored in CNFS buffers.  Use
       the other methods (or CNFS buffers again) for everything else.  However, it is often most
       convenient to keep special hierarchies in "tradspool", such as local hierarchies,
       hierarchies that should never expire, or hierarchies whose spool you need to go through
       manually.

HISTORY

       Written by Katsuhiro Kondou <kondou@nec.co.jp> for InterNetNews.  Rewritten into POD by
       Julien Elie.

SEE ALSO

       cycbuff.conf(5), expire.ctl(5), expireover(8), inn.conf(5), innd(8), libinn_uwildmat(3),
       scanspool(8).