Ubuntu Manpage: ovdb - Overview storage method for INN

NAME

       ovdb - Overview storage method for INN

DESCRIPTION

       The ovdb overview method uses the Berkeley DB library to store overview data.  It
       requires version 4.4 or later of the Berkeley DB library (4.7+ is recommended because
       older versions suffer from various issues).

       The ovdb overview method makes use of the full transaction/logging/locking functionality
       of the Berkeley DB environment.  Berkeley DB may be downloaded from
       <http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index.html>
       and is needed to build the ovdb backend.

UPGRADING

       There are several versions of the ovdb storage method:

       · Version 1, the initial version shipped with INN 2.3.0 up to INN 2.3.5.

       · Version 2, with improved performance, since INN 2.4.0.

       · Version 3, corresponding to version 2 with compression enabled, starting with INN 2.5.0.

       If you have a database created with a previous version of ovdb, your database will need to
       be upgraded using ovdb_init.  See the ovdb_init(8) man page for upgrade instructions, as
       well as the COMPRESSION section below.

       Note that when the Berkeley DB library is updated to a newer version, the ovdb database
       also needs to be upgraded.

INSTALLATION

If the Berkeley DB library is found at configure time, INN will be built with Berkeley DB
support unless the --without-bdb flag is explicitly passed to configure. By default,
configure will search for Berkeley DB in standard locations; there will be a message in
the configure output indicating the pathname that will be used.

You can override this pathname by adding a path to the --with-bdb option, for instance
--with-bdb=/usr/BerkeleyDB.4.4. This directory is expected to have the subdirectories
include, containing db.h, and lib (lib32 and lib64 are also checked), containing the
library itself. If the headers or libraries are in non-standard locations, one or both of
the options --with-bdb-include and --with-bdb-lib can be given to configure with a path.
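
The expected directory layout can be checked by hand before running configure. The following Python sketch is only an illustration of that layout; the helper name check_bdb_prefix is hypothetical, and this is not how configure itself performs its test.

```python
import os

def check_bdb_prefix(prefix):
    """Inspect a candidate --with-bdb directory (hypothetical helper).

    Returns (has_header, lib_dirs): whether include/db.h exists, and
    which of the lib, lib32 and lib64 subdirectories are present.
    """
    has_header = os.path.isfile(os.path.join(prefix, "include", "db.h"))
    lib_dirs = [d for d in ("lib", "lib32", "lib64")
                if os.path.isdir(os.path.join(prefix, d))]
    return has_header, lib_dirs
```

A prefix for which has_header is false, or for which lib_dirs is empty, is a candidate for the --with-bdb-include or --with-bdb-lib options instead.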

The ovdb database may take up more disk space for a given spool than the other overview
methods. Plan on needing at least 1.1 KB for every article in your spool (not counting
crossposts). So, if you have 5 million articles, you'll need at least 5.5 GB of disk
space for ovdb. With compression enabled, this estimate changes to 0.9 KB per article, so
you'll need at least 4.5 GB of disk space for 5 million articles. See the COMPRESSION
section below. In addition, allow at least 100 MB for transaction logs. By default, the
transaction logs go in the same directory as the database. To improve performance, they
can be placed on a different disk; see the DB_CONFIG section.
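
The sizing rule above can be written out as a small calculation. This is a rough Python sketch using the per-article figures from this section and decimal units (1 GB = 1,000,000 KB), which is what the worked figures above imply; the function name is hypothetical, and the results are lower bounds rather than measurements.

```python
def estimate_ovdb_disk_gb(articles, compress=False, log_mb=100):
    """Return (data_gb, total_gb) as minimum disk estimates for ovdb.

    articles: number of articles in the spool, not counting crossposts.
    compress: whether ovdb compression is enabled.
    log_mb:   space reserved for transaction logs (at least 100 MB).
    """
    kb_per_article = 0.9 if compress else 1.1
    data_gb = articles * kb_per_article / 1_000_000
    total_gb = data_gb + log_mb / 1000
    return data_gb, total_gb
```

For 5 million articles this gives about 5.5 GB of data without compression and about 4.5 GB with it, plus the log space in each case.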

CONFIGURATION

To enable the ovdb overview method, set the ovmethod parameter in inn.conf to "ovdb". The
ovdb database is stored in the directory specified by the pathoverview parameter in
inn.conf. This is the "DB_HOME" directory. To start out, this directory should be empty
(other than an optional DB_CONFIG file; see DB_CONFIG for details), and innd (or
makehistory) will create the files as necessary in that directory. Also, make sure the
directory is owned by the news user.

Other parameters for configuring ovdb are in the ovdb.conf configuration file. The
following parameters can be set in that file:

compress
    If INN was compiled with zlib, and this compress parameter is true, ovdb will compress
    overview records that are longer than 600 bytes. See the COMPRESSION section below.

cachesize
    Size of the memory pool cache, in kilobytes. The cache will have a backing store file
    in the DB directory which will be at least as big. In general, the bigger the cache,
    the better. Use "ovdb_stat -m" to see cache hit percentages. For a change to this
    parameter to take effect, shut down and restart INN (be sure to kill all of the nnrpd
    processes when shutting down). Default is 8000 (KB), which is adequate for small to
    medium-sized servers. Large servers will probably need at least 20000 (KB).

ncache
    Number of regions across which to split the cache. The region size is equal to
    cachesize divided by ncache. Default is 1, meaning the cache is allocated
    contiguously in memory.

numdbfiles
    Overview data is split between this many files. Currently, innd will keep all of the
    files open, so don't set this too high or innd may run out of file descriptors. nnrpd
    opens only one at a time, regardless of this setting. The value may be set to one, or
    just a few, but only if your OS supports large (> 2 GB) files. Changing this parameter
    has no effect on an already-established database. Default is 32.

txn_nosync
    If txn_nosync is set to false, Berkeley DB flushes the log after every transaction.
    This minimizes the number of transactions that may be lost in the event of a crash,
    but results in significantly degraded performance. Default is true.

useshm
    If useshm is set to true, Berkeley DB will use shared memory instead of mmap for its
    environment regions (cache, lock, etc.). On some platforms, this may improve
    performance. Default is false.

shmkey
    Sets the shared memory key used by Berkeley DB when useshm is true. Berkeley DB will
    create several (usually 5) shared memory segments, using sequentially numbered keys
    starting with "shmkey". Choose a key that does not conflict with any existing shared
    memory segments on your system. Default is 6400.

pagesize
    Sets the page size for the DB files (in bytes). Must be a power of 2. Best choices
    are 4096 or 8192. The default is 8192. Changing this parameter has no effect on an
    already-established database.

minkey
    Sets the minimum number of keys per page. See the Berkeley DB documentation for more
    information. Default is based on page size and whether compression is enabled:

        default_minkey = MAX(2, pagesize / 2600) if compress is false
        default_minkey = MAX(2, pagesize / 1500) if compress is true

    The lowest allowed minkey is 2. Setting minkey higher than the default is not
    recommended, as it will cause the databases to have a lot of overflow pages. Changing
    this parameter has no effect on an already-established database.

maxlocks
    Sets the Berkeley DB lk_max parameter, which is the maximum number of locks that can
    exist in the database at the same time. Default is 4000.

nocompact
    The nocompact parameter affects the behaviour of expireover. The expireover function
    in ovdb can do its job in one of two ways: by simply deleting expired records from
    the database, or by rewriting the overview records into a different location, leaving
    out the expired records. The first method is faster, but it leaves 'holes' that
    result in space that cannot immediately be reused. The second method 'compacts' the
    records by rewriting them.

    If this parameter is set to 0, expireover will compact all newsgroups; if set to 1,
    expireover will not compact any newsgroups; and if set to a value greater than 1,
    expireover will only compact groups that have fewer than that number of articles.

    Experience has shown that compacting has minimal effect (other than making expireover
    take longer), so the default is 1. This parameter will probably be removed in the
    future.

readserver
    When the readserver parameter is set to false, each nnrpd process directly accesses
    the Berkeley DB environment. The process of attaching to the database (and detaching
    when finished) is fairly expensive, and can cause high load when there are many
    reader connections of relatively short duration.

    When the readserver parameter is set to true, the nnrpd processes will access overview
    via a helper server (ovdb_server, which is started by ovdb_init). All ovdb reads
    will then be funnelled through a single process with a cleaner interface to the
    underlying Berkeley DB database. This will result in cleaner shutdowns for the
    database, improving stability and avoiding deadlocks, timing issues and corrupted
    databases. If you are experiencing any instability in the ovdb overview method, try
    setting this parameter to true.

    Default value is true.

numrsprocs
    This parameter is only used when readserver is true. It sets the number of
    ovdb_server processes. As each ovdb_server can process only one transaction at a
    time, running more servers can improve reader response times. Default is 5.

maxrsconn
    This parameter is only used when readserver is true. It sets a maximum number of
    readers that a given ovdb_server process will serve at one time. This means the
    maximum number of readers for all of the ovdb_server processes is (numrsprocs *
    maxrsconn). This does not limit the actual number of readers, since nnrpd will fall
    back to opening the database directly if it can't connect to an ovdb_server. Default
    is 0, which means an unlimited number of connections is allowed.
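
Several of the parameters above are defined by simple arithmetic or threshold rules. The Python sketch below restates those rules for ncache, minkey, nocompact and maxrsconn. All function names are hypothetical, and the integer division in default_minkey is an assumption (the formulas above are written in C-like notation); this is not code taken from ovdb.

```python
def cache_region_kb(cachesize=8000, ncache=1):
    """Size of each cache region in KB: cachesize divided by ncache."""
    return cachesize / ncache

def default_minkey(pagesize=8192, compress=False):
    """Default minkey for a page size; integer division is assumed."""
    divisor = 1500 if compress else 2600
    return max(2, pagesize // divisor)

def should_compact(nocompact, article_count):
    """Whether expireover would compact a group with article_count articles."""
    if nocompact == 0:
        return True                      # compact all newsgroups
    if nocompact == 1:
        return False                     # compact no newsgroups
    return article_count < nocompact     # compact only smaller groups

def max_server_readers(numrsprocs=5, maxrsconn=0):
    """Readers that all ovdb_server processes accept; None means unlimited."""
    if maxrsconn == 0:
        return None
    return numrsprocs * maxrsconn
```

With the default page size of 8192, this gives a default minkey of 3 without compression and 5 with it; with a page size of 4096 both cases give the minimum of 2.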

COMPRESSION

The ovdb storage method has the ability to compress overview data before it is stored into
the database. In addition to consuming less disk space, compression keeps the average
size of the database keys smaller. This in turn increases the average number of keys per
page, which can significantly improve performance and also helps keep the database more
compact. This feature requires that INN be built with zlib. Only records larger than 600
bytes get compressed, because that is the point at which compression starts to become
significant.
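
The size threshold can be illustrated with Python's zlib module. This is only a sketch of the rule "compress records larger than 600 bytes". How ovdb actually marks a stored record as compressed is not described here, so the boolean flag returned below is a stand-in, and the function name is hypothetical.

```python
import zlib

COMPRESS_THRESHOLD = 600  # bytes; records at or below this size are stored as-is

def maybe_compress(record: bytes):
    """Return (payload, was_compressed) for one overview record."""
    if len(record) > COMPRESS_THRESHOLD:
        return zlib.compress(record), True
    return record, False
```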

If compression is not enabled (either because the compress option in ovdb.conf is false or
because INN was not built with zlib support), the database will be backward compatible with
older versions of ovdb. However, if compression is enabled, the database is marked with a
newer version that will prevent older versions of ovdb from opening the database.

You can upgrade an existing database to use compression simply by setting compress to true
in ovdb.conf. Note that existing records in the database will remain uncompressed; only
new records added after enabling compression will be compressed.

If you disable compression on a database that previously had it enabled, new records will
be stored uncompressed, but the database will still be incompatible with older versions of
ovdb (and will also be incompatible with this version of ovdb if INN was not built with
zlib support). So to downgrade to a completely uncompressed database, you will have to
rebuild the database using makehistory.

DB_CONFIG

A file called DB_CONFIG may be placed in the database directory (pathoverview in inn.conf)
to customize where the various database files and transaction logs are written. By
default, all of the files are written in the "DB_HOME" directory. One way to improve
performance is to put the transaction logs on a different disk. To do this, put:

    DB_LOG_DIR /path/to/logs

in the DB_CONFIG file. If the pathname you give starts with a "/", it is treated as an
absolute path; otherwise, it is relative to the "DB_HOME" directory. Make sure that any
directories you specify exist and have proper ownership/mode before starting INN, because
they won't be created automatically. Also, don't change the DB_CONFIG file while anything
that uses ovdb is running.
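
The rule for absolute and relative pathnames can be sketched in Python as follows. The function name is hypothetical, and the code only mirrors the rule stated above; it is not Berkeley DB's own implementation.

```python
import posixpath

def resolve_db_path(db_home, value):
    """Resolve a pathname from DB_CONFIG against the DB_HOME directory."""
    if value.startswith("/"):
        return value                      # absolute path, used as given
    return posixpath.join(db_home, value)  # relative to DB_HOME
```

For example, with DB_HOME set to /mnt/overview, the value logs would resolve to /mnt/overview/logs, while /path/to/logs would be used unchanged.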

Another thing that you can do with this file is to split the overview database across
multiple disks. In the DB_CONFIG file, you can list directories that Berkeley DB will
search when it goes to open a database.

For example, let's say that you have pathoverview set to /mnt/overview and you have four
additional file systems mounted at /mnt/ov1 through /mnt/ov4. You would create a file
/mnt/overview/DB_CONFIG containing the following lines:

    set_data_dir /mnt/overview
    set_data_dir /mnt/ov1
    set_data_dir /mnt/ov2
    set_data_dir /mnt/ov3
    set_data_dir /mnt/ov4

Distribute your ovNNNNN files among the four file systems (say, 8 each). When called upon
to open a database file, Berkeley DB will look for it in each of the specified
directories (in order). If the file is not found, it will be created in the first of
those directories.
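
The search order described above can be modelled in a few lines of Python. This is a simplified sketch: the function name, the set standing in for the files present on disk, and the example file names are all hypothetical, and the real lookup is performed inside Berkeley DB.

```python
import posixpath

def locate_db_file(name, data_dirs, existing):
    """Return (path, created) for a database file.

    data_dirs: directories in the order they are listed in DB_CONFIG.
    existing:  set of full paths that are assumed to exist already.
    """
    for directory in data_dirs:
        candidate = posixpath.join(directory, name)
        if candidate in existing:
            return candidate, False       # found in a listed directory
    # Not found anywhere: the file is created in the first directory.
    return posixpath.join(data_dirs[0], name), True
```

With the five directories from the example, a file that already lives in /mnt/ov2 is opened there, while a file that exists nowhere is created in /mnt/overview.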

Whenever you change DB_CONFIG or move database files around, make sure all news processes
that use the database are shut down first (including nnrpd processes).

The DB_CONFIG functionality is part of Berkeley DB itself, rather than something provided
by ovdb. See the Berkeley DB documentation for complete details for the version of
Berkeley DB that you're running.

RUNNING

       When starting the news system, rc.news will invoke the ovdb_init program.  See the
       ovdb_init(8) man page for information about the tasks it performs.  ovdb_init must be run
       before using the database.

       When stopping INN, rc.news kills the ovdb_monitor processes after the other INN
       processes have been shut down.

DIAGNOSTICS

       Problems relating to ovdb are logged to news.err with "OVDB" in the error message.

       INN programs that use overview will fail to start up if the ovdb_monitor processes aren't
       running.  Be sure to run ovdb_init before running anything that accesses overview.

       Also, INN programs that use overview will fail to start up if the user running them is not
       the news user.

       If a program accessing the database crashes, or otherwise exits uncleanly, it might leave
       a stale lock in the database.  This lock could cause other processes to deadlock on that
       stale lock.  To fix this, shut down all news processes (using "kill -9" if necessary) and
       then restart.  ovdb_init should perform a recovery operation which will remove the locks
       and repair damage caused by killing the deadlocked processes.

FILES

       pathetc/inn.conf
           The ovmethod and pathoverview parameters are relevant to ovdb.

       pathetc/ovdb.conf
           Optional configuration file for tuning.  See CONFIGURATION above.

       pathoverview
           Directory where the database goes.  Berkeley DB calls it the "DB_HOME" directory.

       pathoverview/DB_CONFIG
           Optional file to configure the layout of the database files.

       pathrun/ovdb.sem
           A file that gets locked by every process that is accessing the database.  This is used
           by ovdb_init to determine whether the database is active or quiescent.

       pathrun/ovdb_monitor.pid
           Contains the process ID of ovdb_monitor.

TO DO

       Implement a way to limit how many databases can be open at once (to reduce file descriptor
       usage), perhaps using something similar to the cache code in the legacy ov3.c file.

HISTORY

       Written by Heath Kehoe <hakehoe@avalon.net> for InterNetNews.