Provided by: inn2-dev_2.7.2~20240212-1build3_amd64 bug

NAME

       dbz - Database routines for InterNetNews

SYNOPSIS

           #include <inn/dbz.h>

           #define DBZMAXKEY              ...
           #define DBZ_INTERNAL_HASH_SIZE ...

           typedef enum
           {
               DBZSTORE_OK,
               DBZSTORE_EXISTS,
               DBZSTORE_ERROR
           } DBZSTORE_RESULT;

           typedef enum
           {
               INCORE_NO,
               INCORE_MEM,
               INCORE_MMAP
           } dbz_incore_val;

           typedef struct {
               bool writethrough;
               dbz_incore_val pag_incore;
               dbz_incore_val exists_incore;
               bool nonblock;
           } dbzoptions;

           typedef struct {
               char hash[DBZ_INTERNAL_HASH_SIZE];
           } __attribute__((__packed__)) erec;

           extern bool dbzinit(const char *name);
           extern bool dbzclose(void);

           extern bool dbzfresh(const char *name, off_t size);
           extern bool dbzagain(const char *name, const char *oldname);
           extern bool dbzexists(const HASH key);
           extern bool dbzfetch(const HASH key, off_t *value);
           extern DBZSTORE_RESULT dbzstore(const HASH key, off_t data);
           extern bool dbzsync(void);
           extern long dbzsize(off_t contents);
           extern void dbzsetoptions(const dbzoptions options);
           extern void dbzgetoptions(dbzoptions *options);

DESCRIPTION

       These functions provide an indexing system for rapid random access to a text file,
       hereafter named the base file.

       dbz stores offsets into the base file for rapid retrieval.  All retrievals are keyed on a
       hash value that is generated by the HashMessageID function in libinn(3).

       dbzinit opens a database, an index into the base file name, consisting of files name.dir,
       name.index, and name.hash which must already exist.  (If the database is new, they should
       be zero-length files.)  Subsequent accesses go to that database until dbzclose is called
       to close the database.  When tagged hash format is used (if --enable-tagged-hash was given
       at configure time), a name.pag file is used instead of .index and .hash.

       dbzfetch searches the database for the specified key, assigning the offset of the base
       file for the corresponding key to value, if any.

       dbzstore stores the key-data pair in the database.  It will return "DBZSTORE_EXISTS" for
       duplicates (already existing entries), and "DBZSTORE_OK" for success.  It will fail with
       "DBZSTORE_ERROR" if the database files are not writable or not opened, or if any other
       error occurs.

       dbzexists will verify whether or not the given hash exists or not.  dbz is optimized for
       this operation and it may be significantly faster than dbzfetch.

       dbzfresh is a variant of dbzinit for creating a new database with more control over
       details.  The size parameter specifies the size of the first hash table within the
       database, in number of key-value pairs.  Performance will be best if the number of key-
       value pairs stored in the database does not exceed about 2/3 of size, or 1/2 of size when
       the tagged hash format is used.  (The dbzsize function, given the expected number of key-
       value pairs, will suggest a database size that meets these criteria.)  Assuming that an
       fseek offset is 4 bytes, the .index file will be 4 * size bytes.  The .hash file will be
       "DBZ_INTERNAL_HASH_SIZE" * size bytes (the .dir file is tiny and roughly constant in size)
       until the number of key-value pairs exceeds about 80% of size.  (Nothing awful will happen
       if the database grows beyond 100% of size, but accesses will slow down quite a bit and the
       .index and .hash files will grow somewhat.)

       dbz stores up to "DBZ_INTERNAL_HASH_SIZE" bytes (by default, 4 bytes if tagged hash format
       is used, 6 otherwise) of the Message-ID's hash in the .hash file to confirm a hit.  This
       eliminates the need to read the base file to handle collisions.

       A size of "0" given to dbzfresh is synonymous with the local default; the normal default
       is suitable for tables of about 6,000,000 key-value pairs (or 500,000 key-value pairs when
       the tagged hash format is used).  That default value is used by dbzinit.

       When databases are regenerated periodically, as it is the case for the history file, it is
       simplest to pick the parameters for a new database based on the old one.  This also
       permits some memory of past sizes of the old database, so that a new database size can be
       chosen to cover expected fluctuations.  dbzagain is a variant of dbzinit for creating a
       new database as a new generation of an old database.  The database files for oldname must
       exist.  dbzagain is equivalent to calling dbzfresh with a size equal to the result of
       applying dbzsize to the largest number of entries in the oldname database and its previous
       10 generations.

       When many accesses are being done by the same program, dbz is massively faster if its
       first hash table is in memory.  If the pag_incore flag is set to "INCORE_MEM", an attempt
       is made to read the table in when the database is opened, and dbzclose writes it out to
       disk again (if it was read successfully and has been modified).  dbzsetoptions can be used
       to set the pag_incore and exists_incore flags to different values which should be
       "INCORE_NO" (read from disk), "INCORE_MEM" (read from memory) or "INCORE_MMAP" (read from
       a mmap'ed file) for the .hash and .index files separately; this does not affect the status
       of a database that has already been opened.  The default is "INCORE_NO" for the .index
       file and "INCORE_MMAP" for the .hash file.  The attempt to read the table in may fail due
       to memory shortage; in this case dbz fails with an error.  Stores to an in-memory database
       are not (in general) written out to the file until dbzclose or dbzsync, so if robustness
       in the presence of crashes or concurrent accesses is crucial, in-memory databases should
       probably be avoided or the writethrough option should be set to true (telling to
       systematically write to the filesystem in addition to updating the in-memory database).

       If the nonblock option is true, then writes to the .hash and .index files will be done
       using non-blocking I/O.  This can be significantly faster if your platform supports non-
       blocking I/O with files.  It is only applicable if you're not mmap'ing the database.

       dbzsync causes all buffers etc. to be flushed out to the files.  It is typically used as a
       precaution against crashes or concurrent accesses when a dbz-using process will be running
       for a long time.  It is a somewhat expensive operation, especially for an in-memory
       database.

       Concurrent reading of databases is fairly safe, but there is no (inter)locking, so
       concurrent updating is not.

       An open database occupies three stdio streams and two file descriptors; Memory consumption
       is negligible except for in-memory databases (and stdio buffers).

DIAGNOSTICS

       Functions returning bool values return true for success, false for failure.

       dbzinit attempts to have errno set plausibly on return, but otherwise this is not
       guaranteed.  An errno of "EDOM" from dbzinit indicates that the database did not appear to
       be in dbz format.

       If "DBZTEST" is defined at compile-time, then a main() function will be included.  This
       will do performance tests and integrity test.

BUGS

       Unlike dbm, dbz will refuse to dbzstore with a key already in the database.  The user is
       responsible for avoiding this.

       The RFC5322 case mapper implements only a first approximation to the hideously-complex
       RFC5322 case rules.

       dbz no longer tries to be call-compatible with dbm in any way.

HISTORY

       The original dbz was written by Jon Zeeff <zeeff@b-tech.ann-arbor.mi.us>.  Later
       contributions by David Butler and Mark Moraes.  Extensive reworking, including this
       documentation, by Henry Spencer <henry@zoo.toronto.edu> as part of the C News project.
       MD5 code borrowed from RSA.  Extensive reworking to remove backwards compatibility and to
       add hashes into dbz files by Clayton O'Neill <coneill@oneill.net>.  Rewritten into POD by
       Julien Elie.

SEE ALSO

       dbm(3), history(5), libinn(3).