Provided by: inn_1.7.2q-46build3_amd64

NAME

       dbminit, fetch, store, dbmclose - somewhat dbm-compatible database routines
       dbzfresh, dbzagain, dbzfetch, dbzstore - database routines
       dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug - database routines

SYNOPSIS

       #include <dbz.h>

       dbminit(base)
       char *base;

       datum
       fetch(key)
       datum key;

       store(key, value)
       datum key;
       datum value;

       dbmclose()

       dbzfresh(base, size, fieldsep, cmap, tagmask)
       char *base;
       long size;
       int fieldsep;
       int cmap;
       long tagmask;

       dbzagain(base, oldbase)
       char *base;
       char *oldbase;

       datum
       dbzfetch(key)
       datum key;

       dbzstore(key, value)
       datum key;
       datum value;

       dbzsync()

       long
       dbzsize(nentries)
       long nentries;

       dbzincore(newvalue)

       dbzcancel()

       dbzdebug(newvalue)

DESCRIPTION

       These  functions  provide  an  indexing system for rapid random access to a text file (the
       base file).  Subject  to  certain  constraints,  they  are  call-compatible  with  dbm(3),
       although  they also provide some extensions.  (Note that they are not file-compatible with
       dbm or any variant thereof.)

       In principle, dbz stores key-value pairs, where both key and value are arbitrary sequences
       of bytes, specified to the functions by values of type datum, typedefed in the header file
       to be a structure with members dptr (a value of type char * pointing  to  the  bytes)  and
       dsize (a value of type int indicating how long the byte sequence is).
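
       As a minimal sketch of the datum layout described above: the typedef below is an  assumed
       stand-in  written  from this description (real programs should take the definition from
       the header file), and key_from_string is a hypothetical helper, not part of dbz.

```c
#include <assert.h>
#include <string.h>

/* Assumed shape of datum, following the description above.  The real
 * definition comes from <dbz.h>; this copy exists only so the sketch
 * compiles on its own. */
typedef struct {
    char *dptr;   /* pointer to the bytes */
    int   dsize;  /* length of the byte sequence */
} datum;

/* Hypothetical helper: wrap a C string as a key, counting the trailing
 * NUL so that the NUL can act as the key terminator. */
static datum key_from_string(char *s)
{
    datum d;

    assert(s != NULL);
    d.dptr = s;
    d.dsize = (int)strlen(s) + 1;
    return d;
}
```

       Counting the trailing NUL in dsize is one way to give every key a  terminator,  which  the
       prefix rule discussed below generally requires.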

       In practice, dbz is more restricted than dbm.  A dbz database must be an index into a base
       file, with the database values being fseek(3) offsets into the base file.  Each such value
       must  ``point to'' a place in the base file where the corresponding key sequence is found.
       A key can be no longer than DBZMAXKEY (a constant defined in the header file)  bytes.   No
       key  can  be  an  initial subsequence of another, which in most applications requires that
       keys be either bracketed or terminated in some way (see the  discussion  of  the  fieldsep
       parameter of dbzfresh, below, for a fine point on terminators).
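
       To  make  the  notion  of  a value that ``points to'' its key concrete, the sketch below
       finds such an offset by linear scan of a base file.  find_offset is a hypothetical helper
       and  the  one-key-per-line  layout  is  an assumption; a dbz index exists precisely to
       avoid this scan.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the line in fp that begins with key followed
 * by sep, and return the offset of its first byte (usable with fseek),
 * or -1 if there is none.  Lines longer than the buffer are not handled
 * properly; this is a sketch only. */
static long find_offset(FILE *fp, const char *key, int sep)
{
    char line[1024];
    size_t klen;
    long off;

    assert(fp != NULL && key != NULL);
    klen = strlen(key);
    rewind(fp);
    for (;;) {
        off = ftell(fp);              /* offset of the line about to be read */
        if (off < 0 || fgets(line, sizeof line, fp) == NULL)
            return -1L;
        if (strncmp(line, key, klen) == 0 && line[klen] == (char)sep)
            return off;
    }
}
```

       The offset returned is the kind of value a dbz database is expected to hold for that key.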

       Dbminit  opens  a database, an index into the base file base, consisting of files base.dir
       and base.pag which must already exist.  (If the database is  new,  they  should  be  zero-
       length  files.)  Subsequent accesses go to that database until dbmclose is called to close
       the database.  The base file need not exist at the time of the dbminit, but it must  exist
       before accesses are attempted.

       Fetch  searches  the  database for the specified key, returning the corresponding value if
       any.  Store stores the key-value pair  in  the  database.   Store  will  fail  unless  the
       database files are writeable.  See below for a complication arising from case mapping.

       Dbzfresh  is  a  variant  of  dbminit  for  creating a new database with more control over
       details.  Unlike for dbminit, the database files need not exist: they will be  created  if
       necessary, and truncated in any case.

       Dbzfresh's  size parameter specifies the size of the first hash table within the database,
       in key-value pairs.  Performance will be best if size is a prime number and the number  of
       key-value  pairs  stored  in the database does not exceed about 2/3 of size.  (The dbzsize
       function, given the expected number of key-value pairs, will suggest a database size  that
       meets  these  criteria.)   Assuming that an fseek offset is 4 bytes, the .pag file will be
       4*size bytes (the .dir file is tiny and roughly constant in size) until the number of key-
       value  pairs  exceeds about 80% of size.  (Nothing awful will happen if the database grows
       beyond 100% of size, but accesses will slow down somewhat and  the  .pag  file  will  grow
       somewhat.)
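
       The sizing advice above can be sketched as follows.  suggest_size  and  pag_bytes  are
       hypothetical  helpers  that only restate the two criteria (prime size, load no more than
       about 2/3); the real dbzsize may well return different numbers.

```c
#include <assert.h>

/* Trial-division primality test; adequate for table sizes. */
static int is_prime(long n)
{
    long d;

    if (n < 2)
        return 0;
    for (d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Hypothetical stand-in for dbzsize: the smallest prime that keeps
 * nentries at or below 2/3 of the table size. */
static long suggest_size(long nentries)
{
    long size;

    assert(nentries >= 0);
    size = nentries + (nentries + 1) / 2;   /* ceil(1.5 * nentries) */
    while (!is_prime(size))
        size++;
    return size;
}

/* Estimated .pag size in bytes, assuming 4-byte offsets as in the text. */
static long pag_bytes(long size)
{
    return 4 * size;
}
```

       For 100 expected pairs this sketch picks 151, giving an estimated .pag file of 604 bytes.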

       Dbzfresh's  fieldsep parameter specifies the field separator in the base file.  If this is
       not NUL (0), and the last character of a key argument is NUL, that NUL compares  equal  to
       either  a  NUL  or  a fieldsep in the base file.  This permits use of NUL to terminate key
       strings without requiring that NULs appear in the base file.  The fieldsep of  a  database
       created with dbminit is the horizontal-tab character.
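
       The terminator rule can be illustrated with a small comparison routine.   key_matches  is
       a hypothetical helper written from the paragraph above, not dbz's own code, and it assumes
       the caller guarantees at least keylen readable bytes at base.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the comparison rule: returns 1 if the key (of
 * length keylen) matches the bytes at base.  When fieldsep is not NUL,
 * a final NUL in the key matches either a NUL or fieldsep in the base
 * file. */
static int key_matches(const char *key, size_t keylen,
                       const char *base, int fieldsep)
{
    size_t i;

    assert(key != NULL && base != NULL);
    for (i = 0; i < keylen; i++) {
        int last = (i + 1 == keylen);

        if (key[i] == base[i])
            continue;
        if (last && key[i] == '\0' && fieldsep != '\0' &&
            base[i] == (char)fieldsep)
            continue;
        return 0;
    }
    return 1;
}
```

       With a fieldsep of tab, the six-byte key "<a@b>" (including its NUL) matches a  base  file
       line beginning "<a@b>" followed by a tab; with a fieldsep of NUL it does not.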

       For  use  in  news systems, various forms of case mapping (e.g. uppercase to lowercase) in
       keys are available.  The cmap parameter to dbzfresh is a single character specifying which
       of several mapping algorithms to use.  Available algorithms are:

              0      case-sensitive:  no case mapping

              B      same as 0

              NUL    same as 0

              =      case-insensitive:  uppercase and lowercase equivalent

              b      same as =

              C      RFC822 message-ID rules, case-sensitive before `@' (with certain exceptions)
                     and case-insensitive after

              ?      whatever the local default is, normally C

       Mapping algorithm 0 (no mapping) is faster than  the  others  and  is  overwhelmingly  the
       correct  choice  for most applications.  Unless compatibility constraints interfere, it is
       more efficient to pre-map the keys, storing mapped keys in the base file, than to have dbz
       do the mapping on every search.

       For  historical  reasons, fetch and store expect their key arguments to be pre-mapped, but
       expect unmapped keys in the base file.  Dbzfetch and dbzstore do the same jobs but  handle
       all case mapping internally, so the caller need not worry about it.
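
       A caller that pre-maps keys for fetch and store might use something like the sketch below.
       Both  helpers  are hypothetical.  Folding to lowercase, splitting at the first `@', and
       leaving keys without an `@' unchanged are all assumptions, and the ``certain  exceptions''
       of algorithm C are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper for the `=' algorithm: fold the whole key to
 * lowercase, in place. */
static void map_all(char *key, size_t len)
{
    size_t i;

    assert(key != NULL);
    for (i = 0; i < len; i++)
        key[i] = (char)tolower((unsigned char)key[i]);
}

/* Hypothetical, simplified `C' algorithm: leave everything before the
 * first `@' alone and fold the rest.  Keys with no `@' are left as is
 * (an assumption). */
static void map_after_at(char *key, size_t len)
{
    char *at;

    assert(key != NULL);
    at = memchr(key, '@', len);
    if (at != NULL)
        map_all(at, len - (size_t)(at - key));
}
```

       Under these assumptions, "<Foo.Bar@Example.COM>" maps to "<Foo.Bar@example.com>".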

       Dbz stores only the database values in its files, relying on reference to the base file to
       confirm a hit on a key.  References to the base file can be minimized, greatly speeding up
       searches,  if  a  little bit of information about the keys can be stored in the dbz files.
       This is ``free'' if there are some unused bits in an fseek offset, so that the offset  can
       be  tagged  with some information about the key.  The tagmask parameter of dbzfresh allows
       specifying the location of unused bits.  Tagmask should  be  a  mask  with  one  group  of
       contiguous  1  bits.   The bits in the mask should be unused (0) in most offsets.  The bit
       immediately above the mask (the flag bit) should be unused (0) in all offsets;  (dbz)store
       will  reject  attempts  to  store a key-value pair in which the value has the flag bit on.
       Apart from this restriction, tagging is invisible to the  user.   As  a  special  case,  a
       tagmask  of  1  means  ``no tagging'', for use with enormous base files or on systems with
       unusual offset representations.
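
       The relationship between the tag mask, the flag bit, and an offset can be sketched  with
       a  few  bit  operations.   The  helper  names  are hypothetical, offsets are modelled as
       unsigned long for simplicity, and the special tagmask values 0  and  1  are  not  handled
       here.

```c
#include <assert.h>

/* Hypothetical helper: the flag bit is the bit immediately above the
 * highest 1 bit of the tag mask. */
static unsigned long flag_bit(unsigned long tagmask)
{
    unsigned long top = tagmask;

    assert(tagmask != 0);
    while (top & (top - 1))     /* clear low 1 bits until one remains */
        top &= top - 1;
    return top << 1;
}

/* An offset has room for a tag only if none of the mask bits are in use. */
static int can_tag(unsigned long offset, unsigned long tagmask)
{
    return (offset & tagmask) == 0;
}

/* Per the text, a store is rejected if the value has the flag bit on. */
static int store_would_reject(unsigned long offset, unsigned long tagmask)
{
    return (offset & flag_bit(tagmask)) != 0;
}
```

       For a mask of 0x7f000000 the flag bit works out to 0x80000000, the top bit  of  a  32-bit
       offset.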

       A size of 0 given to dbzfresh is synonymous with the local default; the normal default  is
       suitable  for  tables  of 90,000-100,000 key-value pairs.  A cmap of 0 (NUL) is synonymous
       with the character 0, signifying no case mapping (note that the character ? specifies  the
       local  default  mapping, normally C).  A tagmask of 0 is synonymous with the local default
       tag mask, normally 0x7f000000 (specifying the top bit in a 32-bit offset as the flag  bit,
       and  the  next  7  bits  as  the mask, which is suitable for base files up to circa 24MB).
       Calling  dbminit(name)  with  the  database  files  empty   is   equivalent   to   calling
       dbzfresh(name,0,'\t','?',0).

       When  databases  are  regenerated  periodically,  as  in  news, it is simplest to pick the
       parameters for a new database based on the old one.  This also permits some memory of past
       sizes  of  the  old  database, so that a new database size can be chosen to cover expected
       fluctuations.  Dbzagain is a variant of dbminit for creating  a  new  database  as  a  new
       generation  of  an  old database.  The database files for oldbase must exist.  Dbzagain is
       equivalent to calling dbzfresh with the same field separator, case mapping, and  tag  mask
       as  the  old  database,  and a size equal to the result of applying dbzsize to the largest
       number of entries in the oldbase database and its previous 10 generations.

       When many accesses are being done by the same program, dbz  is  massively  faster  if  its
       first  hash  table is in memory.  If an internal flag is 1, an attempt is made to read the
       table in when the database is opened, and dbmclose writes it out to disk again (if it  was
       read  successfully  and  has  been  modified).  Dbzincore sets the flag to newvalue (which
       should be 0 or 1) and returns the previous value; this does not affect  the  status  of  a
       database  that  has already been opened.  The default is 0.  The attempt to read the table
       in may fail due to memory shortage; in this case dbz quietly falls  back  on  its  default
       behavior.   Stores  to  an in-memory database are not (in general) written out to the file
       until dbmclose or dbzsync, so if robustness in  the  presence  of  crashes  or  concurrent
       accesses is crucial, in-memory databases should probably be avoided.

       Dbzsync causes all buffers etc. to be flushed out to the files.  It is typically used as a
       precaution against crashes or concurrent accesses when a dbz-using process will be running
       for  a  long  time.   It  is  a  somewhat expensive operation, especially for an in-memory
       database.

       Dbzcancel cancels any pending writes from buffers.  This is typically useful only for  in-
       core databases, since writes are otherwise done immediately.  Its main purpose is to let a
       child process, in the wake of a fork, do a dbmclose without writing its parent's  data  to
       disk.

       If  dbz has been compiled with debugging facilities available (which makes it bigger and a
       bit slower), dbzdebug alters the value (and returns the previous  value)  of  an  internal
       flag  which (when 1; default is 0) causes verbose and cryptic debugging output on standard
       output.

       Concurrent reading of databases is  fairly  safe,  but  there  is  no  (inter)locking,  so
       concurrent updating is not.

       The  database  files  include  a  record  of  the byte order of the processor creating the
       database, and accesses by processors with different byte order will  work,  although  they
       will  be slightly slower.  Byte order is preserved by dbzagain.  However, agreement on the
       size and internal structure of an fseek offset  is  necessary,  as  is  consensus  on  the
       character set.

       An  open database occupies three stdio streams and their corresponding file descriptors; a
       fourth is needed for an in-memory database.  Apart from stdio buffers, memory  consumption
       is negligible unless the database is in memory.

SEE ALSO

       dbz(1), dbm(3)

DIAGNOSTICS

       Functions  returning int values return 0 for success, -1 for failure.  Functions returning
       datum values return a value with dptr set to NULL for failure.  Dbminit attempts  to  have
       errno  set  plausibly  on  return, but otherwise this is not guaranteed.  An errno of EDOM
       from dbminit indicates that the database did not appear to be in dbz format.

HISTORY

       The  original  dbz  was  written  by  Jon  Zeeff  (zeeff@b-tech.ann-arbor.mi.us).    Later
       contributions  by  David  Butler  and  Mark  Moraes.   Extensive reworking, including this
       documentation, by Henry Spencer (henry@zoo.toronto.edu) as part of  the  C  News  project.
       Hashing function by Peter Honeyman.

BUGS

       The  dptr members of returned datum values point to static storage which is overwritten by
       later calls.

       Unlike dbm, dbz will misbehave if an existing key-value pair is  `overwritten'  by  a  new
       (dbz)store  with  the  same  key.   The  user  is  responsible  for avoiding this by using
       (dbz)fetch first to check for duplicates; an internal optimization remembers the result of
       the first search so there is minimal overhead in this.

       Waiting  until  after  dbminit to bring the base file into existence will fail if chdir(2)
       has been used meanwhile.

       The RFC822 case mapper implements only a  first  approximation  to  the  hideously-complex
       RFC822 case rules.

       The prime finder in dbzsize is not particularly quick.

       Dbz should implement the dbm functions delete, firstkey, and nextkey.

       On  C  implementations which trap integer overflow, dbz will refuse to (dbz)store an fseek
       offset equal to the greatest representable positive number, as this would  cause  overflow
       in the biased representation used.

       Dbzagain  perhaps  ought  to notice when many offsets in the old database were too big for
       tagging, and shrink the tag mask to match.

       Marking dbz's file descriptors close-on-exec would be a better  approach  to  the  problem
       dbzcancel tries to address, but that's harder to do portably.