Provided by: libgetdata-tools_0.7.3-6_i386

NAME

       dirfile-encoding -- dirfile database encoding schemes

DESCRIPTION

       The  Dirfile Standards indicate that RAW fields defined in the database
       are accompanied by binary  files  containing  the  field  data  in  the
       specified   simple  data  type.   In  certain  situations,  it  may  be
       advantageous to convert the binary files in the database  into  a  more
       convenient form.  This is accomplished by encoding the binary file into
       the alternate form.  A common use-case for encoding a binary file is to
       compress  it  to save disk space.  Only data is modified by an encoding
       scheme.  Database metadata is unaffected.

       Support for encoding schemes is optional.  An implementation  need  not
       support  any  particular  encoding  scheme, or may only support certain
       operations with it, but should expect  to  encounter  unknown  encoding
       schemes and fail gracefully in such situations.

       Additionally, how a particular encoding is implemented is not specified
       by the Dirfile Standards, but, for purposes  of  interoperability,  all
       dirfile   implementations   are  encouraged  to  support  the  encoding
       implementation used by the GetData  dirfile  reference  implementation,
       elaborated below.

       An  encoding  scheme  is  local  to the particular format specification
       fragment in which it is indicated.  This allows  a  single  dirfile  to
       have  binary files which are stored using multiple encodings, by having
       them defined in multiple fragments.

       The rest of this  manual  page  discusses  specifics  of  the  encoding
       framework  implemented  in the GetData library, and does not constitute
       part of the Dirfile Standards.

THE GETDATA ENCODING FRAMEWORK

       The GetData library provides  an  encoding  framework  which  abstracts
       binary  file  I/O,  allowing  for generic support for a wide variety of
       encoding schemes.   Functions  which  may  make  use  of  the  encoding
       framework are:

              gd_add(3),             gd_add_raw(3),            gd_add_spec(3),
              gd_alter_encoding(3),                    gd_alter_endianness(3),
              gd_alter_frameoffset(3),                      gd_alter_entry(3),
              gd_alter_raw(3),        gd_alter_spec(3),         gd_getdata(3),
              gd_move(3), gd_nframes(3), gd_putdata(3), and gd_rename(3).

       Most  of  the  encodings  supported  by GetData are implemented through
       external  libraries  which  handle  the  actual  file  I/O   and   data
       translation.   All  such libraries are optional; a build of the library
       which omits an external library will lack support  for  the  associated
       encoding  scheme.   In  this case, GetData will still properly identify
       the encoding scheme, but attempts to use GetData for file I/O  via  the
       encoding will fail with the GD_E_UNSUPPORTED error code.

       GetData  discovers  the  encoding  scheme  of a particular RAW field by
       noting the filename extension  of  files  associated  with  the  field.
       Binary  files  which  form an unencoded dirfile have no file extension.
       The file extensions used by the other  encodings  are  noted  below.
       Encoding  discovery proceeds by searching for files with the known list
       of file extensions (in an unspecified  order)  and  stopping  when  the
       first  successful  match  is made.  Because of this, when a field has
       multiple data files with different,  supported  file  extensions  which
       could   legitimately   be  associated  with  it,  the  encoding  scheme
       discovered by GetData is not well defined.
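
       The discovery procedure described above can be sketched as follows.
       This is a minimal illustration, not GetData's implementation: the
       names discover_encoding, candidate_encodings and KNOWN_EXTENSIONS,
       the scheme labels, and the fixed search order are all assumptions
       made for the example (GetData's actual order is unspecified).

```python
import os

# Extensions named in this manual page, paired with illustrative labels.
# ASSUMPTION: this ordering is arbitrary; GetData's order is unspecified.
KNOWN_EXTENSIONS = [
    ("", "none"),       # unencoded data files have no extension
    (".txt", "text"),
    (".bz2", "bzip2"),
    (".gz", "gzip"),
    (".xz", "lzma"),
    (".lzma", "lzma"),
    (".slm", "slim"),
]


def discover_encoding(dirfile_dir, field_name):
    """Return the label of the first matching data file, or None."""
    for ext, scheme in KNOWN_EXTENSIONS:
        if os.path.isfile(os.path.join(dirfile_dir, field_name + ext)):
            return scheme  # stop at the first successful match
    return None


def candidate_encodings(dirfile_dir, field_name):
    """Return every label with a matching file.

    More than one entry means the discovered encoding depends on the
    search order, which is the ambiguity described in the text.
    """
    found = []
    for ext, scheme in KNOWN_EXTENSIONS:
        if os.path.isfile(os.path.join(dirfile_dir, field_name + ext)):
            found.append(scheme)
    return found
```

       A caller could use candidate_encodings() to detect a field with
       several competing data files before relying on the discovered
       encoding.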

       In addition to  raw  (unencoded)  data,  GetData  supports  five  other
       encoding  schemes:  text  encoding, bzip2 encoding, gzip encoding, lzma
       encoding, and slim encoding, all discussed below.

   Text Encoding
       The Text Encoding is unique among GetData encoding schemes in  that  it
       requires  no  external library.  As a result, all builds of the library
       contain full support for this encoding.  It is  meant  to  serve  as  a
       reference  encoding  and  example of the encoding framework for work on
       other encoding schemes.

       The Text Encoding replaces the binary data files with 7-bit ASCII files
       containing  a  decimal  text encoding of the data, one sample per line.
       All operations are supported by the Text Encoding.  The file  extension
       of the Text Encoding is .txt.
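
       As an illustration of the "one sample per line" layout, the sketch
       below converts packed integer samples to decimal text and back.  It
       assumes little-endian signed 16-bit samples; the helper names
       raw_to_text and text_to_raw are hypothetical, and the sketch does not
       claim to reproduce GetData's exact output formatting.

```python
import struct


def raw_to_text(raw_bytes, fmt="<h"):
    """Render packed integer samples as decimal text, one per line.

    ASSUMPTION: fmt is a single-item integer struct format; the default
    "<h" means little-endian signed 16-bit.
    """
    size = struct.calcsize(fmt)
    if len(raw_bytes) % size:
        raise ValueError("raw data is not a whole number of samples")
    return "".join(f"{s[0]}\n" for s in struct.iter_unpack(fmt, raw_bytes))


def text_to_raw(text, fmt="<h"):
    """Pack decimal integer lines back into binary samples."""
    return b"".join(
        struct.pack(fmt, int(line))
        for line in text.splitlines()
        if line.strip()  # ignore blank lines
    )
```

       Round-tripping a buffer through raw_to_text() and text_to_raw()
       returns the original bytes, which reflects the statement above that
       only the data representation changes.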

   BZip2 Encoding
       The  BZip2  Encoding  compresses  raw  binary  files using the Burrows-
       Wheeler block sorting text compression algorithm and Huffman coding, as
       implemented  in  the  bzip2 format.  GetData's BZip2 Encoding scheme is
       implemented  through  the  bzip2  compression  library written by Julian
       Seward.   GetData's  BZip2  Encoding  framework  currently  lacks write
       capabilities; as a result the BZip2 Encoding does not support functions
       which modify binary data.

       GetData caches one megabyte of uncompressed data at  a  time  to  speed
       access.   A  call  to  gd_nframes(3)  requires  decompression  of  the
       entire  binary  file  to  determine its uncompressed size, and may take
       some time to complete.  The file extension of  the  BZip2  Encoding  is
       .bz2.
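
       The cost of determining the frame count can be seen in a short sketch
       that measures a bzip2 file by decompressing all of it.  This uses
       Python's standard bz2 module rather than GetData; the function names,
       the 1 MiB chunk size, and the sample_size and samples_per_frame
       parameters are assumptions made for the example.

```python
import bz2

CHUNK = 1 << 20  # ASSUMPTION: 1 MiB, standing in for the cache size


def uncompressed_size(path):
    """Count uncompressed bytes by streaming through the whole file."""
    total = 0
    with bz2.open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            total += len(block)
    return total


def frame_count(path, sample_size, samples_per_frame):
    """Whole frames in the file, given bytes per sample and samples per frame."""
    return uncompressed_size(path) // (sample_size * samples_per_frame)
```

       Because the size is only known once the final block has been
       decompressed, the running time grows with the size of the data file.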

   GZip Encoding
       The  GZip  Encoding compresses raw binary files using Lempel-Ziv coding
       (LZ77) as implemented in the  gzip  format.   GetData's  GZip  Encoding
       scheme is implemented through the  zlib  compression  library  written
       by Jean-loup Gailly and Mark Adler.  GetData's GZip Encoding  framework
       currently  lacks write capabilities; as a result the GZip Encoding does
       not support functions which modify binary data.

       To speed the operation of gd_nframes(3), the GZip  Encoding  reads  the
       uncompressed  size of the file from the gzip footer, which stores that
       size in bytes, modulo 2^32.  As a result, using a
       field  with an (uncompressed) binary file size larger than 4 GiB as the
       reference field will  result  in  the  wrong  number  of  frames  being
       reported.  The file extension of the GZip Encoding is .gz.
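
       The footer lookup and its wraparound can be sketched as follows.  The
       gzip format stores the input size modulo 2^32 as a little-endian
       32-bit integer in the last four bytes of the file; the function names
       below are hypothetical, and the sketch assumes a single-member gzip
       file.

```python
import struct


def gzip_footer_size(path):
    """Read the size field from the last four bytes of a gzip file.

    ASSUMPTION: the file holds a single gzip member, so the trailing
    field describes the whole uncompressed payload.
    """
    with open(path, "rb") as f:
        f.seek(-4, 2)  # four bytes before end of file
        return struct.unpack("<I", f.read(4))[0]


def reported_size(true_size):
    """Size a footer-based reader would see for a file of true_size bytes."""
    return true_size % 2**32
```

       For example, reported_size(5 * 2**30) is 1073741824: a 5 GiB data
       file appears to be 1 GiB long, so the frame count derived from it is
       too small.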

   LZMA Encoding
       The  LZMA  Encoding  compresses  raw  binary files using the Lempel-Ziv
       Markov Chain Algorithm  (LZMA)  as  implemented  in  the  xz  container
       format.  GetData's LZMA Encoding scheme is implemented through the lzma
       library, part of the XZ Utils suite  written  by  Lasse  Collin,  Ville
       Koskinen, and Igor Pavlov.  GetData's LZMA Encoding framework currently
       lacks write capabilities; as  a  result  the  LZMA  Encoding  does  not
       support functions which modify binary data.

       As with the BZip2 Encoding, GetData caches one megabyte of uncompressed
       data  at  a  time  to  speed  access.   A  call  to  gd_nframes(3)
       requires  decompression  of  the  entire  binary  file to determine its
       uncompressed size, and may  take  some  time  to  complete.   The  file
       extension of the LZMA Encoding is .xz or .lzma.

   Slim Encoding
       The  Slim  Encoding  compresses  raw  binary  files  using  the slimlib
       compression library written by Joseph Fowler.  The slimlib library  was
       developed  at  Princeton  University  to  compress  dirfile-like  data.
       GetData's Slim Encoding framework currently lacks  write  capabilities;
       as  a result, the Slim Encoding does not support functions which modify
       binary files.  The file extension of the Slim Encoding is .slm.

AUTHOR

       This manual page was written by D. V. Wiebe <dvw@ketiltrout.net>.

SEE ALSO

       dirfile(5), dirfile-format(5), bzip2(1), gzip(1), zlib(3).