lunar (1) dictunzip.1.gz

Provided by: dictzip_1.13.0+dfsg-1build2_amd64

NAME

       dictzip, dictunzip - compress (or expand) files, allowing random access

SYNOPSIS

       dictzip [options] name
       dictunzip [options] name

DESCRIPTION

       dictzip  compresses  files  using  the  gzip(1)  algorithm  (LZ77)  in  a  manner which is
       completely compatible with the gzip file format.  An extension to  the  gzip  file  format
       (Extra  Field,  described  in  2.3.1.1  of RFC 1952) allows extra data to be stored in the
       header of a compressed file.  Programs like gzip and zcat will  ignore  this  extra  data.
       However, dictd(8), the DICT protocol dictionary server, will make use of this data to
       perform pseudo-random access on the file.  Files in the dictzip format should end in ".dz"
       so  that  they may be distinguished from common gzip files that do not contain the special
       header information.
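       The compatibility claim can be checked with a short Python sketch.  It builds a single
       gzip member by hand, following the RFC 1952 header layout, with FLG.FEXTRA set and a
       placeholder subfield.  The subfield ID "XY" and the helper name gzip_with_extra are
       hypothetical.  Python's standard gzip module, like gzip and zcat, skips the extra field
       and returns the original data.

```python
import gzip
import struct
import zlib

FEXTRA = 0x04  # FLG bit 2 in RFC 1952


def gzip_with_extra(payload: bytes, extra: bytes) -> bytes:
    """Build one gzip member whose header carries `extra` as its extra field."""
    if len(extra) > 0xFFFF:
        raise ValueError("XLEN is a two-byte field")
    # ID1, ID2, CM=8 (deflate), FLG, MTIME=0, XFL=0, OS=255 (unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FEXTRA, 0, 0, 255)
    header += struct.pack("<H", len(extra)) + extra  # XLEN, then the field itself
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)    # raw DEFLATE, no zlib wrapper
    body = comp.compress(payload) + comp.flush()
    # Trailer: CRC-32 and ISIZE (length mod 2**32), least-significant byte first
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


# Placeholder subfield: SI1='X', SI2='Y', LEN=4, then 4 bytes of data.
extra = b"XY" + struct.pack("<H", 4) + b"demo"
payload = b"headword\tdefinition\n" * 100
member = gzip_with_extra(payload, extra)
assert gzip.decompress(member) == payload  # the extra field is simply skipped
```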

       From RFC 1952, the extra field is specified as follows:

              If the FLG.FEXTRA bit is set, an "extra field" is present in the header, with total
              length XLEN bytes.  It consists of a series of subfields, each of the form:

              +---+---+---+---+==================================+
              |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
              +---+---+---+---+==================================+

              SI1  and  SI2 provide a subfield ID, typically two ASCII letters with some mnemonic
              value.  Jean-Loup  Gailly  <gzip@prep.ai.mit.edu>  is  maintaining  a  registry  of
              subfield  IDs;  please send him any subfield ID you wish to use.  Subfield IDs with
              SI2 = 0 are reserved for future use.

              LEN gives the length of the subfield data, excluding the 4 initial bytes.
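       The subfield layout above can be walked with a small parser.  This is a Python sketch
       with hypothetical function names; it assumes the caller passes a gzip member (or at
       least its complete header) as bytes.  It reads XLEN at offset 10, which works because
       the extra field is the first optional field after the fixed 10-byte header in RFC 1952.

```python
import struct
from typing import List, Tuple

FEXTRA = 0x04  # FLG bit 2


def extra_field(member: bytes) -> bytes:
    """Return the XLEN bytes of the extra field, or b"" if FLG.FEXTRA is clear."""
    if member[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip member")
    if not member[3] & FEXTRA:
        return b""
    # The extra field follows the fixed 10-byte header directly.
    (xlen,) = struct.unpack_from("<H", member, 10)
    return member[12:12 + xlen]


def subfields(extra: bytes) -> List[Tuple[bytes, bytes]]:
    """Split an extra field into (SI1+SI2, subfield data) pairs."""
    result = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        ident = extra[pos:pos + 2]                            # SI1, SI2
        (length,) = struct.unpack_from("<H", extra, pos + 2)  # LEN, LSB first
        start = pos + 4                                       # LEN excludes these 4 bytes
        if start + length > len(extra):
            raise ValueError("subfield runs past the end of the extra field")
        result.append((ident, extra[start:start + length]))
        pos = start + length
    return result
```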

       The dictzip program uses 'R' for SI1, and 'A' for SI2 (i.e., "Random Access").  After  the
       LEN field, the data is arranged as follows:

       +---+---+---+---+---+---+===============================+
       |  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
       +---+---+---+---+---+---+===============================+

       As  per RFC 1952, all data is stored least-significant byte first.  For VER 1 of the data,
       all values are 16 bits long (2 bytes) and are unsigned integers.
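       A Python sketch of decoding and encoding the VER 1 data that follows LEN in the "RA"
       subfield.  The names RAInfo, decode_ra and encode_ra are hypothetical; the layout is
       the one shown above, with every value a little-endian unsigned 16-bit integer.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RAInfo:
    ver: int                # VER: format version of the data
    chlen: int              # CHLEN: uncompressed length of each chunk
    chunk_sizes: List[int]  # the CHCNT words: compressed size of each chunk


def decode_ra(data: bytes) -> RAInfo:
    """Decode the subfield data that follows LEN in an 'RA' subfield."""
    if len(data) < 6:
        raise ValueError("RA data shorter than VER, CHLEN and CHCNT")
    ver, chlen, chcnt = struct.unpack_from("<HHH", data, 0)  # LSB first
    if ver != 1:
        raise ValueError(f"only VER 1 is handled here, got {ver}")
    if len(data) < 6 + 2 * chcnt:
        raise ValueError("RA data holds fewer than CHCNT entries")
    sizes = list(struct.unpack_from(f"<{chcnt}H", data, 6))
    return RAInfo(ver, chlen, sizes)


def encode_ra(chlen: int, chunk_sizes: List[int]) -> bytes:
    """Build VER 1 'RA' data; struct raises an error if a value exceeds 16 bits."""
    return struct.pack(f"<3H{len(chunk_sizes)}H", 1, chlen, len(chunk_sizes), *chunk_sizes)
```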

       XLEN (which is specified earlier in the header) is a two byte integer, so the extra  field
       can be 0xffff bytes long, 2 bytes of which are used for the subfield ID (SI1 and SI2), and
       2 bytes of which are used for the subfield length (LEN).  This leaves 0xfffb bytes (0x7ffd
       2-byte  entries or 0x3ffe 4-byte entries).  Given that the zip output buffer must be 10% +
       12 bytes larger than the input buffer, we can store 58969 bytes per entry, or about  1.8GB
       if the 2-byte entries are used.  If this becomes a limiting factor, another format version
       can be selected and defined for 4-byte entries.
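       The capacity estimate can be reproduced in a few lines of Python.  Like the text, the
       sketch takes the 58969-byte figure as given and ignores the 6 bytes used by VER, CHLEN
       and CHCNT (counting them would lower the 2-byte entry count by 3).  The result is
       expressed in binary gigabytes (GiB), which is how the "about 1.8GB" figure works out.

```python
XLEN_MAX = 0xFFFF          # XLEN is a two-byte integer
SUBFIELD_OVERHEAD = 2 + 2  # SI1/SI2 plus LEN
CHUNK_BYTES = 58969        # uncompressed bytes per entry, as stated in the text

usable = XLEN_MAX - SUBFIELD_OVERHEAD  # bytes left for subfield data
entries_2byte = usable // 2            # number of two-byte entries
entries_4byte = usable // 4            # number of four-byte entries
max_bytes = entries_2byte * CHUNK_BYTES

print(hex(usable), hex(entries_2byte), hex(entries_4byte))  # 0xfffb 0x7ffd 0x3ffe
print(f"{max_bytes:,} bytes = {max_bytes / 2**30:.2f} GiB")  # 1,932,119,285 bytes = 1.80 GiB
```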

       For compression, the file is divided into "chunks" of data.  Each chunk is less than 64kB
       and can be compressed into an area that is also less than 64kB long (taking
       incompressible data into account -- usually the data is compressed into a block that is
       much smaller than the original).  The CHLEN field specifies the length of a "chunk" of
       data.  The CHCNT field specifies how many chunks are present, and the CHCNT words of data
       specify how long each chunk is after compression (i.e., in the current compressed file).
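       The chunking scheme can be sketched with Python's zlib module.  This is an illustration
       under stated assumptions, not dictzip's own code: it assumes each chunk is compressed at
       maximal level into one raw DEFLATE stream and closed with a full flush (Z_FULL_FLUSH),
       which byte-aligns the output and resets the compression history, so any chunk can later
       be inflated on its own.  The function name compress_chunks is hypothetical.

```python
import zlib
from typing import List, Tuple


def compress_chunks(data: bytes, chlen: int) -> Tuple[bytes, List[int]]:
    """Compress `data` in CHLEN-sized chunks.

    Returns the raw DEFLATE stream and the compressed size of every chunk
    (the values that would go into the CHCNT words)."""
    if not 0 < chlen <= 0xFFFF:
        raise ValueError("CHLEN must fit in 16 bits")
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # maximal level, raw DEFLATE
    pieces: List[bytes] = []
    sizes: List[int] = []
    for start in range(0, len(data), chlen):
        # A full flush ends the chunk on a byte boundary and resets the history,
        # so this chunk does not refer back to data in earlier chunks.
        out = comp.compress(data[start:start + chlen]) + comp.flush(zlib.Z_FULL_FLUSH)
        if len(out) > 0xFFFF:
            raise ValueError("compressed chunk does not fit a 16-bit size entry")
        pieces.append(out)
        sizes.append(len(out))
    # Terminate the DEFLATE stream; in this sketch these bytes belong to no chunk.
    pieces.append(comp.flush())
    return b"".join(pieces), sizes
```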

       To  perform  random  access on the data, the offset and length of the data are provided to
       library routines.  These routines determine the chunk in which the desired data begins
       and decompress that chunk.  Consecutive chunks are decompressed as necessary.
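       The lookup reduces to integer division by CHLEN plus a running sum of the compressed
       chunk sizes.  The Python sketch below assumes the chunks form a raw DEFLATE stream in
       which every chunk ends with a full flush (Z_FULL_FLUSH), so each chunk can be inflated
       from a fresh state, and that `stream` begins at the first chunk (in a real .dz file the
       gzip header comes first).  The name read_range is hypothetical and is not the dictd
       library API.

```python
import zlib
from itertools import accumulate
from typing import List


def read_range(stream: bytes, chlen: int, sizes: List[int],
               offset: int, length: int) -> bytes:
    """Return `length` uncompressed bytes starting at `offset`, inflating
    only the chunks that overlap the requested range."""
    if length <= 0:
        return b""
    # Compressed start of each chunk: running sum of the CHCNT sizes.
    starts = [0, *accumulate(sizes)]
    first = offset // chlen                  # chunk holding the first byte
    last = (offset + length - 1) // chlen    # chunk holding the last byte
    if last >= len(sizes):
        last = len(sizes) - 1                # clamp reads that run past the end
    out = bytearray()
    for i in range(first, last + 1):
        inflater = zlib.decompressobj(wbits=-15)  # raw DEFLATE, fresh state per chunk
        out += inflater.decompress(stream[starts[i]:starts[i + 1]])
    skip = offset - first * chlen            # position of `offset` inside chunk `first`
    return bytes(out[skip:skip + length])
```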

TRADEOFFS

       Speed  True  random file access is not realized, since any access, even for a single byte,
              requires that a 64kB chunk be read and decompressed.  This is slower than accessing
              a flat text file, but is much, much faster than performing serial access on a fully
              compressed file.

       Space  For the textual dictionary databases we are working with, the use  of  64kB  chunks
              and maximal LZ77 compression realizes a file which is only about 4% larger than the
              same file compressed all at once.

OPTIONS

       -d or --decompress
              Decompress.  This is the default if the executable is called dictunzip.

       -c or --stdout
              Write output on standard output; keep  original  files  unchanged.   This  is  only
              available  when  decompressing (because parts of the header must be updated after a
              write when compressing).

       -f or --force
              Force compression or decompression even if the output file already exists.

       -h or --help
              Display help.

       -k or --keep
              Do not delete the original file.

       -l or --list
              For each compressed file, list the following fields:

                  type: dzip, gzip, or text (includes files in unknown formats)
                  crc: CRC checksum
                  date and time: from header
                  chunks: number of chunks in file
                  size: size of each uncompressed chunk
                  compr.: compressed size
                  uncompr.: uncompressed size
                  ratio: compression ratio (0.0% if unknown)
                  name: name of uncompressed file

              Unlike gzip, the compression method is not detected.

       -L or --license
              Display the dictzip license and quit.

       -t or --test
              Check the compressed file integrity.  This option is not implemented.  Instead,  it
              will list the header information.

       -n or --no-name
              Don't save the original filename and timestamp.

       -v or --verbose
              Verbose. Display extra information during compression.

       -V or --version
              Version. Display the version number and compilation options, then quit.

       -s start or --start start
              Specify the offset at which to start decompression, using decimal numbers.  The
              default is the beginning of the file.

       -e size or --size size
              Specify the size of the portion of the file to decompress, using  decimal  numbers.
              The default is the whole file.

       -S start or --Start start
              Specify the offset at which to start decompression, using base64 numbers.  The
              default is the beginning of the file.

       -E size or --Size size
              Specify the size of the portion of the file to decompress,  using  base64  numbers.
              The default is the whole file.

       -p prefilter or --pre prefilter
              Specify  a shell command to execute as a filter before compression or decompression
              of a chunk.   The  pre-  and  post-compression  filters  can  be  used  to  provide
              additional compression or output formatting.  The filters must not increase the
              buffer size significantly.  The pre- and post-compression filters were designed to
              provide the most general interface possible.

       -P postfilter or --post postfilter
              Specify a shell command to execute as a filter after compression or decompression.
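       The -S and -E options take base64 numbers rather than decimal ones.  The Python sketch
       below converts between the two, assuming the digit alphabet used in dictd's .index files
       (A-Z, a-z, 0-9, '+', '/' for the values 0 through 63, most significant digit first, no
       padding).  The alphabet is an assumption here rather than something this page
       specifies, and the function names are hypothetical.

```python
# Assumed digit alphabet: index in this string is the digit's value (0..63).
B64_DIGITS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_to_int(text: str) -> int:
    """Convert a base64 number (most significant digit first) to an int."""
    if not text:
        raise ValueError("empty base64 number")
    value = 0
    for ch in text:
        digit = B64_DIGITS.find(ch)
        if digit < 0:
            raise ValueError(f"invalid base64 digit {ch!r}")
        value = value * 64 + digit
    return value


def int_to_b64(value: int) -> str:
    """Convert a non-negative int to a base64 number with no leading zeros."""
    if value < 0:
        raise ValueError("negative values are not representable")
    if value == 0:
        return B64_DIGITS[0]
    digits = []
    while value:
        value, rem = divmod(value, 64)
        digits.append(B64_DIGITS[rem])
    return "".join(reversed(digits))
```

       Under that assumption, -S BA -E CA would select the same range as -s 64 -e 128.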

CREDITS

       dictzip  was written by Rik Faith (faith@cs.unc.edu) and is distributed under the terms of
       the GNU General Public License.  If you need to distribute under other terms, write to the
       author.

       The main libraries used by this program (zlib, regex, libmaa) are distributed under
       different terms, so you may be able to  use  the  libraries  for  applications  which  are
       incompatible with the GPL -- please see the copyright notices and license information that
       come with the libraries for more information, and consult with your  attorney  to  resolve
       these issues.

SEE ALSO

       dict(1), dictd(8), gzip(1), gunzip(1), zcat(1)

                                           22 Jun 1997                                 DICTZIP(1)