Provided by: dictzip_1.12.1+dfsg-8_amd64

NAME

       dictzip, dictunzip - compress (or expand) files, allowing random access

SYNOPSIS

       dictzip [options] name
       dictunzip [options] name

DESCRIPTION

       dictzip  compresses  files  using the gzip(1) algorithm (LZ77) in a manner which is completely compatible
       with the gzip file format.  An extension to the gzip file format (Extra Field, described  in  2.3.1.1  of
       RFC 1952) allows extra data to be stored in the header of a compressed file.  Programs like gzip and zcat
       will ignore this extra data.  However, dictd(8), the DICT protocol dictionary server, will make  use  of
       this  data  to perform pseudo-random access on the file.  Files in the dictzip format should end in ".dz"
       so that they may be distinguished from  common  gzip  files  that  do  not  contain  the  special  header
       information.

       From RFC 1952, the extra field is specified as follows:

              If  the  FLG.FEXTRA  bit is set, an "extra field" is present in the header, with total length XLEN
              bytes.  It consists of a series of subfields, each of the form:

              +---+---+---+---+==================================+
              |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
              +---+---+---+---+==================================+

              SI1 and SI2 provide a subfield ID, typically two ASCII letters with some  mnemonic  value.   Jean-
              Loup  Gailly <gzip@prep.ai.mit.edu> is maintaining a registry of subfield IDs; please send him any
              subfield ID you wish to use.  Subfield IDs with SI2 = 0 are reserved for future use.

              LEN gives the length of the subfield data, excluding the 4 initial bytes.

       The dictzip program uses 'R' for SI1, and 'A' for SI2 (i.e., "Random Access").  After the LEN field,  the
       data is arranged as follows:

       +---+---+---+---+---+---+===============================+
       |  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
       +---+---+---+---+---+---+===============================+

       As  per RFC 1952, all data is stored least-significant byte first.  For VER 1 of the data, all values are
       16-bits long (2 bytes), and are unsigned integers.
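
       The layout above can be read with a few lines of Python.  This is a minimal sketch, not the code  used
       by  dictzip  or  dictd: the function name parse_ra_subfield is hypothetical, and it assumes the caller
       has already read the start of the file (the fixed 10-byte gzip header plus the whole extra field) into
       a bytes object.

```python
import struct


def parse_ra_subfield(header: bytes):
    """Return (ver, chlen, chunk_sizes) from a gzip header, or None if there is no RA subfield."""
    # Fixed part of the gzip header: ID1 ID2 CM FLG MTIME(4) XFL OS = 10 bytes.
    id1, id2, _cm, flg = struct.unpack_from("<BBBB", header, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip file")
    if not flg & 0x04:  # FLG.FEXTRA is bit 2; without it there is no extra field
        return None

    # XLEN follows the fixed header; all values are least-significant byte first.
    (xlen,) = struct.unpack_from("<H", header, 10)
    pos, end = 12, 12 + xlen

    # Walk the subfields: SI1, SI2, LEN, then LEN bytes of data.
    while pos + 4 <= end:
        si1, si2, length = struct.unpack_from("<BBH", header, pos)
        pos += 4
        if (si1, si2) == (ord("R"), ord("A")):
            ver, chlen, chcnt = struct.unpack_from("<HHH", header, pos)
            sizes = struct.unpack_from("<%dH" % chcnt, header, pos + 6)
            return ver, chlen, list(sizes)
        pos += length  # skip a subfield that belongs to some other program
    return None
```

       The returned chunk_sizes list holds the CHCNT compressed chunk lengths in file order.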

       XLEN (which is specified earlier in the header) is a two byte integer, so the extra field can  be  0xffff
       bytes  long,  2  bytes of which are used for the subfield ID (SI1 and SI2), and 2 bytes of which are used
       for the subfield length (LEN).  This  leaves  0xfffb  bytes  (0x7ffd  2-byte  entries  or  0x3ffe  4-byte
       entries).   Given  that the zip output buffer must be 10% + 12 bytes larger than the input buffer, we can
       store 58969 bytes per entry, or about 1.8GB if the 2-byte entries are used.  If this becomes  a  limiting
       factor, another format version can be selected and defined for 4-byte entries.
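
       The arithmetic in this paragraph can be checked directly.  The sketch below only reproduces the figures
       quoted  above;  the  variable  names  are illustrative, and like the text it does not subtract the six
       bytes taken by VER, CHLEN and CHCNT.

```python
XLEN_MAX = 0xFFFF          # largest value of the two-byte XLEN field
SUBFIELD_OVERHEAD = 4      # SI1 + SI2 (2 bytes) and LEN (2 bytes)
CHUNK_BYTES = 58969        # bytes per entry, as quoted in the text

remaining = XLEN_MAX - SUBFIELD_OVERHEAD   # bytes left for subfield data
entries_2 = remaining // 2                 # number of 2-byte entries
entries_4 = remaining // 4                 # number of 4-byte entries
max_bytes = entries_2 * CHUNK_BYTES        # addressable uncompressed size

print(hex(remaining), hex(entries_2), hex(entries_4))
print("%.2f GiB" % (max_bytes / 2**30))
```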

       For compression, the file is divided up into "chunks" of data.  Each chunk is less than 64kB, and can  be
       compressed  into  an  area  that is also less than 64kB long (taking incompressible data into account --
       usually the data is compressed into a block that is much smaller than the original).   The  CHLEN  field
       specifies  the  length of a "chunk" of data before compression.  The CHCNT field specifies how many chunks
       are present, and the CHCNT words of data specify how long each chunk is after compression (i.e., in  the
       current compressed file).

       To perform random access on the data, the offset and length of the data are provided to library routines.
       These  routines  determine  the  chunk  in  which  the  desired data begins, and decompress that chunk.
       Consecutive chunks are decompressed as necessary.
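
       The lookup step can be sketched as follows.  This is an illustration under stated assumptions,  not  the
       library  routine  itself: the function name locate and its return tuple are hypothetical, chlen is the
       CHLEN value, and chunk_sizes is the list of CHCNT compressed lengths from the header.

```python
def locate(start, size, chlen, chunk_sizes):
    """Map an uncompressed byte range onto chunk indices and compressed offsets.

    Returns (first, last, comp_offset, comp_length, skip).  comp_offset is
    measured from the start of the compressed data, just after the gzip
    header; skip is how many bytes of the first decompressed chunk precede
    the requested data.
    """
    if start < 0 or size <= 0:
        raise ValueError("start must be >= 0 and size must be positive")
    first = start // chlen                  # chunk holding the first byte
    last = (start + size - 1) // chlen      # chunk holding the last byte
    if last >= len(chunk_sizes):
        raise ValueError("range extends past the last chunk")
    comp_offset = sum(chunk_sizes[:first])            # compressed bytes to skip
    comp_length = sum(chunk_sizes[first:last + 1])    # compressed bytes to read
    skip = start - first * chlen
    return first, last, comp_offset, comp_length, skip
```

       Only the chunks from first through last need to be read and decompressed; every other chunk  is  skipped
       by seeking past its compressed length.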

TRADEOFFS

       Speed  True random file access is not realized, since any access, even for a single byte, requires that a
              64kB chunk be read and decompressed.  This is slower than accessing a flat text file, but is much,
              much faster than performing serial access on a fully compressed file.

       Space  For the textual dictionary databases we are working with, the use of 64kB chunks and maximal  LZ77
              compression  realizes  a  file  which is only about 4% larger than the same file compressed all at
              once.

OPTIONS

       -d or --decompress
              Decompress.  This is the default if the executable is called dictunzip.

       -c or --stdout
              Write output on standard output; keep original files  unchanged.   This  is  only  available  when
              decompressing (because parts of the header must be updated after a write when compressing).

       -f or --force
              Force compression or decompression even if the output file already exists.

       -h or --help
              Display help.

       -k or --keep
              Do not delete the original file.

       -l or --list
              For each compressed file, list the following fields:

                  type: dzip, gzip, or text (includes files in unknown formats)
                  crc: CRC checksum
                  date and time: from header
                  chunks: number of chunks in file
                  size: size of each uncompressed chunk
                  compr.: compressed size
                  uncompr.: uncompressed size
                  ratio: compression ratio (0.0% if unknown)
                  name: name of uncompressed file

              Unlike gzip, the compression method is not detected.

       -L or --license
              Display the dictzip license and quit.

       -t or --test
              Check  the  compressed file integrity.  This option is not implemented.  Instead, it will list the
              header information.

       -n or --no-name
              Don't save the original filename and timestamp.

       -v or --verbose
              Verbose. Display extra information during compression.

       -V or --version
              Version. Display the version number and compilation options then quit.

       -s start or --start start
              Specify the offset to start decompression, using decimal numbers.  The default is at the beginning
              of the file.

       -e size or --size size
              Specify  the size of the portion of the file to decompress, using decimal numbers.  The default is
              the whole file.

       -S start or --Start start
              Specify the offset to start decompression, using base64 numbers.  The default is at the  beginning
              of the file.

       -E size or --Size size
              Specify  the  size of the portion of the file to decompress, using base64 numbers.  The default is
              the whole file.
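
              This page does not define the base64 number format.  The sketch  below  assumes  it  matches  the
              notation  used  for offsets in dictd index files: one base-64 digit per character, drawn from the
              standard base64 alphabet, most significant digit first.  The helper names  b64_encode  and
              b64_decode are hypothetical.

```python
# Assumed digit alphabet: value 0 is "A", value 63 is "/".
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(n):
    """Encode a non-negative integer as base-64 digits, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = ""
    while True:
        digits = B64[n % 64] + digits
        n //= 64
        if n == 0:
            return digits


def b64_decode(s):
    """Decode a string of base-64 digits back into an integer."""
    n = 0
    for ch in s:
        n = n * 64 + B64.index(ch)  # raises ValueError on a character outside the alphabet
    return n
```

              Under that assumption, an offset and size taken from an index file  could  be  passed  to  -S  and
              -E unchanged, while -s and -e would take the decoded decimal values.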

       -p prefilter or --pre prefilter
              Specify a shell command to execute as a filter before compression or  decompression  of  a  chunk.
              The  pre-  and  post-compression  filters  can be used to provide additional compression or output
              formatting.  The filters may not increase the buffer  size  significantly.   The  pre-  and  post-
              compression filters were designed to provide the most general interface possible.

       -P postfilter or --post postfilter
              Specify a shell command to execute as a filter after compression or decompression.

CREDITS

       dictzip was written by Rik Faith (faith@cs.unc.edu) and is distributed under the terms of the GNU General
       Public License.  If you need to distribute under other terms, write to the author.

       The main libraries used by this program (zlib, regex, libmaa) are distributed under different terms,  so
       you  may  be able to use the libraries for applications which are incompatible with the GPL -- please see
       the copyright notices and license information that come with the  libraries  for  more  information,  and
       consult with your attorney to resolve these issues.

SEE ALSO

       dict(1), dictd(8), gzip(1), gunzip(1), zcat(1)

                                                   22 Jun 1997                                        DICTZIP(1)