Ubuntu Manpage: dictzip, dictunzip - compress (or expand) files, allowing random access

Provided by: dictzip_1.12.1+dfsg-8_amd64

NAME

       dictzip, dictunzip - compress (or expand) files, allowing random access

SYNOPSIS

       dictzip [options] name
       dictunzip [options] name

DESCRIPTION

       dictzip  compresses  files  using the gzip(1) algorithm (LZ77) in a manner which is completely compatible
       with the gzip file format.  An extension to the gzip file format (Extra Field, described  in  2.3.1.1  of
       RFC 1952) allows extra data to be stored in the header of a compressed file.  Programs like gzip and zcat
       will ignore this extra data.  However, dictd(8), the DICT protocol dictionary server, will make use of
       this  data  to perform pseudo-random access on the file.  Files in the dictzip format should end in ".dz"
       so that they may be distinguished from  common  gzip  files  that  do  not  contain  the  special  header
       information.

       From RFC 1952, the extra field is specified as follows:

              If  the  FLG.FEXTRA  bit is set, an "extra field" is present in the header, with total length XLEN
              bytes.  It consists of a series of subfields, each of the form:

              +---+---+---+---+==================================+
              |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
              +---+---+---+---+==================================+

              SI1 and SI2 provide a subfield ID, typically two ASCII letters with some  mnemonic  value.   Jean-
              Loup  Gailly <gzip@prep.ai.mit.edu> is maintaining a registry of subfield IDs; please send him any
              subfield ID you wish to use.  Subfield IDs with SI2 = 0 are reserved for future use.

              LEN gives the length of the subfield data, excluding the 4 initial bytes.

       The dictzip program uses 'R' for SI1, and 'A' for SI2 (i.e., "Random Access").  After the LEN field,  the
       data is arranged as follows:

       +---+---+---+---+---+---+===============================+
       |  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
       +---+---+---+---+---+---+===============================+

       As  per RFC 1952, all data is stored least-significant byte first.  For VER 1 of the data, all values are
       16 bits long (2 bytes), and are unsigned integers.
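
       The layout above can be illustrated with a minimal Python sketch that walks the extra field of a gzip
       header and unpacks the "RA" subfield.  The function name parse_ra_subfield is hypothetical and is not
       part of dictzip; the header offsets (FLG at byte 3, XLEN at byte 10) are taken from RFC 1952.

```python
import struct

FEXTRA = 0x04  # FLG bit that signals an extra field (RFC 1952)


def parse_ra_subfield(header: bytes):
    """Return (ver, chlen, chunk_sizes) from a gzip header, or None.

    Hypothetical helper for illustration only; `header` must contain at
    least the fixed 10-byte gzip header plus the whole extra field.
    """
    if len(header) < 12 or header[0:2] != b"\x1f\x8b":
        raise ValueError("not a gzip header")
    if not header[3] & FEXTRA:
        return None                      # ordinary gzip file, no extra field
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        # Each subfield: SI1, SI2, then a 2-byte little-endian LEN.
        si1, si2, length = struct.unpack_from("<BBH", extra, pos)
        data = extra[pos + 4:pos + 4 + length]
        if (si1, si2) == (ord("R"), ord("A")):
            # VER, CHLEN, CHCNT, then CHCNT 16-bit compressed chunk sizes.
            ver, chlen, chcnt = struct.unpack_from("<HHH", data, 0)
            sizes = struct.unpack_from("<%dH" % chcnt, data, 6)
            return ver, chlen, list(sizes)
        pos += 4 + length
    return None
```

       A caller would read the first 12 + XLEN bytes of a ".dz" file and pass them in; a plain gzip file
       without the subfield yields None.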

       XLEN (which is specified earlier in the header) is a two byte integer, so the extra field can  be  0xffff
       bytes long, 2 bytes of which are used for the subfield ID (SI1 and SI2), and 2 bytes of which are used
       for the subfield length (LEN).  This  leaves  0xfffb  bytes  (0x7ffd  2-byte  entries  or  0x3ffe  4-byte
       entries).   Given  that the zip output buffer must be 10% + 12 bytes larger than the input buffer, we can
       store 58969 bytes per entry, or about 1.8GB if the 2-byte entries are used.  If this becomes  a  limiting
       factor, another format version can be selected and defined for 4-byte entries.
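
       The arithmetic in the preceding paragraph can be reproduced with a few lines of Python.  This is only
       a worked check of the figures quoted above; the variable names are illustrative.  Like the text, it
       does not subtract the 6 bytes taken by VER, CHLEN and CHCNT.

```python
XLEN_MAX = 0xFFFF            # largest value of the 2-byte XLEN field
SUBFIELD_OVERHEAD = 4        # SI1, SI2 and the 2-byte LEN

available = XLEN_MAX - SUBFIELD_OVERHEAD   # bytes left for subfield data
entries_2byte = available // 2             # number of 16-bit entries
entries_4byte = available // 4             # number of 32-bit entries

chunk_bytes = 58969                        # uncompressed bytes per entry
# Worst-case output buffer: input size plus 10% plus 12 bytes.
worst_case = chunk_bytes * 11 // 10 + 12

max_file_bytes = entries_2byte * chunk_bytes

print(hex(available), hex(entries_2byte), hex(entries_4byte))
print(worst_case, worst_case < 0x10000)    # must fit in a 16-bit size
print(max_file_bytes, round(max_file_bytes / 2**30, 2))
```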

       For compression, the file is divided up into "chunks" of data.  Each chunk is less than 64kB, and can be
       compressed into an area that is also less than 64kB long (taking incompressible data into account --
       usually the data is compressed into a block that is much smaller than the original).  The CHLEN field
       specifies the length of a "chunk" of data.  The CHCNT field specifies how many chunks are present, and the
       CHCNT words of data specify how long each chunk is after compression (i.e., in the current compressed
       file).

       To perform random access on the data, the offset and length of the data are provided to library routines.
       These routines determine the chunk in which the desired data begins, and decompress that chunk.
       Consecutive chunks are decompressed as necessary.
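
       The lookup described above can be sketched in Python as follows.  This is not the dictd library code:
       the function names locate, read_range and make_chunked_body are hypothetical, and the sketch assumes
       that every chunk is a raw deflate segment ended with a full flush, so that each one can be inflated
       on its own.  The helper make_chunked_body only builds test data in that assumed form; it does not
       write a complete ".dz" file.

```python
import zlib


def locate(chlen, sizes, offset, length):
    """Map an uncompressed (offset, length) onto the chunks covering it.

    Returns (first, last, comp_start, skip): inclusive chunk indices, the
    position of chunk `first` inside the compressed data area, and the
    number of leading bytes to drop from the decompressed result.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be positive")
    first = offset // chlen
    last = (offset + length - 1) // chlen
    if last >= len(sizes):
        raise ValueError("request extends past the last chunk")
    return first, last, sum(sizes[:first]), offset % chlen


def read_range(body, chlen, sizes, offset, length):
    """Decompress only the chunks needed and return the requested bytes."""
    first, last, pos, skip = locate(chlen, sizes, offset, length)
    out = bytearray()
    for size in sizes[first:last + 1]:
        # Assumption: each chunk starts on a clean deflate boundary, so a
        # fresh raw-deflate decompressor can decode it in isolation.
        inflater = zlib.decompressobj(-zlib.MAX_WBITS)
        out += inflater.decompress(body[pos:pos + size])
        pos += size
    return bytes(out[skip:skip + length])


def make_chunked_body(data, chlen):
    """Build demo input: raw deflate with a full flush after every chunk."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    pieces = []
    for i in range(0, len(data), chlen):
        piece = comp.compress(data[i:i + chlen]) + comp.flush(zlib.Z_FULL_FLUSH)
        pieces.append(piece)
    return b"".join(pieces), [len(p) for p in pieces]
```

       With a chunk length of 1000, a request for 1200 bytes at offset 2500 touches chunks 2 and 3 only;
       everything before and after them stays compressed.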

TRADEOFFS

       Speed  True random file access is not realized, since any access, even for a single byte, requires that a
              64kB chunk be read and decompressed.  This is slower than accessing a flat text file, but is much,
              much faster than performing serial access on a fully compressed file.

       Space  For the textual dictionary databases we are working with, the use of 64kB chunks and maximal  LZ77
              compression  realizes  a  file  which is only about 4% larger than the same file compressed all at
              once.

OPTIONS

-d or --decompress
Decompress. This is the default if the executable is called dictunzip.

-c or --stdout
Write output on standard output; keep original files unchanged. This is only available when
decompressing (because parts of the header must be updated after a write when compressing).

-f or --force
Force compression or decompression even if the output file already exists.

-h or --help
Display help.

-k or --keep
Do not delete the original file.

-l or --list
For each compressed file, list the following fields:

type: dzip, gzip, or text (includes files in unknown formats)
crc: CRC checksum
date and time: from header
chunks: number of chunks in file
size: size of each uncompressed chunk
compr.: compressed size
uncompr.: uncompressed size
ratio: compression ratio (0.0% if unknown)
name: name of uncompressed file

Unlike gzip, the compression method is not detected.

-L or --license
Display the dictzip license and quit.

-t or --test
Check the compressed file integrity. This option is not implemented. Instead, it will list the
header information.

-n or --no-name
Don't save the original filename and timestamp.

-v or --verbose
Verbose. Display extra information during compression.

-V or --version
Version. Display the version number and compilation options then quit.

-s start or --start start
Specify the offset to start decompression, using decimal numbers. The default is at the beginning
of the file.

-e size or --size size
Specify the size of the portion of the file to decompress, using decimal numbers. The default is
the whole file.

-S start or --Start start
Specify the offset to start decompression, using base64 numbers. The default is at the beginning
of the file.

-E size or --Size size
Specify the size of the portion of the file to decompress, using base64 numbers. The default is
the whole file.

-p prefilter or --pre prefilter
Specify a shell command to execute as a filter before compression or decompression of a chunk.
The pre- and post-compression filters can be used to provide additional compression or output
formatting. The filters may not increase the buffer size significantly. The pre- and post-
compression filters were designed to provide the most general interface possible.

-P postfilter or --post postfilter
Specify a shell command to execute as a filter after compression or decompression.
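
The -S and -E options above take "base64 numbers", which this page does not define. The sketch below
assumes they use the same positional encoding as the offsets and sizes in dictd index files: each
character is one base-64 digit from the standard base64 alphabet, most significant digit first, with no
padding. The function names are hypothetical and are not part of dictzip.

```python
B64_DIGITS = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_number_decode(text):
    """Convert a base64 number (assumed dictd index style) to an int."""
    if not text:
        raise ValueError("empty base64 number")
    value = 0
    for ch in text:
        digit = B64_DIGITS.find(ch)
        if digit < 0:
            raise ValueError("invalid base64 digit: %r" % ch)
        value = value * 64 + digit
    return value


def b64_number_encode(value):
    """Convert a non-negative int to a base64 number without padding."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = ""
    while True:
        digits = B64_DIGITS[value % 64] + digits
        value //= 64
        if value == 0:
            return digits
```

Under that assumption, an offset and size copied from an index file line could be passed to -S and -E
directly, while -s and -e would take the decoded decimal values.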

CREDITS