Ubuntu Manpage: bgzip - Block compression/decompression utility

Provided by: tabix_1.13+ds-2build1_amd64

NAME

       bgzip - Block compression/decompression utility

SYNOPSIS

       bgzip  [-cdfghir]  [-b  virtualOffset] [-I index_name] [-l compression_level] [-s size] [-@
       threads] [file]

DESCRIPTION

       Bgzip compresses files in a similar manner to, and compatible with, gzip(1).  The file  is
       compressed  into  a series of small (less than 64K) 'BGZF' blocks.  This allows indexes to
       be built against the compressed file and used to retrieve portions  of  the  data  without
       having to decompress the entire file.

       If no file is specified on the command line, bgzip compresses (or, with the -d option,
       decompresses) standard input to standard output.  If a file is specified, it is
       compressed (or decompressed with -d).  If the -c option is used, the result is written
       to standard output and the input file is left in place.  Otherwise, when compressing,
       bgzip writes to a new file with a .gz suffix and removes the original.  When
       decompressing, the input file must have a .gz suffix, which is removed to form the
       output name; once decompression completes, the input file is removed.
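
       The naming rule described above can be sketched in Python as follows.  This only
       illustrates the behaviour stated in this section and is not bgzip's own code; the
       helper name output_name is hypothetical, and the effects of -c and -f are not modelled.

```python
def output_name(path: str, decompress: bool = False) -> str:
    """Return the output file name implied by the naming rule in DESCRIPTION.

    Hypothetical helper for illustration; it does not touch the file system.
    """
    if not decompress:
        # Compressing: the output name is the input name plus ".gz".
        return path + ".gz"
    if not path.endswith(".gz"):
        # Decompressing: the input is expected to carry a .gz suffix.
        raise ValueError(f"{path}: missing .gz suffix")
    # Decompressing: strip the suffix to form the output name.
    return path[: -len(".gz")]
```

       For instance, output_name("/tmp/words") gives the name used when compressing, and
       output_name("/tmp/words.gz", decompress=True) gives the name used when decompressing.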

OPTIONS

       -b, --offset INT
              Decompress to standard output from virtual file position (0-based uncompressed
              offset). Implies -c and -d.

       -c, --stdout
              Write to standard output, keep original files unchanged.

       -d, --decompress
              Decompress.

       -f, --force
              Overwrite files without asking, or decompress files that don't have a known
              compression filename extension (e.g., .gz) without asking. Use --force twice to
              do both without asking.

       -h, --help
              Displays a help message.

       -i, --index
              Create a BGZF index while compressing. Unless the -I option is used, this will
              have the name of the compressed file with .gzi appended to it.

       -I, --index-name FILE
              Index file name.

       -l, --compress-level INT
              Compression level to use when compressing. From 0 to 9, or -1 for the default
              level set by the compression library. [-1]

       -r, --reindex
              Rebuild the index on an existing compressed file.

       -g, --rebgzip
              Try to use an existing index to create a compressed file with matching block
              offsets. Note that this assumes that the same compression library and level are
              in use as when making the original file. Don't use it unless you know what
              you're doing.

       -s, --size INT
              Decompress INT bytes (uncompressed size) to standard output. Implies -c.

       -@, --threads INT
              Number of threads to use [1].

BGZF FORMAT

       The  BGZF  format  written by bgzip is described in the SAM format specification available
       from http://samtools.github.io/hts-specs/SAMv1.pdf.

       It makes use of a gzip feature which allows compressed files to be concatenated.  The
       input data is divided into blocks which are no larger than 64 kilobytes both before and
       after compression (including compression headers).  Each block is compressed into its
       own gzip member, so a BGZF file is a concatenation of complete gzip streams.  The gzip
       header of each block includes an extra sub-field with identifier 'BC' holding the length
       of the compressed block, including all headers (stored as the total block size minus one).
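
       As a rough illustration of this layout, the Python sketch below builds two blocks by
       hand and reads the 'BC' sub-field back.  It is not the htslib implementation: the
       function names make_block and block_size are hypothetical, the field layout follows
       the SAM specification (where the stored value is the total block size minus one), and
       the end-of-file marker block that real BGZF files carry is omitted.

```python
import gzip
import struct
import zlib


def make_block(data: bytes, level: int = 6) -> bytes:
    """Build one BGZF-style block: a gzip member with a 'BC' extra sub-field."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)  # raw deflate, no zlib header
    cdata = comp.compress(data) + comp.flush()
    total = 12 + 6 + len(cdata) + 8  # fixed header + extra field + data + trailer
    if total > 65536:
        raise ValueError("block would exceed 64 KiB after compression")
    # ID1, ID2, CM=deflate, FLG=FEXTRA, MTIME, XFL, OS=unknown, XLEN
    header = struct.pack("<4BI2BH", 31, 139, 8, 4, 0, 0, 255, 6)
    # SI1='B', SI2='C', SLEN=2, then total block size minus one
    extra = struct.pack("<2BHH", 66, 67, 2, total - 1)
    # CRC32 and length of the uncompressed data
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer


def block_size(buf: bytes, offset: int = 0) -> int:
    """Return the total size of the block starting at `offset`, from its 'BC' field."""
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from("<4BI2BH", buf, offset)
    if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
        raise ValueError("not a gzip member with an extra field")
    pos = offset + 12
    end = pos + xlen
    while pos + 4 <= end:  # walk the extra sub-fields
        si1, si2, slen = struct.unpack_from("<2BH", buf, pos)
        if (si1, si2, slen) == (66, 67, 2):
            (bsize,) = struct.unpack_from("<H", buf, pos + 4)
            return bsize + 1
        pos += 4 + slen
    raise ValueError("no 'BC' sub-field found")


first = make_block(b"hello ")
second = make_block(b"world\n")
stream = first + second

# An ordinary gzip reader sees the concatenated members as one stream.
restored = gzip.decompress(stream)

# The 'BC' field lets a reader hop from block to block without inflating anything.
second_start = block_size(stream, 0)
```

       Here restored holds the original bytes, and second_start is the offset at which the
       second block begins.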

GZI FORMAT

       The index format is a binary file listing pairs of compressed and uncompressed offsets  in
       a  BGZF  file.   Each  compressed  offset  points  to  the  start  of  a  BGZF block.  The
       uncompressed offset is the corresponding location in the uncompressed data stream.

       All values are stored as little-endian 64-bit unsigned integers.

       The file contents are:

           uint64_t number_entries

       followed by number_entries pairs of:

           uint64_t compressed_offset
           uint64_t uncompressed_offset
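
       A minimal Python sketch of a reader for this layout is shown below.  The function name
       parse_gzi is hypothetical and the sample bytes are made up for illustration; only the
       layout itself comes from the description above.

```python
import struct


def parse_gzi(raw: bytes) -> list:
    """Decode .gzi contents into a list of (compressed_offset, uncompressed_offset)."""
    if len(raw) < 8:
        raise ValueError("index too short to hold number_entries")
    (number_entries,) = struct.unpack_from("<Q", raw, 0)  # little-endian uint64
    expected = 8 + 16 * number_entries
    if len(raw) < expected:
        raise ValueError("index is truncated")
    # Each entry is a pair of little-endian uint64 values.
    return [struct.unpack_from("<QQ", raw, 8 + 16 * i) for i in range(number_entries)]


# Made-up index with two entries, used only to exercise the parser.
sample = struct.pack("<Q", 2) + struct.pack("<QQ", 100, 65280) + struct.pack("<QQ", 200, 130560)
entries = parse_gzi(sample)
```

       To read a real index, the bytes could be obtained with something like
       open("/tmp/words.gz.gzi", "rb").read() before calling parse_gzi.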

EXAMPLES

           # Compress stdin to stdout
           bgzip < /usr/share/dict/words > /tmp/words.gz

           # Make a .gzi index
           bgzip -r /tmp/words.gz

           # Extract part of the data using the index
           bgzip -b 367635 -s 4 /tmp/words.gz

           # Uncompress the whole file, removing the compressed copy
           bgzip -d /tmp/words.gz
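
       To make the -b example more concrete, the Python sketch below shows one way an
       uncompressed offset such as 367635 could be mapped to a block using index entries.
       This is an assumption about how such a lookup can be done, not a description of bgzip's
       internals: the function name locate is hypothetical, the index entries are invented,
       and the sketch assumes the first block at offset 0 may be missing from the index.

```python
from bisect import bisect_right


def locate(entries, target: int):
    """Map an uncompressed offset to (compressed block offset, bytes to skip in block).

    `entries` is a list of (compressed_offset, uncompressed_offset) pairs.
    """
    if target < 0:
        raise ValueError("offset must be non-negative")
    # Assumption: the block at the very start of the file is added if not listed.
    table = sorted(set([(0, 0)] + list(entries)), key=lambda e: e[1])
    starts = [uncompressed for _, uncompressed in table]
    # Pick the last block whose uncompressed start is at or before the target.
    i = bisect_right(starts, target) - 1
    compressed, uncompressed = table[i]
    return compressed, target - uncompressed


# Invented entries: one block per 65280 uncompressed bytes, 20000 compressed bytes each.
fake_entries = [(i * 20000, i * 65280) for i in range(1, 7)]
block_offset, skip = locate(fake_entries, 367635)
```

       With these invented entries the target falls in the block that starts at uncompressed
       offset 326400, so a reader would seek to that block, inflate it, and skip the leading
       bytes before returning the requested data.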

AUTHOR

       The BGZF library was originally implemented by Bob Handsaker and modified by Heng  Li  for
       remote file access and in-memory caching.