Ubuntu Manpage: bgzip - Block compression/decompression utility

NAME

       bgzip - Block compression/decompression utility

       tabix - Generic indexer for TAB-delimited genome position files

SYNOPSIS

       bgzip [-cdhB] [-b virtualOffset] [-s size] [file]

       tabix  [-0lf]  [-p  gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol] [-S lineSkip] [-c
       metaChar] in.tab.bgz [region1 [region2 [...]]]

DESCRIPTION

       Tabix indexes a TAB-delimited genome position file in.tab.bgz and creates an index file
       in.tab.bgz.tbi when no region is given on the command line. The input data file must be
       sorted by position and compressed with bgzip, which has a gzip(1)-like interface. After
       indexing, tabix can quickly retrieve data lines overlapping regions specified in the
       format "chr:beginPos-endPos". Fast data retrieval also works over the network if a URI
       is given as the file name; in this case the index file is downloaded if it is not
       present locally.
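
       The region format above can be illustrated with a small shell sketch. The helper
       parse_region is hypothetical and not part of tabix; it assumes the full
       "chr:beginPos-endPos" form is given, that the sequence name contains no ':', and that
       thousands separators (as used in the EXAMPLE section) are simply dropped.

```shell
#!/bin/sh
# Hypothetical helper (not part of tabix): split "chr:beginPos-endPos"
# into its three fields, ignoring thousands separators.
parse_region() {
    region=$(printf '%s' "$1" | tr -d ',')   # 10,000,000 -> 10000000
    name=${region%%:*}                       # text before the first ':'
    range=${region#*:}                       # text after the first ':'
    begin=${range%%-*}                       # text before the first '-'
    end=${range#*-}                          # text after the first '-'
    printf '%s\t%s\t%s\n' "$name" "$begin" "$end"
}

# Prints the sequence name, start and end as three TAB-separated fields.
parse_region "chr1:10,000,000-20,000,000"
```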

OPTIONS OF TABIX

       -p STR    Input format for indexing. Valid values are: gff, bed, sam, vcf and psltab. This
                 option  should  not be applied together with any of -s, -b, -e, -c and -0; it is
                 not used for data retrieval because this setting is stored in  the  index  file.
                 [gff]

       -s INT    Column of sequence name. Options -s, -b, -e, -S, -c and -0 are all stored in the
                 index file and thus not used in data retrieval. [1]

       -b INT    Column of start chromosomal position. [4]

       -e INT    Column of end chromosomal position. The end column can be the same as the  start
                 column. [5]

       -S INT    Skip first INT lines in the data file. [0]

       -c CHAR   Skip lines starting with character CHAR. [#]

       -0        Specify  that  the position in the data file is 0-based (e.g. UCSC files) rather
                 than 1-based.

       -h        Print the header/meta lines.

       -B        The second argument is a BED file. When this option is in use, the input file
                 need not be sorted or indexed. The entire input will be read sequentially.
                 Nonetheless, with this option, the format of the input must be specified
                 correctly on the command line.

       -f        Force overwriting of the index file if it is present.

       -l        List the sequence names stored in the index file.
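
       To illustrate what -0 refers to, the sketch below converts 0-based start positions to
       1-based ones with awk. It assumes the usual UCSC/BED convention of 0-based, half-open
       intervals with the start in column 2, so only the start is shifted and the end is left
       unchanged. The function name to_one_based is hypothetical; tabix itself does not need
       this conversion when -0 (or -p bed) is given at indexing time.

```shell
#!/bin/sh
# Hypothetical helper: rewrite column 2 (0-based start) as a 1-based start.
# Assumes TAB-delimited input with half-open intervals, so the end column
# already equals the 1-based inclusive end and is left untouched.
to_one_based() {
    awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 + 1; print }'
}

# A 0-based interval [0, 100) covers the same bases as 1-based 1..100.
printf 'chr1\t0\t100\n' | to_one_based
```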

EXAMPLE

       (grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;

       tabix -p gff sorted.gff.gz;

       tabix sorted.gff.gz chr1:10,000,000-20,000,000;
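
       The sorting step of the first command can be tried on its own, without bgzip or tabix.
       The sketch below uses a made-up four-line GFF file and a hypothetical function sort_gff;
       it keeps the header lines first and orders the data lines by sequence name and then by
       numeric start position, which is the order tabix expects for GFF input.

```shell
#!/bin/sh
# Toy input, invented for illustration: one header line, three unsorted features.
tmp=$(mktemp)
{
    printf '##gff-version 3\n'
    printf 'chr2\tsrc\tgene\t500\t900\t.\t+\t.\tID=g3\n'
    printf 'chr1\tsrc\tgene\t2000\t2500\t.\t+\t.\tID=g2\n'
    printf 'chr1\tsrc\tgene\t100\t400\t.\t-\t.\tID=g1\n'
} > "$tmp"

# Hypothetical helper: header/meta lines first, then data lines sorted by
# sequence name (column 1) and numeric start position (column 4).
sort_gff() {
    grep '^#' "$1"
    grep -v '^#' "$1" | sort -t "$(printf '\t')" -k1,1 -k4,4n
}

sorted=$(sort_gff "$tmp")
printf '%s\n' "$sorted"
rm -f "$tmp"
```

       In the real pipeline the output of this step is piped into bgzip rather than printed.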

NOTES

       It is straightforward to achieve overlap queries using the standard B-tree index (with or
       without binning) implemented in all SQL databases, or the R-tree index in PostgreSQL and
       Oracle. But there are still many reasons to use tabix. Firstly, tabix works directly with
       many widely used TAB-delimited formats such as GFF/GTF and BED. We do not need to design
       a database schema or specialized binary formats, and data do not need to be duplicated in
       different formats. Secondly, tabix works on compressed data files while most SQL
       databases do not. The GenCode annotation GTF can be compressed down to 4% of its original
       size. Thirdly, tabix is fast. The same indexing algorithm is known to work efficiently for
       an alignment with a few billion short reads. SQL databases probably cannot easily handle
       data at this scale. Last but not least, tabix supports remote data retrieval. One can put
       the data file and the index on an FTP or HTTP server, and other users or even web services
       will be able to get a slice without downloading the entire file.

AUTHOR

       Tabix was written by Heng Li. The BGZF library was originally implemented by Bob Handsaker
       and modified by Heng Li for remote file access and in-memory caching.
