Provided by: biobambam2_2.0.185+ds-1_amd64

NAME

       bamtofastq - convert SAM, BAM or CRAM files to FastQ

SYNOPSIS

       bamtofastq [options]

DESCRIPTION

       bamtofastq  reads a SAM, BAM or CRAM file from standard input and converts it to the FastQ
       format. The output can be split into multiple files according to the  pair  flags  of  the
       reads  involved.  bamtofastq  can  collate the source reads according to their read names,
       i.e. place pairs of reads next to each other in the output. bamtofastq writes  its  output
       to  the  standard  output  channel by default. All output channels can be compressed using
       gzip.

       The following key=value pairs can be given:

       F=<stdout>: output file for the first mates of pairs if collation is active.

       F2=<stdout>: output file for the second mates of pairs if collation is active.

       S=<stdout>: output file for single end reads if collation is active.

       O=<stdout>: output file for unmatched (orphan) first mates if collation is active.

       O2=<stdout>: output file for unmatched (orphan) second mates if collation is active.

       collate=<0|1>: Valid values are

       1:     collate read pairs

       0:     output reads to standard output in the order in which they appear in the BAM file
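
       As a minimal sketch of how the output options above combine with collate=1, the shell
       function below wraps a single invocation that writes complete pairs, orphans and single
       end reads to separate files. The function name extract_pairs and the output file names
       are illustrative assumptions; only the key=value options are taken from this page.

```shell
# Hypothetical helper: collate the input and split the reads by pair status.
# $1 = input BAM file, $2 = prefix for the output files (both chosen by the caller).
extract_pairs() {
    bamtofastq \
        filename="$1" \
        collate=1 \
        F="${2}_1.fq" \
        F2="${2}_2.fq" \
        O="${2}_o1.fq" \
        O2="${2}_o2.fq" \
        S="${2}_s.fq"
}
```

       Calling extract_pairs sample.bam sample would request the files sample_1.fq, sample_2.fq,
       sample_o1.fq, sample_o2.fq and sample_s.fq.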

       combs=<0|1>: print some counts after finishing collation based output

       filename=<stdin>: input file name (data is read from standard input if this option is  not
       given)

       inputformat=<bam>:  input file format. All versions of bamtofastq come with support for the
       BAM input format. If the program is additionally linked to the io_lib package, then the
       following options are valid:

       bam:   BAM (see http://samtools.sourceforge.net/SAM1.pdf)

       sam:   SAM (see http://samtools.sourceforge.net/SAM1.pdf)

       cram:  CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)

       reference=:  file  name  of the reference for CRAM input files. If this key is unset, then
       the CRAM file header will be scanned for obtaining a reference file name.

       exclude=<SECONDARY>: Do not include reads in the output that have any of the  given  flags
       set. The flags are given separated by commas. Valid flags are:

       PAIRED:
              read was paired in sequencing

       PROPER_PAIR:
              read has been mapped as part of a proper pair

       UNMAP: read was not mapped

       MUNMAP:
              mate of read was not mapped

       REVERSE:
              read was mapped to the reverse strand

       MREVERSE:
              mate of read was mapped to the reverse strand

       READ1: read was first read of a pair during sequencing

       READ2: read was second read of a pair during sequencing

       SECONDARY:
              alignment is secondary, i.e. an alternative mapping to the primary alignment in the
              same file

       QCFAIL:
              read is marked as having failed quality control

       DUP:   read  is  marked  as  a  duplicate  of  another  read  in  the   same   file   (see
              bammarkduplicates)

       SUPPLEMENTARY:
              read is marked as supplementary alignment
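
       The flag names above correspond to the FLAG bits defined in the SAM specification. The
       sketch below is an illustration only: it translates a comma separated list of names into
       the numeric mask that tools such as samtools would use. bamtofastq itself expects the
       names, not the number, and the function name flag_mask is a hypothetical helper.

```shell
# Illustration only: map bamtofastq flag names to the numeric SAM FLAG mask.
# The body runs in a subshell so the IFS and globbing changes stay local.
flag_mask() (
    IFS=,
    set -f
    mask=0
    for name in $1; do
        case $name in
            PAIRED)        bit=1 ;;
            PROPER_PAIR)   bit=2 ;;
            UNMAP)         bit=4 ;;
            MUNMAP)        bit=8 ;;
            REVERSE)       bit=16 ;;
            MREVERSE)      bit=32 ;;
            READ1)         bit=64 ;;
            READ2)         bit=128 ;;
            SECONDARY)     bit=256 ;;
            QCFAIL)        bit=512 ;;
            DUP)           bit=1024 ;;
            SUPPLEMENTARY) bit=2048 ;;
            *) echo "unknown flag: $name" >&2; exit 1 ;;
        esac
        mask=$((mask | bit))
    done
    echo "$mask"
)
```

       For example, flag_mask SECONDARY,SUPPLEMENTARY prints 2304, which is the mask matching
       exclude=SECONDARY,SUPPLEMENTARY.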

       disablevalidation=<0>: Valid values are

       0:     run input file validation on alignments (this is the default)

       1:     do  not  check  the validity of the input file (this may help for some broken input
              files, but it is a security risk as it can lead to the execution of arbitrary  code
              through a forged input file).

       colhlog=<18>  base  two  logarithm  of  the size of the hash table used for collation (the
       default value is 18 and should work reasonably well for most input files.  Please see  the
       biobambam paper at arxiv.org/abs/1306.0836 for details).

       colsbs=<128M>  size  of hash table overflow list in bytes (the default is 128MB and should
       work  reasonably  well  for  most  input  files.  Please  see  the  biobambam   paper   at
       arxiv.org/abs/1306.0836 for details).
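
       Because colhlog is a base two logarithm, each increment doubles the hash table. The short
       sketch below computes the table size implied by a given value, assuming the size is
       counted in table entries; hash_table_entries is a hypothetical helper name.

```shell
# Table size implied by a colhlog value: 2 raised to that power.
hash_table_entries() {
    echo $((1 << $1))
}

hash_table_entries 18   # default colhlog, prints 262144
```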

       T=<bamtofastq_hostname_pid_time> file name of temporary file used for collation

       ranges=<>:  coordinate ranges selected from input. This option is only available for input
       files in BAM and CRAM format which have a corresponding index file (.bai  for  BAM,  .crai
       for  CRAM)  and  if  input  is via file (i.e. the filename argument is set).  Valid ranges
       consist of either

       whole reference sequence:
              a whole reference sequence (e.g. "chr1")

       half open interval on reference sequence:
              an interval on a reference sequence half open on the right (e.g. "chr1:50000" which
              means alignments overlapping chr1 from position 50000 to the end of chr1)

       interval on reference sequence:
              an interval on a reference sequence (e.g. "chr1:50000-60000" which means alignments
              overlapping positions 50000 to 60000 on chr1)

       For   BAM   input   multiple   ranges   are   separated   by   space   characters    (e.g.
       ranges="chr1:10000-20000 chr1:30000-40000").  CRAM input supports a single range only.
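
       The three range forms can be told apart by their punctuation. The following sketch
       classifies a single range string accordingly. It is a simplification that assumes
       reference sequence names contain no colon, and classify_range is a hypothetical helper,
       not part of bamtofastq.

```shell
# Classify one range string by the three forms described above.
# Assumes the reference sequence name itself contains no ':' character.
classify_range() {
    case $1 in
        *:*-*) echo "interval" ;;    # e.g. chr1:50000-60000
        *:*)   echo "half-open" ;;   # e.g. chr1:50000
        *)     echo "whole" ;;       # e.g. chr1
    esac
}
```

       For example, classify_range chr1:50000 prints half-open.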

       gz=<0|1>: compress output files using gzip. By default output is uncompressed.

       level=<-1|0|1|9|11>:  set compression level of the output FastQ/FastA files if gz=1. Valid
       values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If libmaus has been compiled with support for igzip (see
       https://software.intel.com/en-us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-data)
       then an additional valid value is

       11:    igzip compression

       fasta=<0|1>: output FastA instead of FastQ if fasta=1.

       outputperreadgroup=<0|1> split output by read group if  outputperreadgroup=1  (default  is
       0).  If  splitting by read group is performed then no output is written on standard output
       but all data is written to files. The file names will be generated using the outputdir and
       outputperreadgroupsuffix parameters and read group names.

       outputdir=<>  output  directory  if  outputperreadgroup=1. By default the output files are
       generated in the current directory.

       outputperreadgrouprgsm=<0|1> include SM  field  of  read  group  in  output  filenames  if
       outputperreadgroup=1 (default is 0)

       outputperreadgroupprefix=  add  given  prefix  ahead of file names if outputperreadgroup=1
       (default is to add no prefix)

       outputperreadgroupsuffixF=<_1.fq> output file name suffix  for  first  mates  of  complete
       pairs if outputperreadgroup=1.  Default is _1.fq if gz=0 and _1.fq.gz for gz=1.

       outputperreadgroupsuffixF2=<_2.fq>  output  file  name suffix for second mates of complete
       pairs if outputperreadgroup=1.  Default is _2.fq if gz=0 and _2.fq.gz for gz=1.

       outputperreadgroupsuffixO=<_o1.fq> output file name suffix for first mates  of  incomplete
       pairs if outputperreadgroup=1.  Default is _o1.fq if gz=0 and _o1.fq.gz for gz=1.

       outputperreadgroupsuffixO2=<_o2.fq> output file name suffix for second mates of incomplete
       pairs if outputperreadgroup=1.  Default is _o2.fq if gz=0 and _o2.fq.gz for gz=1.

       outputperreadgroupsuffixS=<_s.fq> output  file  name  suffix  for  single  end  reads   if
       outputperreadgroup=1.  Default is _s.fq if gz=0 and _s.fq.gz for gz=1.
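
       A minimal sketch of a per read group run follows. The function name split_by_read_group
       is a hypothetical helper, and creating the output directory beforehand is a precaution
       taken here, not a documented requirement. The suffix options are left at their defaults,
       so with gz=1 the files would end in _1.fq.gz, _2.fq.gz, _o1.fq.gz, _o2.fq.gz and
       _s.fq.gz.

```shell
# Hypothetical helper: write one set of gzip-compressed FastQ files per read group.
# $1 = input BAM file, $2 = output directory.
split_by_read_group() {
    mkdir -p "$2" &&
    bamtofastq \
        filename="$1" \
        outputperreadgroup=1 \
        outputdir="$2" \
        gz=1
}
```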

       tryoq=<0|1>:  use  content  of  OQ  aux  field  if  present  instead of quality field when
       converting to FastQ. By default the quality field  is  used.   This  option  is  currently
       mutually exclusive with the tags option.

       tags=<>:  provide a comma separated list of aux fields which will be copied from the input
       alignment records to the comment section of the output FastQ records.  By default  no  aux
       fields are copied.  This option is currently mutually exclusive with the tryoq option.

       split=<0>:  split  named output files into chunks of this number of reads. The output file
       names will be extended by _NNNNNN if gz=0 and by _NNNNNN.gz if gz=1 where  NNNNNN  denotes
       the  NNNNNN+1'th output file (i.e. numbers start with 000000).  The suffixes k, m, g, K, M
       and G can be used to denote that the argument is to be multiplied by 1024, 1024^2, 1024^3,
       1000, 1000^2 or 1000^3 respectively.
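
       The suffix arithmetic and the chunk numbering can be sketched as two small shell
       functions. Both expand_count and chunk_suffix are hypothetical helpers written for
       illustration; they mirror the rules stated above and are not part of bamtofastq.

```shell
# Expand a split argument such as 2k or 2K into a plain number of reads.
# Lower case suffixes are powers of 1024, upper case suffixes powers of 1000.
expand_count() {
    num=${1%[kmgKMG]}
    case $1 in
        *k) echo $((num * 1024)) ;;
        *m) echo $((num * 1024 * 1024)) ;;
        *g) echo $((num * 1024 * 1024 * 1024)) ;;
        *K) echo $((num * 1000)) ;;
        *M) echo $((num * 1000 * 1000)) ;;
        *G) echo $((num * 1000 * 1000 * 1000)) ;;
        *)  echo "$num" ;;
    esac
}

# File name extension of chunk number $1 (counting from 0); $2 is the gz setting.
chunk_suffix() {
    if [ "$2" = 1 ]; then
        printf '_%06d.gz\n' "$1"
    else
        printf '_%06d\n' "$1"
    fi
}
```

       For example, expand_count 2k prints 2048 while expand_count 2K prints 2000, and
       chunk_suffix 3 1 prints _000003.gz for the fourth output file.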

       cols=<>:  If  set  to  an unsigned number then wrap the sequence and quality lines at this
       number of columns. By default no wrapping is performed.
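
       The effect of cols on a single sequence or quality line can be imitated with the
       standard fold utility. This sketch only mimics the wrapping for illustration and says
       nothing about how bamtofastq implements it; wrap_line is a hypothetical helper.

```shell
# Wrap one line of text at a fixed number of columns, as cols does for
# sequence and quality lines. $1 = line content, $2 = column count.
wrap_line() {
    printf '%s\n' "$1" | fold -w "$2"
}
```

       For example, wrap_line ACGTACGTAC 4 prints the three lines ACGT, ACGT and AC.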

       splitprefix=<bamtofastq_split>: file prefix if split>0 and collate=0.

       casava18=<0>: produce read names as expected by the c18pe input option of fastqtobam using
       the ne aux fields produced by fastqtobam.

       maxoutput=<>:  produce no more than this number of output records.  By default there is no
       limit. This option is only active for collate=0.

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

       Copyright © 2009-2014 German Tischler,  ©  2011-2014  Genome  Research  Limited.   License
       GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
       This  is free software: you are free to change and redistribute it.  There is NO WARRANTY,
       to the extent permitted by law.