Ubuntu Manpage: bamstreamingmarkduplicates

Provided by: biobambam2_2.0.185+ds-1_amd64

NAME

       bamstreamingmarkduplicates - mark duplicate reads

SYNOPSIS

       bamstreamingmarkduplicates [options]

DESCRIPTION

       bamstreamingmarkduplicates reads a coordinate sorted BAM, SAM or CRAM file, which has been
       previously processed by bamsort using  the  options  fixmates=1  and  adddupmarksupport=1,
       marks  duplicate  read  pairs  and reads and writes the resulting file in BAM, SAM or CRAM
       format. The preprocessing of the file using bamsort with the stated options is  mandatory,
       i.e.   bamstreamingmarkduplicates  will  fail without it. In contrast to bammarkduplicates
       and bammarkduplicates2 the streaming variant bamstreamingmarkduplicates processes the file
       in  a  single pass.  bamstreamingmarkduplicates cannot handle files containing orphan pair
       ends (pairs where one of the two ends is missing in the file).

       The following key=value pairs can be given:

       M=<>: file name for metrics data. By default the metrics data is written on  the  standard
       error channel.

       level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If  libmaus  has  been compiled with support for igzip (see https://software.intel.com/en-
       us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-
       data) then an additional valid value is

       11:    igzip compression

       verbose=<1>: Valid values are

       1:     print progress report on standard error

       0:     do not print progress report

       tmpfile=<filename>: set the prefix for temporary file names

       disablevalidation=<0|1>: sets whether input validation is performed. Valid values are

       0:     validation is enabled (default)

       1:     validation is disabled

       md5=<0|1>:  md5  checksum  creation  for  output  file.  This  option can only be given if
       outputformat=bam. Then valid values are

       0:     do not compute checksum. This is the default.

       1:     compute checksum. If the md5filename key is set, then the checksum  is  written  to
              the given file. If md5filename is unset, then no checksum will be computed.

       md5filename file name for md5 checksum if md5=1.

       index=<0|1>:  compute  BAM  index  for  output  file.  This  option  can  only be given if
       outputformat=bam. Then valid values are

       0:     do not compute BAM index. This is the default.

       1:     compute BAM index. If the indexfilename key is set, then the BAM index  is  written
              to the given file. If indexfilename is unset, then no BAM index will be computed.

       indexfilename file name for output BAM index if index=1.

       inputformat=<bam>:  input  file  format.   All versions of bamstreamingmarkduplicates come
       with support for the BAM input format. If the program in addition is linked to the  io_lib
       package, then the following options are valid:

       bam:   BAM (see http://samtools.sourceforge.net/SAM1.pdf)

       sam:   SAM (see http://samtools.sourceforge.net/SAM1.pdf)

       cram:  CRAM (see http://www.ebi.ac.uk/ena/about/cram_toolkit)

       outputformat=<bam>:  output  file format.  All versions of bamstreamingmarkduplicates come
       with support for the BAM output format. If the program in addition is linked to the io_lib
       package, then the following options are valid:

       bam:   BAM (see http://samtools.sourceforge.net/SAM1.pdf)

       sam:   SAM (see http://samtools.sourceforge.net/SAM1.pdf)

       cram:  CRAM   (see   http://www.ebi.ac.uk/ena/about/cram_toolkit).   This  format  is  not
              advisable for data sorted by query name.

       I=<[stdin]>: input filename, standard input if unset.

       O=<[stdout]>: output filename, standard output if unset.

       inputthreads=<[1]>: input helper threads, only valid for inputformat=bam.

       outputthreads=<[1]>: output helper threads, only valid for outputformat=bam.

       reference=<[]>: reference FastA file for inputformat=cram and outputformat=cram. An  index
       file (.fai) is required.

       tag=<tag>  name  of auxiliary field storing tag information in string form. Read fragments
       or pairs with different tags will not be considered as  duplicates,  even  they  would  be
       according  to  their mapping coordinates. For pairs the tag field information of the first
       and second mate are concatenated to obtain the tag of the pair.

       nucltag=<tag> this option works like the tag option but  is  restricted  to  sequences  of
       nucleotides (A,C,G or T) as tags. The length of each tag sequence is not allowed to exceed
       15 bases. All tags are required to have the same length.  Each non  nucleotide  symbol  is
       mapped  to  A.  In contrast to the tag option, nucltag uses less memory for processing and
       can be expected to be faster.

       filterdupmarktags=<[0]>: remove the auxiliary fields MC, MQ, MS, and MT used for streaming
       duplicate marking when producing the output file. By default the fields are not removed.

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright  ©  2009-2014  German  Tischler,  ©  2011-2014 Genome Research Limited.  License
       GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it.  There is NO  WARRANTY,
       to the extent permitted by law.