Ubuntu Manpage: bammarkduplicatesopt - mark duplicate reads/alignments in BAM files

name
synopsis
description
author
reporting bugs
copyright

Provided by: biobambam2_2.0.185+ds-1_amd64

NAME

       bammarkduplicatesopt - mark duplicate reads/alignments in BAM files

SYNOPSIS

       bammarkduplicatesopt [options]

DESCRIPTION

       bammarkduplicatesopt  reads  a  SAM/BAM/CRAM  file  containing  alignments  computed by an
       aligner, marks duplicate reads/alignments using the  coordinates  of  the  alignments  and
       writes  the  marked alignments to a SAM/BAM or CRAM file. By default input is via standard
       input and output via standard output. Duplication metrics are  provided  on  the  standard
       error  channel by default. bammarkduplicatesopt scans the input BAM file twice, it is thus
       of benefit to set the I key (described below) to give the name of an input  file.  If  the
       input  BAM  file is given via standard input, then a copy of the input will be stored in a
       temporary file during the first run for  processing  in  a  second  run.  In  contrast  to
       bammarkduplicates   this   program   (bammarkduplicatesopt)   specifically  marks  optical
       duplicates. Each optical duplicate read pair is marked with an  aux  field  stating  which
       read in it's optical cluster is not considered to be an optical duplicate.

       The following key=value pairs can be given:

       I=<stdin>: name of the input file (input is read from standard input if not set). This key
       can be given multiple times. If it is given multiple times, then the input files  need  to
       be  in  coordinate sorted order and will be merged for output such that the output will be
       coordinate sorted.

       O=<stdout>: name of the output file (output is written to standard output if not set)

       M=<stderr>: name of the metrics file (metrics are written to standard error if not set)

       tmpfile=<filename>: prefix for temporary files. By default the temporary files are created
       in the current directory

       level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If  libmaus  has  been compiled with support for igzip (see https://software.intel.com/en-
       us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-
       data) then an additional valid value is

       11:    igzip compression

       markthreads=<1>: Number of threads used during marking duplicate alignments.

       verbose=<1>: Valid values are

       1:     print progress report on standard error

       0:     do not print progress report

       mod=<1048576>:  print progress after processing every mod'th input read/alignment (default
       is 1M)

       rewritebam=<0|1|2>: compression type of the temporary file stored if I key  is  not  given
       (input is via standard input). Valid values are

       0:     temporary    data    is    compressed    using    snappy    (if    available,   see
              http://code.google.com/p/snappy/)

       1:     gzip/BAM, recompress input and store it as a BAM file

       2:     copy, produce a 1/1 copy of the input stream as a temporary file

       rewritebamlevel=<-1|0|1|9|11>: compression level of temporary BAM file if rewritebam is  1
       (see level key for possible values).

       colhashbits=<20>: base two logarithm of the size of the hash table used for collation (the
       default value is 18 and should work reasonably well for most input files.  Please see  the
       biobambam paper at arxiv.org/abs/1306.0836 for details).

       collistsize=<33554432>:  size  of  hash table overflow list in bytes (the default is 128MB
       and should work reasonably well for most input files. Please see the  biobambam  paper  at
       arxiv.org/abs/1306.0836 for details).

       fragbufsize=<50331648>:    size    of    each   fragment/pair   file   buffer   in   bytes
       (bammarkduplicatesopt uses two such buffer for detecting duplicates)

       rmdup=<0|1>: sets how duplicates are handled

       0:     duplicates will be retained in the output file and have the duplication flag set

       1:     duplicates will be removed when writing the output file

       md5=<0|1>: md5 checksum creation for output file. Valid values are

       0:     do not compute checksum. This is the default.

       1:     compute checksum. If the md5filename key is set, then the checksum  is  written  to
              the  given  file.  If  the  md5filename  key  is not given but set O key is set the
              checksum will be written as output file name with .md5  appended.   If  md5filename
              and O are unset, then no checksum will be computed.

       md5filename file name for md5 checksum if md5=1.

       index=<0|1>: compute BAM index for output file. Valid values are

       0:     do not compute BAM index. This is the default.

       1:     compute  BAM  index. If the indexfilename key is set, then the BAM index is written
              to the given file. If the indexfilename key is not given but set O key is  set  the
              checksum  will be written as output file name with .bai appended.  If indexfilename
              and O are unset, then no BAM index will be computed.

       indexfilename file name for BAM index if index=1.

       tag=<tag> name of auxiliary field storing tag information in string form.  Read  fragments
       or  pairs  with  different  tags  will not be considered as duplicates, even they would be
       according to their mapping coordinates. For pairs the tag field information of  the  first
       and second mate are concatenated to obtain the tag of the pair.

       nucltag=<tag>  this  option  works  like  the tag option but is restricted to sequences of
       nucleotides (A,C,G or T) as tags. The length of each tag sequence is not allowed to exceed
       15  bases.  All  tags are required to have the same length.  Each non nucleotide symbol is
       mapped to A. In contrast to the tag option, nucltag uses less memory  for  processing  and
       can be expected to be faster.

       D  output file name for removed duplicates if rmdup=1. By default the reads and read pairs
       marked as duplicates are discarded when rmdup=1. If the D key is set, then the sequence of
       reads  which  is not written to the output file is written to a separate file. The name of
       this separate file is the value of the D key.

       dupmd5=<0|1>: compute md5 checksum of file given by D key if rmdup=1. By  default  no  md5
       checksum is computed for this file.

       dupmd5filename:  file name for storing the md5 checksum of the D file if rmdup=1, D is set
       and dupmd5=1.

       dupindex=<0|1>: compute BAM index for file given by D key if rmdup=1. By default no  index
       is compute for this file.

       dupindexfilename:  file  name for storing the BAM index of the D file if rmdup=1, D is set
       and dupmd5=1.

       optminpixeldif=<100>: distance  (x  and  y  inside  same  tile)  inside  which  reads  are
       considered as optical duplicates

       odtag=<od>: aux field tag for storing non dup read pair name for optical duplicates

       addmatecigar=<0>: add mate cigar aux field MC and mate coordinate field mc

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright  ©  2009-2017  German  Tischler,  ©  2011-2014 Genome Research Limited.  License
       GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it.  There is NO  WARRANTY,
       to the extent permitted by law.