Ubuntu Manpage: bamsormadup - sort name collated SAM or BAM file by coordinate and mark duplicates or sort

Provided by: biobambam2_2.0.183+ds-1_amd64

NAME

       bamsormadup - sort name collated SAM or BAM file by coordinate and mark duplicates or sort
       SAM or BAM file by query name

SYNOPSIS

       bamsormadup [options]

DESCRIPTION

       bamsormadup has two modes of operation depending on the value  of  the  SO  parameter.  If
       SO=coordinate or if the SO key is not given, then bamsormadup reads a name collated BAM or
       SAM file from standard input, runs a fix mate process, sorts the contained  alignments  by
       coordinate, marks duplicate alignments and writes the sorted alignments to standard output
       in BAM format. An alignment file is name collated if all the alignments for one read  name
       appear consecutively in the file. If SO=queryname then the program reads a BAM or SAM file
       from standard input, sorts it by queryname and writes the sorted file on  standard  output
       in BAM format.

       The following key=value pairs can be given:

       level=<-1|0|1|9|11>: set compression level of the output BAM file. Valid values are

       -1:    zlib/gzip default compression level

       0:     uncompressed

       1:     zlib/gzip level 1 (fast) compression

       9:     zlib/gzip level 9 (best) compression

       If  libmaus  has  been compiled with support for igzip (see https://software.intel.com/en-
       us/articles/igzip-a-high-performance-deflate-compressor-with-optimizations-for-genomic-
       data) then an additional valid value is

       11:    igzip compression

       inputformat=<bam>:  set  the  input  file  format.   This  can  be  either bam or sam (see
       http://samtools.sourceforge.net/SAM1.pdf)

       threads=<[1]>: number of threads used

       M=<stderr>: name of the metrics  file  for  duplicate  marking  (metrics  are  written  to
       standard error if not set)

       tmpfile=<bamsormadup_hostname_pid_starttime>:  prefix  for temporary files. By default the
       temporary files are created in the  current  directory.   Set  tmpfile=mem:tmp_  to  store
       temporary  files  in RAM instead of on disk. Note that this may require very large amounts
       of RAM depending on the input.

       SO=<coordinate|queryname>: set the sort order. Valid values are

       coordinate
              sort alignments by coordinate. Input is assumed to be name collated.

       queryname
              sort alignments by query name. No assumption is made on the order of the input.

       reference=<>: name of reference FastA file when writing CRAM. This file will be  used  for
       filling  missing  UR  and  M5 fields of SQ header lines. It may refer to a local file or a
       file stored on an http or ftp server. The file is uncompressed on the fly if the file name
       ends  on  .gz  .  If  the REF_CACHE environment variable is set to the name of an existing
       directory, then normalised cache  files  will  be  written  to  this  directory  for  each
       reference  sequence.  The  file  names are constructed from the directory name and the MD5
       checksum of each reference sequence. This writing of cached files is omitted however, if a
       previously  existing  file  is found in the list of read only cache locations given by the
       REF_PATH environment variable.

       optminpixeldif=<100>: distance  (x  and  y  inside  same  tile)  inside  which  reads  are
       considered as optical duplicates

       rcsupport==<0>: if 1 then create rc aux field (unclipped coordinate) for mapped reads when
       sorting from query to coordinate order

       numerical==<>: store numerical index in the given file. By default numerical index is  not
       stored.

       numericalindexmod=1024: use this block size for producing numerical index

       fragmergepar=1:  number  of  threads used for merging fragment lists in duplicate marking.
       The run-time will generally benefit from an increased number here,  but  parallel  merging
       requires  a  large  number of simultaneously open files, which will cause problems on some
       systems.

       crammode=: CRAM encoding profile. See the  documentation  for  the  scramble  program  for
       possible options.

AUTHOR

       Written by German Tischler.

REPORTING BUGS

       Report bugs to <germant@miltenyibiotec.de>

COPYRIGHT

       Copyright  ©  2009-2015  German  Tischler,  ©  2011-2015 Genome Research Limited.  License
       GPLv3+: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
       This is free software: you are free to change and redistribute it.  There is NO  WARRANTY,
       to the extent permitted by law.