Provided by: samtools_1.9-4_amd64 bug

NAME

       samtools - Utilities for the Sequence Alignment/Map (SAM) format

SYNOPSIS

       samtools view -bt ref_list.txt -o aln.bam aln.sam.gz

       samtools sort -T /tmp/aln.sorted -o aln.sorted.bam aln.bam

       samtools index aln.sorted.bam

       samtools idxstats aln.sorted.bam

       samtools flagstat aln.sorted.bam

       samtools stats aln.sorted.bam

       samtools bedcov aln.sorted.bam

       samtools depth aln.sorted.bam

       samtools view aln.sorted.bam chr2:20,100,000-20,200,000

       samtools merge out.bam in1.bam in2.bam in3.bam

       samtools faidx ref.fasta

       samtools fqidx ref.fastq

       samtools tview aln.sorted.bam ref.fasta

       samtools split merged.bam

       samtools quickcheck in1.bam in2.cram

       samtools dict -a GRCh38 -s "Homo sapiens" ref.fasta

       samtools fixmate in.namesorted.sam out.bam

       samtools mpileup -C50 -f ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam

       samtools flags PAIRED,UNMAP,MUNMAP

       samtools fastq input.bam > output.fastq

       samtools fasta input.bam > output.fasta

       samtools addreplacerg -r 'ID:fish' -r 'LB:1334' -r 'SM:alpha' -o output.bam input.bam

       samtools collate -o aln.name_collated.bam aln.sorted.bam

       samtools depad input.bam

       samtools markdup in.algnsorted.bam out.bam

DESCRIPTION

       Samtools  is  a  set of utilities that manipulate alignments in the BAM format. It imports
       from and exports to the SAM (Sequence Alignment/Map) format,  does  sorting,  merging  and
       indexing, and allows one to retrieve reads in any regions swiftly.

       Samtools  is  designed  to  work on a stream. It regards an input file `-' as the standard
       input (stdin) and an output file `-' as the standard output (stdout). Several commands can
       thus be combined with Unix pipes. Samtools always output warning and error messages to the
       standard error output (stderr).

       Samtools is also able to open a BAM (not SAM) file on a remote FTP or HTTP server  if  the
       BAM  file  name  starts  with  `ftp://' or `http://'.  Samtools checks the current working
       directory for the index file and will download the index upon absence. Samtools  does  not
       retrieve the entire alignment file unless it is asked to do so.

COMMANDS AND OPTIONS

       view      samtools view [options] in.sam|in.bam|in.cram [region...]

                 With  no  options  or  regions specified, prints all alignments in the specified
                 input alignment file (in SAM, BAM, or CRAM format) to  standard  output  in  SAM
                 format (with no header).

                 You  may  specify  one  or  more space-separated region specifications after the
                 input filename to restrict output to only those  alignments  which  overlap  the
                 specified  region(s).  Use of region specifications requires a coordinate-sorted
                 and indexed input file (in BAM or CRAM format).

                 The -b, -C, -1, -u, -h, -H, and -c options change the  output  format  from  the
                 default  of  headerless  SAM,  and  the  -o  and  -U options set the output file
                 name(s).

                 The -t and -T options provide  additional  reference  data.  One  of  these  two
                 options  is  required  when  SAM  input does not contain @SQ headers, and the -T
                 option is required whenever writing CRAM output.

                 The -L, -M, -r, -R, -s, -q, -l, -m, -f, -F, and -G options filter the alignments
                 that  will be included in the output to only those alignments that match certain
                 criteria.

                 The -x and -B options modify the data which is contained in each alignment.

                 Finally, the -@ option can be used to allocate additional threads to be used for
                 compression, and the -?  option requests a long help message.

       REGIONS:
                 Regions   can  be  specified  as:  RNAME[:STARTPOS[-ENDPOS]]  and  all  position
                 coordinates are 1-based.

                 Important note: when multiple regions are given, some alignments may  be  output
                 multiple times if they overlap more than one of the specified regions.

                 Examples of region specifications:

                 chr1      Output  all  alignments  mapped to the reference sequence named `chr1'
                           (i.e. @SQ SN:chr1).

                 chr2:1000000
                           The region on chr2 beginning at base position 1,000,000 and ending  at
                           the end of the chromosome.

                 chr3:1000-2000
                           The  1001bp region on chr3 beginning at base position 1,000 and ending
                           at base position 2,000 (including both end positions).

                 '*'       Output the unmapped reads at the end of  the  file.   (This  does  not
                           include  any  unmapped  reads placed on a reference sequence alongside
                           their mapped mates.)

                 .         Output all alignments.  (Mostly unnecessary as not specifying a region
                           at all has the same effect.)

       OPTIONS:

                 -b        Output in the BAM format.

                 -C        Output in the CRAM format (requires -T).

                 -1        Enable fast BAM compression (implies -b).

                 -u        Output   uncompressed   BAM.   This   option   saves   time  spent  on
                           compression/decompression and is thus preferred  when  the  output  is
                           piped to another samtools command.

                 -h        Include the header in the output.

                 -H        Output the header only.

                 -c        Instead  of  printing  the  alignments,  only count them and print the
                           total number. All filter options, such as -f, -F, and  -q,  are  taken
                           into account.

                 -?        Output long help and exit immediately.

                 -o FILE   Output to FILE [stdout].

                 -U FILE   Write  alignments  that are not selected by the various filter options
                           to FILE.  When this option is used, all alignments (or all  alignments
                           intersecting  the  regions specified) are written to either the output
                           file or this file, but never both.

                 -t FILE   A tab-delimited FILE.  Each line must contain the  reference  name  in
                           the first column and the length of the reference in the second column,
                           with one line for each  distinct  reference.   Any  additional  fields
                           beyond the second column are ignored. This file also defines the order
                           of the reference sequences in sorting. If  you  run:  `samtools  faidx
                           <ref.fa>',  the  resulting index file <ref.fa>.fai can be used as this
                           FILE.

                 -T FILE   A FASTA format reference FILE,  optionally  compressed  by  bgzip  and
                           ideally  indexed  by  samtools faidx.  If an index is not present, one
                           will be generated for you.

                 -L FILE   Only output alignments overlapping the input BED FILE [null].

                 -M        Use the multi-region iterator  on  the  union  of  the  BED  file  and
                           command-line  region  arguments.   This  avoids  re-reading  the  same
                           regions of files so can sometimes be  much  faster.   Note  this  also
                           removes  duplicate  sequences.   Without this a sequence that overlaps
                           multiple regions specified  on  the  command  line  will  be  reported
                           multiple times.

                 -r STR    Output alignments in read group STR [null].  Note that records with no
                           RG tag will also be output when using this option.  This behaviour may
                           change in a future release.

                 -R FILE   Output  alignments  in  read  groups listed in FILE [null].  Note that
                           records with no RG tag will also be output  when  using  this  option.
                           This behaviour may change in a future release.

                 -q INT    Skip alignments with MAPQ smaller than INT [0].

                 -l STR    Only output alignments in library STR [null].

                 -m INT    Only  output  alignments  with  number  of CIGAR bases consuming query
                           sequence ≥ INT [0]

                 -f INT    Only output alignments with all bits set in INT present  in  the  FLAG
                           field.   INT  can  be  specified  in  hex by beginning with `0x' (i.e.
                           /^0x[0-9A-F]+/) or in octal by beginning with  `0'  (i.e.  /^0[0-7]+/)
                           [0].

                 -F INT    Do  not output alignments with any bits set in INT present in the FLAG
                           field.  INT can be specified in  hex  by  beginning  with  `0x'  (i.e.
                           /^0x[0-9A-F]+/)  or  in  octal by beginning with `0' (i.e. /^0[0-7]+/)
                           [0].

                 -G INT    Do not output alignments with all bits set in INT present in the  FLAG
                           field.   This is the opposite of -f such that -f12 -G12 is the same as
                           no filtering at all.  INT can be specified in hex  by  beginning  with
                           `0x'  (i.e.  /^0x[0-9A-F]+/)  or  in octal by beginning with `0' (i.e.
                           /^0[0-7]+/) [0].

                 -x STR    Read tag to exclude from output (repeatable) [null]

                 -B        Collapse the backward CIGAR operation.

                 -s FLOAT  Output only a proportion of the input  alignments.   This  subsampling
                           acts  in  the  same  way  on  all of the alignment records in the same
                           template or read pair, so it never keeps a read but not its mate.

                           The integer and fractional parts of the -s INT.FRAC  option  are  used
                           separately:  the  part  after  the  decimal point sets the fraction of
                           templates/pairs to be kept, while the integer part is used as  a  seed
                           that influences which subset of reads is kept.

                           When  subsampling data that has previously been subsampled, be sure to
                           use a different seed value from those used previously; otherwise  more
                           reads will be retained than expected.

                 -@ INT    Number  of  BAM  compression threads to use in addition to main thread
                           [0].

                 -S        Ignored for compatibility with previous samtools versions.  Previously
                           this  option  was  required  if  input  was in SAM format, but now the
                           correct format is automatically detected by examining  the  first  few
                           characters of input.

       sort      samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format] [-n] [-t tag] [-T
                 tmpprefix] [-@ threads] [in.sam|in.bam|in.cram]

                 Sort alignments by leftmost coordinates, or by read name when -n  is  used.   An
                 appropriate  @HD-SO  sort  order  header  tag  will  be added or an existing one
                 updated if necessary.

                 The sorted output is written to standard output by default, or to the  specified
                 file  (out.bam)  when -o is used.  This command will also create temporary files
                 tmpprefix.%d.bam as needed when the entire alignment data cannot fit into memory
                 (as controlled via the -m option).

                 Options:

                 -l INT     Set  the desired compression level for the final output file, ranging
                            from 0 (uncompressed) or 1 (fastest but  minimal  compression)  to  9
                            (best  compression  but  slowest  to  write),  similarly to gzip(1)'s
                            compression level setting.

                            If -l is not used, the default compression level will apply.

                 -m INT     Approximately the  maximum  required  memory  per  thread,  specified
                            either in bytes or with a K, M, or G suffix.  [768 MiB]

                            To  prevent  sort  from creating a huge number of temporary files, it
                            enforces a minimum value of 1M for this setting.

                 -n         Sort by read names (i.e., the QNAME field) rather than by chromosomal
                            coordinates.

                 -t TAG     Sort first by the value in the alignment tag TAG, then by position or
                            name (if also using -n).  -o FILE Write the final  sorted  output  to
                            FILE, rather than to standard output.

                 -O FORMAT  Write the final output as sam, bam, or cram.

                            By  default,  samtools  tries  to  select  a  format  based on the -o
                            filename extension; if output is to standard output or no format  can
                            be deduced, bam is selected.

                 -T PREFIX  Write  temporary files to PREFIX.nnnn.bam, or if the specified PREFIX
                            is an existing  directory,  to  PREFIX/samtools.mmm.mmm.tmp.nnnn.bam,
                            where mmm is unique to this invocation of the sort command.

                            By  default,  any  temporary  files  are written alongside the output
                            file, as out.bam.tmp.nnnn.bam, or if output is to standard output, in
                            the current directory as samtools.mmm.mmm.tmp.nnnn.bam.

                 -@ INT     Set number of sorting and compression threads.  By default, operation
                            is single-threaded.

                 Ordering Rules

                 The following rules are used for ordering records.

                 If option -t is in use, records are first sorted  by  the  value  of  the  given
                 alignment tag, and then by position or name (if using -n).  For example, “-t RG”
                 will make read group the primary sort key.  The rules for ordering by tag are:

                 ·   Records that do not have the tag are sorted before ones that do.

                 ·   If the types of the tags are different, they will be sorted so  that  single
                     character  tags  (type  A) come before array tags (type B), then string tags
                     (types H and Z), then numeric tags (types f and i).

                 ·   Numeric tags (types f and i) are compared by value.  Note  that  comparisons
                     of floating-point values are subject to issues of rounding and precision.

                 ·   String tags (types H and Z) are compared based on the binary contents of the
                     tag using the C strcmp(3) function.

                 ·   Character tags (type A) are compared by binary character value.

                 ·   No attempt is made to compare tags of other types —  notably  type  B  array
                     values will not be compared.

                 When  the  -n option is present, records are sorted by name.  Names are compared
                 so as to give a “natural” ordering — i.e.  sections  consisting  of  digits  are
                 compared numerically while all other sections are compared based on their binary
                 representation.  This means “a1” will come before “b1” and “a9” will come before
                 “a10”.   Records  with  the same name will be ordered according to the values of
                 the READ1 and READ2 flags (see flags).

                 When the -n option is not present, reads are sorted by reference  (according  to
                 the  order  of  the  @SQ header records), then by position in the reference, and
                 then by the REVERSE flag.

                 Note

                 Historically samtools sort also accepted a less flexible way of  specifying  the
                 final and temporary output filenames:

                        samtools sort [-f] [-o] in.bam out.prefix

                 This  has now been removed.  The previous out.prefix argument (and -f option, if
                 any) should be changed to an appropriate combination of -T PREFIX and  -o  FILE.
                 The previous -o option should be removed, as output defaults to standard output.

       index     samtools index [-bc] [-m INT] aln.bam|aln.cram [out.index]

                 Index  a  coordinate-sorted BAM or CRAM file for fast random access.  (Note that
                 this does not work with SAM files even if they are bgzip compressed —  to  index
                 such files, use tabix(1) instead.)

                 This  index  is needed when region arguments are used to limit samtools view and
                 similar commands to particular regions of interest.

                 If an output filename is given, the index file will  be  written  to  out.index.
                 Otherwise,  for  a CRAM file aln.cram, index file aln.cram.crai will be created;
                 for a BAM file aln.bam, either  aln.bam.bai  or  aln.bam.csi  will  be  created,
                 depending on the index format selected.

                 Options:

                 -b      Create  a  BAI  index.   This  is  currently  the default when no format
                         options are used.

                 -c      Create a CSI index.  By default, the minimum interval size for the index
                         is 2^14, which is the same as the fixed value used by the BAI format.

                 -m INT  Create a CSI index, with a minimum interval size of 2^INT.

       idxstats  samtools idxstats in.sam|in.bam|in.cram

                 Retrieve  and  print  stats  in  the index file corresponding to the input file.
                 Before calling idxstats, the input BAM file should be indexed by samtools index.

                 If run on a SAM or CRAM file or an unindexed BAM file, this command  will  still
                 produce  the  same summary statistics, but does so by reading through the entire
                 file.  This is far slower than using the BAM indices.

                 The output is TAB-delimited with each  line  consisting  of  reference  sequence
                 name,  sequence  length,  #  mapped reads and # unmapped reads. It is written to
                 stdout.

       flagstat  samtools flagstat in.sam|in.bam|in.cram

                 Does a full pass through the input file to calculate  and  print  statistics  to
                 stdout.

                 Provides  counts  for  each of 13 categories based primarily on bit flags in the
                 FLAG field. Each category in the output is broken down into QC pass and QC fail,
                 which is presented as "#PASS + #FAIL" followed by a description of the category.

                 The  first  row  of  output gives the total number of reads that are QC pass and
                 fail (according to flag bit 0x200). For example:

                   122 + 28 in total (QC-passed reads + QC-failed reads)

                 Which would indicate that there are a total of 150 reads in the input file,  122
                 of  which  are  marked  as  QC  pass  and 28 of which are marked as "not passing
                 quality controls"

                 Following this, additional categories are given for reads which are:

                         secondary
                                0x100 bit set

                         supplementary
                                0x800 bit set

                         duplicates
                                0x400 bit set

                         mapped 0x4 bit not set

                         paired in sequencing
                                0x1 bit set

                         read1  both 0x1 and 0x40 bits set

                         read2  both 0x1 and 0x80 bits set

                         properly paired
                                both 0x1 and 0x2 bits set and 0x4 bit not set

                         with itself and mate mapped
                                0x1 bit set and neither 0x4 nor 0x8 bits set

                         singletons
                                both 0x1 and 0x8 bits set and bit 0x4 not set

                 And finally, two rows are given that additionally filter on the  reference  name
                 (RNAME), mate reference name (MRNM), and mapping quality (MAPQ) fields:

                         with mate mapped to a different chr
                                0x1  bit  set and neither 0x4 nor 0x8 bits set and MRNM not equal
                                to RNAME

                         with mate mapped to a different chr (mapQ>=5)
                                0x1 bit set and neither 0x4 nor 0x8 bits set and MRNM  not  equal
                                to RNAME and MAPQ >= 5

       stats     samtools stats [options] in.sam|in.bam|in.cram [region...]

                 samtools  stats collects statistics from BAM files and outputs in a text format.
                 The output can be visualized graphically using plot-bamstats.

                 Options:

                 -c, --coverage MIN,MAX,STEP
                         Set coverage distribution to the specified range  (MIN,  MAX,  STEP  all
                         given as integers) [1,1000,1]

                 -d, --remove-dups
                         Exclude from statistics reads marked as duplicates

                 -f, --required-flag STR|INT
                         Required flag, 0 for unset. See also `samtools flags` [0]

                 -F, --filtering-flag STR|INT
                         Filtering flag, 0 for unset. See also `samtools flags` [0]

                 --GC-depth FLOAT
                         the  size  of  GC-depth  bins  (decreasing  bin  size  increases  memory
                         requirement) [2e4]

                 -h, --help
                         This help message

                 -i, --insert-size INT
                         Maximum insert size [8000]

                 -I, --id STR
                         Include only listed read group or sample name []

                 -l, --read-length INT
                         Include in the statistics only reads with the given read length []

                 -m, --most-inserts FLOAT
                         Report only the main part of inserts [0.99]

                 -P, --split-prefix STR
                         A path or string prefix to prepend to  filenames  output  when  creating
                         categorised statistics files with -S/--split.  [input filename]

                 -q, --trim-quality INT
                         The BWA trimming parameter [0]

                 -r, --ref-seq FILE
                         Reference  sequence  (required  for  GC-depth  and  mismatches-per-cycle
                         calculation).  []

                 -S, --split TAG
                         In  addition  to  the  complete  statistics,  also  output   categorised
                         statistics  based on the tagged field TAG (e.g., use --split RG to split
                         into read groups).

                         Categorised    statistics     are     written     to     files     named
                         <prefix>_<value>.bamstat, where prefix is as given by --split-prefix (or
                         the input filename by default) and value has  been  encountered  as  the
                         specified tagged field's value in one or more alignment records.

                 -t, --target-regions FILE
                         Do stats in these regions only. Tab-delimited file chr,from,to, 1-based,
                         inclusive.  []

                 -x, --sparse
                         Suppress outputting IS rows where there are no insertions.

       bedcov    samtools bedcov [options] region.bed in1.sam|in1.bam|in1.cram[...]

                 Reports the total read base count (i.e. the sum of per  base  read  depths)  for
                 each  genomic  region specified in the supplied BED file. The regions are output
                 as they appear in the BED file and are 0-based.  Counts for each alignment  file
                 supplied are reported in separate columns.

                 Options:

                 -Q INT Only count reads with mapping quality greater than INT

                 -j     Do not include deletions (D) and ref skips (N) in bedcov computation.

       depth     samtools  depth  [options]  [in1.sam|in1.bam|in1.cram [in2.sam|in2.bam|in2.cram]
                 [...]]

                 Computes the depth at each position or region.

                 Options:

                 -a      Output all positions (including those with zero depth)

                 -a -a, -aa
                         Output absolutely all positions, including unused  reference  sequences.
                         Note  that  when  used  in conjunction with a BED file the -a option may
                         sometimes operate as if -aa was specified if the reference sequence  has
                         coverage outside of the region specified in the BED file.

                 -b FILE Compute depth at list of positions or regions in specified BED FILE.  []

                 -f FILE Use  the  BAM files specified in the FILE (a file of filenames, one file
                         per line) []

                 -l INT  Ignore reads shorter than INT

                 -m, -d INT
                         Truncate reported depth at a maximum of INT reads.  [8000]. If 0,  depth
                         is  set  to  the  maximum  integer value, effectively removing any depth
                         limit.

                 -q INT  Only count reads with base quality greater than INT

                 -Q INT  Only count reads with mapping quality greater than INT

                 -r CHR:FROM-TO
                         Only report depth in specified region.

       merge     samtools merge [-nur1f] [-h inh.sam] [-R reg] [-b  <list>]  <out.bam>  <in1.bam>
                 [<in2.bam> <in3.bam> ... <inN.bam>]

                 Merge  multiple  sorted  alignment  files, producing a single sorted output file
                 that contains all the input records and maintains the existing sort order.

                 If -h is specified the @SQ headers of  input  files  will  be  merged  into  the
                 specified  header, otherwise they will be merged into a composite header created
                 from the input headers.  If in the process of merging @SQ lines  for  coordinate
                 sorted  input  files,  a conflict arises as to the order (for example input1.bam
                 has @SQ for a,b,c and input2.bam has b,a,c) then the resulting output file  will
                 need to be re-sorted back into coordinate order.

                 Unless  the  -c  or -p flags are specified then when merging @RG and @PG records
                 into the output header then any IDs found to be duplicates of  existing  IDs  in
                 the output header will have a suffix appended to them to differentiate them from
                 similar header records from other files and the read records will be updated  to
                 reflect this.

                 The  ordering  of  the records in the input files must match the usage of the -n
                 and -t command-line  options.   If  they  do  not,  the  output  order  will  be
                 undefined.  See sort for information about record ordering.

                 OPTIONS:

                 -1      Use zlib compression level 1 to compress the output.

                 -b FILE List of input BAM files, one file per line.

                 -f      Force to overwrite the output file if present.

                 -h FILE Use  the lines of FILE as `@' headers to be copied to out.bam, replacing
                         any header lines that would otherwise be copied from in1.bam.  (FILE  is
                         actually  in SAM format, though any alignment records it may contain are
                         ignored.)

                 -n      The input alignments are sorted by read names rather than by chromosomal
                         coordinates

                 -t TAG  The  input  alignments  have  been  sorted  by the value of TAG, then by
                         either position or name (if -n is given).

                 -R STR  Merge files in the specified region indicated by STR [null]

                 -r      Attach an RG tag to each alignment. The tag value is inferred from  file
                         names.

                 -u      Uncompressed BAM output

                 -c      When several input files contain @RG headers with the same ID, emit only
                         one of them (namely, the header line from the first file we find that ID
                         in)  to  the  merged  output  file.   Combining these similar headers is
                         usually the right thing to do when the  files  being  merged  originated
                         from the same file.

                         Without  -c,  all  @RG  headers  appear  in the output file, with random
                         suffixes added to their IDs where necessary to differentiate them.

                 -p      Similarly, for each @PG ID in the set of files to  merge,  use  the  @PG
                         line of the first file we find that ID in rather than adding a suffix to
                         differentiate similar IDs.

       faidx     samtools faidx <ref.fasta> [region1 [...]]

                 Index reference sequence in the FASTA format or extract subsequence from indexed
                 reference  sequence.  If  no  region is specified, faidx will index the file and
                 create <ref.fasta>.fai on the disk. If regions are specified,  the  subsequences
                 will be retrieved and printed to stdout in the FASTA format.

                 The input file can be compressed in the BGZF format.

                 The  sequences  in  the  input file should all have different names.  If they do
                 not, indexing will emit a warning about duplicate sequences and  retrieval  will
                 only produce subsequences from the first sequence with the duplicated name.

                 FASTQ  files can be read and indexed by this command.  Without using --fastq any
                 extracted subsequence will be in FASTA format.

                 Options

                 -o, --output FILE
                         Write FASTA to file rather than to stdout.

                 -n, --length INT
                         Length of FASTA sequence line.  [60]

                 -c, --continue
                         Continue working if a non-existant region is requested.

                 -r, --region-file FILE
                         Read regions from a file. Format is chr:from-to, one per line.

                 -f, --fastq
                         Read FASTQ files and output extracted sequences in FASTQ  format.   Same
                         as using samtools fqidx.

                 -i, --reverse-complement
                         Output  the  sequence  as  the  reverse complement.  When this option is
                         used, “/rc” will be appended to the sequence names.  To turn this off or
                         change the string appended, use the --mark-strand option.

                 --mark-strand TYPE
                         Append strand indicator to sequence name.  TYPE can be one of:

                         rc     Append  '/rc'  when  writing the reverse complement.  This is the
                                default.

                         no     Do not append anything.

                         sign   Append '(+)' for forward strand or '(-)' for reverse  complement.
                                This matches the output of “bedtools getfasta -s”.

                         custom,<pos>,<neg>
                                Append  string <pos> to names when writing the forward strand and
                                <neg> when writing the reverse strand.  Spaces are preserved,  so
                                it is possible to move the indicator into the comment part of the
                                description line by including a  leading  space  in  the  strings
                                <pos> and <neg>.

                 -h, --help
                         Print help message and exit.

       fqidx     samtools fqidx <ref.fastq> [region1 [...]]

                 Index reference sequence in the FASTQ format or extract subsequence from indexed
                 reference sequence. If no region is specified, fqidx will  index  the  file  and
                 create  <ref.fastq>.fai  on the disk. If regions are specified, the subsequences
                 will be retrieved and printed to stdout in the FASTQ format.

                 The input file can be compressed in the BGZF format.

                 The sequences in the input file should all have different  names.   If  they  do
                 not,  indexing  will emit a warning about duplicate sequences and retrieval will
                 only produce subsequences from the first sequence with the duplicated name.

                 samtools fqidx should only be used  on  fastq  files  with  a  small  number  of
                 entries.   Trying  to  use  it on a file containing millions of short sequencing
                 reads will produce an index that is almost as big  as  the  original  file,  and
                 searches using the index will be very slow and use a lot of memory.

                 Options

                 -o, --output FILE
                         Write FASTQ to file rather than to stdout.

                 -n, --length INT
                         Length of FASTQ sequence line.  [60]

                 -c, --continue
                         Continue working if a non-existant region is requested.

                 -r, --region-file FILE
                         Read regions from a file. Format is chr:from-to, one per line.

                 -i, --reverse-complement
                         Output  the  sequence  as  the  reverse complement.  When this option is
                         used, “/rc” will be appended to the sequence names.  To turn this off or
                         change the string appended, use the --mark-strand option.

                 --mark-strand TYPE
                         Append strand indicator to sequence name.  TYPE can be one of:

                         rc     Append  '/rc'  when  writing the reverse complement.  This is the
                                default.

                         no     Do not append anything.

                         sign   Append '(+)' for forward strand or '(-)' for reverse  complement.
                                This matches the output of “bedtools getfasta -s”.

                         custom,<pos>,<neg>
                                Append  string <pos> to names when writing the forward strand and
                                <neg> when writing the reverse strand.  Spaces are preserved,  so
                                it is possible to move the indicator into the comment part of the
                                description line by including a  leading  space  in  the  strings
                                <pos> and <neg>.

                 -h, --help
                         Print help message and exit.

       tview     samtools tview [-p chr:pos] [-s STR] [-d display] <in.sorted.bam> [ref.fasta]

                 Text  alignment  viewer (based on the ncurses library). In the viewer, press `?'
                 for help and press `g' to check the alignment start from a region in the  format
                 like  `chr10:10,000,000'  or  `=10,000,000'  when  viewing  the  same  reference
                 sequence.

                 Options:

                 -d display    Output as (H)tml or (C)urses or (T)ext

                 -p chr:pos    Go directly to this position

                 -s STR        Display only alignments from this sample or read group

       split     samtools split [options] merged.sam|merged.bam|merged.cram

                 Splits a file by read group.

                 Options:

                 -u FILE1      Put reads with no RG tag or an unrecognised RG tag into FILE1

                 -u FILE1:FILE2
                               As above, but assigns an RG tag as given in the header of FILE2

                 -f STRING     Output filename format string (see below) ["%*_%#.%."]

                 -v            Verbose output

                 Format string expansions:

                                      %%   %
                                      %*   basename
                                      %#   @RG index
                                      %!   @RG ID
                                      %.   output format filename extension

       quickcheck
                 samtools quickcheck [options] in.sam|in.bam|in.cram [ ... ]

                 Quickly check that input files appear to be intact. Checks that beginning of the
                 file  contains  a  valid  header  (all  formats)  containing at least one target
                 sequence and then seeks to the end of the file and checks  that  an  end-of-file
                 (EOF) is present and intact (BAM only).

                 Data  in  the  middle of the file is not read since that would be much more time
                 consuming, so please note that this command will not detect internal corruption,
                 but  is  useful  for testing that files are not truncated before performing more
                 intensive tasks on them.

                 This command will exit with a non-zero exit code if any input files don't have a
                 valid  header  or  are missing an EOF block. Otherwise it will exit successfully
                 (with a zero exit code).

                 Options:

                 -v      Verbose output: will additionally print the names  of  all  input  files
                         that  don't  pass  the  check  to stdout. Multiple -v options will cause
                         additional messages regarding check results to be printed to stderr.

                 -q      Quiet mode: disables warning messages on stderr about files  that  fail.
                         If  both  -q  and  -v  options are used then the appropriate level of -v
                         takes precedence.

       dict      samtools dict <ref.fasta|ref.fasta.gz>

                 Create a sequence dictionary file from a fasta file.

                 OPTIONS:

                 -a, --assembly STR
                            Specify the assembly for the AS tag.

                 -H, --no-header
                            Do not print the @HD header line.

                 -o, --output FILE
                            Output to FILE [stdout].

                 -s, --species STR
                            Specify the species for the SP tag.

                 -u, --uri STR
                            Specify the URI for the UR tag. Defaults  to  the  absolute  path  of
                            ref.fasta unless reading from stdin.

       fixmate   samtools fixmate [-rpcm] [-O format] in.nameSrt.bam out.bam

                 Fill  in  mate  coordinates,  ISIZE  and  mate  related flags from a name-sorted
                 alignment.

                 OPTIONS:

                 -r         Remove secondary and unmapped reads.

                 -p         Disable FR proper pair check.

                 -c         Add template cigar ct tag.

                 -m         Add ms (mate score) tags.  These are used by markdup  to  select  the
                            best reads to keep.

                 -O FORMAT  Write the final output as sam, bam, or cram.

                            By  default,  samtools  tries  to select a format based on the output
                            filename extension; if output is to standard output or no format  can
                            be deduced, bam is selected.

       mpileup   samtools mpileup [-EB] [-C capQcoef] [-r reg] [-f in.fa] [-l list] [-Q minBaseQ]
                 [-q minMapQ] in.bam [in2.bam [...]]

                 Generate pileup for one or multiple BAM files. Alignment records are grouped  by
                 sample  (SM)  identifiers in @RG header lines. If sample identifiers are absent,
                 each input file is regarded as one sample.

                 Samtools mpileup can still produce VCF and  BCF  output,  but  this  feature  is
                 deprecated and will be removed in a future release.  Please use bcftools mpileup
                 for this instead.  (Documentation on the deprecated  options  has  been  removed
                 from   this   manual   page,   but   older  versions  are  available  online  at
                 <http://www.htslib.org/doc/>.)

                 In the  pileup  format  (without  -u or -g),  each  line  represents  a  genomic
                 position, consisting of chromosome name, 1-based coordinate, reference base, the
                 number of reads covering the site, read  bases,  base  qualities  and  alignment
                 mapping  qualities.  Information  on  match,  mismatch,  indel,  strand, mapping
                 quality and start and end of a read are all encoded at the read base column.  At
                 this  column,  a  dot  stands  for  a match to the reference base on the forward
                 strand, a comma for a match on the reverse strand, a '>' or '<' for a  reference
                 skip, `ACGTN' for a mismatch on the forward strand and `acgtn' for a mismatch on
                 the reverse strand. A pattern  `\+[0-9]+[ACGTNacgtn]+'  indicates  there  is  an
                 insertion  between  this reference position and the next reference position. The
                 length of the insertion is given by the integer in the pattern, followed by  the
                 inserted  sequence.  Similarly,  a  pattern  `-[0-9]+[ACGTNacgtn]+' represents a
                 deletion from the reference. The deleted bases will be presented as `*'  in  the
                 following lines. Also at the read base column, a symbol `^' marks the start of a
                 read. The ASCII of the character  following  `^'  minus  33  gives  the  mapping
                 quality. A symbol `$' marks the end of a read segment.

                 Note  that there are two orthogonal ways to specify locations in the input file;
                 via -r region and -l file.  The former uses (and requires) an index to do random
                 access  while  the  latter  streams  through the file contents filtering out the
                 specified regions, requiring no index.  The two may be used in conjunction.  For
                 example  a  BED  file  containing  locations  of genes in chromosome 20 could be
                 specified using -r 20 -l chr20.bed, meaning that  the  index  is  used  to  find
                 chromosome 20 and then it is filtered for the regions listed in the bed file.

                 Input Options:

                 -6, --illumina1.3+
                           Assume the quality is in the Illumina 1.3+ encoding.

                 -A, --count-orphans
                           Do not skip anomalous read pairs in variant calling.

                 -b, --bam-list FILE
                           List of input BAM files, one file per line [null]

                 -B, --no-BAQ
                           Disable base alignment quality (BAQ) computation.  See BAQ below.

                 -C, --adjust-MQ INT
                           Coefficient  for  downgrading  mapping  quality  for  reads containing
                           excessive mismatches. Given a read with a phred-scaled  probability  q
                           of  being  generated from the mapped position, the new mapping quality
                           is  about  sqrt((INT-q)/INT)*INT.   A   zero   value   disables   this
                           functionality; if enabled, the recommended value for BWA is 50. [0]

                 -d, --max-depth INT
                           At  a  position, read maximally INT reads per input file. Setting this
                           limit reduces the amount of memory and time needed to process  regions
                           with  very high coverage.  Passing zero for this option sets it to the
                           highest possible value, effectively removing the depth limit. [8000]

                           Note that up to release 1.8, samtools would enforce  a  minimum  value
                           for  this option.  This no longer happens and the limit is set exactly
                           as specified.

                 -E, --redo-BAQ
                           Recalculate BAQ on the fly, ignore existing BQ tags.  See BAQ below.

                 -f, --fasta-ref FILE
                           The faidx-indexed reference file in the FASTA format. The file can  be
                           optionally compressed by bgzip.  [null]

                           Supplying   a  reference  file  will  enable  base  alignment  quality
                           calculation for all reads aligned to a reference in the file.  See BAQ
                           below.

                 -G, --exclude-RG FILE
                           Exclude reads from readgroups listed in FILE (one @RG-ID per line)

                 -l, --positions FILE
                           BED  or position list file containing a list of regions or sites where
                           pileup or BCF should be generated. Position  list  files  contain  two
                           columns  (chromosome  and  position)  and  start counting from 1.  BED
                           files contain at least 3 columns (chromosome, start and end  position)
                           and are 0-based half-open.
                           While  it is possible to mix both position-list and BED coordinates in
                           the same file, this is strongly  ill  advised  due  to  the  differing
                           coordinate systems. [null]

                 -q, -min-MQ INT
                           Minimum mapping quality for an alignment to be used [0]

                 -Q, --min-BQ INT
                           Minimum base quality for a base to be considered [13]

                 -r, --region STR
                           Only  generate pileup in region. Requires the BAM files to be indexed.
                           If used in conjunction with -l then considers the intersection of  the
                           two requests.  STR [all sites]

                 -R, --ignore-RG
                           Ignore RG tags. Treat all reads in one BAM as one sample.

                 --rf, --incl-flags STR|INT
                           Required flags: skip reads with mask bits unset [null]

                 --ff, --excl-flags STR|INT
                           Filter     flags:     skip     reads     with     mask     bits    set
                           [UNMAP,SECONDARY,QCFAIL,DUP]

                 -x, --ignore-overlaps
                           Disable read-pair overlap detection.

                 Output Options:

                 -o, --output FILE
                           Write pileup output to FILE,  rather  than  the  default  of  standard
                           output.

                           (The  same  short  option  is used for both the deprecated --open-prob
                           option and  --output  .   If  -o's  argument  contains  any  non-digit
                           characters  other  than  a  leading  + or - sign, it is interpreted as
                           --output.  Usually the filename extension will take care of this,  but
                           to  write  to  an  entirely  numeric filename use -o ./123 or --output
                           123.)

                 -O, --output-BP
                           Output base positions on reads.

                 -s, --output-MQ
                           Output mapping quality.

                 --output-QNAME
                           Output an extra column containing comma-separated read names.

                 -a        Output all positions, including those with zero depth.

                 -a -a, -aa
                           Output absolutely all positions, including unused reference sequences.
                           Note  that  when used in conjunction with a BED file the -a option may
                           sometimes operate as if -aa was specified if  the  reference  sequence
                           has coverage outside of the region specified in the BED file.

                 BAQ (Base Alignment Quality)

                 BAQ is the Phred-scaled probability of a read base being misaligned.  It greatly
                 helps to reduce false SNPs caused by misalignments.  BAQ is calculated using the
                 probabilistic realignment method described in the paper “Improving SNP discovery
                 by base  alignment  quality”,  Heng  Li,  Bioinformatics,  Volume  27,  Issue  8
                 <https://doi.org/10.1093/bioinformatics/btr076>

                 BAQ  is  turned  on  when  a reference file is supplied using the -f option.  To
                 disable it, use the -B option.

                 It is possible to store pre-calculated BAQ values in a SAM BQ:Z  tag.   Samtools
                 mpileup  will  use the precalculated values if it finds them.  The -E option can
                 be used to make it ignore  the  contents  of  the  BQ:Z  tag  and  force  it  to
                 recalculate the BAQ scores by making a new alignment.

       flags     samtools flags INT|STR[,...]

                 Convert between textual and numeric flag representation.

                 FLAGS:

                   0x1   PAIRED          paired-end (or multiple-segment) sequencing technology
                   0x2   PROPER_PAIR     each segment properly aligned according to the aligner
                   0x4   UNMAP           segment unmapped
                   0x8   MUNMAP          next segment in the template unmapped
                  0x10   REVERSE         SEQ is reverse complemented
                  0x20   MREVERSE        SEQ of the next segment in the template is reverse complemented
                  0x40   READ1           the first segment in the template
                  0x80   READ2           the last segment in the template
                 0x100   SECONDARY       secondary alignment
                 0x200   QCFAIL          not passing quality controls
                 0x400   DUP             PCR or optical duplicate
                 0x800   SUPPLEMENTARY   supplementary alignment

       fastq/a   samtools fastq [options] in.bam
                 samtools fasta [options] in.bam

                 Converts  a  BAM  or  CRAM  into  either  FASTQ or FASTA format depending on the
                 command invoked. The files will be automatically compressed if  the  file  names
                 have a .gz or .bgzf extension.

                 The  input  to  this  program must be collated by name.  Use samtools collate or
                 samtools sort -n to ensure this.

                 For each different QNAME, the input records are  categorised  according  to  the
                 state of the READ1 and READ2 flag bits.  The three categories used are:

                 1 : Only READ1 is set.

                 2 : Only READ2 is set.

                 0 : Either both READ1 and READ2 are set; or neither is set.

                 The exact meaning of these categories depends on the sequencing technology used.
                 It is expected that ordinary single and paired-end sequencing reads will  be  in
                 categories  1  and 2 (in the case of paired-end reads, one read of the pair will
                 be in category 1, the other in category 2).  Category 0 is essentially a “catch-
                 all” for reads that do not fit into a simple paired-end sequencing model.

                 For  each category only one sequence will be written for a given QNAME.  If more
                 than one record is available for a given QNAME and category, the first in  input
                 file  order  that  has  quality  values  will be used.  If none of the candidate
                 records has quality values, then the first in input  file  order  will  be  used
                 instead.

                 Sequences  will  be  written  to  standard output unless one of the -1,-2, or -0
                 options is used, in which case sequences for that category will  be  written  to
                 the specified file.

                 If  a singleton file is specified using the -s option then only paired sequences
                 will be output for categories 1 and 2; paired meaning that  for  a  given  QNAME
                 there  are sequences for both category 1 and 2.  If there is a sequence for only
                 one of categories 1 or 2 then it will be diverted into the specified  singletons
                 file.  This can be used to prepare fastq files for programs that cannot handle a
                 mixture of paired and singleton reads.

                 The -s option only affects category 1 and 2 records.  The output for category  0
                 will be the same irrespective of the use of this option.

                 OPTIONS:

                 -n      By  default, either '/1' or '/2' is added to the end of read names where
                         the corresponding READ1 or READ2 FLAG bit is set.  Using -n causes  read
                         names to be left as they are.

                 -N      Always  add  either  '/1' or '/2' to the end of read names even when put
                         into different files.

                 -O      Use quality values from OQ tags in preference to standard quality string
                         if available.

                 -s FILE Write singleton reads to FILE.

                 -t      Copy RG, BC and QT tags to the FASTQ header line, if they exist.

                 -T TAGLIST
                         Specify a comma-separated list of tags to copy to the FASTQ header line,
                         if they exist.

                 -1 FILE Write reads with the READ1 FLAG set (and READ2 not set) to FILE  instead
                         of outputting them.  If the -s option is used, only paired reads will be
                         written to this file.

                 -2 FILE Write reads with the READ2 FLAG set (and READ1 not set) to FILE  instead
                         of outputting them.  If the -s option is used, only paired reads will be
                         written to this file.

                 -0 FILE Write reads where the READ1 and READ2 FLAG bits set are either both  set
                         or both unset to FILE instead of outputting them.

                 -f INT  Only  output  alignments  with  all  bits set in INT present in the FLAG
                         field.  INT can be  specified  in  hex  by  beginning  with  `0x'  (i.e.
                         /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -F INT  Do  not  output  alignments with any bits set in INT present in the FLAG
                         field.  INT can be  specified  in  hex  by  beginning  with  `0x'  (i.e.
                         /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -G INT  Only  EXCLUDE  reads with all of the bits set in INT present in the FLAG
                         field.  INT can be  specified  in  hex  by  beginning  with  `0x'  (i.e.
                         /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -i      add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)

                 -c [0..9]
                         set compression level when writing gz or bgzf fastq files.

                 --i1 FILE
                         write first index reads to FILE

                 --i2 FILE
                         write second index reads to FILE

                 --barcode-tag TAG
                         aux tag to find index reads in [default: BC]

                 --quality-tag TAG
                         aux tag to find index quality in [default: QT]

                 --index-format STR
                         string  to  describe  how  to  parse  the  barcode and quality tags. For
                         example:

                         i14i8   the first 14 characters are index 1, the next 8  characters  are
                                 index 2

                         n8i14   ignore  the  first  8 characters, and use the next 14 characters
                                 for index 1

                                 If the tag contains a separator, then the numeric  part  can  be
                                 replaced  with  '*'  to mean 'read until the separator or end of
                                 tag', for example:

                         n*i*    ignore the left part of the tag until the  separator,  then  use
                                 the second part

                 EXAMPLES

                 Output  paired reads to separate files, discarding singletons, supplementary and
                 secondary reads.  The resulting files can be used with,  for  example,  the  bwa
                 aligner.

                     samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -n -F 0x900 in.bam

                 Output paired and singleton reads in a single file, discarding supplementary and
                 secondary reads.  To get all of the reads in a single file, it is  necessary  to
                 redirect the output of samtools fastq.  The output file is suitable for use with
                 bwa mem -p which understands interleaved files containing a  mixture  of  paired
                 and singleton reads.

                     samtools fastq -0 /dev/null -F 0x900 in.bam > all_reads.fq

                 Output  paired  reads  in  a single file, discarding supplementary and secondary
                 reads.  Save any singletons in a separate file.  Append /1 and /2 to read names.
                 This  format is suitable for use by NextGenMap when using its -p and -q options.
                 With this aligner, paired reads must be mapped separately to the singletons.

                     samtools fastq -0 /dev/null -s single.fq -N -F 0x900 in.bam > paired.fq

                 BUGS

                 o The way of specifying output files is far  to  complicated  and  easy  to  get
                   wrong.

                 o The  default  value for the -F option should really be 0x900 so that secondary
                   and supplementary reads are automatically excluded.  The existing default of 0
                   is retained for reasons of compatibility.

       collate   samtools collate [options] in.sam|in.bam|in.cram [<prefix>]

                 Shuffles  and  groups  reads together by their names.  A faster alternative to a
                 full query name sort, collate ensures that reads of the same  name  are  grouped
                 together  in  contiguous groups, but doesn't make any guarantees about the order
                 of read names between groups.

                 The output from this command should be suitable for any operation that  requires
                 all reads from the same template to be grouped together.

                 If  present, <prefix> is used to name the temporary files that collate uses when
                 sorting the data.  If neither the '-O' nor '-o' options are used, <prefix>  must
                 be  present  and  collate will use it to make an output file name by appending a
                 suffix depending on the format written (.bam by default).

                 If either the -O or -o option is used, <prefix> is  optional.   If  <prefix>  is
                 absent,  collate  will  write the temporary files to a system-dependent location
                 (/tmp on UNIX).

                 Using -f for fast mode will output only primary alignments that have either  the
                 READ1  or  READ2  flags set (but not both).  Any other alignment records will be
                 filtered out.  The collation will only work correctly if there are no more  than
                 two reads for any given QNAME after filtering.

                 Fast  mode  keeps a buffer of alignments in memory so that it can write out most
                 pairs as soon as they are found instead of  storing  them  in  temporary  files.
                 This  allows  collate  to avoid some work and so finish more quickly compared to
                 the standard mode.  The number of alignments  held  can  be  changed  using  -r,
                 storing  more alignments uses more memory but increases the number of pairs that
                 can be written early.

                 While collate normally randomises the ordering of read  pairs,  fast  mode  does
                 not.   Position-dependent  biases that would normally be broken up can remain in
                 the fast collate output.  It is therefore not a good idea to use fast mode  when
                 preparing  data  for  programs  that  expect randomly ordered paired reads.  For
                 example  using  fast  collate  instead  of  the  standard  mode  may   lead   to
                 significantly different results from aligners that estimate library insert sizes
                 on batches of reads.

                 Options:

                 -O      Output to stdout.  This option cannot be used with '-o'.

                 -o FILE Write output to FILE.  This option cannot be used with '-O'.

                 -u      Write uncompressed BAM output

                 -l INT  Compression level.  [1]

                 -n INT  Number of temporary files to use.  [64]

                 -f      Fast mode (primary alignments only).

                 -r INT  Number of reads to store in memory (for use with -f).  [10000]

       reheader  samtools reheader [-iP] in.header.sam in.bam

                 Replace the header in in.bam with the header in in.header.sam.  This command  is
                 much faster than replacing the header with a BAM→SAM→BAM conversion.

                 By  default  this  command  outputs  the  BAM  or  CRAM  file to standard output
                 (stdout), but for CRAM format files it has the option  to  perform  an  in-place
                 edit,  both  reading  and  writing  to  the  same file.  No validity checking is
                 performed on the header, nor that it is suitable to use with the  sequence  data
                 itself.

                 OPTIONS:

                 -P, --no-PG
                         Do not generate an @PG header line.

                 -i, --in-place
                         Perform  the header edit in-place, if possible.  This only works on CRAM
                         files and only if there is sufficient room to store the new header.  The
                         amount of space available will differ for each CRAM file.

       cat       samtools cat [-b list] [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ]

                 Concatenate  BAMs or CRAMs. Although this works on either BAM or CRAM, all input
                 files must be the same format as each other. The  sequence  dictionary  of  each
                 input  file  must  be identical, although this command does not check this. This
                 command uses a similar trick to reheader which enables fast BAM concatenation.

                 OPTIONS:

                 -b FOFN Read the list  of  input  BAM  or  CRAM  files  from  FOFN.   These  are
                         concatenated prior to any files specified on the command line.  Multiple
                         -b FOFN options may  be  specified  to  concatenate  multiple  lists  of
                         BAM/CRAM files.

                 -h FILE Uses  the SAM header from FILE.  By default the header is taken from the
                         first file to be concatenated.

                 -o FILE Write the concatenated output to FILE.   By  default  this  is  sent  to
                         stdout.

       rmdup     samtools rmdup [-sS] <input.srt.bam> <out.bam>

                 This command is obsolete. Use markdup instead.

                 Remove  potential PCR duplicates: if multiple read pairs have identical external
                 coordinates, only retain the pair with highest mapping quality.  In the  paired-
                 end  mode,  this  command  ONLY  works with FR orientation and requires ISIZE is
                 correctly set. It does not work for unpaired reads  (e.g.  two  ends  mapped  to
                 different chromosomes or orphan reads).

                 OPTIONS:

                 -s      Remove  duplicates  for  single-end reads. By default, the command works
                         for paired-end reads only.

                 -S      Treat paired-end reads and single-end reads.

       addreplacerg
                 samtools addreplacerg [-r rg line | -R rg ID] [-m mode] [-l level] [-o  out.bam]
                 <input.bam>

                 Adds or replaces read group tags in a file.

                 OPTIONS:

                 -r STRING
                         Allows  you  to  specify  a  read group line to append to the header and
                         applies it to the reads specified by  the  -m  option.  If  repeated  it
                         automatically adds in tabs between invocations.

                 -R STRING
                         Allows  you  to  specify  the  read group ID of an existing @RG line and
                         applies it to the reads specified.

                 -m MODE If you choose orphan_only then existing RG tags are not overwritten,  if
                         you  choose overwrite_all, existing RG tags are overwritten. The default
                         is overwrite_all.

                 -o STRING
                         Write the final output to STRING. The default is to write to stdout.

                         By default, samtools tries to  select  a  format  based  on  the  output
                         filename  extension; if output is to standard output or no format can be
                         deduced, bam is selected.

       calmd     samtools calmd [-Eeubr] [-C capQcoef] <aln.bam> <ref.fasta>

                 Generate the MD tag. If the MD tag is already present, this command will give  a
                 warning  if  the MD tag generated is different from the existing tag. Output SAM
                 by default.

                 Calmd can also read and write CRAM files although in most cases it is  pointless
                 as  CRAM recalculates MD and NM tags on the fly.  The one exception to this case
                 is where both input and output CRAM files have been / are being created with the
                 no_ref option.

                 OPTIONS:

                 -A      When  used  jointly  with  -r  this  option overwrites the original base
                         quality.

                 -e      Convert a the read base to = if it is identical to the aligned reference
                         base. Indel caller does not support the = bases at the moment.

                 -u      Output uncompressed BAM

                 -b      Output compressed BAM

                 -C INT  Coefficient  to  cap  mapping  quality  of  poorly mapped reads. See the
                         pileup command for details. [0]

                 -r      Compute the BQ tag (without -A) or cap base quality by BAQ (with -A).

                 -E      Extended  BAQ  calculation.   This   option   trades   specificity   for
                         sensitivity, though the effect is minor.

       targetcut samtools  targetcut  [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f
                 ref] <in.bam>

                 This command identifies target regions  by  examining  the  continuity  of  read
                 depth,  computes  haploid  consensus sequences of targets and outputs a SAM with
                 each sequence corresponding to a target. When option -f is in use, BAQ  will  be
                 applied.  This  command  is  only designed for cutting fosmid clones from fosmid
                 pool sequencing [Ref. Kitzman et al. (2010)].

       phase     samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] <in.bam>

                 Call and phase heterozygous SNPs.

                 OPTIONS:

                 -A      Drop reads with ambiguous phase.

                 -b STR  Prefix of BAM output. When this option is in use, phase-0 reads will  be
                         saved  in  file STR.0.bam and phase-1 reads in STR.1.bam.  Phase unknown
                         reads will be randomly allocated to one of the two files. Chimeric reads
                         with switch errors will be saved in STR.chimeric.bam.  [null]

                 -F      Do not attempt to fix chimeric reads.

                 -k INT  Maximum length for local phasing. [13]

                 -q INT  Minimum Phred-scaled LOD to call a heterozygote. [40]

                 -Q INT  Minimum base quality to be used in het calling. [13]

       depad     samtools depad [-SsCu1] [-T ref.fa] [-o output] <in.bam>

                 Converts  a  BAM aligned against a padded reference to a BAM aligned against the
                 depadded reference.  The padded reference may contain verbatim "*" bases in  it,
                 but  "*"  bases  are also counted in the reference numbering.  This means that a
                 sequence base-call aligned against a reference "*" is considered to be  a  cigar
                 match  ("M"  or "X") operator (if the base-call is "A", "C", "G" or "T").  After
                 depadding the reference "*" bases are deleted and such  aligned  sequence  base-
                 calls  become  insertions.   Similarly  transformations  apply for deletions and
                 padding cigar operations.

                 OPTIONS:

                 -S     Ignored for compatibility with previous  samtools  versions.   Previously
                        this  option was required if input was in SAM format, but now the correct
                        format is automatically detected by examining the first few characters of
                        input.

                 -s     Output in SAM format.  The default is BAM.

                 -C     Output in CRAM format.  The default is BAM.

                 -u     Do not compress the output.  Applies to either BAM or CRAM output format.

                 -1     Enable fastest compression level.  Only works for BAM or CRAM output.

                 -T FILE
                        Provides  the padded reference file.  Note that without this the @SQ line
                        lengths will be incorrect, so for most use  cases  this  option  will  be
                        considered as mandatory.

                 -o FILE
                        Specifies the output filename.  By default output is sent to stdout.

       markdup   samtools markdup [-l length] [-r] [-s] [-T] [-S] in.algsort.bam out.bam

                 Mark  duplicate  alignments  from  a  coordinate  sorted  file that has been run
                 through fixmate with the -m option.  This program relies on the MC and  ms  tags
                 that fixmate provides.

                 -l INT     Expected maximum read length of INT bases.  [300]

                 -r         Remove duplicate reads.

                 -s         Print some basic stats.

                 -T PREFIX  Write temporary files to PREFIX.samtools.nnnn.mmmm.tmp

                 -S         Mark supplementary reads of duplicates as duplicates.

           EXAMPLE

           # The first sort can be omitted if the file is already name ordered
           samtools sort -n -o namesort.bam example.bam

           # Add ms and MC tags for markdup to use later
           samtools fixmate -m namesort.bam fixmate.bam

           # Markdup needs position order
           samtools sort -o positionsort.bam fixmate.bam

           # Finally mark duplicates
           samtools markdup positionsort.bam markdup.bam

       help, --help
                 Display  a  brief usage message listing the samtools commands available.  If the
                 name of a command is also given, e.g., samtools help view,  the  detailed  usage
                 message for that particular command is displayed.

       --version Display  the  version  numbers  and  copyright  information for samtools and the
                 important libraries used by samtools.

       --version-only
                 Display the full samtools version number in a machine-readable format.

GLOBAL OPTIONS

       Several long-options  are  shared  between  multiple  samtools  subcommands:  --input-fmt,
       --input-fmt-option,  --output-fmt, --output-fmt-option, and --reference.  The input format
       is typically auto-detected so specifying the format is usually unnecessary and the  option
       is  included  for  completeness.  Note that not all subcommands have all options.  Consult
       the subcommand help for more details.

       Format strings recognised are "sam", "bam" and "cram".  They may be followed  by  a  comma
       separated list of options as key or key=value. See below for examples.

       The  fmt-option  arguments  accept either a single option or option=value.  Note that some
       options only work on some file formats and only on read or write  streams.   If  value  is
       unspecified  for a boolean option, the value is assumed to be 1.  The valid options are as
       follows.

       level=INT
           Output only. Specifies the compression level from 1 to 9, or 0 for uncompressed.

       nthreads=INT
           Specifies the number of threads to use during encoding and/or decoding.  For BAM  this
           will be encoding only.  In CRAM the threads are dynamically shared between encoder and
           decoder.

       reference=fasta_file
           Specifies a FASTA reference file for use in CRAM encoding or decoding.  It usually  is
           not  required for decoding except in the situation of the MD5 not being obtainable via
           the REF_PATH or REF_CACHE environment variables.

       decode_md=0|1
           CRAM input only; defaults to 1 (on).  CRAM does not typically store MD  and  NM  tags,
           preferring  to generate them on the fly.  This option controls this behaviour.  It can
           be particularly useful  when  combined  with  a  file  encoded  using  store_md=1  and
           store_nm=1.

       store_md=0|1
           CRAM  output  only;  defaults  to  0 (off).  CRAM normally only stores MD tags when no
           reference is unknown and lets  the  decoder  generate  these  values  on-the-fly  (see
           decode_md).

       store_nm=0|1
           CRAM  output  only;  defaults  to  0 (off).  CRAM normally only stores NM tags when no
           reference is unknown and lets  the  decoder  generate  these  values  on-the-fly  (see
           decode_md).

       ignore_md5=0|1
           CRAM  input  only;  defaults  to  0  (off).   When enabled, md5 checksum errors on the
           reference sequence and block checksum errors within CRAM are  ignored.   Use  of  this
           option is strongly discouraged.

       required_fields=bit-field
           CRAM  input  only;  specifies  which SAM columns need to be populated.  By default all
           fields are used.  Limiting  the  decode  to  specific  columns  can  have  significant
           performance  gains.  The bit-field is a numerical value constructed from the following
           table.

                                                0x1   SAM_QNAME
                                                0x2   SAM_FLAG
                                                0x4   SAM_RNAME
                                                0x8   SAM_POS
                                               0x10   SAM_MAPQ
                                               0x20   SAM_CIGAR
                                               0x40   SAM_RNEXT
                                               0x80   SAM_PNEXT
                                              0x100   SAM_TLEN
                                              0x200   SAM_SEQ
                                              0x400   SAM_QUAL
                                              0x800   SAM_AUX
                                             0x1000   SAM_RGAUX

       name_prefix=string
           CRAM input only; defaults to output filename.  Any sequences with auto-generated  read
           names will use string as the name prefix.

       multi_seq_per_slice=0|1
           CRAM  output  only;  defaults to 0 (off).  By default CRAM generates one container per
           reference sequence, except in the case of many small references (such as a  fragmented
           assembly).

       version=major.minor
           CRAM output only.  Specifies the CRAM version number.  Acceptable values are "2.1" and
           "3.0".

       seqs_per_slice=INT
           CRAM output only; defaults to 10000.

       slices_per_container=INT
           CRAM output only; defaults to 1.  The effect of having multiple slices  per  container
           is to share the compression header block between multiple slices.  This is unlikely to
           have any significant impact unless the number  of  sequences  per  slice  is  reduced.
           (Together these two options control the granularity of random access.)

       embed_ref=0|1
           CRAM  output  only;  defaults  to  0  (off).   If  1,  this will store portions of the
           reference sequence in each  slice,  permitting  decode  without  having  requiring  an
           external copy of the reference sequence.

       no_ref=0|1
           CRAM  output  only; defaults to 0 (off).  If 1, sequences will be stored verbatim with
           no reference encoding.  This can be useful if no reference is available for the file.

       use_bzip2=0|1
           CRAM output  only;  defaults  to  0  (off).   Permits  use  of  bzip2  in  CRAM  block
           compression.

       use_lzma=0|1
           CRAM output only; defaults to 0 (off).  Permits use of lzma in CRAM block compression.

       lossy_names=0|1
           CRAM  output  only;  defaults to 0 (off).  If 1, templates with all members within the
           same CRAM slice will have their read names removed.  New names will  be  automatically
           generated during decoding.  Also see the name_prefix option.

       For example:

           samtools view --input-fmt-option decode_md=0
               --output-fmt cram,version=3.0 --output-fmt-option embed_ref
               --output-fmt-option seqs_per_slice=2000 -o foo.cram foo.bam

REFERENCE SEQUENCES

       The CRAM format requires use of a reference sequence for both reading and writing.

       When  reading  a  CRAM the @SQ headers are interrogated to identify the reference sequence
       MD5sum (M5: tag) and the local reference sequence filename (UR: tag).  Note  that  http://
       and  ftp://  based  URLs in the UR: field are not used, but local fasta filenames (with or
       without file://) can be used.

       To create a CRAM the @SQ headers will also be read to identify  the  reference  sequences,
       but  M5:  and  UR: tags may not be present. In this case the -T and -t options of samtools
       view may be used to specify the fasta or fasta.fai filenames  respectively  (provided  the
       .fasta.fai file is also backed up by a .fasta file).

       The search order to obtain a reference is:

       1. Use any local file specified by the command line options (eg -T).

       2. Look for MD5 via REF_CACHE environment variable.

       3. Look for MD5 in each element of the REF_PATH environment variable.

       4. Look for a local file listed in the UR: header tag.

ENVIRONMENT VARIABLES

       HTS_PATH
              A  colon-separated  list  of directories in which to search for HTSlib plugins.  If
              $HTS_PATH starts or ends with a colon or contains a double colon (::), the built-in
              list of directories is searched at that point in the search.

              If no HTS_PATH variable is defined, the built-in list of directories specified when
              HTSlib was built is used, which typically  includes  /usr/local/libexec/htslib  and
              similar directories.

       REF_PATH
              A  colon  separated  (semi-colon on Windows) list of locations in which to look for
              sequences identified by their MD5sums.  This can be either a list of directories or
              URLs.  Note  that if a URL is included then the colon in http:// and ftp:// and the
              optional port number will be treated as part of  the  URL  and  not  a  PATH  field
              separator.  For URLs, the text %s will be replaced by the MD5sum being read.

              If     no     REF_PATH     has     been    specified    it    will    default    to
              http://www.ebi.ac.uk/ena/cram/md5/%s and if REF_CACHE is also unset, it will be set
              to  $XDG_CACHE_HOME/hts-ref/%2s/%2s/%s.   If $XDG_CACHE_HOME is unset, $HOME/.cache
              (or a local system temporary directory if no home directory is found) will be  used
              similarly.

       REF_CACHE
              This  can  be  defined  to  a single directory housing a local cache of references.
              Upon downloading a reference it will be  stored  in  the  location  pointed  to  by
              REF_CACHE.  When reading a reference it will be looked for in this directory before
              searching REF_PATH.  To avoid many files being stored  in  the  same  directory,  a
              pathname  may  be constructed using %nums and %s notation, consuming num characters
              of the MD5sum.   For  example  /local/ref_cache/%2s/%2s/%s  will  create  2  nested
              subdirectories  with  the  filenames  in  the  deepest  directory being the last 28
              characters of the md5sum.

              The REF_CACHE directory will be searched for before  attempting  to  load  via  the
              REF_PATH  search list.  If no REF_PATH is defined, both REF_PATH and REF_CACHE will
              be automatically set (see above), but if REF_PATH is defined and REF_CACHE not then
              no local cache is used.

              To aid population of the REF_CACHE directory a script misc/seq_cache_populate.pl is
              provided in the Samtools distribution. This takes a fasta file or  a  directory  of
              fasta files and generates the MD5sum named files.

EXAMPLES

       o Import SAM to BAM when @SQ lines are present in the header:

           samtools view -bS aln.sam > aln.bam

         If @SQ lines are absent:

           samtools faidx ref.fa
           samtools view -bt ref.fa.fai aln.sam > aln.bam

         where ref.fa.fai is generated automatically by the faidx command.

       o Convert a BAM file to a CRAM file using a local reference sequence.

           samtools view -C -T ref.fa aln.bam > aln.cram

       o Attach the RG tag while merging sorted alignments:

           perl -e 'print "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina\n@RG\tID:454\tSM:hs\tLB:454\tPL:454\n"' > rg.txt
           samtools merge -rh rg.txt merged.bam ga.bam 454.bam

         The  value  in  a RG tag is determined by the file name the read is coming from. In this
         example, in the merged.bam, reads from ga.bam will be attached RG:Z:ga, while reads from
         454.bam will be attached RG:Z:454.

       o Convert a BAM file to a CRAM with NM and MD tags stored verbatim rather than calculating
         on the fly during CRAM decode, so that mixed data sets with MD/NM only on some  records,
         or NM calculated using different definitions of mismatch, can be decoded without change.
         The second command demonstrates how to decode such a file.  The request to not decode MD
         here is turning off auto-generation of both MD and NM; it will still emit the MD/NM tags
         on records that had these stored verbatim.

           samtools view -C --output-fmt-option store_md=1 --output-fmt-option store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt-option decode_md=0 -o aln.new.bam aln.cram

       o An alternative way of  achieving  the  above  is  listing  multiple  options  after  the
         --output-fmt or -O option.  The commands below are equivalent to the two above.

           samtools view -O cram,store_md=1,store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt cram,decode_md=0 -o aln.new.bam aln.cram

       o Call SNPs and short INDELs:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
           bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf  > var.flt.vcf

         The  bcftools  filter  command  marks  low  quality  sites and sites with the read depth
         exceeding a limit, which should be adjusted  to  about  twice  the  average  read  depth
         (bigger  read  depths  usually indicate problematic regions which are often enriched for
         artefacts).  One may consider to add -C50 to mpileup if mapping quality is overestimated
         for  reads containing excessive mismatches. Applying this option usually helps BWA-short
         but may not other mappers.

         Individuals are identified from the SM tags in the @RG header lines. Individuals can  be
         pooled  in one alignment file; one individual can also be separated into multiple files.
         The -P option specifies that indel candidates should be collected only from read  groups
         with  the  @RG-PL tag set to ILLUMINA.  Collecting indel candidates from reads sequenced
         by an indel-prone technology may affect the performance of indel calling.

       o Generate the consensus sequence for one diploid individual:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq

       o Phase one individual:

           samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out

         The calmd command is used to reduce false heterozygotes around INDELs.

       o Dump BAQ applied alignment for other SNP callers:

           samtools calmd -bAr aln.bam > aln.baq.bam

         It adds and corrects the NM and MD tags at the same time. The calmd command  also  comes
         with the -C option, the same as the one in pileup and mpileup.  Apply if it helps.

LIMITATIONS

       o Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c.

       o Samtools  paired-end  rmdup  does not work for unpaired reads (e.g. orphan reads or ends
         mapped  to  different  chromosomes).  If  this  is  a  concern,  please   use   Picard's
         MarkDuplicates which correctly handles these cases, although a little slower.

AUTHOR

       Heng Li from the Sanger Institute wrote the original C version of samtools.  Bob Handsaker
       from the Broad Institute implemented the BGZF library.  James  Bonfield  from  the  Sanger
       Institute developed the CRAM implementation.  John Marshall and Petr Danecek contribute to
       the source code and various people from the 1000 Genomes Project have contributed  to  the
       SAM format specification.

SEE ALSO

       bcftools(1), sam(5), tabix(1)

       Samtools website: <http://www.htslib.org/>
       File format specification of SAM/BAM,CRAM,VCF/BCF: <http://samtools.github.io/hts-specs>
       Samtools latest source: <https://github.com/samtools/samtools>
       HTSlib latest source: <https://github.com/samtools/htslib>
       Bcftools website: <http://samtools.github.io/bcftools>