Provided by: samtools_1.7-1_amd64 bug

NAME

       samtools - Utilities for the Sequence Alignment/Map (SAM) format

SYNOPSIS

       samtools view -bt ref_list.txt -o aln.bam aln.sam.gz

       samtools sort -T /tmp/aln.sorted -o aln.sorted.bam aln.bam

       samtools index aln.sorted.bam

       samtools idxstats aln.sorted.bam

       samtools flagstat aln.sorted.bam

       samtools stats aln.sorted.bam

       samtools bedcov aln.sorted.bam

       samtools depth aln.sorted.bam

       samtools view aln.sorted.bam chr2:20,100,000-20,200,000

       samtools merge out.bam in1.bam in2.bam in3.bam

       samtools faidx ref.fasta

       samtools tview aln.sorted.bam ref.fasta

       samtools split merged.bam

       samtools quickcheck in1.bam in2.cram

       samtools dict -a GRCh38 -s "Homo sapiens" ref.fasta

       samtools fixmate in.namesorted.sam out.bam

       samtools mpileup -C50 -gf ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam

       samtools flags PAIRED,UNMAP,MUNMAP

       samtools fastq input.bam > output.fastq

       samtools fasta input.bam > output.fasta

       samtools addreplacerg -r 'ID:fish' -r 'LB:1334' -r 'SM:alpha' -o output.bam input.bam

       samtools collate aln.sorted.bam aln.name_collated.bam

       samtools depad input.bam

       samtools markdup in.algnsorted.bam out.bam

DESCRIPTION

       Samtools  is  a  set of utilities that manipulate alignments in the BAM format. It imports
       from and exports to the SAM (Sequence Alignment/Map) format,  does  sorting,  merging  and
       indexing, and allows one to retrieve reads in any regions swiftly.

       Samtools  is  designed  to  work on a stream. It regards an input file `-' as the standard
       input (stdin) and an output file `-' as the standard output (stdout). Several commands can
       thus be combined with Unix pipes. Samtools always output warning and error messages to the
       standard error output (stderr).

       Samtools is also able to open a BAM (not SAM) file on a remote FTP or HTTP server  if  the
       BAM  file  name  starts  with  `ftp://' or `http://'.  Samtools checks the current working
       directory for the index file and will download the index upon absence. Samtools  does  not
       retrieve the entire alignment file unless it is asked to do so.

COMMANDS AND OPTIONS

       view      samtools view [options] in.sam|in.bam|in.cram [region...]

                 With  no  options  or  regions specified, prints all alignments in the specified
                 input alignment file (in SAM, BAM, or CRAM format) to  standard  output  in  SAM
                 format (with no header).

                 You  may  specify  one  or  more space-separated region specifications after the
                 input filename to restrict output to only those  alignments  which  overlap  the
                 specified  region(s).  Use of region specifications requires a coordinate-sorted
                 and indexed input file (in BAM or CRAM format).

                 The -b, -C, -1, -u, -h, -H, and -c options change the  output  format  from  the
                 default  of  headerless  SAM,  and  the  -o  and  -U options set the output file
                 name(s).

                 The -t and -T options provide  additional  reference  data.  One  of  these  two
                 options  is  required  when  SAM  input does not contain @SQ headers, and the -T
                 option is required whenever writing CRAM output.

                 The -L, -M, -r, -R, -s, -q, -l, -m, -f, -F, and -G options filter the alignments
                 that  will be included in the output to only those alignments that match certain
                 criteria.

                 The -x and -B options modify the data which is contained in each alignment.

                 Finally, the -@ option can be used to allocate additional threads to be used for
                 compression, and the -?  option requests a long help message.

       REGIONS:
                 Regions   can  be  specified  as:  RNAME[:STARTPOS[-ENDPOS]]  and  all  position
                 coordinates are 1-based.

                 Important note: when multiple regions are given, some alignments may  be  output
                 multiple times if they overlap more than one of the specified regions.

                 Examples of region specifications:

                 chr1      Output  all  alignments  mapped to the reference sequence named `chr1'
                           (i.e. @SQ SN:chr1).

                 chr2:1000000
                           The region on chr2 beginning at base position 1,000,000 and ending  at
                           the end of the chromosome.

                 chr3:1000-2000
                           The  1001bp region on chr3 beginning at base position 1,000 and ending
                           at base position 2,000 (including both end positions).

                 '*'       Output the unmapped reads at the end of  the  file.   (This  does  not
                           include  any  unmapped  reads placed on a reference sequence alongside
                           their mapped mates.)

                 .         Output all alignments.  (Mostly unnecessary as not specifying a region
                           at all has the same effect.)

       OPTIONS:

                 -b        Output in the BAM format.

                 -C        Output in the CRAM format (requires -T).

                 -1        Enable fast BAM compression (implies -b).

                 -u        Output   uncompressed   BAM.   This   option   saves   time  spent  on
                           compression/decompression and is thus preferred  when  the  output  is
                           piped to another samtools command.

                 -h        Include the header in the output.

                 -H        Output the header only.

                 -c        Instead  of  printing  the  alignments,  only count them and print the
                           total number. All filter options, such as -f, -F, and  -q,  are  taken
                           into account.

                 -?        Output long help and exit immediately.

                 -o FILE   Output to FILE [stdout].

                 -U FILE   Write  alignments  that are not selected by the various filter options
                           to FILE.  When this option is used, all alignments (or all  alignments
                           intersecting  the  regions specified) are written to either the output
                           file or this file, but never both.

                 -t FILE   A tab-delimited FILE.  Each line must contain the  reference  name  in
                           the first column and the length of the reference in the second column,
                           with one line for each  distinct  reference.   Any  additional  fields
                           beyond the second column are ignored. This file also defines the order
                           of the reference sequences in sorting. If  you  run:  `samtools  faidx
                           <ref.fa>',  the  resulting index file <ref.fa>.fai can be used as this
                           FILE.

                 -T FILE   A FASTA format reference FILE,  optionally  compressed  by  bgzip  and
                           ideally  indexed  by  samtools faidx.  If an index is not present, one
                           will be generated for you.

                 -L FILE   Only output alignments overlapping the input BED FILE [null].

                 -M        Use the multi-region iterator  on  the  union  of  the  BED  file  and
                           command-line  region  arguments.   This  avoids  re-reading  the  same
                           regions of files so can sometimes be  much  faster.   Note  this  also
                           removes  duplicate  sequences.   Without this a sequence that overlaps
                           multiple regions specified  on  the  command  line  will  be  reported
                           multiple times.

                 -r STR    Only output alignments in read group STR [null].

                 -R FILE   Output alignments in read groups listed in FILE [null].

                 -q INT    Skip alignments with MAPQ smaller than INT [0].

                 -l STR    Only output alignments in library STR [null].

                 -m INT    Only  output  alignments  with  number  of CIGAR bases consuming query
                           sequence ≥ INT [0]

                 -f INT    Only output alignments with all bits set in INT present  in  the  FLAG
                           field.   INT  can  be  specified  in  hex by beginning with `0x' (i.e.
                           /^0x[0-9A-F]+/) or in octal by beginning with  `0'  (i.e.  /^0[0-7]+/)
                           [0].

                 -F INT    Do  not output alignments with any bits set in INT present in the FLAG
                           field.  INT can be specified in  hex  by  beginning  with  `0x'  (i.e.
                           /^0x[0-9A-F]+/)  or  in  octal by beginning with `0' (i.e. /^0[0-7]+/)
                           [0].

                 -G INT    Do not output alignments with all bits set in INT present in the  FLAG
                           field.   This is the opposite of -f such that -f12 -G12 is the same as
                           no filtering at all.  INT can be specified in hex  by  beginning  with
                           `0x'  (i.e.  /^0x[0-9A-F]+/)  or  in octal by beginning with `0' (i.e.
                           /^0[0-7]+/) [0].

                 -x STR    Read tag to exclude from output (repeatable) [null]

                 -B        Collapse the backward CIGAR operation.

                 -s FLOAT  Output only a proportion of the input  alignments.   This  subsampling
                           acts  in  the  same  way  on  all of the alignment records in the same
                           template or read pair, so it never keeps a read but not its mate.

                           The integer and fractional parts of the -s INT.FRAC  option  are  used
                           separately:  the  part  after  the  decimal point sets the fraction of
                           templates/pairs to be kept, while the integer part is used as  a  seed
                           that influences which subset of reads is kept.

                           When  subsampling data that has previously been subsampled, be sure to
                           use a different seed value from those used previously; otherwise  more
                           reads will be retained than expected.

                 -@ INT    Number  of  BAM  compression threads to use in addition to main thread
                           [0].

                 -S        Ignored for compatibility with previous samtools versions.  Previously
                           this  option  was  required  if  input  was in SAM format, but now the
                           correct format is automatically detected by examining  the  first  few
                           characters of input.

       sort      samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format] [-n] [-t tag] [-T
                 tmpprefix] [-@ threads] [in.sam|in.bam|in.cram]

                 Sort alignments by leftmost coordinates, or by read name when -n  is  used.   An
                 appropriate  @HD-SO  sort  order  header  tag  will  be added or an existing one
                 updated if necessary.

                 The sorted output is written to standard output by default, or to the  specified
                 file  (out.bam)  when -o is used.  This command will also create temporary files
                 tmpprefix.%d.bam as needed when the entire alignment data cannot fit into memory
                 (as controlled via the -m option).

                 Options:

                 -l INT     Set  the desired compression level for the final output file, ranging
                            from 0 (uncompressed) or 1 (fastest but  minimal  compression)  to  9
                            (best  compression  but  slowest  to  write),  similarly to gzip(1)'s
                            compression level setting.

                            If -l is not used, the default compression level will apply.

                 -m INT     Approximately the  maximum  required  memory  per  thread,  specified
                            either in bytes or with a K, M, or G suffix.  [768 MiB]

                            To  prevent  sort  from creating a huge number of temporary files, it
                            enforces a minimum value of 1M for this setting.

                 -n         Sort by read names (i.e., the QNAME field) rather than by chromosomal
                            coordinates.

                 -t TAG     Sort first by the value in the alignment tag TAG, then by position or
                            name (if also using -n).  -o FILE Write the final  sorted  output  to
                            FILE, rather than to standard output.

                 -O FORMAT  Write the final output as sam, bam, or cram.

                            By  default,  samtools  tries  to  select  a  format  based on the -o
                            filename extension; if output is to standard output or no format  can
                            be deduced, bam is selected.

                 -T PREFIX  Write  temporary files to PREFIX.nnnn.bam, or if the specified PREFIX
                            is an existing  directory,  to  PREFIX/samtools.mmm.mmm.tmp.nnnn.bam,
                            where mmm is unique to this invocation of the sort command.

                            By  default,  any  temporary  files  are written alongside the output
                            file, as out.bam.tmp.nnnn.bam, or if output is to standard output, in
                            the current directory as samtools.mmm.mmm.tmp.nnnn.bam.

                 -@ INT     Set number of sorting and compression threads.  By default, operation
                            is single-threaded.

                 Ordering Rules

                 The following rules are used for ordering records.

                 If option -t is in use, records are first sorted  by  the  value  of  the  given
                 alignment tag, and then by position or name (if using -n).  For example, “-t RG”
                 will make read group the primary sort key.  The rules for ordering by tag are:

                 •   Records that do not have the tag are sorted before ones that do.

                 •   If the types of the tags are different, they will be sorted so  that  single
                     character  tags  (type  A) come before array tags (type B), then string tags
                     (types H and Z), then numeric tags (types f and i).

                 •   Numeric tags (types f and i) are compared by value.  Note  that  comparisons
                     of floating-point values are subject to issues of rounding and precision.

                 •   String tags (types H and Z) are compared based on the binary contents of the
                     tag using the C strcmp(3) function.

                 •   Character tags (type A) are compared by binary character value.

                 •   No attempt is made to compare tags of other types —  notably  type  B  array
                     values will not be compared.

                 When  the  -n option is present, records are sorted by name.  Names are compared
                 so as to give a “natural” ordering — i.e.  sections  consisting  of  digits  are
                 compared numerically while all other sections are compared based on their binary
                 representation.  This means “a1” will come before “b1” and “a9” will come before
                 “a10”.   Records  with  the same name will be ordered according to the values of
                 the READ1 and READ2 flags (see flags).

                 When the -n option is not present, reads are sorted by reference  (according  to
                 the  order  of  the  @SQ header records), then by position in the reference, and
                 then by the REVERSE flag.

                 Note

                 Historically samtools sort also accepted a less flexible way of  specifying  the
                 final and temporary output filenames:

                        samtools sort [-f] [-o] in.bam out.prefix

                 This  has now been removed.  The previous out.prefix argument (and -f option, if
                 any) should be changed to an appropriate combination of -T PREFIX and  -o  FILE.
                 The previous -o option should be removed, as output defaults to standard output.

       index     samtools index [-bc] [-m INT] aln.bam|aln.cram [out.index]

                 Index  a  coordinate-sorted BAM or CRAM file for fast random access.  (Note that
                 this does not work with SAM files even if they are bgzip compressed —  to  index
                 such files, use tabix(1) instead.)

                 This  index  is needed when region arguments are used to limit samtools view and
                 similar commands to particular regions of interest.

                 If an output filename is given, the index file will  be  written  to  out.index.
                 Otherwise,  for  a CRAM file aln.cram, index file aln.cram.crai will be created;
                 for a BAM file aln.bam, either  aln.bam.bai  or  aln.bam.csi  will  be  created,
                 depending on the index format selected.

                 Options:

                 -b      Create  a  BAI  index.   This  is  currently  the default when no format
                         options are used.

                 -c      Create a CSI index.  By default, the minimum interval size for the index
                         is 2^14, which is the same as the fixed value used by the BAI format.

                 -m INT  Create a CSI index, with a minimum interval size of 2^INT.

       idxstats  samtools idxstats in.sam|in.bam|in.cram

                 Retrieve  and  print  stats  in  the index file corresponding to the input file.
                 Before calling idxstats, the input BAM file must be indexed by samtools index.

                 The output is TAB-delimited with each  line  consisting  of  reference  sequence
                 name,  sequence  length,  #  mapped reads and # unmapped reads. It is written to
                 stdout.

       flagstat  samtools flagstat in.sam|in.bam|in.cram

                 Does a full pass through the input file to calculate  and  print  statistics  to
                 stdout.

                 Provides  counts  for  each of 13 categories based primarily on bit flags in the
                 FLAG field. Each category in the output is broken down into QC pass and QC fail,
                 which is presented as "#PASS + #FAIL" followed by a description of the category.

                 The  first  row  of  output gives the total number of reads that are QC pass and
                 fail (according to flag bit 0x200). For example:

                   122 + 28 in total (QC-passed reads + QC-failed reads)

                 Which would indicate that there are a total of 150 reads in the input file,  122
                 of  which  are  marked  as  QC  pass  and 28 of which are marked as "not passing
                 quality controls"

                 Following this, additional categories are given for reads which are:

                         secondary
                                0x100 bit set

                         supplementary
                                0x800 bit set

                         duplicates
                                0x400 bit set

                         mapped 0x4 bit not set

                         paired in sequencing
                                0x1 bit set

                         read1  both 0x1 and 0x40 bits set

                         read2  both 0x1 and 0x80 bits set

                         properly paired
                                both 0x1 and 0x2 bits set and 0x4 bit not set

                         with itself and mate mapped
                                0x1 bit set and neither 0x4 nor 0x8 bits set

                         singletons
                                both 0x1 and 0x8 bits set and bit 0x4 not set

                 And finally, two rows are given that additionally filter on the  reference  name
                 (RNAME), mate reference name (MRNM), and mapping quality (MAPQ) fields:

                         with mate mapped to a different chr
                                0x1  bit  set and neither 0x4 nor 0x8 bits set and MRNM not equal
                                to RNAME

                         with mate mapped to a different chr (mapQ>=5)
                                0x1 bit set and neither 0x4 nor 0x8 bits set and MRNM  not  equal
                                to RNAME and MAPQ >= 5

       stats     samtools stats [options] in.sam|in.bam|in.cram [region...]

                 samtools  stats collects statistics from BAM files and outputs in a text format.
                 The output can be visualized graphically using plot-bamstats.

                 Options:

                 -c, --coverage MIN,MAX,STEP
                         Set coverage distribution to the specified range  (MIN,  MAX,  STEP  all
                         given as integers) [1,1000,1]

                 -d, --remove-dups
                         Exclude from statistics reads marked as duplicates

                 -f, --required-flag STR|INT
                         Required flag, 0 for unset. See also `samtools flags` [0]

                 -F, --filtering-flag STR|INT
                         Filtering flag, 0 for unset. See also `samtools flags` [0]

                 --GC-depth FLOAT
                         the  size  of  GC-depth  bins  (decreasing  bin  size  increases  memory
                         requirement) [2e4]

                 -h, --help
                         This help message

                 -i, --insert-size INT
                         Maximum insert size [8000]

                 -I, --id STR
                         Include only listed read group or sample name []

                 -l, --read-length INT
                         Include in the statistics only reads with the given read length []

                 -m, --most-inserts FLOAT
                         Report only the main part of inserts [0.99]

                 -P, --split-prefix STR
                         A path or string prefix to prepend to  filenames  output  when  creating
                         categorised statistics files with -S/--split.  [input filename]

                 -q, --trim-quality INT
                         The BWA trimming parameter [0]

                 -r, --ref-seq FILE
                         Reference  sequence  (required  for  GC-depth  and  mismatches-per-cycle
                         calculation).  []

                 -S, --split TAG
                         In  addition  to  the  complete  statistics,  also  output   categorised
                         statistics  based on the tagged field TAG (e.g., use --split RG to split
                         into read groups).

                         Categorised    statistics     are     written     to     files     named
                         <prefix>_<value>.bamstat, where prefix is as given by --split-prefix (or
                         the input filename by default) and value has  been  encountered  as  the
                         specified tagged field's value in one or more alignment records.

                 -t, --target-regions FILE
                         Do stats in these regions only. Tab-delimited file chr,from,to, 1-based,
                         inclusive.  []

                 -x, --sparse
                         Suppress outputting IS rows where there are no insertions.

       bedcov    samtools bedcov [options] region.bed in1.sam|in1.bam|in1.cram[...]

                 Reports the total read base count (i.e. the sum of per  base  read  depths)  for
                 each  genomic  region  specified  in  the  supplied  BED  file.  Counts for each
                 alignment file supplied are reported in separate columns.

                 Options:

                 -Q INT Only count reads with mapping quality greater than INT

       depth     samtools depth  [options]  [in1.sam|in1.bam|in1.cram  [in2.sam|in2.bam|in2.cram]
                 [...]]

                 Computes the depth at each position or region.

                 Options:

                 -a      Output all positions (including those with zero depth)

                 -a -a, -aa
                         Output  absolutely  all positions, including unused reference sequences.
                         Note that when used in conjunction with a BED file  the  -a  option  may
                         sometimes  operate as if -aa was specified if the reference sequence has
                         coverage outside of the region specified in the BED file.

                 -b FILE Compute depth at list of positions or regions in specified BED FILE.  []

                 -f FILE Use the BAM files specified in the FILE (a file of filenames,  one  file
                         per line) []

                 -l INT  Ignore reads shorter than INT

                 -m, -d INT
                         Truncate reported depth at a maximum of INT reads.  [8000]

                 -q INT  Only count reads with base quality greater than INT

                 -Q INT  Only count reads with mapping quality greater than INT

                 -r CHR:FROM-TO
                         Only report depth in specified region.

       merge     samtools  merge  [-nur1f]  [-h inh.sam] [-R reg] [-b <list>] <out.bam> <in1.bam>
                 [<in2.bam> <in3.bam> ... <inN.bam>]

                 Merge multiple sorted alignment files, producing a  single  sorted  output  file
                 that contains all the input records and maintains the existing sort order.

                 If  -h  is  specified  the  @SQ  headers  of input files will be merged into the
                 specified header, otherwise they will be merged into a composite header  created
                 from  the  input headers.  If in the process of merging @SQ lines for coordinate
                 sorted input files, a conflict arises as to the order  (for  example  input1.bam
                 has  @SQ for a,b,c and input2.bam has b,a,c) then the resulting output file will
                 need to be re-sorted back into coordinate order.

                 Unless the -c or -p flags are specified then when merging @RG  and  @PG  records
                 into  the  output  header then any IDs found to be duplicates of existing IDs in
                 the output header will have a suffix appended to them to differentiate them from
                 similar  header records from other files and the read records will be updated to
                 reflect this.

                 The ordering of the records in the input files must match the usage  of  the  -n
                 and  -t  command-line  options.   If  they  do  not,  the  output  order will be
                 undefined.  See sort for information about record ordering.

                 OPTIONS:

                 -1      Use zlib compression level 1 to compress the output.

                 -b FILE List of input BAM files, one file per line.

                 -f      Force to overwrite the output file if present.

                 -h FILE Use the lines of FILE as `@' headers to be copied to out.bam,  replacing
                         any  header lines that would otherwise be copied from in1.bam.  (FILE is
                         actually in SAM format, though any alignment records it may contain  are
                         ignored.)

                 -n      The input alignments are sorted by read names rather than by chromosomal
                         coordinates

                 -t TAG  The input alignments have been sorted by  the  value  of  TAG,  then  by
                         either position or name (if -n is given).

                 -R STR  Merge files in the specified region indicated by STR [null]

                 -r      Attach  an RG tag to each alignment. The tag value is inferred from file
                         names.

                 -u      Uncompressed BAM output

                 -c      When several input files contain @RG headers with the same ID, emit only
                         one of them (namely, the header line from the first file we find that ID
                         in) to the merged output  file.   Combining  these  similar  headers  is
                         usually  the  right  thing  to do when the files being merged originated
                         from the same file.

                         Without -c, all @RG headers appear  in  the  output  file,  with  random
                         suffixes added to their IDs where necessary to differentiate them.

                 -p      Similarly,  for  each  @PG  ID in the set of files to merge, use the @PG
                         line of the first file we find that ID in rather than adding a suffix to
                         differentiate similar IDs.

       faidx     samtools faidx <ref.fasta> [region1 [...]]

                 Index reference sequence in the FASTA format or extract subsequence from indexed
                 reference sequence. If no region is specified, faidx will  index  the  file  and
                 create  <ref.fasta>.fai  on the disk. If regions are specified, the subsequences
                 will be retrieved and printed to stdout in the FASTA format.

                 The input file can be compressed in the BGZF format.

                 The sequences in the input file should all have different  names.   If  they  do
                 not,  indexing  will emit a warning about duplicate sequences and retrieval will
                 only produce subsequences from the first sequence with the duplicated name.

       tview     samtools tview [-p chr:pos] [-s STR] [-d display] <in.sorted.bam> [ref.fasta]

                 Text alignment viewer (based on the ncurses library). In the viewer,  press  `?'
                 for  help and press `g' to check the alignment start from a region in the format
                 like  `chr10:10,000,000'  or  `=10,000,000'  when  viewing  the  same  reference
                 sequence.

                 Options:

                 -d display    Output as (H)tml or (C)urses or (T)ext

                 -p chr:pos    Go directly to this position

                 -s STR        Display only alignments from this sample or read group

       split     samtools split [options] merged.sam|merged.bam|merged.cram

                 Splits a file by read group.

                 Options:

                 -u FILE1      Put reads with no RG tag or an unrecognised RG tag into FILE1

                 -u FILE1:FILE2
                               As above, but assigns an RG tag as given in the header of FILE2

                 -f STRING     Output filename format string (see below) ["%*_%#.%."]

                 -v            Verbose output

                 Format string expansions:

                                      %%   %
                                      %*   basename
                                      %#   @RG index
                                      %!   @RG ID
                                      %.   output format filename extension

       quickcheck
                 samtools quickcheck [options] in.sam|in.bam|in.cram [ ... ]

                 Quickly check that input files appear to be intact. Checks that beginning of the
                 file contains a valid header  (all  formats)  containing  at  least  one  target
                 sequence  and  then  seeks to the end of the file and checks that an end-of-file
                 (EOF) is present and intact (BAM only).

                 Data in the middle of the file is not read since that would be  much  more  time
                 consuming, so please note that this command will not detect internal corruption,
                 but is useful for testing that files are not truncated  before  performing  more
                 intensive tasks on them.

                 This command will exit with a non-zero exit code if any input files don't have a
                 valid header or are missing an EOF block. Otherwise it  will  exit  successfully
                 (with a zero exit code).

                 Options:

                 -v      Verbose  output:  will  additionally  print the names of all input files
                         that don't pass the check to stdout.  Multiple  -v  options  will  cause
                         additional messages regarding check results to be printed to stderr.

                 -q      Quiet  mode:  disables warning messages on stderr about files that fail.
                         If both -q and -v options are used then  the  appropriate  level  of  -v
                         takes precedence.

       dict      samtools dict <ref.fasta|ref.fasta.gz>

                 Create a sequence dictionary file from a fasta file.

                 OPTIONS:

                 -a, --assembly STR
                            Specify the assembly for the AS tag.

                 -H, --no-header
                            Do not print the @HD header line.

                 -o, --output FILE
                            Output to FILE [stdout].

                 -s, --species STR
                            Specify the species for the SP tag.

                 -u, --uri STR
                            Specify  the  URI  for  the  UR tag. Defaults to the absolute path of
                            ref.fasta unless reading from stdin.

       fixmate   samtools fixmate [-rpcm] [-O format] in.nameSrt.bam out.bam

                 Fill in mate coordinates, ISIZE  and  mate  related  flags  from  a  name-sorted
                 alignment.

                 OPTIONS:

                 -r         Remove secondary and unmapped reads.

                 -p         Disable FR proper pair check.

                 -c         Add template cigar ct tag.

                 -m         Add  ms  (mate  score) tags.  These are used by markdup to select the
                            best reads to keep.

                 -O FORMAT  Write the final output as sam, bam, or cram.

                            By default, samtools tries to select a format  based  on  the  output
                            filename  extension; if output is to standard output or no format can
                            be deduced, bam is selected.

       mpileup   samtools mpileup [-EBugp] [-C capQcoef]  [-r  reg]  [-f  in.fa]  [-l  list]  [-Q
                 minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]

                 Generate VCF, BCF or pileup for one or multiple BAM files. Alignment records are
                 grouped by sample (SM) identifiers in @RG header lines.  If  sample  identifiers
                 are absent, each input file is regarded as one sample.

                 In  the  pileup  format  (without  -u or -g),  each  line  represents  a genomic
                 position, consisting of chromosome name, 1-based coordinate, reference base, the
                 number  of  reads  covering  the  site, read bases, base qualities and alignment
                 mapping qualities.  Information  on  match,  mismatch,  indel,  strand,  mapping
                 quality  and start and end of a read are all encoded at the read base column. At
                 this column, a dot stands for a match to  the  reference  base  on  the  forward
                 strand,  a comma for a match on the reverse strand, a '>' or '<' for a reference
                 skip, `ACGTN' for a mismatch on the forward strand and `acgtn' for a mismatch on
                 the  reverse  strand.  A  pattern  `\+[0-9]+[ACGTNacgtn]+' indicates there is an
                 insertion between this reference position and the next reference  position.  The
                 length  of the insertion is given by the integer in the pattern, followed by the
                 inserted sequence. Similarly,  a  pattern  `-[0-9]+[ACGTNacgtn]+'  represents  a
                 deletion  from  the reference. The deleted bases will be presented as `*' in the
                 following lines. Also at the read base column, a symbol `^' marks the start of a
                 read.  The  ASCII  of  the  character  following  `^' minus 33 gives the mapping
                 quality. A symbol `$' marks the end of a read segment.

                 Note that there are two orthogonal ways to specify locations in the input  file;
                 via -r region and -l file.  The former uses (and requires) an index to do random
                 access while the latter streams through the  file  contents  filtering  out  the
                 specified regions, requiring no index.  The two may be used in conjunction.  For
                 example a BED file containing locations of  genes  in  chromosome  20  could  be
                 specified  using  -r  20  -l  chr20.bed,  meaning that the index is used to find
                 chromosome 20 and then it is filtered for the regions listed in the bed file.

                 Input Options:

                 -6, --illumina1.3+
                           Assume the quality is in the Illumina 1.3+ encoding.

                 -A, --count-orphans
                           Do not skip anomalous read pairs in variant calling.

                 -b, --bam-list FILE
                           List of input BAM files, one file per line [null]

                 -B, --no-BAQ
                           Disable  probabilistic  realignment  for  the  computation   of   base
                           alignment quality (BAQ). BAQ is the Phred-scaled probability of a read
                           base being misaligned. Applying this option greatly  helps  to  reduce
                           false SNPs caused by misalignments.

                 -C, --adjust-MQ INT
                           Coefficient  for  downgrading  mapping  quality  for  reads containing
                           excessive mismatches. Given a read with a phred-scaled  probability  q
                           of  being  generated from the mapped position, the new mapping quality
                           is  about  sqrt((INT-q)/INT)*INT.   A   zero   value   disables   this
                           functionality; if enabled, the recommended value for BWA is 50. [0]

                 -d, --max-depth INT
                           At  a  position,  read  maximally  INT reads per input file. Note that
                           samtools has a minimum value of 8000/n where n is the number of  input
                           files given to mpileup.  This means the default is highly likely to be
                           increased.  Once  above  the  cross-sample  minimum  of  8000  the  -d
                           parameter will have an effect. [250]

                 -E, --redo-BAQ
                           Recalculate BAQ on the fly, ignore existing BQ tags

                 -f, --fasta-ref FILE
                           The  faidx-indexed reference file in the FASTA format. The file can be
                           optionally compressed by bgzip.  [null]

                 -G, --exclude-RG FILE
                           Exclude reads from readgroups listed in FILE (one @RG-ID per line)

                 -l, --positions FILE
                           BED or position list file containing a list of regions or sites  where
                           pileup  or  BCF  should  be generated. Position list files contain two
                           columns (chromosome and position) and  start  counting  from  1.   BED
                           files  contain at least 3 columns (chromosome, start and end position)
                           and are 0-based half-open.
                           While it is possible to mix both position-list and BED coordinates  in
                           the  same  file,  this  is  strongly  ill advised due to the differing
                           coordinate systems. [null]

                 -q, -min-MQ INT
                           Minimum mapping quality for an alignment to be used [0]

                 -Q, --min-BQ INT
                           Minimum base quality for a base to be considered [13]

                 -r, --region STR
                           Only generate pileup in region. Requires the BAM files to be  indexed.
                           If  used in conjunction with -l then considers the intersection of the
                           two requests.  STR [all sites]

                 -R, --ignore-RG
                           Ignore RG tags. Treat all reads in one BAM as one sample.

                 --rf, --incl-flags STR|INT
                           Required flags: skip reads with mask bits unset [null]

                 --ff, --excl-flags STR|INT
                           Filter    flags:     skip     reads     with     mask     bits     set
                           [UNMAP,SECONDARY,QCFAIL,DUP]

                 -x, --ignore-overlaps
                           Disable read-pair overlap detection.

                 Output Options:

                 -o, --output FILE
                           Write  pileup  or  VCF/BCF  output to FILE, rather than the default of
                           standard output.

                           (The same short option is used for both --open-prob and --output.   If
                           -o's argument contains any non-digit characters other than a leading +
                           or - sign, it  is  interpreted  as  --output.   Usually  the  filename
                           extension  will take care of this, but to write to an entirely numeric
                           filename use -o ./123 or --output 123.)

                 -g, --BCF Compute genotype likelihoods and output them in the binary call format
                           (BCF).   As  of v1.0, this is BCF2 which is incompatible with the BCF1
                           format produced by previous (0.1.x) versions of samtools.

                 -v, --VCF Compute genotype likelihoods and  output  them  in  the  variant  call
                           format (VCF).  Output is bgzip-compressed VCF unless -u option is set.

                 Output Options for mpileup format (without -g or -v):

                 -O, --output-BP
                           Output base positions on reads.

                 -s, --output-MQ
                           Output mapping quality.

                 --output-QNAME
                           Output an extra column containing comma-separated read names.

                 -a        Output all positions, including those with zero depth.

                 -a -a, -aa
                           Output absolutely all positions, including unused reference sequences.
                           Note that when used in conjunction with a BED file the -a  option  may
                           sometimes  operate  as  if -aa was specified if the reference sequence
                           has coverage outside of the region specified in the BED file.

                 Output Options for VCF/BCF format (with -g or -v):

                 -D        Output per-sample read depth [DEPRECATED - use -t DP instead]

                 -S        Output per-sample Phred-scaled strand bias P-value [DEPRECATED  -  use
                           -t SP instead]

                 -t, --output-tags LIST
                           Comma-separated  list  of  FORMAT  and  INFO  tags  to  output  (case-
                           insensitive): AD  (Allelic  depth,  FORMAT),  INFO/AD  (Total  allelic
                           depth,  INFO),  ADF  (Allelic  depths  on the forward strand, FORMAT),
                           INFO/ADF (Total allelic depths  on  the  forward  strand,  INFO),  ADR
                           (Allelic  depths  on  the  reverse  strand,  FORMAT),  INFO/ADR (Total
                           allelic depths on the reverse  strand,  INFO),  DP  (Number  of  high-
                           quality bases, FORMAT), DV (Deprecated in favor of AD; Number of high-
                           quality non-reference bases, FORMAT), DPR (Deprecated in favor of  AD;
                           Number  of  high-quality  bases  for  each  observed  allele, FORMAT),
                           INFO/DPR (Number of  high-quality  bases  for  each  observed  allele,
                           INFO), DP4 (Deprecated in favor of ADF and ADR; Number of high-quality
                           ref-forward, ref-reverse, alt-forward and alt-reverse bases,  FORMAT),
                           SP (Phred-scaled strand bias P-value, FORMAT) [null]

                 -u, --uncompressed
                           Generate uncompressed VCF/BCF output, which is preferred for piping.

                 -V        Output  per-sample  number of non-reference reads [DEPRECATED - use -t
                           DV instead]

                 Options for SNP/INDEL Genotype Likelihood Computation (for -g or -v):

                 -e, --ext-prob INT
                           Phred-scaled gap extension sequencing error probability. Reducing  INT
                           leads to longer indels. [20]

                 -F, --gap-frac FLOAT
                           Minimum fraction of gapped reads [0.002]

                 -h, --tandem-qual INT
                           Coefficient   for   modeling   homopolymer  errors.  Given  an  l-long
                           homopolymer run, the sequencing error of an indel of size s is modeled
                           as INT*s/l.  [100]

                 -I, --skip-indels
                           Do not perform INDEL calling

                 -L, --max-idepth INT
                           Skip  INDEL  calling if the average per-input-file depth is above INT.
                           [250]

                 -m, --min-ireads INT
                           Minimum number gapped reads for indel candidates INT.  [1]

                 -o, --open-prob INT
                           Phred-scaled gap open sequencing error probability. Reducing INT leads
                           to more indel calls. [40]

                           (The  same  short  option  is  used for both --open-prob and --output.
                           When -o's argument contains only an optional + or - sign  followed  by
                           the digits 0 to 9, it is interpreted as --open-prob.)

                 -p, --per-sample-mF
                           Apply  -m  and  -F  thresholds  per  sample to increase sensitivity of
                           calling.  By default both options are applied to reads pooled from all
                           samples.

                 -P, --platforms STR
                           Comma-delimited  list  of  platforms (determined by @RG-PL) from which
                           indel candidates are obtained. It  is  recommended  to  collect  indel
                           candidates from sequencing technologies that have low indel error rate
                           such as ILLUMINA. [all]

       flags     samtools flags INT|STR[,...]

                 Convert between textual and numeric flag representation.

                 FLAGS:

                   0x1   PAIRED          paired-end (or multiple-segment) sequencing technology
                   0x2   PROPER_PAIR     each segment properly aligned according to the aligner
                   0x4   UNMAP           segment unmapped
                   0x8   MUNMAP          next segment in the template unmapped
                  0x10   REVERSE         SEQ is reverse complemented
                  0x20   MREVERSE        SEQ of the next segment in the template is reverse complemented
                  0x40   READ1           the first segment in the template
                  0x80   READ2           the last segment in the template
                 0x100   SECONDARY       secondary alignment
                 0x200   QCFAIL          not passing quality controls
                 0x400   DUP             PCR or optical duplicate
                 0x800   SUPPLEMENTARY   supplementary alignment

       fastq/a   samtools fastq [options] in.bam
                 samtools fasta [options] in.bam

                 Converts a BAM or CRAM into either  FASTQ  or  FASTA  format  depending  on  the
                 command  invoked.  The  FASTQ  files  will  be  automatically  compressed if the
                 filenames have a .gz or .bgzf extention.

                 OPTIONS:

                 -n      By default, either '/1' or '/2' is added to the end of read names  where
                         the  corresponding  BAM_READ1 or BAM_READ2 flag is set.  Using -n causes
                         read names to be left as they are.

                 -N      Always add either '/1' or '/2' to the end of read names  even  when  put
                         into different files.

                 -O      Use quality values from OQ tags in preference to standard quality string
                         if available.

                 -s FILE Write singleton reads in FASTQ format  to  FILE  instead  of  outputting
                         them.

                 -t      Copy RG, BC and QT tags to the FASTQ header line, if they exist.

                 -T TAGLIST
                         Specify a comma-separated list of tags to copy to the FASTQ header line,
                         if they exist.

                 -1 FILE Write reads with the BAM_READ1 flag set to FILE  instead  of  outputting
                         them.

                 -2 FILE Write  reads  with  the BAM_READ2 flag set to FILE instead of outputting
                         them.

                 -0 FILE Write reads with both or neither of the BAM_READ1  and  BAM_READ2  flags
                         set to FILE instead of outputting them.

                 -f INT  Only  output  alignments  with  all  bits set in INT present in the FLAG
                         field.  INT can be  specified  in  hex  by  beginning  with  `0x'  (i.e.
                         /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -F INT  Do  not  output  alignments with any bits set in INT present in the FLAG
                         field.  INT can be  specified  in  hex  by  beginning  with  `0x'  (i.e.
                         /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -G INT  Only  EXCLUDE  reads with all of the bits set in INT present in the FLAG
                         field.  INT can be  specified  in  hex  by  beginning  with  `0x'  (i.e.
                         /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -i      add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)

                 -c [0..9]
                         set compression level when writing gz or bgzf fastq files.

                 --i1 FILE
                         write first index reads to FILE

                 --i2 FILE
                         write second index reads to FILE

                 --barcode-tag TAG
                         aux tag to find index reads in [default: BC]

                 --quality-tag TAG
                         aux tag to find index quality in [default: QT]

                 --index-format STR
                         string  to  describe  how  to  parse  the  barcode and quality tags. For
                         example:

                         i14i8   the first 14 characters are index 1, the next 8  characters  are
                                 index 2

                         n8i14   ignore  the  first  8 characters, and use the next 14 characters
                                 for index 1

                                 If the tag contains a separator, then the numeric  part  can  be
                                 replaced  with  '*'  to mean 'read until the separator or end of
                                 tag', for example:

                         n*i*    ignore the left part of the tag until the  separator,  then  use
                                 the second part

       collate   samtools collate [options] in.sam|in.bam|in.cram [out.prefix]

                 Shuffles  and  groups  reads together by their names.  A faster alternative to a
                 full query name sort, collate ensures that reads of the same  name  are  grouped
                 together  in  contiguous groups, but doesn't make any guarantees about the order
                 of read names between groups.

                 The output from this command should be suitable for any operation that  requires
                 all reads from the same template to be grouped together.

                 Options:

                 -O      Output to stdout rather than to files starting with out.prefix

                 -u      Write uncompressed BAM output

                 -l INT  Compression level.  [1]

                 -n INT  Number of temporary files to use.  [64]

       reheader  samtools reheader [-iP] in.header.sam in.bam

                 Replace  the header in in.bam with the header in in.header.sam.  This command is
                 much faster than replacing the header with a BAM→SAM→BAM conversion.

                 By default this command  outputs  the  BAM  or  CRAM  file  to  standard  output
                 (stdout),  but  for  CRAM  format files it has the option to perform an in-place
                 edit, both reading and writing to  the  same  file.   No  validity  checking  is
                 performed  on  the header, nor that it is suitable to use with the sequence data
                 itself.

                 OPTIONS:

                 -P, --no-PG
                         Do not generate an @PG header line.

                 -i, --in-place
                         Perform the header edit in-place, if possible.  This only works on  CRAM
                         files and only if there is sufficient room to store the new header.  The
                         amount of space available will differ for each CRAM file.

       cat       samtools cat [-b list] [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ]

                 Concatenate BAMs or CRAMs. Although this works on either BAM or CRAM, all  input
                 files  must  be  the  same format as each other. The sequence dictionary of each
                 input file must be identical, although this command does not  check  this.  This
                 command uses a similar trick to reheader which enables fast BAM concatenation.

                 OPTIONS:

                 -b FOFN Read  the  list  of  input  BAM  or  CRAM  files  from  FOFN.  These are
                         concatenated prior to any files specified on the command line.  Multiple
                         -b  FOFN  options  may  be  specified  to  concatenate multiple lists of
                         BAM/CRAM files.

                 -h FILE Uses the SAM header from FILE.  By default the header is taken from  the
                         first file to be concatenated.

                 -o FILE Write  the  concatenated  output  to  FILE.   By default this is sent to
                         stdout.

       rmdup     samtools rmdup [-sS] <input.srt.bam> <out.bam>

                 This command is obsolete. Use markdup instead.

                 Remove potential PCR duplicates: if multiple read pairs have identical  external
                 coordinates,  only retain the pair with highest mapping quality.  In the paired-
                 end mode, this command ONLY works with FR  orientation  and  requires  ISIZE  is
                 correctly  set.  It  does  not  work for unpaired reads (e.g. two ends mapped to
                 different chromosomes or orphan reads).

                 OPTIONS:

                 -s      Remove duplicates for single-end reads. By default,  the  command  works
                         for paired-end reads only.

                 -S      Treat paired-end reads and single-end reads.

       addreplacerg
                 samtools  addreplacerg [-r rg line | -R rg ID] [-m mode] [-l level] [-o out.bam]
                 <input.bam>

                 Adds or replaces read group tags in a file.

                 OPTIONS:

                 -r STRING
                         Allows you to specify a read group line to  append  to  the  header  and
                         applies  it  to  the  reads  specified  by the -m option. If repeated it
                         automatically adds in tabs between invocations.

                 -R STRING
                         Allows you to specify the read group ID of  an  existing  @RG  line  and
                         applies it to the reads specified.

                 -m MODE If  you choose orphan_only then existing RG tags are not overwritten, if
                         you choose overwrite_all, existing RG tags are overwritten. The  default
                         is overwrite_all.

                 -o STRING
                         Write the final output to STRING. The default is to write to stdout.

                         By  default,  samtools  tries  to  select  a  format based on the output
                         filename extension; if output is to standard output or no format can  be
                         deduced, bam is selected.

       calmd     samtools calmd [-Eeubr] [-C capQcoef] <aln.bam> <ref.fasta>

                 Generate  the MD tag. If the MD tag is already present, this command will give a
                 warning if the MD tag generated is different from the existing tag.  Output  SAM
                 by default.

                 Calmd  can also read and write CRAM files although in most cases it is pointless
                 as CRAM recalculates MD and NM tags on the fly.  The one exception to this  case
                 is where both input and output CRAM files have been / are being created with the
                 no_ref option.

                 OPTIONS:

                 -A      When used jointly with -r  this  option  overwrites  the  original  base
                         quality.

                 -e      Convert a the read base to = if it is identical to the aligned reference
                         base. Indel caller does not support the = bases at the moment.

                 -u      Output uncompressed BAM

                 -b      Output compressed BAM

                 -C INT  Coefficient to cap mapping quality  of  poorly  mapped  reads.  See  the
                         pileup command for details. [0]

                 -r      Compute the BQ tag (without -A) or cap base quality by BAQ (with -A).

                 -E      Extended   BAQ   calculation.   This   option   trades  specificity  for
                         sensitivity, though the effect is minor.

       targetcut samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2  em2]  [-f
                 ref] <in.bam>

                 This  command  identifies  target  regions  by  examining the continuity of read
                 depth, computes haploid consensus sequences of targets and outputs  a  SAM  with
                 each  sequence  corresponding to a target. When option -f is in use, BAQ will be
                 applied. This command is only designed for cutting  fosmid  clones  from  fosmid
                 pool sequencing [Ref. Kitzman et al. (2010)].

       phase     samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q minBaseQ] <in.bam>

                 Call and phase heterozygous SNPs.

                 OPTIONS:

                 -A      Drop reads with ambiguous phase.

                 -b STR  Prefix  of BAM output. When this option is in use, phase-0 reads will be
                         saved in file STR.0.bam and phase-1 reads in STR.1.bam.   Phase  unknown
                         reads will be randomly allocated to one of the two files. Chimeric reads
                         with switch errors will be saved in STR.chimeric.bam.  [null]

                 -F      Do not attempt to fix chimeric reads.

                 -k INT  Maximum length for local phasing. [13]

                 -q INT  Minimum Phred-scaled LOD to call a heterozygote. [40]

                 -Q INT  Minimum base quality to be used in het calling. [13]

       depad     samtools depad [-SsCu1] [-T ref.fa] [-o output] <in.bam>

                 Converts a BAM aligned against a padded reference to a BAM aligned  against  the
                 depadded  reference.  The padded reference may contain verbatim "*" bases in it,
                 but "*" bases are also counted in the reference numbering.  This  means  that  a
                 sequence  base-call  aligned against a reference "*" is considered to be a cigar
                 match ("M" or "X") operator (if the base-call is "A", "C", "G" or  "T").   After
                 depadding  the  reference  "*" bases are deleted and such aligned sequence base-
                 calls become insertions.  Similarly  transformations  apply  for  deletions  and
                 padding cigar operations.

                 OPTIONS:

                 -S     Ignored  for  compatibility  with previous samtools versions.  Previously
                        this option was required if input was in SAM format, but now the  correct
                        format is automatically detected by examining the first few characters of
                        input.

                 -s     Output in SAM format.  The default is BAM.

                 -C     Output in CRAM format.  The default is BAM.

                 -u     Do not compress the output.  Applies to either BAM or CRAM output format.

                 -1     Enable fastest compression level.  Only works for BAM or CRAM output.

                 -T FILE
                        Provides the padded reference file.  Note that without this the @SQ  line
                        lengths  will  be  incorrect,  so  for most use cases this option will be
                        considered as mandatory.

                 -o FILE
                        Specifies the output filename.  By default output is sent to stdout.

       markdup   samtools markdup [-l length] [-r] [-s] [-T] [-S] in.algsort.bam out.bam

                 Mark duplicate alignments from a  coordinate  sorted  file  that  has  been  run
                 through  fixmate  with the -m option.  This program relies on the MC and ms tags
                 that fixmate provides.

                 -l INT     Expected maximum read length of INT bases.  [300]

                 -r         Remove duplicate reads.

                 -s         Print some basic stats.

                 -T PREFIX  Write temporary files to PREFIX.samtools.nnnn.mmmm.tmp

                 -S         Mark supplementary reads of duplicates as duplicates.

           EXAMPLE

           # The first sort can be omitted if the file is already name ordered
           samtools sort -n -o namesort.bam example.bam

           # Add ms and MC tags for markdup to use later
           samtools fixmate -m namesort.bam fixmate.bam

           # Markdup needs position order
           samtools sort -o positionsort.bam fixmate.bam

           # Finally mark duplicates
           samtools markdup positionsort.bam markdup.bam

       help, --help
                 Display a brief usage message listing the samtools commands available.   If  the
                 name  of  a  command is also given, e.g., samtools help view, the detailed usage
                 message for that particular command is displayed.

       --version Display the version numbers and  copyright  information  for  samtools  and  the
                 important libraries used by samtools.

       --version-only
                 Display the full samtools version number in a machine-readable format.

GLOBAL OPTIONS

       Several  long-options  are  shared  between  multiple  samtools  subcommands: --input-fmt,
       --input-fmt-options,  --output-fmt,  --output-fmt-options,  and  --reference.   The  input
       format  is typically auto-detected so specifying the format is usually unnecessary and the
       option is included for completeness.  Note that not  all  subcommands  have  all  options.
       Consult the subcommand help for more details.

       Format  strings  recognised  are "sam", "bam" and "cram".  They may be followed by a comma
       separated list of options as key or key=value. See below for examples.

       The fmt-options arguments accept either a single option or option=value.  Note  that  some
       options  only  work  on  some file formats and only on read or write streams.  If value is
       unspecified for a boolean option, the value is assumed to be 1.  The valid options are  as
       follows.

       nthreads=INT
           Specifies  the number of threads to use during encoding and/or decoding.  For BAM this
           will be encoding only.  In CRAM the threads are dynamically shared between encoder and
           decoder.

       reference=fasta_file
           Specifies  a FASTA reference file for use in CRAM encoding or decoding.  It usually is
           not required for decoding except in the situation of the MD5 not being obtainable  via
           the REF_PATH or REF_CACHE environment variables.

       decode_md=0|1
           CRAM  input  only;  defaults to 1 (on).  CRAM does not typically store MD and NM tags,
           preferring to generate them on the fly.  This option controls this behaviour.

       ignore_md5=0|1
           CRAM input only; defaults to 0 (off).   When  enabled,  md5  checksum  errors  on  the
           reference  sequence  and  block  checksum errors within CRAM are ignored.  Use of this
           option is strongly discouraged.

       required_fields=bit-field
           CRAM input only; specifies which SAM columns need to be  populated.   By  default  all
           fields  are  used.   Limiting  the  decode  to  specific  columns can have significant
           performance gains.  The bit-field is a numerical value constructed from the  following
           table.

                                                0x1   SAM_QNAME
                                                0x2   SAM_FLAG
                                                0x4   SAM_RNAME
                                                0x8   SAM_POS
                                               0x10   SAM_MAPQ
                                               0x20   SAM_CIGAR
                                               0x40   SAM_RNEXT
                                               0x80   SAM_PNEXT
                                              0x100   SAM_TLEN
                                              0x200   SAM_SEQ
                                              0x400   SAM_QUAL
                                              0x800   SAM_AUX
                                             0x1000   SAM_RGAUX

       name_prefix=string
           CRAM  input only; defaults to output filename.  Any sequences with auto-generated read
           names will use string as the name prefix.

       multi_seq_per_slice=0|1
           CRAM output only; defaults to 0 (off).  By default CRAM generates  one  container  per
           reference  sequence, except in the case of many small references (such as a fragmented
           assembly).

       version=major.minor
           CRAM output only.  Specifies the CRAM version number.  Acceptable values are "2.1" and
           "3.0".

       seqs_per_slice=INT
           CRAM output only; defaults to 10000.

       slices_per_container=INT
           CRAM  output  only; defaults to 1.  The effect of having multiple slices per container
           is to share the compression header block between multiple slices.  This is unlikely to
           have  any  significant  impact  unless  the  number of sequences per slice is reduced.
           (Together these two options control the granularity of random access.)

       embed_ref=0|1
           CRAM output only; defaults to 0  (off).   If  1,  this  will  store  portions  of  the
           reference  sequence  in  each  slice,  permitting  decode  without having requiring an
           external copy of the reference sequence.

       no_ref=0|1
           CRAM output only; defaults to 0 (off).  If 1, sequences will be stored  verbatim  with
           no reference encoding.  This can be useful if no reference is available for the file.

       use_bzip2=0|1
           CRAM  output  only;  defaults  to  0  (off).   Permits  use  of  bzip2  in  CRAM block
           compression.

       use_lzma=0|1
           CRAM output only; defaults to 0 (off).  Permits use of lzma in CRAM block compression.

       lossy_names=0|1
           CRAM output only; defaults to 0 (off).  If 1, templates with all  members  within  the
           same  CRAM  slice will have their read names removed.  New names will be automatically
           generated during decoding.  Also see the name_prefix option.

       For example:

           samtools view --input-fmt-option decode_md=0
               --output-fmt cram,version=3.0 --output-fmt-option embed_ref
               --output-fmt-option seqs_per_slice=2000 -o foo.cram foo.bam

REFERENCE SEQUENCES

       The CRAM format requires use of a reference sequence for both reading and writing.

       When reading a CRAM the @SQ headers are interrogated to identify  the  reference  sequence
       MD5sum  (M5:  tag) and the local reference sequence filename (UR: tag).  Note that http://
       and ftp:// based URLs in the UR: field are not used, but local fasta  filenames  (with  or
       without file://) can be used.

       To  create  a  CRAM the @SQ headers will also be read to identify the reference sequences,
       but M5: and UR: tags may not be present. In this case the -T and -t  options  of  samtools
       view  may  be  used to specify the fasta or fasta.fai filenames respectively (provided the
       .fasta.fai file is also backed up by a .fasta file).

       The search order to obtain a reference is:

       1. Use any local file specified by the command line options (eg -T).

       2. Look for MD5 via REF_CACHE environment variable.

       3. Look for MD5 in each element of the REF_PATH environment variable.

       4. Look for a local file listed in the UR: header tag.

ENVIRONMENT VARIABLES

       HTS_PATH
              A colon-separated list of directories in which to search for  HTSlib  plugins.   If
              $HTS_PATH starts or ends with a colon or contains a double colon (::), the built-in
              list of directories is searched at that point in the search.

              If no HTS_PATH variable is defined, the built-in list of directories specified when
              HTSlib  was  built  is used, which typically includes /usr/local/libexec/htslib and
              similar directories.

       REF_PATH
              A colon separated (semi-colon on Windows) list of locations in which  to  look  for
              sequences identified by their MD5sums.  This can be either a list of directories or
              URLs. Note that if a URL is included then the colon in http:// and ftp://  and  the
              optional  port  number  will  be  treated  as  part of the URL and not a PATH field
              separator.  For URLs, the text %s will be replaced by the MD5sum being read.

              If    no    REF_PATH    has    been    specified     it     will     default     to
              http://www.ebi.ac.uk/ena/cram/md5/%s and if REF_CACHE is also unset, it will be set
              to $XDG_CACHE_HOME/hts-ref/%2s/%2s/%s.  If $XDG_CACHE_HOME is  unset,  $HOME/.cache
              (or  a local system temporary directory if no home directory is found) will be used
              similarly.

       REF_CACHE
              This can be defined to a single directory housing  a  local  cache  of  references.
              Upon  downloading  a  reference  it  will  be  stored in the location pointed to by
              REF_CACHE.  When reading a reference it will be looked for in this directory before
              searching  REF_PATH.   To  avoid  many  files being stored in the same directory, a
              pathname may be constructed using %nums and %s notation, consuming  num  characters
              of  the  MD5sum.   For  example  /local/ref_cache/%2s/%2s/%s  will  create 2 nested
              subdirectories with the filenames in  the  deepest  directory  being  the  last  28
              characters of the md5sum.

              The  REF_CACHE  directory  will  be  searched for before attempting to load via the
              REF_PATH search list.  If no REF_PATH is defined, both REF_PATH and REF_CACHE  will
              be automatically set (see above), but if REF_PATH is defined and REF_CACHE not then
              no local cache is used.

              To aid population of the REF_CACHE directory a script misc/seq_cache_populate.pl is
              provided  in  the  Samtools distribution. This takes a fasta file or a directory of
              fasta files and generates the MD5sum named files.

EXAMPLES

       o Import SAM to BAM when @SQ lines are present in the header:

           samtools view -bS aln.sam > aln.bam

         If @SQ lines are absent:

           samtools faidx ref.fa
           samtools view -bt ref.fa.fai aln.sam > aln.bam

         where ref.fa.fai is generated automatically by the faidx command.

       o Convert a BAM file to a CRAM file using a local reference sequence.

           samtools view -C -T ref.fa aln.bam > aln.cram

       o Attach the RG tag while merging sorted alignments:

           perl -e 'print "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina\n@RG\tID:454\tSM:hs\tLB:454\tPL:454\n"' > rg.txt
           samtools merge -rh rg.txt merged.bam ga.bam 454.bam

         The value in a RG tag is determined by the file name the read is coming  from.  In  this
         example, in the merged.bam, reads from ga.bam will be attached RG:Z:ga, while reads from
         454.bam will be attached RG:Z:454.

       o Call SNPs and short INDELs:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
           bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf  > var.flt.vcf

         The bcftools filter command marks low quality  sites  and  sites  with  the  read  depth
         exceeding  a  limit,  which  should  be  adjusted  to about twice the average read depth
         (bigger read depths usually indicate problematic regions which are  often  enriched  for
         artefacts).  One may consider to add -C50 to mpileup if mapping quality is overestimated
         for reads containing excessive mismatches. Applying this option usually helps  BWA-short
         but may not other mappers.

         Individuals  are identified from the SM tags in the @RG header lines. Individuals can be
         pooled in one alignment file; one individual can also be separated into multiple  files.
         The  -P option specifies that indel candidates should be collected only from read groups
         with the @RG-PL tag set to ILLUMINA.  Collecting indel candidates from  reads  sequenced
         by an indel-prone technology may affect the performance of indel calling.

       o Generate the consensus sequence for one diploid individual:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq

       o Phase one individual:

           samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out

         The calmd command is used to reduce false heterozygotes around INDELs.

       o Dump BAQ applied alignment for other SNP callers:

           samtools calmd -bAr aln.bam > aln.baq.bam

         It  adds  and corrects the NM and MD tags at the same time. The calmd command also comes
         with the -C option, the same as the one in pileup and mpileup.  Apply if it helps.

LIMITATIONS

       o Unaligned words used in bam_import.c, bam_endian.h, bam.c and bam_aux.c.

       o Samtools paired-end rmdup does not work for unpaired reads (e.g. orphan  reads  or  ends
         mapped   to   different  chromosomes).  If  this  is  a  concern,  please  use  Picard's
         MarkDuplicates which correctly handles these cases, although a little slower.

AUTHOR

       Heng Li from the Sanger Institute wrote the original C version of samtools.  Bob Handsaker
       from  the  Broad  Institute  implemented the BGZF library.  James Bonfield from the Sanger
       Institute developed the CRAM implementation.  John Marshall and Petr Danecek contribute to
       the  source  code and various people from the 1000 Genomes Project have contributed to the
       SAM format specification.

SEE ALSO

       bcftools(1), sam(5), tabix(1)

       Samtools website: <http://www.htslib.org/>
       File format specification of SAM/BAM,CRAM,VCF/BCF: <http://samtools.github.io/hts-specs>
       Samtools latest source: <https://github.com/samtools/samtools>
       HTSlib latest source: <https://github.com/samtools/htslib>
       Bcftools website: <http://samtools.github.io/bcftools>