Provided by: samtools_1.17-1_amd64

NAME

       samtools-mpileup - produces "pileup" textual format from an alignment

SYNOPSIS

       samtools mpileup [-EB] [-C capQcoef] [-r reg] [-f in.fa] [-l list] [-Q minBaseQ]
       [-q minMapQ] in.bam [in2.bam [...]]

DESCRIPTION

       Generate text pileup output for one or multiple BAM files.  Each  input  file  produces  a
       separate group of pileup columns in the output.

       Note that there are two orthogonal ways to specify locations in the input file: via -r
       region and -l file.  The former uses (and requires) an index to do random access, while
       the latter streams through the file contents and keeps only the specified regions,
       requiring no index.  The two may be used in conjunction.  For example, a BED file
       containing locations of genes in chromosome 20 could be specified using -r 20 -l
       chr20.bed, meaning that the index is used to find chromosome 20 and the reads there are
       then filtered to the regions listed in the BED file.

   Pileup Format
       Pileup  format  consists of TAB-separated lines, with each line representing the pileup of
       reads at a single genomic position.

       Several columns contain numeric quality values encoded  as  individual  ASCII  characters.
       Each  character  can  range  from  “!” to “~” and is decoded by taking its ASCII value and
       subtracting 33; e.g., “A” encodes the numeric value 32.
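       As a minimal sketch in Python (not part of samtools), decoding one of these quality
       strings is a direct application of the rule above: subtract 33 from each character's
       ASCII value.  The function name decode_quals is hypothetical.  The same decoding applies
       to the mapping-quality character that follows "^" in the read bases column.

```python
def decode_quals(encoded: str) -> list[int]:
    """Decode a quality string using the '!'..'~' minus-33 rule into integers."""
    values = [ord(ch) - 33 for ch in encoded]
    if any(v < 0 or v > 93 for v in values):
        raise ValueError("character outside the '!'..'~' range")
    return values


print(decode_quals("A"))      # [32], matching the example above
print(decode_quals("!5I~"))   # [0, 20, 40, 93]
```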

       The first three columns give the position and reference:

       ○ Chromosome name.

       ○ 1-based position on the chromosome.

       ○ Reference base at this position (this will be “N” on all lines if -f/--fasta-ref has not
         been used).

       The  remaining  columns  show  the  pileup  data, and are repeated for each input BAM file
       specified:

       ○ Number of reads covering this position.

       ○ Read bases.  This encodes information on matches, mismatches,  indels,  strand,  mapping
         quality, and starts and ends of reads.

         For each read covering the position, this column contains:

         • If  this  is  the  first position covered by the read, a “^” character followed by the
           alignment's mapping quality encoded as an ASCII character.

         • A single character indicating the read base and the strand to which the read has  been
           mapped:

           Forward   Reverse                    Meaning
           ───────────────────────────────────────────────────────────────
            . dot    , comma   Base matches the reference base
            ACGTN     acgtn    Base is a mismatch to the reference base
              >         <      Reference skip (due to CIGAR “N”)
              *        */#     Deletion of the reference base (CIGAR “D”)

           Deleted  bases are shown as “*” on both strands unless --reverse-del is used, in which
           case they are shown as “#” on the reverse strand.

         • If there is an insertion after this read base, text matching
           “\+[0-9]+[ACGTNacgtn*#]+”: a “+” character followed by an integer giving the length of
           the insertion and then the inserted sequence.  Pads are shown as “*” unless
           --reverse-del is used, in which case pads on the reverse strand will be shown as “#”.

         • If there is a deletion after this read base, text matching “-[0-9]+[ACGTNacgtn]+”: a
           “-” character followed by an integer giving the length of the deletion and then the
           deleted reference bases, represented similarly.  (Subsequent pileup lines will contain
           “*” for this read indicating the deleted bases.)

         • If this is the last position covered by the read, a “$” character.

       ○ Base qualities, encoded as ASCII characters.

       ○ Alignment  mapping  qualities,  encoded  as ASCII characters.  (Column only present when
         -s/--output-MQ is used.)

       ○ Comma-separated 1-based positions within the alignments, in the orientation shown in the
         input  file.  E.g., 5 indicates that it is the fifth base of the corresponding read that
         is mapped to this genomic position.  (Column only present when -O/--output-BP is used.)

       ○ Additional comma-separated read field columns,  as  selected  via  --output-extra.   The
         fields  selected  appear  in  the  same  order  as in SAM: QNAME, FLAG, RNAME, POS, MAPQ
         (displayed numerically), RNEXT, PNEXT.

       ○ Comma-separated 1-based positions within the alignments, in 5' to 3' orientation.  E.g.,
         5 indicates that the fifth base of the corresponding read, counted in the order produced
         by the sequencing instrument, is mapped to this genomic position.  (Column only present
         when --output-BP-5 is used.)

       ○ Additional  read  tag  field columns, as selected via --output-extra.  These columns are
         formatted as determined by --output-sep and --output-empty (comma-separated by default),
         and appear in the same order as the tags are given in --output-extra.

         Any output column that would be empty, such as a tag that is not present or a column at
         a position where the filtered sequence depth is zero, is reported as "*".  This ensures
         a consistent number of columns across all reported positions.
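       The read bases column is the part of the format that most often needs a small parser.
       The Python sketch below walks the column once, skipping the mapping-quality character
       after "^" and using each "+"/"-" length to skip the indel sequence.  It assumes default
       output (no --output-mods, --no-output-ins, --no-output-del or --no-output-ends); the
       function name and the symbols it returns ("*" for a deleted base, ">" for a reference
       skip, the reference base for "." and ",") are choices of this sketch, not samtools
       behavior.

```python
import re
from collections import Counter


def parse_read_bases(column: str, ref_base: str = "N") -> tuple[list[str], Counter]:
    """Split a read bases column into one symbol per read plus event counts.

    Sketch only: assumes plain output (no --output-mods, --no-output-ins,
    --no-output-del or --no-output-ends).
    """
    bases: list[str] = []
    events: Counter = Counter()
    i = 0
    while i < len(column):
        ch = column[i]
        if ch == "^":                      # read start; next char is its encoded MAPQ
            events["start"] += 1
            i += 2
        elif ch == "$":                    # read end
            events["end"] += 1
            i += 1
        elif ch in "+-":                   # indel after the previous base: length, then bases
            digits = re.match(r"[0-9]+", column[i + 1:]).group()
            events["insertion" if ch == "+" else "deletion"] += 1
            i += 1 + len(digits) + int(digits)
        else:
            if ch in ".,":                 # match on forward / reverse strand
                bases.append(ref_base.upper())
            elif ch in "*#":               # deleted reference base
                bases.append("*")
            elif ch in "><":               # reference skip (CIGAR N)
                bases.append(">")
            else:                          # mismatch; letter case only encodes strand
                bases.append(ch.upper())
            i += 1
    return bases, events


bases, events = parse_read_bases("^!.,$A+2AGc-1t*#<", ref_base="G")
print(bases)          # ['G', 'G', 'A', 'C', '*', '*', '>']
print(dict(events))   # {'start': 1, 'end': 1, 'insertion': 1, 'deletion': 1}
```

       For the first input file, the reference base and read bases are the third and fifth
       TAB-separated fields of a line, so a call would look like
       parse_read_bases(fields[4], fields[2]).  Without -f the reference base is "N", so
       matches are reported as "N".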

OPTIONS

       -6, --illumina1.3+
                 Assume the quality is in the Illumina 1.3+ encoding.
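                 Illumina 1.3+ quality strings use an ASCII offset of 64 instead of the 33
                 described under Pileup Format.  The Python sketch below only illustrates that
                 difference by re-encoding such a string with offset 33; the function name is
                 hypothetical and this is not a description of how samtools applies -6
                 internally.

```python
def phred64_to_phred33(encoded: str) -> str:
    """Re-encode an Illumina 1.3+ (offset 64) quality string with offset 33."""
    out = []
    for ch in encoded:
        q = ord(ch) - 64
        if q < 0:
            raise ValueError(f"{ch!r} is below the offset-64 range")
        out.append(chr(q + 33))
    return "".join(out)


print(phred64_to_phred33("@Th"))   # !5I  (Q0, Q20, Q40)
```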

       -A, --count-orphans
                 Do  not  skip anomalous read pairs in variant calling.  Anomalous read pairs are
                 those marked in the FLAG field as paired in sequencing but without the properly-
                 paired flag set.

       -b, --bam-list FILE
                 List of input BAM files, one file per line [null]

       -B, --no-BAQ
                 Disable base alignment quality (BAQ) computation.  See BAQ below.

       -C, --adjust-MQ INT
                 Coefficient  for  downgrading  mapping  quality  for  reads containing excessive
                 mismatches. Given a read with a phred-scaled probability q  of  being  generated
                 from   the  mapped  position,  the  new  mapping  quality  is  about  sqrt((INT-
                 q)/INT)*INT.  A  zero  value  disables  this  functionality;  if  enabled,   the
                 recommended value for BWA is 50. [0]
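                 To see how quickly the approximation falls off, the Python sketch below
                 evaluates sqrt((INT-q)/INT)*INT for the recommended value of 50.  Clamping q
                 into [0, INT] is an assumption of this sketch, and how samtools combines the
                 result with the read's original mapping quality is not modeled here.

```python
import math


def adjusted_mapq(q: float, coef: int = 50) -> float:
    """Evaluate the documented approximation sqrt((coef - q) / coef) * coef.

    q is the phred-scaled value described above; coef is the -C argument.
    Assumption: q is clamped to [0, coef] so the square root stays defined.
    """
    if coef <= 0:
        raise ValueError("coef must be positive (-C 0 disables the adjustment)")
    q = min(max(q, 0.0), float(coef))
    return math.sqrt((coef - q) / coef) * coef


for q in (0, 25, 50):
    print(q, round(adjusted_mapq(q), 1))   # 0 50.0, then 25 35.4, then 50 0.0
```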

       -d, --max-depth INT
                 At  a  position,  read  maximally  INT  reads per input file. Setting this limit
                 reduces the amount of memory and time needed to process regions with  very  high
                 coverage.   Passing  zero for this option sets it to the highest possible value,
                 effectively removing the depth limit. [8000]

                 Note that up to release 1.8, samtools would enforce a  minimum  value  for  this
                 option.  This no longer happens and the limit is set exactly as specified.

       -E, --redo-BAQ
                 Recalculate BAQ on the fly, ignoring existing BQ tags.  See BAQ below.

       -f, --fasta-ref FILE
                 The faidx-indexed reference file in the FASTA format. The file can be optionally
                 compressed by bgzip.  [null]

                 Supplying a reference file will enable base alignment  quality  calculation  for
                 all reads aligned to a reference in the file.  See BAQ below.

       -G, --exclude-RG FILE
                 Exclude reads from read groups listed in FILE (one @RG-ID per line)

       -l, --positions FILE
                 BED  or position list file containing a list of regions or sites where pileup or
                 BCF should be generated. Position list files contain two columns (chromosome and
                 position)  and  start  counting  from  1.   BED files contain at least 3 columns
                 (chromosome, start and end position) and are 0-based half-open.
                 While it is possible to mix both position-list and BED coordinates in the same
                 file, this is strongly discouraged due to the differing coordinate systems.
                 [null]
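                 The coordinate difference is easy to get wrong, so the Python sketch below
                 converts either kind of line to a 1-based inclusive interval.  Telling the two
                 formats apart by column count follows the description above, but the function
                 name is hypothetical and samtools' own parser may distinguish them differently.

```python
def to_one_based_interval(line: str) -> tuple[str, int, int]:
    """Return (chrom, first, last) as a 1-based inclusive interval."""
    cols = line.split()
    if len(cols) >= 3:                     # BED: 0-based start, exclusive end
        return cols[0], int(cols[1]) + 1, int(cols[2])
    if len(cols) == 2:                     # position list: a single 1-based site
        pos = int(cols[1])
        return cols[0], pos, pos
    raise ValueError(f"cannot interpret line: {line!r}")


print(to_one_based_interval("chr20\t99\t100"))   # ('chr20', 100, 100)
print(to_one_based_interval("chr20\t100"))       # ('chr20', 100, 100)
```

                 Both lines above denote the same single base, which shows why mixing the two
                 styles in one file invites off-by-one mistakes.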

       -q, --min-MQ INT
                 Minimum mapping quality for an alignment to be used [0]

       -Q, --min-BQ INT
                 Minimum base quality for a base to be considered. [13]

                 Note that base quality 0 is used as a filtering mechanism for overlap removal,
                 which marks bases as having quality zero and lets the base quality filter
                 remove them.  Hence using --min-BQ 0 will make the overlapping bases reappear,
                 albeit with quality zero.

       -r, --region STR
                 Only generate pileup in region STR.  Requires the BAM files to be indexed.  If
                 used in conjunction with -l, considers the intersection of the two requests.
                 [all sites]

       -R, --ignore-RG
                 Ignore RG tags. Treat all reads in one BAM as one sample.

       --rf, --incl-flags STR|INT
                 Required  flags:  only include reads with any of the mask bits set [null].  Note
                 this does not override the --excl-flags option.

       --ff, --excl-flags STR|INT
                 Filter   flags:   skip   reads   with    any    of    the    mask    bits    set
                 [UNMAP,SECONDARY,QCFAIL,DUP].   Note  this  does  not  override the --incl-flags
                 option.
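                 The two options combine as "at least one --incl-flags bit, and no --excl-flags
                 bit".  The Python sketch below shows that logic with the standard SAM FLAG bit
                 values; accepting hexadecimal numbers and the helper names are choices of this
                 sketch rather than documented samtools behavior.

```python
FLAG_BITS = {
    "PAIRED": 0x1, "PROPER_PAIR": 0x2, "UNMAP": 0x4, "MUNMAP": 0x8,
    "REVERSE": 0x10, "MREVERSE": 0x20, "READ1": 0x40, "READ2": 0x80,
    "SECONDARY": 0x100, "QCFAIL": 0x200, "DUP": 0x400, "SUPPLEMENTARY": 0x800,
}


def parse_flags(spec: str) -> int:
    """Accept either a number or a comma-separated list of flag names."""
    try:
        return int(spec, 0)              # decimal, or hex with a 0x prefix
    except ValueError:
        mask = 0
        for name in spec.split(","):
            mask |= FLAG_BITS[name.strip()]
        return mask


DEFAULT_EXCL = parse_flags("UNMAP,SECONDARY,QCFAIL,DUP")   # 0x704 == 1796


def keep_read(flag: int, incl: int = 0, excl: int = DEFAULT_EXCL) -> bool:
    """--rf: need at least one incl bit (if any are given); --ff: reject any excl bit."""
    if incl and not flag & incl:
        return False
    return not flag & excl


print(keep_read(0x3))                # True: paired, properly paired
print(keep_read(0x403))              # False: also marked DUP
print(keep_read(0x50, incl=0x40))    # True: READ1 present, no excluded bits
```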

       -x, --ignore-overlaps-removal, --disable-overlap-removal
                 Overlap detection and removal is enabled by default.  This option turns it off.

                 When enabled, wherever the ends of a read pair overlap, one base is selected at
                 each overlapping position and the duplicate base is nullified by setting its
                 Phred score to zero.  It will then be discarded by the --min-BQ option unless
                 this is zero.

                 The quality value of the retained base within an overlap will be the sum of
                 the two base qualities if the bases agree, or 0.8 times the higher of the two if
                 they disagree, with the base nucleotide also being the higher-confidence call.
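                 The quality rule for a single overlapping position can be sketched in Python as
                 follows.  It follows only the rule stated above; keeping read A on a tie and
                 truncating 0.8 times the quality to an integer are assumptions of this sketch,
                 and any further details of the real implementation are not modeled.

```python
def merge_overlap(base_a: str, qual_a: int, base_b: str, qual_b: int) -> tuple[str, int]:
    """Return (retained base, retained quality) for one overlapping position.

    The other mate's copy would get quality 0 and be dropped by --min-BQ.
    Assumptions: ties keep read A, and 0.8 * quality is truncated to an int.
    """
    if base_a == base_b:
        return base_a, qual_a + qual_b          # agreement: sum the qualities
    if qual_a >= qual_b:
        return base_a, int(0.8 * qual_a)        # disagreement: keep the more
    return base_b, int(0.8 * qual_b)            # confident call at 0.8x


print(merge_overlap("A", 30, "A", 20))   # ('A', 50)
print(merge_overlap("A", 20, "C", 30))   # ('C', 24)
```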

       -X        Include a customized index file as part of the arguments.  See the EXAMPLES
                 section for sample usage.

       Output Options:

       -o, --output FILE
                 Write pileup output to FILE, rather than the default of standard output.

       -O, --output-BP
                 Output base positions on reads in orientation listed in the SAM  file  (left  to
                 right).

       --output-BP-5
                 Output base positions on reads in their original 5' to 3' orientation.
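                 SAM stores reverse-strand reads reverse-complemented, so the -O and
                 --output-BP-5 positions differ only for those reads.  The Python sketch below
                 converts between them; it assumes read_length is the number of bases stored in
                 SEQ and that is_reverse reflects FLAG bit 0x10.  The function name is
                 hypothetical.

```python
def five_prime_position(left_to_right_pos: int, read_length: int, is_reverse: bool) -> int:
    """Convert a 1-based -O/--output-BP position to 5'-to-3' (--output-BP-5) order.

    Assumption: read_length is the number of bases stored in SEQ, and
    is_reverse reflects FLAG bit 0x10.
    """
    if not 1 <= left_to_right_pos <= read_length:
        raise ValueError("position outside the read")
    if is_reverse:
        return read_length - left_to_right_pos + 1
    return left_to_right_pos


print(five_prime_position(5, 100, False))   # 5
print(five_prime_position(5, 100, True))    # 96
```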

       -s, --output-MQ
                 Output mapping qualities encoded as ASCII characters.

       --output-QNAME
                 Output  an  extra  column  containing comma-separated read names.  Equivalent to
                 --output-extra QNAME.

       --output-extra STR
                 Output extra columns containing comma-separated values of read fields or read
                 tags.  The names of the selected fields have to be provided as they are
                 described in the SAM Specification (page 6) and will be output by the mpileup
                 command in the same order as in the document (i.e. QNAME, FLAG, RNAME, ...).
                 The names are case sensitive.  Currently, only the following fields are
                 supported:

                 QNAME, FLAG, RNAME, POS, MAPQ, RNEXT, PNEXT

                 Anything that is not on this list is treated as a potential tag, although only
                 two-character tags are accepted.  In the mpileup output, tag columns are
                 displayed in the order they were provided by the user on the command line.
                 Field and tag names have to be provided to the mpileup command as a
                 comma-separated string.  Tags of type B (byte array) are not supported.  An
                 absent or unsupported tag will be listed as "*".  E.g.

                 samtools mpileup --output-extra FLAG,QNAME,RG,NM in.bam

                 will display four extra columns in the mpileup output, the first being a list of
                 comma-separated read names, followed by a list of flag values, a list of RG  tag
                 values and a list of NM tag values. Field values are always displayed before tag
                 values.
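                 The ordering rule (fields in SAM order first, then tags in command-line order)
                 can be sketched in Python as below, for example to label the extra columns when
                 reading the output.  Rejecting names that are neither a supported field nor two
                 characters long is an assumption of this sketch; samtools' own handling of such
                 names may differ.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "RNEXT", "PNEXT"]


def extra_column_order(spec: str) -> list[str]:
    """Predict the order of the extra columns for an --output-extra value."""
    names = [name for name in spec.split(",") if name]
    fields = [f for f in SAM_FIELDS if f in names]              # always in SAM order
    tags = [name for name in names if name not in SAM_FIELDS]   # in user order
    for tag in tags:
        if len(tag) != 2:
            # Assumption: this sketch rejects such names outright.
            raise ValueError(f"{tag!r} is neither a supported field nor a two-character tag")
    return fields + tags


print(extra_column_order("FLAG,QNAME,RG,NM"))   # ['QNAME', 'FLAG', 'RG', 'NM']
```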

       --output-sep CHAR
                 Specify a different separator character for tag value lists, when  those  values
                 might contain one or more commas (,), which is the default list separator.  This
                 option only affects columns for two-letter tags like NM;  standard  fields  like
                 FLAG or QNAME will always be separated by commas.

       --output-empty CHAR
                 Specify  a  different 'no value' character for tag list entries corresponding to
                 reads that don't have a  tag  requested  with  the  --output-extra  option.  The
                 default is *.

                 This  option only applies to rows that have at least one read in the pileup, and
                 only to columns for two-letter tags.  Columns for  empty  rows  will  always  be
                 printed as *.
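                 Because an empty row and a single read lacking the tag are both printed as "*",
                 a reader needs the read count for the same input file to split a tag column
                 reliably.  The Python sketch below assumes the number of entries equals that
                 count (the fourth column for the first file); the function name is hypothetical,
                 and the sep and empty arguments mirror --output-sep and --output-empty.

```python
from typing import List, Optional


def parse_tag_column(column: str, depth: int, sep: str = ",",
                     empty: str = "*") -> List[Optional[str]]:
    """Split one tag column into per-read values (None where the tag is absent).

    Assumption: depth is the read count for the same file and equals the
    number of entries; a zero-depth row is always "*" and yields [].
    """
    if depth == 0:
        return []
    return [None if value == empty else value for value in column.split(sep)]


print(parse_tag_column("0;*;2", 3, sep=";"))   # ['0', None, '2']
print(parse_tag_column("*", 0))                # []
```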

       -M, --output-mods
                 Adds base modification markup into the sequence column.  This uses the Mm and
                 Ml auxiliary tags (or their uppercase equivalents).  Any base in the sequence
                 output may be followed by a series of strand-code-quality strings enclosed
                 within square brackets, where strand is "+" or "-", code is a single character
                 (such as "m" or "h") or a ChEBI numeric in parentheses, and quality is an
                 optional numeric quality value.  For example, a "C" base with possible 5mC and
                 5hmC base modifications may be reported as "C[+m179+h40]".

                 Quality  values  are  from  0  to  255 inclusive, representing a linear scale of
                 probability 0.0 to 1.0 in 1/256ths increments.  If quality values are absent (no
                 Ml tag) these are omitted, giving an example string of "C[+m+h]".

                 Note  the base modifications may be identified on the reverse strand, either due
                 to the native ability for this detection by the sequencing instrument or by  the
                 sequence subsequently being reverse complemented.  This can lead to modification
                 codes, such as "m" meaning 5mC, being shown for their complementary bases,  such
                 as "G[-m50]".

                 When --output-mods is selected, base modifications can appear on any base in
                 the sequence output, including during insertions.  This may make parsing the
                 string more complex, so also see the --no-output-ins-mods and --no-output-ins
                 options to simplify this process.
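                 A single base with its bracketed markup can be split apart with one regular
                 expression built from the description above.  The Python sketch below handles
                 one base token such as "C[+m179+h40]"; the pattern and function name are
                 assumptions of this sketch, and locating tokens within a full column (including
                 inside insertions) is left out.

```python
import re
from typing import List, Optional, Tuple

# One strand-code-quality entry: strand, then a one-letter code or a
# parenthesised ChEBI number, then an optional numeric quality.
MOD_ENTRY = re.compile(r"([+-])([A-Za-z]|\(\d+\))(\d*)")


def parse_mod_base(token: str) -> Tuple[str, List[Tuple[str, str, Optional[int]]]]:
    """Split e.g. "C[+m179+h40]" into the base and its modification entries."""
    base, bracket, rest = token.partition("[")
    mods: List[Tuple[str, str, Optional[int]]] = []
    if bracket:
        body = rest.rstrip("]")
        for strand, code, qual in MOD_ENTRY.findall(body):
            mods.append((strand, code.strip("()"), int(qual) if qual else None))
    return base, mods


print(parse_mod_base("C[+m179+h40]"))   # ('C', [('+', 'm', 179), ('+', 'h', 40)])
print(parse_mod_base("C[+m+h]"))        # ('C', [('+', 'm', None), ('+', 'h', None)])
```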

       --no-output-ins
                 Do not output the inserted bases  in  the  sequence  column.   Usually  this  is
                 reported  as  "+length  sequence",  but  with  this  option  it  becomes  simply
                 "+length".  For example an insertion of AGT in  a  pileup  column  changes  from
                 "CCC+3AGTGCC" to "CCC+3GCC".

                 Specifying  this  option  twice also removes the "+length" portion, changing the
                 example above to "CCCGCC".

                 The  purpose  of  this  change  is  to  simplify  parsing  using  basic  regular
                 expressions,  which  traditionally  cannot  perform  counting operations.  It is
                 particularly beneficial when used  in  conjunction  with  --output-mods  as  the
                 syntax  of  the  inserted  sequence  is  adjusted  to  also report possible base
                 modifications, but see also --no-output-ins-mods as an alternative.

       --no-output-ins-mods
                 Outputs  the  inserted  bases  in  the  sequence,   but   excluding   any   base
                 modifications.  This only affects output when --output-mods is also used.

       --no-output-del
                 Do not output deleted reference bases in the sequence column.  Normally this is
                 reported as "-length sequence", but with this option it becomes simply
                 "-length".  For example, a deletion of 3 unknown bases (due to no reference
                 being specified) would normally be seen in a column as e.g. "CCC-3NNNGCC", but
                 will be reported as "CCC-3GCC" with this option.

                 Specifying  this  option  twice also removes the "-length" portion, changing the
                 example above to "CCCGCC".

                 The  purpose  of  this  change  is  to  simplify  parsing  using  basic  regular
                 expressions,  which  traditionally cannot perform counting operations.  See also
                 --no-output-ins.
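                 To illustrate the point about non-counting regular expressions: once
                 --no-output-ins and --no-output-del (each given once) have removed the indel
                 bases, one regular expression can drop the remaining length markers, yielding
                 the same column that giving each option twice produces.  The Python sketch
                 below assumes no --output-mods markup; "^" is matched first because the
                 mapping-quality character after it may itself be "+" or "-".  The names are
                 hypothetical.

```python
import re

# "^" plus its mapping-quality character is matched first and kept; the
# second alternative is an indel length marker, which is removed.
INDEL_MARKER = re.compile(r"(\^.)|[+-][0-9]+")


def strip_indel_lengths(column: str) -> str:
    """Drop "+length"/"-length" markers from a column made with
    --no-output-ins and --no-output-del given once each."""
    return INDEL_MARKER.sub(lambda m: m.group(1) or "", column)


print(strip_indel_lengths("CCC+3GCC"))   # CCCGCC
print(strip_indel_lengths("^+.C-2G"))    # ^+.CG
```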

       --no-output-ends
                 Removes the “^” (with mapping quality) and “$” markup from the sequence column.

       --reverse-del
                 Mark the deletions on the reverse strand with the character #,  instead  of  the
                 usual *.

       -a        Output all positions, including those with zero depth.

       -a -a, -aa
                 Output  absolutely  all  positions,  including unused reference sequences.  Note
                 that when used in conjunction with a  BED  file  the  -a  option  may  sometimes
                 operate  as  if -aa was specified if the reference sequence has coverage outside
                 of the region specified in the BED file.

       BAQ (Base Alignment Quality)

       BAQ is the Phred-scaled probability of a read base being misaligned.  It greatly helps to
       reduce false SNPs caused by misalignments.  BAQ is calculated using the probabilistic
       realignment method described in the paper “Improving SNP discovery by base alignment
       quality”, Heng Li, Bioinformatics, Volume 27, Issue 8
       <https://doi.org/10.1093/bioinformatics/btr076>

       BAQ is turned on when a reference file is supplied using the -f option.   To  disable  it,
       use the -B option.

       It is possible to store precalculated BAQ values in a SAM BQ:Z tag.  Samtools mpileup will
       use the precalculated values if it finds them.  The -E option  can  be  used  to  make  it
       ignore the contents of the BQ:Z tag and force it to recalculate the BAQ scores by making a
       new alignment.

AUTHOR

       Written by Heng Li from the Sanger Institute.

SEE ALSO

       samtools(1), samtools-depth(1), samtools-sort(1), bcftools(1)

       Samtools website: <http://www.htslib.org/>