Provided by: samtools_1.10-3_amd64

NAME

       samtools mpileup - produces "pileup" textual format from an alignment

SYNOPSIS

       samtools  mpileup  [-EB]  [-C  capQcoef]  [-r  reg] [-f in.fa] [-l list] [-Q minBaseQ] [-q
       minMapQ] in.bam [in2.bam [...]]

DESCRIPTION

       Generate text pileup output for one or multiple BAM files.  Each  input  file  produces  a
       separate group of pileup columns in the output.

       Samtools mpileup can still produce VCF and BCF output (with -g or -u), but this feature is
       deprecated and will be removed in a future release.  Please use bcftools mpileup for  this
       instead.  (Documentation on the deprecated options has been removed from this manual page,
       but older versions are available online at <http://www.htslib.org/doc/>.)

       Note that there are two orthogonal ways to specify locations in the  input  file;  via  -r
       region and -l file.  The former uses (and requires) an index to do random access while the
       latter streams through the file contents filtering out the specified regions, requiring no
       index.   The  two may be used in conjunction.  For example a BED file containing locations
       of genes in chromosome 20 could be specified using -r 20 -l chr20.bed,  meaning  that  the
       index  is used to find chromosome 20 and then it is filtered for the regions listed in the
       bed file.

   Pileup Format
       Pileup format consists of TAB-separated lines, with each line representing the  pileup  of
       reads at a single genomic position.

       Several  columns  contain  numeric  quality values encoded as individual ASCII characters.
       Each character can range from “!” to “~” and is decoded by  taking  its  ASCII  value  and
       subtracting 33; e.g., “A” encodes the numeric value 32.
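
       The decoding rule above can be sketched in a few lines of Python.  This is an
       illustration only; the function name decode_qualities is hypothetical and is not part
       of samtools.

```python
def decode_qualities(encoded: str) -> list[int]:
    """Decode a pileup quality string: one ASCII character per value, offset 33."""
    values = []
    for ch in encoded:
        # Valid characters run from "!" (value 0) to "~" (value 93).
        if not "!" <= ch <= "~":
            raise ValueError(f"character {ch!r} is outside the range '!' to '~'")
        values.append(ord(ch) - 33)
    return values


print(decode_qualities("A!~"))  # [32, 0, 93]
```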

       The first three columns give the position and reference:

       ○ Chromosome name.

       ○ 1-based position on the chromosome.

       ○ Reference base at this position (this will be “N” on all lines if -f/--fasta-ref has not
         been used).

       The remaining columns show the pileup data, and are  repeated  for  each  input  BAM  file
       specified:

       ○ Number of reads covering this position.

       ○ Read  bases.   This  encodes information on matches, mismatches, indels, strand, mapping
         quality, and starts and ends of reads.

         For each read covering the position, this column contains:

         • If this is the first position covered by the read, a “^”  character  followed  by  the
           alignment's mapping quality encoded as an ASCII character.

         • A  single character indicating the read base and the strand to which the read has been
           mapped:

           Forward   Reverse                    Meaning
           ───────────────────────────────────────────────────────────────

            . dot    , comma   Base matches the reference base
            ACGTN     acgtn    Base is a mismatch to the reference base
              >         <      Reference skip (due to CIGAR “N”)
              *        */#     Deletion of the reference base (CIGAR “D”)

           Deleted bases are shown as “*” on both strands unless --reverse-del is used, in  which
           case they are shown as “#” on the reverse strand.

         • If there is an insertion after this read base, text matching
           “\+[0-9]+[ACGTNacgtn*#]+”: a “+” character followed by an integer giving the length of
           the insertion and then the inserted sequence.  Pads are shown as “*” unless
           --reverse-del is used, in which case pads on the reverse strand will be shown as “#”.

         • If there is a deletion after this read base, text matching  “-[0-9]+[ACGTNacgtn]+”:  a
           “-” character followed by an integer giving the length of the deletion and then the
           deleted reference bases.
           (Subsequent pileup lines will contain “*” for this read indicating the deleted bases.)

         • If this is the last position covered by the read, a “$” character.

       ○ Base qualities, encoded as ASCII characters.

       ○ Alignment mapping qualities, encoded as ASCII characters.   (Column  only  present  when
         -s/--output-MQ is used.)

       ○ Comma-separated  1-based  positions  within the alignments, e.g., 5 indicates that it is
         the fifth base of the corresponding read  that  is  mapped  to  this  genomic  position.
         (Column only present when -O/--output-BP is used.)

       ○ Comma-separated read names.  (Column only present when --output-QNAME is used.)

       ○ Additional  columns  containing other specified read fields or tags, as selected via the
         --output-extra, --output-sep, and --output-empty options.
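
       As a rough illustration of the read bases column described above, the following Python
       sketch splits such a column into one record per read.  The function name
       parse_read_bases and the record keys are hypothetical, and the sketch assumes
       well-formed input; it is not the parser used by samtools.

```python
import re

DIGITS = re.compile(r"[0-9]+")


def parse_read_bases(column: str) -> list[dict]:
    """Split a read-bases column into one record per read (illustrative only)."""
    reads = []
    pending_mapq = None  # mapping quality announced by a preceding "^"
    i = 0
    while i < len(column):
        ch = column[i]
        if ch == "^":
            # Read start: the next character is the mapping quality (ASCII - 33).
            pending_mapq = ord(column[i + 1]) - 33
            i += 2
        elif ch == "$":
            # Read end: applies to the most recently seen read.
            reads[-1]["ends_here"] = True
            i += 1
        elif ch in "+-":
            # Indel after the previous read base: sign, length, then that many bases.
            m = DIGITS.match(column, i + 1)
            if m is None:
                raise ValueError(f"missing indel length at offset {i}")
            length = int(m.group())
            reads[-1]["indel"] = ch + column[m.end():m.end() + length]
            i = m.end() + length
        else:
            # One of . , ACGTN acgtn > < * #
            reads.append({"base": ch, "start_mapq": pending_mapq,
                          "ends_here": False, "indel": None})
            pending_mapq = None
            i += 1
    return reads


for read in parse_read_bases("^].,+2agT-1C$"):
    print(read)
```

       The indel length has to be read before the sequence because the inserted or deleted
       bases use the same letters as ordinary mismatches.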

OPTIONS

       -6, --illumina1.3+
                 Assume the quality is in the Illumina 1.3+ encoding.

       -A, --count-orphans
                 Do not skip anomalous read pairs in variant calling.  Anomalous read pairs are
                 those marked in the FLAG field as paired in sequencing but without the
                 properly-paired flag set.
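
                 The definition above can be expressed as a test on two FLAG bits.  The
                 following Python sketch uses the bit values from the SAM specification; the
                 function name is_anomalous_pair is hypothetical.

```python
PAIRED = 0x1       # read is paired in sequencing
PROPER_PAIR = 0x2  # read is mapped in a proper pair


def is_anomalous_pair(flag: int) -> bool:
    """True if the read is paired but the properly-paired bit is not set."""
    return bool(flag & PAIRED) and not (flag & PROPER_PAIR)


print(is_anomalous_pair(99))  # paired and properly paired: False
print(is_anomalous_pair(65))  # paired, not properly paired: True
```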

       -b, --bam-list FILE
                 List of input BAM files, one file per line [null]

       -B, --no-BAQ
                 Disable base alignment quality (BAQ) computation.  See BAQ below.

       -C, --adjust-MQ INT
                 Coefficient for downgrading mapping quality for reads containing excessive
                 mismatches.  Given a read with a phred-scaled probability q of being generated
                 from the mapped position, the new mapping quality is about
                 sqrt((INT-q)/INT)*INT.  A zero value disables this functionality; if enabled,
                 the recommended value for BWA is 50. [0]
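
                 The approximate formula above can be evaluated as follows.  This Python
                 sketch only illustrates the arithmetic; the function name
                 approx_adjusted_mapq is hypothetical, and clamping q to the range 0..INT is
                 an assumption made here to keep the square root defined, not documented
                 samtools behaviour.

```python
import math


def approx_adjusted_mapq(q: float, coef: int) -> float:
    """Evaluate sqrt((coef - q) / coef) * coef for a positive coefficient."""
    if coef <= 0:
        raise ValueError("a zero coefficient disables the adjustment")
    # Assumption: clamp q so that the term under the square root is never negative.
    q = min(max(q, 0.0), float(coef))
    return math.sqrt((coef - q) / coef) * coef


print(approx_adjusted_mapq(0, 50))     # 50.0
print(approx_adjusted_mapq(37.5, 50))  # 25.0
```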

       -d, --max-depth INT
                 At a position, read at most INT reads per input file.  Setting this limit
                 reduces  the  amount of memory and time needed to process regions with very high
                 coverage.  Passing zero for this option sets it to the highest  possible  value,
                 effectively removing the depth limit. [8000]

                 Note  that  up  to  release 1.8, samtools would enforce a minimum value for this
                 option.  This no longer happens and the limit is set exactly as specified.

       -E, --redo-BAQ
                 Recalculate BAQ on the fly, ignore existing BQ tags.  See BAQ below.

       -f, --fasta-ref FILE
                 The faidx-indexed reference file in the FASTA format. The file can be optionally
                 compressed by bgzip.  [null]

                 Supplying  a  reference  file will enable base alignment quality calculation for
                 all reads aligned to a reference in the file.  See BAQ below.

       -G, --exclude-RG FILE
                 Exclude reads from readgroups listed in FILE (one @RG-ID per line)

       -l, --positions FILE
                 BED or position list file containing a list of regions or sites where pileup  or
                 BCF should be generated. Position list files contain two columns (chromosome and
                 position) and start counting from 1.  BED  files  contain  at  least  3  columns
                 (chromosome, start and end position) and are 0-based half-open.
                 While it is possible to mix both position-list and BED coordinates in the same
                 file, this is strongly discouraged because the coordinate systems differ.
                 [null]
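
                 The difference between the two coordinate systems can be illustrated with
                 a short Python sketch.  The function names are hypothetical; only the
                 arithmetic reflects the conventions described above.

```python
def bed_to_one_based(start: int, end: int) -> tuple[int, int]:
    """Convert a 0-based half-open BED interval to 1-based inclusive positions."""
    return start + 1, end


def position_to_bed(pos: int) -> tuple[int, int]:
    """Convert a single 1-based position to the equivalent BED interval."""
    return pos - 1, pos


print(bed_to_one_based(0, 10))  # (1, 10): the first ten bases
print(position_to_bed(5))       # (4, 5): the fifth base
```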

       -q, --min-MQ INT
                 Minimum mapping quality for an alignment to be used [0]

       -Q, --min-BQ INT
                 Minimum base quality for a base to be considered [13]

       -r, --region STR
                 Only generate pileup in region STR.  Requires the BAM files to be indexed.  If
                 used in conjunction with -l, only the intersection of the two requests is
                 considered.  [all sites]

       -R, --ignore-RG
                 Ignore RG tags. Treat all reads in one BAM as one sample.

       --rf, --incl-flags STR|INT
                 Required flags: skip reads with mask bits unset [null]

       --ff, --excl-flags STR|INT
                 Filter flags: skip reads with mask bits set [UNMAP,SECONDARY,QCFAIL,DUP]
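
                 As an illustration, the default exclusion mask can be built from the FLAG
                 bit values in the SAM specification.  The helper names in this Python
                 sketch are hypothetical, and it assumes that a read is skipped when any
                 bit of the mask is set.

```python
# FLAG bit values as defined in the SAM specification.
FLAG_BITS = {"UNMAP": 0x4, "SECONDARY": 0x100, "QCFAIL": 0x200, "DUP": 0x400}


def flag_mask(names: str) -> int:
    """Combine a comma-separated list of flag names into an integer mask."""
    mask = 0
    for name in names.split(","):
        mask |= FLAG_BITS[name]
    return mask


def is_excluded(flag: int, mask: int) -> bool:
    """Assumed semantics: skip the read if any bit of the mask is set."""
    return bool(flag & mask)


DEFAULT_EXCLUDE = flag_mask("UNMAP,SECONDARY,QCFAIL,DUP")
print(DEFAULT_EXCLUDE)                     # 1796
print(is_excluded(1024, DEFAULT_EXCLUDE))  # duplicate read: True
print(is_excluded(99, DEFAULT_EXCLUDE))    # ordinary paired read: False
```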

       -x, --ignore-overlaps
                 Disable read-pair overlap detection.

       -X        Include customized index files as part of the arguments.  See the EXAMPLES
                 section for sample usage.

       Output Options:

       -o, --output FILE
                 Write pileup output to FILE, rather than the default of standard output.

                 (The same short option is used for both the deprecated  --open-prob  option  and
                 --output.   If  -o's  argument  contains  any  non-digit characters other than a
                 leading + or - sign, it  is  interpreted  as  --output.   Usually  the  filename
                 extension  will  take care of this, but to write to an entirely numeric filename
                 use -o ./123 or --output 123.)
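
                 The rule above can be written as a small predicate.  This Python sketch
                 follows the wording of the rule literally; the function name
                 o_means_output is hypothetical and edge cases such as an empty argument
                 may be handled differently by samtools itself.

```python
def o_means_output(arg: str) -> bool:
    """True if -o's argument would be read as --output under the stated rule."""
    # Ignore a single leading + or - sign, then look for any non-digit character.
    body = arg[1:] if arg[:1] in ("+", "-") else arg
    return any(ch not in "0123456789" for ch in body)


print(o_means_output("out.pileup"))  # True
print(o_means_output("123"))         # False
print(o_means_output("./123"))       # True
```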

       -O, --output-BP
                 Output base positions on reads.

       -s, --output-MQ
                 Output mapping quality.  Equivalent to --output-extra MAPQ.

       --output-QNAME
                 Output an extra column containing comma-separated  read  names.   Equivalent  to
                 --output-extra QNAME.

       --output-extra STR
                 Output  extra  columns  containing comma-separated values of read fields or read
                 tags. The names of the selected fields have to be provided as they are described
                 in  the  SAM Specification (page 6) and will be output by the mpileup command in
                 the same order as in the document (i.e.  QNAME, FLAG, RNAME,...)  The names  are
                 case sensitive. Currently, only the following fields are supported:

                 QNAME, FLAG, RNAME, POS, MAPQ, RNEXT, PNEXT

                 Anything  that  is not on this list is treated as a potential tag, although only
                 two character tags  are  accepted.  In  the  mpileup  output,  tag  columns  are
                 displayed  in  the  order  they  were  provided by the user in the command line.
                 Field and tag names have to be provided  in  a  comma-separated  string  to  the
                 mpileup command.  E.g.

                 samtools mpileup --output-extra FLAG,QNAME,RG,NM in.bam

                 will display four extra columns in the mpileup output, the first being a list of
                 comma-separated read names, followed by a list of flag values, a list of RG  tag
                 values and a list of NM tag values. Field values are always displayed before tag
                 values.
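
                 The column ordering described above can be sketched in Python as follows.
                 The function name extra_column_order is hypothetical and the sketch only
                 models the ordering rule, not the output itself.

```python
# Supported fields, in the order used by the SAM specification.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "RNEXT", "PNEXT"]


def extra_column_order(spec: str) -> list[str]:
    """Return the extra column order for a comma-separated --output-extra value."""
    names = spec.split(",")
    # Fields come first, in specification order regardless of the order given.
    fields = [name for name in SAM_FIELDS if name in names]
    # Everything else is treated as a tag and kept in the order given.
    tags = []
    for name in names:
        if name in SAM_FIELDS:
            continue
        if len(name) != 2:
            raise ValueError(f"{name!r} is neither a supported field nor a two-character tag")
        tags.append(name)
    return fields + tags


print(extra_column_order("FLAG,QNAME,RG,NM"))  # ['QNAME', 'FLAG', 'RG', 'NM']
```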

       --output-sep CHAR
                 Specify a different separator character for tag value lists, when those values
                 might contain one or more commas (,), which is the default list separator.  This
                 option only affects columns for two-letter tags like NM;  standard  fields  like
                 FLAG or QNAME will always be separated by commas.

       --output-empty CHAR
                 Specify  a  different 'no value' character for tag list entries corresponding to
                 reads that don't have a  tag  requested  with  the  --output-extra  option.  The
                 default is *.

                 This  option only applies to rows that have at least one read in the pileup, and
                 only to columns for two-letter tags.  Columns for  empty  rows  will  always  be
                 printed as *.

       --reverse-del
                 Mark  the  deletions  on the reverse strand with the character #, instead of the
                 usual *.

       -a        Output all positions, including those with zero depth.

       -a -a, -aa
                 Output absolutely all positions, including  unused  reference  sequences.   Note
                 that  when  used  in  conjunction  with  a  BED file the -a option may sometimes
                 operate as if -aa was specified if the reference sequence has  coverage  outside
                 of the region specified in the BED file.

       BAQ (Base Alignment Quality)

       BAQ  is the Phred-scaled probability of a read base being misaligned.  It greatly helps to
       reduce false SNPs caused by misalignments.  BAQ  is  calculated  using  the  probabilistic
       realignment  method  described  in  the  paper  “Improving SNP discovery by base alignment
       quality”,      Heng      Li,      Bioinformatics,      Volume      27,       Issue       8
       <https://doi.org/10.1093/bioinformatics/btr076>

       BAQ  is  turned  on when a reference file is supplied using the -f option.  To disable it,
       use the -B option.

       It is possible to store pre-calculated BAQ values in a SAM  BQ:Z  tag.   Samtools  mpileup
       will  use the precalculated values if it finds them.  The -E option can be used to make it
       ignore the contents of the BQ:Z tag and force it to recalculate the BAQ scores by making a
       new alignment.

EXAMPLES

       o Call SNPs and short INDELs:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
           bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf  > var.flt.vcf

         The  bcftools  filter  command  marks  low  quality  sites and sites with the read depth
         exceeding a limit, which should be adjusted  to  about  twice  the  average  read  depth
         (bigger  read  depths  usually indicate problematic regions which are often enriched for
         artefacts).  Consider adding -C50 to mpileup if mapping quality is overestimated
         for reads containing excessive mismatches.  Applying this option usually helps with
         BWA-short but may not help with other mappers.

         Individuals are identified from the SM tags in the @RG header lines. Individuals can  be
         pooled  in one alignment file; one individual can also be separated into multiple files.
         The -P option specifies that indel candidates should be collected only from read  groups
         with  the  @RG-PL tag set to ILLUMINA.  Collecting indel candidates from reads sequenced
         by an indel-prone technology may affect the performance of indel calling.

       o Generate the consensus sequence for one diploid individual:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq

       o Include customized index files as part of the arguments:

           samtools mpileup [options] -X /data_folder/in1.bam [/data_folder/in2.bam [...]] /index_folder/index1.bai [/index_folder/index2.bai [...]]

       o Phase one individual:

           samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out

         The calmd command is used to reduce false heterozygotes around INDELs.

AUTHOR

       Written by Heng Li from the Sanger Institute.

SEE ALSO

       samtools(1), samtools-depth(1), samtools-sort(1), bcftools(1)

       Samtools website: <http://www.htslib.org/>