Provided by: samtools_1.10-3_amd64 bug

NAME

       samtools mpileup - produces "pileup" textual format from an alignment

SYNOPSIS

       samtools  mpileup  [-EB]  [-C  capQcoef]  [-r reg] [-f in.fa] [-l list] [-Q minBaseQ] [-q minMapQ] in.bam
       [in2.bam [...]]

DESCRIPTION

       Generate text pileup output for one or multiple BAM files.  Each input file produces a separate group  of
       pileup columns in the output.

       Samtools mpileup can still produce VCF and BCF output (with -g or -u), but this feature is deprecated and
       will be removed in a future release.  Please use bcftools mpileup for this  instead.   (Documentation  on
       the deprecated options has been removed from this manual page, but older versions are available online at
       <http://www.htslib.org/doc/>.)

       Note that there are two orthogonal ways to specify locations in the input file;  via  -r  region  and  -l
       file.   The  former uses (and requires) an index to do random access while the latter streams through the
       file contents filtering out the  specified  regions,  requiring  no  index.   The  two  may  be  used  in
       conjunction.   For  example  a BED file containing locations of genes in chromosome 20 could be specified
       using -r 20 -l chr20.bed, meaning that the index is used to find chromosome 20 and then  it  is  filtered
       for the regions listed in the bed file.

   Pileup Format
       Pileup  format  consists  of  TAB-separated  lines,  with each line representing the pileup of reads at a
       single genomic position.

       Several columns contain numeric quality values encoded as individual ASCII  characters.   Each  character
       can  range from “!” to “~” and is decoded by taking its ASCII value and subtracting 33; e.g., “A” encodes
       the numeric value 32.

       The first three columns give the position and reference:

       ○ Chromosome name.

       ○ 1-based position on the chromosome.

       ○ Reference base at this position (this will be “N” on all lines if -f/--fasta-ref has not been used).

       The remaining columns show the pileup data, and are repeated for each input BAM file specified:

       ○ Number of reads covering this position.

       ○ Read bases.  This encodes information on matches, mismatches,  indels,  strand,  mapping  quality,  and
         starts and ends of reads.

         For each read covering the position, this column contains:

         • If  this  is  the  first  position  covered  by the read, a “^” character followed by the alignment's
           mapping quality encoded as an ASCII character.

         • A single character indicating the read base and the strand to which the read has been mapped:

           Forward   Reverse                    Meaning
           ───────────────────────────────────────────────────────────────
            . dot    , comma   Base matches the reference base
            ACGTN     acgtn    Base is a mismatch to the reference base
              >         <      Reference skip (due to CIGAR “N”)
              *        */#     Deletion of the reference base (CIGAR “D”)

           Deleted bases are shown as “*” on both strands unless --reverse-del is used, in which case  they  are
           shown as “#” on the reverse strand.

         • If  there  is  an  insertion  after  this  read  base, text matching “\+[0-9]+[ACGTNacgtn*#]+”: a “+”
           character followed by an integer giving the length of the insertion and then the  inserted  sequence.
           Pads  are shown as “*” unless --reverse-del is used, in which case pads on the reverse strand will be
           shown as “#”.

         • If there is a deletion after this read base, text matching “-[0-9]+[ACGTNacgtn]+”:  a  “-”  character
           followed by the deleted reference bases represented similarly.  (Subsequent pileup lines will contain
           “*” for this read indicating the deleted bases.)

         • If this is the last position covered by the read, a “$” character.

       ○ Base qualities, encoded as ASCII characters.

       ○ Alignment mapping qualities, encoded as ASCII characters.  (Column only present when -s/--output-MQ  is
         used.)

       ○ Comma-separated 1-based positions within the alignments, e.g., 5 indicates that it is the fifth base of
         the corresponding read that is mapped to this genomic position.  (Column only present when -O/--output-
         BP is used.)

       ○ Comma-separated read names.  (Column only present when --output-QNAME is used.)

       ○ Additional  columns containing other specified read fields or tags, as selected via the --output-extra,
         --output-sep, and --output-empty options.

OPTIONS

       -6, --illumina1.3+
                 Assume the quality is in the Illumina 1.3+ encoding.

       -A, --count-orphans
                 Do not skip anomalous read pairs in variant calling.  Anomolous read pairs are those marked  in
                 the FLAG field as paired in sequencing but without the properly-paired flag set.

       -b, --bam-list FILE
                 List of input BAM files, one file per line [null]

       -B, --no-BAQ
                 Disable base alignment quality (BAQ) computation.  See BAQ below.

       -C, --adjust-MQ INT
                 Coefficient  for downgrading mapping quality for reads containing excessive mismatches. Given a
                 read with a phred-scaled probability q of being generated from the  mapped  position,  the  new
                 mapping  quality  is  about sqrt((INT-q)/INT)*INT. A zero value disables this functionality; if
                 enabled, the recommended value for BWA is 50. [0]

       -d, --max-depth INT
                 At a position, read maximally INT reads per input file. Setting this limit reduces  the  amount
                 of  memory  and  time needed to process regions with very high coverage.  Passing zero for this
                 option sets it to the highest possible value, effectively removing the depth limit. [8000]

                 Note that up to release 1.8, samtools would enforce a minimum value for this option.   This  no
                 longer happens and the limit is set exactly as specified.

       -E, --redo-BAQ
                 Recalculate BAQ on the fly, ignore existing BQ tags.  See BAQ below.

       -f, --fasta-ref FILE
                 The  faidx-indexed reference file in the FASTA format. The file can be optionally compressed by
                 bgzip.  [null]

                 Supplying a reference file will enable base alignment quality calculation for all reads aligned
                 to a reference in the file.  See BAQ below.

       -G, --exclude-RG FILE
                 Exclude reads from readgroups listed in FILE (one @RG-ID per line)

       -l, --positions FILE
                 BED  or  position list file containing a list of regions or sites where pileup or BCF should be
                 generated. Position list files contain two columns (chromosome and position) and start counting
                 from  1.   BED  files  contain  at least 3 columns (chromosome, start and end position) and are
                 0-based half-open.
                 While it is possible to mix both position-list and BED coordinates in the same  file,  this  is
                 strongly ill advised due to the differing coordinate systems. [null]

       -q, -min-MQ INT
                 Minimum mapping quality for an alignment to be used [0]

       -Q, --min-BQ INT
                 Minimum base quality for a base to be considered [13]

       -r, --region STR
                 Only  generate  pileup in region. Requires the BAM files to be indexed.  If used in conjunction
                 with -l then considers the intersection of the two requests.  STR [all sites]

       -R, --ignore-RG
                 Ignore RG tags. Treat all reads in one BAM as one sample.

       --rf, --incl-flags STR|INT
                 Required flags: skip reads with mask bits unset [null]

       --ff, --excl-flags STR|INT
                 Filter flags: skip reads with mask bits set [UNMAP,SECONDARY,QCFAIL,DUP]

       -x, --ignore-overlaps
                 Disable read-pair overlap detection.

       -X        Include customized index file as a part of  arugments.  See  EXAMPLES  section  for  sample  of
                 useage.

       Output Options:

       -o, --output FILE
                 Write pileup output to FILE, rather than the default of standard output.

                 (The  same  short  option  is used for both the deprecated --open-prob option and --output.  If
                 -o's argument contains any non-digit characters other than  a  leading  +  or  -  sign,  it  is
                 interpreted  as  --output.  Usually the filename extension will take care of this, but to write
                 to an entirely numeric filename use -o ./123 or --output 123.)

       -O, --output-BP
                 Output base positions on reads.

       -s, --output-MQ
                 Output mapping quality.  Equivalent to --output-extra MAPQ.

       --output-QNAME
                 Output an extra column containing comma-separated read  names.   Equivalent  to  --output-extra
                 QNAME.

       --output-extra STR
                 Output  extra  columns containing comma-separated values of read fields or read tags. The names
                 of the selected fields have to be provided as they are described in the SAM Specification (pag.
                 6) and will be output by the mpileup command in the same order as in the document (i.e.  QNAME,
                 FLAG, RNAME,...)  The names are case  sensitive.  Currently,  only  the  following  fields  are
                 supported:

                 QNAME, FLAG, RNAME, POS, MAPQ, RNEXT, PNEXT

                 Anything  that  is  not on this list is treated as a potential tag, although only two character
                 tags are accepted. In the mpileup output, tag columns are displayed  in  the  order  they  were
                 provided  by the user in the command line.  Field and tag names have to be provided in a comma-
                 separated string to the mpileup command.  E.g.

                 samtools mpileup --output-extra FLAG,QNAME,RG,NM in.bam

                 will display four extra columns in the mpileup  output,  the  first  being  a  list  of  comma-
                 separated  read names, followed by a list of flag values, a list of RG tag values and a list of
                 NM tag values. Field values are always displayed before tag values.

       --output-sep CHAR
                 Specify a different separtor character for tag value lists, when those values might contain one
                 or  more commas (,), which is the default list separator.  This option only affects columns for
                 two-letter tags like NM; standard fields like FLAG or QNAME will always be separated by commas.

       --output-empty CHAR
                 Specify a different 'no value' character for tag list entries corresponding to reads that don't
                 have a tag requested with the --output-extra option. The default is *.

                 This option only applies to rows that have at least one read in the pileup, and only to columns
                 for two-letter tags.  Columns for empty rows will always be printed as *.

       --reverse-del
                 Mark the deletions on the reverse strand with the character #, instead of the usual *.

       -a        Output all positions, including those with zero depth.

       -a -a, -aa
                 Output absolutely all positions, including unused reference sequences.  Note that when used  in
                 conjunction  with a BED file the -a option may sometimes operate as if -aa was specified if the
                 reference sequence has coverage outside of the region specified in the BED file.

       BAQ (Base Alignment Quality)

       BAQ is the Phred-scaled probability of a read base being misaligned.  It greatly helps  to  reduce  false
       SNPs  caused by misalignments.  BAQ is calculated using the probabilistic realignment method described in
       the paper “Improving SNP discovery by base alignment quality”, Heng Li, Bioinformatics, Volume 27,  Issue
       8 <https://doi.org/10.1093/bioinformatics/btr076>

       BAQ  is  turned  on  when  a  reference  file is supplied using the -f option.  To disable it, use the -B
       option.

       It is possible to store pre-calculated BAQ values in a SAM BQ:Z  tag.   Samtools  mpileup  will  use  the
       precalculated  values  if it finds them.  The -E option can be used to make it ignore the contents of the
       BQ:Z tag and force it to recalculate the BAQ scores by making a new alignment.

EXAMPLES

       o Call SNPs and short INDELs:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
           bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf  > var.flt.vcf

         The bcftools filter command marks low quality sites and sites with the read depth  exceeding  a  limit,
         which  should  be  adjusted  to about twice the average read depth (bigger read depths usually indicate
         problematic regions which are often enriched for artefacts).  One may consider to add -C50  to  mpileup
         if  mapping  quality  is  overestimated for reads containing excessive mismatches. Applying this option
         usually helps BWA-short but may not other mappers.

         Individuals are identified from the SM tags in the @RG header lines. Individuals can be pooled  in  one
         alignment  file; one individual can also be separated into multiple files. The -P option specifies that
         indel candidates should be collected only from read  groups  with  the  @RG-PL  tag  set  to  ILLUMINA.
         Collecting  indel  candidates  from  reads  sequenced  by  an  indel-prone  technology  may  affect the
         performance of indel calling.

       o Generate the consensus sequence for one diploid individual:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq

       o Include customized index file as a part of arugments.

           samtools mpileup [options] -X /data_folder/in1.bam [/data_folder/in2.bam [...]] /index_folder/index1.bai [/index_folder/index2.bai [...]]

       o Phase one individual:

           samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out

         The calmd command is used to reduce false heterozygotes around INDELs.

AUTHOR

       Written by Heng Li from the Sanger Institute.

SEE ALSO

       samtools(1), samtools-depth(1), samtools-sort(1), bcftools(1)

       Samtools website: <http://www.htslib.org/>