Provided by: samtools_1.20-3_amd64

NAME

       samtools-view - views and converts SAM/BAM/CRAM files

SYNOPSIS

       samtools view [options] in.sam|in.bam|in.cram [region...]

DESCRIPTION

       With  no  options  or  regions specified, prints all alignments in the specified input alignment file (in
       SAM, BAM, or CRAM format) to standard output in SAM format (with no header).

       You may specify one or more space-separated region specifications after the input  filename  to  restrict
       output  to  only  those  alignments  which  overlap the specified region(s). Use of region specifications
       requires a coordinate-sorted and indexed input file (in BAM or CRAM format).

       The -b, -C, -1, -u, -h, -H, and -c options change the output format from the default of  headerless  SAM,
       and the -o and -U options set the output file name(s).

       The  -t  and  -T options provide additional reference data. One of these two options is required when SAM
       input does not contain @SQ headers, and the -T option is required whenever writing CRAM output.

       The -L, -M, -N, -r, -R, -d, -D, -s, -q, -l, -m, -f, -F, -G, and --rf options filter the  alignments  that
       will be included in the output to only those alignments that match certain criteria.

       The -p option sets the UNMAP flag on alignments rejected by the filters, then writes them to the output
       file.

       The -x, -B, --add-flags, and --remove-flags options modify the data which is contained in each alignment.

       The  -X  option lets you specify the location of the index file(s) explicitly when the data folder does
       not contain an index file. See the EXAMPLES section for sample usage.

       Finally, the -@ option can be used to allocate additional threads to be used for compression, and the  -?
       option requests a long help message.

       REGIONS:
              Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all position coordinates are 1-based.

              Important  note:  when multiple regions are given, some alignments may be output multiple times if
              they overlap more than one of the specified regions.

              Examples of region specifications:

              chr1      Output all alignments mapped to the reference sequence named `chr1' (i.e. @SQ SN:chr1).

              chr2:1000000
                        The region on chr2 beginning at base position 1,000,000 and ending at  the  end  of  the
                        chromosome.

              chr3:1000-2000
                        The  1001bp  region on chr3 beginning at base position 1,000 and ending at base position
                        2,000 (including both end positions).

              '*'       Output the unmapped reads at the end of the file.  (This does not include  any  unmapped
                        reads placed on a reference sequence alongside their mapped mates.)

              .         Output  all  alignments.   (Mostly unnecessary as not specifying a region at all has the
                        same effect.)
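
              Because both coordinates are 1-based and inclusive, the size of a fully specified region is
              ENDPOS - STARTPOS + 1. The following is a minimal shell sketch of that arithmetic. The
              region_length helper is hypothetical (it is not part of samtools) and handles only the
              RNAME:STARTPOS-ENDPOS form, written without thousands separators.

```shell
# region_length is a hypothetical helper, not a samtools command.
# It handles only the RNAME:STARTPOS-ENDPOS form.
region_length() {
    coords=${1##*:}        # text after the last colon, e.g. "1000-2000"
    start=${coords%%-*}    # "1000"
    end=${coords##*-}      # "2000"
    echo $(( end - start + 1 ))
}

region_length chr3:1000-2000    # prints 1001
```

              Taking the text after the last colon keeps the sketch usable for reference names that
              themselves contain a colon.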

OPTIONS

       -b, --bam Output in the BAM format.

       -C, --cram
                 Output in the CRAM format (requires -T).

       -1, --fast
                 Enable fast compression.  This also changes the default output format to BAM, but this  can  be
                 overridden by the explicit format options or using a filename with a known suffix.

       -u, --uncompressed
                 Output  uncompressed  data. This also changes the default output format to BAM, but this can be
                 overridden by the explicit format options or using a filename with a known suffix.

                 This option saves time spent on compression/decompression and is thus preferred when the output
                 is piped to another samtools command.
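
                 As a sketch of the piping case described above, the hypothetical wrapper below filters by
                 mapping quality and passes uncompressed BAM straight to samtools sort. The function name,
                 the file names and the MAPQ threshold of 30 are placeholders chosen for illustration.

```shell
# filter_and_sort is a hypothetical wrapper; the MAPQ threshold is a placeholder.
#   $1 = input alignment file, $2 = sorted output file
filter_and_sort() {
    # -u avoids compressing the intermediate stream;
    # samtools sort reads that stream from standard input ("-").
    samtools view -u -q 30 "$1" | samtools sort -o "$2" -
}
```

                 A call would look like: filter_and_sort in.bam out.sorted.bam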

       -h, --with-header
                 Include the header in the output.

       -H, --header-only
                 Output the header only.

       --no-header
                 When  producing SAM format, output alignment records but not headers.  This is the default; the
                 option can be used to reset the effect of -h/-H.

       -c, --count
                 Instead of printing the alignments, only count them and print  the  total  number.  All  filter
                 options,  such  as  -f,  -F,  and -q, are taken into account.  The -p option is ignored in this
                 mode.

       -?, --help
                 Output long help and exit immediately.

       -o FILE, --output FILE
                 Output to FILE [stdout].

       -U FILE, --unoutput FILE, --output-unselected FILE
                 Write alignments that are not selected by the various filter options to FILE.  When this option
                 is  used,  all alignments (or all alignments intersecting the regions specified) are written to
                 either the output file or this file, but never both.
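
                 A minimal sketch of using -o and -U together to partition a file follows. The wrapper name,
                 the file names and the MAPQ threshold are assumptions made for illustration only.

```shell
# split_by_mapq is a hypothetical wrapper; the threshold of 30 is a placeholder.
#   $1 = input file, $2 = file for selected alignments, $3 = file for the rest
split_by_mapq() {
    # Alignments with MAPQ >= 30 are written to $2 (-o);
    # all remaining alignments are written to $3 (-U).
    samtools view -b -q 30 -o "$2" -U "$3" "$1"
}
```

                 A call would look like: split_by_mapq in.bam pass.bam fail.bam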

       -p, --unmap
                 Set the UNMAP flag on alignments that are not selected by the filter options.  These alignments
                 are then written to the normal output.  This is not compatible with -U.

       -t FILE, --fai-reference FILE
                 A  tab-delimited  FILE.   Each line must contain the reference name in the first column and the
                 length of the reference in the second column, with one line for each distinct  reference.   Any
                 additional fields beyond the second column are ignored. This file also defines the order of the
                 reference sequences in sorting. If you run: `samtools faidx <ref.fa>', the resulting index file
                 <ref.fa>.fai can be used as this FILE.
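
                 The layout described above can be checked with a few lines of shell. This is only a sketch:
                 check_t_file is a hypothetical validator that is not part of samtools, and the reference
                 names and lengths in the sample file are made up.

```shell
# check_t_file is a hypothetical validator, not part of samtools.
# It succeeds when every line has a non-empty name in column 1 and a
# non-negative integer length in column 2 (tab-separated).
check_t_file() {
    awk -F '\t' 'NF < 2 || $1 == "" || $2 !~ /^[0-9]+$/ { bad = 1 } END { exit bad }' "$1"
}

# Made-up example contents: two references, one per line.
tmp=$(mktemp)
printf 'chr1\t1000\nchr2\t500\n' > "$tmp"
check_t_file "$tmp" && echo "ok"
rm -f "$tmp"
```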

       -T FILE, --reference FILE
                 A  FASTA  format reference FILE, optionally compressed by bgzip and ideally indexed by samtools
                 faidx.  If an index is not present and the reference file is local, one will be  generated  for
                 you.

                 If  the  reference  file  is not local, but is accessed instead via an https://, s3:// or other
                 URL, the index file will need to be supplied by the server  alongside  the  reference.   It  is
                 possible to have the reference and index files in different locations by supplying both to this
                 option separated by the string "##idx##", for example:

                 -T ftp://x.com/ref.fa##idx##ftp://y.com/index.fa.fai

                 However, note that only the location of the reference will be stored in the output file header.
                 If  this  method is used to make CRAM files, the cram reader may not be able to find the index,
                 and may not be able to decode the file unless it can  get  the  references  it  needs  using  a
                 different method.

       -L FILE, --target-file FILE, --targets-file FILE
                 Only output alignments overlapping the input BED FILE [null].

       -M, --use-index
                 Use  the  multi-region  iterator  on the union of a BED file and command-line region arguments.
                 This avoids re-reading the same regions of files so can sometimes be much  faster.   Note  that
                 this  also  removes  duplicate  output: without it, an alignment that overlaps multiple regions
                 specified on the command line is reported multiple times.  The use of a BED file  is  optional,
                 and its path has to be preceded by the -L option.

       --region-file FILE, --regions-file FILE
                 Use  an  index  and  multi-region  iterator to only output alignments overlapping the input BED
                 FILE.  Equivalent to -M -L FILE or --use-index --target-file FILE.

       -N FILE, --qname-file FILE
                 Output only alignments with read names listed  in  FILE.   If  FILE  starts  with  ^  then  the
                 operation is negated and only alignments with read names not listed in FILE are output.  It is
                 not permissible to mix both the filter-in and filter-out style syntax in the same command.

       -r STR, --read-group STR
                 Output alignments in read group STR [null].  Note that records with no  RG  tag  will  also  be
                 output when using this option.  This behaviour may change in a future release.

       -R FILE, --read-group-file FILE
                 Output  alignments  in  read  groups  listed  in  FILE  [null].  If FILE starts with ^ then the
                 operation is negated and only alignments with read groups not listed in FILE are output.  It is
                 not permissible to mix both the filter-in and filter-out style syntax in the  same  command.
                 Note that records with no RG tag will also be output when using this option.  This behaviour
                 may change in a future release.

       -d STR1[:STR2], --tag STR1[:STR2]
                 Only  output  alignments  with  tag STR1 and associated value STR2, which can be a string or an
                 integer [null].  The value can be omitted, in which case only the tag is considered.

                 Note that this option does not specify a tag  type.   For  example,  use  -d  XX:42  to  select
                 alignments with an XX:i:42 field, not -d XX:i:42.
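
                 To illustrate the point about tag types, the sketch below converts a SAM auxiliary field as
                 it appears in a record (TAG:TYPE:VALUE) into the TAG:VALUE form that -d expects. The
                 tag_filter_arg helper is hypothetical and exists only for this example.

```shell
# tag_filter_arg is a hypothetical helper, not part of samtools.
# It drops the type letter from a TAG:TYPE:VALUE field.
tag_filter_arg() {
    tag=${1%%:*}        # "XX"
    rest=${1#*:}        # "i:42"
    value=${rest#*:}    # "42" (any later colons stay in the value)
    echo "${tag}:${value}"
}

tag_filter_arg XX:i:42    # prints XX:42
```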

       -D STR:FILE, --tag-file STR:FILE
                 Only output alignments with tag STR and associated values listed in FILE [null].

       -q INT, --min-MQ INT
                 Skip alignments with MAPQ smaller than INT [0].

       -l STR, --library STR
                 Only output alignments in library STR [null].

       -m INT, --min-qlen INT
                 Only output alignments with number of CIGAR bases consuming query sequence ≥ INT [0].

       -e STR, --expr STR
                 Only include alignments that match the filter expression STR.  The syntax for these expressions
                 is described in the main samtools(1) man page under the FILTER EXPRESSIONS heading.

       -f FLAG, --require-flags FLAG
                 Only output alignments with all bits set in FLAG present  in  the  FLAG  field.   FLAG  can  be
                 specified  in  hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0'
                 (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated  list  of
                 flag names.

                 For a list of flag names see samtools-flags(1).

       -F FLAG, --excl-flags FLAG, --exclude-flags FLAG
                 Do  not  output  alignments  with  any bits set in FLAG present in the FLAG field.  FLAG can be
                 specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning  with  `0'
                 (i.e.  /^0[0-7]+/),  as a decimal number not beginning with '0' or as a comma-separated list of
                 flag names.

       --rf FLAG, --incl-flags FLAG, --include-flags FLAG
                 Only output alignments with any bit set in FLAG  present  in  the  FLAG  field.   FLAG  can  be
                 specified  in  hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0'
                 (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated  list  of
                 flag names.

       -G FLAG   Do  not  output  alignments  with  all  bits set in FLAG present in the FLAG field.  This is the
                 complement of -f: any alignment rejected by -f 12 is accepted by -G 12 and vice versa.  FLAG can
                 be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0'
                 (i.e. /^0[0-7]+/), as a decimal number not beginning with '0' or as a comma-separated  list  of
                 flag names.
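
                 The four flag tests (-f, -F, --rf and -G) differ only in how the mask is compared with the
                 FLAG field. The sketch below mirrors those comparisons with shell arithmetic; the keep_*
                 helpers are hypothetical and are not how samtools is implemented. It also shows one mask
                 written in hex, octal and decimal.

```shell
# Hypothetical helpers: each succeeds when an alignment whose FLAG field
# is $1 would be kept for the mask $2.
keep_f()  { [ $(( $1 & $2 )) -eq $(( $2 )) ]; }   # -f:   all bits of the mask are set
keep_F()  { [ $(( $1 & $2 )) -eq 0 ]; }           # -F:   no bit of the mask is set
keep_rf() { [ $(( $1 & $2 )) -ne 0 ]; }           # --rf: at least one bit of the mask is set
keep_G()  { [ $(( $1 & $2 )) -ne $(( $2 )) ]; }   # -G:   not all bits of the mask are set

# The same mask in hex, octal and decimal.
echo $(( 0xC )) $(( 014 )) $(( 12 ))    # prints 12 12 12
```

                 For any FLAG value and mask, exactly one of keep_f and keep_G succeeds.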

       -x STR, --remove-tag STR
                 Read  tag(s)  to  exclude from output (repeatable) [null].  This can be a single tag or a comma
                 separated list.  Alternatively the option itself can be repeated multiple times.

                 If the list starts with a `^' then it is negated and treated as a request to  remove  all  tags
                 except those in STR. The list may be empty, so -x ^ will remove all tags.

                 Note that tags will only be removed from reads that pass filtering.

       --keep-tag STR
                 This keeps only tags listed in STR and is directly equivalent to --remove-tag ^STR.  Specifying
                 an empty list will remove all tags.  If both --keep-tag and  --remove-tag  are  specified  then
                 --keep-tag has precedence.

                 Note that tags will only be removed from reads that pass filtering.

       -B, --remove-B
                 Collapse the backward CIGAR operation.

       --add-flags FLAG
                 Adds  flag(s)  to  read.   FLAG  can  be  specified  in  hex  by  beginning  with  `0x'  (i.e.
                 /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/), as a decimal  number  not
                 beginning with '0' or as a comma-separated list of flag names.

       --remove-flags FLAG
                 Remove flag(s) from read.  FLAG is specified in the same way as with the --add-flags option.

       --subsample FLOAT
                 Output  only  a  proportion  of  the input alignments, as specified by 0.0 ≤ FLOAT ≤ 1.0, which
                 gives the fraction of templates/pairs to be kept.  This subsampling acts in the same way on all
                 of  the  alignment  records in the same template or read pair, so it never keeps a read but not
                 its mate.

       --subsample-seed INT
                 Subsampling seed used to influence which subset of reads is kept.  When subsampling  data  that
                 has  previously  been  subsampled,  be  sure  to  use  a  different  seed value from those used
                 previously; otherwise more reads will be retained than expected.  [0]

       -s FLOAT  Subsampling shorthand option: -s INT.FRAC is equivalent  to  --subsample-seed  INT  --subsample
                 0.FRAC.
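
                 The decomposition of the shorthand can be sketched in shell as follows. The split_s helper
                 is hypothetical and handles only arguments written in the INT.FRAC form.

```shell
# split_s is a hypothetical helper showing how an -s argument decomposes.
split_s() {
    seed=${1%%.*}    # integer part: the seed
    frac=${1#*.}     # digits after the decimal point
    echo "--subsample-seed ${seed} --subsample 0.${frac}"
}

split_s 42.25    # prints --subsample-seed 42 --subsample 0.25
```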

       -@ INT, --threads INT
                 Number of BAM compression threads to use in addition to main thread [0].

       -P, --fetch-pairs
                 Retrieve  pairs  even  when  the mate is outside of the requested region.  Enabling this option
                 also turns on the multi-region iterator (-M).  A region to search must be specified, either  on
                 the command-line, or using the -L option.  The input file must be an indexed regular file.

                 This  option  first scans the requested region, using the RNEXT and PNEXT fields of the records
                 that have the PAIRED flag set and pass other filtering options to find where paired  reads  are
                 located.   These  locations  are  used to build an expanded region list, and a set of QNAMEs to
                 allow from the new regions.  It will then make a second pass, collecting  all  reads  from  the
                 originally-specified  region  list together with reads from additional locations that match the
                 allowed set of QNAMEs.  Any other filtering options used will be applied  to  all  reads  found
                 during this second pass.

                 As  this  option  links  reads using RNEXT and PNEXT, it is important that these fields are set
                 accurately.  Use 'samtools fixmate' to correct them if necessary.

                 Note that this option does not work with the  -c,  --count;  -U,  --output-unselected;  or  -p,
                 --unmap options.

       -S        Ignored for compatibility with previous samtools versions.  Previously this option was required
                 if input was in SAM format, but now the correct format is automatically detected  by  examining
                 the first few characters of input.

       -X, --customized-index
                 Include a customized index file as part of the arguments. See the EXAMPLES section  for  sample
                 usage.

       -z FLAGs, --sanitize FLAGs
                 Perform some sanity checks on the state of SAM record fields, fixing up common mistakes made by
                 aligners.  These include soft-clipping alignments when  they  extend  beyond  the  end  of  the
                 reference,  marking  records as unmapped when they have reference * or position 0, and ensuring
                 that unmapped alignments have no CIGAR, no mapping quality, and no MD, NM, CG or SM tags.

                 FLAGs is a comma-separated list of keywords chosen from the following list.

                 unmap  The  UNMAPPED  BAM flag. This is set for reads with position <= 0, reference name "*" or
                        reads starting beyond the end of the reference. Note CIGAR "*" is permitted  for  mapped
                        data so does not trigger this.

                 pos    Position  and  reference  name fields.  These may be cleared when a sequence is unmapped
                        due to the coordinates being beyond the end of the reference.  Selecting this may change
                        the sort order of the file, so it is not part of the `on' compound argument.

                 mqual  Mapping quality.  This is set to zero for unmapped reads.

                 cigar  Modifies CIGAR fields, either by adding soft-clips for reads that overlap the end of the
                        reference or by clearing it for unmapped reads.

                 aux    For unmapped data, some auxiliary fields are meaningless and  will  be  removed.   These
                        include NM, MD, CG and SM.

                 off    Perform no sanity fixing.  This is the default.

                 on     Sanitize  data  in a way that guarantees the same sort order.  This is everything except
                        for pos.

                 all    All sanitizing options, including pos.
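
                 The relationship between the compound keywords and the individual ones can be written out as
                 a small lookup. This is a sketch based only on the descriptions above; expand_sanitize is a
                 hypothetical helper, not a samtools feature.

```shell
# expand_sanitize is a hypothetical helper that spells out the compound keywords.
expand_sanitize() {
    case $1 in
        off) echo "" ;;                             # no fixes
        on)  echo "unmap,mqual,cigar,aux" ;;        # everything except pos
        all) echo "unmap,pos,mqual,cigar,aux" ;;    # everything, including pos
        *)   echo "$1" ;;                           # already an explicit list
    esac
}

expand_sanitize on    # prints unmap,mqual,cigar,aux
```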

       --no-PG   Do not add a @PG line to the header of the output file.

EXAMPLES

       o Import SAM to BAM when @SQ lines are present in the header:

           samtools view -bo aln.bam aln.sam

         If @SQ lines are absent:

           samtools faidx ref.fa
           samtools view -bt ref.fa.fai -o aln.bam aln.sam

         where ref.fa.fai is generated automatically by the faidx command.

       o Convert a BAM file to a CRAM file using a local reference sequence.

           samtools view -C -T ref.fa -o aln.cram aln.bam

       o Convert a BAM file to a CRAM with NM and MD tags stored verbatim rather than  calculating  on  the  fly
         during  CRAM  decode,  so  that mixed data sets with MD/NM only on some records, or NM calculated using
         different definitions of mismatch, can be decoded without change.  The second command demonstrates  how
         to decode such a file.  The request to not decode MD here is turning off auto-generation of both MD and
         NM; it will still emit the MD/NM tags on records that had these stored verbatim.

           samtools view -C --output-fmt-option store_md=1 --output-fmt-option store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt-option decode_md=0 -o aln.new.bam aln.cram

       o An alternative way of achieving the above is listing multiple options  after  the  --output-fmt  or  -O
         option.  The commands below are equivalent to the two above.

           samtools view -O cram,store_md=1,store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt cram,decode_md=0 -o aln.new.bam aln.cram

       o Include a customized index file as part of the arguments.

           samtools view [options] -X /data_folder/data.bam /index_folder/data.bai chrM:1-10

       o Output alignments in read group grp2 (records with no RG tag will also be in the output).

           samtools view -r grp2 -o /data_folder/data.rg2.bam /data_folder/data.bam

       o Only keep reads that have tag BC and whose barcode matches one of the barcodes listed in the barcode
         file.

           samtools view -D BC:barcodes.txt -o /data_folder/data.barcodes.bam /data_folder/data.bam

       o Only  keep  reads with tag RG and read group grp2.  This does almost the same as -r grp2 but will not
         keep records without the RG tag.

           samtools view -d RG:grp2 -o /data_folder/data.rg2_only.bam /data_folder/data.bam

       o Remove the actions of samtools markdup.  Clear the duplicate flag and  remove  the  dt  tag,  keep  the
         header.

           samtools view -h --remove-flags DUP -x dt -o /data_folder/dat.no_dup_markings.bam /data_folder/data.bam

AUTHOR

       Written by Heng Li from the Sanger Institute.

SEE ALSO

       samtools(1), samtools-tview(1), sam(5)

       Samtools website: <http://www.htslib.org/>