Provided by: samtools_1.19.2-1build2_amd64

NAME

       samtools-sort - sorts SAM/BAM/CRAM files

SYNOPSIS

       samtools sort [options] [in.sam|in.bam|in.cram]

DESCRIPTION

       Sort alignments by leftmost coordinates, by read name when -n or -N is used, by tag
       contents with -t, or by a minimiser-based collation order with -M.  An appropriate @HD SO
       sort order header tag will be added, or an existing one updated if necessary, along with
       the @HD SS sub-sort header tag where appropriate.

       The sorted output is written to standard output by  default,  or  to  the  specified  file
       (out.bam) when -o is used.  This command will also create temporary files tmpprefix.%d.bam
       as needed when the entire alignment data cannot fit into memory (as controlled via the  -m
       option).
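
       As an illustration of how the options mentioned above combine, the following Python
       sketch assembles a samtools sort command line.  The helper name build_sort_command and
       the file names are hypothetical; only the -o, -T, -m and -@ options described in this
       page are used.

```python
import shlex


def build_sort_command(in_path, out_path, tmp_prefix=None,
                       mem_per_thread=None, threads=None):
    """Assemble an argv list for `samtools sort` (hypothetical helper)."""
    cmd = ["samtools", "sort", "-o", out_path]
    if tmp_prefix is not None:
        cmd += ["-T", tmp_prefix]        # where temporary files are written
    if mem_per_thread is not None:
        cmd += ["-m", mem_per_thread]    # e.g. "1G"; approximate memory per thread
    if threads is not None:
        cmd += ["-@", str(threads)]      # sorting and compression threads
    cmd.append(in_path)
    return cmd


cmd = build_sort_command("in.bam", "out.bam", tmp_prefix="/tmp/sorttmp",
                         mem_per_thread="1G", threads=4)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

       Passing the arguments as a list avoids shell quoting problems with unusual file names.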

       Consider  using  samtools  collate  instead  if you need name collated data without a full
       lexicographical sort.

       Note that if the sorted output file is to be indexed  with  samtools  index,  the  default
       coordinate  sort  must  be used.  Thus the -n, -N, -t and -M options are incompatible with
       samtools index.

       When sorting by minimiser (-M), the sort order is defined by the whole-read minimiser
       value and the offset into the read at which this minimiser was observed.  This produces
       small clusters (contig-like, but unaligned) and helps to improve compression with LZ
       algorithms.  Compression can be improved further by supplying a known reference from
       which to build a minimiser index (-I and -w options).
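
       The idea of ordering reads by a whole-read minimiser and its offset can be sketched as
       follows.  This is a simplified illustration, not the samtools implementation: it
       assumes k-mers are ranked by plain string comparison, ignores reverse complementing, and
       uses a small k for readability.  The function name whole_read_minimiser is hypothetical.

```python
def whole_read_minimiser(seq, k):
    """Return (kmer, offset) for the smallest k-mer in seq.

    Assumption: k-mers are ranked by plain string comparison; ties are
    resolved in favour of the leftmost occurrence.
    """
    if len(seq) < k:
        raise ValueError("read is shorter than the k-mer size")
    return min((seq[i:i + k], i) for i in range(len(seq) - k + 1))


reads = ["TTGACCA", "GGACCAT", "ACCATTT"]

# Sort by minimiser value first, then by where in the read it was seen.
ordered = sorted(reads, key=lambda s: whole_read_minimiser(s, 4))
for read in ordered:
    print(read, whole_read_minimiser(read, 4))
```

       All three reads share the minimiser ACCA, so they end up adjacent and are ordered by its
       offset within each read.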

OPTIONS

       -l INT     Set the desired compression level for the final output  file,  ranging  from  0
                  (uncompressed)  or  1  (fastest but minimal compression) to 9 (best compression
                  but slowest to write), similarly to gzip(1)'s compression level setting.

                  If -l is not used, the default compression level will apply.

       -u         Set the compression level to 0, for uncompressed output.  This is a synonym for
                  -l 0.

       -m INT     Approximately the maximum required memory per thread, specified either in bytes
                  or with a K, M, or G suffix.  [768 MiB]

                  To prevent sort from creating a huge number of temporary files, it  enforces  a
                  minimum value of 1M for this setting.

       -n         Sort  by  read  names  (i.e., the QNAME field) using an alpha-numeric ordering,
                  rather than by chromosomal coordinates.  The alpha-numeric  or  “natural”  sort
                  order detects runs of digits in the strings and sorts these numerically.  Hence
                  "a7b" appears before "a12b".  Note  this  is  not  suitable  where  hexadecimal
                  values are in use.  Sets the header sub-sort (@HD SS) tag to queryname:natural.

       -N         Sort  by read names (i.e., the QNAME field) using the lexicographical ordering,
                  rather than by chromosomal coordinates.  Unlike  -n  no  detection  of  numeric
                  components  is  used,  instead  relying  purely  on  the  ASCII  value  of each
                  character.  Hence "x12" comes before "x7" as "1" is before "7" in ASCII.   This
                  is  a  more  appropriate  name sort order where all digits in names are already
                  zero-padded and/or hexadecimal values are being used.  Sets the header sub-sort
                  (@HD SS) tag to queryname:lexicographical.

       -t TAG     Sort  first by the value in the alignment tag TAG, then by position or name (if
                  also using -n or -N).

       -M         Sort unmapped reads (those in  chromosome  "*")  by  their  sequence  minimiser
                  (Schleimer  et  al., 2003; Roberts et al., 2004), also reverse complementing as
                  appropriate.  This has the effect of  collating  some  similar  data  together,
                  improving  the  compressibility  of  the unmapped sequence.  The minimiser kmer
                  size is adjusted using the -K option.  Note data compressed in this manner  may
                  need to be name collated prior to conversion back to fastq.

                  Mapped sequences are sorted by chromosome and position.

       -R         Do not use reverse strand with minimiser sort (only compatible with -M).

       -K INT     Sets the kmer size to be used in the -M option. [20]

       -I FILE    Build  a minimiser index over FILE.  The per-read minimisers produced by -M are
                  no longer sorted by their numeric value, but by the reference  coordinate  this
                  minimiser  was  found  to  come  from  (if  found  in the index).  This further
                  improves compression due to improved  sequence  similarity  between  sequences,
                  albeit with a small CPU cost of building and querying the index.  Specifying -I
                  automatically implies -M.

       -w INT     Specifies the window  size  for  building  the  minimiser  index  on  the  file
                  specified in -I.  This defaults to 100.  It may be better to set this closer to
                  50 for short-read data sets (at a higher CPU and  memory  cost),  or  for  more
                  speed up to 1000 for long-read data sets.

       -H         Squashes base homopolymers down to a single base before constructing the
                  minimiser.  This is useful for instruments where the primary source of error is
                  in the length of homopolymers.

       -o FILE    Write the final sorted output to FILE, rather than to standard output.

       -O FORMAT  Write the final output as sam, bam, or cram.

                  By  default,  samtools  tries  to  select  a  format  based  on the -o filename
                  extension; if output is to standard output or no format can be deduced, bam  is
                  selected.

       -T PREFIX  Write  temporary  files  to  PREFIX.nnnn.bam,  or if the specified PREFIX is an
                  existing  directory,  to  PREFIX/samtools.mmm.mmm.tmp.nnnn.bam,  where  mmm  is
                  unique to this invocation of the sort command.

                  By  default,  any  temporary  files  are  written alongside the output file, as
                  out.bam.tmp.nnnn.bam, or if output  is  to  standard  output,  in  the  current
                  directory as samtools.mmm.mmm.tmp.nnnn.bam.

       -@ INT     Set  number  of  sorting  and  compression  threads.   By default, operation is
                  single-threaded.

       --no-PG    Do not add a @PG line to the header of the output file.

       --template-coordinate
                  Sorts by template-coordinate, whereby the sort order (@HD SO) is unsorted,  the
                  group order (GO) is query, and the sub-sort (SS) is template-coordinate.

       Ordering Rules

       The following rules are used for ordering records.

       If  option -t is in use, records are first sorted by the value of the given alignment tag,
       and then by position or name (if using -n or -N).  For example, “-t  RG”  will  make  read
       group the primary sort key.  The rules for ordering by tag are:

       •   Records that do not have the tag are sorted before ones that do.

       •   If  the  types of the tags are different, they will be sorted so that single character
           tags (type A) come before array tags (type B), then string tags (types H and Z),  then
           numeric tags (types f and i).

       •   Numeric  tags  (types  f  and  i)  are  compared  by  value.  Note that comparisons of
           floating-point values are subject to issues of rounding and precision.

       •   String tags (types H and Z) are compared based on the binary contents of the tag using
           the C strcmp(3) function.

       •   Character tags (type A) are compared by binary character value.

       •   No  attempt  is made to compare tags of other types — notably type B array values will
           not be compared.
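
       The tag ordering rules above can be expressed as a sort key.  The sketch below is
       illustrative only: a tag is modelled as a (type, value) tuple or None, which is a
       hypothetical representation, and Python's stable sort leaves records with uncompared
       type B tags in their input order.

```python
# Rank of each tag type: A, then B, then H/Z strings, then f/i numbers.
TYPE_RANK = {"A": 1, "B": 2, "H": 3, "Z": 3, "f": 4, "i": 4}


def tag_sort_key(tag):
    """Build a sort key for a tag given as (type_char, value), or None."""
    if tag is None:
        return (0, 0)                      # records without the tag come first
    type_char, value = tag
    rank = TYPE_RANK[type_char]
    if type_char == "A":
        return (rank, value.encode())      # compare by binary character value
    if type_char == "B":
        return (rank, 0)                   # array contents are not compared
    if type_char in ("H", "Z"):
        return (rank, value.encode())      # byte-wise comparison, like strcmp
    return (rank, value)                   # f and i compare by numeric value


records = [
    ("r1", ("Z", "grpB")),
    ("r2", None),
    ("r3", ("i", 5)),
    ("r4", ("A", "x")),
    ("r5", ("Z", "grpA")),
    ("r6", ("f", 2.5)),
]
ordered = [name for name, tag in sorted(records, key=lambda r: tag_sort_key(r[1]))]
print(ordered)
```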

       When the -n or -N option is present, records are sorted by name.  Historically samtools
       has used a “natural” ordering, i.e. sections consisting of digits are compared
       numerically while all other sections are compared based on their binary representation.
       This means “a1” will come before “b1” and “a9” will come before “a10”.  However, this
       alpha-numeric sort can be confused by runs of hexadecimal digits.  The newer -N option
       adds a simpler lexicographical name collation which does not attempt any numeric
       comparisons and may be more appropriate for some data sets.  Note that care must be taken
       when using samtools merge to ensure all files use the same collation order.  Records with
       the same name are ordered according to the values of the READ1 and READ2 flags (see
       samtools flags).
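
       The difference between the two name orderings can be seen in a short sketch.  The
       natural_key function is a hypothetical approximation of the “natural” ordering: it
       assumes a digit run sorts before a non-digit run at the same position and does not
       treat leading zeros specially, details this page does not specify.

```python
import re


def natural_key(name):
    """Split a name into digit and non-digit runs; digit runs compare as numbers."""
    parts = re.findall(r"[0-9]+|[^0-9]+", name)
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]


def lexicographic_key(name):
    """Compare purely on the byte value of each character."""
    return name.encode()


names = ["a12b", "x7", "a7b", "x12"]
natural = sorted(names, key=natural_key)              # in the spirit of -n
lexicographic = sorted(names, key=lexicographic_key)  # in the spirit of -N
print(natural)
print(lexicographic)
```

       Under the natural key “a7b” precedes “a12b”, whereas under the lexicographical key
       “x12” precedes “x7”.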

       When the --template-coordinate option is in use, the reads are sorted by:

       1. The earlier unclipped 5' coordinate of the template.

       2. The higher unclipped 5' coordinate of the template.

       3. The library (from the read group).

       4. The molecular identifier (MI tag if present).

       5. The read name.

       6. If unpaired, or if R1 has the lower coordinates of the pair.

       When none of the above options are in use, reads are sorted by reference (according to the
       order of the @SQ header records), then by position in  the  reference,  and  then  by  the
       REVERSE flag.
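
       The default ordering can be sketched as a three-part key.  The Aln record type and the
       example data are hypothetical, and the sketch assumes forward-strand records sort
       before reverse-strand ones at the same position, which this page does not state
       explicitly.  Only mapped reads are considered.

```python
from typing import NamedTuple


class Aln(NamedTuple):
    name: str
    ref: str        # reference name, as listed in an @SQ header record
    pos: int        # leftmost position on the reference
    reverse: bool   # True if the REVERSE flag is set


def coordinate_key(aln, sq_order):
    """Key on @SQ header order, then position, then the REVERSE flag."""
    return (sq_order[aln.ref], aln.pos, aln.reverse)


# Reference order follows the @SQ lines in the header, not the alphabet.
sq_names = ["chr2", "chr1"]
sq_order = {name: index for index, name in enumerate(sq_names)}

alns = [
    Aln("a", "chr1", 100, False),
    Aln("b", "chr2", 500, True),
    Aln("c", "chr2", 500, False),
    Aln("d", "chr2", 20, False),
]
ordered = [a.name for a in sorted(alns, key=lambda a: coordinate_key(a, sq_order))]
print(ordered)
```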

       Note

       Historically  samtools  sort also accepted a less flexible way of specifying the final and
       temporary output filenames:

              samtools sort [-f] [-o] in.bam out.prefix

       This has now been removed.  The previous out.prefix  argument  (and  -f  option,  if  any)
       should be changed to an appropriate combination of -T PREFIX and -o FILE.  The previous -o
       option should be removed, as output defaults to standard output.

AUTHOR

       Written by Heng Li from the Sanger Institute with numerous subsequent modifications.

SEE ALSO

       samtools(1), samtools-collate(1), samtools-merge(1)

       Samtools website: <http://www.htslib.org/>