Provided by: samtools_1.10-3_amd64

NAME

       samtools collate - shuffles and groups reads together by their names

SYNOPSIS

       samtools collate [options] in.sam|in.bam|in.cram [<prefix>]

DESCRIPTION

       Shuffles  and  groups reads together by their names.  A faster alternative to a full query
       name sort, collate ensures that reads of the same name are grouped together in  contiguous
       groups, but doesn't make any guarantees about the order of read names between groups.
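
       The grouping guarantee described above can be illustrated with a small sketch.  This is
       not the samtools implementation: the function name collate, the (name, payload) record
       shape and the use of CRC32 as the hash are assumptions made for illustration.  The bucket
       count of 64 only echoes the default of the -n option.

```python
import zlib
from collections import defaultdict


def collate(records, n_buckets=64):
    """Return records reordered so that equal names are adjacent.

    Illustrative only: names are hashed into buckets, and each bucket is
    emitted group by group. The order of groups carries no meaning.
    """
    buckets = [defaultdict(list) for _ in range(n_buckets)]
    for name, payload in records:
        # All records sharing a name hash to the same bucket.
        idx = zlib.crc32(name.encode()) % n_buckets
        buckets[idx][name].append((name, payload))

    out = []
    for bucket in buckets:
        for group in bucket.values():
            out.extend(group)  # one contiguous run per read name
    return out


records = [("r2", "a"), ("r1", "b"), ("r2", "c"), ("r3", "d"), ("r1", "e")]
result = collate(records)
```

       Every record survives and each name forms exactly one contiguous run, but the order in
       which the runs appear depends only on the hash, not on the names themselves.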

       The  output from this command should be suitable for any operation that requires all reads
       from the same template to be grouped together.

       If present, <prefix> is used to name the temporary files that collate  uses  when  sorting
       the  data.   If  neither  the '-O' nor '-o' options are used, <prefix> must be present and
       collate will use it to make an output file name by appending a  suffix  depending  on  the
       format written (.bam by default).

       If  either  the  -O  or  -o  option is used, <prefix> is optional.  If <prefix> is absent,
       collate will write the temporary files to a system-dependent location (/tmp on UNIX).
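
       The rules for <prefix>, -O and -o can be summarised as a small argument builder.  The
       helper build_collate_args and its parameter names are hypothetical and exist only for
       this sketch; it merely assembles an argument list and does not run samtools.

```python
def build_collate_args(infile, prefix=None, out_file=None, to_stdout=False):
    """Assemble a 'samtools collate' argument list (hypothetical helper)."""
    # -O and -o are mutually exclusive.
    if to_stdout and out_file is not None:
        raise ValueError("-O and -o cannot be used together")
    # Without -O or -o, <prefix> is needed to derive the output file name.
    if not to_stdout and out_file is None and prefix is None:
        raise ValueError("<prefix> is required when neither -O nor -o is given")

    args = ["samtools", "collate"]
    if to_stdout:
        args.append("-O")
    if out_file is not None:
        args += ["-o", out_file]
    args.append(infile)
    if prefix is not None:
        # Optional here: only names the temporary files when -O or -o is set.
        args.append(prefix)
    return args


prefix_only = build_collate_args("in.bam", prefix="out")
to_file = build_collate_args("in.bam", out_file="collated.bam")
```

       A list built this way could be passed to subprocess.run on a system where samtools is
       installed.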

       Using -f for fast mode will output only primary alignments that have either the  READ1  or
       READ2  flags  set  (but not both).  Any other alignment records will be filtered out.  The
       collation will only work correctly if there are no more than two reads for any given QNAME
       after filtering.
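
       The fast mode filter can be expressed as a test on the SAM FLAG field.  The bit values
       below are those defined by the SAM specification.  Treating "primary" as "neither
       secondary nor supplementary" is an assumption of this sketch, and the function name
       kept_in_fast_mode is hypothetical.

```python
# FLAG bits from the SAM specification.
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def kept_in_fast_mode(flag):
    """Return True if a record with this FLAG would pass the -f filter.

    Assumption: a primary alignment has neither the secondary nor the
    supplementary bit set.
    """
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False
    # Exactly one of READ1 / READ2 must be set.
    return bool(flag & READ1) != bool(flag & READ2)


first_of_pair = kept_in_fast_mode(99)    # paired, proper pair, mate reverse, READ1
second_of_pair = kept_in_fast_mode(147)  # paired, proper pair, reverse, READ2
unpaired = kept_in_fast_mode(0)          # neither READ1 nor READ2
```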

       Fast  mode  keeps  a buffer of alignments in memory so that it can write out most pairs as
       soon as they are found instead of storing them in temporary files.  This allows collate to
       avoid some work and so finish more quickly than the standard mode.  The number of
       alignments held can be changed using -r; storing more alignments uses more memory but
       increases the number of pairs that can be written early.
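
       The trade-off controlled by -r can be sketched as a bounded buffer of unmatched reads.
       This is an illustration rather than the samtools code: the function fast_pair, the
       oldest-first eviction policy and the in-memory "spilled" list standing in for the
       temporary files are all assumptions.

```python
from collections import OrderedDict


def fast_pair(reads, max_stored=10000):
    """Pair reads by name using a bounded in-memory buffer (illustrative).

    Returns (early_pairs, spilled). Reads in 'spilled' would still need
    the standard, temporary-file based collation.
    """
    pending = OrderedDict()  # name -> first read seen with that name
    early_pairs = []
    spilled = []
    for name, payload in reads:
        if name in pending:
            # Mate is still buffered: the pair can be written immediately.
            early_pairs.append((pending.pop(name), (name, payload)))
        else:
            pending[name] = (name, payload)
            if len(pending) > max_stored:
                # Buffer full: evict the oldest unmatched read (assumed policy).
                _, oldest = pending.popitem(last=False)
                spilled.append(oldest)
    spilled.extend(pending.values())
    return early_pairs, spilled


reads = [("a", 1), ("b", 1), ("c", 1), ("a", 2), ("b", 2), ("c", 2)]
pairs_big, spilled_big = fast_pair(reads, max_stored=3)
pairs_small, spilled_small = fast_pair(reads, max_stored=2)
```

       With room for three reads every pair in this input is matched in memory; with room for
       only two, each first read is evicted before its mate arrives and nothing is written
       early.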

       While  collate  normally  randomises  the  ordering  of  read  pairs,  fast mode does not.
       Position-dependent biases that would normally be broken up can remain in the fast  collate
       output.  It is therefore not a good idea to use fast mode when preparing data for programs
       that expect randomly ordered paired reads.  For example, using fast collate instead of the
       standard mode may lead to significantly different results from aligners that estimate
       library insert sizes on batches of reads.

OPTIONS

       -O      Output to stdout.  This option cannot be used with '-o'.

       -o FILE Write output to FILE.  This option cannot be used with '-O'.

       -u      Write uncompressed BAM output.

       -l INT  Compression level.  [1]

       -n INT  Number of temporary files to use.  [64]

       -f      Fast mode (primary alignments only).

       -r INT  Number of reads to store in memory (for use with -f).  [10000]

       --no-PG Do not add a @PG line to the header of the output file.

AUTHOR

       Written by Heng Li from the Sanger Institute and extended by Andrew Whitwham.

SEE ALSO

       samtools(1), samtools-sort(1)

       Samtools website: <http://www.htslib.org/>