Provided by: samtools_1.20-3_amd64

NAME

       samtools-split - splits a file by read group.

SYNOPSIS

       samtools split [options] merged.sam|merged.bam|merged.cram

DESCRIPTION

       Splits  a  file  by  read  group,  or  a specified tag, producing one or more output files
       matching a common prefix (by default based on the input filename).

       Unless the -d option is used, the file will be split according to the @RG tags  listed  in
       the  header.   Records  without  an  RG tag or with an RG tag undefined in the header will
       cause the program to exit with an error unless the -u option is used.

       RG values defined in the header but with no records will produce an output file containing
       only a header.

       If  the  -d  TAG option is used, the file will be split on the value in the given aux tag.
       Only string (type Z) and integer (type i in SAM, plus equivalents in  BAM/CRAM)  tags  are
       currently supported.  Unless the -u option is used, the program will exit with an error if
       it finds a record without the given tag.

       Note that attempting to split on a tag with high cardinality may result in the creation of
       a large number of output files.  To prevent this, the -M option can be used to set a limit
       on the number of splits made.
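
       One way to gauge a tag's cardinality before splitting is to count its distinct values.
       The sketch below is not part of samtools: count_tag_values is a hypothetical helper that
       reads SAM text on standard input and assumes tab-separated records whose optional fields
       start at column 12, as laid out in the SAM specification.

```shell
# Hypothetical helper (not a samtools command): count the distinct values
# of an aux tag in SAM text read from standard input.
# Usage: count_tag_values TAG < records.sam
count_tag_values() {
    awk -F '\t' -v tag="$1" '
        /^@/ { next }                        # skip header lines
        {
            for (i = 12; i <= NF; i++) {     # optional fields start at column 12
                if (index($i, tag ":") == 1) {
                    # drop the "TAG:T:" prefix and remember the value
                    seen[substr($i, length(tag) + 4)] = 1
                    break
                }
            }
        }
        END { n = 0; for (v in seen) n++; print n }
    '
}
```

       For example, samtools view merged.bam | count_tag_values CB prints the number of distinct
       CB values, which can inform the choice of -M.  CB is used here only as an example tag.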

       Using -d RG behaves similarly to the default (without -d), opening an output file for each
       @RG line in the header.  However, unlike the default, new output files will also be opened
       for any RG tags found in the alignment records, regardless of whether they have a matching
       @RG header line.

       The -u option may be used to specify the output filename for any records with a missing or
       unrecognised tag.  This option will always write out a file even if there are no records.

       Output format defaults to BAM.  To write SAM or CRAM instead, either set the format with
       --output-fmt or use -f to set the file extension, e.g. -f %*_%#.sam.
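
       The behaviour described above can be illustrated with a few invocations.  This is a
       sketch only: the run wrapper is a hypothetical convenience that prints each command and
       executes it only when DRY_RUN=0 is set, and unassigned.bam is a made-up file name.  All
       options shown are those documented on this page.

```shell
# Hypothetical wrapper: print each command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Split by the @RG lines in the header; output is BAM by default.
run samtools split merged.bam

# As above, but send records with a missing or unrecognised RG tag to a
# separate file instead of exiting with an error.
run samtools split -u unassigned.bam merged.bam

# Split on the RG aux tag, so RG values absent from the header also get files.
run samtools split -d RG merged.bam

# Write SAM instead of BAM by giving the extension in the format string.
run samtools split -f '%*_%#.sam' merged.bam
```

       Running the block with DRY_RUN=0 in the environment executes the commands instead of
       only printing them.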

OPTIONS

       -u FILE1      Put reads with no tag or an unrecognised tag into FILE1

       -h FILE2      Use  the  header  from  FILE2  when writing the file given in the -u option.
                     This header completely replaces the one from the input  file.   It  must  be
                     compatible  with  the  input  file header, which means it must have the same
                     number of references listed in the @SQ lines and the references must  be  in
                     the same order and have the same lengths.

       -f STRING     Output filename format string (see below) ["%*_%#.%."]

       -d TAG        Split reads by TAG value into distinct files.  Supply only the tag key with
                     this option.  The tag value must be a string (i.e. key:Z:value) or an
                     integer (key:i:value).

                     Using  this option changes the default filename format string to "%*_%!.%.",
                     so that tag values appear in the output file names.  This can be  overridden
                     by using the -f option.

       -p NUMBER     Pad  numeric values in %# and %! format expansions to this many digits using
                     leading zeros.  For %!, only integer tag values will be padded.  String  tag
                     values will be left unchanged, even if the value only includes digits.

       -M,--max-split NUM
                     Limit  the  number  of  files created by the -d option to NUM (default 100).
                     This prevents accidents where trying to split on a tag with high cardinality
                     could  result  in the creation of a very large number of output files.  Once
                     the file limit is reached, any tag values not already seen will  be  treated
                     as unmatched and the program will exit with an error unless the -u option is
                     in use.

                     If desired, the limit can be removed using -M -1, although in  practice  the
                     number of outputs will still be restricted by system limits on the number of
                     files that can be open at once.

                     If splitting by read group and the read group count in the header is higher
                     than the requested limit, the limit will be raised to match.

       -v            Verbose output

       --no-PG       Do not add a @PG line to the header of the output file.

       Format string expansions:

                 %%   %
                 %*   basename
                 %#   index (of @RG in the header, or count of TAG values seen so far)
                 %!   @RG ID or TAG value
                 %.   output format filename extension
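
       The expansions listed above can be mimicked with a small shell function.  This is an
       illustration of the documented rules, not the samtools implementation: expand_name and
       its argument order are hypothetical.  Passing an unrecognised %x sequence through
       unchanged is an assumption, and padding is applied to %# only because the helper does not
       know whether a %! value came from an integer tag.

```shell
# Hypothetical helper: expand a samtools-split style format string.
# Usage: expand_name FORMAT BASENAME INDEX ID EXT [PAD]
expand_name() {
    fmt=$1 base=$2 index=$3 id=$4 ext=$5 pad=${6:-1}
    out=
    while [ -n "$fmt" ]; do
        c=${fmt%"${fmt#?}"}          # first character of the remaining format
        fmt=${fmt#?}
        if [ "$c" = "%" ] && [ -n "$fmt" ]; then
            d=${fmt%"${fmt#?}"}      # character following the %
            fmt=${fmt#?}
            case $d in
                %)   out="$out%" ;;
                '*') out="$out$base" ;;
                '#') out="$out$(printf "%0${pad}d" "$index")" ;;  # zero-padded index
                '!') out="$out$id" ;;
                .)   out="$out$ext" ;;
                *)   out="$out%$d" ;;  # assumption: unknown sequences pass through
            esac
        else
            out="$out$c"
        fi
    done
    printf '%s\n' "$out"
}
```

       For example, expand_name '%*_%#.%.' merged 3 grpA bam 4 prints merged_0003.bam, and
       expand_name '%*_%!.%.' merged 3 grpA bam prints merged_grpA.bam.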

       -@, --threads INT
                     Number of input/output compression threads to use in addition to the main
                     thread [0].
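
       The compatibility requirement for the -h option can be checked before running the split.
       The sketch below is not part of samtools: ref_list and same_refs are hypothetical helpers
       that compare the @SQ lines of two SAM-format header files.  They compare reference names
       as well as lengths and order, which is an assumption slightly stricter than the wording
       of the -h description.

```shell
# Hypothetical helper: print "name<TAB>length" for each @SQ line of a SAM header file.
ref_list() {
    awk -F '\t' '
        /^@SQ/ {
            sn = ""; ln = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) sn = substr($i, 4)   # reference name
                if ($i ~ /^LN:/) ln = substr($i, 4)   # reference length
            }
            print sn "\t" ln
        }
    ' "$1"
}

# Succeeds when both header files list the same references in the same order.
same_refs() {
    [ "$(ref_list "$1")" = "$(ref_list "$2")" ]
}
```

       Both arguments are assumed to contain SAM header text, such as that printed by
       samtools view -H.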

AUTHOR

       Written by Martin Pollard from the Sanger Institute.

SEE ALSO

       samtools(1), samtools-addreplacerg(1)

       Samtools website: <http://www.htslib.org/>