Provided by: staden-io-lib-utils_1.15.0-1.1build2_amd64

NAME

       scramble - Converts between the SAM, BAM and CRAM file formats.

SYNOPSIS

       scramble  [options] [input_file [output_file]]

DESCRIPTION

       scramble converts between various next-gen sequencing alignment file formats, including
       SAM, BAM and CRAM. It can either act as a pipe, reading stdin and writing to stdout, or
       operate on named files.

       When operating as a pipe the input type defaults to SAM or BAM, so the -I cram option is
       required to indicate that the input is in CRAM format. The output defaults to BAM, but
       can  be  adjusted  by  using  the  -O format option. When given filenames the file type is
       automatically chosen based on the filename suffix.
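
       As an illustration of suffix-based selection, the following Python sketch maps a
       filename to a format name. The function name and the exact suffix table are assumptions
       made for this example; scramble's own detection logic may differ.

```python
import os

# Assumed suffix-to-format table (hypothetical; not taken from scramble's source).
SUFFIX_FORMATS = {".sam": "sam", ".bam": "bam", ".cram": "cram"}


def guess_format(filename, default=None):
    """Return 'sam', 'bam' or 'cram' based on the filename suffix.

    Falls back to `default` when the suffix is not recognised.
    Lower-casing the suffix is a choice made for this sketch.
    """
    suffix = os.path.splitext(filename)[1].lower()
    return SUFFIX_FORMATS.get(suffix, default)


if __name__ == "__main__":
    for name in ("in.bam", "out.cram", "reads.txt"):
        print(name, "->", guess_format(name, default="unknown"))
```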

OPTIONS

       -I format
              Selects the input format, where format is one of sam, bam or cram.  Use  this  when
              reading via a pipe to avoid input bytes being consumed when attempting to detect if
              the input is in SAM or BAM format.

       -O format
              Selects the output format, where format is one of sam, bam or cram.

       -1 to -9
              Sets the compression level from 1 (low compression, fast) to 9  (high  compression,
              slow) when writing in BAM or CRAM format. This is only used during writing.

       -0 or -u
              Writes  uncompressed  data.  In  BAM  this  still uses BGZF containers, but with no
              internal compression. In CRAM it stores blocks in RAW format  instead.  The  option
              has no effect on SAM output.

       -j     CRAM encoding only. Add bzip2 to the list of compression codecs potentially used
              during CRAM creation.

       -Z     CRAM encoding only. Add lzma to the list of compression codecs potentially used
              during CRAM creation. Given the slow compression speed of lzma, it is only used
              where it gives a significant advantage over zlib or bzip2. At higher compression
              levels (-7 and above) this weighting is ignored, as lzma decompression speed is
              acceptable, albeit still slower than zlib.

       -m     CRAM decoding only.  Generate  MD:Z:  and  NM:I:  auxiliary  fields  based  on  the
              reference-based compression.

       -M     CRAM encoding only.  Forcibly pack sequences from multiple references into the same
              slice.  Normally CRAM will start a new slice when changing from  one  reference  to
              another,  but  will  still  automatically  switch  to multi-reference slices if the
              number of sequences per slice becomes too small.

       -R range
              Currently for CRAM input only, but SAM/BAM support is  pending.  This  indicates  a
              reference  sequence  name  and  optionally  a  start  and  end location within that
              reference, using the syntax ref_name or ref_name:start-end. For efficient operation
              the CRAM file needs a .crai format index (built using the cram_index program).
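
              The range syntax can be illustrated with a small Python parser. This is a sketch
              only: the function name is hypothetical, and scramble's own parsing rules (for
              example, how it treats reference names that contain a colon) are not documented
              here and may differ.

```python
import re

# ref_name, optionally followed by ":start-end" where both are integers.
# The lazy ".+?" lets reference names that themselves contain ":" still parse.
_RANGE_RE = re.compile(r"^(?P<ref>.+?)(?::(?P<start>\d+)-(?P<end>\d+))?$")


def parse_range(text):
    """Split 'ref_name' or 'ref_name:start-end' into (ref, start, end).

    start and end are None when only a reference name is given.
    """
    m = _RANGE_RE.match(text)
    if m is None:
        raise ValueError(f"not a valid range: {text!r}")
    if m.group("start") is None:
        return m.group("ref"), None, None
    start, end = int(m.group("start")), int(m.group("end"))
    if start > end:
        raise ValueError(f"start is greater than end in {text!r}")
    return m.group("ref"), start, end


if __name__ == "__main__":
    print(parse_range("chr1:1000-2000"))
    print(parse_range("MT"))
```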

       -r ref.fa
              CRAM encoding only. Use this to specify the reference fasta file. Note that if
              the input SAM or BAM file has a file: or local file system based URI specified in
              the @SQ headers then this option may not be necessary.

       -s number
              CRAM encoding only. Specifies the number of sequences per slice. Defaults to
              10000.

       -S number
              CRAM encoding only.   Specifies the number of slices per container.  Defaults to 1.

       -t     BAM and CRAM only.  Specifies the number of compression or  decompression  threads,
              adaptively   shared  between  both  encoding  and  decoding.   Defaults  to  1  (no
              threading).

       -V version_string
              CRAM encoding only.  Sets the CRAM file format version. Supported values are "2.0",
              "2.1" and "3.0".

       -e     CRAM  encoding only. Embed snippets of the reference sequence in every slice.  This
              means the files can be decoded without needing to specify the reference fasta file.

       -E     CRAM encoding only. Embed snippets of the consensus sequence in every slice.   This
              operates  as  per  the  -e  option, but the consensus is generated from the aligned
              data.  This does not therefore require  a  reference  to  be  known  during  encode
              (although it is still a mandatory part of the specification that the SQ SAM headers
              have an M5 field).  It also means the files  can  be  decoded  without  needing  to
              specify the reference fasta file.

       -x     CRAM  encoding only.  Omit reference based compression and instead store details of
              every base verbatim.

       -B     Experimental, encoding only.  When storing quality  values,  bin  into  8  discrete
              values  (plus 0), as typically used by modern Illumina instruments.  (Note that the
              bins may not be precisely the same ranges.)

       -!     CRAM v3.0 and above decoding only. Do not check CRCs. This option should only be
              used when attempting to recover from data corruption.

       -q     Do not append @PG header lines with the scramble program name and arguments.

       -X mode
              Encode CRAM using a set of predefined parameters defined by mode. This is one of
              fast, normal / default, small or archive.

              fast (param: -1 -s 1000)
                     Lightweight compression for speed and small slice size for quick
                     fine-grained random access.

              normal (param: none)
                     Default  mode.   This  is  the  same  as not specifying -X.  For version 3.1
                     onwards this enables the name tokeniser ("-T").

              small (param v3.0: -j -s 25000,  param v3.1 or v4: -Tf -s 25000)
                     Optimise for smaller files, with larger slices.

              archive (param v3.0: -j -s 100000,  param v3.1 or v4: -Tfa -s 100000)
                     Optimised for smallest files, intended for data archival.  This uses a large
                     slice  size and will have poorer random access. At level 7 onwards this also
                     enables lzma compression if compiled in ("-Z").
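
              The preset table above can be written down as data. The Python sketch below only
              restates the parameters listed for each mode; the function name and the way the
              version string is classified are assumptions for this example, and the extra -Z
              that archive mode adds at level 7 onwards is deliberately left out.

```python
# Preset expansions copied from the -X description.
# "3.1+" stands for "v3.1 or v4" as written in the option text.
PRESETS = {
    "fast":    {"3.0": "-1 -s 1000",   "3.1+": "-1 -s 1000"},
    "normal":  {"3.0": "",             "3.1+": ""},
    "small":   {"3.0": "-j -s 25000",  "3.1+": "-Tf -s 25000"},
    "archive": {"3.0": "-j -s 100000", "3.1+": "-Tfa -s 100000"},
}


def expand_preset(mode, version="3.0"):
    """Return the documented parameter list for an -X mode (hypothetical helper)."""
    if mode == "default":  # "default" is listed as another name for "normal"
        mode = "normal"
    if version == "3.0":
        family = "3.0"
    elif version.startswith(("3.1", "4")):
        family = "3.1+"
    else:
        raise ValueError(f"no preset parameters documented for version {version!r}")
    return PRESETS[mode][family].split()


if __name__ == "__main__":
    print(expand_preset("small", "3.1"))
    print(expand_preset("archive"))
```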

       -d tag-list
              Discard all auxiliary tags except those listed in  tag-list.   The  list  is  comma
              separated  and contains the two letter tag codes specified as-is or with simplified
              regular expressions.  Character classes such as "[A-W]" are permitted, but not with
              the  negation  code  "^". Also "." is a synonym for any legal tag character.  Hence
              "[A-Z][A-Z0-9]" represents all tag types belonging to the official namespace.

              The option may be specified more than once, but it cannot be mixed with -D.

       -D tag-list
              Discard auxiliary tags listed in tag-list, keeping everything else. The list is
              comma separated and contains two letter tag codes. As with -d, these can be
              specified using simplified regular expressions. For example, -D .. removes all
              auxiliary tags.

              The option may be specified more than once, but it cannot be mixed with -d.
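
              The keep (-d) and discard (-D) semantics can be sketched in Python as follows.
              This is an illustration under stated assumptions, not scramble's implementation:
              the function names are hypothetical, each pattern is assumed to describe exactly
              two characters, and "." is translated to the class [A-Za-z0-9].

```python
import re

# One pattern position: a character class without "^" negation,
# a literal tag character, or "." for any legal tag character.
_POSITION = r"(\[[^\]^][^\]]*\]|[A-Za-z0-9.])"
_PATTERN_RE = re.compile(_POSITION + _POSITION)


def compile_tag_list(tag_list):
    """Compile a comma-separated tag-list into one regular expression."""
    parts = []
    for pattern in tag_list.split(","):
        m = _PATTERN_RE.fullmatch(pattern)
        if m is None:
            raise ValueError(f"unsupported tag pattern: {pattern!r}")
        parts.append("".join("[A-Za-z0-9]" if p == "." else p for p in m.groups()))
    return re.compile("|".join(parts))


def filter_tags(tags, keep=None, discard=None):
    """Apply -d style (keep) or -D style (discard) filtering to tag codes."""
    if (keep is None) == (discard is None):
        raise ValueError("specify exactly one of keep (-d) or discard (-D)")
    if keep is not None:
        rx = compile_tag_list(keep)
        return [t for t in tags if rx.fullmatch(t)]
    rx = compile_tag_list(discard)
    return [t for t in tags if not rx.fullmatch(t)]


if __name__ == "__main__":
    tags = ["NM", "MD", "XA", "xy", "Zb", "RG"]
    print(filter_tags(tags, discard="[a-zXYZ].,.[a-z]"))
    print(filter_tags(tags, keep="NM,MD"))
```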

EXAMPLES

       To convert a BAM file from stdin to CRAM on stdout, using reference MT.fa:

           some_command | scramble -I bam -O cram -r MT.fa | some_command
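
       The same pipe conversion can be driven from a script. The Python sketch below builds
       the argument list from the options documented above and feeds data through stdin and
       stdout; the helper names are hypothetical, and it assumes a scramble executable is on
       the PATH.

```python
import subprocess


def scramble_pipe_argv(in_fmt, out_fmt, reference=None):
    """Build the argument list for a stdin-to-stdout conversion."""
    argv = ["scramble", "-I", in_fmt, "-O", out_fmt]
    if reference is not None:
        argv += ["-r", reference]  # reference fasta, used for CRAM encoding
    return argv


def convert_stream(data, in_fmt, out_fmt, reference=None):
    """Send bytes to scramble on stdin and return the bytes it writes to stdout.

    Raises subprocess.CalledProcessError if scramble exits with an error.
    """
    result = subprocess.run(
        scramble_pipe_argv(in_fmt, out_fmt, reference),
        input=data,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command only; running it requires scramble and real input data.
    print(" ".join(scramble_pipe_argv("bam", "cram", "MT.fa")))
```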

       The default CRAM output format is version 3.0. The command below enables the experimental
       newer compression codecs (NB: do not use this in production) using the "small" profile,
       while also removing all tag types reserved for local/private use. (Also consider
       -d [A-Z][A-Z0-9] instead of the -D arguments.)

           scramble -V 3.1 -X small -D '[a-zXYZ].' -D '.[a-z]' in.cram out.cram

AUTHOR

       James Bonfield, Wellcome Trust Sanger Institute

                                         December 6 2022                              scramble(1)