Ubuntu Manpage: razers3 - Faster, fully sensitive read mapping

Provided by: seqan-apps_2.4.0+dfsg-12ubuntu2_amd64

NAME

       razers3 - Faster, fully sensitive read mapping

SYNOPSIS

       razers3 [OPTIONS] <GENOME FILE> <READS FILE>
       razers3 [OPTIONS] <GENOME FILE> <PE-READS FILE1> <PE-READS FILE2>

DESCRIPTION

       RazerS 3 is a versatile, fully sensitive read mapper based on k-mer counting and seeding
       filters. It supports single- and paired-end mapping and shared-memory parallelism, and it
       optimally parametrizes the filter based on a user-defined minimal sensitivity. See
       http://www.seqan.de/projects/razers for more information.

       Input to RazerS 3 is a reference genome file and either one file with single-end reads or
       two files containing the left and right mates of paired-end reads, respectively. Use - to
       read single-end reads from stdin.

       (c) Copyright 2009-2014 by David Weese.

REQUIRED ARGUMENTS

       ARGUMENT 0 INPUT_FILE
              A reference  genome  file.  Valid  filetypes  are:  .sam[.*],  .raw[.*],  .gbk[.*],
              .frn[.*],  .fq[.*],  .fna[.*], .ffn[.*], .fastq[.*], .fasta[.*], .faa[.*], .fa[.*],
              .embl[.*], and .bam, where * is any of the following extensions: gz, bz2, and  bgzf
              for transparent (de)compression.
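       The suffix rule above (a base file type, optionally followed by a compression suffix) can
       be illustrated with a small Python sketch. It is not RazerS 3's own format detection: the
       function name classify_input is hypothetical, and it only checks the suffixes listed in
       this manual. Note that .bam is listed without the [.*] compression suffix.

```python
from pathlib import PurePath
from typing import Optional, Tuple

# Base file types accepted for the genome and read files, as listed in this manual.
SEQUENCE_SUFFIXES = {".sam", ".raw", ".gbk", ".frn", ".fq", ".fna", ".ffn",
                     ".fastq", ".fasta", ".faa", ".fa", ".embl"}
# Optional trailing suffixes for transparent (de)compression.
COMPRESSION_SUFFIXES = {".gz", ".bz2", ".bgzf"}


def classify_input(path: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split a file name into (file type, compression or None)."""
    suffixes = PurePath(path).suffixes  # e.g. "reads.fq.gz" -> [".fq", ".gz"]
    compression = None
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        compression = suffixes.pop()
    if not suffixes:
        raise ValueError(f"no file type suffix in {path!r}")
    file_type = suffixes[-1]
    if file_type == ".bam":
        # .bam is listed without a compression suffix.
        if compression is not None:
            raise ValueError(f"compressed .bam is not listed: {path!r}")
        return file_type, None
    if file_type in SEQUENCE_SUFFIXES:
        return file_type, compression
    raise ValueError(f"unsupported file type in {path!r}")


print(classify_input("reads.fq.gz"))  # ('.fq', '.gz')
```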

       READS List of INPUT_FILEs
              Either  one  (single-end)  or  two  (paired-end)  read  files. Valid filetypes are:
              .sam[.*], .raw[.*], .gbk[.*], .frn[.*], .fq[.*],  .fna[.*],  .ffn[.*],  .fastq[.*],
              .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any of the following
              extensions: gz, bz2, and bgzf for transparent (de)compression.

OPTIONS

       -h, --help
              Display the help message.

       --version
              Display version information.

   Main Options:
       -i, --percent-identity DOUBLE
              Percent identity threshold. In range [50..100]. Default: 95.
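       As a worked illustration, the -i threshold can be turned into an error budget per read.
       This sketch assumes the budget is (100 - i)% of the read length, rounded down; that
       rounding is an assumption, and max_errors is a hypothetical name, not part of RazerS 3.

```python
import math
from fractions import Fraction


def max_errors(read_length: int, percent_identity: float = 95) -> int:
    """Hypothetical helper: errors allowed for one read at a given -i value.

    Assumes the error budget is (100 - i)% of the read length, rounded down.
    Fraction(str(...)) avoids binary floating-point rounding (e.g. for 99.9).
    """
    if not 50 <= percent_identity <= 100:
        raise ValueError("-i must be in [50..100]")
    error_rate = (100 - Fraction(str(percent_identity))) / 100
    return math.floor(read_length * error_rate)


print(max_errors(100, 96))  # 4, i.e. the "4% error rate" of -i 96 in EXAMPLES
```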

       -rr, --recognition-rate DOUBLE
              Percent recognition rate (sensitivity). In range [80..100]. Default: 100.

       -ng, --no-gaps
              Allow only mismatches, no indels. Default: allow both.

       -f, --forward
              Map reads only to forward strands.

       -r, --reverse
              Map reads only to reverse strands.

       -m, --max-hits INTEGER
              Output at most <NUM> of the best hits. In range [1..inf]. Default: 100.

       --unique
              Output only unique best matches (-m 1 -dr 0 -pa).

       -tr, --trim-reads INTEGER
              Trim reads to the given length. In range [14..inf]. Default: off.

       -o, --output OUTPUT_FILE
              Mapping result filename (use - to dump to stdout in razers format). Default: <READS
              FILE>.razers.  Valid filetypes are: .sam, .razers, .gff, .fasta, .fa, .eland, .bam,
              and .afg.

       -v, --verbose
              Verbose mode.

       -vv, --vverbose
              Very verbose mode.

   Paired-end Options:
       -ll, --library-length INTEGER
              Paired-end library length. In range [1..inf]. Default: 220.

       -le, --library-error INTEGER
              Paired-end library length tolerance. In range [0..inf]. Default: 50.
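       The two paired-end options define a window of accepted outer distances, library length
       plus or minus the tolerance; with -ll 280 -le 80 this gives the 200-360 bp window used in
       EXAMPLES. The helper names below are hypothetical, and the sketch assumes both bounds are
       inclusive.

```python
from typing import Tuple


def outer_distance_window(library_length: int = 220,
                          library_error: int = 50) -> Tuple[int, int]:
    """Hypothetical helper: (min, max) outer distance for given -ll/-le values."""
    if library_length < 1 or library_error < 0:
        raise ValueError("-ll must be >= 1 and -le must be >= 0")
    return library_length - library_error, library_length + library_error


def pair_is_accepted(outer_distance: int, library_length: int = 220,
                     library_error: int = 50) -> bool:
    """Assumes the window bounds are inclusive."""
    low, high = outer_distance_window(library_length, library_error)
    return low <= outer_distance <= high


print(outer_distance_window())         # (170, 270) with the defaults
print(outer_distance_window(280, 80))  # (200, 360)
```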

   Output Format Options:
       -a, --alignment
              Dump the alignment for each match (razers and fasta formats only).

       -pa, --purge-ambiguous
              Purge reads with more than <max-hits> best matches.

       -dr, --distance-range INTEGER
              Only consider matches with at most NUM more errors compared to the  best.  Default:
              output all.
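       One plausible reading of how -m, -dr and -pa combine for a single read, and hence why
       --unique equals -m 1 -dr 0 -pa, is sketched below. This is a hypothetical model, not
       RazerS 3's implementation: the Match type and select_matches are invented for
       illustration, and treating the matches that survive -dr as the "best matches" counted by
       -pa is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    genome_pos: int
    errors: int


def select_matches(matches: List[Match], max_hits: int = 100,
                   distance_range: Optional[int] = None,
                   purge_ambiguous: bool = False) -> List[Match]:
    """Hypothetical model of -m, -dr and -pa for the matches of ONE read."""
    if not matches:
        return []
    best = min(m.errors for m in matches)
    if distance_range is not None:
        # -dr: drop matches with more than `distance_range` errors beyond the best.
        matches = [m for m in matches if m.errors <= best + distance_range]
    # Rank by error count; breaking ties by genome position is an assumption.
    ranked = sorted(matches, key=lambda m: (m.errors, m.genome_pos))
    if purge_ambiguous and len(ranked) > max_hits:
        # -pa: the read is ambiguous, so none of its matches are reported.
        return []
    # -m: keep at most `max_hits` of the best-ranked matches.
    return ranked[:max_hits]


# --unique behaves like max_hits=1, distance_range=0, purge_ambiguous=True.
unique = dict(max_hits=1, distance_range=0, purge_ambiguous=True)
print(select_matches([Match(100, 0), Match(900, 1)], **unique))  # [Match(genome_pos=100, errors=0)]
print(select_matches([Match(100, 0), Match(900, 0)], **unique))  # []
```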

       -gn, --genome-naming INTEGER
              Select  how genomes are named (see Naming section below). In range [0..1]. Default:
              0.

       -rn, --read-naming INTEGER
              Select how reads are named (see Naming section below). In range [0..3]. Default: 0.

       --full-readid
              Use the whole read id (don't clip after whitespace).

       -so, --sort-order INTEGER
              Select how matches are  sorted  (see  Sorting  section  below).  In  range  [0..1].
              Default: 0.

       -pf, --position-format INTEGER
              Select  begin/end  position  numbering  (see  Coordinate  section  below). In range
              [0..1]. Default: 0.

       -ds, --dont-shrink-alignments
              Disable alignment shrinking in SAM.  This is required for generating a gold mapping
              for Rabema.

   Filtration Options:
       -fl, --filter STRING
              Select k-mer filter. One of pigeonhole or swift. Default: pigeonhole.

       -mr, --mutation-rate DOUBLE
              Set the percent mutation rate (pigeonhole). In range [0..20]. Default: 5.

       -ol, --overlap-length INTEGER
              Manually set the overlap length of adjacent k-mers (pigeonhole). In range [0..inf].

       -pd, --param-dir STRING
              Read user-computed parameter files in the directory <DIR> (swift).

       -t, --threshold INTEGER
              Manually set minimum k-mer count threshold (swift). In range [1..inf].

       -tl, --taboo-length INTEGER
              Set taboo length (swift). In range [1..inf]. Default: 1.

       -s, --shape STRING
              Manually set k-mer shape.

       -oc, --overabundance-cut INTEGER
              Set k-mer overabundance cut ratio. In range [0..1]. Default: 1.

       -rl, --repeat-length INTEGER
              Skip simple repeats of length <NUM>. In range [1..inf]. Default: 1000.

       -lf, --load-factor DOUBLE
              Set  the  load  factor  for  the  open  addressing  k-mer index. In range [1..inf].
              Default: 1.6.

   Verification Options:
       -mN, --match-N
              N matches all other characters. Default: N matches nothing.

       -ed, --error-distr STRING
              Write error distribution to FILE.

       -mf, --mismatch-file STRING
              Write mismatch patterns to FILE.

   Misc Options:
       -cm, --compact-mult DOUBLE
              Multiply compaction threshold by this value after reaching and compacting. In range
              [0..inf]. Default: 2.2.

       -ncf, --no-compact-frac DOUBLE
              Don't compact if within this last fraction of the genome. In range [0..1]. Default: 0.05.

   Parallelism Options:
       -tc, --thread-count INTEGER
              Set  the  number of threads to use (0 to force sequential mode). In range [0..inf].
              Default: 1.

       -pws, --parallel-window-size INTEGER
              Collect candidates in windows of this length. In range [1..inf]. Default: 500000.

       -pvs, --parallel-verification-size INTEGER
              Verify candidates in packages of this size. In range [1..inf]. Default: 100.

       -pvmpc, --parallel-verification-max-package-count INTEGER
              Largest number of packages to  create  for  verification  per  thread-1.  In  range
              [1..inf]. Default: 100.

       -amms, --available-matches-memory-size INTEGER
              Bytes of main memory available for storing matches. In range [-1..inf]. Default: 0.

       -mhst, --match-histo-start-threshold INTEGER
              When to start histogram. In range [1..inf]. Default: 5.

FORMATS, NAMING, SORTING, AND COORDINATE SCHEMES

       RazerS 3 supports various output formats. The output format is detected automatically from
       the file name suffix.

       .razers
              Razer format

       .fa, .fasta
              Enhanced Fasta format

       .eland Eland format

       .gff   GFF format

       .sam   SAM format

       .bam   BAM format

       .afg   Amos AFG format

              By default, reads and contigs are referred to by their Fasta ids given in the input
              files. This can be changed with the -gn option (values 0-1) and the -rn option
              (values 0-3):

       0      Use Fasta id.

       1      Enumerate beginning with 1.

       2      Use the read sequence (only for short reads!).

       3      Use the Fasta id, do NOT append /L or /R for mate pairs.

              The  way  matches  are sorted in the output file can be changed with the -so option
              for the following formats: razers, fasta, sam, and afg. Primary and secondary  sort
              keys are:

       0      1. read number, 2. genome position

       1      1. genome position, 2. read number
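       The two -so orders correspond to lexicographic sort keys. In this Python sketch, "genome
       position" is modeled as a (contig index, offset) pair so that matches on different contigs
       order sensibly; that modeling, the Hit tuple, and sort_hits are assumptions for
       illustration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    read_no: int
    contig: int
    offset: int


def sort_hits(hits: List[Hit], sort_order: int = 0) -> List[Hit]:
    """Hypothetical helper mirroring -so 0 and -so 1."""
    if sort_order == 0:
        # 1. read number, 2. genome position
        key = lambda h: (h.read_no, h.contig, h.offset)
    elif sort_order == 1:
        # 1. genome position, 2. read number
        key = lambda h: (h.contig, h.offset, h.read_no)
    else:
        raise ValueError("-so must be 0 or 1")
    return sorted(hits, key=key)


hits = [Hit(2, 0, 50), Hit(1, 0, 900), Hit(1, 0, 10), Hit(3, 0, 10)]
print(sort_hits(hits, 0))  # read 1 (offsets 10, 900), then read 2, then read 3
print(sort_hits(hits, 1))  # offset 10 (reads 1, 3), then 50, then 900
```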

              The  coordinate  space used for begin and end positions can be changed with the -pf
              option for the razers and fasta formats:

       0      Gap space. Gaps between characters are counted from 0.

       1      Position space. Characters are counted from 1.
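       Gap space numbers the boundaries between characters from 0, so a begin/end pair in gap
       space is half-open; position space numbers the characters themselves from 1. Assuming
       both describe the same forward interval (begin before end) and that the end is inclusive
       in position space, converting between them only shifts the begin coordinate. The
       function names are hypothetical.

```python
from typing import Tuple


def gap_to_position(begin: int, end: int) -> Tuple[int, int]:
    """Gap space (-pf 0) -> position space (-pf 1), for a non-empty interval."""
    if not 0 <= begin < end:
        raise ValueError("expected 0 <= begin < end")
    # Gap `begin` precedes the first covered character, which is character begin + 1.
    # Gap `end` follows the last covered character, which is character `end`.
    return begin + 1, end


def position_to_gap(begin: int, end: int) -> Tuple[int, int]:
    """Position space (-pf 1) -> gap space (-pf 0), for a non-empty interval."""
    if not 1 <= begin <= end:
        raise ValueError("expected 1 <= begin <= end")
    return begin - 1, end


# The first four characters of a contig: gap space (0, 4), position space (1, 4).
print(gap_to_position(0, 4))  # (1, 4)
```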

EXAMPLES

       razers3 -i 96 -tc 12 -o mapped.razers hg18.fa reads.fq
              Map single-end reads with 4% error rate using 12 threads.

       razers3 -i 95 --no-gaps -o mapped.razers hg18.fa reads.fq.gz
              Map single-end gzipped reads with 5% error rate and no indels.

       razers3 -i 94 -rr 95 -tc 12 -ll 280 -le 80 -o mapped.razers hg18.fa reads_1.fq reads_2.fq
              Map paired-end reads with up to 6% errors, 95% sensitivity, 12 threads, and only
              output aligned pairs with an outer distance of 200-360 bp.