Ubuntu Manpage: gsnap - Genomic Short-read Nucleotide Alignment Program

Provided by: gmap_2012-06-12-1ubuntu1_amd64

NAME

       gsnap - Genomic Short-read Nucleotide Alignment Program

SYNOPSIS

       gsnap -dDB [OPTION]... [QUERY]...

DESCRIPTION

       Align the sequences QUERY to the reference DB.  With no QUERY, read standard input.
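
       As an illustration of the synopsis, the following Python sketch assembles a gsnap command line
       using only flags documented on this page (-D, -d, -t, -A).  The helper name build_gsnap_command,
       the genome name, the directory, and the read file names are hypothetical placeholders, and the
       sketch only prints the command rather than running it.

```python
import shlex


def build_gsnap_command(db, queries, genome_dir=None, nthreads=None, sam=False):
    """Assemble a gsnap argument list (hypothetical helper, not part of GSNAP)."""
    cmd = ["gsnap"]
    if genome_dir is not None:
        cmd += ["-D", genome_dir]      # genome directory
    cmd += ["-d", db]                  # genome database
    if nthreads is not None:
        cmd += ["-t", str(nthreads)]   # number of worker threads
    if sam:
        cmd += ["-A", "sam"]           # SAM output format
    cmd += list(queries)               # QUERY files; with none, gsnap reads standard input
    return cmd


# All names below are placeholders chosen for illustration.
cmd = build_gsnap_command(
    "hg19",
    ["reads_1.fastq", "reads_2.fastq"],
    genome_dir="/data/genomes",
    nthreads=4,
    sam=True,
)
print(shlex.join(cmd))
```

       The resulting list can be passed to subprocess.run once the placeholders are replaced with real
       paths.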

OPTIONS

   Input options
       -D, --dir=directory
              Genome directory

       -d, --db=STRING
              Genome database

       -k, --kmer=INT
              k-mer size to use in genome database (allowed values: 16 or less).  If not specified, the program
              will find the highest available k-mer size in the genome database.

       --basesize=INT
              Base size to use in genome database.   If  not  specified,  the  program  will  find  the  highest
              available base size in the genome database within selected k-mer size

       --sampling=INT
              Sampling  to  use  in  genome  database.   If  not  specified,  the program will find the smallest
              available sampling value in the genome database within selected basesize and k-mer size

       -q, --part=INT/INT
              Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for distributing
              jobs to a computer farm).
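
              A minimal Python sketch of how an i/n specification could partition the input.  It assumes
              that sequences are counted from zero and that a sequence belongs to part i when its index
              modulo n equals i; the examples 0/100 and 99/100 suggest this, but the exact assignment used
              by the program is not stated here.  The function names are hypothetical.

```python
def parse_part(spec):
    """Parse an 'i/n' string into integers, requiring 0 <= i < n."""
    i_text, n_text = spec.split("/")
    i, n = int(i_text), int(n_text)
    if not 0 <= i < n:
        raise ValueError("expected i/n with 0 <= i < n, got %r" % spec)
    return i, n


def select_part(sequences, spec):
    """Keep the sequences whose zero-based index is congruent to i modulo n (assumed rule)."""
    i, n = parse_part(spec)
    return [seq for index, seq in enumerate(sequences) if index % n == i]


reads = ["read%d" % k for k in range(10)]
print(select_part(reads, "1/3"))
```

              Running one job for each i from 0 to n-1 then covers every sequence exactly once.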

       --input-buffer=INT
              Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)

       --barcode-length=INT
              Amount of barcode to remove from start of read (default 0)

       -o, --orientation=STRING
              Orientation of paired-end reads.  Allowed values: FR (fwd-rev, or typical Illumina; default), RF
              (rev-fwd, for circularized inserts), or FF (fwd-fwd, same strand).

       --fastq-id-start=INT
              Starting position of identifier in FASTQ header, space-delimited (>= 1)

       --fastq-id-end=INT
              Ending position of identifier in FASTQ header, space-delimited (>= 1)
               Examples:
               @HWUSI-EAS100R:6:73:941:1973#0/1
                start=1, end=1 (default)
                 => identifier is HWUSI-EAS100R:6:73:941:1973#0
               @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
                start=1, end=1
                 => identifier is SRR001666.1
                start=2, end=2
                 => identifier is 071112_SLXA-EAS1_s_7:5:1:817:345
                start=1, end=2
                 => identifier is SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
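
              The examples above can be reproduced with a short Python sketch.  The function name is
              hypothetical, and dropping a trailing /1 or /2 mate suffix is an assumption inferred from
              the first example rather than a documented rule.

```python
def fastq_identifier(header, start=1, end=1):
    """Return space-delimited fields start..end (1-based, inclusive) of a FASTQ header."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    text = header[1:] if header.startswith("@") else header
    fields = text.split(" ")
    identifier = " ".join(fields[start - 1:end])
    # Assumption: a trailing /1 or /2 mate suffix is dropped, as in the first example above.
    if identifier.endswith(("/1", "/2")):
        identifier = identifier[:-2]
    return identifier


header = "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"
for start, end in [(1, 1), (2, 2), (1, 2)]:
    print(start, end, fastq_identifier(header, start, end))
```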

       --filter-chastity=STRING
              Skip reads flagged by the Illumina chastity filter.  Expects a string after the accession
              having a 'Y' after the first colon, like this:
               @accession 1:Y:0:CTTGTA
              where the 'Y' signifies filtering by chastity.  Values: off (default), either, both.  For
              'either', a paired-end read is skipped if either end has a 'Y'.  For 'both', a 'Y' is required
              on both ends of a paired-end read (or on the only end of a single-end read).
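
              A hedged Python sketch of the decision described above.  The function names are
              hypothetical, and the header parsing assumes exactly the layout shown in the example (an
              accession, whitespace, then colon-separated fields with the flag in second position).

```python
def has_chastity_flag(header):
    """True if the string after the accession has 'Y' right after its first colon."""
    parts = header.split(None, 1)
    if len(parts) < 2:
        return False
    fields = parts[1].split(":")
    return len(fields) > 1 and fields[1] == "Y"


def skip_read(headers, mode="off"):
    """Decide whether to skip a read given the headers of its one or two ends."""
    flags = [has_chastity_flag(h) for h in headers]
    if mode == "off":
        return False
    if mode == "either":
        return any(flags)
    if mode == "both":
        return all(flags)
    raise ValueError("mode must be off, either, or both")


pair = ["@accession 1:Y:0:CTTGTA", "@accession 2:N:0:CTTGTA"]
print(skip_read(pair, "either"), skip_read(pair, "both"))
```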

   Computation options
       Note:   GSNAP   has   an   ultrafast   algorithm   for   calculating   mismatches  up  to  and  including
       ((readlength+2)/kmer - 2) ("ultrafast mismatches"). The program will run fastest if max-mismatches  (plus
       suboptimal-levels)  is  within  that value.  Also, indels, especially end indels, take longer to compute,
       although the algorithm is still designed to be fast.
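
       The ultrafast level can be worked out ahead of time with a small Python sketch.  Integer division
       is assumed for the "/" in the formula, and the function names are hypothetical.

```python
def ultrafast_mismatches(readlength, kmer):
    """Evaluate ((readlength+2)/kmer - 2), assuming integer division."""
    return (readlength + 2) // kmer - 2


def within_ultrafast(max_mismatches, suboptimal_levels, readlength, kmer):
    """True if max-mismatches plus suboptimal-levels stays within the ultrafast level."""
    return max_mismatches + suboptimal_levels <= ultrafast_mismatches(readlength, kmer)


for readlength in (36, 75, 100):
    print(readlength, ultrafast_mismatches(readlength, kmer=12))
```

       For very short reads the formula can yield zero or a negative value, in which case no mismatch
       setting stays within the ultrafast level under this reading.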

       -B, --batch=INT
               Mode     Offsets       Positions       Genome
                 0      allocate      mmap            mmap
                 1      allocate      mmap & preload  mmap
                 2      allocate      mmap & preload  mmap & preload (default)
                 3      allocate      allocate        mmap & preload
                 4      allocate      allocate        allocate
                 5      expand        allocate        allocate

              Note: For a single sequence, all data structures use mmap.  If mmap is not available and allocate
              is not chosen, then the program will use fileio (very slow).

       -m, --max-mismatches=FLOAT
              Maximum number of mismatches allowed (if not specified, then defaults to the ultrafast level of
              ((readlength+2)/kmer - 2)).  If specified between 0.0 and 1.0, then treated as a fraction of each
              read length.  Otherwise, treated as an integral number of mismatches (including indel and splicing
              penalties).  For RNA-Seq, you may need to increase this value slightly to align reads extending
              past the ends of an exon.
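
              A Python sketch of how a --max-mismatches value might translate into a per-read limit.
              Treating only values strictly between 0.0 and 1.0 as fractions, and rounding the product to
              the nearest integer, are both assumptions; the program may handle the boundaries and
              rounding differently.  The function name is hypothetical.

```python
def mismatch_limit(value, readlength):
    """Interpret a --max-mismatches value for a read of the given length."""
    if 0.0 < value < 1.0:
        # Fractional form: a share of the read length (rounding is an assumption).
        return round(value * readlength)
    # Otherwise an integral number of mismatches.
    return int(value)


for value in (0.05, 0.5, 3):
    print(value, mismatch_limit(value, readlength=100))
```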

       --query-unk-mismatch=INT
              Whether to count unknown (N) characters in the query as a mismatch (0=no (default), 1=yes)

       --genome-unk-mismatch=INT
              Whether to count unknown (N) characters in the genome as a mismatch (0=no, 1=yes (default))

       --terminal-threshold=INT
              Threshold  for  searching  for a terminal alignment (from one end of the read to the best possible
              position at the other end) (default 2).  For example, if this value is 2, then if GSNAP  finds  an
              exact  or  1-mismatch  alignment,  it  will  not try to find a terminal alignment.  Note that this
              default value may not be low enough if you want to  obtain  terminal  alignments  for  very  short
              reads,  although such reads probably don't have enough specificity for terminal alignments anyway.
              To turn off terminal alignments, set this to a high value, greater than the value for
              --max-mismatches.

       -i, --indel-penalty=INT
              Penalty for an indel (default 2).  Counts against mismatches allowed.  To find indels, make
              indel-penalty less than or equal to max-mismatches.  A value < 2 can lead to false positives at
              read ends.

       --indel-endlength=INT
              Minimum length at end required for indel alignments (default 4)

       -y, --max-middle-insertions=INT
              Maximum number of middle insertions allowed (default 9)

       -z, --max-middle-deletions=INT
              Maximum number of middle deletions allowed (default 30)

       -Y, --max-end-insertions=INT
              Maximum number of end insertions allowed (default 3)

       -Z, --max-end-deletions=INT
              Maximum number of end deletions allowed (default 6)

       -M, --suboptimal-levels=INT
              Report suboptimal hits beyond best hit (default 0).  All hits scoring within suboptimal-levels
              of the best score are reported.
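
              A sketch of the reporting rule in Python, assuming that a hit's score is a penalty count
              (mismatches plus indel and splice penalties) where lower is better.  The function name and
              the example scores are hypothetical.

```python
def hits_to_report(hit_scores, suboptimal_levels=0):
    """Keep hits whose score is at most the best score plus suboptimal_levels."""
    if not hit_scores:
        return []
    best = min(hit_scores)  # assumption: lower score means a better hit
    return [score for score in hit_scores if score <= best + suboptimal_levels]


scores = [1, 2, 3, 1]
print(hits_to_report(scores))
print(hits_to_report(scores, suboptimal_levels=1))
```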

       -a, --adapter-strip=STRING
              Method  for  removing  adapters  from  reads.  Currently  allowed values: off, paired.  Default is
              "paired", which removes adapters from paired-end reads if a concordant or paired alignment  cannot
              be found from the original read.  To turn off, use the value "off".

       --trim-mismatch-score=INT
              Score  to  use  for mismatches when trimming at ends (default is -3; to turn off trimming, specify
              0). Warning: turning trimming off will give false positive mismatches at the ends of reads

       --trim-indel-score=INT
              Score to use for indels when trimming at ends (default is -4; to turn off  trimming,  specify  0).
              Warning: turning trimming off will give false positive indels at the ends of reads

       -V, --snpsdir=STRING
              Directory for SNPs index files (created using snpindex) (default is location of genome index files
              specified using -D and -d)

       -v, --use-snps=STRING
              Use  database  containing  known  SNPs  (in  <STRING>.iit,  built  previously  using snpindex) for
              tolerance to SNPs

       --cmetdir=STRING
              Directory for methylcytosine index files (created using cmetindex) (default is location of genome
              index files specified using -D, -V, and -d)

       --atoidir=STRING
              Directory  for  A-to-I  RNA  editing index files (created using atoiindex) (default is location of
              genome index files specified using -D, -V, and -d)

       --mode=STRING
              Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded, or
              atoi-nonstranded.  Non-standard modes require you to have previously run the cmetindex or
              atoiindex programs on the genome.

       --tallydir=STRING
              Directory for tally IIT file to resolve concordant multiple results (default is location of genome
              index files specified using -D and -d).  Note:  can  just  give  full  path  name  to  --use-tally
              instead.

       --use-tally=STRING
              Use this tally IIT file to resolve concordant multiple results

       --runlengthdir=STRING
              Directory  for  runlength  IIT file to resolve concordant multiple results (default is location of
              genome index files  specified  using  -D  and  -d).   Note:  can  just  give  full  path  name  to
              --use-runlength instead.

       --use-runlength=STRING
              Use this runlength IIT file to resolve concordant multiple results

       -t, --nthreads=INT
              Number of worker threads

   Options for GMAP alignment within GSNAP
       --gmap-mode=STRING
              Cases  to  use GMAP for complex alignments containing multiple splices or indels.  Allowed values:
              none, pairsearch, indel_knownsplice, terminal, improve (or multiple values, separated by  commas).
              Default: all on, i.e., pairsearch,indel_knownsplice,terminal,improve

       --trigger-score-for-gmap=INT
              Try GMAP pairsearch on nearby genomic regions if best score (the total of both ends if paired-end)
              exceeds this value (default 5)

       --max-gmap-pairsearch=INT
              Perform GMAP pairsearch on nearby genomic regions up to this many candidate ends (default 10).
              Requires pairsearch in --gmap-mode

       --max-gmap-terminal=INT
              Perform GMAP terminal on nearby genomic regions up  to  this  many  candidate  ends  (default  5).
              Requires terminal in --gmap-mode

       --max-gmap-improvement=INT
              Perform  GMAP  improvement  on  nearby genomic regions up to this many candidate ends (default 5).
              Requires improve in --gmap-mode

       --microexon-spliceprob=FLOAT
              Allow microexons only if one of the splice site probabilities is greater than this value  (default
              0.90)

   Splicing options for RNA-Seq
       -N, --novelsplicing=INT
              Look for novel splicing (0=no (default), 1=yes)

       --splicingdir=STRING
              Directory for splicing involving known sites or known introns, as specified by the -s or
              --use-splicing flag (default is directory computed from -D and -d flags).  Note: can just give
              full pathname to the -s flag instead.

       -s, --use-splicing=STRING
              Look  for  splicing  involving  known  sites  or known introns (in <STRING>.iit), at short or long
              distances.  See README instructions for the distinction between known sites and known introns

       --ambig-splice-noclip
              For ambiguous known splicing at ends of the read, do not clip  at  the  splice  site,  but  extend
              instead  into  the  intron. This flag makes sense only if you provide the --use-splicing flag, and
              you are trying to eliminate all soft clipping with --trim-mismatch-score=0

       -w, --localsplicedist=INT
              Definition of local novel splicing event (default 200000)

       -e, --local-splice-penalty=INT
              Penalty for a local splice (default 0). Counts against mismatches allowed

       -E, --distant-splice-penalty=INT
              Penalty for a distant splice (default 1).  A distant splice is one where the intron length exceeds
              the value of -w, or --localsplicedist, or is an inversion, scramble, or translocation between two
              different chromosomes.  Counts against mismatches allowed.

       -K, --distant-splice-endlength=INT
              Minimum length at end required for distant spliced alignments (default  16,  min  allowed  is  the
              value of -k, or kmer size)

       -l, --shortend-splice-endlength=INT
              Minimum length at end required for short-end spliced alignments (default 2).  However, unless
              known splice sites are provided with the -s flag, GSNAP may still need the end length to be the
              value of -k, or kmer size, to find a given splice.

       --distant-splice-identity=FLOAT
              Minimum identity at end required for distant spliced alignments (default 0.95)

       --antistranded-penalty=INT
              (Not currently  implemented)  Penalty  for  antistranded  splicing  when  using  stranded  RNA-Seq
              protocols.  A  positive  value,  such  as  1, expects antisense on the first read and sense on the
              second read. Default is 0, which treats sense and antisense equally well

       --merge-distant-samechr
              Report distant splices on the same chromosome as a single splice, if  possible.   Will  produce  a
              single  SAM  line instead of two SAM lines, which is also done for translocations, inversions, and
              scramble events

   Options for paired-end reads
       --pairmax-dna=INT
              Max total genomic length for DNA-Seq paired reads, or other reads without splicing (default 1000).
              Used if neither -N nor -s is specified.

       --pairmax-rna=INT
              Max total genomic length for RNA-Seq paired reads,  or  other  reads  that  could  have  a  splice
              (default  200000).  Used  if  -N  or  -s  is  specified.   Should probably match the value for -w,
              --localsplicedist.
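
              The choice between the two limits can be summarized in a Python sketch.  The function and
              parameter names are hypothetical; the defaults are the ones listed above.

```python
def effective_pairmax(novelsplicing=False, use_splicing=None,
                      pairmax_dna=1000, pairmax_rna=200000):
    """Pick the paired-end length limit based on whether splicing is enabled."""
    if novelsplicing or use_splicing is not None:
        return pairmax_rna   # -N or -s given: reads could contain a splice
    return pairmax_dna       # neither given: reads treated as unspliced


print(effective_pairmax())
print(effective_pairmax(novelsplicing=True))
```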

       --pairexpect=INT
              Expected paired-end length, used for calling splices in medial part of paired-end  reads  (default
              200)

       --pairdev=INT
              Allowable  deviation  from  expected paired-end length, used for calling splices in medial part of
              paired-end reads (default 25)

   Options for quality scores
       --quality-protocol=STRING
              Protocol for input quality scores. Allowed values:

               illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
               sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)

              Default is sanger (no quality print shift).  SAM output files should have quality scores in sanger
              protocol.

              Or you can customize this behavior with these flags:

       -J, --quality-zero-score=INT
              FASTQ  quality  scores  are  zero  at  this  ASCII  value  (default is 33 for sanger protocol; for
              Illumina, select 64)

       -j, --quality-print-shift=INT
              Shift FASTQ quality scores by this amount in output (default is 0 for sanger protocol;  to  change
              Illumina input to Sanger output, select -31)
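
              The relationship between the zero score and the print shift can be checked with a short
              Python sketch.  The function names and the sample quality string are hypothetical; the
              offsets 64 and 33 and the shift of -31 come from the descriptions above.

```python
def quality_scores(quality, zero_score=33):
    """Decode a FASTQ quality string into scores, given the ASCII value that means zero."""
    return [ord(ch) - zero_score for ch in quality]


def shift_quality(quality, print_shift=0):
    """Shift every quality character by print_shift ASCII positions."""
    return "".join(chr(ord(ch) + print_shift) for ch in quality)


illumina = "hhhB"                       # made-up Illumina-encoded qualities
sanger = shift_quality(illumina, -31)   # same effect as -J 64 -j -31
print(sanger)
print(quality_scores(illumina, 64) == quality_scores(sanger, 33))
```

              Because 64 - 33 = 31, shifting Illumina-encoded characters by -31 leaves the decoded scores
              unchanged when they are read back with a zero score of 33.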

   Output options
       -n, --npaths=INT
              Maximum number of paths to print (default 100).

       -Q, --quiet-if-excessive
              If more than maximum number of paths are found, then nothing is printed.

       -O, --ordered
              Print output in same order as input (relevant only if there is more than one worker thread)

       --show-refdiff
              For GSNAP output in SNP-tolerant alignment, shows all differences relative to the reference genome
              as  lower  case  (otherwise, it shows all differences relative to both the reference and alternate
              genome)

       --clip-overlap
              For paired-end reads whose alignments overlap, clip the overlapping region.

       --print-snps
              Print detailed information about SNPs in reads  (works  only  if  -v  also  selected)  (not  fully
              implemented yet)

       --failsonly
              Print only failed alignments, those with no results

       --nofails
              Exclude printing of failed alignments

       --fails-as-input
              Print completely failed alignments as input FASTA or FASTQ format

       -A, --format=STRING
              Another format type, other than default.  Currently implemented: sam.  Also allowed, but not
              installed at compile-time: goby (to install, need to re-compile with appropriate options).

       --output-buffer-size=INT
              Buffer size, in queries, for output thread (default 1000).  When  the  number  of  results  to  be
              printed exceeds this size, the worker threads are halted until the backlog is cleared

   Options for SAM output
       --no-sam-headers
              Do not print headers beginning with '@'

       --sam-headers-batch=INT
              Print headers only for this batch, as specified by -q

       --sam-use-0M
              Insert 0M in CIGAR between adjacent insertions and deletions.  Required by Picard, but can cause
              errors in other tools.

       --sam-multiple-primaries
              Allows multiple alignments to be marked as primary if they have equally good mapping scores

       --read-group-id=STRING
              Value to put into read-group id (RG-ID) field

       --read-group-name=STRING
              Value to put into read-group name (RG-SM) field

       --read-group-library=STRING
              Value to put into read-group library (RG-LB) field

       --read-group-platform=STRING
              Value to put into read-group platform (RG-PL) field

   Help options
       --version
              Show version

       --help Show this help message

ENVIRONMENT

       GMAPDB genome directory (equivalent to -D)

FILES

       ~/.gmaprc
              configuration file

AUTHOR

       Thomas D. Wu and Colin K. Watanabe

REPORTING BUGS

       Report bugs to Thomas Wu <twu@gene.com>.

COPYRIGHT

       Copyright 2005 Genentech, Inc. All rights reserved.
