Provided by: gmap_2012-06-12-1ubuntu1_amd64

NAME

       gsnap - Genomic Short-read Nucleotide Alignment Program

SYNOPSIS

       gsnap -dDB [OPTION]... [QUERY]...

DESCRIPTION

       Align the sequences in each QUERY file to the reference genome DB.  With no QUERY, read standard input.

OPTIONS

   Input options
       -D, --dir=directory
              Genome directory

       -d, --db=STRING
              Genome database

       -k, --kmer=INT
              kmer  size  to  use  in  genome  database  (allowed  values:  16  or less).  If not
              specified, the program will find the highest available  kmer  size  in  the  genome
              database

       --basesize=INT
              Base  size  to use in genome database.  If not specified, the program will find the
              highest available base size in the genome database within selected k-mer size

       --sampling=INT
              Sampling to use in genome database.  If not specified, the program  will  find  the
              smallest  available  sampling value in the genome database within selected basesize
              and k-mer size

       -q, --part=INT/INT
              Process only the i-th out of every n sequences e.g., 0/100 or  99/100  (useful  for
              distributing jobs to a computer farm).
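
              A minimal Python sketch of the partitioning that -q describes.  It assumes reads are
              counted from 0 in input order and that read k belongs to part k % n.  The helper
              in_part is hypothetical and only illustrates the rule; it is not taken from GSNAP.

```python
def in_part(read_index: int, part: str) -> bool:
    """Return True if the 0-based read_index falls in part "i/n" (as for -q i/n).

    Assumption: read k is assigned to part k % n, so "0/100" covers reads
    0, 100, 200, ... and "99/100" covers reads 99, 199, 299, ...
    """
    i_str, n_str = part.split("/")
    i, n = int(i_str), int(n_str)
    if not 0 <= i < n:
        raise ValueError("expected 0 <= i < n, e.g. 0/100 or 99/100")
    return read_index % n == i


# Reads among the first 10 that a job started with -q 1/4 would handle
print([k for k in range(10) if in_part(k, "1/4")])  # [1, 5, 9]
```

              Under this assumption, running one job for each i from 0 to n-1 covers every read
              exactly once.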

       --input-buffer=INT
              Size  of  input buffer (program reads this many sequences at a time for efficiency)
              (default 1000)

       --barcode-length=INT
              Amount of barcode to remove from start of read (default 0)

       -o, --orientation=STRING
              Orientation of paired-end reads.  Allowed values: FR (fwd-rev, or typical Illumina;
              default), RF (rev-fwd, for circularized inserts), or FF (fwd-fwd, same strand).

       --fastq-id-start=INT
              Starting position of identifier in FASTQ header, space-delimited (>= 1)

       --fastq-id-end=INT
              Ending position of identifier in FASTQ header, space-delimited (>= 1)
               Examples:
               @HWUSI-EAS100R:6:73:941:1973#0/1
                start=1, end=1 (default)
                 => identifier is HWUSI-EAS100R:6:73:941:1973#0
               @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
                start=1, end=1
                 => identifier is SRR001666.1
                start=2, end=2
                 => identifier is 071112_SLXA-EAS1_s_7:5:1:817:345
                start=1, end=2
                 => identifier is SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
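
              The field selection in these examples can be sketched in Python as follows.  This is
              an illustration, not GSNAP's parser: fastq_identifier is a hypothetical helper, and
              stripping a trailing "/1" or "/2" mate suffix is an assumption generalized from the
              first example.

```python
def fastq_identifier(header: str, start: int = 1, end: int = 1) -> str:
    """Return the identifier from a FASTQ header line.

    start and end are 1-based, inclusive indexes of space-delimited fields,
    mirroring --fastq-id-start and --fastq-id-end.
    """
    if not 1 <= start <= end:
        raise ValueError("need 1 <= start <= end")
    text = header[1:] if header.startswith("@") else header
    ident = " ".join(text.split()[start - 1:end])
    # Assumed rule: drop a trailing mate suffix such as "/1" or "/2".
    if ident.endswith(("/1", "/2")):
        ident = ident[:-2]
    return ident


h = "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"
print(fastq_identifier(h, 2, 2))  # 071112_SLXA-EAS1_s_7:5:1:817:345
```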

       --filter-chastity=STRING
              Skips reads marked by the Illumina chastity program.  Expects a string after the
              accession with a 'Y' after the first colon, like this:
               @accession 1:Y:0:CTTGTA
              where the 'Y' signifies filtering by chastity.  Values: off (default), either,
              both.  For 'either', a paired-end read is filtered if either end has a 'Y'.  For
              'both', a 'Y' is required on both ends of a paired-end read (or on the only end of
              a single-end read).
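
              A Python sketch of the --filter-chastity decision.  It assumes the flag is read from
              the second space-delimited field of the FASTQ header, as in the example above;
              chastity_flag and skip_read are hypothetical names used only for illustration.

```python
from typing import Optional


def chastity_flag(header: str) -> bool:
    """True if the field after the accession has 'Y' after its first colon."""
    fields = header.split()
    if len(fields) < 2:
        return False
    parts = fields[1].split(":")
    return len(parts) > 1 and parts[1] == "Y"


def skip_read(mode: str, end1: str, end2: Optional[str] = None) -> bool:
    """Apply --filter-chastity=mode to a single-end (end2 is None) or paired-end read."""
    if mode == "off":
        return False
    flags = [chastity_flag(end1)]
    if end2 is not None:
        flags.append(chastity_flag(end2))
    if mode == "either":
        return any(flags)
    if mode == "both":
        return all(flags)
    raise ValueError(f"unknown --filter-chastity value: {mode}")


y = "@accession 1:Y:0:CTTGTA"
n = "@accession 2:N:0:CTTGTA"
print(skip_read("either", y, n), skip_read("both", y, n))  # True False
```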

   Computation options
       Note: GSNAP has an ultrafast algorithm for calculating mismatches up to and including
       ((readlength+2)/kmer - 2) ("ultrafast mismatches").  The program will run fastest if
       max-mismatches (plus suboptimal-levels) does not exceed that value.  Indels, especially
       end indels, take longer to compute, although the algorithm is still designed to be fast.
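
       The ultrafast level is easy to compute by hand.  The Python sketch below assumes the
       division is integer (truncating) division, as it would be in C; that rounding is an
       assumption, and the function names are hypothetical.

```python
def ultrafast_mismatches(readlength: int, kmer: int) -> int:
    """((readlength+2)/kmer - 2), with integer division assumed."""
    return (readlength + 2) // kmer - 2


def within_ultrafast(max_mismatches: int, suboptimal_levels: int,
                     readlength: int, kmer: int) -> bool:
    """True if max-mismatches plus suboptimal-levels stays within the ultrafast level."""
    return max_mismatches + suboptimal_levels <= ultrafast_mismatches(readlength, kmer)


print(ultrafast_mismatches(100, 15))  # 4, since 102 // 15 = 6
print(ultrafast_mismatches(36, 15))   # 0, since 38 // 15 = 2
```

       For 100-bp reads against a 15-mer index, settings such as -m 3 -M 1 would therefore
       stay within the ultrafast range under this assumption.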

       -B, --batch=INT
               Mode     Offsets       Positions       Genome
                 0      allocate      mmap            mmap
                 1      allocate      mmap & preload  mmap
                 2      allocate      mmap & preload  mmap & preload (default)
                 3      allocate      allocate        mmap & preload
                 4      allocate      allocate        allocate
                 5      expand        allocate        allocate

              Note: For a single sequence, all data structures use mmap.  If mmap is not available
              and allocate is not chosen, fileio is used instead (very slow).

       -m, --max-mismatches=FLOAT
              Maximum number of mismatches allowed (if not specified, defaults to the ultrafast
              level of ((readlength+2)/kmer - 2)).  If specified between 0.0 and 1.0, the value is
              treated as a fraction of each read length.  Otherwise, it is treated as an integral
              number of mismatches (including indel and splicing penalties).  For RNA-Seq, you may
              need to increase this value slightly to align reads extending past the ends of an
              exon.
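
              A Python sketch of how a -m value might be resolved for one read.  The boundaries
              (values strictly between 0.0 and 1.0 are fractions) and the round-half-up conversion
              of a fraction to a count are assumptions, not documented behavior;
              resolve_max_mismatches is a hypothetical name.

```python
import math
from typing import Optional


def resolve_max_mismatches(m: Optional[float], readlength: int, kmer: int) -> int:
    """Turn a -m value (or None if -m was not given) into a mismatch count."""
    if m is None:
        # Default: the ultrafast level, with integer division assumed.
        return (readlength + 2) // kmer - 2
    if 0.0 < m < 1.0:
        # Fraction of the read length; round-half-up is an assumption.
        return math.floor(m * readlength + 0.5)
    # Otherwise an integral count (indel and splice penalties count against it).
    return int(m)


print(resolve_max_mismatches(None, 100, 15))  # 4
print(resolve_max_mismatches(0.04, 100, 15))  # 4
print(resolve_max_mismatches(5, 100, 15))     # 5
```

              A fractional value scales with read length, so under this rounding -m 0.04 allows 4
              mismatches on a 100-bp read but only 2 on a 50-bp read.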

       --query-unk-mismatch=INT
              Whether to count unknown (N) characters in the query as a mismatch (0=no (default),
              1=yes)

       --genome-unk-mismatch=INT
              Whether  to  count  unknown (N) characters in the genome as a mismatch (0=no, 1=yes
              (default))

       --terminal-threshold=INT
              Threshold for searching for a terminal alignment (from one end of the read  to  the
              best  possible  position at the other end) (default 2).  For example, if this value
              is 2, then if GSNAP finds an exact or 1-mismatch alignment, it will not try to find
              a terminal alignment.  Note that this default value may not be low enough if you
              want to obtain terminal alignments for very short reads, although such reads
              probably don't have enough specificity for terminal alignments anyway.  To turn off
              terminal alignments, set this to a high value, greater than the value for
              --max-mismatches.
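
              A Python sketch of the decision described above.  Treating "no alignment found yet"
              as a reason to search, and the function name should_try_terminal, are assumptions
              for illustration.

```python
from typing import Optional


def should_try_terminal(best_mismatches: Optional[int], terminal_threshold: int = 2) -> bool:
    """Return True if a terminal alignment search would be attempted.

    best_mismatches is the mismatch count of the best alignment found so far,
    or None if nothing was found (assumed to trigger the search).
    """
    if best_mismatches is None:
        return True
    return best_mismatches >= terminal_threshold


# With the default of 2, exact and 1-mismatch hits suppress the search.
print([should_try_terminal(k) for k in (0, 1, 2)])  # [False, False, True]
```

              Because a found alignment never exceeds --max-mismatches, a threshold above that
              value makes the comparison false for every alignment that was found.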

       -i, --indel-penalty=INT
              Penalty for an indel (default 2).  Counts against mismatches allowed.  To find
              indels, make indel-penalty less than or equal to max-mismatches.  A value < 2 can
              lead to false positives at read ends.

       --indel-endlength=INT
              Minimum length at end required for indel alignments (default 4)

       -y, --max-middle-insertions=INT
              Maximum number of middle insertions allowed (default 9)

       -z, --max-middle-deletions=INT
              Maximum number of middle deletions allowed (default 30)

       -Y, --max-end-insertions=INT
              Maximum number of end insertions allowed (default 3)

       -Z, --max-end-deletions=INT
              Maximum number of end deletions allowed (default 6)

       -M, --suboptimal-levels=INT
              Report suboptimal hits beyond the best hit (default 0).  All hits with scores up to
              the best score plus suboptimal-levels are reported.
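
              A Python sketch of the selection rule, assuming lower scores are better (a score here
              stands for mismatches plus penalties).  reported_hits is a hypothetical helper.

```python
from typing import List


def reported_hits(scores: List[int], suboptimal_levels: int = 0) -> List[int]:
    """Keep hits whose score is at most the best score plus suboptimal_levels."""
    if not scores:
        return []
    cutoff = min(scores) + suboptimal_levels
    return [s for s in scores if s <= cutoff]


print(reported_hits([3, 1, 2, 1, 4]))     # [1, 1]
print(reported_hits([3, 1, 2, 1, 4], 1))  # [1, 2, 1]
```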

       -a, --adapter-strip=STRING
              Method for removing adapters from reads. Currently  allowed  values:  off,  paired.
              Default  is  "paired", which removes adapters from paired-end reads if a concordant
              or paired alignment cannot be found from the original read.  To turn off,  use  the
              value "off".

       --trim-mismatch-score=INT
              Score  to  use  for  mismatches  when  trimming at ends (default is -3; to turn off
              trimming, specify 0). Warning:  turning  trimming  off  will  give  false  positive
              mismatches at the ends of reads

       --trim-indel-score=INT
              Score to use for indels when trimming at ends (default is -4; to turn off trimming,
              specify 0). Warning: turning trimming off will give false positive  indels  at  the
              ends of reads

       -V, --snpsdir=STRING
              Directory  for  SNPs  index  files (created using snpindex) (default is location of
              genome index files specified using -D and -d)

       -v, --use-snps=STRING
              Use database  containing  known  SNPs  (in  <STRING>.iit,  built  previously  using
              snpindex) for tolerance to SNPs

       --cmetdir=STRING
              Directory for methylcytosine index files (created using cmetindex) (default is
              location of genome index files specified using -D, -V, and -d)

       --atoidir=STRING
              Directory for A-to-I RNA editing index files (created using atoiindex) (default  is
              location of genome index files specified using -D, -V, and -d)

       --mode=STRING
              Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded,
              or atoi-nonstranded.  Non-standard modes require you to have previously run the
              cmetindex or atoiindex programs on the genome.

       --tallydir=STRING
              Directory  for  tally  IIT  file to resolve concordant multiple results (default is
              location of genome index files specified using -D and -d).   Note:  can  just  give
              full path name to --use-tally instead.

       --use-tally=STRING
              Use this tally IIT file to resolve concordant multiple results

       --runlengthdir=STRING
              Directory for runlength IIT file to resolve concordant multiple results (default is
              location of genome index files specified using -D and -d).   Note:  can  just  give
              full path name to --use-runlength instead.

       --use-runlength=STRING
              Use this runlength IIT file to resolve concordant multiple results

       -t, --nthreads=INT
              Number of worker threads

   Options for GMAP alignment within GSNAP
       --gmap-mode=STRING
              Cases  to  use  GMAP  for complex alignments containing multiple splices or indels.
              Allowed values: none, pairsearch, indel_knownsplice, terminal, improve (or multiple
              values,      separated      by     commas).      Default:     all     on,     i.e.,
              pairsearch,indel_knownsplice,terminal,improve
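
              A Python sketch of parsing the comma-separated --gmap-mode value.  The handling of
              "none" combined with other values (rejected here) and the helper name
              parse_gmap_mode are assumptions.

```python
from typing import FrozenSet, Optional

GMAP_CASES = ("pairsearch", "indel_knownsplice", "terminal", "improve")


def parse_gmap_mode(value: Optional[str]) -> FrozenSet[str]:
    """Return the set of GMAP cases enabled by a --gmap-mode value (None = not given)."""
    if value is None:
        return frozenset(GMAP_CASES)  # default: all on
    cases = {v.strip() for v in value.split(",") if v.strip()}
    if cases == {"none"}:
        return frozenset()
    unknown = cases - set(GMAP_CASES)
    if unknown:
        # Also rejects "none" mixed with other values (assumed behavior).
        raise ValueError(f"unknown --gmap-mode value(s): {sorted(unknown)}")
    return frozenset(cases)


modes = parse_gmap_mode("terminal,improve")
print("pairsearch" in modes)  # False, so --max-gmap-pairsearch would not apply
```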

       --trigger-score-for-gmap=INT
              Try GMAP pairsearch on nearby genomic regions if best score (the total of both ends
              if paired-end) exceeds this value (default 5)

       --max-gmap-pairsearch=INT
              Perform GMAP pairsearch on nearby genomic regions up to this many candidate
              ends (default 10).  Requires pairsearch in --gmap-mode

       --max-gmap-terminal=INT
              Perform GMAP terminal on nearby genomic regions up  to  this  many  candidate  ends
              (default 5). Requires terminal in --gmap-mode

       --max-gmap-improvement=INT
              Perform  GMAP  improvement on nearby genomic regions up to this many candidate ends
              (default 5).  Requires improve in --gmap-mode

       --microexon-spliceprob=FLOAT
              Allow microexons only if one of the splice site probabilities is greater than  this
              value (default 0.90)

   Splicing options for RNA-Seq
       -N, --novelsplicing=INT
              Look for novel splicing (0=no (default), 1=yes)

       --splicingdir=STRING
              Directory  for splicing involving known sites or known introns, as specified by the
              -s or --use-splicing flag (default is directory computed from  -D  and  -d  flags).
              Note: can just give full pathname to the -s flag instead.

       -s, --use-splicing=STRING
              Look  for  splicing  involving  known  sites or known introns (in <STRING>.iit), at
              short or long distances.  See README instructions for the distinction between known
              sites and known introns

       --ambig-splice-noclip
              For ambiguous known splicing at ends of the read, do not clip at the splice site,
              but extend instead into the intron.  This flag makes sense only if you provide the
              --use-splicing flag, and you are trying to eliminate all soft clipping with
              --trim-mismatch-score=0

       -w, --localsplicedist=INT
              Definition of local novel splicing event (default 200000)

       -e, --local-splice-penalty=INT
              Penalty for a local splice (default 0). Counts against mismatches allowed

       -E, --distant-splice-penalty=INT
              Penalty for a distant splice (default 1).  A distant splice is one where the intron
              length exceeds the value of -w (--localsplicedist), or one that is an inversion,
              scramble, or translocation between two different chromosomes.  Counts against
              mismatches allowed
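
              A Python sketch of how a splice might be classified as local or distant and how the
              indel and splice penalties count against the mismatch allowance.  It assumes the
              penalties simply add to the mismatch count; the function names and arguments are
              hypothetical.

```python
def is_distant_splice(intron_length: int, same_chromosome: bool = True,
                      inversion_or_scramble: bool = False,
                      localsplicedist: int = 200000) -> bool:
    """Distant if a translocation, inversion, or scramble, or if the intron exceeds -w."""
    if not same_chromosome or inversion_or_scramble:
        return True
    return intron_length > localsplicedist


def total_score(mismatches: int, n_indels: int = 0, n_local: int = 0,
                n_distant: int = 0, indel_penalty: int = 2,
                local_splice_penalty: int = 0,
                distant_splice_penalty: int = 1) -> int:
    """Mismatches plus indel and splice penalties (assumed to be additive)."""
    return (mismatches + n_indels * indel_penalty
            + n_local * local_splice_penalty
            + n_distant * distant_splice_penalty)


print(is_distant_splice(250000))    # True: exceeds the default -w of 200000
print(total_score(1, n_distant=1))  # 2: fits within -m 2, but not -m 1
```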

       -K, --distant-splice-endlength=INT
              Minimum  length  at  end  required  for distant spliced alignments (default 16, min
              allowed is the value of -k, or kmer size)

       -l, --shortend-splice-endlength=INT
              Minimum length at end required for short-end spliced alignments (default 2).
              However, unless known splice sites are provided with the -s flag, GSNAP may still
              need the end length to reach the value of -k (the kmer size) to find a given splice

       --distant-splice-identity=FLOAT
              Minimum identity at end required for distant spliced alignments (default 0.95)

       --antistranded-penalty=INT
              (Not currently implemented) Penalty for antistranded splicing when  using  stranded
              RNA-Seq protocols. A positive value, such as 1, expects antisense on the first read
              and sense on the second read. Default  is  0,  which  treats  sense  and  antisense
              equally well

       --merge-distant-samechr
              Report  distant  splices  on  the  same chromosome as a single splice, if possible.
              Will produce a single SAM line instead of two SAM lines, which  is  also  done  for
              translocations, inversions, and scramble events

   Options for paired-end reads
       --pairmax-dna=INT
              Max total genomic length for DNA-Seq paired reads, or other reads without splicing
              (default 1000).  Used if neither -N nor -s is specified.

       --pairmax-rna=INT
              Max total genomic length for RNA-Seq paired reads, or other reads that could have a
              splice  (default 200000). Used if -N or -s is specified.  Should probably match the
              value for -w, --localsplicedist.

       --pairexpect=INT
              Expected paired-end length, used for calling splices in medial part  of  paired-end
              reads (default 200)

       --pairdev=INT
              Allowable  deviation  from  expected paired-end length, used for calling splices in
              medial part of paired-end reads (default 25)
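
              A Python sketch of which pair limit applies and what the expected-length window
              looks like with the defaults.  Using |length - pairexpect| <= pairdev as the window
              is an assumption, and the helper names are hypothetical.

```python
def pairmax(novelsplicing: bool, use_splicing: bool,
            pairmax_dna: int = 1000, pairmax_rna: int = 200000) -> int:
    """--pairmax-rna applies if -N or -s is given, otherwise --pairmax-dna."""
    return pairmax_rna if (novelsplicing or use_splicing) else pairmax_dna


def near_expected(length: int, pairexpect: int = 200, pairdev: int = 25) -> bool:
    """True if a paired-end length is within pairdev of pairexpect (assumed window)."""
    return abs(length - pairexpect) <= pairdev


print(pairmax(False, False), pairmax(True, False))  # 1000 200000
print(near_expected(220), near_expected(230))      # True False
```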

   Options for quality scores
       --quality-protocol=STRING
              Protocol for input quality scores. Allowed values:

               illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
               sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)

              Default is sanger (no quality print shift).  SAM output files should have quality
              scores in sanger protocol.

              Or you can customize this behavior with these flags:

       -J, --quality-zero-score=INT
              FASTQ  quality  scores  are  zero  at  this  ASCII  value (default is 33 for sanger
              protocol; for Illumina, select 64)

       -j, --quality-print-shift=INT
              Shift FASTQ quality scores by this amount  in  output  (default  is  0  for  sanger
              protocol; to change Illumina input to Sanger output, select -31)
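
              A Python sketch of what -J and -j do to a quality string, assuming -J only changes
              how characters are read as scores and -j only shifts the printed characters.
              convert_qualities is a hypothetical name.

```python
from typing import List, Tuple


def convert_qualities(qual: str, zero_score: int = 33,
                      print_shift: int = 0) -> Tuple[List[int], str]:
    """Return (numeric quality scores, shifted quality string for output)."""
    scores = [ord(c) - zero_score for c in qual]
    if any(q < 0 for q in scores):
        raise ValueError("character below the -J zero point; wrong quality protocol?")
    shifted = "".join(chr(ord(c) + print_shift) for c in qual)
    return scores, shifted


# Illumina input, i.e. --quality-protocol=illumina (-J 64 -j -31)
print(convert_qualities("h@", zero_score=64, print_shift=-31))  # ([40, 0], 'I!')
```

              Sanger encodes a score Q as ASCII 33 + Q and this Illumina protocol as ASCII 64 + Q,
              so a shift of -31 maps one onto the other.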

   Output options
       -n, --npaths=INT
              Maximum number of paths to print (default 100).

       -Q, --quiet-if-excessive
              If more than maximum number of paths are found, then nothing is printed.

       -O, --ordered
              Print output in same order as input (relevant only if there is more than one worker
              thread)

       --show-refdiff
              For GSNAP output in SNP-tolerant alignment, shows all differences relative  to  the
              reference  genome  as  lower  case (otherwise, it shows all differences relative to
              both the reference and alternate genome)

       --clip-overlap
              For paired-end reads whose alignments overlap, clip the overlapping region.

       --print-snps
              Print detailed information about SNPs in reads (works only  if  -v  also  selected)
              (not fully implemented yet)

       --failsonly
              Print only failed alignments, those with no results

       --nofails
              Exclude printing of failed alignments

       --fails-as-input
              Print completely failed alignments as input FASTA or FASTQ format

       -A, --format=STRING
              Output format other than the default.  Currently implemented: sam.  Also allowed,
              but not installed at compile time: goby (to install, re-compile with the
              appropriate options)

       --output-buffer-size=INT
              Buffer  size,  in  queries,  for  output  thread (default 1000). When the number of
              results to be printed exceeds this size, the worker threads are  halted  until  the
              backlog is cleared

   Options for SAM output
       --no-sam-headers
              Do not print headers beginning with '@'

       --sam-headers-batch=INT
              Print headers only for this batch, as specified by -q

       --sam-use-0M
              Insert 0M in CIGAR between adjacent insertions and deletions.  Required by Picard,
              but can cause errors in other tools
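
              A Python sketch of the --sam-use-0M transformation on a CIGAR string.  It only
              illustrates the rule stated above; insert_0m is a hypothetical helper and does not
              validate the CIGAR.

```python
import re


def insert_0m(cigar: str) -> str:
    """Insert 0M between each adjacent insertion/deletion pair in a CIGAR string."""
    out = []
    prev_op = None
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if {prev_op, op} == {"I", "D"}:
            out.append("0M")
        out.append(length + op)
        prev_op = op
    return "".join(out)


print(insert_0m("10M2I3D20M"))  # 10M2I0M3D20M
```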

       --sam-multiple-primaries
              Allows multiple alignments to be marked  as  primary  if  they  have  equally  good
              mapping scores

       --read-group-id=STRING
              Value to put into read-group id (RG-ID) field

       --read-group-name=STRING
              Value to put into read-group name (RG-SM) field

       --read-group-library=STRING
              Value to put into read-group library (RG-LB) field

       --read-group-platform=STRING
              Value to put into read-group platform (RG-PL) field
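
              The read-group options correspond to tags on a SAM @RG header line (ID, SM, LB, PL).
              A Python sketch of such a line follows; the tag order and the helper rg_header are
              hypothetical and do not describe GSNAP's exact output.

```python
from typing import Optional


def rg_header(rg_id: str, sample: Optional[str] = None,
              library: Optional[str] = None,
              platform: Optional[str] = None) -> str:
    """Build a tab-separated SAM @RG header line; ID is required by the SAM spec."""
    fields = ["@RG", f"ID:{rg_id}"]
    for tag, value in (("SM", sample), ("LB", library), ("PL", platform)):
        if value is not None:
            fields.append(f"{tag}:{value}")
    return "\t".join(fields)


print(rg_header("run1", sample="NA12878", platform="ILLUMINA"))
```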

   Help options
       --version
              Show version

       --help
              Show this help message

ENVIRONMENT

       GMAPDB genome directory (equivalent to -D)

FILES

       ~/.gmaprc
              configuration file

AUTHOR

       Thomas D. Wu and Colin K. Watanabe

REPORTING BUGS

       Report bugs to Thomas Wu <twu@gene.com>.

COPYRIGHT

       Copyright 2005 Genentech, Inc. All rights reserved.

SEE ALSO

       gmap_setup(1), gmap(1)
       http://research-pub.gene.com/gmap/