Provided by: gmap_2012-06-12-1ubuntu1_amd64 

NAME
gsnap - Genomic Short-read Nucleotide Alignment Program
SYNOPSIS
gsnap -dDB [OPTION]... [QUERY]...
DESCRIPTION
Align the sequences QUERY to the reference DB. With no QUERY, read standard input.
OPTIONS
Input options
-D, --dir=directory
Genome directory
-d, --db=STRING
Genome database
-k, --kmer=INT
kmer size to use in genome database (allowed values: 16 or less). If not specified, the program
will find the highest available kmer size in the genome database
--basesize=INT
Base size to use in genome database. If not specified, the program will find the highest
available base size in the genome database within selected k-mer size
--sampling=INT
Sampling to use in genome database. If not specified, the program will find the smallest
available sampling value in the genome database within selected basesize and k-mer size
-q, --part=INT/INT
Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for distributing jobs
to a computer farm).
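The i/n selection can be pictured with a minimal Python sketch. It assumes sequences are counted from 0 and that job i takes every n-th sequence starting at offset i; the man page gives only the examples 0/100 and 99/100 and does not spell out the rule. The helper names, the database name, and the query file name are hypothetical.

```python
from itertools import islice

def parse_part(spec):
    """Parse an 'i/n' string such as '0/100' into the pair (i, n)."""
    i_text, n_text = spec.split("/")
    i, n = int(i_text), int(n_text)
    if not 0 <= i < n:
        raise ValueError("part must satisfy 0 <= i < n")
    return i, n

def select_part(sequences, spec):
    """Return every n-th sequence starting at offset i (assumed 0-based)."""
    i, n = parse_part(spec)
    return islice(sequences, i, None, n)

def farm_commands(db, query, n):
    """Build one gsnap argument list per job; db and query are placeholders."""
    return [["gsnap", f"--db={db}", f"--part={i}/{n}", query] for i in range(n)]
```

Each argument list returned by farm_commands could be handed to a job scheduler, so that the n jobs together cover the whole input once.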
--input-buffer=INT
Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)
--barcode-length=INT
Amount of barcode to remove from start of read (default 0)
-o, --orientation=STRING
Orientation of paired-end reads. Allowed values: FR (fwd-rev, or typical Illumina; default), RF
(rev-fwd, for circularized inserts), or FF (fwd-fwd, same strand)
--fastq-id-start=INT
Starting position of identifier in FASTQ header, space-delimited (>= 1)
--fastq-id-end=INT
Ending position of identifier in FASTQ header, space-delimited (>= 1)
Examples:
@HWUSI-EAS100R:6:73:941:1973#0/1
start=1, end=1 (default)
=> identifier is HWUSI-EAS100R:6:73:941:1973#0
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
start=1, end=1
=> identifier is SRR001666.1
start=2, end=2
=> identifier is 071112_SLXA-EAS1_s_7:5:1:817:345
start=1, end=2
=> identifier is SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
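The examples above can be reproduced with a short Python sketch. It is an illustration, not GSNAP's own parsing code: the function name is hypothetical, and dropping a trailing /1 or /2 mate suffix is an assumption inferred from the first example.

```python
def fastq_identifier(header, start=1, end=1):
    """Return space-delimited fields start..end (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    # Drop the leading '@' of the FASTQ header line, if present.
    text = header[1:] if header.startswith("@") else header
    fields = text.split()
    ident = " ".join(fields[start - 1:end])
    # Assumption: a trailing /1 or /2 mate suffix is not part of the identifier.
    if ident.endswith(("/1", "/2")):
        ident = ident[:-2]
    return ident
```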
--filter-chastity=STRING
Skips reads marked by the Illumina chastity program. Expects a string after the accession
with a 'Y' after the first colon, like this:
@accession 1:Y:0:CTTGTA
where the 'Y' signifies filtering by chastity. Values: off (default), either, both. For
'either', a paired-end read is filtered if either end has a 'Y'. For 'both', a 'Y' is
required on both ends of a paired-end read (or on the only end of a single-end read).
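A minimal Python sketch of the three values, assuming the header layout shown above (accession, a space, then colon-separated fields). The function names are hypothetical and the code only illustrates the decision rule described here.

```python
def is_chastity_flagged(header):
    """True if the string after the accession has 'Y' after its first colon."""
    parts = header.split(None, 1)
    if len(parts) < 2:
        return False
    fields = parts[1].split(":")
    return len(fields) > 1 and fields[1] == "Y"

def skip_read(mode, header1, header2=None):
    """Decide whether a single-end or paired-end read would be skipped."""
    if mode == "off":
        return False
    flags = [is_chastity_flagged(h) for h in (header1, header2) if h is not None]
    if mode == "either":
        return any(flags)
    if mode == "both":
        return all(flags)
    raise ValueError("mode must be off, either, or both")
```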
Computation options
Note: GSNAP has an ultrafast algorithm for calculating mismatches up to and including
((readlength+2)/kmer - 2) ("ultrafast mismatches"). The program will run fastest if max-mismatches (plus
suboptimal-levels) is within that value. Also, indels, especially end indels, take longer to compute,
although the algorithm is still designed to be fast.
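The ultrafast limit in the note above can be worked out for a given read length and k-mer size. The sketch below assumes the division is integer division, which the note does not state; the function names are hypothetical.

```python
def ultrafast_mismatches(readlength, kmer):
    """(readlength + 2) / kmer - 2, with integer division (an assumption)."""
    return (readlength + 2) // kmer - 2

def within_ultrafast(readlength, kmer, max_mismatches, suboptimal_levels=0):
    """True if max-mismatches plus suboptimal-levels stays within the limit."""
    limit = ultrafast_mismatches(readlength, kmer)
    return max_mismatches + suboptimal_levels <= limit
```

Under that assumption, a 76-base read with a k-mer size of 12 gives (76 + 2) // 12 - 2 = 4.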
-B, --batch=INT
Mode   Offsets    Positions        Genome
0      allocate   mmap             mmap
1      allocate   mmap & preload   mmap
2      allocate   mmap & preload   mmap & preload (default)
3      allocate   allocate         mmap & preload
4      allocate   allocate         allocate
5      expand     allocate         allocate
Note: For a single sequence, all data structures use mmap. If mmap is not available and allocate
is not chosen, then the program will use fileio (very slow)
-m, --max-mismatches=FLOAT
Maximum number of mismatches allowed (if not specified, then defaults to the ultrafast level of
((readlength+2)/kmer - 2)). If specified between 0.0 and 1.0, then treated as a fraction of each
read length. Otherwise, treated as an integral number of mismatches (including indel and splicing
penalties). For RNA-Seq, you may need to increase this value slightly to align reads extending past
the ends of an exon.
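The two interpretations of the value can be sketched in Python. Two points are assumptions, since the text does not settle them: only values strictly between 0.0 and 1.0 are treated as fractions, and the fractional result is truncated to an integer. The function name is hypothetical.

```python
def interpret_max_mismatches(value, readlength):
    """Turn a --max-mismatches value into a mismatch count for one read."""
    if 0.0 < value < 1.0:
        # Fraction of the read length (truncation is an assumption).
        return int(value * readlength)
    # Otherwise an integral number of mismatches.
    return int(value)
```

For example, a value of 0.25 on an 80-base read would allow 20 mismatches, while a value of 3 allows 3 regardless of read length.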
--query-unk-mismatch=INT
Whether to count unknown (N) characters in the query as a mismatch (0=no (default), 1=yes)
--genome-unk-mismatch=INT
Whether to count unknown (N) characters in the genome as a mismatch (0=no, 1=yes (default))
--terminal-threshold=INT
Threshold for searching for a terminal alignment (from one end of the read to the best possible
position at the other end) (default 2). For example, if this value is 2, then if GSNAP finds an
exact or 1-mismatch alignment, it will not try to find a terminal alignment. Note that this
default value may not be low enough if you want to obtain terminal alignments for very short
reads, although such reads probably don't have enough specificity for terminal alignments anyway.
To turn off terminal alignments, set this to a high value, greater than the value for
--max-mismatches.
-i, --indel-penalty=INT
Penalty for an indel (default 2). Counts against mismatches allowed. To find indels, make
indel-penalty less than or equal to max-mismatches. A value < 2 can lead to false positives at
read ends.
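How the penalty counts against the mismatch budget can be sketched as follows. The sketch assumes the penalty is charged once per indel event and added to the mismatch count; the function names are hypothetical and GSNAP's internal scoring may differ in detail.

```python
def alignment_score(mismatches, indels=0, indel_penalty=2):
    """Mismatches plus one penalty per indel (per-event charge is assumed)."""
    return mismatches + indels * indel_penalty

def is_within_budget(mismatches, indels, max_mismatches, indel_penalty=2):
    """True if the penalized score does not exceed max-mismatches."""
    return alignment_score(mismatches, indels, indel_penalty) <= max_mismatches
```

With the default penalty of 2, a single indel cannot be found when max-mismatches is 1, which is why the penalty should not exceed max-mismatches.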
--indel-endlength=INT
Minimum length at end required for indel alignments (default 4)
-y, --max-middle-insertions=INT
Maximum number of middle insertions allowed (default 9)
-z, --max-middle-deletions=INT
Maximum number of middle deletions allowed (default 30)
-Y, --max-end-insertions=INT
Maximum number of end insertions allowed (default 3)
-Z, --max-end-deletions=INT
Maximum number of end deletions allowed (default 6)
-M, --suboptimal-levels=INT
Report suboptimal hits beyond best hit (default 0). All hits with best score plus
suboptimal-levels are reported
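A small Python sketch of this rule, assuming that a lower score is better (for instance, a penalized mismatch count) and that a hit is reported when its score is no more than the best score plus suboptimal-levels. The function name and the hit representation are hypothetical.

```python
def reportable_hits(hits, suboptimal_levels=0):
    """hits is a list of (name, score) pairs; lower score is assumed better."""
    if not hits:
        return []
    best = min(score for _, score in hits)
    return [hit for hit in hits if hit[1] <= best + suboptimal_levels]
```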
-a, --adapter-strip=STRING
Method for removing adapters from reads. Currently allowed values: off, paired. Default is
"paired", which removes adapters from paired-end reads if a concordant or paired alignment cannot
be found from the original read. To turn off, use the value "off".
--trim-mismatch-score=INT
Score to use for mismatches when trimming at ends (default is -3; to turn off trimming, specify
0). Warning: turning trimming off will give false positive mismatches at the ends of reads
--trim-indel-score=INT
Score to use for indels when trimming at ends (default is -4; to turn off trimming, specify 0).
Warning: turning trimming off will give false positive indels at the ends of reads
-V, --snpsdir=STRING
Directory for SNPs index files (created using snpindex) (default is location of genome index files
specified using -D and -d)
-v, --use-snps=STRING
Use database containing known SNPs (in <STRING>.iit, built previously using snpindex) for
tolerance to SNPs
--cmetdir=STRING
Directory for methylcytosine index files (created using cmetindex) (default is location of genome
index files specified using -D, -V, and -d)
--atoidir=STRING
Directory for A-to-I RNA editing index files (created using atoiindex) (default is location of
genome index files specified using -D, -V, and -d)
--mode=STRING
Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded, or
atoi-nonstranded. Non-standard modes require you to have previously run the cmetindex or
atoiindex programs on the genome
--tallydir=STRING
Directory for tally IIT file to resolve concordant multiple results (default is location of genome
index files specified using -D and -d). Note: can just give full path name to --use-tally
instead.
--use-tally=STRING
Use this tally IIT file to resolve concordant multiple results
--runlengthdir=STRING
Directory for runlength IIT file to resolve concordant multiple results (default is location of
genome index files specified using -D and -d). Note: can just give full path name to
--use-runlength instead.
--use-runlength=STRING
Use this runlength IIT file to resolve concordant multiple results
-t, --nthreads=INT
Number of worker threads
Options for GMAP alignment within GSNAP
--gmap-mode=STRING
Cases to use GMAP for complex alignments containing multiple splices or indels. Allowed values:
none, pairsearch, indel_knownsplice, terminal, improve (or multiple values, separated by commas).
Default: all on, i.e., pairsearch,indel_knownsplice,terminal,improve
--trigger-score-for-gmap=INT
Try GMAP pairsearch on nearby genomic regions if best score (the total of both ends if paired-end)
exceeds this value (default 5)
--max-gmap-pairsearch=INT
Perform GMAP pairsearch on nearby genomic regions up to this many candidate ends (default
10). Requires pairsearch in --gmap-mode
--max-gmap-terminal=INT
Perform GMAP terminal on nearby genomic regions up to this many candidate ends (default 5).
Requires terminal in --gmap-mode
--max-gmap-improvement=INT
Perform GMAP improvement on nearby genomic regions up to this many candidate ends (default 5).
Requires improve in --gmap-mode
--microexon-spliceprob=FLOAT
Allow microexons only if one of the splice site probabilities is greater than this value (default
0.90)
Splicing options for RNA-Seq
-N, --novelsplicing=INT
Look for novel splicing (0=no (default), 1=yes)
--splicingdir=STRING
Directory for splicing involving known sites or known introns, as specified by the -s or
--use-splicing flag (default is directory computed from -D and -d flags). Note: can just give
full pathname to the -s flag instead.
-s, --use-splicing=STRING
Look for splicing involving known sites or known introns (in <STRING>.iit), at short or long
distances. See README instructions for the distinction between known sites and known introns
--ambig-splice-noclip
For ambiguous known splicing at ends of the read, do not clip at the splice site, but extend
instead into the intron. This flag makes sense only if you provide the --use-splicing flag, and
you are trying to eliminate all soft clipping with --trim-mismatch-score=0
-w, --localsplicedist=INT
Definition of local novel splicing event (default 200000)
-e, --local-splice-penalty=INT
Penalty for a local splice (default 0). Counts against mismatches allowed
-E, --distant-splice-penalty=INT
Penalty for a distant splice (default 1). A distant splice is one where the intron length exceeds
the value of -w, or --localsplicedist, or is an inversion, scramble, or translocation between two
different chromosomes. Counts against mismatches allowed
-K, --distant-splice-endlength=INT
Minimum length at end required for distant spliced alignments (default 16, min allowed is the
value of -k, or kmer size)
-l, --shortend-splice-endlength=INT
Minimum length at end required for short-end spliced alignments (default 2). However, unless known
splice sites are provided with the -s flag, GSNAP may still need the end length to be the value of
-k, or kmer size, to find a given splice
--distant-splice-identity=FLOAT
Minimum identity at end required for distant spliced alignments (default 0.95)
--antistranded-penalty=INT
(Not currently implemented) Penalty for antistranded splicing when using stranded RNA-Seq
protocols. A positive value, such as 1, expects antisense on the first read and sense on the
second read. Default is 0, which treats sense and antisense equally well
--merge-distant-samechr
Report distant splices on the same chromosome as a single splice, if possible. Will produce a
single SAM line instead of two SAM lines, which is also done for translocations, inversions, and
scramble events
Options for paired-end reads
--pairmax-dna=INT
Max total genomic length for DNA-Seq paired reads, or other reads without splicing (default 1000).
Used if neither -N nor -s is specified.
--pairmax-rna=INT
Max total genomic length for RNA-Seq paired reads, or other reads that could have a splice
(default 200000). Used if -N or -s is specified. Should probably match the value for -w,
--localsplicedist.
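Which of the two limits applies can be summarized in a short Python sketch. The function and parameter names are hypothetical; the defaults are the ones documented above, and the sketch only restates the selection rule.

```python
def effective_pairmax(novelsplicing=False, use_splicing=None,
                      pairmax_dna=1000, pairmax_rna=200000):
    """Pick the paired-end length limit based on whether -N or -s is given."""
    splicing_enabled = bool(novelsplicing) or use_splicing is not None
    return pairmax_rna if splicing_enabled else pairmax_dna
```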
--pairexpect=INT
Expected paired-end length, used for calling splices in medial part of paired-end reads (default
200)
--pairdev=INT
Allowable deviation from expected paired-end length, used for calling splices in medial part of
paired-end reads (default 25)
Options for quality scores
--quality-protocol=STRING
Protocol for input quality scores. Allowed values:
illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
sanger (ASCII 33-126) (equivalent to -J 33 -j 0)
Default is sanger (no quality print shift). SAM output files should have quality scores in sanger
protocol
Or you can customize this behavior with these flags:
-J, --quality-zero-score=INT
FASTQ quality scores are zero at this ASCII value (default is 33 for sanger protocol; for
Illumina, select 64)
-j, --quality-print-shift=INT
Shift FASTQ quality scores by this amount in output (default is 0 for sanger protocol; to change
Illumina input to Sanger output, select -31)
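The effect of the two flags can be illustrated in Python. The sketch assumes a quality score is the ASCII code minus the zero score, and that the print shift is added to each ASCII code on output; the function names are hypothetical.

```python
def phred_scores(qualities, zero_score=33):
    """Numeric quality scores: ASCII code minus the zero-score value."""
    return [ord(ch) - zero_score for ch in qualities]

def shift_qualities(qualities, print_shift=0):
    """Shift each quality character by print_shift ASCII positions."""
    return "".join(chr(ord(ch) + print_shift) for ch in qualities)
```

With Illumina input (zero score 64) and a shift of -31, the character 'h' (score 40) becomes 'I', which is score 40 in the sanger protocol.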
Output options
-n, --npaths=INT
Maximum number of paths to print (default 100).
-Q, --quiet-if-excessive
If more than maximum number of paths are found, then nothing is printed.
-O, --ordered
Print output in same order as input (relevant only if there is more than one worker thread)
--show-refdiff
For GSNAP output in SNP-tolerant alignment, shows all differences relative to the reference genome
as lower case (otherwise, it shows all differences relative to both the reference and alternate
genome)
--clip-overlap
For paired-end reads whose alignments overlap, clip the overlapping region.
--print-snps
Print detailed information about SNPs in reads (works only if -v also selected) (not fully
implemented yet)
--failsonly
Print only failed alignments, those with no results
--nofails
Exclude printing of failed alignments
--fails-as-input
Print completely failed alignments as input FASTA or FASTQ format
-A, --format=STRING
Another format type, other than default. Currently implemented: sam. Also allowed, but not
installed at compile-time: goby. (To install, need to re-compile with appropriate options)
--output-buffer-size=INT
Buffer size, in queries, for output thread (default 1000). When the number of results to be
printed exceeds this size, the worker threads are halted until the backlog is cleared
Options for SAM output
--no-sam-headers
Do not print headers beginning with '@'
--sam-headers-batch=INT
Print headers only for this batch, as specified by -q
--sam-use-0M
Insert 0M in CIGAR between adjacent insertions and deletions. Required by Picard, but can cause
errors in other tools
--sam-multiple-primaries
Allows multiple alignments to be marked as primary if they have equally good mapping scores
--read-group-id=STRING
Value to put into read-group id (RG-ID) field
--read-group-name=STRING
Value to put into read-group name (RG-SM) field
--read-group-library=STRING
Value to put into read-group library (RG-LB) field
--read-group-platform=STRING
Value to put into read-group platform (RG-PL) field
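The four read-group options correspond to the ID, SM, LB, and PL tags of a SAM @RG header line, where PL is the platform tag. The Python sketch below shows the general shape of such a line as defined by the SAM format; the function name is hypothetical, and the exact line that GSNAP prints is not specified here, so treat the output as an assumption.

```python
def read_group_header(rg_id, name=None, library=None, platform=None):
    """Build a tab-separated @RG header line from the read-group values."""
    fields = ["@RG", f"ID:{rg_id}"]
    for tag, value in (("SM", name), ("LB", library), ("PL", platform)):
        if value is not None:
            fields.append(f"{tag}:{value}")
    return "\t".join(fields)
```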
Help options
--version
Show version
--help
Show this help message
ENVIRONMENT
GMAPDB genome directory (equivalent to -D)
FILES
~/.gmaprc
configuration file
AUTHOR
Thomas D. Wu and Colin K. Watanabe
REPORTING BUGS
Report bugs to Thomas Wu <twu@gene.com>.
COPYRIGHT
Copyright 2005 Genentech, Inc. All rights reserved.
SEE ALSO
gmap_setup(1), gmap(1)
http://research-pub.gene.com/gmap/
GMAP 2012-06-12 Jun 2012 GSNAP(1)