Provided by: gmap_2017-11-15-1_amd64
NAME
gsnap - Genomic Short-read Nucleotide Alignment Program
SYNOPSIS
gsnap [OPTIONS...] <FASTA file>, or cat <FASTA file> | gsnap [OPTIONS...]
OPTIONS
Input options (must include -d)
  -D, --dir=directory
      Genome directory. Default (as specified by --with-gmapdb to the configure program) is /var/cache/gmap.
  -d, --db=STRING
      Genome database
  --use-sarray=INT
      Whether to use a suffix array, which will give increased speed. Allowed values: 0 (no), 1 (yes, plus GSNAP/GMAP algorithm, default), or 2 (yes, and use only suffix array algorithm). Note that suffix arrays will bias against SNP alleles in SNP-tolerant alignment. If there is a conflict between this flag and the flag --speed, the --speed flag takes precedence.
  -k, --kmer=INT
      kmer size to use in the genome database (allowed values: 16 or less). If not specified, the program will find the highest available kmer size in the genome database.
  --sampling=INT
      Sampling to use in the genome database. If not specified, the program will find the smallest available sampling value in the genome database within the selected k-mer size.
  -q, --part=INT/INT
      Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for distributing jobs to a computer farm).
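A minimal Python sketch of how n jobs might split the input with -q, --part=INT/INT. It assumes that part i/n selects every sequence whose zero-based position in the input satisfies position mod n == i; the 0/100 and 99/100 examples suggest a zero-based i, but the exact indexing GSNAP uses is an assumption here, and parse_part and select_part are hypothetical helper names.

```python
def parse_part(spec):
    """Parse a --part value such as "0/100" into (i, n); hypothetical helper."""
    i_str, n_str = spec.split("/")
    i, n = int(i_str), int(n_str)
    if n <= 0 or not 0 <= i < n:
        raise ValueError(f"invalid --part value: {spec!r}")
    return i, n


def select_part(sequences, spec):
    """Yield the sequences that job i of n would process.

    Assumption: job i takes every sequence whose zero-based index
    satisfies index % n == i.
    """
    i, n = parse_part(spec)
    for index, seq in enumerate(sequences):
        if index % n == i:
            yield seq


reads = [f"read{k}" for k in range(10)]
for spec in ("0/3", "1/3", "2/3"):
    print(spec, list(select_part(reads, spec)))
```

Under this assumption, running jobs 0/n through (n-1)/n covers every input sequence exactly once, with no job sharing a read.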
  --input-buffer-size=INT
      Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)
  --barcode-length=INT
      Amount of barcode to remove from start of read (default 0)
  --orientation=STRING
      Orientation of paired-end reads. Allowed values: FR (fwd-rev, or typical Illumina; default), RF (rev-fwd, for circularized inserts), or FF (fwd-fwd, same strand)
  --fastq-id-start=INT
      Starting position of identifier in FASTQ header, space-delimited (>= 1)
  --fastq-id-end=INT
      Ending position of identifier in FASTQ header, space-delimited (>= 1)
      Examples:
        @HWUSI-EAS100R:6:73:941:1973#0/1
          start=1, end=1 (default) => identifier is HWUSI-EAS100R:6:73:941:1973#0
        @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
          start=1, end=1 => identifier is SRR001666.1
          start=2, end=2 => identifier is 071112_SLXA-EAS1_s_7:5:1:817:345
          start=1, end=2 => identifier is SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
  --force-single-end
      When multiple FASTQ files are provided on the command line, GSNAP assumes they are matching paired-end files. This flag treats each file as single-end.
  --filter-chastity=STRING
      Skips reads marked by the Illumina chastity program. Expects a string after the accession having a 'Y' after the first colon, like this:
        @accession 1:Y:0:CTTGTA
      where the 'Y' signifies filtering by chastity. Values: off (default), either, both. For 'either', a 'Y' on either end of a paired-end read will be filtered. For 'both', a 'Y' is required on both ends of a paired-end read (or on the only end of a single-end read).
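The header rules for --fastq-id-start/--fastq-id-end and --filter-chastity can be sketched in Python as follows. This is an illustration, not GSNAP's code: fastq_identifier, chastity_flagged and skip_read are hypothetical names, and dropping a trailing /1 or /2 mate suffix is an assumption inferred from the HWUSI-EAS100R example above.

```python
import re


def fastq_identifier(header, start=1, end=1):
    """Return space-delimited fields start..end (1-based, inclusive) of a FASTQ header."""
    text = header[1:] if header.startswith("@") else header
    ident = " ".join(text.split(" ")[start - 1:end])
    # Assumption: a trailing /1 or /2 mate suffix is not part of the identifier.
    return re.sub(r"/[12]$", "", ident)


def chastity_flagged(header):
    """True if the field after the accession has 'Y' after its first colon, e.g. '1:Y:0:CTTGTA'."""
    fields = header.split(" ")
    if len(fields) < 2:
        return False
    parts = fields[1].split(":")
    return len(parts) > 1 and parts[1] == "Y"


def skip_read(header1, header2=None, mode="off"):
    """Decide whether a read (header2=None) or a pair is skipped under --filter-chastity=mode."""
    if mode == "off":
        return False
    flags = [chastity_flagged(h) for h in (header1, header2) if h is not None]
    if mode == "either":
        return any(flags)
    if mode == "both":
        return all(flags)
    raise ValueError(f"unknown mode: {mode!r}")


sra = "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"
print(fastq_identifier("@HWUSI-EAS100R:6:73:941:1973#0/1"))  # HWUSI-EAS100R:6:73:941:1973#0
print(fastq_identifier(sra, 1, 2))  # SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345
print(skip_read("@r1 1:Y:0:CTTGTA", "@r1 2:N:0:CTTGTA", "either"))  # True
print(skip_read("@r1 1:Y:0:CTTGTA", "@r1 2:N:0:CTTGTA", "both"))    # False
```

For a single-end read, 'either' and 'both' behave identically in this sketch, matching the note that 'both' needs a 'Y' on the only end.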
  --allow-pe-name-mismatch
      Allows accession names of reads to mismatch in paired-end files
  --gunzip
      Uncompress gzipped input files
  --bunzip2
      Uncompress bzip2-compressed input files

Computation options
  --speed=INT
      Speed mode (default = 3)
        Mode  Suffix array  Hash table
        4     On            Off
        3     On            Only if suffix array yields incomplete answers
        2     On            In addition to suffix array
        1     Off           Yes
      Note: There is a tradeoff between speed and accuracy, so slower speed can give better answers. Levels 1 and 2 are about the same speed, while level 3 is about 4 times faster, and level 4 is 5 times faster than level 3. However, accuracy of level 3 is better than level 4 and almost the same as level 2, so mode 3 is generally recommended and is the default. If there is a conflict between this value and the flag --use-sarray, this value takes precedence.
  -B, --batch=INT
      Batch mode (default = 2)
        Mode  Offsets   Positions       Genome          Suffix array
        0     see note  mmap            mmap            mmap
        1     see note  mmap & preload  mmap            mmap
        2     see note  mmap & preload  mmap & preload  mmap & preload
        3     see note  allocate        mmap & preload  mmap & preload  (default)
        4     see note  allocate        allocate        mmap & preload
        5     see note  allocate        allocate        allocate
      Note: For a single sequence, all data structures use mmap. If mmap is not available and allocate is not chosen, then fileio will be used (very slow).
      Note about offsets: Expansion of offsets can be controlled independently by the --expand-offsets flag. However, offsets are accessed relatively fast in this version of GSNAP.
  --use-shared-memory=INT
      If 1 (default), then allocated memory is shared among all processes on this node. If 0, then each process has private allocated memory.
  --preload-shared-memory
      Load files indicated by --batch mode into shared memory for use by other GMAP/GSNAP processes on this node, and then exit. Ignore any input files.
  --unload-shared-memory
      Unload files indicated by --batch mode from shared memory, or allow them to be unloaded when existing GMAP/GSNAP processes on this node are finished with them.
      Ignore any input files.
  --expand-offsets=INT
      Whether to expand the genomic offsets index. Values: 0 (no, default), or 1 (yes). Expansion gives faster alignment, but requires more memory.
  -m, --max-mismatches=FLOAT
      Maximum number of mismatches allowed (if not specified, then defaults to the ultrafast level of ((readlength+index_interval-1)/kmer - 2)). (By default, the genome index interval is 3, but this can be changed by providing a different value for -q to gmap_build when processing the genome.) If specified between 0.0 and 1.0, then treated as a fraction of each read length. Otherwise, treated as an integral number of mismatches (including indel and splicing penalties). For RNA-Seq, you may need to increase this value slightly to align reads extending past the ends of an exon.
  --min-coverage=FLOAT
      Minimum coverage required for an alignment. If specified between 0.0 and 1.0, then treated as a fraction of each read length. Otherwise, treated as an integral number of base pairs. Default value is 0.0.
  --query-unk-mismatch=INT
      Whether to count unknown (N) characters in the query as a mismatch (0=no (default), 1=yes)
  --genome-unk-mismatch=INT
      Whether to count unknown (N) characters in the genome as a mismatch (0=no, 1=yes (default))
  -i, --indel-penalty=INT
      Penalty for an indel (default 2). Counts against mismatches allowed. To find indels, make indel-penalty less than or equal to max-mismatches.
      A value < 2 can lead to false positives at read ends.
  --indel-endlength=INT
      Minimum length at end required for indel alignments (default 4)
  -y, --max-middle-insertions=INT
      Maximum number of middle insertions allowed (default is readlength - indel-endlength)
  -z, --max-middle-deletions=INT
      Maximum number of middle deletions allowed (default 30)
  -Y, --max-end-insertions=INT
      Maximum number of end insertions allowed (default 3)
  -Z, --max-end-deletions=INT
      Maximum number of end deletions allowed (default 6)
  -M, --suboptimal-levels=INT
      Report suboptimal hits beyond best hit (default 0). All hits with a score up to the best score plus suboptimal-levels are reported.
  -a, --adapter-strip=STRING
      Method for removing adapters from reads. Currently allowed values: off, paired. Default is "off". To turn on, specify "paired", which removes adapters from paired-end reads if they appear to be present.
  --trim-mismatch-score=INT
      Score to use for mismatches when trimming at ends. To turn off trimming, specify 0. Default is -3 for both RNA-Seq and DNA-Seq. Warning: Turning trimming off in RNA-Seq can give false positive mismatches at the ends of reads.
  --trim-indel-score=INT
      Score to use for indels when trimming at ends. To turn off trimming, specify 0. Default is -2 for both RNA-Seq and DNA-Seq.
      Warning: Turning trimming off in RNA-Seq can give false positive indels at the ends of reads.
  --end-detail=STRING
      Amount of alignment detail at ends of read: high, medium, or low (default). Note: medium detail could increase speed by 20% or so, but will miss some splices at the ends of reads.
  -V, --snpsdir=STRING
      Directory for SNPs index files (created using snpindex) (default is location of genome index files specified using -D and -d)
  -v, --use-snps=STRING
      Use database containing known SNPs (in <STRING>.iit, built previously using snpindex) for tolerance to SNPs
  --cmetdir=STRING
      Directory for methylcytosine index files (created using cmetindex) (default is location of genome index files specified using -D, -V, and -d)
  --atoidir=STRING
      Directory for A-to-I RNA editing index files (created using atoiindex) (default is location of genome index files specified using -D, -V, and -d)
  --mode=STRING
      Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded, atoi-nonstranded, ttoc-stranded, or ttoc-nonstranded. Non-standard modes require you to have previously run the cmetindex or atoiindex programs (which also cover the ttoc modes) on the genome.
  -t, --nthreads=INT
      Number of worker threads
  --max-anchors=INT
      Controls the number of candidate segments returned by the complete set algorithm. Default is 10. Can be increased to higher values to solve alignments with evenly spaced mismatches at close distances. However, higher values will cause GSNAP to run more slowly. A value of 1000, for example, slows down the program by a factor of 10 or so. Therefore, change this value only if absolutely necessary.

Options for GMAP alignment within GSNAP
  --gmap-mode=STRING
      Cases to use GMAP for complex alignments containing multiple splices or indels. Allowed values: none, all, pairsearch, terminal, improve (or multiple values, separated by commas).
      Default: all, i.e., pairsearch,terminal,improve
  --trigger-score-for-gmap=INT
      Try GMAP pairsearch on nearby genomic regions if best score (the total of both ends if paired-end) exceeds this value (default 5)
  --gmap-min-match-length=INT
      Keep GMAP hit only if it has this many consecutive matches (default 20)
  --gmap-allowance=INT
      Extra mismatch/indel score allowed for GMAP alignments (default 3)
  --max-gmap-pairsearch=INT
      Perform GMAP pairsearch on nearby genomic regions up to this many candidate ends (default 50). Requires pairsearch in --gmap-mode.
  --max-gmap-terminal=INT
      Perform GMAP terminal on nearby genomic regions up to this many candidate ends (default 50). Requires terminal in --gmap-mode.
  --max-gmap-improvement=INT
      Perform GMAP improvement on nearby genomic regions up to this many candidate ends (default 5). Requires improve in --gmap-mode.

Splicing options for DNA-Seq
  --find-dna-chimeras=INT
      Look for distant splicing in DNA-Seq data (0=no (default), 1=yes). (Automatically inactivated for RNA-Seq data if -N or -s are specified.)

Splicing options for RNA-Seq
  -N, --novelsplicing=INT
      Look for novel splicing (0=no (default), 1=yes)
  --splicingdir=STRING
      Directory for splicing involving known sites or known introns, as specified by the -s or --use-splicing flag (default is directory computed from -D and -d flags). Note: can just give full pathname to the -s flag instead.
  -s, --use-splicing=STRING
      Look for splicing involving known sites or known introns (in <STRING>.iit), at short or long distances. See README instructions for the distinction between known sites and known introns.
  --ambig-splice-noclip
      For ambiguous known splicing at ends of the read, do not clip at the splice site, but extend instead into the intron.
      This flag makes sense only if you provide the --use-splicing flag, and you are trying to eliminate all soft clipping with --trim-mismatch-score=0.
  -w, --localsplicedist=INT
      Definition of local novel splicing event (default 200000)
  --novelend-splicedist=INT
      Distance to look for novel splices at the ends of reads (default 50000)
  -e, --local-splice-penalty=INT
      Penalty for a local splice (default 0). Counts against mismatches allowed.
  -E, --distant-splice-penalty=INT
      Penalty for a distant splice (default 1). A distant splice is one where the intron length exceeds the value of -w (--localsplicedist), or one that is an inversion, scramble, or translocation between two different chromosomes. Counts against mismatches allowed.
  -K, --distant-splice-endlength=INT
      Minimum length at end required for distant spliced alignments (default 20; the minimum allowed is the value of -k, the kmer size)
  -l, --shortend-splice-endlength=INT
      Minimum length at end required for short-end spliced alignments (default 2; but unless known splice sites are provided with the -s flag, GSNAP may still need the end length to be the value of -k, the kmer size, to find a given splice)
  --distant-splice-identity=FLOAT
      Minimum identity at end required for distant spliced alignments (default 0.95)
  --antistranded-penalty=INT
      (Not currently implemented, since it leads to poor results) Penalty for antistranded splicing when using stranded RNA-Seq protocols. A positive value, such as 1, expects antisense on the first read and sense on the second read. Default is 0, which treats sense and antisense equally well.
  --merge-distant-samechr
      Report distant splices on the same chromosome as a single splice, if possible. Will produce a single SAM line instead of two SAM lines, which is also done for translocations, inversions, and scramble events.

Options for paired-end reads
  --pairmax-dna=INT
      Max total genomic length for DNA-Seq paired reads, or other reads without splicing (default 1000). Used if -N or -s is not specified.
      This value is also used for circular chromosomes when splicing in linear chromosomes is allowed.
  --pairmax-rna=INT
      Max total genomic length for RNA-Seq paired reads, or other reads that could have a splice (default 200000). Used if -N or -s is specified. Should probably match the value for -w, --localsplicedist.
  --pairexpect=INT
      Expected paired-end length, used for calling splices in medial part of paired-end reads (default 500). Was turned off in previous versions, but reinstated.
  --pairdev=INT
      Allowable deviation from expected paired-end length, used for calling splices in medial part of paired-end reads (default 100). Was turned off in previous versions, but reinstated.

Options for quality scores
  --quality-protocol=STRING
      Protocol for input quality scores. Allowed values:
        illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
        sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)
      Default is sanger (no quality print shift). SAM output files should have quality scores in sanger protocol.
      Or you can customize this behavior with these flags:
  -J, --quality-zero-score=INT
      FASTQ quality scores are zero at this ASCII value (default is 33 for sanger protocol; for Illumina, select 64)
  -j, --quality-print-shift=INT
      Shift FASTQ quality scores by this amount in output (default is 0 for sanger protocol; to change Illumina input to Sanger output, select -31)

Output options
  -n, --npaths=INT
      Maximum number of paths to print (default 100)
  -Q, --quiet-if-excessive
      If more than maximum number of paths are found, then nothing is printed.
  -O, --ordered
      Print output in same order as input (relevant only if there is more than one worker thread)
  --show-refdiff
      For GSNAP output in SNP-tolerant alignment, shows all differences relative to the reference genome as lower case (otherwise, it shows all differences relative to both the reference and alternate genome)
  --clip-overlap
      For paired-end reads whose alignments overlap, clip the overlapping region.
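The -J and -j arithmetic described under the quality-score options can be checked with a small Python sketch. It assumes the score is each character's ASCII value minus -J, and that -j is added to each character's ASCII value on output; that is how the documented equivalences (illumina = -J 64 -j -31, sanger = -J 33 -j 0) line up. PROTOCOLS, quality_scores and shift_for_output are hypothetical names.

```python
# Hypothetical table mirroring the documented equivalences:
#   illumina = -J 64 -j -31, sanger = -J 33 -j 0
PROTOCOLS = {"sanger": (33, 0), "illumina": (64, -31)}


def quality_scores(qual, zero_score):
    """Decode a FASTQ quality string: score = ASCII value - zero_score (-J)."""
    scores = [ord(ch) - zero_score for ch in qual]
    if any(score < 0 for score in scores):
        raise ValueError("quality character below the zero-score ASCII value")
    return scores


def shift_for_output(qual, print_shift):
    """Apply the print shift (-j) to every quality character."""
    return "".join(chr(ord(ch) + print_shift) for ch in qual)


zero_score, print_shift = PROTOCOLS["illumina"]
illumina_qual = "hhhB"  # Phred+64 encoding of scores 40, 40, 40, 2
sanger_qual = shift_for_output(illumina_qual, print_shift)
print(quality_scores(illumina_qual, zero_score))  # [40, 40, 40, 2]
print(sanger_qual)                                # III#
print(quality_scores(sanger_qual, 33))            # [40, 40, 40, 2]
```

Because 64 - 31 = 33, a shift of -31 re-encodes the same scores in the Sanger (ASCII 33-based) range, which is why -J 64 is paired with -j -31.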
  --merge-overlap
      For paired-end reads whose alignments overlap, merge the two ends into a single end (beta implementation)
  --print-snps
      Print detailed information about SNPs in reads (works only if -v also selected) (not fully implemented yet)
  --failsonly
      Print only failed alignments, those with no results
  --nofails
      Exclude printing of failed alignments
  -A, --format=STRING
      Another format type, other than default. Currently implemented: sam, m8 (BLAST tabular format)
  --split-output=STRING
      Basename for multiple-file output, separately for nomapping, halfmapping_uniq, halfmapping_mult, unpaired_uniq, unpaired_mult, paired_uniq, paired_mult, concordant_uniq, and concordant_mult results
  -o, --output-file=STRING
      File name for a single stream of output results.
  --failed-input=STRING
      Print completely failed alignments as input FASTA or FASTQ format to the given file, appending .1 or .2 for paired-end data. If the --split-output flag is also given, this file is generated in addition to the output in the .nomapping file.
  --append-output
      When --split-output or --failed-input is given, this flag will append output to the existing files. Otherwise, the default is to create new files.
  --order-among-best=STRING
      Among alignments tied with the best score, order those alignments in this order. Allowed values: genomic, random (default)
  --output-buffer-size=INT
      Buffer size, in queries, for output thread (default 1000).
      When the number of results to be printed exceeds this size, the worker threads are halted until the backlog is cleared.

Options for SAM output
  --no-sam-headers
      Do not print headers beginning with '@'
  --add-paired-nomappers
      Add nomapper lines as needed to make all paired-end results alternate between first end and second end
  --paired-flag-means-concordant=INT
      Whether the paired bit in the SAM flags means concordant only (1) or paired plus concordant (0, default)
  --sam-headers-batch=INT
      Print headers only for this batch, as specified by -q
  --sam-use-0M
      Insert 0M in CIGAR between adjacent insertions and deletions. Required by Picard, but can cause errors in other tools.
  --sam-multiple-primaries
      Allows multiple alignments to be marked as primary if they have equally good mapping scores
  --force-xs-dir
      For RNA-Seq alignments, disallows XS:A:? when the sense direction is unclear, and replaces this value arbitrarily with XS:A:+. May be useful for some programs, such as Cufflinks, that cannot handle XS:A:?. However, if you use this flag, the reported value of XS:A:+ in these cases will not be meaningful.
  --md-lowercase-snp
      In MD string, when known SNPs are given by the -v flag, prints difference nucleotides as lower-case when they differ from reference but match a known alternate allele
  --extend-soft-clips
      Extends alignments through soft clipped regions
  --action-if-cigar-error
      Action to take if there is a disagreement between CIGAR length and sequence length. Allowed values: ignore, warning, noprint (default), abort
  --read-group-id=STRING
      Value to put into read-group id (RG-ID) field
  --read-group-name=STRING
      Value to put into read-group name (RG-SM) field
  --read-group-library=STRING
      Value to put into read-group library (RG-LB) field
  --read-group-platform=STRING
      Value to put into read-group platform (RG-PL) field

Help options
  --check
      Check compiler assumptions
  --version
      Show version
  --help
      Show this help message

Other tools of GMAP suite are located in /usr/lib/gmap
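Two of the SAM output options concern CIGAR strings: --action-if-cigar-error compares the CIGAR length with the sequence length, and --sam-use-0M inserts 0M between adjacent insertions and deletions. The Python sketch below illustrates both. It is not GSNAP's own code. It assumes the CIGAR length is measured by the query-consuming operations M, I, S, = and X as defined in the SAM specification. parse_cigar, cigar_query_length and insert_zero_m are hypothetical names.

```python
import re

CIGAR_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume read bases (SAM spec)


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs."""
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def cigar_query_length(cigar):
    """Number of read bases implied by the CIGAR."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_OPS)


def insert_zero_m(cigar):
    """Put 0M between directly adjacent I and D operations, in either order."""
    ops = parse_cigar(cigar)
    pieces = []
    for k, (n, op) in enumerate(ops):
        if k > 0 and {ops[k - 1][1], op} == {"I", "D"}:
            pieces.append("0M")
        pieces.append(f"{n}{op}")
    return "".join(pieces)


seq = "ACGT" * 5 + "AC"  # 22 bases
cigar = "10M2I3D10M"
print(cigar_query_length(cigar) == len(seq))  # True: lengths agree
print(insert_zero_m(cigar))                   # 10M2I0M3D10M
```

Deletions (D) and introns (N) do not consume read bases, so they are excluded from the length check. Under the assumption above, they would not trigger a CIGAR/sequence-length disagreement.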