Provided by: gmap_2012-06-12-1ubuntu1_amd64
NAME
gmap - Genomic Mapping and Alignment Program
SYNOPSIS
gmap -dDB|-gFASTA [OPTION]... [QUERY]...
DESCRIPTION
Align the sequences QUERY to the reference, specified with -d or -g. With no QUERY, read standard input.
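The synopsis can also be assembled programmatically. The following is a minimal sketch, not part of GMAP: `build_gmap_argv` is a hypothetical helper, the names `hg19` and `reads.fa` are placeholders, and treating -d and -g as mutually exclusive is an assumption read from the `|` in the synopsis. It also accepts the -q i/n option listed under OPTIONS, so that one query file can be split across several jobs.

```python
from typing import List, Optional, Sequence, Tuple


def build_gmap_argv(
    queries: Sequence[str] = (),
    db: Optional[str] = None,
    gseg: Optional[str] = None,
    genome_dir: Optional[str] = None,
    part: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return an argv list for gmap (hypothetical helper, not part of GMAP)."""
    # Assumption: the synopsis "-dDB|-gFASTA" means exactly one of the two.
    if (db is None) == (gseg is None):
        raise ValueError("specify exactly one of db (-d) or gseg (-g)")
    argv = ["gmap"]
    if genome_dir is not None:
        argv += ["-D", genome_dir]
    argv += ["-d", db] if db is not None else ["-g", gseg]
    if part is not None:
        i, n = part
        # The man page examples (0/100, 99/100) suggest a 0-based index below n.
        if not 0 <= i < n:
            raise ValueError("part index must satisfy 0 <= i < n")
        argv += ["-q", f"{i}/{n}"]
    # With no query files, gmap reads queries from standard input.
    argv += list(queries)
    return argv


# One argv per job for a 4-way split of the same (placeholder) query file.
jobs = [build_gmap_argv(["reads.fa"], db="hg19", part=(i, 4)) for i in range(4)]
```

Each list in `jobs` can be handed to `subprocess.run` or written into a job script; the sketch only builds the arguments and does not run gmap.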
OPTIONS
Input options
  -D, --dir=directory
      Genome directory
  -d, --db=STRING
      Genome database. If argument is '?' (with the quotes), this command lists available databases.
  -k, --kmer=INT
      k-mer size to use in genome database (allowed values: 16 or less). If not specified, the program will find the highest available k-mer size in the genome database
  --basesize=INT
      Base size to use in genome database. If not specified, the program will find the highest available base size in the genome database within selected k-mer size
  --sampling=INT
      Sampling to use in genome database. If not specified, the program will find the smallest available sampling value in the genome database within selected basesize and k-mer size
  -G, --genomefull
      Use full genome (all ASCII chars allowed; built explicitly during setup), not compressed version
  -g, --gseg=filename
      User-supplied genomic segment
  -1, --selfalign
      Align one sequence against itself in FASTA format via stdin (useful for getting protein translation of a nucleotide sequence)
  -2, --pairalign
      Align two sequences in FASTA format via stdin, first one being genomic and second one being cDNA
  --cmdline=STRING,STRING
      Align these two sequences provided on the command line, first one being genomic and second one being cDNA
  -q, --part=INT/INT
      Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for distributing jobs to a computer farm).
  --input-buffer=INT
      Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)

Computation options
  -B, --batch=INT
      Mode  Offsets   Positions       Genome
        0   allocate  mmap            mmap
        1   allocate  mmap & preload  mmap
        2   allocate  mmap & preload  mmap & preload  (default)
        3   allocate  allocate        mmap & preload
        4   allocate  allocate        allocate
        5   expand    allocate        allocate
      Note: For a single sequence, all data structures use mmap.
      If mmap is not available and allocate is not chosen, then fileio is used (very slow)
  --nosplicing
      Turns off splicing (useful for aligning genomic sequences onto a genome)
  --min-intronlength=INT
      Min length for one internal intron (default 9). Below this size, a genomic gap will be considered a deletion rather than an intron.
  -K, --intronlength=INT
      Max length for one internal intron (default 1000000)
  -w, --localsplicedist=INT
      Max length for known splice sites at ends of sequence (default 200000)
  -L, --totallength=INT
      Max total intron length (default 2400000)
  -x, --chimera-margin=INT
      Amount of unaligned sequence that triggers search for the remaining sequence (default 40). Enables alignment of chimeric reads, and may help with some non-chimeric reads. To turn off, set to a large value (greater than the query length).
  -t, --nthreads=INT
      Number of worker threads
  -C, --chrsubsetfile=filename
      User-supplied chromosome subset file
  -c, --chrsubset=string
      Chromosome subset to search
  -z, --direction=STRING
      cDNA direction (sense_force, antisense_force, sense_filter, antisense_filter, or auto (default))
  -H, --trimendexons=INT
      Trim end exons with fewer than given number of matches (in nt, default 12)
  --cross-species
      For cross-species alignments, use a more sensitive search for canonical splicing
  --canonical-mode=INT
      Reward for canonical and semi-canonical introns: 0=low reward, 1=high reward (default), 2=low reward for high-identity sequences and high reward otherwise
  --allow-close-indels=INT
      Allow an insertion and deletion close to each other (0=no, 1=yes (default), 2=only for high-quality alignments)
  --microexon-spliceprob=FLOAT
      Allow microexons only if one of the splice site probabilities is greater than this value (default 0.90)
  --cmetdir=STRING
      Directory for methylcytosine index files (created using cmetindex) (default is location of genome index files specified using -D, -V, and -d)
  --atoidir=STRING
      Directory for A-to-I RNA editing index files (created using atoiindex) (default is location of genome index files specified using -D, -V, and -d)
  --mode=STRING
      Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded, or atoi-nonstranded. Non-standard modes require you to have previously run the cmetindex or atoiindex programs on the genome
  -p, --prunelevel
      Pruning level: 0=no pruning (default), 1=poor seqs, 2=repetitive seqs, 3=poor and repetitive

Output types
  -S, --summary
      Show summary of alignments only
  -A, --align
      Show alignments
  -3, --continuous
      Show alignment in three continuous lines
  -4, --continuous-by-exon
      Show alignment in three lines per exon
  -Z, --compress
      Print output in compressed format
  -E, --exons=STRING
      Print exons ("cdna" or "genomic")
  -P, --protein_dna
      Print protein sequence (cDNA)
  -Q, --protein_gen
      Print protein sequence (genomic)
  -f, --format=INT
      Other format for output (also note the -A and -S options and other options listed under Output types):
        psl (or 1) = PSL (BLAT) format,
        gff3_gene (or 2) = GFF3 gene format,
        gff3_match_cdna (or 3) = GFF3 cDNA_match format,
        gff3_match_est (or 4) = GFF3 EST_match format,
        splicesites (or 6) = splicesites output (for GSNAP splicing file),
        introns = introns output (for GSNAP splicing file),
        map_exons (or 7) = IIT FASTA exon map format,
        map_genes (or 8) = IIT FASTA map format,
        coords (or 9) = coords in table format,
        sampe = SAM format (setting paired_read bit in flag),
        samse = SAM format (without setting paired_read bit)

Output options
  -n, --npaths=INT
      Maximum number of paths to show. If set to 0, prints two paths if chimera detected, else one.
  --quiet-if-excessive
      If more than maximum number of paths are found, then nothing is printed.
  --suboptimal-score=INT
      Report only paths whose score is within this value of the best path. By default, if this option is not provided, the program prints all paths found.
  -O, --ordered
      Print output in same order as input (relevant only if there is more than one worker thread)
  -5, --md5
      Print MD5 checksum for each query sequence
  -o, --chimera-overlap
      Overlap to show, if any, at chimera breakpoint
  --failsonly
      Print only failed alignments, those with no results
  --nofails
      Exclude printing of failed alignments
  --fails-as-input
      Print completely failed alignments as input FASTA or FASTQ format
  -V, --usesnps=STRING
      Use database containing known SNPs (in <STRING>.iit, built previously using snpindex) for reporting output
  --split-output=STRING
      Basename for multiple-file output, separately for nomapping, uniq, mult, (and chimera, if --chimera-margin is selected)
  --output-buffer-size=INT
      Buffer size, in queries, for output thread (default 1000). When the number of results to be printed exceeds this size, the worker threads are halted until the backlog is cleared
  -F, --fulllength
      Assume full-length protein, starting with Met
  --cdsstart=INT
      Translate codons from given nucleotide (1-based)
  -T, --truncate
      Truncate alignment around full-length protein, Met to Stop. Implies -F flag.
  -Y, --tolerant
      Translates cDNA with corrections for frameshifts

Options for SAM output
  --no-sam-headers
      Do not print headers beginning with '@'
  --sam-use-0M
      Insert 0M in CIGAR between adjacent insertions and deletions. Required by Picard, but can cause errors in other tools
  --read-group-id=STRING
      Value to put into read-group id (RG-ID) field
  --read-group-name=STRING
      Value to put into read-group name (RG-SM) field
  --read-group-library=STRING
      Value to put into read-group library (RG-LB) field
  --read-group-platform=STRING
      Value to put into read-group platform (RG-PL) field

Options for quality scores
  --quality-protocol=STRING
      Protocol for input quality scores. Allowed values:
        illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
        sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)
      Default is sanger (no quality print shift). SAM output files should have quality scores in sanger protocol.
      Or you can specify the print shift with this flag:
  -j, --quality-print-shift=INT
      Shift FASTQ quality scores by this amount in output (default is 0 for sanger protocol; to change Illumina input to Sanger output, select -31)

External map file options
  -M, --mapdir=directory
      Map directory
  -m, --map=iitfile
      Map file. If argument is '?' (with the quotes), this lists available map files.
  -e, --mapexons
      Map each exon separately
  -b, --mapboth
      Report hits from both strands of genome
  -u, --flanking=INT
      Show flanking hits (default 0)
  --print-comment
      Show comment line for each hit

Alignment output options
  -N, --nolengths
      No intron lengths in alignment
  -I, --invertmode=INT
      Mode for alignments to genomic (-) strand:
        0=Don't invert the cDNA (default)
        1=Invert cDNA and print genomic (-) strand
        2=Invert cDNA and print genomic (+) strand
  -i, --introngap=INT
      Nucleotides to show on each end of intron (default=3)
  -l, --wraplength=INT
      Wrap length for alignment (default=50)

Help options
  --version
      Show version
  --help
      Show this help message
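The -31 figure quoted for --quality-print-shift follows from the two ASCII ranges given for --quality-protocol: the illumina range starts at 64 and the sanger range at 33. The sketch below only illustrates that arithmetic on a quality string; `shift_quality` is a hypothetical function written for this example and is not GMAP's implementation.

```python
ILLUMINA_OFFSET = 64  # lowest ASCII code listed for the illumina protocol
SANGER_OFFSET = 33    # lowest ASCII code listed for the sanger protocol


def shift_quality(quality: str, shift: int) -> str:
    """Shift every quality character by `shift` ASCII positions."""
    return "".join(chr(ord(ch) + shift) for ch in quality)


# Converting Illumina-encoded input to Sanger-encoded output.
shift = SANGER_OFFSET - ILLUMINA_OFFSET  # -31, the value given for -j
print(shift_quality("hhdB", shift))  # prints "IIE#"
```

With sanger input the two offsets are equal, so the shift is 0 and the quality string is left unchanged, which matches the documented default.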
ENVIRONMENT
GMAPDB genome directory (equivalent to -D)
FILES
~/.gmaprc configuration file
AUTHOR
Thomas D. Wu and Colin K. Watanabe
REPORTING BUGS
Report bugs to Thomas Wu <twu@gene.com>.
COPYRIGHT
Copyright 2005 Genentech, Inc. All rights reserved.
SEE ALSO
gmap_setup(1), gsnap(1) http://research-pub.gene.com/gmap/