Provided by: gmap_2012-06-12-1ubuntu1_amd64

NAME

       gmap - Genomic Mapping and Alignment Program

SYNOPSIS

       gmap -dDB|-gFASTA [OPTION]... [QUERY]...

DESCRIPTION

       Align  the sequences QUERY to the reference, specified with -d or -g.  With no QUERY, read
       standard input.
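
       As an illustration of the invocation pattern in the SYNOPSIS, the following Python
       sketch assembles a gmap argument list from options documented on this page. The
       helper build_gmap_argv is hypothetical and not part of gmap, and the database name,
       genome directory, and query file name are placeholders. The sketch only builds the
       list; it does not run the program.

```python
def build_gmap_argv(db, queries, genome_dir=None, nthreads=None, fmt=None):
    """Assemble an argument list for gmap (hypothetical helper, not part of gmap)."""
    argv = ["gmap"]
    if genome_dir is not None:
        argv += ["-D", genome_dir]      # -D, --dir: genome directory
    argv += ["-d", db]                  # -d, --db: genome database
    if nthreads is not None:
        argv += ["-t", str(nthreads)]   # -t, --nthreads: number of worker threads
    if fmt is not None:
        argv += ["-f", fmt]             # -f, --format: for example "samse" or "psl"
    argv += list(queries)               # with no QUERY, gmap reads standard input
    return argv


# Placeholder database, directory, and file names, for illustration only.
argv = build_gmap_argv("mygenome", ["reads.fa"], genome_dir="/data/gmapdb",
                       nthreads=4, fmt="samse")
print(" ".join(argv))
```

       A list built this way could be handed to Python's subprocess.run on a system where
       gmap and the named genome database are installed.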

OPTIONS

   Input options
       -D, --dir=directory
              Genome directory

       -d, --db=STRING
              Genome database. If argument is '?' (with the quotes), this command lists available
              databases.

       -k, --kmer=INT
              kmer  size  to  use  in  genome  database  (allowed  values:  16  or less).  If not
              specified, the program will find the highest available  kmer  size  in  the  genome
              database

       --basesize=INT
              Base  size  to use in genome database.  If not specified, the program will find the
              highest available base size in the genome database within selected k-mer size

       --sampling=INT
              Sampling to use in genome database.  If not specified, the program  will  find  the
              smallest  available  sampling value in the genome database within selected basesize
              and k-mer size

       -G, --genomefull
              Use full genome (all ASCII chars  allowed;  built  explicitly  during  setup),  not
              compressed version

       -g, --gseg=filename
              User-supplied genomic segment

       -1, --selfalign
              Align one sequence against itself in FASTA format via stdin (useful for getting
              protein translation of a nucleotide sequence)

       -2, --pairalign
              Align two sequences in FASTA format via stdin, first one being genomic  and  second
              one being cDNA

       --cmdline=STRING,STRING
              Align these two sequences provided on the command line, first one being genomic and
              second one being cDNA

       -q, --part=INT/INT
              Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for
              distributing jobs to a computer farm).

       --input-buffer=INT
              Size  of  input buffer (program reads this many sequences at a time for efficiency)
              (default 1000)
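
       The partitioning described for -q, --part can be sketched in Python as follows.
       This is an assumption-laden illustration, not gmap's implementation: it assumes
       that i is 0-based (consistent with the 0/100 and 99/100 examples) and that a
       sequence belongs to part i when its input index modulo n equals i. The function
       name select_part and the query names are hypothetical.

```python
def select_part(sequences, i, n):
    """Keep the i-th out of every n sequences (assumed 0-based, index modulo n)."""
    if not 0 <= i < n:
        raise ValueError("i must satisfy 0 <= i < n")
    return [seq for index, seq in enumerate(sequences) if index % n == i]


# Ten hypothetical query names split across three jobs: 0/3, 1/3, and 2/3.
queries = ["q%d" % k for k in range(10)]
parts = [select_part(queries, i, 3) for i in range(3)]
print(parts)
```

       Under these assumptions every query lands in exactly one part, so n jobs run with
       parts 0/n through (n-1)/n together cover the whole input.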

   Computation options
       -B, --batch=INT
              Batch mode, which selects how the offsets, positions, and genome are loaded:

               Mode     Offsets       Positions       Genome
                 0      allocate      mmap            mmap
                 1      allocate      mmap & preload  mmap
                 2      allocate      mmap & preload  mmap & preload (default)
                 3      allocate      allocate        mmap & preload
                 4      allocate      allocate        allocate
                 5      expand        allocate        allocate

              Note: For a single sequence, all data structures use mmap.  If mmap is not
              available and allocate is not chosen, the program uses fileio (very slow).

       --nosplicing
              Turns off splicing (useful for aligning genomic sequences onto a genome)

       --min-intronlength=INT
              Min  length  for  one  internal intron (default 9).  Below this size, a genomic gap
              will be considered a deletion rather than an intron.
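
       The rule stated for --min-intronlength can be written as a small Python sketch. The
       function name classify_genomic_gap is hypothetical, and the sketch covers only the
       lower threshold described above; it ignores the maximum lengths set by -K and -L.

```python
DEFAULT_MIN_INTRONLENGTH = 9  # default stated for --min-intronlength


def classify_genomic_gap(gap_length, min_intronlength=DEFAULT_MIN_INTRONLENGTH):
    """Label a genomic gap using the rule described for --min-intronlength."""
    if gap_length < min_intronlength:
        return "deletion"   # below the minimum: treated as a deletion
    return "intron"         # at or above the minimum: treated as an intron


for length in (5, 9, 250):
    print(length, classify_genomic_gap(length))
```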

       -K, --intronlength=INT
              Max length for one internal intron (default 1000000)

       -w, --localsplicedist=INT
              Max length for known splice sites at ends of sequence (default 200000)

       -L, --totallength=INT
              Max total intron length (default 2400000)

       -x, --chimera-margin=INT
              Amount of unaligned sequence  that  triggers  search  for  the  remaining  sequence
              (default  40).   Enables  alignment  of chimeric reads, and may help with some non-
              chimeric reads. To turn off, set to a large value (greater than the query length).

       -t, --nthreads=INT
              Number of worker threads

       -C, --chrsubsetfile=filename
              User-supplied chromosome subset file

       -c, --chrsubset=string
              Chromosome subset to search

       -z, --direction=STRING
              cDNA direction (sense_force, antisense_force,  sense_filter,  antisense_filter,  or
              auto (default))

       -H, --trimendexons=INT
              Trim end exons with fewer than given number of matches (in nt, default 12)

       --cross-species
              For cross-species alignments, use a more sensitive search for canonical splicing

       --canonical-mode=INT
              Reward for canonical and semi-canonical introns: 0=low reward, 1=high reward
              (default), 2=low reward for high-identity sequences and high reward otherwise

       --allow-close-indels=INT
              Allow an insertion and deletion close to each other (0=no, 1=yes (default),  2=only
              for high-quality alignments)

       --microexon-spliceprob=FLOAT
              Allow  microexons only if one of the splice site probabilities is greater than this
              value (default 0.90)

       --cmetdir=STRING
              Directory for methylcytosine index files  (created  using  cmetindex)  (default  is
              location of genome index files specified using -D, -V, and -d)

       --atoidir=STRING
              Directory  for A-to-I RNA editing index files (created using atoiindex) (default is
              location of genome index files specified using -D, -V, and -d)

       --mode=STRING
              Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded,
              or  atoi-nonstranded.   Non-standard  modes require you to have previously run the
              cmetindex or atoiindex programs on the genome

       -p, --prunelevel
              Pruning level: 0=no pruning (default), 1=poor seqs, 2=repetitive seqs,  3=poor  and
              repetitive

   Output types
       -S, --summary
              Show summary of alignments only

       -A, --align
              Show alignments

       -3, --continuous
              Show alignment in three continuous lines

       -4, --continuous-by-exon
              Show alignment in three lines per exon

       -Z, --compress
              Print output in compressed format

       -E, --exons=STRING
              Print exons ("cdna" or "genomic")

       -P, --protein_dna
              Print protein sequence (cDNA)

       -Q, --protein_gen
              Print protein sequence (genomic)

       -f, --format=INT
              Other  format  for output (also note the -A and -S options and other options listed
              under Output types):
               psl (or 1)= PSL (BLAT) format,
               gff3_gene (or 2)= GFF3 gene format,
               gff3_match_cdna (or 3)= GFF3 cDNA_match format,
               gff3_match_est (or 4) = GFF3 EST_match format,
               splicesites (or 6) = splicesites output (for GSNAP splicing file),
               introns = introns output (for GSNAP splicing file),
               map_exons (or 7) = IIT FASTA exon map format,
               map_genes (or 8) = IIT FASTA map format,
               coords (or 9) = coords in table format,
               sampe = SAM format (setting paired_read bit in flag),
               samse = SAM format (without setting paired_read bit)

   Output options
       -n, --npaths=INT
              Maximum number of paths to show. If set to 0, prints two paths if chimera detected,
              else one.

       --quiet-if-excessive
              If more than maximum number of paths are found, then nothing is printed.

       --suboptimal-score=INT
              Report only paths whose score is within this value of the best path. By default, if
              this option is not provided, the program prints all paths found.
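
       One way to picture the --suboptimal-score filter is the Python sketch below. It
       rests on assumptions not stated on this page: that a higher score is better, and
       that "within this value" is inclusive. The function name filter_paths and the
       scores are hypothetical.

```python
def filter_paths(scores, suboptimal_score=None):
    """Keep path scores within `suboptimal_score` of the best (assumed highest) score."""
    scores = list(scores)
    if suboptimal_score is None or not scores:
        return scores                   # option not provided: report all paths found
    best = max(scores)                  # assumption: higher score means better path
    return [s for s in scores if best - s <= suboptimal_score]


# Hypothetical path scores, for illustration only.
print(filter_paths([100, 98, 90], suboptimal_score=5))
print(filter_paths([100, 98, 90]))
```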

       -O, --ordered
              Print output in same order as input (relevant only if there is more than one worker
              thread)

       -5, --md5
              Print MD5 checksum for each query sequence

       -o, --chimera-overlap
              Overlap to show, if any, at chimera breakpoint

       --failsonly
              Print only failed alignments, those with no results

       --nofails
              Exclude printing of failed alignments

       --fails-as-input
              Print completely failed alignments as input FASTA or FASTQ format

       -V, --usesnps=STRING
              Use  database  containing  known  SNPs  (in  <STRING>.iit,  built  previously using
              snpindex) for reporting output

       --split-output=STRING
              Basename for multiple-file output,  separately  for  nomapping,  uniq,  mult,  (and
              chimera, if --chimera-margin is selected)

       --output-buffer-size=INT
              Buffer  size,  in  queries,  for  output  thread (default 1000). When the number of
              results to be printed exceeds this size, the worker threads are  halted  until  the
              backlog is cleared

       -F, --fulllength
              Assume full-length protein, starting with Met

       --cdsstart=INT
              Translate codons from given nucleotide (1-based)

       -T, --truncate
              Truncate alignment around full-length protein, Met to Stop.  Implies -F flag.

       -Y, --tolerant
              Translates cDNA with corrections for frameshifts

   Options for SAM output
       --no-sam-headers
              Do not print headers beginning with '@'

       --sam-use-0M
              Insert 0M in CIGAR between adjacent insertions and deletions.  Required by Picard,
              but can cause errors in other tools

       --read-group-id=STRING
              Value to put into read-group id (RG-ID) field

       --read-group-name=STRING
              Value to put into read-group name (RG-SM) field

       --read-group-library=STRING
              Value to put into read-group library (RG-LB) field

       --read-group-platform=STRING
              Value to put into read-group platform (RG-PL) field

   Options for quality scores
       --quality-protocol=STRING
              Protocol for input quality scores. Allowed values:
               illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
               sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)

              Default is sanger (no quality print shift).  SAM output files should have quality
              scores in sanger protocol.  Or you can specify the print shift with this flag:

       -j, --quality-print-shift=INT
              Shift  FASTQ  quality  scores  by  this  amount  in output (default is 0 for sanger
              protocol; to change Illumina input to Sanger output, select -31)
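
       The effect of a print shift of -31 on a quality string can be illustrated with a
       short Python sketch. The function name shift_quality and the sample string are
       hypothetical; the sketch simply moves each character by the given number of ASCII
       positions, which maps the Illumina base of 64 onto the Sanger base of 33.

```python
def shift_quality(quality, shift):
    """Shift each FASTQ quality character by `shift` ASCII positions."""
    return "".join(chr(ord(ch) + shift) for ch in quality)


# Hypothetical Illumina-encoded quality string (ASCII 64 and up).
illumina = "@Jh"
sanger = shift_quality(illumina, -31)
print(sanger)  # prints "!+I" (ASCII 64, 74, 104 become 33, 43, 73)
```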

   External map file options
       -M, --mapdir=directory
              Map directory

       -m, --map=iitfile
              Map file. If argument is '?' (with the quotes), this lists available map files.

       -e, --mapexons
              Map each exon separately

       -b, --mapboth
              Report hits from both strands of genome

       -u, --flanking=INT
              Show flanking hits (default 0)

       --print-comment
              Show comment line for each hit

   Alignment output options
       -N, --nolengths
              No intron lengths in alignment

       -I, --invertmode=INT
              Mode for alignments to genomic (-) strand:
               0=Don't invert the cDNA (default)
               1=Invert cDNA and print genomic (-) strand
               2=Invert cDNA and print genomic (+) strand

       -i, --introngap=INT
              Nucleotides to show on each end of intron (default=3)

       -l, --wraplength=INT
              Wrap length for alignment (default=50)

   Help options
       --version
              Show version

       --help Show this help message

ENVIRONMENT

       GMAPDB genome directory (equivalent to -D)

FILES

       ~/.gmaprc
              configuration file

AUTHOR

       Thomas D. Wu and Colin K. Watanabe

REPORTING BUGS

       Report bugs to Thomas Wu <twu@gene.com>.

COPYRIGHT

       Copyright 2005 Genentech, Inc. All rights reserved.

SEE ALSO

       gmap_setup(1), gsnap(1)
       http://research-pub.gene.com/gmap/