Ubuntu Manpage: gmap - Genomic Mapping and Alignment Program

Provided by: gmap_2012-06-12-1ubuntu1_amd64

NAME

       gmap - Genomic Mapping and Alignment Program

SYNOPSIS

       gmap -dDB|-gFASTA [OPTION]... [QUERY]...

DESCRIPTION

       Align the sequences QUERY to the reference, specified with -d or -g.  With no QUERY, read standard input.
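
       A minimal usage sketch of the three input routes described above. The database name
       "mygenome" and the file names "cdna.fa" and "segment.fa" are hypothetical, and the sketch
       assumes the database was built beforehand. The functions only wrap the invocation; nothing
       runs until one of them is called.

```sh
# Hypothetical names throughout: database "mygenome", files "cdna.fa", "segment.fa".

# Align a query file against a prebuilt genome database (-d).
align_db() {
  gmap -d "$1" "$2"
}

# Align a query file against a user-supplied genomic segment (-g).
align_segment() {
  gmap -g "$1" "$2"
}

# With no QUERY argument, queries are read from standard input.
align_stdin() {
  gmap -d "$1" < "$2"
}

# Usage: align_db mygenome cdna.fa > cdna.out
```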

OPTIONS

   Input options
       -D, --dir=directory
              Genome directory

       -d, --db=STRING
              Genome database. If argument is '?' (with the quotes), this command lists available databases.

       -k, --kmer=INT
              kmer  size  to use in genome database (allowed values: 16 or less).  If not specified, the program
              will find the highest available kmer size in the genome database

       --basesize=INT
              Base size to use in genome database.   If  not  specified,  the  program  will  find  the  highest
              available base size in the genome database within selected k-mer size

       --sampling=INT
              Sampling  to  use  in  genome  database.   If  not  specified,  the program will find the smallest
              available sampling value in the genome database within selected basesize and k-mer size

       -G, --genomefull
              Use full genome (all ASCII chars allowed; built explicitly during setup), not compressed version

       -g, --gseg=filename
              User-supplied genomic segment

       -1, --selfalign
              Align one sequence  against  itself  in  FASTA  format  via  stdin  (Useful  for  getting  protein
              translation of a nucleotide sequence)

       -2, --pairalign
              Align two sequences in FASTA format via stdin, first one being genomic and second one being cDNA

       --cmdline=STRING,STRING
              Align  these  two  sequences  provided on the command line, first one being genomic and second one
              being cDNA

       -q, --part=INT/INT
              Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for
              distributing jobs to a computer farm).
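
              One way to split a run into n slices can be sketched as below. This is a dry run
              that only prints the commands; the database name "mygenome", the query file
              "queries.fa", and the output names "part_i.out" are hypothetical. The slice index
              runs from 0 to n-1, matching the 0/100 and 99/100 examples above.

```sh
# Print one gmap command per slice; nothing is executed here.
# Hypothetical names: database "mygenome", query file "queries.fa".
part_commands() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    echo "gmap -d mygenome -q $i/$n queries.fa > part_$i.out"
    i=$((i + 1))
  done
}

part_commands 4
```

              Each printed line can then be submitted as a separate job by whatever scheduler
              is in use.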

       --input-buffer=INT
              Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)

   Computation options
       -B, --batch=INT
               Mode     Offsets       Positions       Genome
                 0      allocate      mmap            mmap
                 1      allocate      mmap & preload  mmap
                 2      allocate      mmap & preload  mmap & preload (default)
                 3      allocate      allocate        mmap & preload
                 4      allocate      allocate        allocate
                 5      expand        allocate        allocate

              Note: For a single sequence, all data structures use mmap.  If mmap is not available and
              allocate is not chosen, the program falls back to fileio (very slow)

       --nosplicing
              Turns off splicing (useful for aligning genomic sequences onto a genome)

       --min-intronlength=INT
              Min length for one internal intron (default 9).  Below this size, a genomic gap will be considered
              a deletion rather than an intron.

       -K, --intronlength=INT
              Max length for one internal intron (default 1000000)

       -w, --localsplicedist=INT
              Max length for known splice sites at ends of sequence (default 200000)

       -L, --totallength=INT
              Max total intron length (default 2400000)

       -x, --chimera-margin=INT
              Amount of unaligned sequence that  triggers  search  for  the  remaining  sequence  (default  40).
              Enables  alignment  of chimeric reads, and may help with some non-chimeric reads. To turn off, set
              to a large value (greater than the query length).

       -t, --nthreads=INT
              Number of worker threads

       -C, --chrsubsetfile=filename
              User-supplied chromosome subset file

       -c, --chrsubset=string
              Chromosome subset to search

       -z, --direction=STRING
              cDNA direction (sense_force, antisense_force, sense_filter, antisense_filter, or auto (default))

       -H, --trimendexons=INT
              Trim end exons with fewer than given number of matches (in nt, default 12)

       --cross-species
              For cross-species alignments, use a more sensitive search for canonical splicing

       --canonical-mode=INT
              Reward for canonical and semi-canonical introns: 0=low reward, 1=high reward (default),
              2=low reward for high-identity sequences and high reward otherwise

       --allow-close-indels=INT
              Allow  an  insertion  and  deletion  close  to each other (0=no, 1=yes (default), 2=only for high-
              quality alignments)

       --microexon-spliceprob=FLOAT
              Allow microexons only if one of the splice site probabilities is greater than this value  (default
              0.90)

       --cmetdir=STRING
              Directory  for methylcytosine index files (created using cmetindex) (default is location of genome
              index files specified using -D, -V, and -d)

       --atoidir=STRING
              Directory for A-to-I RNA editing index files (created using atoiindex)  (default  is  location  of
              genome index files specified using -D, -V, and -d)

       --mode=STRING
              Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded, or
              atoi-nonstranded.  Non-standard modes require you to have previously run the cmetindex or
              atoiindex programs on the genome

       -p, --prunelevel
              Pruning level: 0=no pruning (default), 1=poor seqs, 2=repetitive seqs, 3=poor and repetitive

   Output types
       -S, --summary
              Show summary of alignments only

       -A, --align
              Show alignments

       -3, --continuous
              Show alignment in three continuous lines

       -4, --continuous-by-exon
              Show alignment in three lines per exon

       -Z, --compress
              Print output in compressed format

       -E, --exons=STRING
              Print exons ("cdna" or "genomic")

       -P, --protein_dna
              Print protein sequence (cDNA)

       -Q, --protein_gen
              Print protein sequence (genomic)

       -f, --format=INT
              Other  format  for  output  (also note the -A and -S options and other options listed under Output
              types):
               psl (or 1) = PSL (BLAT) format,
               gff3_gene (or 2) = GFF3 gene format,
               gff3_match_cdna (or 3) = GFF3 cDNA_match format,
               gff3_match_est (or 4) = GFF3 EST_match format,
               splicesites (or 6) = splicesites output (for GSNAP splicing file),
               introns = introns output (for GSNAP splicing file),
               map_exons (or 7) = IIT FASTA exon map format,
               map_genes (or 8) = IIT FASTA map format,
               coords (or 9) = coords in table format,
               sampe = SAM format (setting paired_read bit in flag),
               samse = SAM format (without setting paired_read bit)
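
              A short sketch of selecting an output format. Per the list above, most formats
              can be given either by name or by number, so the two functions below request the
              same GFF3 gene output. The function names are hypothetical, as are any database
              and query names passed to them.

```sh
# $1 = genome database, $2 = query file (both supplied by the caller).

# Request GFF3 gene output by name.
to_gff3_by_name() {
  gmap -d "$1" -f gff3_gene "$2"
}

# Request the same format by its number from the list.
to_gff3_by_number() {
  gmap -d "$1" -f 2 "$2"
}

# Usage: to_gff3_by_name mygenome cdna.fa > cdna.gff3
```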

   Output options
       -n, --npaths=INT
              Maximum number of paths to show. If set to 0, prints two paths if chimera detected, else one.

       --quiet-if-excessive
              If more than maximum number of paths are found, then nothing is printed.

       --suboptimal-score=INT
              Report only paths whose score is within this value of the best path. By default, if this option is
              not provided, the program prints all paths found.

       -O, --ordered
              Print output in same order as input (relevant only if there is more than one worker thread)
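
              A sketch of combining -O with multiple worker threads (-t), which is the only
              case where -O has an effect. The function name and the argument values in the
              usage line are hypothetical.

```sh
# $1 = genome database, $2 = number of worker threads, $3 = query file.
align_threaded_ordered() {
  # -t sets the worker thread count; -O keeps output in input order.
  gmap -d "$1" -t "$2" -O "$3"
}

# Usage: align_threaded_ordered mygenome 8 queries.fa > queries.out
```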

       -5, --md5
              Print MD5 checksum for each query sequence

       -o, --chimera-overlap
              Overlap to show, if any, at chimera breakpoint

       --failsonly
              Print only failed alignments, those with no results

       --nofails
              Exclude printing of failed alignments

       --fails-as-input
              Print completely failed alignments as input FASTA or FASTQ format

       -V, --usesnps=STRING
              Use database containing  known  SNPs  (in  <STRING>.iit,  built  previously  using  snpindex)  for
              reporting output

       --split-output=STRING
              Basename  for  multiple-file  output,  separately  for  nomapping,  uniq,  mult,  (and chimera, if
              --chimera-margin is selected)

       --output-buffer-size=INT
              Buffer size, in queries, for output thread (default 1000).  When  the  number  of  results  to  be
              printed exceeds this size, the worker threads are halted until the backlog is cleared

       -F, --fulllength
              Assume full-length protein, starting with Met

       --cdsstart=INT
              Translate codons from given nucleotide (1-based)

       -T, --truncate
              Truncate alignment around full-length protein, Met to Stop.  Implies -F flag.

       -Y, --tolerant
              Translates cDNA with corrections for frameshifts

   Options for SAM output
       --no-sam-headers
              Do not print headers beginning with '@'

       --sam-use-0M
              Insert 0M in CIGAR between adjacent insertions and deletions.  Required by Picard, but can
              cause errors in other tools

       --read-group-id=STRING
              Value to put into read-group id (RG-ID) field

       --read-group-name=STRING
              Value to put into read-group name (RG-SM) field

       --read-group-library=STRING
              Value to put into read-group library (RG-LB) field

       --read-group-platform=STRING
              Value to put into read-group platform (RG-PL) field
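
              The four read-group options can be combined with SAM output as sketched below.
              The function name and all argument values in the usage line are hypothetical;
              samse is used here on the assumption that the queries are single-end.

```sh
# $1 = genome database, $2 = query file,
# $3 = RG-ID, $4 = RG-SM, $5 = RG-LB, $6 = RG-PL.
sam_with_read_group() {
  gmap -d "$1" -f samse \
    --read-group-id="$3" \
    --read-group-name="$4" \
    --read-group-library="$5" \
    --read-group-platform="$6" \
    "$2"
}

# Usage: sam_with_read_group mygenome reads.fa rg1 sampleA libA illumina > reads.sam
```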

   Options for quality scores
       --quality-protocol=STRING
              Protocol for input quality scores. Allowed values:
               illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
               sanger   (ASCII 33-126) (equivalent to -J 33 -j 0)

              Default is sanger (no quality print shift).  SAM output files should have quality scores in
              sanger protocol.  Or you can specify the print shift with this flag:

       -j, --quality-print-shift=INT
              Shift  FASTQ  quality scores by this amount in output (default is 0 for sanger protocol; to change
              Illumina input to Sanger output, select -31)
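
              The -31 figure follows from the two ASCII ranges listed under --quality-protocol:
              the Illumina range starts at 64 and the Sanger range at 33, so the shift is
              33 - 64 = -31. The helper below is an illustration only (it is not part of gmap)
              and applies a shift to a single quality character.

```sh
# Shift one quality character by a signed amount and print the result.
# Illustrative helper only; gmap applies the shift itself via -j.
shift_qual() {
  code=$(printf '%d' "'$1")   # ASCII code of the input character
  printf "\\$(printf '%03o' $((code + $2)))"
}

shift_qual h -31   # Illumina 'h' (ASCII 104) becomes Sanger 'I' (ASCII 73)
```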

   External map file options
       -M, --mapdir=directory
              Map directory

       -m, --map=iitfile
              Map file. If argument is '?' (with the quotes), this lists available map files.

       -e, --mapexons
              Map each exon separately

       -b, --mapboth
              Report hits from both strands of genome

       -u, --flanking=INT
              Show flanking hits (default 0)

       --print-comment
              Show comment line for each hit

   Alignment output options
       -N, --nolengths
              No intron lengths in alignment

       -I, --invertmode=INT
              Mode for alignments to genomic (-) strand:
               0=Don't invert the cDNA (default)
               1=Invert cDNA and print genomic (-) strand
               2=Invert cDNA and print genomic (+) strand

       -i, --introngap=INT
              Nucleotides to show on each end of intron (default=3)

       -l, --wraplength=INT
              Wrap length for alignment (default=50)

   Help options
       --version
              Show version

       --help Show this help message

ENVIRONMENT

       GMAPDB genome directory (equivalent to -D)
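
       A sketch of setting the variable once instead of passing -D on every call. The directory
       path is hypothetical. The listing function relies on the -d '?' form described under Input
       options, and assumes that the listing honors GMAPDB in the same way it honors -D.

```sh
# Hypothetical directory; replace with the location of your genome databases.
GMAPDB=/data/gmapdb
export GMAPDB

# List the databases found in the genome directory.
list_databases() {
  # '?' is quoted so the shell does not treat it as a glob pattern.
  gmap -d '?'
}
```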

FILES

       ~/.gmaprc
              configuration file

AUTHOR

       Thomas D. Wu and Colin K. Watanabe

REPORTING BUGS

       Report bugs to Thomas Wu <twu@gene.com>.

COPYRIGHT

       Copyright 2005 Genentech, Inc. All rights reserved.
