Ubuntu Manpage: tigr-glimmer — Find/Score potential genes in genome-file using the probability model in icm-file

NAME

       tigr-glimmer — Find/Score potential genes in genome-file using the probability model in icm-file

SYNOPSIS

       tigr-glimmer3 [genome-file] [icm-file] [options]

DESCRIPTION

tigr-glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria and
archaea. tigr-glimmer (Gene Locator and Interpolated Markov Modeler) uses interpolated Markov models
(IMMs) to identify the coding regions and distinguish them from noncoding DNA. The IMM approach,
described in our Nucleic Acids Research paper on tigr-glimmer 1.0 and in our subsequent paper on
tigr-glimmer 2.0, uses a combination of Markov models from 1st through 8th order, weighting each model
according to its predictive power. tigr-glimmer 1.0 and 2.0 use 3-periodic nonhomogeneous Markov models in
their IMMs.
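
The idea of blending Markov models of several orders can be illustrated with a minimal Python sketch. This is not the tigr-glimmer implementation: the function names are hypothetical, the weighting rule (trust a context more the more often it was seen) is a placeholder assumption, and the sketch ignores the 3-periodic (codon position) structure mentioned above.

```python
from collections import defaultdict

def train_counts(sequences, max_order):
    """Count next-base occurrences for every context of length 0..max_order."""
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order + 1)]
    for seq in sequences:
        for i, base in enumerate(seq):
            for k in range(max_order + 1):
                if i >= k:
                    counts[k][seq[i - k:i]][base] += 1
    return counts

def interpolated_prob(counts, context, base, alphabet="acgt"):
    """Blend estimates from order 0 up to the longest usable order.

    Assumption: weight = n / (n + 1), where n is how often the context was
    seen in training. This is an illustrative rule, not Glimmer's own.
    """
    prob = 1.0 / len(alphabet)  # start from a uniform estimate
    for k in range(len(counts)):
        if k > len(context):
            break
        ctx = context[len(context) - k:]  # last k bases of the context
        seen = counts[k].get(ctx)
        if not seen:
            break  # context never observed; keep the lower-order estimate
        total = sum(seen.values())
        weight = total / (total + 1.0)
        prob = weight * (seen.get(base, 0) / total) + (1.0 - weight) * prob
    return prob

counts = train_counts(["acgtacgtacgt"], max_order=2)
p_g = interpolated_prob(counts, "ac", "g")
p_t = interpolated_prob(counts, "ac", "t")
```

In this toy training set "ac" is always followed by "g", so p_g comes out larger than p_t, and the four base probabilities for any context still sum to 1.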

tigr-glimmer is the primary microbial gene finder at TIGR, and has been used to annotate the complete
genomes of B. burgdorferi (Fraser et al., Nature, Dec. 1997), T. pallidum (Fraser et al., Science, July
1998), T. maritima, D. radiodurans, M. tuberculosis, and non-TIGR projects including C. trachomatis, C.
pneumoniae, and others. Its analyses of some of these genomes and others are available at the TIGR
microbial database site.

A special version of tigr-glimmer designed for small eukaryotes, GlimmerM, was used to find the genes in
chromosome 2 of the malaria parasite, P. falciparum. GlimmerM is described in S.L. Salzberg, M. Pertea,
A.L. Delcher, M.J. Gardner, and H. Tettelin, "Interpolated Markov models for eukaryotic gene finding,"
Genomics 59 (1999), 24-31. The GlimmerM site (http://www.tigr.org/software/glimmerm/) includes
information on how to download the GlimmerM system.

The tigr-glimmer system consists of two main programs. The first of these is the training program,
build-imm. This program takes a set of input sequences, builds an IMM from them, and outputs it. These
sequences can be complete genes or just partial orfs. For a new genome, this training data can consist of
those genes with strong database hits as well as very long open reading frames that are statistically
almost certain to be genes. The second program is glimmer, which uses this IMM to identify putative genes
in an entire genome. tigr-glimmer automatically resolves conflicts between most overlapping genes by
choosing one of them. It also identifies genes that are suspected to truly overlap, and flags these for
closer inspection by the user. These "suspect" gene candidates have been a very small percentage of the
total for all the genomes analyzed thus far.
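
As a rough illustration of how very long open reading frames might be collected as training data, the following Python sketch scans the forward strand in all three frames. It is a simplification under stated assumptions: the function name and the 500-base default are hypothetical, only atg is treated as a start codon, and the reverse strand and circular wraparound are ignored.

```python
STOP_CODONS = {"taa", "tag", "tga"}

def long_orfs(genome, min_length=500):
    """Return (start, end, frame) for forward-strand ORFs of at least min_length bases.

    start is the 0-based index of the first atg after the previous stop codon;
    end is the index just past the closing stop codon.
    """
    genome = genome.lower()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(genome) - 2, 3):
            codon = genome[i:i + 3]
            if start is None and codon == "atg":
                start = i  # first start codon in this open stretch
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_length:
                    found.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return found

# Tiny example: one 36-base ORF in frame 2.
toy_genome = "cc" + "atg" + "gct" * 10 + "taa" + "gg"
orfs = long_orfs(toy_genome, min_length=30)
```

The sequences genome[start:end] for each reported ORF could then be written out as the training set; how that file is passed to the training program is not specified here.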

OPTIONS

-C n Use n as GC percentage of independent model

Note: n should be a percentage, e.g., -C 45.2

-f Use ribosome-binding energy to choose start codon

+f Use first codon in orf as start codon

-g n Set minimum gene length to n

-i filename
Use filename to select regions of bases that are off limits, so that no bases within that
area will be examined

-l Assume linear rather than circular genome, i.e., no wraparound

-L filename
Use filename to specify a list of orfs that should be scored separately, with no overlap rules

-M Input is a multifasta file of separate genes to be scored separately, with no overlap rules

-o n Set minimum overlap length to n. Overlaps shorter than this are ignored.

-p n Set minimum overlap percentage to n%. Overlaps shorter than this percentage of *both* strings
are ignored.

-q n Set the maximum length orf that can be rejected because of the independent probability score
column to (n - 1)

-r Don't use independent probability score column

+r Use independent probability score column

-s s Use string s as the ribosome binding pattern to find start codons.

+S Use stricter independent intergenic model that doesn't give probabilities to in-frame stop
codons. (Option is obsolete since this is now the only behaviour.)

-t n Set threshold score for calling as gene to n. If the in-frame score >= n, then the region is
given a number and considered a potential gene.

-w n Use "weak" scores on tentative genes n or longer. Weak scores ignore the independent
probability score.
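
A small Python helper can assemble a command line from the SYNOPSIS and a few of the options listed above. This is a sketch only: the helper names and file names are hypothetical, the executable name and argument order are taken from the SYNOPSIS, and nothing is assumed about where the program writes its results.

```python
import subprocess

def glimmer_command(genome_file, icm_file, min_gene_length=None,
                    linear=False, threshold=None):
    """Build an argument list using only options documented in this page."""
    argv = ["tigr-glimmer3", genome_file, icm_file]
    if min_gene_length is not None:
        argv += ["-g", str(min_gene_length)]  # -g n: minimum gene length
    if linear:
        argv.append("-l")                     # -l: linear genome, no wraparound
    if threshold is not None:
        argv += ["-t", str(threshold)]        # -t n: threshold score
    return argv

def run_glimmer(genome_file, icm_file, **options):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(glimmer_command(genome_file, icm_file, **options),
                          check=True)

# Hypothetical file names, shown without executing anything.
cmd = glimmer_command("genome.fasta", "model.icm", min_gene_length=90, linear=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with unusual file names.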

AUTHOR

       This manual page was quickly copied from the glimmer web site by Steffen Moeller <moeller@debian.org> for
       the Debian system.

                                                                                                 TIGR-GLIMMER(1)
