Ubuntu Manpage: tigr-glimmer — Ceates and outputs an interpolated Markov model(IMM)

NAME

       tigr-glimmer — Ceates and outputs an interpolated Markov model(IMM)

SYNOPSIS

       tigr-build-icm

DESCRIPTION

       Program   build-icm.c  creates and outputs an interpolated Markov model (IMM) as described
       in the paper A.L. Delcher, D. Harmon, S. Kasif, O. White,  and  S.L.  Salzberg.   Improved
       Microbial  Gene  Identification  with  Glimmer.   Nucleic  Acids Research, 1999, in press.
       Please reference this paper if you use the system as part of any published research.

       Input comes from the file named on the command-line.  Format  should  be  one  string  per
       line.  Each line has an ID string followed by white space followed by the sequence itself.
       The script run-glimmer3 generates an input file in the correct format using the  'extract'
       program.

       The  IMM  is  constructed as follows: For a given context, say acgtta, we want to estimate
       the probability distribution of the  next  character.   We  shall  do  this  as  a  linear
       combination  of  the  observed  probability  distributions for this context and all of its
       suffixes, i.e., cgtta, gtta, tta, ta, a and empty.  By observed distributions I  mean  the
       counts  of  the  number  of  occurrences of these strings in the training set.  The linear
       combination is determined by a set of probabilities, lambda, one for each context  string.
       For context acgtta the linear combination coefficients are:

       lambda (acgtta) (1 - lambda (acgtta)) x lambda (cgtta) (1 - lambda (acgtta)) x (1 - lambda
       (cgtta)) x lambda (gtta) (1 - lambda (acgtta)) x (1  -  lambda  (cgtta))  x  (1  -  lambda
       (gtta))  x lambda (tta) (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta))
       x (1 - lambda (tta))  x (1 - lambda (ta))  x (1 - lambda (a))

       We compute the lambda values for each context as follows: - If the number of  observations
       in  the  training set is >= the constant SAMPLE_SIZE_BOUND, the lambda for that context is
       1.0 - Otherwise, do a chi-square test on the observations for this context compared to the
       distribution  predicted  for  the one-character shorter suffix context.  If the chi-square
       significance < 0.5, set the lambda for this context to 0.0 Otherwise set  the  lambda  for
       this context to: (chi-square significance) x (# observations) / SAMPLE_WEIGHT

       To run the program:

       build-icm <train.seq > train.model

       This  will  use the training data in train.seq to produce the file train.model, containing
       your IMM.

AUTHOR

       This  manual  page was quickly copied from the glimmer web site and readme file by Steffen
       Moeller moeller@debian.org for the Debian system.

                                                                       TIGR-GLIMMER     (1)   (1)

NAME

SYNOPSIS

DESCRIPTION

SEE ALSO

AUTHOR