Ubuntu Manpage: tigr-glimmer — Creates and outputs an interpolated Markov model(IMM)

Provided by: tigr-glimmer_3.02b-2build1_amd64

NAME

       tigr-glimmer — Creates and outputs an interpolated Markov model(IMM)

SYNOPSIS

       tigr-build-icm

DESCRIPTION

       Program   build-icm.c   creates  and outputs an interpolated Markov model (IMM) as described in the paper
       A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.  Improved Microbial  Gene  Identification
       with Glimmer.  Nucleic Acids Research, 1999, in press.  Please reference this paper if you use the system
       as part of any published research.

       Input comes from the file named on the command-line.  Format should be one string per  line.   Each  line
       has  an  ID  string  followed  by  white  space followed by the sequence itself.  The script run-glimmer3
       generates an input file in the correct format using the 'extract' program.

       The IMM is constructed as follows: For a given context, say acgtta, we want to estimate  the  probability
       distribution of the next character.  We shall do this as a linear combination of the observed probability
       distributions for this context and all of its suffixes, i.e., cgtta, gtta, tta,  ta,  a  and  empty.   By
       observed  distributions  I  mean the counts of the number of occurrences of these strings in the training
       set.  The linear combination is determined by a set  of  probabilities,  lambda,  one  for  each  context
       string.  For context acgtta the linear combination coefficients are:

       lambda  (acgtta)  (1  -  lambda (acgtta)) x lambda (cgtta) (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x
       lambda (gtta) (1 - lambda (acgtta)) x (1 - lambda (cgtta)) x (1 - lambda (gtta))  x  lambda  (tta)  (1  -
       lambda  (acgtta))  x (1 - lambda (cgtta)) x (1 - lambda (gtta)) x (1 - lambda (tta))  x (1 - lambda (ta))
       x (1 - lambda (a))

       We compute the lambda values for each context as follows: - If the number of observations in the training
       set is >= the constant SAMPLE_SIZE_BOUND, the lambda for that context is 1.0 - Otherwise, do a chi-square
       test on the observations for this context compared to the distribution predicted  for  the  one-character
       shorter  suffix  context.   If  the chi-square significance < 0.5, set the lambda for this context to 0.0
       Otherwise  set  the  lambda  for  this  context  to:  (chi-square  significance)  x  (#  observations)  /
       SAMPLE_WEIGHT

       To run the program:

       build-icm <train.seq > train.model

       This will use the training data in train.seq to produce the file train.model, containing your IMM.

AUTHOR

       This  manual  page  was  quickly  copied  from  the  glimmer  web site and readme file by Steffen Moeller
       moeller@debian.org for the Debian system.

                                                                                      TIGR-GLIMMER     (1)   (1)

NAME

SYNOPSIS

DESCRIPTION

SEE ALSO

AUTHOR