Provided by: phast_1.6+dfsg-3_amd64

NAME

       pbsScoreMatrix - Generate log-odds score matrices for use in alignment of probabilistic
       biological sequences (PBSs)

DESCRIPTION

       Generate  log-odds  score  matrices  for  use  in  alignment  of  probabilistic biological
       sequences (PBSs).  By default, generates a matrix for every branch of the tree (as defined
       in   tree.mod),   but  can  also  generate  a  matrix  for  a  given  branch  length  (see
       --branch-length).  For a code size of N,  an  N  x  N  matrix  is  generated  by  default;
       --half-pbs  will  produce  an  N  x  4  matrix,  and  --no-pbs will produce a 4 x 4 matrix
       (assuming a four-character nucleotide alphabet).

       Two sequences are assumed  to  have  evolved  from  a  common  ancestor  by  a  reversible
       continuous-time  Markov substitution process, and to be separated by a branch of length t.
       The conditional probability of a base j in one sequence given a base i in the other, P(j |
       i, t), is given by element (i, j) of the matrix

              P(t) = exp(Qt)

       where  Q  is the rate matrix defining the substitution process, and element (i, j) of Q is
       the instantaneous rate at which base i changes to base j.
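
       As an illustration only, the relation P(t) = exp(Qt) can be sketched in Python.  The
       sketch below is not how pbsScoreMatrix computes the matrix; it approximates the matrix
       exponential by scaling and squaring with a truncated Taylor series.  The Jukes-Cantor
       rate matrix JC_Q and the helper names mat_mul and mat_exp are assumptions made for the
       example; a real Q would come from the tree model file.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(Q t) by scaling and squaring with a Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    # Taylor series for exp(m): I + m + m^2/2! + ... + m^terms/terms!
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Undo the scaling: exp(Qt) = (exp(Qt / 2^s))^(2^s)
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Assumed example: Jukes-Cantor rate matrix, scaled so that one unit of t
# corresponds to one expected substitution per site.
JC_Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

# Element (i, j) of P is P(j | i, t) for t = 0.2.
P = mat_exp(JC_Q, 0.2)
```

       Each row of P is a probability distribution over the four bases, so each row should sum
       to 1.  For this particular Q the result can be checked against the known Jukes-Cantor
       closed form, in which the diagonal entries are 1/4 + 3/4 exp(-4t/3).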

       Let S_t(i, j) be a log odds score for the alignment of two bases, i and j, based on P(t):

              S_t(i, j) = log [ P(i, j | t) / (pi(i) * pi(j)) ]

                        = log [ P(j | i, t) pi(i) / (pi(i) * pi(j)) ]

                        = log [ P(j | i, t) / pi(j) ]                  (1)

       where pi(x) is the "equilibrium" or "background" probability of base x.  Because of
       reversibility, S_t(i, j) = S_t(j, i), and the S_t(i, j) form a symmetric 4 x 4 matrix.
       This is the matrix that is generated by pbsScoreMatrix with the --no-pbs option.  If each
       "letter" in each sequence represents a probability distribution over bases, as in a PBS,
       then the score for two letters k and l can be shown to be

              S'_t(k, l) = log sum_i sum_j p_k(i) p_l(j) exp(S_t(i, j))     (2)

       where the two sums are over the four bases, p_k(i) is the probability of base i under  the
       distribution for k, and p_l(j) is the probability of base j under the distribution for l.
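
       Equations (1) and (2) can be sketched in Python as follows.  This is a minimal
       illustration under stated assumptions, not the program's implementation: it uses the
       Jukes-Cantor closed form for P(t) with a uniform background distribution, and the names
       jc_prob, base_scores, pbs_score, LETTER_K and LETTER_L are hypothetical.  Logs are taken
       base 2, so the inverse used inside (2) is 2^x.

```python
import math

def jc_prob(t):
    """P(t) under Jukes-Cantor, with t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def base_scores(p, pi):
    """Equation (1): S_t(i, j) = log2(P(j | i, t) / pi(j))."""
    return [[math.log2(p[i][j] / pi[j]) for j in range(4)] for i in range(4)]

def pbs_score(s, pk, pl):
    """Equation (2): score for two letters with base distributions pk, pl."""
    total = sum(pk[i] * pl[j] * 2.0 ** s[i][j]
                for i in range(4) for j in range(4))
    return math.log2(total)

PI = [0.25] * 4                      # assumed uniform background
S = base_scores(jc_prob(0.2), PI)    # the 4 x 4 matrix of equation (1)

# Two hypothetical PBS letters, each a distribution over the four bases.
LETTER_K = [0.7, 0.1, 0.1, 0.1]
LETTER_L = [0.6, 0.2, 0.1, 0.1]
score_kl = pbs_score(S, LETTER_K, LETTER_L)
```

       Because S is symmetric, pbs_score gives the same value when the two letters are swapped.
       Applying it to every pair of letters in a code of size N would fill an N x N matrix.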

       Notice that (2) reduces to (1) when each distribution places all of its probability mass
       on a single base, that is, when p_k(i) = 1 and p_l(j) = 1 for some i and j, and
       p_k(i') = p_l(j') = 0 for all other i' and j'.  In that case the PBS reduces to an
       ordinary nucleotide sequence.  The special case in which only p_l(j) = 1 is also of
       interest, for aligning a PBS with a nucleotide sequence:

              S''_t(k, j) = log sum_i p_k(i) exp(S_t(i, j))                 (3)

       This is the matrix generated by pbsScoreMatrix with the --half-pbs option.  Note: all logs
       are base 2, so for (2) to reduce to (1), exp in (2) and (3) must denote the matching
       inverse, exp(x) = 2^x.
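
       The N x 4 case of equation (3) can be sketched in the same way.  The code below is a
       hedged illustration only: TOY_CODE is a made-up three-letter code (it does not follow
       the format of a real code file), the substitution model is assumed to be Jukes-Cantor
       with a uniform background, and jc_scores and half_pbs_matrix are hypothetical names.

```python
import math

def jc_scores(t):
    """4 x 4 matrix of equation (1) under Jukes-Cantor with uniform pi."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[math.log2((same if i == j else diff) / 0.25) for j in range(4)]
            for i in range(4)]

def half_pbs_matrix(s, code):
    """Equation (3): one row per code letter k, one column per base j."""
    return [[math.log2(sum(pk[i] * 2.0 ** s[i][j] for i in range(4)))
             for j in range(4)]
            for pk in code]

# Hypothetical code of size N = 3; each letter is a distribution over bases.
TOY_CODE = [
    [1.0, 0.0, 0.0, 0.0],        # all mass on the first base
    [0.5, 0.5, 0.0, 0.0],        # split between the first two bases
    [0.25, 0.25, 0.25, 0.25],    # identical to the background
]

S = jc_scores(0.2)
H = half_pbs_matrix(S, TOY_CODE)  # N x 4 matrix of equation (3)
```

       Two properties serve as sanity checks.  A letter with all of its mass on one base
       reproduces the corresponding row of the 4 x 4 matrix, which is the reduction of (3) to
       (1).  A letter equal to the background distribution scores 0 against every base, since
       the sum over i of pi(i) P(j | i, t) is pi(j) at equilibrium.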

EXAMPLE

       Generate an N x N matrix for every branch of the tree, using a code file "code" (generated
       by pbsTrain) and a tree model file "mytree.mod" (generated by phyloFit):

              pbsScoreMatrix mytree.mod code > matrices.dat

       Generate an N x N matrix for a branch length of 0.2 expected substitutions per site:

              pbsScoreMatrix --branch-length 0.2 mytree.mod code > matrix.dat

       Generate an N x 4 matrix:

              pbsScoreMatrix --branch-length 0.2 --half-pbs mytree.mod code > matrix.dat

       Generate a 4 x 4 matrix:

              pbsScoreMatrix --branch-length 0.2 --no-pbs mytree.mod > matrix.dat

       (In this case, a code file is not needed.)

OPTIONS

       --branch-length, -t <length>

              Output  a  matrix  for  a  branch of the specified length, rather than a matrix for
              every branch of the tree.  The given length must be non-negative and  in  units  of
              expected substitutions per site.

       --half-pbs, -H

              Output an N x 4 matrix, as described above.

       --no-pbs, -N

              Output a 4 x 4 matrix, as described above.  With this option, a code file is
              not needed.

       --help, -h

              Show this help message.