Provided by: pftools_3+dfsg-3_amd64

NAME

       pfmake - generate a profile from a multiple sequence alignment

SYNOPSIS

       pfmake [ -0123abces ] [ msf-file | - ] score-matrix [ profile-file ]
                  [E=#] [F=#] [G=#] [H=#] [I=#] [L=#] [M=#] [S=#] [T=#] [X=#]

DESCRIPTION

       pfmake  generates  a  PROSITE  profile  from  a  multiple sequence alignment using methods
       described by Gribskov et al. (1990), Luethy et al. (1994), and  Thompson  et  al.  (1994),
       with  modifications to exploit the features of the new profile format. The file containing
       the multiple sequence alignment (msf-file) must be in  MSF  format  as  generated  by  GCG
       programs or by readseq (checksums are ignored).  The score-matrix file must also be in GCG
       format.  If `-' is specified instead of a real filename, the multiple  sequence  alignment
       is read from the standard input.
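       The input conventions above can be illustrated with a minimal Python sketch of an MSF
       reader. It is not pfmake's own parser. The function names open_msf and parse_msf are
       hypothetical, and the sketch assumes the usual MSF layout: a header terminated by a `//'
       line, followed by blocks of lines that start with a sequence name. Like pfmake, it does
       not verify checksums. It also ignores the sequence weights given in the header.

```python
import sys
from typing import Dict, Iterable, TextIO


def open_msf(path: str) -> TextIO:
    """Hypothetical helper: '-' selects standard input, as on the pfmake command line."""
    return sys.stdin if path == "-" else open(path)


def parse_msf(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal MSF reader (a sketch, not pftools code).

    Skips everything up to the '//' separator (so header checksums are never
    checked), ignores blank and position-numbering lines, and concatenates the
    sequence chunks found on each 'name  chunk chunk ...' line.
    """
    seqs: Dict[str, str] = {}
    in_alignment = False
    for line in lines:
        if not in_alignment:
            if line.strip().startswith("//"):
                in_alignment = True
            continue
        tokens = line.split()
        if not tokens or all(tok.isdigit() for tok in tokens):
            continue  # blank line or coordinate line such as "1    50"
        name, chunks = tokens[0], tokens[1:]
        seqs[name] = seqs.get(name, "") + "".join(chunks)
    return seqs
```

       A call such as parse_msf(open_msf("-")) would then read the alignment from standard
       input.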

       If an existing profile is given as the optional third argument (profile-file), the
       parameters of its DISJOINT, NORMALIZATION, and CUT_OFF blocks are copied from it, and all
       other profile parameters are recalculated. Header and footer lines outside the matrix
       block are also transferred from the input profile to the output.

       If no input profile is given, the disjointness definition is set to PROTECT, with
       borders leaving short unprotected tails (at most 5 positions) at the beginning and at the
       end of the profile. In addition, one normalization mode (n-score = raw-score / F, where F
       is the output score multiplier; see below) and two cut-off values (level 0: 8.5, level
       -1: 6.5) are defined.
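       The default normalization and cut-off values can be sketched as follows. This is a
       hypothetical Python illustration, not pftools code. It assumes that the cut-off levels
       are compared against the normalized score and that a match reaches a level when its
       n-score is greater than or equal to that level's value.

```python
from typing import Optional

F = 100.0  # default output score multiplier (parameter F)

# Default cut-off levels defined when no input profile is given.
DEFAULT_CUTOFFS = {0: 8.5, -1: 6.5}


def n_score(raw_score: float, f: float = F) -> float:
    """Default normalization mode: n-score = raw-score / F."""
    return raw_score / f


def cutoff_level(raw_score: float, f: float = F) -> Optional[int]:
    """Return the highest cut-off level reached, or None if no level is reached.

    Assumption of this sketch: a level is reached when n-score >= its value.
    """
    score = n_score(raw_score, f)
    reached = [level for level, value in DEFAULT_CUTOFFS.items() if score >= value]
    return max(reached) if reached else None
```

       With F=100, a raw score of 700 gives an n-score of 7.0, which reaches level -1 but not
       level 0.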

OPTIONS

       -0     Global alignment mode; initiation (termination) at low cost is possible only if the
              alignment  starts  at the beginning (end) of the profile and at the beginning (end)
              of the sequence.

       -1     Domain global alignment mode; initiation (termination) at low cost is possible only
              at  the beginning (end) of the profile; it may start and end at any position within
              the sequence.

       -2     Semi-global alignment mode; initiation (termination) at low cost is possible if the
              alignment  starts  either at the beginning (end) of the profile or at the beginning
              (end) of the sequence. This is the default alignment mode.

       -3     Local alignment mode; initiation (termination) at low cost  is  possible  anywhere.
              The high-cost initiation/termination score (parameter H) is meaningless.

       -a     Generates a profile that causes pfsearch to weight gaps asymmetrically, as in
              Gribskov et al. (1990).

       -b     Block  profile  mode.  By  imposing  additional  constraints  on  the  placement of
              insertions and deletions, this mode produces profiles that  favor  alignments  with
              insertions  and deletions positioned symmetrically around a few positions. For each
              gap region a gap center is defined which usually corresponds to the place where gap
              excision  has been applied (see parameter X).  If no gap excision has been applied,
              the position is chosen so as to maximize the sum of deletion opening events
              before the gap center and deletion closing events after it. Within a given gap
              region reduced deletion opening penalties are offered only before, reduced deletion
              closing  penalties  only after, and reduced insertion penalties only at the center.
              This option is incompatible with options -a and -e and automatically disables them.

       -c     Circular profile. The topology of the profile is declared as  circular.  The  first
              and  the  last  insert  positions  are merged by retaining the higher value of each
              parameter type.

       -e     Enables endgap-weighting mode  as  implemented  in  the  GCG  program  ProfileMake.
              Endgaps  in  the  multiple  sequence  alignment  will  be  interpreted as deletions
              relative to the other sequences and thus be considered for the delineation  of  gap
              regions.   The  default  is  no  endgap  weighting as introduced by Thompson et al.
              (1994) in the program ProfileWeight.

       -s     Generates a profile that causes pfsearch to weight gaps symmetrically (default
              mode). The initial gap opening scores (MD, MI), computed from the maximal gap
              length and the command-line parameters E, G, I, and M, are divided by two, and
              the resulting values are assigned to both the gap opening and the gap closing
              scores (MI, IM, MD, DM).
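       The rules stated for options -0 to -3, -s, and -b can be summarized in a short Python
       sketch. All names are hypothetical. H=* is modelled as minus infinity, and a path matrix
       position is reduced to two flags (at the beginning of the profile, at the beginning of
       the sequence). For -b, the deletion opening and closing events are assumed to be given as
       per-column counts within one gap region, and the gap center is taken as a boundary
       between columns.

```python
import math
from typing import Dict, Sequence

L_DEFAULT = 0.0          # documented default for L
H_DEFAULT = -math.inf    # assumption: stands in for H=* (lowest possible value)


def low_cost_possible(mode: int, at_profile_begin: bool, at_seq_begin: bool) -> bool:
    """Whether initiation here may use the low-cost score L.

    Termination follows the same rules with 'end' in place of 'beginning'.
    """
    if mode == 0:  # -0 global: beginning of profile AND of sequence
        return at_profile_begin and at_seq_begin
    if mode == 1:  # -1 domain global: beginning of profile, any sequence position
        return at_profile_begin
    if mode == 2:  # -2 semi-global (default): beginning of profile OR of sequence
        return at_profile_begin or at_seq_begin
    if mode == 3:  # -3 local: anywhere, so H is never used
        return True
    raise ValueError(f"unknown alignment mode: {mode}")


def init_score(mode: int, at_profile_begin: bool, at_seq_begin: bool,
               low: float = L_DEFAULT, high: float = H_DEFAULT) -> float:
    """Choose L or H for an initiation score according to the alignment mode."""
    return low if low_cost_possible(mode, at_profile_begin, at_seq_begin) else high


def symmetric_gap_scores(md: float, mi: float) -> Dict[str, float]:
    """-s: halve the initial gap opening scores; each half is used for opening and closing."""
    return {"MD": md / 2, "DM": md / 2, "MI": mi / 2, "IM": mi / 2}


def gap_center(open_events: Sequence[float], close_events: Sequence[float]) -> int:
    """-b without gap excision: pick the boundary c (0..n) in a gap region that
    maximizes deletion openings before c plus deletion closings from c onwards."""
    best_c, best_total = 0, -math.inf
    for c in range(len(open_events) + 1):
        total = sum(open_events[:c]) + sum(close_events[c:])
        if total > best_total:  # strict '>' keeps the leftmost boundary on ties
            best_c, best_total = c, total
    return best_c
```

       Resolving ties in gap_center in favour of the leftmost boundary is an arbitrary choice of
       this sketch.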

PARAMETERS

       E=#    Gap  extension penalty, see Gribskov et al. (1990). Default: E=0.2 (appropriate for
              1/3 bit-scaled blosum45 matrix).

       F=#    Output score multiplier. On output, all profile scores are multiplied by this
              factor and rounded to the nearest integer. Default: F=100.

       G=#    Gap  opening  penalty,  see Gribskov et al. (1990). Default: G=2.1 (appropriate for
              1/3 bit-scaled blosum45 matrix).

       H=#    High-cost initiation/termination score. This score will be applied to all  external
              and  internal  initiation  and  termination  scores  corresponding  to  path matrix
              positions where initiation or termination at low cost is not possible according  to
              the alignment mode specified. Default: H=* (the lowest possible value, which
              effectively prohibits initiation and termination at these positions).

       I=#    Gap penalty multiplier increment, see Gribskov et al. (1990).  Default: I=0.1.

       L=#    Low-cost  initiation/termination  score. This score will be applied to all external
              and internal  initiation  and  termination  scores  corresponding  to  path  matrix
              positions  where initiation or termination at low cost is possible according to the
              alignment mode specified. Default: L=0.

       M=#    Maximum gap penalty multiplier, see Gribskov et al. (1990).  Default: M=0.333.

       S=#    Score matrix multiplier. On input, the values of the score matrix are multiplied
              by this factor. Default: S=0.1.

       T=#    Gap  region  threshold.  This is the minimal fraction of gap characters a column of
              the multiple sequence alignment must contain in order to be considered  part  of  a
              gap region. Default: T=0.01.

       X=#    Gap excision threshold. This is the minimal fraction of non-gap characters a column
              of the multiple sequence alignment must contain in order to  be  converted  into  a
              match  position.  The IM and MI transition scores of insert positions corresponding
              to excised columns  are  set  to  zero;  the  other  parameters  remain  unchanged.
              Default: X=0.5.
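       The thresholds T and X both depend on the fraction of gap characters in each alignment
       column. The following hypothetical Python sketch classifies columns using the default
       values. It assumes that `.' and `-' are the gap characters and counts each sequence
       once. pfmake itself may take the sequence weights of the MSF file into account.

```python
from typing import List, Sequence, Tuple

GAP_CHARS = set(".-")  # assumption: MSF gap characters


def classify_columns(rows: Sequence[str], t: float = 0.01,
                     x: float = 0.5) -> List[Tuple[bool, bool]]:
    """Return (is_match, in_gap_region) for each column of equal-length rows.

    is_match:      non-gap fraction >= X; otherwise the column is excised and
                   becomes part of an insert position.
    in_gap_region: gap fraction >= T.
    """
    n_seq = len(rows)
    result = []
    for column in zip(*rows):
        gap_fraction = sum(1 for ch in column if ch in GAP_CHARS) / n_seq
        is_match = (1.0 - gap_fraction) >= x
        in_gap_region = gap_fraction >= t
        result.append((is_match, in_gap_region))
    return result
```

       In the alignment AC-D / A--D / AC.D / ACGD, the third column is 75% gaps, so with the
       default X=0.5 it is excised, while the second column (25% gaps) remains a match position
       but still lies in a gap region.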

EXAMPLES

       (1)    pfmake -b1 sh3.msf blosum45.cmp H=0.6 > sh3_block.prf

              Generates  a  domain-global  block profile from a multiple alignment of SH3 domains
              using the blosum45 matrix. sh3.msf contains a multiple alignment of 20 SH3 domains
              from SWISS-PROT release 32, including sequence weights. blosum45.cmp contains a
              1/3 bit-scaled blosum45 matrix in GCG format. Note that fragment matches (alignments
              to parts of the profile) are not prohibited but penalized by the parameter H=0.6.
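              The same invocation can also be assembled from a script. The helpers below are
              a hypothetical Python sketch. pfmake_argv only builds the argument list
              following the SYNOPSIS syntax and does not check which option combinations
              pfmake accepts. run_pfmake writes the profile, which the example redirects
              from standard output, to a file.

```python
import subprocess
from typing import Dict, List, Optional


def pfmake_argv(msf: str, matrix: str, flags: str = "",
                params: Optional[Dict[str, float]] = None,
                profile: Optional[str] = None) -> List[str]:
    """Build: pfmake [-flags] msf-file score-matrix [profile-file] [X=# ...]."""
    argv = ["pfmake"]
    if flags:
        argv.append("-" + flags)
    argv += [msf, matrix]
    if profile is not None:
        argv.append(profile)  # optional existing profile (third argument)
    for name, value in (params or {}).items():
        argv.append(f"{name}={value}")
    return argv


def run_pfmake(argv: List[str], out_path: str) -> None:
    """Run pfmake and save its standard output, like '> out_path' in a shell."""
    with open(out_path, "w") as out:
        subprocess.run(argv, stdout=out, check=True)
```

              For example, run_pfmake(pfmake_argv("sh3.msf", "blosum45.cmp", flags="b1",
              params={"H": 0.6}), "sh3_block.prf") mirrors example (1).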

REFERENCES

       Bucher P, Karplus K, Moeri N & Hofmann K (1996). A flexible motif search technique
       based on generalized profiles. Comput. Chem. 20:3-24.

       Gribskov M, Luethy R & Eisenberg D (1990). Profile analysis. Meth. Enzymol.
       183:146-159.

       Luethy R, Xenarios I & Bucher P (1994). Improving the sensitivity of the sequence profile
       method. Prot. Sci. 3:139-146.

       Thompson JD, Higgins DG & Gibson TJ (1994). Improved sensitivity of profile searches
       through the use of sequence weights and gap excision. Comput. Appl. Biosci. 10:19-29.

AUTHOR

       Philipp Bucher
       Philipp.Bucher@isrec.unil.ch