bionic (1) pfmake.1.gz

Provided by: pftools_3+dfsg-2build1_amd64

NAME

       pfmake - generate a profile from a multiple sequence alignment

SYNOPSIS

       pfmake [ -0123abces ] [ msf-file | - ] score-matrix [ profile-file ]
                  [E=#] [F=#] [G=#] [H=#] [I=#] [L=#] [M=#] [S=#] [T=#] [X=#]

DESCRIPTION

       pfmake generates a PROSITE profile from a multiple sequence alignment using methods described by Gribskov
       et al. (1990), Luethy et al. (1994), and Thompson et al. (1994), with modifications to exploit the
       features of the generalized profile format (Bucher et al. 1996). The file containing the multiple
       sequence alignment (msf-file) must be in MSF format as generated by GCG programs or by readseq
       (checksums are ignored). The score-matrix file must also be in GCG format. If `-' is specified instead
       of a real filename, the multiple sequence alignment is read from the standard input.

       If an existing profile is given as input via the optional third argument (profile-file), the parameters
       of the DISJOINT, NORMALIZATION, and CUT_OFF blocks are read from it; all other profile parameters are
       recalculated. Header and footer lines outside the matrix block are also transferred from input to
       output.

       If no input profile is given, the disjointness definition will be set to PROTECT, with borders leaving
       short unprotected tails (maximum 5 positions) at the beginning and at the end of the profile.
       Furthermore, one normalization mode (n-score = raw-score / F, where F is the output score multiplier,
       see below) and two cut-off values (level 0: 8.5, level -1: 6.5) will be defined.
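
       The default normalization and cut-off levels can be illustrated with a short Python sketch. It assumes
       that raw scores are the integer scores obtained with the F-scaled profile, so that dividing by F
       (default 100) recovers the n-score. The helper names n_score and match_level are hypothetical and not
       part of pftools, and treating a score equal to a threshold as passing it is also an assumption.

```python
# Hypothetical illustration of the default NORMALIZATION and CUT_OFF settings
# that pfmake writes when no input profile is given. Not pftools code.
from typing import Dict, Optional

DEFAULT_F = 100.0                                      # output score multiplier F
DEFAULT_CUTOFFS: Dict[int, float] = {0: 8.5, -1: 6.5}  # level -> n-score threshold


def n_score(raw_score: float, f: float = DEFAULT_F) -> float:
    """Normalized score: n-score = raw-score / F."""
    return raw_score / f


def match_level(raw_score: float, f: float = DEFAULT_F,
                cutoffs: Dict[int, float] = DEFAULT_CUTOFFS) -> Optional[int]:
    """Highest cut-off level whose threshold the n-score reaches (assumed >=),
    or None if the score is below every threshold."""
    score = n_score(raw_score, f)
    passed = [level for level, threshold in cutoffs.items() if score >= threshold]
    return max(passed) if passed else None


if __name__ == "__main__":
    for raw in (900, 700, 500):
        print(raw, n_score(raw), match_level(raw))
```

       With the defaults, a raw score of 850 sits exactly on level 0 and a raw score of 650 exactly on
       level -1.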

OPTIONS

       -0     Global alignment mode; initiation (termination) at low cost is  possible  only  if  the  alignment
              starts at the beginning (end) of the profile and at the beginning (end) of the sequence.

       -1     Domain  global  alignment  mode;  initiation  (termination)  at  low  cost is possible only at the
              beginning (end) of the profile; it may start and end at any position within the sequence.

       -2     Semi-global alignment mode; initiation (termination) at low cost is possible if the alignment
              starts either at the beginning (end) of the profile or at the beginning (end) of the sequence.
              This is the default alignment mode.

       -3     Local alignment mode; initiation (termination) at low cost is  possible  anywhere.  The  high-cost
              initiation/termination score (parameter H) is meaningless.
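
              The rules of the four alignment modes can be summarized as a small decision function. In this
              Python sketch (the function name is hypothetical), an initiation or termination point is
              described by whether it lies at the corresponding boundary of the profile and of the sequence
              (the beginning for initiation, the end for termination). The function reports whether parameter
              L (low cost) or parameter H (high cost) applies there.

```python
# Hypothetical sketch: which initiation/termination parameter (L or H) applies
# at a point, given pfmake's alignment mode (-0, -1, -2 or -3).


def init_term_param(mode: int, at_profile_boundary: bool,
                    at_sequence_boundary: bool) -> str:
    """Return "L" (low cost) or "H" (high cost).

    For initiation, "boundary" means the beginning of the profile/sequence;
    for termination it means the end.
    """
    if mode == 0:    # global: profile AND sequence boundary
        low_cost = at_profile_boundary and at_sequence_boundary
    elif mode == 1:  # domain global: profile boundary, any sequence position
        low_cost = at_profile_boundary
    elif mode == 2:  # semi-global (default): profile OR sequence boundary
        low_cost = at_profile_boundary or at_sequence_boundary
    elif mode == 3:  # local: anywhere
        low_cost = True
    else:
        raise ValueError(f"unknown alignment mode: {mode}")
    return "L" if low_cost else "H"


if __name__ == "__main__":
    cases = [(True, True), (True, False), (False, True), (False, False)]
    for mode in range(4):
        print(f"-{mode}", [init_term_param(mode, p, s) for p, s in cases])
```

              In local mode (-3) the function never returns "H", which matches the statement that parameter H
              is meaningless there.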

       -a     Causes pfsearch to weight gaps asymmetrically, as in Gribskov et al. (1990).

       -b     Block profile mode. By imposing additional constraints on the placement of insertions and
              deletions, this mode produces profiles that favor alignments with insertions and deletions
              positioned symmetrically around a few positions. For each gap region a gap center is defined,
              which usually corresponds to the place where gap excision has been applied (see parameter X). If
              no gap excision has been applied, the position is chosen so as to maximize the number of
              deletion-opening events before the gap center plus the number of deletion-closing events after
              it. Within a given gap region, reduced deletion opening penalties are offered only before the
              center, reduced deletion closing penalties only after it, and reduced insertion penalties only at
              the center. This option is incompatible with options -a and -e and automatically disables them.
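
              The gap-center rule for alignments without gap excision can be sketched as a search over the
              boundaries of a gap region. In this Python sketch (helper names hypothetical), a deletion opens
              at the first gap column of a run and closes at its last; a center c is the boundary before column
              c, openings in columns before c and closings in columns at or after c are counted, and ties go to
              the leftmost boundary. The gap characters, this indexing, and the tie-breaking are assumptions,
              and sequence weights are ignored.

```python
# Hypothetical sketch of choosing a gap center for -b (block profile mode).
from typing import List, Sequence, Tuple

GAP_CHARS = "-."  # assumed gap symbols in the MSF alignment


def deletion_events(rows: Sequence[str]) -> Tuple[List[int], List[int]]:
    """Count, per column of a gap region, deletions opening and closing there."""
    n = len(rows[0])
    opens, closes = [0] * n, [0] * n
    for row in rows:
        for k, ch in enumerate(row):
            if ch not in GAP_CHARS:
                continue
            if k == 0 or row[k - 1] not in GAP_CHARS:
                opens[k] += 1   # first gap column of a run
            if k == n - 1 or row[k + 1] not in GAP_CHARS:
                closes[k] += 1  # last gap column of a run
    return opens, closes


def choose_gap_center(opens: Sequence[int], closes: Sequence[int]) -> Tuple[int, int]:
    """Return (center, score) maximizing openings before + closings after."""
    best_center, best_score = 0, -1
    for c in range(len(opens) + 1):  # c = boundary before column c
        score = sum(opens[:c]) + sum(closes[c:])
        if score > best_score:
            best_center, best_score = c, score
    return best_center, best_score


if __name__ == "__main__":
    region = ["A--D", "A-CD", "AB-D"]
    opens, closes = deletion_events(region)
    print(opens, closes, choose_gap_center(opens, closes))
```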

       -c     Circular  profile.  The  topology  of  the profile is declared as circular. The first and the last
              insert positions are merged by retaining the higher value of each parameter type.
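
              The merge rule for circular profiles amounts to an element-wise maximum. In this Python sketch an
              insert position is modelled as a dictionary from parameter name to score; the function name and
              the numeric values are hypothetical.

```python
# Hypothetical sketch of -c: merge the first and last insert positions by
# keeping the higher value of each parameter type.
from typing import Dict

InsertPosition = Dict[str, float]


def merge_circular(first: InsertPosition, last: InsertPosition) -> InsertPosition:
    """Element-wise maximum over the parameters of two insert positions."""
    if first.keys() != last.keys():
        raise ValueError("insert positions must define the same parameters")
    return {name: max(first[name], last[name]) for name in first}


if __name__ == "__main__":
    first = {"MI": -300, "IM": 0, "MD": -300, "DM": 0}
    last = {"MI": -250, "IM": -10, "MD": -400, "DM": 0}
    print(merge_circular(first, last))
```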

       -e     Enables endgap-weighting mode as implemented in the GCG program ProfileMake. Endgaps in the
              multiple sequence alignment are interpreted as deletions relative to the other sequences and are
              thus considered in the delineation of gap regions. The default is no endgap weighting, as
              introduced by Thompson et al. (1994) in the program ProfileWeight.

       -s     Causes pfsearch to weight gaps symmetrically (default mode). The initial gap opening scores (MD,
              MI) computed from the maximal gap length and the command-line parameters E, G, I, and M will be
              divided by two, and the resulting values will be assigned to both the gap opening and gap closing
              scores (MI, IM, MD, DM).
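
              The symmetric split can be written out explicitly. This Python sketch (the function name is
              hypothetical) takes the initial gap opening scores MD and MI, however they were derived from E,
              G, I, and M, and returns the four transition scores; the sign convention of the inputs is left
              to the caller.

```python
# Hypothetical sketch of symmetric gap weighting (-s, the default): each
# initial gap opening score is halved and used for both opening and closing.
from typing import Dict


def symmetric_gap_scores(md: float, mi: float) -> Dict[str, float]:
    half_deletion = md / 2.0   # shared by M->D (opening) and D->M (closing)
    half_insertion = mi / 2.0  # shared by M->I (opening) and I->M (closing)
    return {"MD": half_deletion, "DM": half_deletion,
            "MI": half_insertion, "IM": half_insertion}


if __name__ == "__main__":
    print(symmetric_gap_scores(md=-2.0, mi=-4.0))
```

              The combined opening-plus-closing cost of a complete gap is therefore unchanged; only its
              distribution over the two transitions differs.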

PARAMETERS

       E=#    Gap extension penalty, see Gribskov et al. (1990). Default: E=0.2 (appropriate for 1/3  bit-scaled
              blosum45 matrix).

       F=#    Output  score  multiplier. On output, all profile scores are multiplied by this factor and rounded
              to nearest integers. Default: F=100.

       G=#    Gap opening penalty, see Gribskov et al. (1990). Default: G=2.1 (appropriate  for  1/3  bit-scaled
              blosum45 matrix).

       H=#    High-cost initiation/termination score. This score will be applied to all external and internal
              initiation and termination scores corresponding to path matrix positions where initiation or
              termination at low cost is not possible according to the alignment mode specified. Default: H=*
              (the low value, which in effect prohibits initiation and termination at such positions).

       I=#    Gap penalty multiplier increment, see Gribskov et al. (1990).  Default: I=0.1.

       L=#    Low-cost initiation/termination score. This score will be applied to  all  external  and  internal
              initiation  and  termination  scores  corresponding  to  path matrix positions where initiation or
              termination at low cost is possible according to the alignment mode specified. Default: L=0.

       M=#    Maximum gap penalty multiplier, see Gribskov et al. (1990).  Default: M=0.333.

       S=#    Score matrix multiplier. On input, the numbers of the score matrix are multiplied by this  factor.
              Default: S=0.1.
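
              The two multipliers bracket the whole calculation: S rescales the GCG score matrix on input and F
              rescales the finished profile scores on output. This Python sketch (helper names hypothetical)
              shows only the scaling arithmetic for a single value with the defaults S=0.1 and F=100, skipping
              the profile construction in between. Rounding halves away from zero is an assumption, since the
              tie-breaking rule is not documented here.

```python
# Hypothetical sketch of the input (S) and output (F) score multipliers.
import math


def scale_matrix_value(value: float, s: float = 0.1) -> float:
    """On input, score matrix entries are multiplied by S."""
    return value * s


def to_profile_score(score: float, f: float = 100.0) -> int:
    """On output, scores are multiplied by F and rounded to the nearest
    integer (ties assumed to round away from zero)."""
    scaled = score * f
    return int(math.copysign(math.floor(abs(scaled) + 0.5), scaled))


if __name__ == "__main__":
    for gcg_value in (5, -3, 0):
        internal = scale_matrix_value(gcg_value)
        print(gcg_value, internal, to_profile_score(internal))
```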

       T=#    Gap  region  threshold.  This  is  the minimal fraction of gap characters a column of the multiple
              sequence alignment must contain in order to be considered part of a gap region. Default: T=0.01.
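
              Together with option -e, the gap region threshold can be sketched as a per-column test. In this
              Python sketch (names hypothetical) a column belongs to a gap region when its fraction of gap
              characters is at least T; leading and trailing gaps of a sequence are counted only when endgap
              weighting is enabled. Unweighted counting over all sequences and the set of gap characters are
              assumptions; sequence weights are ignored here.

```python
# Hypothetical sketch of gap-region delineation with threshold T (and -e).
from typing import List, Sequence, Tuple

GAP_CHARS = "-."  # assumed gap symbols in the MSF alignment


def core_span(row: str) -> Tuple[int, int]:
    """First and last non-gap column of a row; gaps outside are endgaps."""
    residues = [k for k, ch in enumerate(row) if ch not in GAP_CHARS]
    return (residues[0], residues[-1]) if residues else (len(row), -1)


def gap_region_columns(rows: Sequence[str], t: float = 0.01,
                       endgap_weighting: bool = False) -> List[int]:
    n = len(rows[0])
    gap_counts = [0] * n
    for row in rows:
        if len(row) != n:
            raise ValueError("all aligned sequences must have the same length")
        first, last = core_span(row)
        for k, ch in enumerate(row):
            internal = first <= k <= last
            if ch in GAP_CHARS and (internal or endgap_weighting):
                gap_counts[k] += 1
    # Unweighted fraction of gap characters per column (assumption).
    return [k for k in range(n) if gap_counts[k] / len(rows) >= t]


if __name__ == "__main__":
    alignment = ["--ACDE", "GHACDE", "GH-CDE", "GHACD-"]
    print(gap_region_columns(alignment))                         # default
    print(gap_region_columns(alignment, endgap_weighting=True))  # with -e
```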

       X=#    Gap excision threshold. This is the minimal  fraction  of  non-gap  characters  a  column  of  the
              multiple  sequence  alignment  must contain in order to be converted into a match position. The IM
              and MI transition scores of insert positions corresponding to excised columns are set to zero; the
              other parameters remain unchanged.  Default: X=0.5.
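
              The gap excision rule can be sketched the same way. In this Python sketch (names hypothetical) a
              column becomes a match position when its fraction of non-gap characters is at least X; every
              other column is excised. Mapping consecutive excised columns to the insert position that follows
              the preceding match position is an assumption about how columns correspond to insert positions,
              and sequence weights are again ignored.

```python
# Hypothetical sketch of gap excision with threshold X.
from typing import List, Sequence, Tuple

GAP_CHARS = "-."  # assumed gap symbols in the MSF alignment


def split_columns(rows: Sequence[str], x: float = 0.5) -> Tuple[List[int], List[int]]:
    """Return (match_columns, excised_columns)."""
    n = len(rows[0])
    match: List[int] = []
    excised: List[int] = []
    for k in range(n):
        residues = sum(1 for row in rows if row[k] not in GAP_CHARS)
        (match if residues / len(rows) >= x else excised).append(k)
    return match, excised


def excised_insert_positions(match: Sequence[int], excised: Sequence[int]) -> List[int]:
    """Insert position i follows the i-th match position (0 = before the
    first); these are the positions whose MI and IM scores are set to zero."""
    return sorted({sum(1 for m in match if m < k) for k in excised})


if __name__ == "__main__":
    alignment = ["AC-DE", "AC-DE", "ACGDE", "A--DE"]
    match, excised = split_columns(alignment)
    print(match, excised, excised_insert_positions(match, excised))
```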

EXAMPLES

       (1)    pfmake -b1 sh3.msf blosum45.cmp H=0.6 > sh3_block.prf

              Generates  a  domain-global  block  profile  from  a  multiple  alignment of SH3 domains using the
              blosum45 matrix.  sh3.msf contains a multiple alignment of 20 SH3 domains from SWISS-PROT  release
              32 including sequence weights. blosum45.cmp contains a 1/3 bit-scaled blosum45 matrix in GCG
              format.  Note that fragment matches (alignments to parts of the profile) are  not  prohibited  but
              penalized by the parameter H=0.6.

REFERENCES

       Bucher P, Karplus K, Moeri N & Hofmann K (1996). A flexible motif search technique based on
       generalized profiles. Comput. Chem. 20:3-24.

       Gribskov M, Luethy R & Eisenberg D (1990).  Profile analysis.  Meth. Enzymol.  183:146-159.

       Luethy R, Xenarios I & Bucher P (1994).  Improving the sensitivity of the sequence profile method.  Prot.
       Sci.  3:139-146.

       Thompson JD, Higgins DG & Gibson TJ (1994). Improved sensitivity of profile searches through the use of
       sequence weights and gap excision. Comput. Appl. Biosci. 10:19-29.

AUTHOR

       Philipp Bucher
       Philipp.Bucher@isrec.unil.ch