Provided by: pftools_3.2.12-1_amd64

NAME

       pfmake - generate a profile from a multiple sequence alignment

SYNOPSIS

       pfmake    [ -0123abcehlms ] [ -E gap_extend ] [ -F score_multiplier ] [ -G gap_open ] [ -H
                 high_init/term ] [ -I gap_increment ] [ -L low_init/term ] [ -M gap_multiplier ]
                 [  -S  matrix_multiplier ] [ -T gap_region ] [ -X gap_excision ] [ ms_file | - ]
                 score_matrix [ profile ] [ parameters ]

DESCRIPTION

       pfmake generates a PROSITE profile  from  a  multiple  sequence  alignment  using  methods
       described  by Gribskov et al.  (1990), Luethy et al.  (1994), and Thompson et al.  (1994),
       with modifications to exploit the features of the new profile format.  The file containing
       the multiple sequence alignment (ms_file) must be either in MSF format as generated by GCG
       programs or by readseq (checksums are ignored) or in MSA format as created by  psa2msa(1).
       If  '-'  is  specified instead of a filename, the multiple sequence alignment is read from
       the standard input. The score_matrix file must also be in GCG format.

       If an already existing profile is given as input via  the  third  optional  argument,  the
       parameters of the DISJOINT, NORMALIZATION and CUT_OFF blocks will be read from input; all
       other profile parameters will be recalculated.  Header and footer lines outside the matrix
       block will also be transferred from input to output.

       If  no  input  profile  is  given, the disjointness definition will be set to PROTECT with
       borders leaving short unprotected tails (maximum 5 positions) at the beginning and at  the
       end  of the profile. Furthermore, one normalization mode (n_score = raw_score / F, where F
       is the output score multiplier, see below), and two cut-off values (level  0:  8.5,  level
       -1: 6.5) will be defined.
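
       The default normalization and cut-off levels can be illustrated with a small sketch.  It
       is not pfmake code: the function names are hypothetical, and treating a score that equals
       a cut-off as reaching that level is an assumption.

```python
# Default cut-off values quoted in the text, keyed by level.
DEFAULT_CUTOFFS = {0: 8.5, -1: 6.5}


def normalize(raw_score, score_multiplier=100.0):
    """Apply n_score = raw_score / F, with F the output score multiplier."""
    return raw_score / score_multiplier


def cutoff_level(raw_score, score_multiplier=100.0):
    """Return the highest level whose cut-off is reached, or None.

    Assumption: a normalized score equal to the cut-off counts as reached.
    """
    n_score = normalize(raw_score, score_multiplier)
    for level in sorted(DEFAULT_CUTOFFS, reverse=True):
        if n_score >= DEFAULT_CUTOFFS[level]:
            return level
    return None
```

       With the default F of 100, a raw score of 850 corresponds to the level 0 cut-off and a raw
       score of 650 to the level -1 cut-off.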

OPTIONS

       ms_file
              Input multiple sequence alignment.
              The content of the file must be either in MSF or in MSA format.  If the filename is
              replaced by a '-', pfmake will read the input alignment from stdin.

       score_matrix
              Residue score matrix file.
              Contains the substitution  scores  for  all  pairs  of  residues  of  the  sequence
              alphabet. The file must be in GCG format.

       profile
              Optional profile file.
              If a filename is specified, the profile will be parsed and the parameters of its
              DISJOINT, NORMALIZATION and CUT_OFF blocks (see DESCRIPTION) will be kept for the
              output profile.

       -0     Global alignment mode.
              Initiation  (termination)  at  low cost is possible only if the alignment starts at
              the beginning (end) of the profile and at the beginning (end) of the sequence.

       -1     Domain global alignment mode.
              Initiation (termination) at low cost is possible only at the beginning (end) of the
              profile; the alignment may start and end at any position within the sequence.

       -2     Semi-global alignment mode.
              Initiation  (termination) at low cost is possible if the alignment starts either at
              the beginning (end) of the profile or at the beginning (end) of the sequence.
              This is the default alignment mode.

       -3     Local alignment mode.
              Initiation  (termination)  at  low  cost  is  possible  anywhere.   The   high-cost
              initiation/termination score (parameter H) is meaningless.
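
              The four alignment modes differ only in where low-cost initiation (termination)
              is allowed.  The rules above can be restated as a truth function.  This is a
              sketch of the documented behaviour, not code from pfmake; the function and
              argument names are hypothetical.

```python
def low_cost_allowed(mode, at_profile_edge, at_sequence_edge):
    """Say whether initiation (termination) at low cost is possible.

    mode is the digit of the option (-0, -1, -2 or -3).  The two flags say
    whether the alignment starts (ends) at the beginning (end) of the
    profile and of the sequence, respectively.
    """
    if mode == 0:   # global: both edges required
        return at_profile_edge and at_sequence_edge
    if mode == 1:   # domain global: profile edge required
        return at_profile_edge
    if mode == 2:   # semi-global (default): either edge suffices
        return at_profile_edge or at_sequence_edge
    if mode == 3:   # local: possible anywhere
        return True
    raise ValueError("mode must be 0, 1, 2 or 3")
```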

       -a     Weight gaps asymmetrically, as in Gribskov et al.  (1990).

       -b     Block profile mode.
              By  imposing  additional  constraints on the placement of insertions and deletions,
              this mode produces profiles that favor alignments  with  insertions  and  deletions
              positioned  symmetrically  around a few positions. For each gap region a gap center
              is defined which usually corresponds to the  place  where  gap  excision  has  been
              applied  (see  parameter  X).  If no gap excision has been applied, the position is
              chosen such as to maximize the sum of deletion opening events before, and  deletion
              closing  events  after  the gap center.  Within a given gap region reduced deletion
              opening penalties are offered only before, reduced deletion closing penalties  only
              after, and reduced insertion penalties only at the center.
              This option is incompatible with options -a and -e and automatically disables them.

       -c     Circular profile.
              The  topology of the profile is declared as circular. The first and the last insert
              positions are merged by retaining the higher value of each parameter type.
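
              A minimal sketch of the merge rule, assuming each insert position is held as a
              mapping from parameter type to a numeric score.  The dictionary layout and the
              function name are illustrative assumptions, not the pfmake data structures.

```python
def merge_insert_positions(first, last):
    """Merge two insert positions by keeping the higher value per parameter type.

    Assumes both mappings carry the same parameter types and numeric values.
    """
    return {name: max(first[name], last[name]) for name in first}


# Hypothetical scores for two parameter types of the first and last insert position.
merged = merge_insert_positions({"MI": -105, "IM": -50}, {"MI": -80, "IM": -120})
# merged == {"MI": -80, "IM": -50}
```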

       -e     Enables endgap-weighting mode  as  implemented  in  the  GCG  program  ProfileMake.
              Endgaps  in  the  multiple  sequence  alignment  will  be  interpreted as deletions
              relative to the other sequences and thus be considered for the delineation  of  gap
              regions.   The  default  is  no  endgap  weighting as introduced by Thompson et al.
              (1994) in the program ProfileWeight.

       -h     Display usage help text.

       -l     Remove output line length limit. Individual lines of the output profile can  exceed
              a length of 132 characters, removing the need to wrap them over several lines.

       -m     Input multiple sequence alignment is in MSA format.

       -s     Weight gaps symmetrically (default mode). The initial gap
              opening scores (MD, MI) computed from the maximal gap length and  the  command-line
              parameters  E, G, I,  and M, will be divided by two and the resulting value will be
              assigned to both gap opening and gap closing scores (MI, IM, MD, DM).
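
              The halving step can be sketched as follows.  How the initial MD and MI opening
              scores are derived from E, G, I and M is not reproduced here; the function
              simply takes them as inputs, and its name is hypothetical.

```python
def symmetric_gap_scores(md_open, mi_open):
    """Split each initial opening score evenly between opening and closing.

    md_open and mi_open are the initial deletion and insertion opening
    scores; each half is assigned to both the opening and closing score.
    """
    half_md = md_open / 2
    half_mi = mi_open / 2
    return {"MD": half_md, "DM": half_md, "MI": half_mi, "IM": half_mi}
```

              For example, initial opening scores of -4.0 (MD) and -3.0 (MI) would give
              MD = DM = -2.0 and MI = IM = -1.5.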

       -E gap_extend
              Gap extension penalty.  See Gribskov et al.  (1990).
              Default: 0.2 (appropriate for 1/3 bit-scaled blosum45 matrix)

       -F score_multiplier
              Output score multiplier.
              On output, all profile scores are multiplied by this factor and rounded to  nearest
              integers.
              Default: 100
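
              As an illustration of the scaling step, the sketch below multiplies a score by
              the factor and rounds it.  It uses Python's built-in round(), which resolves
              exact ties to the even neighbour; how pfmake resolves ties is not stated here,
              so that detail is an assumption.

```python
def scale_score(score, multiplier=100):
    """Multiply a profile score by the output multiplier and round to an integer."""
    return round(score * multiplier)


# With the default multiplier, the default gap penalties 2.1 and 0.2
# (taken as negative scores here) become -210 and -20.
scaled = [scale_score(-2.1), scale_score(-0.2)]
```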

       -G gap_open
              Gap opening penalty.  See Gribskov et al.  (1990).
              Default: 2.1 (appropriate for 1/3 bit-scaled blosum45 matrix)

       -H high_init/term
              High-cost initiation/termination score.
              This  score will be applied to all external and internal initiation and termination
              scores corresponding to path matrix positions where initiation  or  termination  at
              low cost is not possible according to the alignment mode specified.
              Default: * (low-value)

       -I gap_increment
              Gap penalty multiplier increment.  See Gribskov et al.  (1990).
              Default: 0.1

       -L low_init/term
              Low-cost initiation/termination score.
              This  score will be applied to all external and internal initiation and termination
              scores corresponding to path matrix positions where initiation  or  termination  at
              low cost is possible according to the alignment mode specified.
              Default: 0

       -M gap_multiplier
              Maximum gap penalty multiplier.  See Gribskov et al.  (1990).
              Default: 0.333

       -S matrix_multiplier
              Score matrix multiplier.
              On input, the numbers of the score matrix are multiplied by this factor.
              Default: 0.1

       -T gap_region
              Gap region threshold.
              This  is  the  minimal fraction of gap characters a column of the multiple sequence
              alignment must contain in order to be considered part of a gap region.
              Default: 0.01
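
              The threshold test can be sketched on a toy alignment.  The set of gap
              characters ('-' and '.') and the function names are assumptions made for the
              illustration; this is not how pfmake reads MSF or MSA files.

```python
GAP_CHARS = frozenset("-.")  # assumed gap symbols


def gap_fraction(column):
    """Fraction of gap characters in one alignment column."""
    return sum(ch in GAP_CHARS for ch in column) / len(column)


def gap_region_columns(alignment, threshold=0.01):
    """Indices of columns whose gap fraction reaches the -T threshold."""
    return [
        index
        for index, column in enumerate(zip(*alignment))
        if gap_fraction(column) >= threshold
    ]


toy_alignment = ["AC-GT", "A--GT", "ACCGT"]
```

              With the default threshold of 0.01, a single gap character is enough to place
              a column of this three-sequence alignment in a gap region.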

       -X gap_excision
              Gap excision threshold.
              This is the minimal fraction  of  non-gap  characters  a  column  of  the  multiple
              sequence alignment must contain in order to be converted into a match position. The
              IM and MI transition scores of insert positions corresponding  to  excised  columns
              are set to zero; the other parameters remain unchanged.
              Default: 0.5
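
              A minimal sketch of the column selection, assuming '-' and '.' are the gap
              characters.  It only decides which columns become match positions; the
              handling of the IM and MI scores of excised columns is not modelled, and the
              function name is hypothetical.

```python
GAP_SYMBOLS = frozenset("-.")  # assumed gap symbols


def match_columns(alignment, threshold=0.5):
    """Indices of columns with enough non-gap characters to become match positions."""
    kept = []
    for index, column in enumerate(zip(*alignment)):
        non_gap = sum(ch not in GAP_SYMBOLS for ch in column)
        if non_gap / len(column) >= threshold:
            kept.append(index)
    return kept


# Column 2 holds one residue in three sequences (1/3 < 0.5), so it is excised.
kept = match_columns(["AC-GT", "A--GT", "ACCGT"])
```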

PARAMETERS

       Note:  for  backwards  compatibility,  release  2.3  of the pftools package will parse the
              version 2.2 style parameters, but these are deprecated and the corresponding option
              (refer to the options section) should be used instead.

       E=#    Gap extension penalty.
              Use option -E instead.

       F=#    Output score multiplier.
              Use option -F instead.

       G=#    Gap opening penalty.
              Use option -G instead.

       H=#    High-cost initiation/termination score.
              Use option -H instead.

       I=#    Gap penalty multiplier increment.
              Use option -I instead.

       L=#    Low-cost initiation/termination score.
              Use option -L instead.

       M=#    Maximum gap penalty multiplier.
              Use option -M instead.

       S=#    Score matrix multiplier.
              Use option -S instead.

       T=#    Gap region threshold.
              Use option -T instead.

       X=#    Gap excision threshold.
              Use option -X instead.

EXAMPLES

       (1)    pfmake -b1 -H 0.6 sh3.msf blosum45.cmp > sh3_block.prf

              Generates  a  domain-global  block profile from a multiple alignment of SH3 domains
              using the blosum45 matrix.  The file 'sh3.msf' contains a multiple alignment of  20
              SH3  domains  from  SWISS-PROT  release  32  including  sequence weights.  The file
              'blosum45.cmp' contains a 1/3 bit-scaled blosum45 matrix in GCG format.
              Note that fragment matches (alignments to parts of the profile) are not  prohibited
              but penalized by the option -H 0.6.

EXIT CODE

       On  successful  completion  of its task, pfmake will return an exit code of 0. If an error
       occurs, a diagnostic message will be output on standard error and the exit  code  will  be
       different  from 0. When conflicting options were passed to the program but the task could
       nevertheless be completed, warnings will be issued on standard error.
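
       A calling script can rely on this convention.  The wrapper below is a sketch, not part
       of pftools: the function name is hypothetical, and it works for any command that follows
       the same exit-code convention.

```python
import subprocess


def run_tool(argv):
    """Run a command; return (stdout, stderr) on exit code 0, raise otherwise.

    On success, stderr may still carry warnings, so it is returned to the
    caller rather than discarded.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout, result.stderr


# Usage with the command from EXAMPLES (requires pfmake and the input files):
# profile_text, warnings = run_tool(
#     ["pfmake", "-b1", "-H", "0.6", "sh3.msf", "blosum45.cmp"]
# )
```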

REFERENCES

       Bucher P, Karplus K, Moeri N & Hofmann K (1996).   A  flexible  motif  search  technique
       based on generalized profiles.  Comput. Chem.  20:3-24.

       Gribskov   M,   Luethy  R  &  Eisenberg  D  (1990).   Profile  analysis.   Meth.  Enzymol.
       183:146-159.

       Luethy R, Xenarios I & Bucher P (1994).  Improving the sensitivity of the sequence profile
       method.  Prot. Sci.  3:139-146.

       Thompson  JD,  Higgins  DG  &  Gibson  TJ  (1994).  Improved sensitivity of profile searches
       through the use of sequence weights and gap excision.  Comput. Appl. Biosci.  10:19-29.

SEE ALSO

       pfsearch(1), pfscan(1), psa2msa(1), psa(5), xpsa(5)

AUTHOR

       The pftools package was developed by Philipp Bucher.
       Any comments or suggestions should be addressed to <pftools@sib.swiss>.