Provided by: pftools_3+dfsg-2build1_amd64 

NAME
pfmake - generate a profile from a multiple sequence alignment
SYNOPSIS
pfmake [ -0123abces ] [ msf-file | - ] score-matrix [ profile-file ]
[E=#] [F=#] [G=#] [H=#] [I=#] [L=#] [M=#] [S=#] [T=#] [X=#]
DESCRIPTION
pfmake generates a PROSITE profile from a multiple sequence alignment using methods described by Gribskov
et al. (1990), Luethy et al. (1994), and Thompson et al. (1994), with modifications to exploit the
features of the new profile format. The file containing the multiple sequence alignment (msf-file) must
be in MSF format as generated by GCG programs or by readseq (checksums are ignored). The score-matrix
file must also be in GCG format. If `-' is specified instead of a real filename, the multiple sequence
alignment is read from the standard input.
If an existing profile is given as input via the optional third argument, the parameters of the
DISJOINT, NORMALIZATION, and CUT_OFF blocks are read from that profile; all other profile parameters
are recalculated. Header and footer lines outside the matrix block are also transferred from input to
output.
If no input profile is given, the disjointness definition will be set to PROTECT with borders leaving
short unprotected tails (maximum 5 positions) at the beginning and at the end of the profile.
Furthermore, one normalization mode (n-score = raw-score / F, where F is the output score multiplier,
see below) and two cut-off values (level 0: 8.5, level -1: 6.5) will be defined.
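The default normalization and cut-offs described above can be sketched as follows. This is a minimal
illustration, not pfmake source code. The function names are hypothetical, and treating a score equal
to the cut-off as reaching that level is an assumption.

```python
# Sketch of the default normalization mode and cut-off levels.
DEFAULT_F = 100.0                      # output score multiplier (default F=100)
DEFAULT_CUTOFFS = {0: 8.5, -1: 6.5}    # level -> normalized cut-off value


def normalized_score(raw_score, f=DEFAULT_F):
    """n-score = raw-score / F."""
    return raw_score / f


def cutoff_level(raw_score, f=DEFAULT_F, cutoffs=DEFAULT_CUTOFFS):
    """Return the highest level whose cut-off is reached, or None.

    Assumption: a normalized score equal to the cut-off counts as reaching it.
    """
    n = normalized_score(raw_score, f)
    reached = [level for level, cut in cutoffs.items() if n >= cut]
    return max(reached) if reached else None
```

With the defaults, a raw score of 900 gives an n-score of 9.0 and reaches level 0, while a raw score of
700 gives 7.0 and reaches only level -1.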
OPTIONS
-0 Global alignment mode; initiation (termination) at low cost is possible only if the alignment
starts at the beginning (end) of the profile and at the beginning (end) of the sequence.
-1 Domain global alignment mode; initiation (termination) at low cost is possible only at the
beginning (end) of the profile; it may start and end at any position within the sequence.
-2 Semi-global alignment mode; initiation (termination) at low cost is possible if the alignment
starts either at the beginning (end) of the profile or at the beginning (end) of the sequence.
This is the default alignment mode.
-3 Local alignment mode; initiation (termination) at low cost is possible anywhere. The high-cost
initiation/termination score (parameter H) is meaningless.
-a Causes pfsearch to weight gaps asymmetrically, as in Gribskov et al. (1990).
-b Block profile mode. By imposing additional constraints on the placement of insertions and
deletions, this mode produces profiles that favor alignments with insertions and deletions
positioned symmetrically around a few positions. For each gap region a gap center is defined which
usually corresponds to the place where gap excision has been applied (see parameter X). If no gap
excision has been applied, the position is chosen so as to maximize the sum of deletion opening
events before, and deletion closing events after the gap center. Within a given gap region
reduced deletion opening penalties are offered only before, reduced deletion closing penalties
only after, and reduced insertion penalties only at the center. This option is incompatible with
options -a and -e and automatically disables them.
-c Circular profile. The topology of the profile is declared as circular. The first and the last
insert positions are merged by retaining the higher value of each parameter type.
-e Enables endgap-weighting mode as implemented in the GCG program ProfileMake. Endgaps in the
multiple sequence alignment will be interpreted as deletions relative to the other sequences and
thus be considered for the delineation of gap regions. The default is no endgap weighting as
introduced by Thompson et al. (1994) in the program ProfileWeight.
-s Causes pfsearch to weight gaps symmetrically (default mode). The initial gap opening scores (MD,
MI) computed from the maximal gap length and the command-line parameters E, G, I, and M, will be
divided by two and the resulting value will be assigned to both gap opening and gap closing scores
(MI, IM, MD, DM).
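The four alignment modes (-0 to -3) differ only in where initiation (termination) at low cost is
possible. The rule can be summarized in a small sketch. The function names are hypothetical and the
code only restates the option descriptions above; it is not taken from pfmake.

```python
def low_cost_allowed(mode, at_profile_edge, at_sequence_edge):
    """True if initiation (termination) at low cost is possible.

    mode: 0 global, 1 domain-global, 2 semi-global (default), 3 local.
    at_profile_edge: the alignment starts (ends) at the beginning (end)
        of the profile.
    at_sequence_edge: the alignment starts (ends) at the beginning (end)
        of the sequence.
    """
    if mode == 0:
        return at_profile_edge and at_sequence_edge
    if mode == 1:
        return at_profile_edge
    if mode == 2:
        return at_profile_edge or at_sequence_edge
    if mode == 3:
        return True
    raise ValueError("mode must be 0, 1, 2 or 3")


def score_parameter(mode, at_profile_edge, at_sequence_edge):
    """Name of the parameter that supplies the score: 'L' or 'H'."""
    allowed = low_cost_allowed(mode, at_profile_edge, at_sequence_edge)
    return "L" if allowed else "H"
```

In local mode the function never returns 'H', which is why parameter H is meaningless with -3.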
PARAMETERS
E=# Gap extension penalty, see Gribskov et al. (1990). Default: E=0.2 (appropriate for 1/3 bit-scaled
blosum45 matrix).
F=# Output score multiplier. On output, all profile scores are multiplied by this factor and rounded
to the nearest integer. Default: F=100.
G=# Gap opening penalty, see Gribskov et al. (1990). Default: G=2.1 (appropriate for 1/3 bit-scaled
blosum45 matrix).
H=# High-cost initiation/termination score. This score will be applied to all external and internal
initiation and termination scores corresponding to path matrix positions where initiation or
termination at low cost is not possible according to the alignment mode specified. Default: H=*
(low-value).
I=# Gap penalty multiplier increment, see Gribskov et al. (1990). Default: I=0.1.
L=# Low-cost initiation/termination score. This score will be applied to all external and internal
initiation and termination scores corresponding to path matrix positions where initiation or
termination at low cost is possible according to the alignment mode specified. Default: L=0.
M=# Maximum gap penalty multiplier, see Gribskov et al. (1990). Default: M=0.333.
S=# Score matrix multiplier. On input, the numbers of the score matrix are multiplied by this factor.
Default: S=0.1.
T=# Gap region threshold. This is the minimal fraction of gap characters a column of the multiple
sequence alignment must contain in order to be considered part of a gap region. Default: T=0.01.
X=# Gap excision threshold. This is the minimal fraction of non-gap characters a column of the
multiple sequence alignment must contain in order to be converted into a match position. The IM
and MI transition scores of insert positions corresponding to excised columns are set to zero; the
other parameters remain unchanged. Default: X=0.5.
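The roles of the T and X thresholds can be illustrated with a short sketch that classifies alignment
columns. It rests on several assumptions that are not confirmed by this page: all sequences are
counted equally (sequence weights are ignored), the characters '.', '-' and '~' are treated as gaps,
and a fraction equal to a threshold satisfies it. The names are hypothetical.

```python
GAP_CHARS = ".-~"  # assumption: characters treated as gap symbols


def classify_columns(rows, t=0.01, x=0.5):
    """Return one (in_gap_region, is_match_position) pair per column.

    rows: aligned sequences of equal length, one string per sequence.
    t: gap region threshold (minimal fraction of gap characters).
    x: gap excision threshold (minimal fraction of non-gap characters).
    """
    result = []
    for column in zip(*rows):
        n_gaps = sum(ch in GAP_CHARS for ch in column)
        gap_fraction = n_gaps / len(column)
        non_gap_fraction = (len(column) - n_gaps) / len(column)
        in_gap_region = gap_fraction >= t
        is_match_position = non_gap_fraction >= x
        result.append((in_gap_region, is_match_position))
    return result


alignment = ["AC-G", "A--G", "AC-G", "A-TG"]
columns = classify_columns(alignment)
```

In this toy alignment the third column has only one non-gap character out of four (0.25 < X), so it
would be excised rather than converted into a match position.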
EXAMPLES
(1) pfmake -b1 sh3.msf blosum45.cmp H=0.6 > sh3_block.prf
Generates a domain-global block profile from a multiple alignment of SH3 domains using the
blosum45 matrix. sh3.msf contains a multiple alignment of 20 SH3 domains from SWISS-PROT release
32 including sequence weights. blosum45.cmp contains a 1/3 bit-scaled blosum45 matrix in GCG
format. Note that fragment matches (alignments to parts of the profile) are not prohibited but
penalized by the parameter H=0.6.
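When pfmake is driven from a script, the command line of example (1) can be assembled
programmatically. The helper below is a hypothetical sketch, not part of pftools. It only builds the
argument list in the order given in the SYNOPSIS and does not run pfmake.

```python
import shlex

# Parameter names accepted on the command line, as listed under PARAMETERS.
VALID_PARAMS = set("EFGHILMSTX")


def pfmake_command(msf_file, score_matrix, options="", profile_file=None, **params):
    """Build an argument list for pfmake (hypothetical helper)."""
    unknown = set(params) - VALID_PARAMS
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(sorted(unknown)))
    argv = ["pfmake"]
    if options:
        argv.append("-" + options)          # e.g. "b1" becomes "-b1"
    argv += [msf_file, score_matrix]
    if profile_file is not None:
        argv.append(profile_file)           # optional existing profile
    argv += [f"{name}={value}" for name, value in params.items()]
    return argv


cmd = pfmake_command("sh3.msf", "blosum45.cmp", options="b1", H=0.6)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run with standard output redirected to a file such
as sh3_block.prf.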
REFERENCES
Bucher P, Karplus K, Moeri N & Hofmann K (1996). A flexible motif search technique based on
generalized profiles. Comput. Chem. 20:3-24.
Gribskov M, Luethy R & Eisenberg D (1990). Profile analysis. Meth. Enzymol. 183:146-159.
Luethy R, Xenarios I & Bucher P (1994). Improving the sensitivity of the sequence profile method. Prot.
Sci. 3:139-146.
Thompson JD, Higgins DG & Gibson TJ (1994). Improved sensitivity of profile searches through the use of
sequence weights and gap excision. Comput. Appl. Biosci. 10:19-29.
AUTHOR
Philipp Bucher
Philipp.Bucher@isrec.unil.ch
pftools 2.2 July 1999 PFMAKE(1)