Provided by: pftools_3+dfsg-2build1_amd64 

NAME
pfscale - fit parameters of an extreme-value distribution to a profile score list
SYNOPSIS
pfscale [ score-list | - ] [ profile-file ] [L=#] [N=#] [P=#] [Q=#]
DESCRIPTION
pfscale fits the two parameters of an extreme-value distribution to a score distribution obtained by
searching a sequence database with a profile. score-list is a sorted list of profile match scores
generated by pfsearch. The result is written to the standard output.
If the original profile is given as the second argument, the normalization function specified within the
profile will be updated so that it produces -log10 per-residue E-values. If the second argument is
omitted, the output consists of a header line containing the normalization parameters followed by a
modified score list, showing original scores, normalized scores, and corresponding log-cumulative
frequencies next to each other.
Note that this program implements the significance estimation procedure for profile match scores
described in (Hofmann & Bucher 1995). It has been used for the calculation of the normalization
parameters of all profiles in PROSITE.
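The fitting step can be pictured with a short sketch. This man page does not spell out the numerical method, so the least-squares line used here is an assumption, as are the hypothetical function name fit_tail and its arguments. The sketch regresses -log(rank/N) on the score over the ranks whose frequency lies between Q and P, which yields an intercept and a slope of the kind a linear normalization function needs.

```python
import math


def fit_tail(scores, n, p=0.0001, q=0.000001, base=10.0):
    """Fit -log_base(rank / n) = r1 + r2 * score by ordinary least squares.

    Assumption: this is an illustrative stand-in, not pfscale's actual code.
    `scores` must be sorted in descending order, so rank 1 is the best score.
    Only ranks whose frequency rank / n lies in [q, p] take part in the fit.
    """
    xs, ys = [], []
    for rank, score in enumerate(scores, start=1):
        freq = rank / n
        if q <= freq <= p:
            xs.append(score)
            ys.append(-math.log(freq, base))
    if len(xs) < 2:
        raise ValueError("fewer than two scores fall inside the P/Q range")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("all scores in the P/Q range are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    r2 = sxy / sxx              # slope
    r1 = mean_y - r2 * mean_x   # intercept
    return r1, r2


if __name__ == "__main__":
    # Synthetic, perfectly linear score list for demonstration only.
    n = 10_000_000
    demo = [(-math.log10(rank / n) + 1.0) / 0.01 for rank in range(1, 2001)]
    print(fit_tail(demo, n))
```

With the fitted pair (r1, r2), a raw score s maps to the normalized score r1 + r2 * s, which under these assumptions approximates the -log10 per-residue frequency of scores at least as high as s.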
PARAMETERS
L=# Logarithmic base of the parameters of the estimated extreme-value distribution. The parameters
reported by pfscale are expressed as logarithms and thus can be inserted directly into a linear
normalization function defined in a generalized profile. Default: L=10.
N=# Size of the database from which the input score list was derived. The searched database is
typically a shuffled version of a real protein or nucleotide sequence database. Default:
N=14147368 (size of SWISS-PROT release 30 and shuffled derivatives of it).
P=# Upper threshold of the probability range to which the extreme-value distribution will be fitted.
For instance, if N=10'000'000 and P=0.0001 (the default value for P), profile match scores below
rank 1000 in the sorted input list (corresponding to occurrence probabilities > 0.0001) will be
ignored.
Q=# Lower threshold of the probability range to which the extreme-value distribution will be fitted.
For instance, if N=10'000'000 and Q=0.000001 (the default value for Q), profile match scores above
rank 10 in the sorted input list (corresponding to occurrence probabilities < 0.000001) will be
ignored.
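The rank arithmetic behind P and Q can be checked with a few lines of Python. This is only a sketch: the helper name rank_window is hypothetical, and the rounding and the inclusive treatment of both boundary ranks are assumptions, since the text above gives only the round figures 10 and 1000.

```python
def rank_window(n, p=0.0001, q=0.000001):
    """Return (first_rank, last_rank) of the sorted scores kept for fitting.

    A score at rank r has occurrence probability r / n, so the thresholds
    Q and P translate to ranks n * Q and n * P. Rounding to the nearest
    integer is an assumption made for this illustration.
    """
    first_rank = round(n * q)
    last_rank = round(n * p)
    return first_rank, last_rank


print(rank_window(10_000_000))   # (10, 1000), the example used for P and Q
print(rank_window(14_147_368))   # (14, 1415), using the default N
```

With the default N, the fit therefore needs a score list that reaches at least rank 1415 or so.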
EXAMPLES
(1) pfsearch -fr sh3.prf shuffle20.seq C=200 | sort -nr | pfscale - P=0.0001 Q=0.000001
derives score-normalization parameters for the SH3 domain profile in sh3.prf. shuffle20.seq
contains a window-shuffled derivative of SWISS-PROT release 30 in Pearson/Fasta format (window-
size 20). Note that the implicit default of N corresponds to the size of this database and thus
need not be specified on the command line. The cut-off value C=200 will produce about 2000
matches completely covering the range defined by the command line parameters of P and Q. A
suitable cut-off value has to be guessed in advance by computing a few optimal alignment scores
for random sequences.
REFERENCES
Hofmann K & Bucher P (1995). The FHA-domain: a nuclear signalling domain found in protein kinases and
transcription factors. Trends Biochem. Sci. 20:347-349.
AUTHOR
Philipp Bucher
Philipp.Bucher@isrec.unil.ch
pftools 2.2 July 1999 PFSCALE(1)