Provided by: pftools_3+dfsg-3_amd64

NAME

       pfscale - fit parameters of an extreme-value distribution to a profile score list

SYNOPSIS

       pfscale [ score-list | - ] [ profile-file ] [L=#] [N=#] [P=#] [Q=#]

DESCRIPTION

       pfscale fits the two parameters of an extreme-value distribution to a score distribution
       obtained by searching a sequence database with a profile. score-list is a list of profile
       match scores generated by pfsearch, sorted in decreasing order; if it is given as -, the
       scores are read from the standard input. The result is written to the standard output.

       If the original profile is given as the second argument, the normalization function
       specified within the profile is updated so as to produce -Log10 per-residue E-values. If
       the second argument is omitted, the output consists of a header line containing the
       normalization parameters, followed by a modified score list that shows the original
       scores, the normalized scores, and the corresponding log-cumulative frequencies side by
       side.
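       The modified score list described above can be sketched as follows. This is a minimal
       illustration, not pfscale's source code: it assumes that the input is sorted in
       decreasing order, that the normalization is a linear function r1 + r2 * score (the names
       r1 and r2 and the helper itself are hypothetical), and that the log-cumulative frequency
       of the score at rank r is log10(r / N).

```python
import math


def modified_score_list(scores, n, r1, r2):
    """Hypothetical helper: build (raw, normalized, log10 cumulative frequency) rows.

    Assumptions: `scores` is sorted in decreasing order, so the score at
    1-based rank r is matched or exceeded by r entries (ties ignored);
    `n` is the database size in residues; the normalization is linear.
    """
    rows = []
    for rank, score in enumerate(scores, start=1):
        normalized = r1 + r2 * score          # assumed linear normalization
        log_freq = math.log10(rank / n)       # empirical per-residue frequency
        rows.append((score, normalized, log_freq))
    return rows


# Illustrative parameters only; real values come from the fit.
for row in modified_score_list([120.0, 100.0], 10000, -2.0, 0.05):
    print(row)
```

       When the fit is good, the normalized column is approximately the negated log-cumulative
       frequency column.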

       This program implements the significance estimation procedure for profile match scores
       described by Hofmann & Bucher (1995). It has been used to calculate the normalization
       parameters of all profiles in PROSITE.

PARAMETERS

       L=#    Logarithmic base of the parameters of  the  estimated  extreme-value  distribution.
              The  parameters  reported  by  pfscale  are expressed as logarithms and thus can be
              inserted directly into a linear normalization function  defined  in  a  generalized
              profile.  Default: L=10.
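       Because the parameters are logarithms, changing L only rescales them. The sketch below
       assumes (as an illustration, not as pfscale's internals) that the fitted relation is a
       linear function of the score when expressed with natural logarithms; since
       log_L(x) = ln(x) / ln(L), both coefficients are divided by ln(L). The function name is
       hypothetical.

```python
import math


def rebase_linear_log(intercept_e, slope_e, base=10.0):
    """Hypothetical helper: convert ln(p) = a + b * score into log_base(p) = a' + b' * score.

    Uses log_base(p) = ln(p) / ln(base), so both coefficients are
    divided by ln(base).
    """
    factor = math.log(base)
    return intercept_e / factor, slope_e / factor


# ln(p) = ln(1000) - 0.5 * ln(10) * score  ->  log10(p) = 3 - 0.5 * score
print(rebase_linear_log(math.log(1000.0), -0.5 * math.log(10.0)))
```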

       N=#    Size of the database (in residues) from which the input score list was derived.
              The searched database is typically a shuffled version of a real protein or
              nucleotide sequence database. Default: N=14147368 (the size of SWISS-PROT
              release 30 and of shuffled derivatives of it).

       P=#    Upper threshold of the probability range to which the extreme-value distribution
              is fitted. For instance, if N=10000000 and P=0.0001 (the default value for P),
              then profile match scores ranked below position 1000 in the sorted input list
              (corresponding to occurrence probabilities > 0.0001) are ignored.

       Q=#    Lower threshold of the probability range to which the extreme-value distribution
              is fitted. For instance, if N=10000000 and Q=0.000001 (the default value for Q),
              then profile match scores ranked above position 10 in the sorted input list
              (corresponding to occurrence probabilities < 0.000001) are ignored.
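       The rank arithmetic in the P and Q examples generalizes as follows: the score at rank r
       is assigned the probability r / N, and only ranks with Q <= r / N <= P take part in the
       fit. The helper below is a hypothetical sketch of that window; how pfscale treats a rank
       lying exactly on a boundary is an assumption here.

```python
import math


def fit_rank_window(n, p=1e-4, q=1e-6):
    """Hypothetical helper: return (first, last) 1-based ranks used for fitting.

    Ranks with r / n > p (too frequent) or r / n < q (too rare) are
    ignored, so the kept ranks satisfy q <= r / n <= p.
    """
    first = math.ceil(q * n)   # smallest rank with r / n >= q
    last = math.floor(p * n)   # largest rank with r / n <= p
    return first, last


print(fit_rank_window(10_000_000))  # the N used in the P and Q descriptions
print(fit_rank_window(14_147_368))  # the default N
```

       With the default N the window spans ranks 15 to 1414, which is consistent with the
       statement in EXAMPLES that about 2000 matches cover the P/Q range.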

EXAMPLES

       (1)    pfsearch -fr sh3.prf shuffle20.seq C=200 | sort -nr | pfscale - P=0.0001 Q=0.000001

              derives score-normalization parameters for the SH3 domain profile in sh3.prf.
              shuffle20.seq contains a window-shuffled derivative of SWISS-PROT release 30 in
              Pearson/FASTA format (window size 20). Because the implicit default of N
              corresponds to the size of this database, N need not be specified on the command
              line. The cut-off value C=200 produces about 2000 matches, completely covering
              the range defined by the P and Q values on the command line. A suitable cut-off
              value must be estimated in advance by computing a few optimal alignment scores
              for random sequences.
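       One way to perform the fit itself is sketched below. It is not necessarily the procedure
       of Hofmann & Bucher (1995) that pfscale uses: it assumes that, in the upper tail of an
       extreme-value distribution, log10 of the per-residue frequency is approximately linear
       in the score, fits that line by ordinary least squares over the P/Q rank window, and
       negates the coefficients so that r1 + r2 * score approximates the -Log10 per-residue
       E-value. The function and parameter names are hypothetical.

```python
import math


def fit_evd_tail(scores, n, p=1e-4, q=1e-6):
    """Hypothetical helper: least-squares fit of log10(rank / n) = a + b * score.

    `scores` must be sorted in decreasing order; only ranks r with
    q <= r / n <= p are used. Returns (r1, r2) such that r1 + r2 * score
    approximates -log10 of the per-residue frequency.
    """
    first = math.ceil(q * n)
    last = min(math.floor(p * n), len(scores))
    xs, ys = [], []
    for rank in range(first, last + 1):
        xs.append(scores[rank - 1])
        ys.append(math.log10(rank / n))
    if len(xs) < 2:
        raise ValueError("too few scores inside the probability window")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all scores in the window are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Negate so that higher raw scores map to larger -log10 values.
    return -a, -b


# Synthetic scores that follow log10(r / n) = 2 - 0.05 * score exactly.
demo = [20 * (2 - math.log10(r / 10000)) for r in range(1, 1001)]
print(fit_evd_tail(demo, 10000, p=0.1, q=0.001))
```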

REFERENCES

       Hofmann K & Bucher P (1995). The FHA-domain: a nuclear signalling domain found in protein
       kinases and transcription factors. Trends Biochem. Sci. 20:347-349.

AUTHOR

       Philipp Bucher
       Philipp.Bucher@isrec.unil.ch