Ubuntu Manpage: pfscale - fit parameters of an extreme-value distribution to a profile score list

Provided by: pftools_3+dfsg-2build1_amd64

NAME

       pfscale - fit parameters of an extreme-value distribution to a profile score list

SYNOPSIS

       pfscale [ score-list | - ] [ profile-file ] [L=#] [N=#] [P=#] [Q=#]

DESCRIPTION

       pfscale fits the two parameters of an extreme-value distribution to a score distribution obtained by
       searching a sequence database with a profile.  score-list is a list of profile match scores generated
       by pfsearch and sorted in descending order (highest score first).  The result is written to the
       standard output.

       If the original profile is given as the second argument, the normalization function specified within
       the profile is updated so that it produces -log10 per-residue E-values.  If the second argument is
       omitted, the output consists of a header line containing the normalization parameters, followed by a
       modified score list showing the original scores, the normalized scores, and the corresponding
       log-cumulative frequencies side by side.
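
       The relation between scores, log-cumulative frequencies and the two fitted parameters can be pictured
       with the Python sketch below.  It is an illustration only and not pfscale's own code: it assumes that
       the cumulative frequency of a score is its rank in the descending list divided by N, and that the two
       parameters are the intercept and slope of an ordinary least-squares line through the log frequencies.
       The function names are hypothetical.

```python
import math


def log_cumulative_frequencies(scores, n, base=10.0):
    """Pair each score with log_base(rank / n); rank 1 is the highest score."""
    ordered = sorted(scores, reverse=True)
    return [(s, math.log(rank / n, base)) for rank, s in enumerate(ordered, start=1)]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic list in which log10(rank / n) falls linearly with the score.
n = 1000
scores = [-100.0 * math.log10(rank / n) for rank in range(1, 101)]
intercept, slope = fit_line(log_cumulative_frequencies(scores, n))
print(intercept, slope)
```

       Under these assumptions, negating the fitted line gives a linear function of the raw score that
       approximates -log10 of the per-residue frequency.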

       Note  that  this  program  implements  the  significance  estimation  procedure  for profile match scores
       described in (Hofmann & Bucher 1995).  It  has  been  used  for  the  calculation  of  the  normalization
       parameters of all profiles in PROSITE.

PARAMETERS

       L=#    Logarithmic  base  of  the parameters of the estimated extreme-value distribution.  The parameters
              reported by pfscale are expressed as logarithms and thus can be inserted directly  into  a  linear
              normalization function defined in a generalized profile.  Default: L=10.

       N=#    Size  of  the  database  from  which  the  input score list was derived.  The searched database is
              typically a shuffled version  of  a  real  protein  or  nucleotide  sequence  database.   Default:
              N=14147368 (size of SWISS-PROT release 30 and shuffled derivatives of it).

       P=#    Upper  threshold  of the probability range to which the extreme-value distribution will be fitted.
              For instance: if N=10'000'000 and P=0.0001 (default value for P) then profile match  scores  below
              rank  1000  in  the sorted input list (corresponding to occurrence probabilities > 0.0001) will be
              ignored.

       Q=#    Lower threshold of the probability range to which the extreme-value distribution will  be  fitted.
              For instance: if N=10'000'000 and Q=0.000001 (default value for Q) then profile match scores above
              rank  10  in  the sorted input list (corresponding to occurrence probabilities < 0.000001) will be
              ignored.
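
              The rank arithmetic behind P and Q can be checked with a short Python sketch.  The function
              name is hypothetical, and rounding the products to the nearest integer is an assumption;
              pfscale may handle the boundaries differently.

```python
def rank_window(n, p=0.0001, q=0.000001):
    """Return (first_rank, last_rank) of the scores that fall inside the fitted range."""
    first_rank = round(n * q)  # scores ranked above this have probability < Q
    last_rank = round(n * p)   # scores ranked below this have probability > P
    return first_rank, last_rank


print(rank_window(10_000_000))  # (10, 1000), the example used for P and Q
print(rank_window(14_147_368))  # (14, 1415), using the default N
```

              With the default N, the input list therefore needs at least about 1415 scores to cover the
              whole fitted range.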

EXAMPLES

       (1)    pfsearch -fr sh3.prf shuffle20.seq C=200 | sort -nr | pfscale - P=0.0001 Q=0.000001

              derives score-normalization parameters for the SH3 domain profile in sh3.prf.  shuffle20.seq
              contains a window-shuffled derivative of SWISS-PROT release 30 in Pearson/Fasta format
              (window size 20).  The implicit default of N corresponds to the size of this database and thus
              need not be specified on the command line.  The cut-off value C=200 yields about 2000 matches,
              which completely covers the range defined by the command-line values of P and Q.  A suitable
              cut-off value has to be guessed in advance by computing a few optimal alignment scores for
              random sequences.

REFERENCES

       Hofmann K & Bucher P (1995).  The FHA-domain: a nuclear signalling domain found in protein kinases and
       transcription factors.  Trends Biochem. Sci. 20:347-349.

AUTHOR

       Philipp Bucher
       Philipp.Bucher@isrec.unil.ch

pftools 2.2                                         July 1999                                         PFSCALE(1)