Provided by: pftools_3+dfsg-2build1_amd64

NAME

       pfscale - fit parameters of an extreme-value distribution to a profile score list

SYNOPSIS

       pfscale [ score-list | - ] [ profile-file ] [L=#] [N=#] [P=#] [Q=#]

DESCRIPTION

       pfscale  fits  the  two  parameters  of an extreme-value distribution to a score distribution obtained by
       searching a sequence database with a profile.  score-list is  a  sorted  list  of  profile  match  scores
       generated by pfsearch.  The result is written to the standard output.

       If the original profile is given as the second argument, the normalization function specified within the
       profile is updated so as to produce -log10 per-residue E-values.  If the second argument is
       omitted,  the  output  consists  of  a  header line containing the normalization parameters followed by a
       modified score list,  showing  original  scores,  normalized  scores,  and  corresponding  log-cumulative
       frequencies next to each other.

       Note  that  this  program  implements  the  significance  estimation  procedure  for profile match scores
       described in (Hofmann & Bucher 1995).  It  has  been  used  for  the  calculation  of  the  normalization
       parameters of all profiles in PROSITE.
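
       As a rough illustration of the idea, and not a description of pfscale's actual fitting routine, the
       sketch below computes log10 cumulative frequencies (rank divided by database size) for a score list
       and fits a straight line to them by ordinary least squares.  The function names, the choice of plain
       least squares, and the toy scores are all assumptions made for this example.

```python
import math


def log_cumulative_frequencies(scores, db_size):
    """Pair each score with log10(rank / db_size), ranking from the highest score down.

    Assumption: rank 1 is the best score, so rank / db_size approximates the
    frequency of observing a score at least that high.
    """
    ordered = sorted(scores, reverse=True)
    return [(s, math.log10(rank / db_size)) for rank, s in enumerate(ordered, start=1)]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy scores, not real pfsearch output.
toy_scores = [400, 350, 320, 300, 290, 280]
toy_points = log_cumulative_frequencies(toy_scores, db_size=1_000_000)
intercept, slope = fit_line(toy_points)
```

       In this sketch the slope is negative, because higher scores are rarer.  The fitted intercept and
       slope play the role of the two logarithmic parameters of a linear normalization function; how
       pfscale itself weights or selects the points is not specified here.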

PARAMETERS

       L=#    Logarithmic  base  of  the parameters of the estimated extreme-value distribution.  The parameters
              reported by pfscale are expressed as logarithms and thus can be inserted directly  into  a  linear
              normalization function defined in a generalized profile.  Default: L=10.

       N=#    Size  of  the  database  from  which  the  input score list was derived.  The searched database is
              typically a shuffled version  of  a  real  protein  or  nucleotide  sequence  database.   Default:
              N=14147368 (size of SWISS-PROT release 30 and shuffled derivatives of it).

       P=#    Upper  threshold  of the probability range to which the extreme-value distribution will be fitted.
              For instance: if N=10'000'000 and P=0.0001 (default value for P) then profile match  scores  below
              rank  1000  in  the sorted input list (corresponding to occurrence probabilities > 0.0001) will be
              ignored.

       Q=#    Lower threshold of the probability range to which the extreme-value distribution will  be  fitted.
              For instance: if N=10'000'000 and Q=0.000001 (default value for Q) then profile match scores above
              rank 10 in the sorted input list (corresponding to occurrence probabilities <  0.000001)  will  be
              ignored.
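
       The arithmetic behind the P and Q examples can be sketched as follows.  This is a minimal
       illustration assuming that a probability threshold maps to a rank by multiplying it with N and
       rounding to the nearest integer; the function name rank_window and the rounding rule are
       assumptions, not taken from the pfscale source.

```python
def rank_window(db_size, p_upper=0.0001, q_lower=0.000001):
    """Return (first_rank, last_rank) of the sorted score list used for fitting.

    Assumption: a probability threshold t corresponds to rank round(db_size * t).
    Scores ranked above first_rank (rarer than Q) and below last_rank
    (more frequent than P) would be ignored.
    """
    first_rank = round(db_size * q_lower)
    last_rank = round(db_size * p_upper)
    return first_rank, last_rank


print(rank_window(10_000_000))  # (10, 1000)
```

       With N=10'000'000 and the default thresholds this reproduces the ranks 10 and 1000 quoted above.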

EXAMPLES

       (1)    pfsearch -fr sh3.prf shuffle20.seq C=200 | sort -nr | pfscale - P=0.0001 Q=0.000001

              derives  score-normalization  parameters  for  the  SH3  domain profile in sh3.prf.  shuffle20.seq
              contains a window-shuffled derivative of SWISS-PROT release 30 in Pearson/Fasta format (window-
              size 20).  Note that the implicit default of N corresponds to the size of this database and thus
              need not be specified on the command line.  The cut-off value C=200 will produce about 2000
              matches  completely  covering  the  range  defined  by  the command line parameters of P and Q.  A
              suitable cut-off value has to be guessed in advance by computing a few  optimal  alignment  scores
              for random sequences.

REFERENCES

       Hofmann K & Bucher P (1995).  The FHA-domain: a nuclear signalling domain found in protein kinases and
       transcription factors.  Trends Biochem. Sci. 20:347-349.

AUTHOR

       Philipp Bucher
       Philipp.Bucher@isrec.unil.ch