Provided by: pftools_3.2.12-1_amd64

NAME

       pfscale - fit parameters of an extreme-value distribution to a profile score list

SYNOPSIS

       pfscale   [  -hl  ]  [ -L log_base ] [ -M mode_nb ] [ -N db_size ] [ -P upper_limit ] [ -Q
                 lower_limit ] [ score_list | - ] [ profile ] [ parameters ]

DESCRIPTION

       pfscale fits the two parameters  of  an  extreme-value  distribution  to  a  sorted  score
       distribution  obtained  by  searching  a  sequence  database  with  a  profile.   The file
       'score_list' is a sorted list of profile match scores generated by pfsearch.   If  '-'  is
       specified  instead  of  a  filename,  the  score list is read from the standard input. The
       result is written to the standard output.

       If the original profile is given as the second argument, the normalization  function  with
       the  lowest mode number or the lowest priority number specified within the profile will be
       updated so that it produces -log10 per-residue E-values. If the second argument is
       omitted,  the  output  consists  of  a header line containing the normalization parameters
       followed by a modified score list, showing score rank, original raw scores, log-cumulative
       frequencies and corresponding normalized scores next to each other.

       Note  that this program implements the significance estimation procedure for profile match
       scores described in Hofmann & Bucher (1995).  It has been used for the calculation of  the
       normalization parameters of all profiles in the PROSITE database.
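
       The fitting step can be illustrated with a minimal Python sketch. It is not the
       pfscale implementation: it assumes that the empirical cumulative frequency of a
       score is rank / db_size, and that the two parameters are obtained by an ordinary
       least-squares fit of -log(frequency) against the raw score. The function name
       fit_normalization and the names r1 and r2 are hypothetical.

```python
import math
from typing import List, Tuple


def fit_normalization(scores: List[float], db_size: int,
                      upper_p: float = 0.0001, lower_q: float = 0.000001,
                      log_base: float = 10.0) -> Tuple[float, float]:
    """Least-squares fit of -log_base(rank / db_size) = r1 + r2 * score.

    Assumption of this sketch: `scores` is sorted in descending order, so
    rank 1 is the best score, and only ranks whose frequency rank / db_size
    lies within [lower_q, upper_p] contribute to the fit.
    """
    xs, ys = [], []
    for rank, score in enumerate(scores, start=1):
        freq = rank / db_size
        if lower_q <= freq <= upper_p:
            xs.append(score)
            ys.append(-math.log(freq, log_base))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two scores inside the fitted range")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all scores in the fitted range are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r2 = sxy / sxx               # slope of the linear normalization function
    r1 = mean_y - r2 * mean_x    # intercept of the linear normalization function
    return r1, r2


def normalized_score(raw: float, r1: float, r2: float) -> float:
    """Apply the fitted linear normalization function to a raw score."""
    return r1 + r2 * raw
```

       With parameters fitted this way, normalized_score(raw, r1, r2) approximates the
       negative logarithm of the frequency with which a score of at least that size
       occurs in the searched database.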

OPTIONS

       score_list
              Input score list.
              The file must contain a sorted list of scores. The first field of each line is
              taken as the score; all other fields on the same line are ignored. Fields must
              be separated by whitespace. If the filename is replaced by a '-', pfscale will
              read the score list from stdin.
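
              The input rules above can be sketched in Python as follows. This is an
              illustration only, not the parser used by pfscale: skipping blank lines
              and checking for descending order (as produced by 'sort -nr') are
              assumptions, and the sample lines are made up.

```python
from typing import Iterable, List


def read_scores(lines: Iterable[str]) -> List[float]:
    """Return the first whitespace-delimited field of each line as a score."""
    scores = []
    for line in lines:
        fields = line.split()
        if not fields:      # skip blank lines (an assumption of this sketch)
            continue
        scores.append(float(fields[0]))
    return scores


def is_sorted_descending(scores: List[float]) -> bool:
    """True if the scores are in non-increasing order (best match first)."""
    return all(a >= b for a, b in zip(scores, scores[1:]))


# Made-up input lines: only the first field of each line matters.
example = ["412 seq_a pos=12", "398\tseq_b", "", "371 seq_c extra fields"]
scores = read_scores(example)
print(scores, is_sorted_descending(scores))  # [412.0, 398.0, 371.0] True
```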

       profile
              Optional profile file.
              If a filename is specified, the profile  will  be  parsed  and  either  the  lowest
              priority  mode or the mode number specified with option -M will be scaled. All cut-
              off levels which use the specified mode number will also be updated.

       -h     Display usage help text.

       -l     Remove output line length limit. Individual lines of the output profile can  exceed
              a length of 132 characters, removing the need to wrap them over several lines.

       -L log_base
              Logarithmic  base  of  the  parameters of the estimated extreme-value distribution.
              The parameters reported by pfscale are expressed as  logarithms  and  thus  can  be
              inserted  directly  into  a  linear normalization function defined in a generalized
              profile.
              Default: 10
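
              As a worked illustration of what a change of base means, the sketch
              below rescales a pair of parameters from one logarithmic base to
              another. It assumes the normalized score has the linear form
              r1 + r2 * raw_score and represents a negative logarithm, so both
              parameters scale by the same factor; the names rebase, r1 and r2 are
              hypothetical and not part of pfscale.

```python
import math
from typing import Tuple


def rebase(r1: float, r2: float,
           from_base: float = 10.0, to_base: float = math.e) -> Tuple[float, float]:
    """Rescale linear normalization parameters between logarithmic bases.

    Uses log_to(x) = log_from(x) * ln(from_base) / ln(to_base).
    """
    factor = math.log(from_base) / math.log(to_base)
    return r1 * factor, r2 * factor


# Base-10 parameters expressed in base 2 (values chosen for illustration only).
print(rebase(-1.5, 0.02, from_base=10.0, to_base=2.0))
```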

       -M mode_nb
              Mode number to scale.
              Defines which mode number (and implicitly which cut-off level) of the input PROSITE
              profile  should be scaled. This overrides the default behaviour of scaling only the
              normalization mode with the lowest priority (or lowest mode number).   All  cut-off
              levels defined in the profile as using this mode number (via the MODE keyword) will
              be updated as well.

       -N db_size
              Size of the database from which the input score list  was  derived.   The  searched
              database  is  typically a shuffled version of a real protein or nucleotide sequence
              database.
              Default: 14147368 (size of SWISS-PROT release 30 and shuffled derivatives of it).

       -P upper_limit
              Upper threshold of the probability range to which  the  extreme-value  distribution
              will  be  fitted.   For  instance:  if N=10'000'000 and P=0.0001 then profile match
              scores below rank 1000 in  the  sorted  input  list  (corresponding  to  occurrence
              probabilities > 0.0001) will be ignored.
              Default: 0.0001

       -Q lower_limit
              Lower  threshold  of  the probability range to which the extreme-value distribution
              will be fitted.  For instance: if N=10'000'000 and Q=0.000001  then  profile  match
              scores  above  rank  10  in  the  sorted  input  list  (corresponding to occurrence
              probabilities < 0.000001) will be ignored.
              Default: 0.000001
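
              The rank arithmetic behind -P and -Q can be checked with a short Python
              sketch. The function name rank_window is hypothetical, and rounding to
              the nearest integer rank is an assumption; pfscale may treat the
              boundaries differently.

```python
from typing import Tuple


def rank_window(db_size: int, upper_p: float, lower_q: float) -> Tuple[int, int]:
    """Return (first_rank, last_rank) of the scores that enter the fit."""
    if not 0 < lower_q < upper_p:
        raise ValueError("expected 0 < Q < P")
    first_rank = round(db_size * lower_q)  # better-ranked scores are ignored (-Q)
    last_rank = round(db_size * upper_p)   # worse-ranked scores are ignored (-P)
    return first_rank, last_rank


print(rank_window(10_000_000, 0.0001, 0.000001))  # (10, 1000)
```

              With the default database size of 14147368 and the default thresholds,
              the same calculation gives ranks of about 14 and 1415.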

PARAMETERS

       Note:  for backwards compatibility, release 2.3 of the  pftools  package  will  parse  the
              version 2.2 style parameters, but these are deprecated and the corresponding option
              (refer to the options section) should be used instead.

       L=#    Logarithmic base.
              Use option -L instead.

       M=#    Mode number.
              Use option -M instead.

       N=#    Database size.
              Use option -N instead.

       P=#    Upper probability threshold.
              Use option -P instead.

       Q=#    Lower probability threshold.
              Use option -Q instead.

EXAMPLES

       (1)    pfsearch -fr -C 200 sh3.prf shuffle20.seq  |  sort  -nr  |  pfscale  -P  0.0001  -Q
              0.000001 -

              derives   score-normalization  parameters  for  the  SH3  domain  profile  in  file
              'sh3.prf'.  The file  'shuffle20.seq'  contains  a  window-shuffled  derivative  of
              SWISS-PROT  release  30  in  Pearson/Fasta  format (window-size 20).  Note that the
              implicit default of N corresponds to the size of this database and thus need not
              be specified on the command line. The cut-off value 200 for the pfsearch(1)
              option -C will produce about 2000 matches completely covering the range defined  by
              the  command line parameters -P and -Q of pfscale.  A suitable cut-off value has to
              be guessed in advance by computing  a  few  optimal  alignment  scores  for  random
              sequences.

EXIT CODE

       On  successful  completion of its task, pfscale will return an exit code of 0. If an error
       occurs, a diagnostic message will be output on standard error and the exit  code  will  be
       different from 0. When conflicting options were passed to the program but the task could
       nevertheless be completed, warnings will be issued on standard error.
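
       A calling script can act on this convention as in the following Python sketch.
       The helper run_tool is hypothetical and works for any command; it treats a
       non-zero exit code as an error and returns standard error separately, so that
       warnings issued on a successful run are not lost.

```python
import subprocess
import sys
from typing import List, Tuple


def run_tool(argv: List[str], stdin_text: str = "") -> Tuple[str, str]:
    """Run a command and return (stdout, stderr); raise on a non-zero exit code."""
    proc = subprocess.run(argv, input=stdin_text, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} failed with exit code {proc.returncode}: "
            f"{proc.stderr.strip()}"
        )
    # On success, stderr may still carry warnings worth logging.
    return proc.stdout, proc.stderr


# Demonstration with the Python interpreter itself, so the sketch runs anywhere.
out, warnings = run_tool([sys.executable, "-c", "print('done')"])
print(out.strip())
```

       For pfscale reading a sorted score list from standard input, the call would take
       the form run_tool(["pfscale", "-"], stdin_text=sorted_scores), where
       sorted_scores is a string prepared by the caller.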

NOTES

       (1)    The current version of pfscale does not  yet  support  the  xpsa(5)  output  format
              produced by pfscan(1) or pfsearch(1).  The score list should therefore be generated
              without the pfscan(1) and pfsearch(1) option -k.

REFERENCES

       Hofmann K & Bucher P. (1995). The FHA-domain: a nuclear signalling domain found in
       protein kinases and transcription factors. Trends Biochem. Sci. 20:347-349.

SEE ALSO

       pfsearch(1), pfscan(1), xpsa(5)

AUTHOR

       The pftools package was developed by Philipp Bucher.
       Any comments or suggestions should be addressed to <pftools@sib.swiss>.