Provided by: pftools_3+dfsg-3_amd64

NAME

       ptof - convert a protein profile into a frame-search profile

SYNOPSIS

       ptof [ -r ] [ protein-profile ] [B=#] [F=#] [I=#] [X=#] [Y=#] [Z=#]

DESCRIPTION

       ptof  converts  a protein profile (generated for instance by pftools programs pfmake, gtop
       or htop) into a so-called "frame-search profile".   A  frame-search  profile  is  used  to
       search  an  "interleaved frame-translated" DNA sequence (generated by pftools program 2ft)
       for occurrences of a  protein  sequence  motif.   An  "interleaved  frame-translated"  DNA
       sequence  is  an  amino acid sequence corresponding to the N-2 overlapping codons of a DNA
       sequence of length N.  Note that in  such  a  sequence,  the  character  "O"  is  used  to
       represent stop codons.
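
       The definition above can be illustrated with a minimal Python sketch. It is
       not the 2ft implementation: it handles the forward strand only, assumes the
       standard genetic code, and maps any codon containing a non-ACGT character to
       "X", which is an assumption of this sketch. The function name is
       hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def interleaved_frame_translation(dna):
    """Translate every overlapping codon of the forward strand.

    A sequence of length N yields N-2 residues; stop codons become "O".
    """
    dna = dna.upper()
    residues = []
    for start in range(len(dna) - 2):
        codon = dna[start:start + 3]
        # Assumption of this sketch: unknown codons are written as "X".
        amino_acid = CODON_TABLE.get(codon, "X")
        residues.append("O" if amino_acid == "*" else amino_acid)
    return "".join(residues)


if __name__ == "__main__":
    # Codons ATA, TAA, AAG: the middle one is a stop codon.
    print(interleaved_frame_translation("ATAAG"))
```

       Consecutive residues of the output come from codons shifted by one base, so
       residues three positions apart belong to the same reading frame.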

       The  conversion procedure works as follows: The protein profile is expanded in length by a
       factor of three to accommodate three translated codons per original match  position.   Two
       dummy  match  positions  are  placed  between  two consecutive significant match positions
       imported from the original profile.  The original  insert  positions  are  placed  between
       pairs  of  adjacent  dummy  match  positions.  The initiation, termination, and transition
       scores of the original insert positions are left unchanged; the insert extension scores
       are multiplied by the command-line parameter I (by default 1/3, i.e. divided by 3). The two
       insert positions flanking the significant match positions serve to accommodate frame-shift
       errors  and  introns, respectively.  The frame-shift insert position allows free insertion
       opening combined with a high insert extension penalty (command-line parameter  F)  whereas
       the intron insert position has a high opening but a low extension penalty (command-line
       parameters Y and Z).  The deletion opening and closing penalties next to the significant
       match  positions  are  set  to  values  that  ensure  that the total cost of a single-base
       deletion is the same as the cost of a single-base insertion at a frame-shift insert
       position.  Furthermore, the alphabet of the original profile is extended by the stop codon
       symbol "O" which is assigned a constant  negative  value  (command-line  parameter  X)  at
       significant match positions, and zero at dummy match positions. At insert positions, it is
       set to the average of the other insert extension scores.
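
       Two of the scoring rules described above can be sketched in Python as
       follows. This is an illustration under stated assumptions, not the ptof
       source: the dictionary-based data layout, the function names and the
       position labels are hypothetical, scores are kept as floats without
       rounding, and the stop-codon average is taken over the already scaled
       insert extension scores.

```python
def scale_insert_extension(scores, multiplier=1.0 / 3.0):
    """Scale the insert extension scores of one original insert position.

    `scores` maps residue symbols to extension scores (hypothetical layout);
    `multiplier` plays the role of the command-line parameter I.
    """
    return {residue: score * multiplier for residue, score in scores.items()}


def stop_codon_score(position_kind, stop_penalty=-100, insert_scores=None):
    """Score assigned to the stop codon symbol "O" at one profile position."""
    if position_kind == "significant_match":
        return stop_penalty          # command-line parameter X
    if position_kind == "dummy_match":
        return 0
    if position_kind == "insert":
        # Average of the other residues' insert extension scores.
        return sum(insert_scores.values()) / len(insert_scores)
    raise ValueError("unknown position kind: " + position_kind)


if __name__ == "__main__":
    scaled = scale_insert_extension({"A": -3, "G": -6})
    print(scaled)
    print(stop_codon_score("insert", insert_scores=scaled))
```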

OPTIONS

       -r     Frame-search parameters are given in normalized score units. This option will  only
              be  considered  if  a  linear  normalization  function with priority over all other
              normalization functions is specified in the profile.   In  this  case,  the  frame-
              search  scores  specified  on  the  command  line  will be divided by the slope (R2
              parameter) of the normalization function.  This option is particularly  useful  for
              profiles   which   are   already  scaled  in  units  that  can  be  interpreted  as
              −Log(P)-values, e.g.  bits.
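
       The conversion performed under -r amounts to a single division, sketched
       below in Python. The function name is hypothetical, and the slope value 0.01
       in the usage line is only an illustrative assumption, chosen because it maps
       -0.5 to -50; the actual slope is whatever R2 value the profile specifies.

```python
def to_raw_score(normalized_value, slope_r2):
    """Convert a command-line value given in normalized units to raw units.

    With option -r the value is divided by the slope (R2) of the linear
    normalization function stored in the profile.
    """
    if slope_r2 == 0:
        raise ValueError("the slope R2 must be non-zero")
    return normalized_value / slope_r2


if __name__ == "__main__":
    # Hypothetical slope of 0.01 normalized units per raw score unit.
    print(round(to_raw_score(-1.2, 0.01)))
```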

PARAMETERS

       B=#    Minimal initiation and termination score. All internal and external initiation and
              termination  scores  will  be  set  to this value if the corresponding value in the
              original profile is lower than this value.  This parameter is used to impose a more
              local  alignment  behavior  on  the  frame-search  profile  in  order  to deal with
              discontinuities in DNA sequences  (long  introns,  alternative  splicing,  chimeric
              clones, etc.).  Default: B=−50 (−0.5 with option -r).

       F=#    Frame-shift error penalty. Default: F=−100 (−1.0 with option -r).

       I=#    Insert score multiplier. The values of the original insert extension scores will be
              multiplied by this factor in order to compensate for the fact that a  single  amino
              acid  corresponds  to  three  overlapping  codon  positions in the target sequence.
              Default: I=1/3.

       X=#    Stop codon penalty.  Default: X=−100 (−1.0 with option -r).

       Y=#    Intron opening penalty.  Default: Y=−300 (−3.0 with option -r).

       Z=#    Intron extension penalty.  Default: Z=−1 (−0.01 with option -r).
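
       To give a feel for how the default values interact, the Python sketch below
       applies the B floor to an initiation or termination score and compares the
       cost of an insertion of n positions at a frame-shift insert position with
       the cost at an intron insert position. It is a simplified model, not the
       pfsearch scoring code: it assumes that the cost is the opening penalty plus
       n times the extension penalty, that opening is free at the frame-shift
       position, and it ignores all other transition scores. The function names
       are hypothetical.

```python
# Documented defaults in raw score units (without option -r).
B_DEFAULT = -50    # minimal initiation and termination score
F_DEFAULT = -100   # frame-shift error penalty
Y_DEFAULT = -300   # intron opening penalty
Z_DEFAULT = -1     # intron extension penalty


def floor_score(score, minimum=B_DEFAULT):
    """Raise an initiation or termination score to B if it is lower."""
    return max(score, minimum)


def frame_shift_cost(n, extension=F_DEFAULT):
    """Simplified cost of n inserted positions at a frame-shift insert."""
    return n * extension


def intron_cost(n, opening=Y_DEFAULT, extension=Z_DEFAULT):
    """Simplified cost of n inserted positions at an intron insert."""
    return opening + n * extension


if __name__ == "__main__":
    for n in (1, 3, 4, 300):
        print(n, frame_shift_cost(n), intron_cost(n))
```

       Under these assumptions a one-base insertion is cheapest as a frame shift,
       whereas insertions of four or more positions score better at the intron
       insert position.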

EXAMPLES

       (1)    ptof -r sh3.prf F=-1.2 I=0.6 X=-1.5 B=-0.5 > sh3.fsp
              2ft < R76849.seq | pfsearch -fy  sh3.fsp - C=5.0

              The protein domain profile in  sh3.prf  is  first  converted  into  a  frame-search
              profile sh3.fsp.  Then both strands of the Fasta-formatted EST sequence in
              R76849.seq (GenBank/EMBL-accession: R76849) are converted into  interleaved  frame-
              translated  protein  sequences  and  searched for SH3 domains with the frame-search
              profile generated in the preceding step.

              The output may be compared to the result of a  more  conventional  search  strategy
              using a protein profile in conjunction with a six-frame translation of the same DNA
              sequence:

              6ft < R76849.seq | pfsearch -fy  sh3.prf - C=5.0

              See also manual pages of pfsearch, 2ft and 6ft.

AUTHOR

       Philipp Bucher
       Philipp.Bucher@isrec.unil.ch