
Provided by: pftools_3+dfsg-2build1_amd64

NAME

       ptof - convert a protein profile into a frame-search profile

SYNOPSIS

       ptof [ -r ] [ protein-profile ] [B=#] [F=#] [I=#] [X=#] [Y=#] [Z=#]

DESCRIPTION

       ptof converts a protein profile (generated, for instance, by the pftools programs pfmake, gtop or htop)
       into a so-called "frame-search profile".  A frame-search profile is used to search an "interleaved
       frame-translated" DNA sequence (generated by the pftools program 2ft) for occurrences of a protein
       sequence motif.
       An "interleaved frame-translated" DNA sequence is  an  amino  acid  sequence  corresponding  to  the  N-2
       overlapping  codons  of  a  DNA sequence of length N.  Note that in such a sequence, the character "O" is
       used to represent stop codons.
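
       The definition above can be illustrated with a short Python sketch.  It is not the 2ft implementation:
       it assumes the standard genetic code and maps any codon containing a non-ACGT character to "X" (an
       assumption; this page does not say how 2ft treats ambiguous bases).  Stop codons are written as "O" as
       described above.  The function name frame_translate is hypothetical.

```python
# Sketch of an "interleaved frame-translated" sequence: one residue per
# overlapping codon, i.e. N-2 residues for a DNA sequence of length N.
# ASSUMPTIONS: standard genetic code; codons with non-ACGT bases become "X".

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def frame_translate(dna: str) -> str:
    """Translate every overlapping codon (start positions 0 .. N-3)."""
    dna = dna.upper().replace("U", "T")
    residues = []
    for start in range(len(dna) - 2):
        aa = CODON_TABLE.get(dna[start:start + 3], "X")
        residues.append("O" if aa == "*" else aa)  # stop codons are written as "O"
    return "".join(residues)


if __name__ == "__main__":
    print(frame_translate("ATGTAA"))  # codons ATG, TGT, GTA, TAA -> MCVO
```

       A six-base sequence thus yields four residues, one for each of its N-2 overlapping codons.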

       The conversion procedure works as follows: The protein profile is expanded in length by a factor of three
       to accommodate three translated codons per original match position.  Two dummy match positions are placed
       between two consecutive significant match positions imported from the  original  profile.   The  original
       insert  positions  are  placed  between  pairs  of  adjacent  dummy  match  positions.   The  initiation,
       termination, and transition scores of the original insert positions are left unchanged; the insert
       extension scores are multiplied by the value of the command-line parameter I (default 1/3, i.e. they
       are divided by 3).  The two
       insert positions flanking the significant match positions serve to  accommodate  frame-shift  errors  and
       introns,  respectively.   The  frame-shift  insert position allows free insertion opening combined with a
       high insert extension penalty (command-line parameter F), whereas the intron insert position has a high
       opening penalty (command-line parameter Y) but a low extension penalty (command-line parameter Z).  The
       deletion opening and closing penalties next to the significant match positions are set so that the
       total cost of a single-base deletion equals the cost of a single-base insertion at a frame-shift
       insert position.
       Furthermore, the alphabet of the original profile is extended by the  stop  codon  symbol  "O"  which  is
       assigned a constant negative value (command-line parameter X) at significant match positions, and zero at
       dummy match positions.  At insert positions, its insert extension score is set to the average of the
       insert extension scores of the other residues.
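
       The conversion rules above can be summarized in a simplified Python sketch.  It uses a hypothetical
       in-memory representation (residue-to-score dictionaries), not the pftools profile format, and it does
       not model initiation, termination, transition or deletion scores.  Placing the frame-shift insert
       before the intron insert within each gap is an assumption: the text only says that these two insert
       positions flank the significant match positions.  pftools profiles hold integer scores; rounding is
       not modeled here.  All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

# Simplified sketch of the ptof expansion; hypothetical data layout.
# ASSUMPTION: within each gap the frame-shift insert follows the significant
# match and the intron insert precedes the next significant match.


def expanded_layout(n_significant: int) -> List[str]:
    """Position types of the frame-search profile for n significant matches."""
    layout = []
    for k in range(n_significant):
        layout.append("significant match")
        if k < n_significant - 1:
            layout += [
                "frame-shift insert",  # free opening, extension penalty F
                "dummy match",
                "original insert",     # imported insert, extension scores scaled by I
                "dummy match",
                "intron insert",       # opening penalty Y, extension penalty Z
            ]
    return layout


def scale_insert_extension(scores: Dict[str, float],
                           i_factor: float = 1 / 3) -> Dict[str, float]:
    """Multiply the original insert extension scores by parameter I."""
    return {aa: s * i_factor for aa, s in scores.items()}


def stop_codon_score(position_type: str, x_penalty: float = -100,
                     insert_scores: Optional[Dict[str, float]] = None) -> float:
    """Score given to the stop codon symbol "O" at a given position type."""
    if position_type == "significant match":
        return x_penalty
    if position_type == "dummy match":
        return 0
    if insert_scores is None:
        raise ValueError("insert positions need the other residues' extension scores")
    # Insert positions: average of the other residues' insert extension scores.
    return mean(insert_scores.values())
```

       For n significant positions this layout contains 3n-2 match positions, i.e. roughly the threefold
       expansion described above; whether ptof also pads the profile ends is not stated on this page.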

OPTIONS

       -r     Frame-search  parameters  are given in normalized score units. This option will only be considered
              if a linear normalization function  with  priority  over  all  other  normalization  functions  is
              specified  in  the  profile.   In this case, the frame-search scores specified on the command line
              will be divided by the slope (R2  parameter)  of  the  normalization  function.   This  option  is
              particularly  useful  for  profiles  which  are already scaled in units that can be interpreted as
              −Log(P)-values, e.g.  bits.
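
              With a linear normalization function, the normalized score changes by R2 for every raw score
              unit, so a penalty (a score difference) expressed in normalized units corresponds to that
              penalty divided by R2 in raw units; the intercept plays no role.  A minimal Python sketch,
              assuming the parameters are held in a plain dictionary and that only the score-valued
              parameters are converted (I is a dimensionless multiplier).  The name to_raw_units is
              hypothetical, and whether ptof rounds the result to an integer is not stated here.

```python
# Sketch: convert frame-search parameters given in normalized score units
# (option -r) into raw profile score units by dividing by the slope R2.
# ASSUMPTION: only score-valued parameters are converted; I is left as-is.

SCORE_PARAMETERS = ("B", "F", "X", "Y", "Z")


def to_raw_units(params: dict, r2: float) -> dict:
    if r2 == 0:
        raise ValueError("slope R2 of the normalization function must be non-zero")
    return {
        name: value / r2 if name in SCORE_PARAMETERS else value
        for name, value in params.items()
    }


if __name__ == "__main__":
    # With a hypothetical slope R2 = 0.01, F=-1.2 becomes about -120 raw units.
    print(to_raw_units({"F": -1.2, "I": 0.6, "X": -1.5}, r2=0.01))
```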

PARAMETERS

       B=#    Minimal initiation and termination score.  All internal and external initiation and termination
              scores will be set to this value if the corresponding value in the original profile is lower than
              this value.  This parameter is used to impose a more local alignment behavior on the frame-search
              profile in order to deal with discontinuities in DNA sequences (long introns, alternative
              splicing, chimeric clones, etc.).  Default: B=-50 (-0.5 with option -r).

       F=#    Frame-shift error penalty.  Default: F=-100 (-1.0 with option -r).

       I=#    Insert score multiplier.  The original insert extension scores are multiplied by this factor to
              compensate for the fact that a single amino acid corresponds to three overlapping codon positions
              in the target sequence.  Default: I=1/3.

       X=#    Stop codon penalty.  Default: X=-100 (-1.0 with option -r).

       Y=#    Intron opening penalty.  Default: Y=-300 (-3.0 with option -r).

       Z=#    Intron extension penalty.  Default: Z=-1 (-0.01 with option -r).
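
       The NAME=value convention and the defaults listed above can be captured in a small Python sketch.  The
       parser is hypothetical and only mirrors this page; it is not ptof's own argument handling.  Keeping
       I=1/3 unchanged under -r is an assumption, since no normalized default is given for it.  The last
       function shows the effect of B as a floor on initiation and termination scores.

```python
from typing import Dict, List

# Defaults as listed on this page (raw units, and normalized units for -r).
# ASSUMPTION: I keeps its default of 1/3 with option -r.
DEFAULTS_RAW = {"B": -50.0, "F": -100.0, "I": 1 / 3, "X": -100.0, "Y": -300.0, "Z": -1.0}
DEFAULTS_NORMALIZED = {"B": -0.5, "F": -1.0, "I": 1 / 3, "X": -1.0, "Y": -3.0, "Z": -0.01}


def parse_parameters(args: List[str], normalized: bool = False) -> Dict[str, float]:
    """Hypothetical parser for NAME=value arguments such as "F=-1.2"."""
    params = dict(DEFAULTS_NORMALIZED if normalized else DEFAULTS_RAW)
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or name not in params:
            raise ValueError(f"unrecognized parameter: {arg!r}")
        params[name] = float(value)
    return params


def apply_minimal_score(scores: List[float], b: float) -> List[float]:
    """Raise initiation/termination scores below B to B (more local alignments)."""
    return [max(score, b) for score in scores]


if __name__ == "__main__":
    print(parse_parameters(["F=-1.2", "I=0.6", "X=-1.5", "B=-0.5"], normalized=True))
    print(apply_minimal_score([-200.0, -10.0, 0.0], b=-50.0))
```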

EXAMPLES

       (1)    ptof -r sh3.prf F=-1.2 I=0.6 X=-1.5 B=-0.5 > sh3.fsp
              2ft < R76849.seq | pfsearch -fy sh3.fsp - C=5.0

              The protein domain profile in sh3.prf is first converted into a frame-search profile, sh3.fsp.
              Then both strands of the FASTA-formatted EST sequence in R76849.seq (GenBank/EMBL accession
              R76849) are converted into interleaved frame-translated protein sequences and searched for SH3
              domains with the frame-search profile generated in the preceding step.

              The  output  may  be compared to the result of a more conventional search strategy using a protein
              profile in conjunction with a six-frame translation of the same DNA sequence:

              6ft < R76849.seq | pfsearch -fy sh3.prf - C=5.0

              See also the manual pages of pfsearch, 2ft and 6ft.
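
              The two search strategies in this example are closely related.  In an interleaved
              frame-translated sequence, residue i comes from the codon starting at base i, so taking every
              third residue from offset 0, 1 or 2 gives the translation of the corresponding forward reading
              frame; the reverse-strand frames follow in the same way from the reverse complement.  A
              minimal Python sketch (function names are hypothetical; it does not reproduce the output
              format of 2ft or 6ft):

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (other characters are kept as-is)."""
    return dna.translate(COMPLEMENT)[::-1]


def reading_frames(interleaved: str) -> List[str]:
    """Split an interleaved frame-translated sequence into its three frames."""
    return [interleaved[offset::3] for offset in range(3)]


if __name__ == "__main__":
    # "MCVO" is the interleaved translation of ATGTAA (codons ATG, TGT, GTA, TAA).
    print(reading_frames("MCVO"))        # ['MO', 'C', 'V']
    print(reverse_complement("ATGTAA"))  # TTACAT
```

              This is why a frame-search profile can find a motif that a six-frame search would only see
              split across frames: the frame-shift insert positions let one alignment move between offsets.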

AUTHOR

       Philipp Bucher
       Philipp.Bucher@isrec.unil.ch