Provided by: biosquid_1.9g+cvs20050121-5_amd64

NAME

       compstruct - calculate accuracy of RNA secondary structure predictions

SYNOPSIS

       compstruct [options] trusted_file test_file

DESCRIPTION

       compstruct evaluates the accuracy of RNA secondary structure predictions on a per-base-pair
       basis. The trusted_file contains one or more sequences with trusted (known) RNA secondary structure
       annotation. The test_file contains the same sequences, in the same order, with predicted RNA secondary
       structure annotation. compstruct reads the structures, compares them, and calculates both the
       sensitivity (the fraction of true base pairs that are correctly predicted) and the specificity (positive
       predictive value, the fraction of predicted base pairs that are true). Results are reported for each
       individual sequence, and in summary for all sequences together.

       Both files must contain secondary structure annotation in WUSS notation. Only SELEX and Stockholm formats
       support structure markup at present.
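
       As an illustration of what "structure annotation" means in practice, the sketch below turns a
       structure string into a set of base pairs. It is not compstruct's own parser. It assumes the usual
       WUSS conventions: nested pairs are written with matching brackets (<>, (), [], {}), pseudoknotted
       pairs with an uppercase letter on the 5' side and the same lowercase letter on the 3' side, and all
       other characters mark unpaired positions. The name wuss_to_pairs and its include_pseudoknots
       parameter are hypothetical.

```python
# Sketch only: a simplified reader for WUSS-style structure strings.
# Assumptions (not taken from the compstruct source):
#   - nested pairs use matching brackets <>, (), [], {}
#   - pseudoknotted pairs use an uppercase letter (5' side) and the
#     same lowercase letter (3' side)
#   - every other character is an unpaired position

CLOSE_TO_OPEN = {">": "<", ")": "(", "]": "[", "}": "{"}
OPENERS = set(CLOSE_TO_OPEN.values())


def wuss_to_pairs(structure, include_pseudoknots=False):
    """Return the set of base pairs (i, j), 1-based, with i < j."""
    stacks = {}   # one stack of open positions per opening symbol
    pairs = set()

    def close(opener, pos):
        if not stacks.get(opener):
            raise ValueError(f"unbalanced symbol at position {pos}")
        pairs.add((stacks[opener].pop(), pos))

    for pos, ch in enumerate(structure, start=1):
        if ch in OPENERS:
            stacks.setdefault(ch, []).append(pos)
        elif ch in CLOSE_TO_OPEN:
            close(CLOSE_TO_OPEN[ch], pos)
        elif ch.isalpha() and include_pseudoknots:
            if ch.isupper():
                stacks.setdefault(ch, []).append(pos)
            else:
                close(ch.upper(), pos)
        # anything else (or a letter when pseudoknots are ignored)
        # is treated as unpaired

    if any(stacks.values()):
        raise ValueError("unclosed opening symbol in structure string")
    return pairs
```

       For example, under these assumptions wuss_to_pairs("<<<___>>>") yields the three pairs (1,9),
       (2,8), and (3,7).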

       The  default definition of a correctly predicted base pair is that a true pair (i,j) must exactly match a
       predicted pair (i,j).
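
       The exact-match definition can be sketched with plain set operations. This is a minimal
       illustration, not the program's implementation: the function name exact_accuracy is hypothetical,
       pairs are assumed to be (i, j) tuples, and returning 0.0 when a structure has no pairs is an
       assumption about a case this page does not describe.

```python
def exact_accuracy(trusted, predicted):
    """Sensitivity and PPV under the exact-match rule.

    trusted, predicted: iterables of (i, j) base pairs.
    Returns (sensitivity, ppv) as fractions between 0 and 1.
    """
    trusted, predicted = set(trusted), set(predicted)
    correct = len(trusted & predicted)   # pairs present in both structures

    # Assumption: report 0.0 when there is nothing to divide by.
    sensitivity = correct / len(trusted) if trusted else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Three trusted pairs, four predicted pairs, two in common.
trusted_pairs = {(1, 10), (2, 9), (3, 8)}
predicted_pairs = {(1, 10), (2, 9), (4, 7), (12, 20)}
sens, ppv = exact_accuracy(trusted_pairs, predicted_pairs)
```

       Here two of three trusted pairs are recovered (sensitivity 2/3) and two of four predicted pairs
       are true (specificity 1/2).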

       Mathews, Zuker, Turner and colleagues (see: Mathews et al., JMB 288:911-940, 1999)  use  a  more  relaxed
       definition.  Mathews defines "correct" as follows: a true pair (i,j) is correctly predicted if any of the
       following pairs are predicted: (i,j), (i+1,j),  (i-1,j),  (i,j+1),  or  (i,j-1).  This  rule  allows  for
       "slipped  helices"  off  by  one  base.   The  -m option activates this rule for both sensitivity and for
       specificity. For specificity, the rule is reversed: predicted pair (i,j) is considered to be true if  the
       true structure contains one of the five pairs (i,j), (i+1,j), (i-1,j), (i,j+1), or (i,j-1).
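
       The relaxed rule can be sketched the same way. The code below is an illustration written from the
       description above, not compstruct's source. The names slipped_neighbors and mathews_accuracy are
       hypothetical, and returning 0.0 for an empty structure is an assumption.

```python
def slipped_neighbors(i, j):
    """The five pairs that count as matching (i, j) under the relaxed rule."""
    return {(i, j), (i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)}


def mathews_accuracy(trusted, predicted):
    """Sensitivity and PPV under the Mathews relaxed rule.

    trusted, predicted: iterables of (i, j) base pairs.
    """
    trusted, predicted = set(trusted), set(predicted)

    # Sensitivity: a true pair counts if any of its five neighbors is predicted.
    found = sum(1 for (i, j) in trusted if slipped_neighbors(i, j) & predicted)
    # Specificity: a predicted pair counts if any of its five neighbors is true.
    supported = sum(1 for (i, j) in predicted if slipped_neighbors(i, j) & trusted)

    # Assumption: report 0.0 when there is nothing to divide by.
    sensitivity = found / len(trusted) if trusted else 0.0
    ppv = supported / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# A helix predicted one base off on the 3' side.
trusted_helix = {(1, 10), (2, 9), (3, 8)}
slipped_helix = {(1, 11), (2, 10), (3, 9)}
relaxed = mathews_accuracy(trusted_helix, slipped_helix)
```

       In this example no predicted pair matches exactly, yet every pair is within one base on one side,
       so the relaxed rule scores both measures as 1.0. A pair shifted on both sides at once, such as
       (2,11) for a true pair (1,10), is not among the five allowed pairs and is not counted.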

OPTIONS

       -h     Print brief help; includes version number and summary of all options, including expert options.

       -m     Use  the  Mathews relaxed accuracy rule (see above), instead of requiring exact prediction of base
              pairs.

       -p     Count pseudoknotted base pairs towards the accuracy, in either trusted or predicted structures. By
              default, pseudoknots are ignored.

              Normally, only the trusted_file  would  have  pseudoknot  annotation,  since  most  RNA  secondary
              structure  prediction  programs  do  not  predict  pseudoknots.  Using the -p option allows you to
              penalize the prediction program for not predicting known pseudoknots. In a  case  where  both  the
              trusted_file  and  the  test_file  have  pseudoknot  annotation,   the  -p  option  lets you count
              pseudoknots in evaluating the prediction accuracy. Beware, however, of the case where you use a
              pseudoknot-capable prediction program to generate the test_file, but the trusted_file does not
              have pseudoknot annotation. In this case, -p will penalize any predicted pseudoknots when it
              calculates specificity, even if they are right, because they do not appear in the trusted
              annotation. This is probably not what you want.
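
       The specificity caveat for -p can be made concrete with a small worked example. The pairs below
       are invented for illustration, the helper ppv is hypothetical, and the arithmetic follows the
       exact-match definition given in DESCRIPTION rather than the program's source.

```python
def ppv(trusted, predicted):
    """Fraction of predicted pairs that also appear in the trusted set."""
    predicted = set(predicted)
    if not predicted:
        return 0.0   # assumption: nothing predicted scores 0.0
    return len(set(trusted) & predicted) / len(predicted)


# The trusted annotation has no pseudoknot markup.
trusted_nested = {(1, 20), (2, 19), (3, 18)}

# The prediction gets the nested helix right and also predicts
# two pseudoknotted pairs (hypothetical positions).
predicted_nested = {(1, 20), (2, 19), (3, 18)}
predicted_knot = {(8, 30), (9, 29)}

# Default behavior: pseudoknotted pairs are ignored on both sides.
ppv_default = ppv(trusted_nested, predicted_nested)

# With -p: predicted pseudoknot pairs are counted, but none of them
# can match because the trusted file does not annotate them.
ppv_with_p = ppv(trusted_nested, predicted_nested | predicted_knot)
```

       With these numbers the specificity falls from 1.0 to 0.6 once the pseudoknotted pairs are counted,
       even if those pairs are biologically correct.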

EXPERT OPTIONS

       --informat <s>
              Specify that the two sequence files are in format <s>. In this case, both files  must  be  in  the
              same  format. The default is to autodetect the file formats, in which case they could be different
              (one SELEX, one Stockholm).

       --quiet
              Don't print any verbose header information.

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL). See COPYING in the source code
       distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu

Biosquid 1.9g                                     January 2003                                     compstruct(1)
