Provided by: biosquid_1.9g+cvs20050121-11_amd64

NAME

       compstruct - calculate accuracy of RNA secondary structure predictions

SYNOPSIS

       compstruct [options] trusted_file test_file

DESCRIPTION

       compstruct evaluates the accuracy of RNA secondary structure predictions on a per-base-pair
       basis.  The trusted_file contains one or more sequences with  trusted  (known)  RNA  secondary  structure
       annotation.  The  test_file  contains the same sequences, in the same order, with predicted RNA secondary
       structure annotation.  compstruct reads the structures, compares them, and calculates both the
       sensitivity (the fraction of true base pairs that are correctly predicted) and the specificity (positive
       predictive value, the fraction of predicted base pairs that are true).  Results are reported for each
       individual sequence, and in summary for all sequences together.

       Both files must contain secondary structure annotation in WUSS notation. Only SELEX and Stockholm formats
       support structure markup at present.
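
       As an illustration of what reading WUSS annotation involves, the following is a minimal Python sketch,
       not compstruct's actual parser. It assumes the usual WUSS conventions: nested base pairs are written
       with matching brackets (<>, (), [], {}), pseudoknotted pairs with an uppercase letter on the 5' side
       and the same lowercase letter on the 3' side, and all other characters mark unpaired positions. The
       function name wuss_to_pairs and its include_pseudoknots parameter are hypothetical, and requiring each
       closing bracket to match the type of its opener is a choice made for this sketch.

```python
# Hypothetical helper: convert a WUSS structure string into 1-based base pairs.
OPENERS = {"<": ">", "(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())


def wuss_to_pairs(structure, include_pseudoknots=False):
    """Return a sorted list of (i, j) pairs, i < j, positions counted from 1."""
    stack = []        # open brackets waiting for their partner
    knot_stacks = {}  # one stack per pseudoknot letter (assumed convention)
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch in OPENERS:
            stack.append((pos, OPENERS[ch]))
        elif ch in CLOSERS:
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            open_pos, expected = stack.pop()
            if ch != expected:
                raise ValueError(f"mismatched bracket at position {pos}")
            pairs.append((open_pos, pos))
        elif ch.isalpha():
            if not include_pseudoknots:
                continue  # treat pseudoknotted positions as unpaired
            if ch.isupper():
                knot_stacks.setdefault(ch, []).append(pos)
            else:
                opens = knot_stacks.get(ch.upper())
                if not opens:
                    raise ValueError(f"unmatched {ch!r} at position {pos}")
                pairs.append((opens.pop(), pos))
        # any other character (e.g. '.', '_', '-', ',', ':') is unpaired
    if stack or any(knot_stacks.values()):
        raise ValueError("structure has unclosed base pairs")
    return sorted(pairs)


nested_only = wuss_to_pairs("<<AA>>.aa")
with_knots = wuss_to_pairs("<<AA>>.aa", include_pseudoknots=True)
print(nested_only)
print(with_knots)
```

       Leaving pseudoknot letters out by default mirrors the behavior described under the -p option, where
       pseudoknots are ignored unless requested.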

       The default definition of a correctly predicted base pair is that a true pair (i,j) must exactly match  a
       predicted pair (i,j).
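
       Under this exact-match rule, the two statistics reduce to set intersections. The sketch below shows
       the arithmetic for a single sequence; it is an illustration of the definitions given above, not
       compstruct's source code, and the function name exact_accuracy and the example pairs are made up.

```python
def exact_accuracy(trusted_pairs, predicted_pairs):
    """Return (sensitivity, ppv) using exact (i, j) matching."""
    trusted = set(trusted_pairs)
    predicted = set(predicted_pairs)
    correct = trusted & predicted  # pairs present in both structures
    # Guard against empty structures to avoid division by zero
    # (returning 0.0 here is a choice made for this sketch).
    sensitivity = len(correct) / len(trusted) if trusted else 0.0
    ppv = len(correct) / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Invented example: four true pairs, three predicted, two of them exact.
trusted = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted = [(1, 20), (2, 19), (3, 17)]
sens, ppv = exact_accuracy(trusted, predicted)
print(f"sensitivity = {sens:.2%}, specificity (PPV) = {ppv:.2%}")
```

       Here the predicted pair (3,17) is off by one position from the true pair (3,18), so it counts as
       neither a correct prediction nor a recovered true pair.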

       Mathews,  Zuker,  Turner  and  colleagues (see: Mathews et al., JMB 288:911-940, 1999) use a more relaxed
       definition. Mathews defines "correct" as follows: a true pair (i,j) is correctly predicted if any of  the
       following  pairs  are  predicted:  (i,j),  (i+1,j),  (i-1,j),  (i,j+1),  or (i,j-1). This rule allows for
       "slipped helices" off by one base.  The -m option activates this rule for both sensitivity and
       specificity.  For specificity, the rule is reversed: a predicted pair (i,j) is considered to be true if the
       true structure contains one of the five pairs (i,j), (i+1,j), (i-1,j), (i,j+1), or (i,j-1).
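
       The relaxed rule can be sketched as follows. This is a minimal Python illustration of the rule as
       stated above, not the code compstruct runs; the names slipped_variants and relaxed_accuracy and the
       example pairs are hypothetical.

```python
def slipped_variants(i, j):
    """The five pairs accepted for (i, j) under the relaxed rule."""
    return {(i, j), (i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)}


def relaxed_accuracy(trusted_pairs, predicted_pairs):
    """Return (sensitivity, ppv) allowing helices slipped by one base."""
    trusted = set(trusted_pairs)
    predicted = set(predicted_pairs)
    # Sensitivity: a true pair counts if any of its variants was predicted.
    found = sum(1 for i, j in trusted if slipped_variants(i, j) & predicted)
    # Specificity: a predicted pair counts if any of its variants is true.
    valid = sum(1 for i, j in predicted if slipped_variants(i, j) & trusted)
    sensitivity = found / len(trusted) if trusted else 0.0
    ppv = valid / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Invented example: (3,17) is a slipped version of (3,18) and (4,17);
# (7,12) is not near any true pair.
trusted = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted = [(1, 20), (2, 19), (3, 17), (7, 12)]
sens, ppv = relaxed_accuracy(trusted, predicted)
print(f"sensitivity = {sens:.2%}, specificity (PPV) = {ppv:.2%}")
```

       Note that a single predicted pair can satisfy more than one true pair under this rule: in the example,
       (3,17) accounts for both (3,18) and (4,17).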

OPTIONS

       -h     Print brief help; includes version number and summary of all options, including expert options.

       -m     Use the Mathews relaxed accuracy rule (see above), instead of requiring exact prediction  of  base
              pairs.

       -p     Count pseudoknotted base pairs towards the accuracy, in either trusted or predicted structures. By
              default, pseudoknots are ignored.

              Normally, only the trusted_file  would  have  pseudoknot  annotation,  since  most  RNA  secondary
              structure  prediction  programs  do  not  predict  pseudoknots.  Using the -p option allows you to
              penalize the prediction program for not predicting known pseudoknots. In a  case  where  both  the
              trusted_file  and  the  test_file  have  pseudoknot  annotation,   the  -p  option  lets you count
              pseudoknots in evaluating the prediction accuracy. Beware, however, of the case where you use a
              pseudoknot-capable prediction program to generate the test_file, but the trusted_file does not
              have pseudoknot annotation. In this case, -p will penalize any predicted pseudoknots when it
              calculates specificity, even if they're right, because they don't appear in the trusted
              annotation. This is probably not what you'd want to do.
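
              The caveat can be made concrete with a small Python sketch. It models the behavior described
              above rather than reproducing compstruct's code; the function name ppv and all pair lists are
              invented for illustration.

```python
def ppv(trusted_pairs, predicted_pairs):
    """Fraction of predicted pairs that appear in the trusted structure."""
    trusted = set(trusted_pairs)
    predicted = set(predicted_pairs)
    return len(trusted & predicted) / len(predicted) if predicted else 0.0


# Trusted annotation has no pseudoknot markup; the prediction does.
trusted_nested = {(1, 6), (2, 5)}
trusted_knots = set()
predicted_nested = {(1, 6), (2, 5)}
predicted_knots = {(3, 9), (4, 8)}

# Default behavior as described: pseudoknotted pairs are ignored.
ppv_default = ppv(trusted_nested, predicted_nested)

# With -p as described: pseudoknotted pairs count on both sides, so the
# predicted pseudoknot pairs are scored as false positives.
ppv_with_p = ppv(trusted_nested | trusted_knots,
                 predicted_nested | predicted_knots)

print(f"specificity without -p: {ppv_default:.2f}")
print(f"specificity with -p:    {ppv_with_p:.2f}")
```

              In this invented case the nested pairs are predicted perfectly, yet counting pseudoknots halves
              the specificity because the trusted structure has nothing to match them against.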

EXPERT OPTIONS

       --informat <s>
              Specify that the two sequence files are in format <s>. In this case, both files  must  be  in  the
              same  format. The default is to autodetect the file formats, in which case they could be different
              (one SELEX, one Stockholm).

       --quiet
              Don't print any verbose header information.

SEE ALSO

       afetch(1),  alistat(1),  compalign(1),  revcomp(1),  seqsplit(1),  seqstat(1),   sfetch(1),   shuffle(1),
       sindex(1), sreformat(1), stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL). See COPYING in the source code distribution
       for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu