Provided by: biosquid_1.9g+cvs20050121-12_amd64

NAME

       compstruct - calculate accuracy of RNA secondary structure predictions

SYNOPSIS

       compstruct [options] trusted_file test_file

DESCRIPTION

       compstruct evaluates the accuracy of RNA secondary structure predictions on a per-base-pair
       basis.  The trusted_file contains one or more sequences with trusted (known) RNA
       secondary  structure  annotation.  The  test_file contains the same sequences, in the same
       order, with predicted RNA secondary structure annotation.  compstruct reads the structures
       and compares them, and calculates both the sensitivity (the fraction of true base pairs that
       are correctly predicted) and the specificity (positive predictive value, the fraction of
       predicted base pairs that are true).  Results are reported for each individual sequence,
       and in summary for all sequences together.

       Both files must contain secondary structure annotation in WUSS notation.  Only  SELEX  and
       Stockholm formats support structure markup at present.
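
       As an illustration of how structure annotation maps to base pairs, here is a minimal
       Python sketch. It is not compstruct's own parser. It assumes the usual WUSS conventions:
       matching <>, (), [] and {} mark nested pairs, upper/lower case letter pairs such as A...a
       mark pseudoknots, and every other character is unpaired. The function name wuss_to_pairs
       and its include_pseudoknots parameter are hypothetical.

```python
def wuss_to_pairs(structure, include_pseudoknots=False):
    """Return the set of 1-based base pairs (i, j), i < j, in a WUSS string.

    Assumption: brackets mark nested pairs; an uppercase letter opens a
    pseudoknot pair and the same letter in lowercase closes it.
    """
    closer_to_opener = {">": "<", ")": "(", "]": "[", "}": "{"}
    openers = set(closer_to_opener.values())
    stacks = {}  # one stack of open positions per bracket type or letter
    pairs = set()

    for pos, ch in enumerate(structure, start=1):
        if ch in openers or (ch.isalpha() and ch.isupper()):
            key, is_open = ch, True
        elif ch in closer_to_opener:
            key, is_open = closer_to_opener[ch], False
        elif ch.isalpha() and ch.islower():
            key, is_open = ch.upper(), False
        else:
            continue  # unpaired position (for example '_', '-', ',', ':', '.')

        if key.isalpha() and not include_pseudoknots:
            continue  # skip pseudoknot letters unless asked to count them

        if is_open:
            stacks.setdefault(key, []).append(pos)
        else:
            if not stacks.get(key):
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.add((stacks[key].pop(), pos))

    unclosed = [key for key, stack in stacks.items() if stack]
    if unclosed:
        raise ValueError(f"unclosed symbols: {unclosed}")
    return pairs


if __name__ == "__main__":
    print(sorted(wuss_to_pairs("<<<___>>>")))
    print(sorted(wuss_to_pairs("<<AA>>aa", include_pseudoknots=True)))
```

       Skipping the letter-annotated pairs by default mirrors the documented behaviour that
       pseudoknots are ignored unless -p is given.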

       The  default  definition of a correctly predicted base pair is that a true pair (i,j) must
       exactly match a predicted pair (i,j).
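
       The exact-match calculation can be sketched in Python as follows. This is an illustrative
       sketch, not compstruct's source code: the function name exact_accuracy is hypothetical,
       base pairs are assumed to be (i,j) tuples, and returning 0.0 for an empty structure is an
       assumption made here for convenience.

```python
def exact_accuracy(trusted_pairs, predicted_pairs):
    """Return (sensitivity, ppv) as fractions, using exact (i, j) matching."""
    trusted = set(trusted_pairs)
    predicted = set(predicted_pairs)
    correct = len(trusted & predicted)  # pairs present in both structures

    # Assumption: report 0.0 when a structure has no pairs, to avoid dividing by zero.
    sensitivity = correct / len(trusted) if trusted else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    trusted = [(1, 20), (2, 19), (3, 18), (4, 17)]
    predicted = [(1, 20), (2, 19), (5, 16)]
    sens, ppv = exact_accuracy(trusted, predicted)
    print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

       In this toy case two of the four trusted pairs are recovered, and two of the three
       predicted pairs are true.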

       Mathews, Zuker, Turner and colleagues (see: Mathews et al., JMB 288:911-940, 1999)  use  a
       more  relaxed  definition.  Mathews  defines  "correct"  as  follows: a true pair (i,j) is
       correctly predicted if any of the following pairs are predicted: (i,j), (i+1,j),  (i-1,j),
       (i,j+1),  or  (i,j-1).  This  rule  allows  for "slipped helices" off by one base.  The -m
       option activates this rule for both sensitivity and for specificity. For specificity,  the
       rule  is  reversed:  predicted  pair  (i,j) is considered to be true if the true structure
       contains one of the five pairs (i,j), (i+1,j), (i-1,j), (i,j+1), or (i,j-1).
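
       The relaxed rule can be sketched in Python as below. This is a hedged illustration of
       the rule as described above, not the program's implementation. The names slipped_variants,
       count_matched and accuracy are hypothetical, and pairs are assumed to be (i,j) tuples.

```python
def slipped_variants(i, j):
    """The five pairs that count as a match for (i, j) under the relaxed rule."""
    return {(i, j), (i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)}


def count_matched(query_pairs, reference_pairs, relaxed):
    """Count query pairs that have a match (exact or relaxed) in reference_pairs."""
    reference = set(reference_pairs)
    matched = 0
    for i, j in query_pairs:
        candidates = slipped_variants(i, j) if relaxed else {(i, j)}
        if candidates & reference:
            matched += 1
    return matched


def accuracy(trusted_pairs, predicted_pairs, relaxed=False):
    """Return (sensitivity, ppv); relaxed=True applies the rule in both directions."""
    trusted, predicted = set(trusted_pairs), set(predicted_pairs)
    # Sensitivity: each true pair looks for a match among the predicted pairs.
    true_found = count_matched(trusted, predicted, relaxed)
    # Specificity (PPV): the rule is reversed, each predicted pair looks in the true set.
    predicted_ok = count_matched(predicted, trusted, relaxed)
    sensitivity = true_found / len(trusted) if trusted else 0.0
    ppv = predicted_ok / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    trusted = {(1, 20), (2, 19), (3, 18)}
    slipped = {(1, 19), (2, 18), (3, 17)}  # same helix, shifted by one base on one strand
    print("exact  :", accuracy(trusted, slipped, relaxed=False))
    print("relaxed:", accuracy(trusted, slipped, relaxed=True))
```

       The example helix is slipped by one base, so it scores zero under exact matching but is
       fully credited under the relaxed rule.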

OPTIONS

       -h     Print brief help; includes version number and summary  of  all  options,  including
              expert options.

       -m     Use  the  Mathews  relaxed  accuracy  rule  (see above), instead of requiring exact
              prediction of base pairs.

       -p     Count pseudoknotted base pairs towards the accuracy, in either trusted or predicted
              structures. By default, pseudoknots are ignored.

              Normally,  only  the  trusted_file would have pseudoknot annotation, since most RNA
              secondary structure prediction programs do not predict pseudoknots.  Using  the  -p
              option  allows  you  to  penalize  the  prediction program for not predicting known
              pseudoknots. In  a  case  where  both  the  trusted_file  and  the  test_file  have
              pseudoknot  annotation,  the -p option lets you count pseudoknots in evaluating the
              prediction accuracy. Beware, however, of the case where you use a pseudoknot-capable
              prediction program to generate the test_file, but the trusted_file does not have
              pseudoknot annotation. In this case, -p will penalize any predicted pseudoknots
              when it calculates specificity, even if they are right, because they do not appear
              in the trusted annotation. This is probably not what you want.
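
              The bookkeeping behind these cases can be illustrated with a toy Python sketch.
              It models only the counting described above, with exact matching, and is not
              compstruct's code. The function name score and all pair coordinates are made up
              for the example.

```python
def score(trusted_pairs, predicted_pairs):
    """Return (sensitivity, ppv) with exact matching on (i, j) pairs."""
    trusted, predicted = set(trusted_pairs), set(predicted_pairs)
    correct = len(trusted & predicted)
    sensitivity = correct / len(trusted) if trusted else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical structures: two nested pairs plus two pseudoknotted pairs.
nested = {(1, 10), (2, 9)}
knot = {(5, 14), (6, 13)}

# Case 1: the trusted structure has a pseudoknot, the prediction has none.
# Ignoring pseudoknots (the default) scores only the nested pairs.
case1_default = score(nested, nested)
# Counting them (as with -p) lowers sensitivity for the missed pseudoknot.
case1_counted = score(nested | knot, nested)

# Case 2: the prediction has a pseudoknot, the trusted annotation does not.
# Counting pseudoknots lowers specificity even if the predicted knot is real.
case2_default = score(nested, nested)
case2_counted = score(nested, nested | knot)

if __name__ == "__main__":
    print("trusted knot only,   default:", case1_default, " counted:", case1_counted)
    print("predicted knot only, default:", case2_default, " counted:", case2_counted)
```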

EXPERT OPTIONS

       --informat <s>
              Specify that the two sequence files are in format <s>. In  this  case,  both  files
              must be in the same format. The default is to autodetect the file formats, in which
              case they could be different (one SELEX, one Stockholm).

       --quiet
              Don't print any verbose header information.

SEE ALSO

       afetch(1),  alistat(1),  compalign(1),  revcomp(1),  seqsplit(1),  seqstat(1),  sfetch(1),
       shuffle(1), sindex(1), sreformat(1), stranslate(1), weight(1).

AUTHOR

       Biosquid and its documentation are Copyright (C) 1992-2003 HHMI/Washington University
       School of Medicine.  Freely distributed under the GNU General Public License (GPL).  See
       COPYING in the source code distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu