Provided by: vienna-rna_2.5.1+dfsg-1_amd64

NAME

       RNAsubopt - manual page for RNAsubopt 2.5.1

SYNOPSIS

       RNAsubopt [OPTION]...

DESCRIPTION

       RNAsubopt 2.5.1

       calculate suboptimal secondary structures of RNAs

       Reads  RNA  sequences  from  stdin  and (in the default -e mode) calculates all suboptimal
       secondary structures within a user defined energy range  above  the  minimum  free  energy
       (mfe).  It prints the suboptimal structures in dot-bracket notation followed by the energy
       in kcal/mol to stdout. Be careful, the number of structures returned  grows  exponentially
       with both sequence length and energy range.
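
       A minimal Python sketch of how the output described above might be post-processed. It
       assumes that each structure line consists of a dot-bracket string followed by whitespace
       and an energy; the function name parse_subopt_lines and the sample lines are
       hypothetical and are not real program output.

```python
import re

# A structure line: dot-bracket string, whitespace, then a signed decimal energy.
_STRUCT_LINE = re.compile(r"^([().]+)\s+(-?\d+(?:\.\d+)?)\s*$")


def parse_subopt_lines(lines):
    """Return (structure, energy) tuples; lines that do not match are skipped."""
    results = []
    for line in lines:
        match = _STRUCT_LINE.match(line.strip())
        if match:
            results.append((match.group(1), float(match.group(2))))
    return results


# Hypothetical sample input (assumption, not captured from RNAsubopt).
sample = [
    "GGGAAACCC",          # sequence line, ignored by the parser
    "(((...)))  -1.20",
    ".........   0.00",
]
parsed = parse_subopt_lines(sample)
```

       Because the number of structures can grow very quickly, a real script would probably
       read the lines lazily from a pipe rather than hold them all in a list.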

       Alternatively, when used with the -p option, RNAsubopt produces Boltzmann weighted samples
       of secondary structures.

       -h, --help
              Print help and exit

       --detailed-help
              Print help, including all details and hidden options, and exit

       --full-help
              Print help, including hidden options, and exit

       -V, --version
              Print version and exit

   General Options:
              Command line options which alter the general behavior of this program

       -v, --verbose
              Be verbose.  (default=off)

       --noconv
              Do not automatically substitute nucleotide "T" with "U".  (default=off)

       -i, --infile=<filename>
              Read a file instead of reading from stdin.

              The default behavior of RNAsubopt is to read input from stdin. Using this parameter
              the user can specify an input file name where data is read from.

       -o, --outfile[=<filename>]
              Print output to file instead of stdout.

              This option may be used to write all output to output files rather than printing to
              stdout. The default filename is "RNAsubopt_output.sub" if no FASTA header  precedes
              the  input sequences and the --auto-id feature is inactive. Otherwise, output files
              with the scheme "prefix.sub" are generated, where the "prefix" is  taken  from  the
              sequence  id. The user may specify a single output file name for all data generated
              from the input by supplying an optional string as argument to  this  parameter.  In
              case  a  file with the same filename already exists, any output of the program will
              be appended to it. Note: Any special characters in the filename will be replaced by
              the  filename  delimiter,  hence  there  is no way to pass an entire directory path
              through this option yet. (See also the "--filename-delim" parameter)

       --auto-id
              Automatically generate an ID for each sequence.  (default=off)

              The default mode of RNAsubopt is to automatically determine an ID  from  the  input
              sequence  data if the input file format allows to do that. Sequence IDs are usually
              given in the FASTA header of input sequences. If this  flag  is  active,  RNAsubopt
              ignores any IDs retrieved from the input and automatically generates an ID for each
              sequence. This ID consists of a prefix and an increasing number. This flag can also
              be used to add a FASTA header to the output even if the input has none.

       --id-prefix=prefix
              Prefix   for   automatically   generated  IDs  (as  used  in  output  file  names).
              (default=`sequence')

              If this parameter is set, each sequence's  FASTA  id  will  be  prefixed  with  the
              provided  string.  FASTA  ids  then  take the form ">prefix_xxxx" where xxxx is the
              sequence number. Note: Setting this parameter implies --auto-id.

       --id-delim=STRING
              Change the  delimiter  between  prefix  and  increasing  number  for  automatically
              generated IDs (as used in output file names).  (default=`_')

              This  parameter  can be used to change the default delimiter "_" between the prefix
              string and the increasing number for automatically generated IDs.

       --id-digits=INT
              Specify the number of digits of the counter in  automatically  generated  sequence
              IDs.  (default=`4')

              When sequence IDs are automatically generated, they receive an increasing number,
              starting with 1. This number is always left-padded with leading zeros so that it
              takes up a fixed width. This parameter sets that width to suit the user's needs.
              Values in the range [1:18] are allowed. This option implies --auto-id.

       --id-start=LONG
              Specify the first number in automatically generated sequence IDs.  (default=`1')

              When sequence IDs are automatically generated, they receive an increasing number,
              usually starting with 1. Using this parameter, the first number can be set to
              suit the user's requirements. Note: negative numbers are not allowed.  Note:
              Setting this parameter causes any IDs retrieved from the input data to be ignored,
              i.e. it activates the --auto-id flag.
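
              The ID scheme described by the --id-prefix, --id-delim, --id-digits and --id-start
              options can be sketched in Python as follows. This is only an illustration of the
              documented format (prefix, delimiter, zero-padded counter); the function names
              make_sequence_id and generate_ids are hypothetical.

```python
def make_sequence_id(number, prefix="sequence", delim="_", digits=4):
    """Build an ID such as 'sequence_0001' from prefix, delimiter and padded counter."""
    if number < 0:
        raise ValueError("negative numbers are not allowed")
    if not 1 <= digits <= 18:
        raise ValueError("digits must be in the range [1:18]")
    # The counter is left-padded with zeros to the requested width.
    return f"{prefix}{delim}{number:0{digits}d}"


def generate_ids(count, start=1, **kwargs):
    """Generate `count` consecutive IDs, beginning with the counter value `start`."""
    return [make_sequence_id(start + i, **kwargs) for i in range(count)]
```

              For example, generate_ids(2, start=9, prefix="rna", delim="-", digits=2) yields
              the IDs "rna-09" and "rna-10" in this sketch.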

       --filename-delim=STRING
              Change   the   delimiting   character   that   is  used  for  sanitized  filenames.
              (default=`ID-delimiter')

              This parameter can be used to change the delimiting character used while sanitizing
              filenames, i.e. replacing invalid characters. Note that the default delimiter is
              ALWAYS the first character of the "ID delimiter" as supplied through the
              --id-delim option. If the delimiter is a whitespace character or empty, invalid
              characters are simply removed rather than substituted. Currently, the following
              characters are regarded as illegal for use in filenames: backslash '\', slash '/',
              question mark '?', percent sign '%', asterisk '*', colon ':', pipe symbol '|',
              double quote '"', angle brackets '<' and '>'.

       --filename-full
              Use full FASTA header to create filenames.  (default=off)

              This  parameter  can  be used to deactivate the default behavior of limiting output
              filenames to the first word of the sequence ID. Consider the following example:  An
              input  with  FASTA header ">NM_0001 Homo Sapiens some gene" usually produces output
              files with the prefix "NM_0001" without the additional data available in the  FASTA
              header,  e.g.  "NM_0001.sub".  With  this  flag  set,  no  truncation of the output
              filenames is performed, i.e.  output filenames receive the full FASTA  header  data
              as prefixes. Note, however, that invalid characters (such as whitespace) will be
              substituted by a delimiting character or simply removed (see also the
              --filename-delim option).
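
              A rough Python sketch of the filename handling described for --filename-delim and
              --filename-full. It is an approximation based only on the text above, not the
              program's actual code: the helper name output_prefix is hypothetical, and
              treating every whitespace character as invalid is an assumption.

```python
# Characters listed above as illegal in filenames.
ILLEGAL = set('\\/?%*:|"<>')


def output_prefix(header, delim="_", full=False):
    """Derive an output filename prefix from a FASTA header line."""
    name = header.lstrip(">")
    if not full:
        # Default behavior: keep only the first word of the header.
        parts = name.split()
        name = parts[0] if parts else ""
    # An empty or whitespace delimiter means invalid characters are dropped.
    remove = delim == "" or delim.isspace()
    replacement = "" if remove else delim[0]
    return "".join(
        replacement if (ch in ILLEGAL or ch.isspace()) else ch  # assumption: whitespace is invalid
        for ch in name
    )


header = ">NM_0001 Homo Sapiens some gene"
short_name = output_prefix(header) + ".sub"
full_name = output_prefix(header, full=True) + ".sub"
```

              In this sketch short_name is "NM_0001.sub", matching the example above, while
              full_name keeps the rest of the header with whitespace replaced by the delimiter.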

   Structure Constraints:
              Command  line  options  to  interact with the structure constraints feature of this
              program

       --maxBPspan=INT
              Set the maximum base pair span.  (default=`-1')

       -C, --constraint[=<filename>]
              Apply structural constraint(s) during prediction.  (default=`')

              The program first reads the sequence(s), then a dot-bracket like string  containing
              constraints on the structure. The following symbols are recognized:

              '.' ... no constraint for this base

              'x' ... the base is unpaired

              '<' ... the base pairs downstream, i.e. i is paired with j > i

              '>' ... the base pairs upstream, i.e. i is paired with j < i

              '|' ... the corresponding base has to be paired

              '()' ... base i pairs with base j

              Due to historic behavior of this program, all pairing constraints will only
              disallow pairs that conflict with the constraint. This is usually sufficient to
              enforce the constraint, but occasionally a base may stay unpaired in spite of
              constraints. Use the --enforceConstraint option to really exclude unpaired states.
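
              As an illustration of the constraint symbols listed above, the following Python
              sketch checks that a constraint string uses only the recognized symbols and
              extracts the base pairs given by round brackets. The function name
              constraint_pairs is hypothetical, and the check is a simplification; it does not
              reproduce how the program itself parses or applies constraints.

```python
ALLOWED = set(".x<>|()")


def constraint_pairs(constraint):
    """Return the (i, j) pairs given by '(' and ')' using 1-based positions."""
    stack, pairs = [], []
    for pos, symbol in enumerate(constraint, start=1):
        if symbol not in ALLOWED:
            raise ValueError(f"unrecognized symbol {symbol!r} at position {pos}")
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Bases 1-9 and 2-8 must pair, base 5 stays unpaired, base 11 must pair with something.
pairs = constraint_pairs("((..x..)).|.")
```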

       --batch
              Use constraints for multiple sequences.  (default=off)

              Usually,  constraints  provided  from  input  file  only  apply  to  a single input
              sequence. Therefore, RNAsubopt will stop its computation and quit after  the  first
              input sequence was processed. Using this switch, RNAsubopt processes multiple input
              sequences and applies the same provided constraints to each of them.

       --canonicalBPonly
              Remove non-canonical base pairs from the structure constraint.  (default=off)

       --enforceConstraint
              Enforce  base  pairs  given  by  round  brackets  (  )  in  structure   constraint.
              (default=off)

       --shape=<filename>
              Use  SHAPE  reactivity  data  in  the  folding  recursions (does not work for Zuker
              suboptimals).

       --shapeMethod=STRING
              Specify the method used to convert SHAPE reactivity data to pseudo energy
              contributions.  (default=`D')

              The  following methods can be used to convert SHAPE reactivities into pseudo energy
              contributions.

              'D': Convert by using a linear  equation  according  to  Deigan  et  al  2009.  The
              calculated  pseudo  energies  will  be  applied  for every nucleotide involved in a
              stacked pair. This method is recognized by a capital 'D' in the provided parameter,
              i.e.: --shapeMethod="D" is the default setting. The slope 'm' and the intercept 'b'
              can be set to a non-default value if necessary,  otherwise  m=1.8  and  b=-0.6.  To
              alter  these  parameters,  e.g. m=1.9 and b=-0.7, use a parameter string like this:
              --shapeMethod="Dm1.9b-0.7". You may also provide only one  of  the  two  parameters
              like: --shapeMethod="Dm1.9" or --shapeMethod="Db-0.7".

              'Z':  Convert SHAPE reactivities to pseudo energies according to Zarringhalam et al
              2012. SHAPE reactivities will be converted to pairing probabilities by using linear
              mapping. Deviation from the observed pairing probabilities will be penalized
              during the folding recursion. The magnitude of the penalties can be affected by
              adjusting the factor beta (e.g. --shapeMethod="Zb0.8").

              'W':  Apply  a  given  vector  of  perturbation  energies  to  unpaired nucleotides
              according to Washietl et al 2012. Perturbation vectors can be calculated  by  using
              RNApvmin.
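
              For the 'D' method, the following Python sketch shows how a parameter string such
              as "Dm1.9b-0.7" could be split into slope and intercept, and how a pseudo energy
              follows from a reactivity using the relation published by Deigan et al.,
              m * ln(reactivity + 1) + b. The function names are hypothetical, and the handling
              of missing or negative reactivities inside the program is not covered here.

```python
import math
import re

_NUMBER = r"-?\d+(?:\.\d+)?"
_DEIGAN = re.compile(rf"D(?:m({_NUMBER}))?(?:b({_NUMBER}))?")


def parse_deigan(method="D"):
    """Parse 'D', 'Dm1.9', 'Db-0.7' or 'Dm1.9b-0.7' into (slope m, intercept b)."""
    match = _DEIGAN.fullmatch(method)
    if match is None:
        raise ValueError(f"not a Deigan method string: {method!r}")
    m = float(match.group(1)) if match.group(1) else 1.8    # documented default slope
    b = float(match.group(2)) if match.group(2) else -0.6   # documented default intercept
    return m, b


def deigan_pseudo_energy(reactivity, m=1.8, b=-0.6):
    """Pseudo energy in kcal/mol; assumes a non-negative reactivity value."""
    return m * math.log(reactivity + 1.0) + b
```

              With the default parameters, a reactivity of 0 gives a pseudo energy equal to the
              intercept b, and larger reactivities give increasingly positive penalties.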

       --shapeConversion=STRING
              Specify the method used to convert SHAPE reactivities to pairing probabilities when
              using the SHAPE approach of Zarringhalam et al.  (default=`O')

              The  following  methods  can  be  used  to  convert  SHAPE  reactivities  into  the
              probability for a certain nucleotide to be unpaired.

              'M': Use linear mapping according to Zarringhalam et al.

              'C':  Use  a  cutoff-approach  to divide into paired and unpaired nucleotides (e.g.
              "C0.25")

              'S':  Skip  the  normalizing  step  since  the  input   data   already   represents
              probabilities for being unpaired rather than raw reactivity values

              'L':  Use  a  linear  model  to convert the reactivity into a probability for being
              unpaired (e.g. "Ls0.68i0.2" to use a slope of 0.68 and an intercept of 0.2)

              'O': Use a linear model to convert the log of the reactivity into a probability for
              being unpaired (e.g. "Os1.6i-2.29" to use a slope of 1.6 and an intercept of -2.29)

       --commands=<filename>
              Read additional commands from file

              Commands include hard and soft constraints, but also structure motifs in hairpin
              and interior loops that need to be treated differently. Furthermore, commands can
              be set for unstructured and structured domains.

   Algorithms:
              Select the algorithms which should be applied to the given RNA sequence.

       -e, --deltaEnergy=range
              Compute  suboptimal  structures  with  energy  in  a  certain  range of the optimum
              (kcal/mol).

              Default is calculation of mfe structure only.

       --deltaEnergyPost=range
              Only print structures with energy within range of the mfe after  post  reevaluation
              of energies.

              Useful in conjunction with --logML, -d1 or -d3: while the -e option specifies the
              range before energies are re-evaluated, this option specifies the maximum energy
              after re-evaluation.

       -s, --sorted
              Sort the suboptimal structures by energy and lexicographical order.  (default=off)

              Structures are first sorted by energy in ascending order. Within groups of the same
              energy, structures are then sorted in ascending lexicographical order of their
              dot-bracket notation. See the --en-only flag to deactivate this second step. Note
              that sorting is done in memory, so it can easily lead to exhaustion of RAM! This
              is especially true if the number of structures produced becomes large or the RNA
              sequence is rather long. In such cases it is better to use an external sort
              method, such as UNIX "sort".

       --en-only
              Only sort structures by free energy.  (default=off)

              In combination with --sorted, this flag deactivates the second sorting criterion and
              sorts structures solely by their free energy instead  of  additionally  sorting  by
              lexicographic  order  in  each  energy  band.  This might save some time during the
              sorting process in situations where lexicographic order is not required.
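
              The two sort orders can be illustrated with a short Python sketch. It operates on
              a hypothetical in-memory list of (structure, energy) tuples; the function name
              sort_structures is an assumption, and the order of ties in the energy-only case
              is simply whatever Python's stable sort preserves, which need not match the
              program.

```python
def sort_structures(results, en_only=False):
    """Sort (structure, energy) tuples by energy, then optionally by dot-bracket string."""
    if en_only:
        # Energy only; ties keep their input order because sorted() is stable.
        return sorted(results, key=lambda item: item[1])
    # Energy first, then lexicographic order of the structure string.
    return sorted(results, key=lambda item: (item[1], item[0]))


# Hypothetical structures and energies for illustration.
results = [("..((..))", -1.0), ("........", 0.0), ("((....))", -1.0)]
by_both = sort_structures(results)
by_energy = sort_structures(results, en_only=True)
```

              Here '(' sorts before '.' in byte order, so "((....))" precedes "..((..))"
              within the same energy band when both criteria are used.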

       -p, --stochBT=number
              Randomly draw structures according to their probability in the Boltzmann ensemble.

              Instead of producing all suboptimals in an energy range, produce a random sample of
              suboptimal  structures,  drawn  with probabilities equal to their Boltzmann weights
              via stochastic backtracking in the partition function. The -e and  -p  options  are
              mutually exclusive.
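
              The idea of drawing structures with probabilities equal to their Boltzmann weights
              can be sketched in Python on a small, explicitly listed ensemble. This is only a
              conceptual illustration: the program samples via stochastic backtracking in the
              partition function and does not enumerate structures. The function names, the
              example ensemble, and the gas constant value of roughly 1.98717e-3 kcal/(mol*K)
              are assumptions of this sketch.

```python
import math
import random

GAS_CONSTANT = 1.98717e-3  # kcal/(mol*K), approximate value (assumption)


def boltzmann_probabilities(energies, temperature_c=37.0):
    """Normalized Boltzmann weights exp(-E/kT) for a list of energies in kcal/mol."""
    kT = GAS_CONSTANT * (temperature_c + 273.15)
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def sample_structures(ensemble, n, seed=None):
    """Draw n structures from (structure, energy) tuples according to their weights."""
    rng = random.Random(seed)
    structures = [s for s, _ in ensemble]
    probs = boltzmann_probabilities([e for _, e in ensemble])
    return rng.choices(structures, weights=probs, k=n)


# Hypothetical toy ensemble.
ensemble = [("(((...)))", -1.2), ("((.....))", -0.5), (".........", 0.0)]
draws = sample_structures(ensemble, 5, seed=1)
```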

       --stochBT_en=number
              Same  as  "--stochBT"  but  also  print  free  energies  and  probabilities  of the
              backtraced structures.

       -N, --nonRedundant
              Enable non-redundant sampling strategy.  (default=off)

       -S, --pfScale=DOUBLE
              Set scaling factor for Boltzmann factors to prevent under/overflows.

              In the calculation of the partition function (pf), use scale*mfe as an estimate for
              the ensemble free energy (used to avoid overflows). The default is 1.07, useful
              values are 1.0 to 1.2. Occasionally needed for long sequences.  You can also
              recompile the program to use double precision (see the README file).

       -c, --circ
              Assume a circular (instead of linear) RNA molecule.  (default=off)

       -D, --dos
              Compute density of states instead of secondary structures.  (default=off)

              This option enables the evaluation of the number of secondary structures in certain
              energy bands around the MFE.
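
              Conceptually, a density of states is a histogram of structure counts per energy
              band. The Python sketch below builds such a histogram from an explicit list of
              energies; the function name density_of_states and the band width of 0.1 kcal/mol
              are assumptions, and the program computes these counts without listing every
              structure.

```python
from collections import Counter


def density_of_states(energies, band_width=0.1):
    """Count structures per energy band, indexed by band number above the minimum."""
    mfe = min(energies)
    # Band k collects energies about k * band_width above the minimum.
    counts = Counter(round((e - mfe) / band_width) for e in energies)
    return dict(sorted(counts.items()))


# Hypothetical energies in kcal/mol.
dos = density_of_states([-3.0, -2.9, -2.9, -2.5])
```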

       -z, --zuker
              Compute Zuker suboptimals instead of all suboptimal  structures  within  an  energy
              band around the MFE.  (default=off)

       -g, --gquad
              Incorporate G-Quadruplex formation.  (default=off)

              G-quadruplex prediction is not yet supported for stochastic backtracking and
              Zuker-style suboptimals.

   Model Details:
       -T, --temp=DOUBLE
              Rescale energy parameters to a temperature in degrees centigrade.  (default=`37.0')

       -4, --noTetra
              Do not include special tabulated stabilizing energies for tri-, tetra- and hexaloop
              hairpins.  (default=off)

              Mostly for testing.

       -d, --dangles=INT
              Specify  "dangling  end"  model  for  bases  adjacent  to  helices in free ends and
              multi-loops.  (default=`2')

              With -d1 only unpaired bases can participate in at most one dangling end.  With -d2
              this  check is ignored, dangling energies will be added for the bases adjacent to a
              helix on both sides in any case; this is the default for mfe and partition function
              folding  (-p).   The  option  -d0  ignores  dangling  ends  altogether  (mostly for
              debugging).  With -d3 mfe folding will allow coaxial stacking of  adjacent  helices
              in multi-loops. At the moment the implementation will not allow coaxial stacking of
              the two interior pairs in a loop of degree 3 and works only for mfe folding.

              Note that with -d1 and -d3 only the MFE computations will  be  using  this  setting
              while  partition  function  uses  -d2  setting,  i.e. dangling ends will be treated
              differently.

       --noLP Produce structures without lonely pairs (helices of length 1).  (default=off)

              For partition function folding this  only  disallows  pairs  that  can  only  occur
              isolated. Other pairs may still occasionally occur as helices of length 1.

       --noGU Do not allow GU pairs.  (default=off)

       --noClosingGU
              Do not allow GU pairs at the end of helices.  (default=off)

       --logML
              Recompute   energies   of  structures  using  a  logarithmic  energy  function  for
              multi-loops before output.  (default=off)

              This option does not affect structure generation, only the energies that are
              printed out. Since logML lowers energies somewhat, some structures may be missing.

       -P, --paramFile=paramfile
              Read energy parameters from paramfile, instead of using the default parameter set.

              Different  sets  of  energy  parameters  for  RNA  and  DNA  should  accompany your
              distribution.  See the RNAlib documentation for details on the  file  format.  When
              passing the placeholder file name "DNA", DNA parameters are loaded without the need
              to actually specify any input file.

       --nsp=STRING
              Allow other pairs in addition to the usual AU, GC, and GU pairs.

              Its argument is a comma separated list of additionally allowed pairs. If the first
              character is a "-" then AB will imply that AB and BA are allowed pairs, e.g.
              RNAsubopt --nsp=-GA will allow GA and AG pairs. Nonstandard pairs are given 0
              stacking energy.
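
              The argument format can be illustrated with a small Python sketch that expands a
              --nsp style string into the list of allowed pairs. The function name
              parse_nonstandard_pairs is hypothetical, and the sketch follows only the
              description above (a leading "-" makes every listed pair symmetric).

```python
def parse_nonstandard_pairs(argument):
    """Expand a comma separated pair list such as '-GA' into ['GA', 'AG']."""
    symmetric = argument.startswith("-")
    if symmetric:
        argument = argument[1:]
    pairs = []
    for entry in argument.split(","):
        entry = entry.strip()
        if len(entry) != 2:
            raise ValueError(f"expected a two-letter pair, got {entry!r}")
        pairs.append(entry)
        # With a leading '-', AB also allows BA (unless both letters are equal).
        if symmetric and entry[0] != entry[1]:
            pairs.append(entry[::-1])
    return pairs
```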

       --energyModel=INT
              Set energy model.

              Rarely used option to fold sequences from the artificial ABCD... alphabet, where A
              pairs B, C pairs D, etc.  Use the energy parameters for GC (--energyModel=1) or AU
              (--energyModel=2) pairs.

       --betaScale=DOUBLE
              Set the scaling of the Boltzmann factors.  (default=`1.')

              The argument provided with this option makes it possible to scale the thermodynamic
              temperature used in the Boltzmann factors independently from the temperature used
              to scale the individual energy contributions of the loop types. The Boltzmann
              factors then become exp(-dG/(kT*betaScale)) where k is the Boltzmann constant, dG
              the free energy contribution of the state and T the absolute temperature.
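
              The formula above can be evaluated directly, as in the following Python sketch.
              The function name boltzmann_factor is hypothetical, and the gas constant value of
              roughly 1.98717e-3 kcal/(mol*K) is an assumption used to express kT per mole.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal/(mol*K), approximate value (assumption)


def boltzmann_factor(dG, temperature_c=37.0, beta_scale=1.0):
    """Return exp(-dG / (kT * betaScale)) for a free energy dG in kcal/mol."""
    kT = GAS_CONSTANT * (temperature_c + 273.15)
    return math.exp(-dG / (kT * beta_scale))


unscaled = boltzmann_factor(-1.0)
flattened = boltzmann_factor(-1.0, beta_scale=2.0)
```

              A betaScale above 1 behaves like a higher thermodynamic temperature: in this
              sketch, flattened is the square root of unscaled, so differences between states
              are damped.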

REFERENCES

       If you use this program in your work you might want to cite:

       R. Lorenz, S.H. Bernhart, C. Hoener zu Siederdissen, H. Tafer, C. Flamm, P.F. Stadler  and
       I.L. Hofacker (2011), "ViennaRNA Package 2.0", Algorithms for Molecular Biology: 6:26

       I.L.  Hofacker,  W.  Fontana,  P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994),
       "Fast Folding and Comparison of RNA Secondary Structures", Monatshefte f. Chemie: 125,  pp
       167-188

       R.  Lorenz,  I.L.  Hofacker,  P.F.  Stadler  (2016),  "RNA  folding  with  hard  and  soft
       constraints", Algorithms for Molecular Biology 11:1 pp 1-13

       S. Wuchty, W. Fontana, I. L. Hofacker and P. Schuster (1999), "Complete Suboptimal Folding
       of RNA and the Stability of Secondary Structures", Biopolymers: 49, pp 145-165

       M.  Zuker  (1989),  "On  Finding  All  Suboptimal  Foldings  of  an RNA Molecule", Science
       244.4900, pp 48-52

       Y. Ding, and C.E. Lawrence (2003), "A statistical sampling  algorithm  for  RNA  secondary
       structure prediction", Nucleic Acids Research 31.24, pp 7280-7301

       The energy parameters are taken from:

       D.H.  Mathews,  M.D.  Disney, D. Matthew, J.L. Childs, S.J. Schroeder, J. Susan, M. Zuker,
       D.H. Turner (2004),  "Incorporating  chemical  modification  constraints  into  a  dynamic
       programming  algorithm  for prediction of RNA secondary structure", Proc. Natl. Acad. Sci.
       USA: 101, pp 7287-7292

       D.H. Turner, D.H. Mathews (2009),  "NNDB:  The  nearest  neighbor  parameter  database  for
       predicting  stability of nucleic acid secondary structure", Nucleic Acids Research: 38, pp
       280-282

AUTHOR

       Ivo L Hofacker, Stefan Wuchty, Walter Fontana, Ronny Lorenz

REPORTING BUGS

       If in doubt our program is right,  nature  is  at  fault.   Comments  should  be  sent  to
       rna@tbi.univie.ac.at.