Provided by: vienna-rna_2.5.1+dfsg-1_amd64

NAME

       RNAcofold - manual page for RNAcofold 2.5.1

SYNOPSIS

       RNAcofold [OPTION]... [FILE]...

DESCRIPTION

       RNAcofold 2.5.1

       calculate secondary structures of two RNAs with dimerization

       The program works much like RNAfold, but allows one to specify two RNA sequences which are
       then allowed to form a dimer structure. RNA sequences are read from  stdin  in  the  usual
       format,  i.e.  each  line  of input corresponds to one sequence, except for lines starting
       with ">" which contain the name of the next sequence.  To compute the hybrid structure  of
       two molecules, the two sequences must be concatenated using the '&' character as
       separator. RNAcofold can compute minimum free energy (mfe) structures, as well as
       the partition function (pf) and base pairing probability matrix (using the -p switch). Since
       dimer formation is concentration dependent, RNAcofold can be used to  compute  equilibrium
       concentrations   for  all  five  monomer  and  (homo/hetero)-dimer  species,  given  input
       concentrations for the monomers.  Output consists of the mfe structure in bracket notation
       as  well  as  PostScript  structure  plots  and  "dot  plot"  files  containing  the  pair
       probabilities, see the RNAfold man page for details. In the dot plots a  cross  marks  the
       chain break between the two concatenated sequences.  The program will continue to read new
       sequences until a line consisting of the single character @ or an end of file condition is
       encountered.
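
       The input format described above can be sketched in Python as follows. This is only an
       illustration: the helper name cofold_input is hypothetical and not part of the ViennaRNA
       package. It joins two sequences with '&', optionally adds a ">" name line, and ends the
       input with a single '@' line.

```python
def cofold_input(seq_a, seq_b, name=None):
    """Build one RNAcofold input record for a pair of sequences.

    Hypothetical helper for illustration; not part of the ViennaRNA package.
    """
    for seq in (seq_a, seq_b):
        if not seq or "&" in seq:
            raise ValueError("each sequence must be non-empty and must not contain '&'")
    lines = []
    if name is not None:
        lines.append(">" + name)        # name line for the next sequence
    lines.append(seq_a + "&" + seq_b)   # '&' separates the two molecules
    lines.append("@")                   # a single '@' line ends the input
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The sequences below are arbitrary example data.
    print(cofold_input("GCGCUUCGCCGCGCGCC", "GCGCUUCGCCGCGCGCA", name="pair1"), end="")
```

       The printed text can be piped to RNAcofold on stdin.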

       -h, --help
              Print help and exit

       --detailed-help
              Print help, including all details and hidden options, and exit

       --full-help
              Print help, including hidden options, and exit

       -V, --version
              Print version and exit

   General Options:
              Command line options which alter the general behavior of this program

       -v, --verbose
              Be verbose.

              (default=off)

       -j, --jobs[=number]
              Split  batch  input  into  jobs  and  start  processing  in parallel using multiple
              threads. A value of 0 indicates to use as  many  parallel  threads  as  computation
              cores are available.

              (default=`0')

              Default processing of input data is performed in a serial fashion, i.e. one
              sequence pair at a time. Using this switch, a user can instead start the
              computation for many sequence pairs in the input in parallel. RNAcofold will create
              as many parallel computation slots as specified and assign input sequences of the
              input file(s) to the available slots. Note that this increases memory consumption,
              since input sequences have to be kept in memory until an empty compute slot is
              available and each running job requires its own dynamic programming matrices.

       --unordered
              Do  not  try  to  keep  output  in order with input while parallel processing is in
              place.

              (default=off)

              When parallel input processing (--jobs flag) is enabled, the order in which input
              is processed depends on the host machine's job scheduler. Therefore, any output to
              stdout or files generated by this program will most likely not follow the order of
              the corresponding input data set. The default of RNAcofold is to use a specialized
              data structure to still keep the results output in order with the input data.
              However, this comes with a trade-off in terms of memory consumption, since all
              output must be kept in memory for as long as no chunks of consecutive, ordered
              output are available. By setting this flag, RNAcofold will not buffer individual
              results but print them as soon as they have been computed.

       --noPS Do not produce postscript drawing of the mfe structure.

              (default=off)

       --noconv
              Do not automatically substitute nucleotide "T" with "U"

              (default=off)

       --auto-id
              Automatically generate an ID for each sequence.  (default=off)

              The default mode of RNAcofold is to automatically determine an ID from the input
              sequence data if the input file format allows it. Sequence IDs are usually
              given in the FASTA header of input sequences. If this  flag  is  active,  RNAcofold
              ignores any IDs retrieved from the input and automatically generates an ID for each
              sequence. This ID consists of a prefix and an increasing number. This flag can also
              be used to add a FASTA header to the output even if the input has none.

       --id-prefix=prefix
              Prefix for automatically generated IDs (as used in output file names)

              (default=`sequence')

              If  this parameter is set, each sequence will be prefixed with the provided string.
              Hence, the output files will obey the following naming scheme:  "prefix_xxxx_ss.ps"
              (secondary  structure  plot),  "prefix_xxxx_dp.ps" (dot-plot), "prefix_xxxx_dp2.ps"
              (stack probabilities), etc. where xxxx is the sequence number. Note:  Setting  this
              parameter implies --auto-id.

       --id-delim=delimiter
              Change  the  delimiter  between  prefix  and  increasing  number  for automatically
              generated IDs (as used in output file names)

              (default=`_')

              This parameter can be used to change the default delimiter "_" between the prefix
              string and the increasing number for automatically generated IDs.

       --id-digits=INT
              Specify the number of digits of the counter in automatically generated sequence
              IDs.

              (default=`4')

              When sequence IDs are automatically generated, they receive an increasing number,
              starting with 1. This number will always be left-padded by leading zeros, such that
              the number takes up a certain width. Using this parameter, the width can be
              specified to the user's needs. We allow numbers in the range [1:18]. This option
              implies --auto-id.

       --id-start=LONG
              Specify the first number in automatically generated sequence IDs.

              (default=`1')

              When sequence IDs are automatically generated, they receive an increasing number,
              usually starting with 1. Using this parameter, the first number can be specified to
              the user's requirements. Note: negative numbers are not allowed. Note: Setting this
              parameter causes any IDs retrieved from the input data to be ignored, i.e. it
              activates the --auto-id flag.
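
              How the four ID options combine can be sketched in Python. The function auto_id
              below is a hypothetical illustration, not the code used by RNAcofold; it only
              reproduces the prefix, delimiter, zero-padding and start number rules stated
              above, with the documented defaults.

```python
def auto_id(index, prefix="sequence", delim="_", digits=4, start=1):
    """Return the automatically generated ID of the index-th record (index counts from 0).

    Hypothetical helper; defaults mirror --id-prefix, --id-delim, --id-digits, --id-start.
    """
    if not 1 <= digits <= 18:
        raise ValueError("digits must be in the range [1:18]")
    if start < 0:
        raise ValueError("negative start numbers are not allowed")
    # left-pad the running number with zeros to the requested width
    return f"{prefix}{delim}{start + index:0{digits}d}"


if __name__ == "__main__":
    for i in range(3):
        # e.g. the secondary structure plot follows the scheme "prefix_xxxx_ss.ps"
        print(auto_id(i) + "_ss.ps")
```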

       --filename-delim=delimiter
              Change the delimiting character that is used for sanitized filenames

              (default=`ID-delimiter')

              This parameter can be used to change the delimiting character used while sanitizing
              filenames, i.e. replacing invalid characters. Note that the default delimiter
              ALWAYS is the first character of the "ID delimiter" as supplied through the
              --id-delim option. If the delimiter is a whitespace character or empty, invalid
              characters will simply be removed rather than substituted. Currently, we regard the
              following characters as illegal for use in filenames: backslash '\', slash '/',
              question mark '?', percent sign '%', asterisk '*', colon ':', pipe symbol '|',
              double quote '"', angle brackets '<' and '>'.
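
              The substitution rule can be sketched in Python. The function sanitize_filename
              is a hypothetical illustration of the rule as worded above; the actual
              implementation in RNAcofold may differ in detail.

```python
# Characters listed above as illegal in filenames.
ILLEGAL = set('\\/?%*:|"<>')


def sanitize_filename(name, delim="_"):
    """Replace illegal filename characters by the first character of delim.

    If delim is empty or a whitespace character, illegal characters are removed.
    Hypothetical helper for illustration only.
    """
    repl = delim[:1]
    if repl.isspace():
        repl = ""
    return "".join(repl if ch in ILLEGAL else ch for ch in name)


if __name__ == "__main__":
    print(sanitize_filename("tRNA/5S:test"))
    print(sanitize_filename("tRNA/5S:test", delim=""))
```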

       --filename-full
              Use full FASTA header to create filenames

              (default=off)

              This parameter can be used to deactivate the default behavior  of  limiting  output
              filenames  to the first word of the sequence ID. Consider the following example: An
              input with FASTA header ">NM_0001 Homo Sapiens some gene" usually  produces  output
              files  with the prefix "NM_0001" without the additional data available in the FASTA
              header, e.g. "NM_0001_ss.ps" for secondary structure plots. With this flag set,  no
              truncation  of the output filenames is done, i.e. output filenames receive the full
              FASTA header data as prefixes. Note, however,  that  invalid  characters  (such  as
              whitespace)  will  be substituted by a delimiting character or simply removed, (see
              also the parameter option --filename-delim).
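
              The difference between the two modes can be sketched in Python. The function
              filename_prefix is hypothetical and handles only whitespace; the exact names
              produced by RNAcofold for a full header may differ.

```python
def filename_prefix(header, full=False, delim="_"):
    """Derive an output filename prefix from a FASTA header line.

    Hypothetical helper: with full=False only the first word of the header is
    used; with full=True all words are kept and whitespace is replaced by delim.
    """
    text = header.lstrip(">").strip()
    words = text.split()
    if not words:
        return ""
    if not full:
        return words[0]
    return delim.join(words)


if __name__ == "__main__":
    header = ">NM_0001 Homo Sapiens some gene"
    print(filename_prefix(header) + "_ss.ps")
    print(filename_prefix(header, full=True) + "_ss.ps")
```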

       --output-format=format-character
              Change the default output format

              (default=`V')

              The following output formats are currently supported:

              ViennaRNA format (V), Delimiter-separated format (D) also known as CSV format.

       --csv-delim=delimiter
              Change the delimiting character for Delimiter-separated output format, such as CSV

              (default=`,')

              Delimiter-separated output defaults to comma separated values (CSV), i.e. all  data
              in one data set is delimited by a comma character. This option allows one to change
              the delimiting character to something else. Note, to switch to tab-separated  data,
              use $'\t' as delimiting character.

       --csv-noheader
              Do not print header for Delimiter-separated output, such as CSV

              (default=off)

   Structure Constraints:
              Command  line  options  to  interact with the structure constraints feature of this
              program

       --maxBPspan=INT
              Set the maximum base pair span

              (default=`-1')

       -C, --constraint[=<filename>]
              Calculate structures subject to constraints.

              (default=`')

              The program reads first the sequence, then a string containing constraints  on  the
              structure encoded with the symbols:

              . (no constraint for this base)

              | (the corresponding base has to be paired)

              x (the base is unpaired)

              < (base i is paired with a base j>i)

              > (base i is paired with a base j<i)

              and matching brackets ( ) (base i pairs base j)

              With the exception of "|", constraints will disallow all pairs conflicting with the
              constraint. This is usually sufficient to enforce the constraint, but  occasionally
              a base may stay unpaired in spite of constraints. PF folding ignores constraints of
              type "|".
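
              A constraint string can be checked before it is passed to the program. The
              Python sketch below is a hypothetical helper, not part of RNAcofold. It assumes
              that the '&' separator has been stripped from both strings and that the
              constraint has the same length as the sequence; both are assumptions of this
              example, not statements about the program.

```python
ALLOWED = set(".|x<>()")


def check_constraint(sequence, constraint):
    """Return the sorted list of (i, j) pairs (1-based) given by matching brackets.

    Raises ValueError when the constraint string is malformed.
    Hypothetical helper for illustration only.
    """
    if len(sequence) != len(constraint):
        raise ValueError("constraint and sequence differ in length")
    stack, pairs = [], []
    for pos, symbol in enumerate(constraint, start=1):
        if symbol not in ALLOWED:
            raise ValueError(f"unknown constraint symbol {symbol!r} at position {pos}")
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(check_constraint("GGGAAACCC", "(((...)))"))
```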

       --batch
              Use constraints for multiple sequences.  (default=off)

              Usually, constraints provided  from  input  file  only  apply  to  a  single  input
              sequence.  Therefore,  RNAcofold will stop its computation and quit after the first
              input sequence was processed. Using this switch, RNAcofold processes multiple input
              sequences and applies the same provided constraints to each of them.

       --canonicalBPonly
              Remove non-canonical base pairs from the structure constraint

              (default=off)

       --enforceConstraint
              Enforce base pairs given by round brackets ( ) in structure constraint

              (default=off)

       --shape=<filename>
              Use SHAPE reactivity data to guide structure predictions

       --shapeMethod=[D/Z/W] + [optional parameters]
              Select method to incorporate SHAPE reactivity data.

              (default=`D')

              The  following methods can be used to convert SHAPE reactivities into pseudo energy
              contributions.

              'D': Convert by using a linear  equation  according  to  Deigan  et  al  2009.  The
              calculated  pseudo  energies  will  be  applied  for every nucleotide involved in a
              stacked pair. This method is recognized by a capital 'D' in the provided parameter,
              i.e.: --shapeMethod="D" is the default setting. The slope 'm' and the intercept 'b'
              can be set to a non-default value if necessary,  otherwise  m=1.8  and  b=-0.6.  To
              alter  these  parameters,  e.g. m=1.9 and b=-0.7, use a parameter string like this:
              --shapeMethod="Dm1.9b-0.7". You may also provide only one  of  the  two  parameters
              like: --shapeMethod="Dm1.9" or --shapeMethod="Db-0.7".

              'Z': Convert SHAPE reactivities to pseudo energies according to Zarringhalam et al
              2012. SHAPE reactivities will be converted to pairing probabilities by using linear
              mapping. Deviation from the observed pairing probabilities will be penalized
              during the folding recursion. The magnitude of the penalties can be affected by
              adjusting the factor beta (e.g. --shapeMethod="Zb0.8").

              'W':  Apply  a  given  vector  of  perturbation  energies  to  unpaired nucleotides
              according to Washietl et al 2012. Perturbation vectors can be calculated  by  using
              RNApvmin.
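
              For method 'D', the parameter string and the conversion can be sketched in
              Python. Both functions are hypothetical illustrations. The logarithmic form
              m*ln(reactivity + 1) + b is the equation as published by Deigan et al; it is
              not spelled out in this manual page and is stated here as an assumption.

```python
import math
import re


def parse_deigan(param="D"):
    """Parse a --shapeMethod string of the form D[m<slope>][b<intercept>].

    Returns (m, b); missing values fall back to the documented defaults.
    """
    if not param.startswith("D"):
        raise ValueError("not a 'D' method string")
    m, b = 1.8, -0.6  # defaults stated above
    for key, value in re.findall(r"([mb])(-?\d+(?:\.\d+)?)", param[1:]):
        if key == "m":
            m = float(value)
        else:
            b = float(value)
    return m, b


def pseudo_energy(reactivity, m=1.8, b=-0.6):
    """Pseudo energy for one nucleotide (assumed form: m*ln(r + 1) + b)."""
    return m * math.log(reactivity + 1.0) + b


if __name__ == "__main__":
    m, b = parse_deigan("Dm1.9b-0.7")
    for r in (0.0, 0.5, 2.0):  # made-up reactivities
        print(r, round(pseudo_energy(r, m, b), 3))
```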

       --shapeConversion=M/C/S/L/O + [optional parameters]
              Select method to convert SHAPE reactivities to pairing probabilities.

              (default=`O')

              This  parameter  is  useful  when dealing with the SHAPE incorporation according to
              Zarringhalam et al. The following methods can be used to convert SHAPE reactivities
              into the probability for a certain nucleotide to be unpaired.

              'M': Use linear mapping according to Zarringhalam et al.

              'C': Use a cutoff-approach to divide into paired and unpaired nucleotides (e.g.
              "C0.25")

              'S': Skip the normalizing step since the input data already represents
              probabilities for being unpaired rather than raw reactivity values

              'L': Use a linear model to convert the reactivity into a probability for being
              unpaired (e.g. "Ls0.68i0.2" to use a slope of 0.68 and an intercept of 0.2)

              'O': Use a linear model to convert the log of the reactivity into a probability
              for being unpaired (e.g. "Os1.6i-2.29" to use a slope of 1.6 and an intercept of
              -2.29)

       --commands=<filename>
              Read additional commands from file

              Commands include hard and soft constraints, but also structure motifs in hairpin
              and interior loops that need to be treated differently. Furthermore, commands can
              be set for unstructured and structured domains.

   Algorithms:
              Select additional algorithms which should be included  in  the  calculations.   The
              Minimum  free  energy  (MFE)  and  a structure representative are calculated in any
              case.

       -p, --partfunc[=INT]
              Calculate the partition function and base pairing probability matrix in addition to
              the mfe structure. Default is calculation of mfe structure only.

              (default=`1')

              In  addition  to  the  MFE  structure  we print a coarse representation of the pair
              probabilities in form of a pseudo bracket notation, followed by the  ensemble  free
              energy,  as  well  as  the  centroid  structure derived from the pair probabilities
              together with its free energy and distance to the ensemble.  Finally it prints  the
              frequency of the mfe structure, and the structural diversity (mean distance between
              the  structures  in  the  ensemble).   See  the  description   of   pf_fold()   and
              mean_bp_dist()  and  centroid() in the RNAlib documentation for details.  Note that
              unless you also specify -d2 or -d0, the partition  function  and  mfe  calculations
              will  use  a  slightly  different  energy model. See the discussion of dangling end
              options below.

              An additionally passed value to this  option  changes  the  behavior  of  partition
              function calculation:

              In order to calculate the partition function but not the pair probabilities, use
              the -p0 option and save about 50% in runtime. This prints the ensemble free energy
              -kT ln(Z).
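
              The relation between the partition function Z, the ensemble free energy -kT ln(Z)
              and the frequency of the mfe structure can be sketched in Python. The function
              names and the value of the gas constant are assumptions of this example; the
              two-state ensemble is toy data.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal/(mol*K), approximate value assumed for this sketch


def kT(temp_celsius=37.0):
    """Thermal energy in kcal/mol at the given temperature."""
    return GAS_CONSTANT * (temp_celsius + 273.15)


def ensemble_free_energy(Z, temp_celsius=37.0):
    """Ensemble free energy G = -kT ln(Z)."""
    return -kT(temp_celsius) * math.log(Z)


def mfe_frequency(e_mfe, g_ensemble, temp_celsius=37.0):
    """Boltzmann probability of the mfe structure: exp(-(E_mfe - G) / kT)."""
    return math.exp(-(e_mfe - g_ensemble) / kT(temp_celsius))


if __name__ == "__main__":
    # Toy ensemble with two structures of energy -1.0 and 0.0 kcal/mol.
    energies = [-1.0, 0.0]
    Z = sum(math.exp(-e / kT()) for e in energies)
    G = ensemble_free_energy(Z)
    print(round(G, 3), round(mfe_frequency(min(energies), G), 3))
```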

       -a, --all_pf[=INT]
              Compute  the  partition  function  and  free  energies not only of the hetero-dimer
              consisting of the two input sequences (the "AB dimer"), but also of the homo-dimers
              AA and BB as well as A and B monomers.

              (default=`1')

              The  output  will contain the free energies for each of these species, as well as 5
              dot plots containing  the  conditional  pair  probabilities,  called  "ABname5.ps",
              "AAname5.ps"  and  so on. For later use, these dot plot files also contain the free
              energy of the ensemble as a comment. Using -a  automatically  switches  on  the  -p
              option.  Base  pair  probability  computations  may  be  turned  off  altogether by
              providing "0" as an argument to this parameter. In that case,  no  dot  plot  files
              will be generated.

       -c, --concentrations
              In  addition  to  everything  listed  under  the -a option, read in initial monomer
              concentrations and  compute  the  expected  equilibrium  concentrations  of  the  5
              possible species (AB, AA, BB, A, B).

              (default=off)

              Start concentrations are read from stdin (unless the -f option is used) in [mol/l],
              equilibrium concentrations are given relative to the sum of the two inputs. An
              arbitrary number of initial concentrations can be specified (one pair of
              concentrations per line).

       -f, --concfile=filename
              Specify a file with initial concentrations for the two sequences.

              The table consists of arbitrarily many lines with just two numbers (the
              concentration of sequence A and B). This option will automatically toggle the -c
              (and thus -a and -p) options (see above).
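
              A concentration table of this shape can be written with a few lines of Python.
              The helper concentration_table is hypothetical; separating the two numbers by a
              single space is an assumption of this sketch.

```python
def concentration_table(pairs):
    """Format (conc_A, conc_B) pairs in mol/l, one pair per line.

    Hypothetical helper for illustration only.
    """
    lines = []
    for conc_a, conc_b in pairs:
        if conc_a < 0 or conc_b < 0:
            raise ValueError("concentrations must not be negative")
        lines.append(f"{conc_a:g} {conc_b:g}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up start concentrations for sequences A and B.
    print(concentration_table([(1e-5, 1e-5), (1e-6, 2e-5)]), end="")
```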

       --centroid
              Compute the centroid structure.  (default=off)

              Additionally to the MFE structure,  compute  the  centroid  representative  of  the
              structure  ensemble. Here, we apply the base pair distance as distance measure, and
              report the structure that minimizes its Boltzmann weighted base  pair  distance  to
              the  rest  of  the  ensemble. Computing the centroid structure requires equilibrium
              base pair  probabilities.  Therefore,  this  option  implies  the  -p  switch.  For
              historical reasons, the centroid structure output is deactivated by default.

       --MEA[=gamma]
              Calculate an MEA (maximum expected accuracy) structure, where the expected accuracy
              is computed from the  pair  probabilities:  each  base  pair  (i,j)  gets  a  score
              2*gamma*p_ij  and  the score of an unpaired base is given by the probability of not
              forming a pair.

              (default=`1.')

              The parameter gamma tunes  the  importance  of  correctly  predicted  pairs  versus
              unpaired bases. Thus, for small values of gamma the MEA structure will contain only
              pairs with very high probability.  Using --MEA implies -p for  computing  the  pair
              probabilities.
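
              The scoring rule can be sketched in Python. The function expected_accuracy is
              a hypothetical helper that only evaluates the score of a given structure; it
              does not search for the MEA structure, and the pair probabilities are toy data.

```python
def expected_accuracy(pairs, pair_probs, length, gamma=1.0):
    """Score a structure: 2*gamma*p_ij per pair plus q_i per unpaired base.

    pairs      -- iterable of (i, j) base pairs of the structure, 1-based, i < j
    pair_probs -- dict mapping (i, j) with i < j to the pair probability p_ij
    length     -- number of nucleotides
    """
    # q_i = 1 - sum_j p_ij is the probability that base i forms no pair
    unpaired = [1.0] * (length + 1)
    for (i, j), p in pair_probs.items():
        unpaired[i] -= p
        unpaired[j] -= p
    paired_positions = set()
    score = 0.0
    for i, j in pairs:
        score += 2.0 * gamma * pair_probs.get((i, j), 0.0)
        paired_positions.update((i, j))
    score += sum(unpaired[k] for k in range(1, length + 1) if k not in paired_positions)
    return score


if __name__ == "__main__":
    probs = {(1, 4): 0.5}
    for gamma in (0.5, 1.0, 2.0):
        print(gamma, expected_accuracy([(1, 4)], probs, 4, gamma),
              expected_accuracy([], probs, 4, gamma))
```

              In this toy case the pair (1,4) scores higher than leaving both bases unpaired
              only for gamma above 1, which matches the remark on small values of gamma.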

       -S, --pfScale=scaling factor
              In  the  calculation  of  the pf use scale*mfe as an estimate for the ensemble free
              energy (used to avoid overflows).

              The default is 1.07, useful values are 1.0 to 1.2.  Occasionally  needed  for  long
              sequences.   You  can  also  recompile the program to use double precision (see the
              README file).

       --bppmThreshold=<value>
              Set the threshold for base pair probabilities included in the postscript output

              (default=`1e-5')

              By setting the threshold the base pair  probabilities  that  are  included  in  the
              output  can  be varied. By default only those exceeding 1e-5 in probability will be
              shown as squares in the dot plot. Changing the threshold to any other value  allows
              for increase or decrease of data.

       -g, --gquad
              Incorporate G-Quadruplex formation into the structure prediction algorithm.

              (default=off)

   Model Details:
       -T, --temp=DOUBLE
              Rescale energy parameters to a temperature of temp C. Default is 37C.

       -4, --noTetra
              Do not include special tabulated stabilizing energies for tri-, tetra- and hexaloop
              hairpins.

              (default=off)

              Mostly for testing.

       -d, --dangles=INT
              How to treat "dangling end" energies for bases adjacent to helices in free ends and
              multi-loops

              (default=`2')

              With -d1 only unpaired bases can participate in at most one dangling end.  With -d2
              this check is ignored, dangling energies will be added for the bases adjacent to  a
              helix on both sides in any case; this is the default for mfe and partition function
              folding (-p).   The  option  -d0  ignores  dangling  ends  altogether  (mostly  for
              debugging).   With  -d3 mfe folding will allow coaxial stacking of adjacent helices
              in multi-loops. At the moment the implementation will not allow coaxial stacking of
              the two interior pairs in a loop of degree 3 and works only for mfe folding.

              Note  that  with  -d1  and -d3 only the MFE computations will be using this setting
              while partition function uses -d2 setting,  i.e.  dangling  ends  will  be  treated
              differently.

       --noLP Produce structures without lonely pairs (helices of length 1).

              (default=off)

              For  partition  function  folding  this  only  disallows  pairs that can only occur
              isolated. Other pairs may still occasionally occur as helices of length 1.

       --noGU Do not allow GU pairs

              (default=off)

       --noClosingGU
              Do not allow GU pairs at the end of helices

              (default=off)

       -P, --paramFile=paramfile
              Read energy parameters from paramfile, instead of using the default parameter set.

              Different sets  of  energy  parameters  for  RNA  and  DNA  should  accompany  your
              distribution.   See  the  RNAlib documentation for details on the file format. When
              passing the placeholder file name "DNA", DNA parameters are loaded without the need
              to actually specify any input file.

       --nsp=STRING
              Allow other pairs in addition to the usual AU, GC, and GU pairs.

              Its argument is a comma separated list of additionally allowed pairs. If the first
              character is a "-" then AB will imply that AB and BA are allowed pairs, e.g.
              RNAcofold --nsp=-GA will allow GA and AG pairs. Nonstandard pairs are given 0
              stacking energy.
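
              One reading of this argument syntax can be sketched in Python. The function
              parse_nsp is hypothetical; it assumes that a leading "-" applies to every pair
              in the list, which is an interpretation of the wording above.

```python
def parse_nsp(argument):
    """Return the set of additionally allowed (base, base) pairs.

    Hypothetical helper: a leading '-' makes every listed pair symmetric.
    """
    symmetric = argument.startswith("-")
    if symmetric:
        argument = argument[1:]
    pairs = set()
    for item in argument.split(","):
        if not item:
            continue
        if len(item) != 2:
            raise ValueError(f"expected a two-letter pair, got {item!r}")
        a, b = item[0], item[1]
        pairs.add((a, b))
        if symmetric:
            pairs.add((b, a))
    return pairs


if __name__ == "__main__":
    print(sorted(parse_nsp("-GA")))
```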

       -e, --energyModel=INT
              Rarely used option to fold sequences from the artificial ABCD... alphabet, where  A
              pairs B, C-D etc.  Use the energy parameters for GC (-e 1) or AU (-e 2) pairs.

       --betaScale=DOUBLE
              Set the scaling of the Boltzmann factors (default=`1.')

              The argument provided with this option makes it possible to scale the thermodynamic
              temperature used in the Boltzmann factors independently from the temperature used
              to scale the individual energy contributions of the loop types. The Boltzmann
              factors then become exp(-dG/(kT*betaScale)) where k is the Boltzmann constant, dG
              the free energy contribution of the state and T the absolute temperature.
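
              The effect of the scaling can be sketched in Python. The function name and the
              value of the gas constant are assumptions of this example; energies are in
              kcal/mol.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal/(mol*K), approximate value assumed for this sketch


def boltzmann_factor(dG, temp_celsius=37.0, beta_scale=1.0):
    """Boltzmann factor exp(-dG / (kT * betaScale)) of a state with free energy dG."""
    kT = GAS_CONSTANT * (temp_celsius + 273.15)
    return math.exp(-dG / (kT * beta_scale))


if __name__ == "__main__":
    for scale in (0.5, 1.0, 2.0):
        print(scale, boltzmann_factor(-1.0, beta_scale=scale))
```

              A betaScale above 1 flattens the ensemble: stabilizing states receive smaller
              weights relative to the unscaled case.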

REFERENCES

       If you use this program in your work you might want to cite:

       R.  Lorenz, S.H. Bernhart, C. Hoener zu Siederdissen, H. Tafer, C. Flamm, P.F. Stadler and
       I.L. Hofacker (2011), "ViennaRNA Package 2.0", Algorithms for Molecular Biology: 6:26

       I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M.  Tacker,  P.  Schuster  (1994),
       "Fast  Folding and Comparison of RNA Secondary Structures", Monatshefte f. Chemie: 125, pp
       167-188

       R.  Lorenz,  I.L.  Hofacker,  P.F.  Stadler  (2016),  "RNA  folding  with  hard  and  soft
       constraints", Algorithms for Molecular Biology 11:1 pp 1-13

       S.H.Bernhart, Ch. Flamm, P.F. Stadler, I.L. Hofacker, (2006), "Partition Function and Base
       Pairing Probabilities of RNA Heterodimers", Algorithms Mol. Biol.

       The energy parameters are taken from:

       D.H. Mathews, M.D. Disney, J.L. Childs, S.J. Schroeder, M. Zuker,
       D.H.  Turner  (2004),  "Incorporating  chemical  modification  constraints  into a dynamic
       programming algorithm for prediction of RNA secondary structure", Proc. Natl.  Acad.  Sci.
       USA: 101, pp 7287-7292

       D.H. Turner, D.H. Mathews (2009), "NNDB: The nearest neighbor parameter database for
       predicting stability of nucleic acid secondary structure", Nucleic Acids Research: 38, pp
       280-282

AUTHOR

       Ivo L Hofacker, Peter F Stadler, Stephan Bernhart, Ronny Lorenz

REPORTING BUGS

       If  in  doubt  our  program  is  right,  nature  is  at fault.  Comments should be sent to
       rna@tbi.univie.ac.at.