Provided by: vienna-rna_2.5.1+dfsg-1_amd64

NAME

       AnalyseSeqs - Analyse a set of sequences of common length

SYNOPSIS

       AnalyseSeqs [-X[bswn]] [-Q] [-M{mask}[+|!]] [-D{H|A|G}] [-d{S|H|D|B}]

DESCRIPTION

       AnalyseSeqs  reads  a  set  of  sequences  from  stdin  and tries a variety of methods for
       sequence analysis on them. Currently available are:
       Statistical geometry for quadruples of sequences; THIS IS PRELIMINARY AND NOT WELL  TESTED
       SO FAR.
       Split  decomposition;  neighbour  joining  and  Ward's  variance method for reconstructing
       phylogenies using various distance measures.  For statistical  geometry  and  the  cluster
       methods PostScript output is available.
       The  program  continues reading until it encounters one of the separator characters '@' or
       '%'. Only sequences of alphabetical characters or of a specified alphabet  are  processed,
       all  other  lines  are  ignored.  The program stops reading if it either encounters an EOF
       condition, or if there are no  valid  sequence  data  between  two  lines  beginning  with
       separator characters.
       A  list  of  taxa  names can be specified in the input stream. The list begins with a line
       beginning with '*'. Optionally, a file name prefix [fn] for the PostScript output  can  be
       specified in this line.  The entries have the form 'x : Taxon', where x is the number of the
       taxon, i.e., of the corresponding entry in the list of input sequences. The taxa list need
       not  be  complete.  It  must  end,  however,  with a line beginning with '*' or any of the
       separator characters. The taxa list is printed on top of the output.  The  specified  taxa
       names are used as labels in the PostScript output.
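
       The input layout described above can be sketched in Python. This is only an illustration:
       the helper name build_input is hypothetical, and the sketch assumes that the taxa list
       precedes the sequences (as in the layout shown for the -Q option) and that taxon numbers
       start at 1. Neither assumption is stated explicitly in this page.

```python
def build_input(sequences, taxa=None, prefix=""):
    """Return input text for one data set (hypothetical helper, not part of the package)."""
    # AnalyseSeqs analyses sequences of common length, so reject mixed lengths early.
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("all sequences must have the same length")
    lines = []
    if taxa:
        # The taxa list opens with a line beginning with '*', optionally
        # followed by the file name prefix for the PostScript output.
        lines.append(f"* {prefix}".rstrip())
        # Entries have the form 'x : Taxon'; the list need not be complete.
        # Assumption: x counts the input sequences starting at 1.
        for number, name in sorted(taxa.items()):
            lines.append(f"{number} : {name}")
        # The taxa list must end with '*' or a separator character.
        lines.append("*")
    lines.extend(sequences)
    # '%' (or '@') terminates the data set.
    lines.append("%")
    return "\n".join(lines) + "\n"


text = build_input(
    ["GCAUGCAU", "GCAUGGAU", "GCUUGGAU"],
    taxa={1: "Taxon A", 3: "Taxon C"},
    prefix="demo",
)
```

       The resulting text could then be piped to the program on stdin, for example together with
       -Xn to request a neighbour joining tree.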

OPTIONS

       -X[bswn]
              specifies the analysis methods to be used.

       [b]    Statistical  Geometry.  A  PostScript  file  named '[fn_]box.ps' giving a graphical
              representation of the statistical geometry is created. The resulting box is a  good
              measure of 'tree likeness' of the data set.  This is the default.

       [s]    Split decomposition.

       [w]    Cluster  analysis  using  Ward's method. A PostScript file named '[fn_]wards.ps' is
              created containing a drawing of the tree.

       [n]    Cluster analysis using Saitou's neighbour joining method. A PostScript  file  named
              '[fn_]nj.ps' is created containing a drawing of the tree.

       -Q     indicates  that  a  statistical geometry analysis is to be performed comparing four
              data sets, for instance to confirm the significance of a proposed  phylogeny.  This
              option  is only useful for statistical geometry analysis and hence the -X option is
              ignored. Each of the four data sets must be of the form
              * [filename_prefix]
              # number
              [list of taxa names]
              *
              list of sequences
              %
              where number is 1,2,3,4 for the four groups to be compared.
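
              As a rough sketch, the four blocks expected by -Q could be generated as follows.
              The function names are hypothetical; the line layout follows the form shown
              above, with the optional prefix and taxa list left empty by default.

```python
def build_group(number, sequences, taxa_lines=(), prefix=""):
    """Return one '-Q' data set block (hypothetical helper)."""
    if number not in (1, 2, 3, 4):
        raise ValueError("group number must be 1, 2, 3 or 4")
    lines = [f"* {prefix}".rstrip(), f"# {number}"]
    lines.extend(taxa_lines)   # optional 'x : Taxon' entries
    lines.append("*")          # closes the taxa list
    lines.extend(sequences)
    lines.append("%")          # separator ending the data set
    return "\n".join(lines) + "\n"


def build_quadruple_input(groups):
    """Concatenate exactly four groups, numbered 1 to 4 in the order given."""
    if len(groups) != 4:
        raise ValueError("-Q compares exactly four data sets")
    return "".join(build_group(i, seqs) for i, seqs in enumerate(groups, start=1))


quad = build_quadruple_input([["GCAU"], ["GCAA"], ["GCUU"], ["GGAU"]])
```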

       -M{mask}[+|!]
              allows one to specify a mask for the  input  file.  '{mask}'  can  be  one  of  the
              following  letters  indicating  a predefined alphabet or the %-sign followed by all
              characters to be accepted. A + sign at the very end of the mask indicates that  the
              input is to be handled case-sensitively. The default is to convert the input to upper
              case. A ! sign can be used to convert the input data  to  RY  code:  GgAaXx  ->  R,
              UuCcKkTt -> Y, all other letters are converted to *.

       -Ma    all letters A-Z and a-z.

       -Mu    uppercase letters.

       -Ml    lowercase letters.

       -Mc    digits [0-9].

       -Mn    all alphanumeric characters.

       -MR    RNA alphabet (GCAUgcau).

       -MD    DNA alphabet (GCATgcat).

       -MA    Amino acids in one-letter code.

       -MS    Secondary structures coded as '^.()'

       -M%alphabet
              use the specified alphabet.
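
       The RY conversion selected with the ! sign can be illustrated by a short Python sketch.
       The mapping is taken from the description of -M above; the function name to_ry is
       hypothetical, and mapping every remaining character (not only letters) to '*' is an
       assumption of this sketch.

```python
PURINES = set("GgAaXx")        # converted to R
PYRIMIDINES = set("UuCcKkTt")  # converted to Y


def to_ry(sequence):
    """Convert a sequence to RY code; anything else becomes '*'."""
    out = []
    for ch in sequence:
        if ch in PURINES:
            out.append("R")
        elif ch in PYRIMIDINES:
            out.append("Y")
        else:
            out.append("*")
    return "".join(out)


ry = to_ry("GCAUgcatN")
```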

       -D     specifies the algorithm to be used for calculating the distance matrix of the input
              data set. Available are

       -DH    Hamming Distance

       -DA[,cost]
              Simple alignment distance according to Needleman and Wunsch.  A gap cost  different
              from 1. can be specified after the comma.

       -DG[,cost1,cost2]
              Gotoh's  distance with gap cost function g(k) = cost2+cost1*(k-1). cost2<=cost1 has
              to be fulfilled.  Default values are cost1=1., cost2=1., yielding the same distance
              as option A.
              ONLY THE HAMMING DISTANCE IS WELL TESTED SO FAR!
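
       For orientation, the Hamming distance and the gap cost function g(k) given for -DG can be
       written down in a few lines of Python. This is a sketch of the formulas only, not the
       program's implementation; the function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two sequences of common length differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def gotoh_gap_cost(k, cost1=1.0, cost2=1.0):
    """Cost of a gap of length k: g(k) = cost2 + cost1*(k-1)."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    # Constraint quoted from the description of -DG.
    if cost2 > cost1:
        raise ValueError("cost2 <= cost1 has to be fulfilled")
    return cost2 + cost1 * (k - 1)


d = hamming_distance("GCAUGCAU", "GCAUGGAU")
g_default = gotoh_gap_cost(3)
g_custom = gotoh_gap_cost(3, cost1=2.0, cost2=1.0)
```

       With the default values g(k) equals k, i.e. every gap position costs 1, which is why the
       defaults give the same distance as -DA.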

       -d     specifies the edit cost matrix to be used. Available are

       -dS    simple  distance.  Indel  and substitution of different characters all have cost 1.
              The indel cost can be set by specifying the gap costs with  the  algorithm  options
              -DA and -DG. This is the default.

       -dH    A  distance  matrix  for RNA secondary structures, inspired by Hogeweg's similarity
              measure (J.Mol.Biol 1988).  The gap function is set automatically.

       -dD    Dayhoff's matrix for amino acid distances.

       -dB    Distinguish purines and pyrimidines only.  CAUTION: this option influences only the
              calculation of distances.  It does NOT affect computation of the statistical
              geometry, which is done directly on the sequences. If you want to do statistical
              geometry on RY sequences, use the ! sign with the -M option, for instance -MR!.

REFERENCES

       The method of statistical geometry was introduced by M. Eigen, R.  Winkler-Oswatitsch
       and  A.W.M.  Dress  (Proc Natl Acad Sci, 85:1988,5912).  The method of split decomposition
       was proposed by H.J. Bandelt and A.W.M. Dress (Adv Math, 92:1992,47).  The variance method
       for  cluster  analysis  is due to J.H. Ward (J Amer Stat Ass, 58:1963,236).  The neighbour
       joining method was published by Saitou and Nei (Mol Biol Evol, 4:1987,406).

       This program is part of the Vienna RNA Package.

WARNING

       This is the beta test version. Some options or combinations of options may  still  produce
       nonsense. Please send bug reports to ivo@tbi.univie.ac.at.

VERSION

       This man page is part of the Vienna RNA Package version 1.2.

AUTHOR

       Peter F. Stadler, Ivo L. Hofacker.

BUGS

       Comments should be sent to ivo@itc.univie.ac.at.

                                                                                   ANALYSESEQS(1)