Provided by: emboss_6.6.0-1_amd64

NAME

       emma - Multiple sequence alignment (ClustalW wrapper)

SYNOPSIS

       emma -sequence seqall [-onlydend toggle] -dend toggle -dendfile infile [-slow toggle] -pwmatrix list
            -pwdnamatrix list -usermatrix variable -pairwisedatafile infile -matrix list -usermamatrix variable
            -dnamatrix list -umamatrix variable -mamatrixfile infile -pwgapopen float -pwgapextend float
            -ktup integer -gapw integer -topdiags integer -window integer -nopercent boolean [-gapopen float]
            [-gapextend float] [-endgaps boolean] [-gapdist integer] -norgap boolean -hgapres string
            -nohgap boolean [-maxdiv integer] -outseq seqoutset -dendoutfile outfile

       emma -help

DESCRIPTION

       emma is a command line program from EMBOSS (“the European Molecular Biology Open Software Suite”). It is
       part of the "Alignment:Multiple" command group.
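
       As an illustration of how the options in the synopsis fit together, the following is a minimal
       sketch that assembles an emma command line as an argument list without running it. The file names
       (globins.fasta, globins.aln, globins.dnd) and the helper build_emma_command are hypothetical; only
       the option names come from this page.

```python
import shlex


def build_emma_command(sequence, outseq, dendoutfile, extra=None):
    """Return an argv list for emma; 'extra' maps option names to values."""
    cmd = ["emma", "-sequence", sequence, "-outseq", outseq,
           "-dendoutfile", dendoutfile]
    for flag, value in (extra or {}).items():
        cmd += [f"-{flag}", str(value)]
    return cmd


# Hypothetical file names, chosen only for illustration.
cmd = build_emma_command(
    "globins.fasta", "globins.aln", "globins.dnd",
    {"gapopen": 10.0, "gapextend": 5.0},
)
print(shlex.join(cmd))
```

       The list form can be handed to subprocess.run(cmd) on a system where EMBOSS is installed.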

OPTIONS

   Input section
       -sequence seqall

       -onlydend toggle
           Default value: N

       -dend toggle
           Default value: N

       -dendfile infile

       -slow toggle
           A distance is calculated between every pair of sequences and these are used to construct the
           dendrogram which guides the final multiple alignment. The scores are calculated from separate
           pairwise alignments. These can be calculated using two methods: dynamic programming (slow but accurate)
           or by the method of Wilbur and Lipman (extremely fast but approximate). The slow-accurate method is
           fine for short sequences but will be VERY SLOW for many (e.g. >100) long (e.g. >1000 residue)
           sequences. Default value: Y
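
           To show why the slow method scales poorly, here is a minimal sketch of scoring every pair of
           sequences by dynamic programming. It uses a plain Needleman-Wunsch recurrence with a linear gap
           cost and toy match/mismatch scores, which are assumptions for illustration and not the scoring
           that ClustalW itself applies.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment score with a linear gap cost (toy scoring)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def pairwise_scores(seqs):
    """Score every unordered pair: n sequences give n*(n-1)/2 alignments."""
    return {(i, j): nw_score(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}
```

           Each alignment costs time proportional to the product of the two sequence lengths, and the
           number of pairs grows quadratically with the number of sequences.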

   Pairwise align options
       -pwmatrix list
           The scoring table which describes the similarity of each amino acid to each other. There are three
           'in-built' series of weight matrices offered. Each consists of several matrices which work
           differently at different evolutionary distances. To see the exact details, read the documentation.
           Crudely, we store several matrices in memory, spanning the full range of amino acid distance (from
           almost identical sequences to highly divergent ones). For very similar sequences, it is best to use a
           strict weight matrix which only gives a high score to identities and the most favoured conservative
           substitutions. For more divergent sequences, it is appropriate to use 'softer' matrices which give a
           high score to many other frequent substitutions.

           1) BLOSUM (Henikoff). These matrices appear to be the best available for carrying out database
              similarity (homology) searches. The matrices used are: Blosum80, 62, 45 and 30.

           2) PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120,
              160, 250 and 350 matrices.

           3) GONNET. These matrices were derived using almost the same procedure as the Dayhoff one (above)
              but are much more up to date and are based on a far larger data set. They appear to be more
              sensitive than the Dayhoff series. We use the GONNET 40, 80, 120, 160, 250 and 350 matrices.

           We also supply an identity matrix which gives a score of 1.0 to two identical amino acids and a
           score of zero otherwise. This matrix is not very useful. Default value: b
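
           The identity matrix described above can be sketched in a few lines. The function names are
           hypothetical and the example scores two equal-length strings position by position, without gaps.

```python
def identity_score(a, b):
    """Identity matrix entry: 1.0 for identical residues, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def ungapped_identity_score(s, t):
    """Sum identity scores over aligned positions of two equal-length strings."""
    return sum(identity_score(x, y) for x, y in zip(s, t))
```

           Because every substitution scores zero, conservative and radical replacements cannot be told
           apart, which is why the substitution matrix series are usually preferred.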

       -pwdnamatrix list
           The scoring table which describes the scores assigned to matches and mismatches (including IUB
           ambiguity codes). Default value: i

       -usermatrix variable

       -pairwisedatafile infile

   Matrix options
       -matrix list
           This gives a menu where you are offered a choice of weight matrices. The default for proteins is the
           PAM series derived by Gonnet and colleagues. Note, a series is used! The actual matrix that is used
           depends on how similar the sequences to be aligned at this alignment step are. Different matrices
           work differently at each evolutionary distance. There are three 'in-built' series of weight matrices
           offered. Each consists of several matrices which work differently at different evolutionary
           distances. To see the exact details, read the documentation. Crudely, we store several matrices in
           memory, spanning the full range of amino acid distance (from almost identical sequences to highly
           divergent ones). For very similar sequences, it is best to use a strict weight matrix which only
           gives a high score to identities and the most favoured conservative substitutions. For more divergent
           sequences, it is appropriate to use 'softer' matrices which give a high score to many other frequent
           substitutions.

           1) BLOSUM (Henikoff). These matrices appear to be the best available for carrying out database
              similarity (homology) searches. The matrices used are: Blosum80, 62, 45 and 30.

           2) PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120,
              160, 250 and 350 matrices.

           3) GONNET. These matrices were derived using almost the same procedure as the Dayhoff one (above)
              but are much more up to date and are based on a far larger data set. They appear to be more
              sensitive than the Dayhoff series. We use the GONNET 40, 80, 120, 160, 250 and 350 matrices.

           We also supply an identity matrix which gives a score of 1.0 to two identical amino acids and a
           score of zero otherwise. This matrix is not very useful. Alternatively, you can read in your own
           (just one matrix, not a series). Default value: b
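
           The idea that a series is used, with the matrix chosen by how similar the sequences are at each
           step, can be sketched as follows. The matrix names come from the BLOSUM series listed above; the
           percent-identity cutoffs (80, 60, 30) are illustrative assumptions and are not taken from this
           page or guaranteed to match ClustalW.

```python
# (cutoff, matrix) pairs, strictest first. Cutoffs are assumed for illustration.
BLOSUM_SERIES = [
    (80.0, "Blosum80"),
    (60.0, "Blosum62"),
    (30.0, "Blosum45"),
    (0.0, "Blosum30"),
]


def pick_matrix(percent_identity, series=BLOSUM_SERIES):
    """Return the first matrix whose cutoff the identity level reaches."""
    for cutoff, name in series:
        if percent_identity >= cutoff:
            return name
    return series[-1][1]
```

           Closely related sequences get the strict matrix, and divergent ones fall through to the softer
           matrices at the end of the series.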

       -usermamatrix variable

       -dnamatrix list
           This gives a menu where a single matrix (not a series) can be selected. Default value: i

       -umamatrix variable

       -mamatrixfile infile

   Additional section
   Slow align options
       -pwgapopen float
           The penalty for opening a gap in the pairwise alignments. Default value: 10.0

       -pwgapextend float
           The penalty for extending a gap by 1 residue in the pairwise alignments. Default value: 0.1

   Fast align options
       -ktup integer
           This is the size of the exactly matching fragment (k-tuple) that is used. INCREASE for speed (max = 2
           for proteins; 4 for DNA), DECREASE for sensitivity. For longer sequences (e.g. >1000 residues) you
           may need to increase the default. Default value: @($(acdprotein)?1:2), i.e. 1 for protein input and
           2 otherwise

       -gapw integer
           This is a penalty for each gap in the fast alignments. It has little effect on the speed or
           sensitivity except for extreme values. Default value: @($(acdprotein)?3:5), i.e. 3 for protein
           input and 5 otherwise

       -topdiags integer
           The number of k-tuple matches on each diagonal (in an imaginary dot-matrix plot) is calculated. Only
           the best ones (with most matches) are used in the alignment. This parameter specifies how many.
           Decrease for speed; increase for sensitivity. Default value: @($(acdprotein)?5:4), i.e. 5 for
           protein input and 4 otherwise
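
           The k-tuple and diagonal counting described for -ktup and -topdiags can be sketched as below.
           This is a simplified illustration under the assumption that a diagonal is identified by the
           offset i - j between matching positions; it is not the Wilbur and Lipman implementation used by
           ClustalW, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def top_diagonals(a, b, ktup=1, topdiags=5):
    """Count exact k-tuple matches per diagonal and keep the best ones."""
    # Index every k-tuple of b by its start positions.
    index = defaultdict(list)
    for j in range(len(b) - ktup + 1):
        index[b[j:j + ktup]].append(j)

    # A match of a[i:i+ktup] with b[j:j+ktup] lies on diagonal i - j.
    diag_counts = Counter()
    for i in range(len(a) - ktup + 1):
        for j in index.get(a[i:i + ktup], ()):
            diag_counts[i - j] += 1

    return diag_counts.most_common(topdiags)
```

           A larger ktup produces fewer chance matches and so less work, while a larger topdiags keeps more
           candidate diagonals for the subsequent alignment.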

       -window integer
           This is the number of diagonals around each of the 'best' diagonals that will be used. Decrease for
           speed; increase for sensitivity. Default value: @($(acdprotein)?5:4), i.e. 5 for protein input
           and 4 otherwise

       -nopercent boolean
           Default value: N

   Gap options
       -gapopen float
           The penalty for opening a gap in the alignment. Increasing the gap opening penalty will make gaps
           less frequent. Default value: 10.0

       -gapextend float
           The penalty for extending a gap by 1 residue. Increasing the gap extension penalty will make gaps
           shorter. Terminal gaps are not penalised. Default value: 5.0
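
           The interplay of the opening and extension penalties can be illustrated with a simple affine gap
           cost. The formula below (opening penalty plus extension penalty times gap length) is an
           assumption for illustration; ClustalW additionally adjusts penalties by position and sequence
           context, which this sketch ignores.

```python
def gap_cost(length, gapopen=10.0, gapextend=5.0):
    """Affine cost of one gap of the given length (simplified model)."""
    if length <= 0:
        return 0.0
    return gapopen + gapextend * length


# One gap of 4 residues versus two gaps of 2 residues each.
one_long = gap_cost(4)
two_short = 2 * gap_cost(2)
```

           Under this model a single long gap is cheaper than two short ones covering the same residues,
           because the opening penalty is paid only once.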

       -endgaps boolean
           End gap separation: treats end gaps just like internal gaps for the purposes of avoiding gaps that
           are too close (set by 'gap separation distance'). If you turn this off, end gaps will be ignored for
           this purpose. This is useful when you wish to align fragments where the end gaps are not biologically
           meaningful. Default value: Y

       -gapdist integer
           Gap separation distance: tries to decrease the chances of gaps being too close to each other. Gaps
           that are less than this distance apart are penalised more than other gaps. This does not prevent
           close gaps; it makes them less frequent, promoting a block-like appearance of the alignment. Default
           value: 8
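
           As a rough sketch of what "too close" means here, the following finds neighbouring gaps whose
           start positions are less than the separation distance apart. How ClustalW actually measures the
           distance and scales the extra penalty is not stated on this page, so the function is only an
           illustrative assumption.

```python
def close_gap_pairs(gap_positions, gapdist=8):
    """Return neighbouring gap positions that are closer than gapdist."""
    pos = sorted(gap_positions)
    return [(p, q) for p, q in zip(pos, pos[1:]) if q - p < gapdist]
```

           Pairs returned by this check are the ones that would attract the increased penalty.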

       -norgap boolean
           Residue specific penalties: amino acid specific gap penalties that reduce or increase the gap opening
           penalties at each position in the alignment or sequence. As an example, positions that are rich in
           glycine are more likely to have an adjacent gap than positions that are rich in valine. Default
           value: N

       -hgapres string
           This is a set of the residues 'considered' to be hydrophilic. It is used when introducing Hydrophilic
           gap penalties. Default value: GPSNDQEKR

       -nohgap boolean
           Hydrophilic gap penalties: used to increase the chances of a gap within a run (5 or more residues) of
           hydrophilic amino acids; these are likely to be loop or random coil regions where gaps are more
           common. The residues that are 'considered' to be hydrophilic are set by '-hgapres'. Default value: N
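
           The runs of hydrophilic residues referred to above can be located with a short sketch. The
           default residue set and the minimum run length of 5 come from this page; the function name and
           the use of a regular expression are illustrative choices.

```python
import re


def hydrophilic_runs(seq, hgapres="GPSNDQEKR", min_run=5):
    """Return (start, end) spans of runs of at least min_run hydrophilic residues."""
    pattern = re.compile("[" + re.escape(hgapres) + "]{" + str(min_run) + ",}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]
```

           Spans use Python slice conventions, so seq[start:end] yields the run itself.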

       -maxdiv integer
           This switch delays the alignment of the most distantly related sequences until after the most
           closely related sequences have been aligned. The setting shows the percent identity level required to
           delay the addition of a sequence; sequences that are less identical than this level to any other
           sequences will be aligned later. Default value: 30
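
           One possible reading of this rule is sketched below: a sequence is delayed when its percent
           identity to every other sequence is below the threshold. This interpretation, the function name,
           and the square identity-matrix input are assumptions for illustration, not a description of the
           ClustalW code.

```python
def delayed_sequences(identity, maxdiv=30.0):
    """Return indices of sequences whose best identity to any other is below maxdiv.

    'identity' is a square matrix of pairwise percent identities.
    """
    delayed = []
    for i, row in enumerate(identity):
        others = [pid for j, pid in enumerate(row) if j != i]
        if others and max(others) < maxdiv:
            delayed.append(i)
    return delayed
```

           With the default of 30, a sequence that is at most 25% identical to all others would be held
           back until the closer relatives have been aligned.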

   Output section
       -outseq seqoutset

       -dendoutfile outfile

BUGS

       Bugs can be reported to the Debian Bug Tracking system (http://bugs.debian.org/emboss), or directly to
       the EMBOSS developers (http://sourceforge.net/tracker/?group_id=93650&atid=605031).

SEE ALSO

       emma is fully documented via the tfm(1) system.

AUTHOR

       Debian Med Packaging Team <debian-med-packaging@lists.alioth.debian.org>
           Wrote the script used to autogenerate this manual page.

COPYRIGHT

       This manual page was autogenerated from an AJAX Command Definition (ACD) of the EMBOSS package. It can be
       redistributed under the same terms as EMBOSS itself.