Ubuntu Manpage: Bio::Tools::Run::Alignment::TCoffee - Object for the calculation of a multiple sequence

Provided by: libbio-perl-run-perl_1.6.9-2_all

NAME

       Bio::Tools::Run::Alignment::TCoffee - Object for the calculation of a multiple sequence
       alignment from a set of unaligned sequences or alignments using the TCoffee program

SYNOPSIS

         # Build a tcoffee alignment factory
         use Bio::Tools::Run::Alignment::TCoffee;
         my @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
         my $factory = Bio::Tools::Run::Alignment::TCoffee->new(@params);

         # Pass the factory a list of sequences to be aligned.
         my $inputfilename = 't/cysprot.fa';
         # $aln is a Bio::SimpleAlign object.
         my $aln = $factory->align($inputfilename);

         # or, where @seq_array is an array of Bio::Seq objects:
         my $seq_array_ref = \@seq_array;
         $aln = $factory->align($seq_array_ref);

         # Or one can pass the factory a pair of (sub)alignments
         # to be aligned against each other, e.g.:

         # where $aln1 and $aln2 are Bio::SimpleAlign objects.
         $aln = $factory->profile_align($aln1, $aln2);

         # Or one can pass the factory an alignment and one or more
         # unaligned sequences to be added to the alignment. For example:

         # $seq is a Bio::Seq object.
         $aln = $factory->profile_align($aln1, $seq);

         # There are various additional options and input formats available.
         # See the DESCRIPTION section that follows for additional details.

DESCRIPTION

       Note: this DESCRIPTION only documents the (Bio)perl interface to TCoffee.

   Helping the module find your executable
       The TCoffee module needs to be able to find the t_coffee executable. This can be
       arranged in (at least) three ways:

        1. Make sure the t_coffee executable is in your PATH, so that
           'which t_coffee' returns a t_coffee executable on your system.

        2. Define an environment variable TCOFFEEDIR naming the directory
           that contains the 't_coffee' executable:
           In bash
           export TCOFFEEDIR=/home/username/progs/T-COFFEE_distribution_Version_1.37/bin
           In csh/tcsh
           setenv TCOFFEEDIR /home/username/progs/T-COFFEE_distribution_Version_1.37/bin

        3. Set the environment variable TCOFFEEDIR inside every script that
           uses this TCoffee wrapper module, before the module is loaded:
           BEGIN { $ENV{TCOFFEEDIR} = '/home/username/progs/T-COFFEE_distribution_Version_1.37/bin' }
           use Bio::Tools::Run::Alignment::TCoffee;

       If you are running an application on a web server, make sure the web server environment
       has the proper PATH set, or use option 2 or 3 to set the variable.
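       The search order described above can be sketched in plain Perl. This is a hypothetical
       helper, not the module's own lookup code. It tries TCOFFEEDIR first, then every directory
       on PATH, and it assumes a Unix-style PATH separated by ':'.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper, not the module's own code: it mimics the lookup
# order described above, trying $TCOFFEEDIR first and then every
# directory on PATH (split on ':', i.e. a Unix-style PATH is assumed).
sub locate_tcoffee {
    my ($tcoffeedir, $path) = @_;
    my @dirs;
    push @dirs, $tcoffeedir if defined $tcoffeedir && length $tcoffeedir;
    push @dirs, split /:/, $path if defined $path;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, 't_coffee');
        # -x _ reuses the stat buffer filled by the preceding -f test
        return $candidate if -f $candidate && -x _;
    }
    return;    # not found
}

my $exe = locate_tcoffee($ENV{TCOFFEEDIR}, $ENV{PATH});
print defined $exe ? "found: $exe\n" : "t_coffee not found\n";
```

       If this prints "t_coffee not found" on a web server, the server's environment is missing
       both variables. In that case option 3 is usually the simplest fix.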

PARAMETERS FOR ALIGNMENT COMPUTATION

       TCoffee accepts a number of parameters. The online manual gives the best explanation of
       all the features; see
       http://igs-server.cnrs-mrs.fr/~cnotred/Documentation/t_coffee/t_coffee_doc.html

       These can be specified as parameters when instantiating a new TCoffee object, or through
       get/set methods of the same name (lowercase).
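       The parameter names below correspond to t_coffee's "-name=value" command-line flags, as
       in -weight=1000 or -output=clustalw. The sketch below is hypothetical and only shows that
       correspondence. The wrapper builds its real command line internally (see _setparams).
       Rendering a value-less parameter as a bare switch is an assumption.

```perl
use strict;
use warnings;

# Hypothetical illustration: map constructor-style key/value pairs onto
# t_coffee "-name=value" command-line flags. The wrapper builds its real
# command line internally (see _setparams); this only shows how the two
# spellings correspond.
sub params_to_flags {
    my %params = @_;
    my @flags;
    for my $name (sort keys %params) {
        my $value = $params{$name};
        # A parameter without a value (e.g. quicktree) becomes a bare switch.
        push @flags, defined $value ? sprintf('-%s=%s', lc $name, $value)
                                    : '-' . lc $name;
    }
    return join ' ', @flags;
}

print params_to_flags(ktuple => 2, matrix => 'blosum', quicktree => undef), "\n";
# prints: -ktuple=2 -matrix=blosum -quicktree
```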

   IN
        Title       : IN
        Description : (optional) input filename. This is set by align(),
                      so you should not use it directly unless you
                      understand the TCoffee program very well.

   TYPE
        Title       : TYPE
        Args        : [string] DNA, PROTEIN
        Description : (optional) sets the sequence type. It is guessed
                      automatically, so you should not use it directly.

   PARAMETERS
        Title       : PARAMETERS
        Description : (optional) names a file containing extra parameters

   EXTEND
        Title       : EXTEND
        Args        : 0, 1, or positive value
        Default     : 1
        Description : Flag indicating whether library extension should be
                      carried out when performing multiple alignments. If
                      set to 0, extension is not made; if set to 1,
                      extension is made on all pairs in the library. If set
                      to another positive value, extension is only carried
                      out on pairs whose weight is greater than that value.
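       The three cases of EXTEND can be written as a small predicate. This is a sketch of the
       rule as stated above, not T-Coffee's code. The name pair_is_extended is hypothetical.

```perl
use strict;
use warnings;

# Sketch of the EXTEND rule as described above: 0 disables extension,
# 1 extends every pair, any other positive value is a weight threshold
# (only pairs with a strictly greater weight are extended).
sub pair_is_extended {
    my ($extend, $weight) = @_;
    return 0 if $extend == 0;
    return 1 if $extend == 1;
    return $weight > $extend ? 1 : 0;
}

my @weights = (50, 250, 900);
my @kept = grep { pair_is_extended(300, $_) } @weights;
print "@kept\n";    # prints: 900
```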

   DP_NORMALISE
        Title       : DP_NORMALISE
        Args        : 0 or positive value
        Default     : 1000
        Description : When using a value different from 0, this flag sets the
                      score of the highest scoring pair to 1000.
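       The manual only says what happens to the best pair. The sketch below makes an explicit
       assumption: every other score is rescaled by the same factor, so the best pair lands on
       the target value.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Assumption: all scores are rescaled proportionally so that the
# highest-scoring pair ends up at $target (1000 by default). The manual
# only states what happens to the best pair.
sub normalise_scores {
    my ($scores, $target) = @_;
    $target //= 1000;
    return @$scores if $target == 0;      # 0 disables normalisation
    my $best = max(@$scores);
    return @$scores unless $best;         # avoid dividing by zero
    return map { $_ * $target / $best } @$scores;
}

print join(' ', normalise_scores([40, 20, 10])), "\n";    # prints: 1000 500 250
```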

   DP_MODE
        Title       : DP_MODE
        Args        : [string] gotoh_pair_wise, myers_miller_pair_wise,
                      fasta_pair_wise, cfasta_pair_wise
        Default     : cfasta_pair_wise
        Description : Indicates the type of dynamic programming used by
                      the program.

                      gotoh_pair_wise : implementation of the Gotoh
                      algorithm (quadratic in memory and time).

                      myers_miller_pair_wise : implementation of the Myers
                      and Miller dynamic programming algorithm (quadratic in
                      time and linear in space). This algorithm is
                      recommended for very long sequences. It is about twice
                      as slow as gotoh_pair_wise and only accepts tg_mode=1.

                      fasta_pair_wise : implementation of the fasta
                      algorithm. The sequence is hashed, looking for ktuple
                      words. Dynamic programming is only carried out on the
                      ndiag best-scoring diagonals. This is much faster but
                      less accurate than the two previous modes.

                      cfasta_pair_wise : c stands for checked. It is the
                      same algorithm, but dynamic programming is run on the
                      ndiag best diagonals, then on the 2*ndiag best, and so
                      on until the scores converge. Complexity depends on the
                      level of divergence of the sequences, but will usually
                      be L*log(L), with an accuracy comparable to the first
                      two modes (this was checked on BaliBase).
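       The "checked" doubling loop of cfasta_pair_wise can be sketched as follows. The scoring
       step is a hypothetical callback standing in for the real banded dynamic programming.
       The upper bound $max_diag is an assumption added so that the loop always terminates.

```perl
use strict;
use warnings;

# Sketch of the "checked" loop behind cfasta_pair_wise: rerun the
# diagonal-restricted DP on ndiag, 2*ndiag, 4*ndiag ... diagonals until
# the alignment score stops changing. $score_with is a stand-in for the
# real DP step (hypothetical); $max_diag caps the loop.
sub checked_score {
    my ($score_with, $ndiag, $max_diag) = @_;
    my $previous = $score_with->($ndiag);
    while ($ndiag * 2 <= $max_diag) {
        $ndiag *= 2;
        my $current = $score_with->($ndiag);
        return ($current, $ndiag) if $current == $previous;
        $previous = $current;
    }
    return ($previous, $ndiag);
}

# Toy scorer: more diagonals help until 16 diagonals, then the score plateaus.
my $toy = sub { my $n = shift; return $n < 16 ? $n * 10 : 160 };
my ($score, $used) = checked_score($toy, 2, 1024);
print "score=$score ndiag=$used\n";    # prints: score=160 ndiag=32
```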

   KTUPLE
        Title       : KTUPLE
        Args        : numeric value
        Default     : 1 or 2 (1 for protein, 2 for DNA)
        Description : Indicates the ktuple size for the cfasta_pair_wise
                      and fasta_pair_wise dp_modes. It is set to 1 for
                      proteins and 2 for DNA. The alphabet used for
                      proteins is not the 20-letter code but a mildly
                      degenerate version, where some residues are grouped
                      under one letter based on physicochemical
                      properties: rk, de, qh, vilm, fy (the other
                      residues are not degenerated).
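       The degenerate protein alphabet and the ktuple words that get hashed can be illustrated
       as follows. Which letter stands for each group is our own choice, made for illustration.
       A ktuple of 2 is used only so the words are visible.

```perl
use strict;
use warnings;

# Collapse the residue groups rk, de, qh, vilm, fy onto one letter each
# (the representative letter is an arbitrary choice for illustration),
# then list the overlapping ktuple words that would be hashed.
sub degenerate {
    my ($seq) = @_;
    (my $s = uc $seq) =~ tr/KEHILMY/RDQVVVF/;
    return $s;
}

sub ktuples {
    my ($seq, $k) = @_;
    return map { substr($seq, $_, $k) } 0 .. length($seq) - $k;
}

my $reduced = degenerate('mkeyl');
print "$reduced: ", join(',', ktuples($reduced, 2)), "\n";   # prints: VRDFV: VR,RD,DF,FV
```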

   NDIAGS
        Title       : NDIAGS
        Args        : numeric value
        Default     : 0
        Description : Indicates the number of diagonals used by the
                      fasta_pair_wise algorithm. When set to 0,
                      ndiag = log(length of the smallest sequence).
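       The default for NDIAGS=0 reduces to one line. The manual does not state the base of the
       logarithm or how the result is rounded. This sketch assumes the natural log, truncated
       to an integer.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Default number of diagonals when NDIAGS is 0: log of the shortest
# sequence length. Natural log and integer truncation are assumptions;
# the manual names neither.
sub default_ndiag {
    my @lengths = @_;
    return int(log(min(@lengths)));
}

print default_ndiag(120, 350, 80), "\n";    # prints: 4
```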

   DIAG_MODE
        Title       : DIAG_MODE
        Args        : numeric value
        Default     : 0
        Description : Indicates the manner in which diagonals are scored
                      during the fasta hashing.

                      0 indicates that the score of a diagonal is equal to
                      the sum of the scores of the exact matches it
                      contains.

                      1 indicates that this score is set equal to the score
                      of the best uninterrupted segment.

                      1 can be useful when dealing with fragments of
                      sequences.
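       The two DIAG_MODE scores can be compared on one diagonal. This is a sketch. We assume
       the diagonal is given as a list of per-position match scores, with 0 where the residues
       differ.

```perl
use strict;
use warnings;
use List::Util qw(sum0 max);

# @$matches holds the score of each exact match along one diagonal,
# with 0 marking a position where the residues differ (an assumed
# representation, chosen for illustration).
sub diagonal_score {
    my ($matches, $mode) = @_;
    return sum0(@$matches) if $mode == 0;     # mode 0: sum of all matches

    # mode 1: score of the best uninterrupted run of matches
    my ($best, $run) = (0, 0);
    for my $s (@$matches) {
        $run  = $s > 0 ? $run + $s : 0;
        $best = max($best, $run);
    }
    return $best;
}

my @diag = (1, 1, 0, 1, 1, 1, 0, 1);
print diagonal_score(\@diag, 0), ' ', diagonal_score(\@diag, 1), "\n";   # prints: 6 3
```

       Mode 1 ignores the scattered single matches. That is why it suits fragments, where only
       one contiguous stretch is expected to align.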

   SIM_MATRIX
        Title       : SIM_MATRIX
        Args        : string
        Default     : vasiliky
        Description : Indicates how the amino acid alphabet is degenerated
                      when hashing. Any substitution matrix is acceptable.
                      Categories are defined as sub-groups of residues that
                      all have positive substitution scores (they can
                      overlap).

                      To keep the non-degenerate amino acid alphabet, use
                      'idmat'.

   MATRIX
        Title       : MATRIX
        Args        :
        Default     :
        Description : This flag is provided for compatibility with
                      ClustalW. Setting matrix = 'blosum' is equivalent to
                      -in=Xblosum62mt, and -matrix=pam is equivalent to
                      -in=Xpam250mt. Apart from this, the rules are similar
                      to those applying when declaring a matrix with the
                      -in=X flag.
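       The two ClustalW-compatible aliases can be written as a lookup table. The helper name is
       hypothetical. Passing any other name straight through as -in=X<name> is an assumption
       based on the -in=X convention.

```perl
use strict;
use warnings;

# The MATRIX compatibility aliases described above, expressed as the
# equivalent -in value. Other names are passed through as -in=X<name>,
# following the -in=X convention (an assumption for unknown names).
my %matrix_alias = (
    blosum => 'blosum62mt',
    pam    => 'pam250mt',
);

sub matrix_to_in_flag {
    my ($matrix) = @_;
    my $name = $matrix_alias{ lc $matrix } // $matrix;
    return "-in=X$name";
}

print matrix_to_in_flag('BLOSUM'), "\n";    # prints: -in=Xblosum62mt
```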

   GAPOPEN
        Title       : GAPOPEN
        Args        : numeric
        Default     : 0
        Description : Indicates the penalty applied for opening a gap. The
                      penalty must be negative; if you provide a positive
                      value, it will automatically be turned into a
                      negative number. We recommend a value of 10 with PAM
                      matrices, and a value of 0 when a library is used.

   GAPEXT
        Title       : GAPEXT
        Args        : numeric
        Default     : 0
        Description : Indicates the penalty applied for extending a gap.

   COSMETIC_PENALTY
        Title       : COSMETIC_PENALTY
        Args        : numeric
        Default     : 100
        Description : Indicates the penalty applied for opening a gap. This
                      penalty is set to a very low value. It only has an
                      influence on the portions of the alignment that are
                      unalignable: it does not make them more correct, only
                      more pleasing to the eye (i.e. it avoids stretches of
                      lonely residues).

                      The cosmetic penalty is automatically turned off if a
                      substitution matrix is used rather than a library.

   TG_MODE
        Title       : TG_MODE
        Args        : 0, 1, 2
        Default     : 1
        Description : (Terminal Gaps)
                      0: terminal gaps are penalized with both a gapopen
                         and a gapext penalty.
                      1: terminal gaps are penalized only with a gapext
                         penalty.
                      2: terminal gaps are not penalized.
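       The three TG_MODE settings can be compared on one terminal gap. The sketch assumes an
       affine cost, gapopen + length * gapext. It also flips positive inputs to negative, as
       GAPOPEN's description says. Applying the same sign rule to gapext is our assumption.

```perl
use strict;
use warnings;

# Cost of a terminal gap of length $len under each TG_MODE.
# Assumptions: affine cost gapopen + len * gapext, and positive inputs
# turned negative (stated for GAPOPEN; assumed here for GAPEXT too).
sub terminal_gap_penalty {
    my ($tg_mode, $len, $gapopen, $gapext) = @_;
    my $open = -(abs $gapopen);
    my $ext  = -(abs $gapext);
    return $open + $len * $ext if $tg_mode == 0;   # open + extend
    return $len * $ext         if $tg_mode == 1;   # extend only
    return 0;                                      # mode 2: not penalized
}

printf "%d %d %d\n", map { terminal_gap_penalty($_, 3, 10, 1) } 0 .. 2;   # prints: -13 -3 0
```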

   WEIGHT
        Title       : WEIGHT
        Args        : sim, or sim_<matrix_name or matrix_file>, or integer value
        Default     : sim
        Description : Weight defines the way alignments are weighted when
                      turned into a library.

                      sim indicates that the weight equals the average
                      identity within the matched residues.

                      sim_matrix_name indicates the average identity, with
                      two residues regarded as identical when their
                      substitution value is positive. Valid matrix names
                      are listed in matrices.h (e.g. pam250mt). Matrices
                      not found in this header are considered to be
                      filenames. See the format section for matrices. For
                      instance, -weight=sim_pam250mt indicates that the
                      grouping used for similarity will be the set of
                      classes with positive substitutions. Other groups
                      include

                          sim_clustalw_col (categories of ClustalW
                          marked with :)

                          sim_clustalw_dot (categories of ClustalW
                          marked with .)

                      An integer value indicates that all the pairs found
                      in the alignments must be given the same weight,
                      equal to that value. This is useful when the
                      alignment one wishes to turn into a library must be
                      given a pre-specified score (for instance if it comes
                      from a structure superimposition program). For
                      example:

                          -weight=1000

                      Note: weight only affects methods that return an
                      alignment to T-Coffee, such as ClustalW. In contrast,
                      the version of Lalign used by T-Coffee returns a
                      library where weights have already been applied, so
                      it is insensitive to the -weight flag.
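       The default "sim" weight can be computed for one aligned pair as follows. This is a
       sketch. Gap columns are skipped, and reporting the result as a 0-100 percentage is an
       assumption.

```perl
use strict;
use warnings;

# Sketch of the default "sim" weight: percent identity over the columns
# where both aligned rows carry a residue (gap columns are ignored).
# Reporting it as a 0-100 percentage is an assumption.
sub sim_weight {
    my ($row1, $row2) = @_;
    die "rows must have equal length\n" unless length $row1 == length $row2;
    my ($matched, $identical) = (0, 0);
    for my $i (0 .. length($row1) - 1) {
        my ($x, $y) = (uc substr($row1, $i, 1), uc substr($row2, $i, 1));
        next if $x eq '-' || $y eq '-';
        $matched++;
        $identical++ if $x eq $y;
    }
    return $matched ? 100 * $identical / $matched : 0;
}

print sim_weight('ACGT-A', 'ACGTTG'), "\n";    # prints: 80
```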

   SEQ_TO_ALIGN
        Title       : SEQ_TO_ALIGN
        Args        : filename
        Default     : no file (align all the sequences)
        Description : You may not wish to align all the sequences brought in
                      by the -in flag. Supplying the seq_to_align flag allows
                      for this; the file is simply a list of names in FASTA
                      format.

                      Note, however, that library extension will still be
                      carried out on all the sequences.

PARAMETERS FOR TREE COMPUTATION AND OUTPUT

   NEWTREE
        Title       : NEWTREE
        Args        : treefile
        Default     : no file
        Description : Indicates the name of the new tree to compute. The
                      default will be <sequence_name>.dnd, or <run_name>.dnd.
                      The format is Phylip/Newick.

   USETREE
        Title       : USETREE
        Args        : treefile
        Default     : no file specified
        Description : This flag indicates that rather than computing a new
                      dendrogram, t_coffee can use a pre-computed one. The
                      tree files are in Phylip format and compatible with
                      ClustalW. In most cases, using a pre-computed tree will
                      halve the computation time required by t_coffee. It is
                      also possible to use trees output by ClustalW or
                      Phylip.

   TREE_MODE
        Title       : TREE_MODE
        Args        : slow, fast, very_fast
        Default     : very_fast
        Description : This flag indicates the method used for computing the
                      dendrogram.
                      slow : the chosen dp_mode using the extended library,
                      fast : The fasta dp_mode using the extended library.
                      very_fast: The fasta dp_mode using pam250mt.

   QUICKTREE
        Title       : QUICKTREE
        Args        :
        Default     :
        Description : This flag is kept for compatibility with ClustalW.
                      It is equivalent to -tree_mode=very_fast.

PARAMETERS FOR ALIGNMENT OUTPUT

   OUTFILE
        Title       : OUTFILE
        Args        : out_aln file, default, no
        Default     : default ( yourseqfile.aln)
        Description : Indicates the name of the output alignment file.

   OUTPUT
        Title       : OUTPUT
        Args        : format1, format2
        Default     : clustalw
        Description : Indicates the format(s) of the output file.
                      Supported formats are:

                      clustalw_aln, clustalw : ClustalW format.
                      gcg, msf_aln : MSF alignment.
                      pir_aln : PIR alignment.
                      fasta_aln : FASTA alignment.
                      phylip : Phylip format.
                      pir_seq : PIR sequences (no gaps).
                      fasta_seq : FASTA sequences (no gaps).

                      As well as:

                      score_html : causes the output to be a reliability
                                   plot in HTML.
                      score_pdf : idem in PDF.
                      score_ps : idem in PostScript.

                      More than one format can be indicated:
                      -output=clustalw,gcg,score_html
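       Several formats are combined into one comma-separated -output value. The sketch below
       checks each name against the list above before joining them. The helper name is
       hypothetical.

```perl
use strict;
use warnings;

# Build an -output value from several formats, rejecting names that are
# not in the supported list above. The helper name is hypothetical.
my %supported = map { $_ => 1 } qw(
    clustalw_aln clustalw gcg msf_aln pir_aln fasta_aln phylip
    pir_seq fasta_seq score_html score_pdf score_ps
);

sub output_flag {
    my @formats = @_;
    my @unknown = grep { !$supported{$_} } @formats;
    die "unsupported output format(s): @unknown\n" if @unknown;
    return '-output=' . join(',', @formats);
}

print output_flag(qw(clustalw gcg score_html)), "\n";   # prints: -output=clustalw,gcg,score_html
```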

   CASE
        Title       : CASE
        Args        : upper, lower
        Default     : upper
        Description : Sets the case (upper or lower) of the output.

   CPU
        Title       : CPU
        Args        : value
        Default     : 0
        Description : Indicates the CPU time (in microseconds) that must be
                      added to the t_coffee computation time.

   OUT_LIB
        Title       : OUT_LIB
        Args        : name of library, default, no
        Default     : default
        Description : Sets the name of the library output. Default implies
                      <run_name>.tc_lib

   OUTORDER
        Title       : OUTORDER
        Args        : input or aligned
        Default     : input
        Description : Sets the order of the sequences in the output
                      alignment: input keeps the order of the input file,
                      aligned uses the order of the guide tree.

   SEQNOS
        Title       : SEQNOS
        Args        : on or off
        Default     : off
        Description : Causes the output alignment to contain residue numbers
                      at the end of each line.

PARAMETERS FOR GENERIC OUTPUT

   RUN_NAME
        Title       : RUN_NAME
        Args        : your run name
        Default     :
        Description : This flag causes the prefix <your sequences> to be
                      replaced by <your run name> when renaming the default
                      files.
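       The default file names mentioned for NEWTREE, OUTFILE, OUT_LIB and RUN_NAME follow one
       pattern. This sketch derives them. Stripping the extension from the sequence file name
       when no run name is given is an assumption.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Sketch of the default naming described for NEWTREE, OUTFILE and OUT_LIB:
# <prefix>.dnd, <prefix>.aln and <prefix>.tc_lib, where the prefix is the
# run name if given, otherwise the sequence file name without its
# extension (the extension stripping is an assumption).
sub default_names {
    my ($seqfile, $run_name) = @_;
    my $prefix = $run_name;
    ($prefix) = fileparse($seqfile, qr/\.[^.]*/) unless defined $prefix;
    return map { "$prefix.$_" } qw(dnd aln tc_lib);
}

print join(' ', default_names('t/cysprot.fa')), "\n";
# prints: cysprot.dnd cysprot.aln cysprot.tc_lib
print join(' ', default_names('t/cysprot.fa', 'myrun')), "\n";
# prints: myrun.dnd myrun.aln myrun.tc_lib
```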

   ALIGN
        Title       : ALIGN
        Args        :
        Default     :
        Description : Indicates that the program must produce the
                      alignment. This flag is here for compatibility with
                      ClustalW

   QUIET
        Title       : QUIET
        Args        : stderr, stdout, or filename, or nothing
        Default     : stderr
        Description : Redirects the standard output to stderr, stdout or a
                      file. -quiet on its own redirects the output to /dev/null.

   CONVERT
        Title       : CONVERT
        Args        :
        Default     :
        Description : Indicates that the program must not compute the
                      alignment but simply convert all the sequences,
                      alignments and libraries into the format indicated with
                      -output. This flag can also be used if you simply want
                      to compute a library (i.e. you have an alignment and
                      you want to turn it into a library).

FEEDBACK

   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send
       your comments and suggestions preferably to one of the Bioperl mailing lists.  Your
       participation is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and responsive experts will
       be able to look at the problem and quickly address it. Please include a thorough description
       of the problem with code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their
       resolution.  Bug reports can be submitted via the web:

        http://redmine.open-bio.org/projects/bioperl/

AUTHOR - Jason Stajich, Peter Schattner

       Email jason-at-bioperl-dot-org, schattner@alum.mit.edu

APPENDIX

       The rest of the documentation details each of the object methods. Internal method names
       are usually preceded by an underscore (_).

   program_name
        Title   : program_name
        Usage   : $factory->program_name()
        Function: holds the program name
        Returns:  string
        Args    : None

   program_dir
        Title   : program_dir
        Usage   : $factory->program_dir(@params)
        Function: returns the program directory, obtained from ENV variable.
        Returns:  string
        Args    :

   error_string
        Title   : error_string
        Usage   : $obj->error_string($newval)
        Function: Where the output from the last analysis run is stored.
        Returns : value of error_string
        Args    : newvalue (optional)

   version
        Title   : version
        Usage   : exit if $prog->version() < 1.8
        Function: Determine the version number of the program
        Example :
        Returns : float or undef
        Args    : none

   run
        Title   : run
        Usage   : my $output = $application->run(-seq     => $seq,
                                                 -profile => $profile,
                                                 -type    => 'profile-aln');
        Function: Generic run of an application
        Returns : Bio::SimpleAlign object
        Args    : key-value parameters allowed for TCoffee runs AND
                  -type     => 'profile-aln' for profile alignments, or
                               'alignment' for a plain multiple sequence
                               alignment
                  -seq      => either a Bio::PrimarySeqI object, OR an
                               array ref of Bio::PrimarySeqI objects, OR
                               the filename of the sequences to run with
                  -profile  => profile to align to; if this is an array
                               ref, its first two entries are taken as the
                               two profiles to align to each other

   align
        Title   : align
        Usage   :
               $inputfilename = 't/data/cysprot.fa';
               $aln = $factory->align($inputfilename);
       or
               $seq_array_ref = \@seq_array;
               # @seq_array is array of Seq objs
               $aln = $factory->align($seq_array_ref);
        Function: Perform a multiple sequence alignment
        Returns : Reference to a SimpleAlign object containing the
                  sequence alignment.
        Args    : Name of a file containing a set of unaligned fasta sequences
                  or else an array of references to Bio::Seq objects.

        Throws an exception if argument is not either a string (eg a
        filename) or a reference to an array of Bio::Seq objects.  If
        argument is string, throws exception if file corresponding to string
        name can not be found. If argument is Bio::Seq array, throws
        exception if less than two sequence objects are in array.
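       The argument checks listed for align() can be sketched as a stand-alone validator. This
       is not the module's code. Sequence objects are recognised by duck typing (they can
       ->seq), an assumption that lets the example run without BioPerl installed.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Sketch of the argument checks listed above for align(). Sequence
# objects are recognised by duck typing (they can ->seq), an assumption
# that lets the example run without BioPerl; the real module may check
# the class hierarchy instead.
sub check_align_input {
    my ($input) = @_;
    if (!ref $input) {
        # A plain string is taken to be a filename.
        die "file '$input' not found\n" unless -e $input;
        return 'file';
    }
    die "expected a filename or an array reference\n" unless ref $input eq 'ARRAY';
    my @seqs = grep { blessed($_) && $_->can('seq') } @$input;
    die "array must contain only sequence objects\n" if @seqs != @$input;
    die "need at least two sequence objects\n" if @seqs < 2;
    return 'seqs';
}

# A missing file is rejected before t_coffee would ever be started.
eval { check_align_input('no_such_file.fa') } or print "rejected: $@";
```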

   profile_align
        Title   : profile_align
        Usage   :
        Function: Perform an alignment of 2 (sub)alignments
        Example :
        Returns : Reference to a SimpleAlign object containing the (super)alignment.
        Args    : Names of 2 files containing the subalignments
                  or references to 2 Bio::SimpleAlign objects.
        Note    : Needs to be updated to run with newer TCoffee code, which
                  allows more than two profile alignments.

       Throws an exception if arguments are not either strings (eg filenames) or references to
       SimpleAlign objects.

   _run
        Title   :  _run
        Usage   :  Internal function, not to be called directly
        Function:  makes actual system call to tcoffee program
        Example :
        Returns : nothing; tcoffee output is written to a
                  temporary file OR specified output file
        Args    : Name of a file containing a set of unaligned fasta sequences
                  and hash of parameters to be passed to tcoffee

   _setinput
        Title   :  _setinput
        Usage   :  Internal function, not to be called directly
        Function:  Create input file for tcoffee program
        Example :
        Returns : name of file containing tcoffee data input AND
                  type of file (if known, S for sequence, L for sequence library,
                  A for sequence alignment)
        Args    : Seq or Align object reference or input file name
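       The type letters returned by _setinput match the one-letter prefixes that t_coffee's
       -in flag puts in front of each input, as in -in=Xblosum62mt for a matrix. This
       hypothetical helper joins typed inputs into one -in argument. The comma-separated list
       form is an assumption to check against the T-Coffee manual.

```perl
use strict;
use warnings;

# Prefix each input file with its one-letter type (S = sequences,
# A = alignment, L = library, as listed above; X = matrix, as in
# -in=Xblosum62mt) and join them into one -in argument. The
# comma-separated form is an assumption; the helper is hypothetical.
my %valid_type = map { $_ => 1 } qw(S A L X);

sub in_flag {
    my @inputs = @_;    # list of [type, filename] pairs
    my @parts;
    for my $in (@inputs) {
        my ($type, $file) = @$in;
        die "unknown input type '$type'\n" unless $valid_type{$type};
        push @parts, "$type$file";
    }
    return '-in=' . join(',', @parts);
}

print in_flag([S => 'seqs.fa'], [X => 'blosum62mt']), "\n";   # prints: -in=Sseqs.fa,Xblosum62mt
```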

   _setparams
        Title   :  _setparams
        Usage   :  Internal function, not to be called directly
        Function:  Create parameter inputs for tcoffee program
        Example :
        Returns : parameter string to be passed to tcoffee
                  during align or profile_align
        Args    : name of calling object

   aformat
        Title   : aformat
        Usage   : my $alignmentformat = $self->aformat();
        Function: Get/Set alignment format
        Returns : string
        Args    : string

   methods
        Title   : methods
        Usage   : my @methods = $self->methods()
        Function: Get/Set Alignment methods - NOT VALIDATED
        Returns : array of strings
        Args    : arrayref of strings

Bio::Tools::Run::BaseWrapper methods

   no_param_checks
        Title   : no_param_checks
        Usage   : $obj->no_param_checks($newval)
        Function: Boolean flag as to whether or not we should
                  trust the sanity checks for parameter values
        Returns : value of no_param_checks
        Args    : newvalue (optional)

   save_tempfiles
        Title   : save_tempfiles
        Usage   : $obj->save_tempfiles($newval)
        Function: Get/Set whether temporary files are kept after the run
        Returns : value of save_tempfiles
        Args    : newvalue (optional)

   outfile_name
        Title   : outfile_name
        Usage   : my $outfile = $tcoffee->outfile_name();
        Function: Get/Set the name of the output file for this run
                  (if you wanted to do something special)
        Returns : string
        Args    : [optional] string to set value to

   tempdir
        Title   : tempdir
        Usage   : my $tmpdir = $self->tempdir();
        Function: Retrieve a temporary directory name (which is created)
        Returns : string which is the name of the temporary directory
        Args    : none

   cleanup
        Title   : cleanup
        Usage   : $tcoffee->cleanup();
        Function: Will cleanup the tempdir directory
        Returns : none
        Args    : none

   io
        Title   : io
        Usage   : $obj->io($newval)
        Function: Gets a Bio::Root::IO object
        Returns : Bio::Root::IO object
        Args    : none