Note: this DESCRIPTION only documents the Bioperl interface to
Clustalw. Clustalw itself is a large and complex program; for more
information, please see the documentation that accompanies the clustalw
distribution. Clustalw is available from (among others)
ftp://ftp.ebi.ac.uk/pub/software/. Clustalw.pm has only been tested
using version 1.8 of clustalw. Compatibility with earlier versions of the
clustalw program is currently unknown. Before Clustalw can be run
successfully, clustalw must be installed on your system and users must
have execute privileges for the clustalw program.
Bio::Tools::Run::Alignment::Clustalw is an object for performing a
multiple sequence alignment from a set of unaligned sequences and/or
sub-alignments by means of the clustalw program.
Initially, a clustalw "factory object" is created.
Optionally, the factory may be passed most of the parameters or switches of
the clustalw program, e.g.:
@params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
Any parameters not explicitly set will remain as the defaults of
the clustalw program. Additional parameters and switches (not available in
clustalw) may also be set. Currently, the only such parameter is
"quiet", which, when set to a non-zero value, suppresses clustalw
terminal output. Not all clustalw parameters are supported at this
stage.
By default, Clustalw output is returned solely in the form of a
Bio::SimpleAlign object, which can then be printed and/or saved in multiple
formats using the AlignIO.pm module. Optionally, the raw clustalw output file
can be saved if the calling script specifies an output file (with the
clustalw parameter OUTFILE). Currently, only the GCG-MSF output file format
is supported.
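As a minimal sketch of saving the returned alignment with AlignIO, assuming clustalw is installed and the test file t/data/cysprot.fa used elsewhere in this documentation is available (the output filename cysprot.msf is hypothetical and chosen only for illustration):

```perl
use strict;
use warnings;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::AlignIO;

# Build a factory with default clustalw parameters and align a FASTA file.
my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
my $aln     = $factory->align('t/data/cysprot.fa');

# Write the Bio::SimpleAlign object in MSF format.
# 'cysprot.msf' is a hypothetical output name, not part of the module.
my $out = Bio::AlignIO->new(-file => '>cysprot.msf', -format => 'msf');
$out->write_aln($aln);
```

Other formats supported by AlignIO can be selected by changing the -format argument.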
Not all clustalw parameters and features have been implemented in the
Perl interface yet.
Alignment parameters can be changed and/or examined at any time
after the factory has been created. The program checks that any
parameter/switch being set/read is valid. However, there are currently no
additional checks that parameters are of the proper type (e.g.
string or numeric) or that their values are within the proper range. As an
example, to change the value of the clustalw parameter ktuple to 3 and
subsequently check its value, one would write:
$ktuple = 3;
$factory->ktuple($ktuple);
$get_ktuple = $factory->ktuple();
Once the factory has been created and the appropriate parameters
set, one can call the method align() to align a set of unaligned
sequences, or call profile_align() to add one or more sequences or a
second alignment to an initial alignment.
Input to align() may consist of a set of unaligned
sequences in the form of the name of a file containing the sequences. For
example,
$inputfilename = 't/data/cysprot.fa';
$aln = $factory->align($inputfilename);
Alternatively, one can create an array of Bio::Seq objects,
for example:
$str = Bio::SeqIO->new(-file=> 't/data/cysprot.fa', -format => 'Fasta');
@seq_array =();
while ( my $seq = $str->next_seq() ) {push (@seq_array, $seq) ;}
and pass the factory a reference to that array
$seq_array_ref = \@seq_array;
$aln = $factory->align($seq_array_ref);
In either case, align() returns a reference to a
SimpleAlign object, which can then be used (see Bio::SimpleAlign).
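As an illustration of using the returned object, the following sketch prints a short summary of the alignment. It assumes clustalw is installed and that t/data/cysprot.fa is available; the methods each_seq(), length(), percentage_identity() and display_id() come from Bio::SimpleAlign and its sequence objects, not from Clustalw.pm itself.

```perl
use strict;
use warnings;
use Bio::Tools::Run::Alignment::Clustalw;

# 'quiet' suppresses clustalw terminal output, as described above.
my $factory = Bio::Tools::Run::Alignment::Clustalw->new('quiet' => 1);
my $aln     = $factory->align('t/data/cysprot.fa');

# Summarise the alignment: sequence count, column count, overall identity.
my @seqs = $aln->each_seq();
printf "%d sequences, %d columns, %.1f%% identity\n",
    scalar(@seqs), $aln->length(), $aln->percentage_identity();

# List the aligned sequences by identifier.
print $_->display_id(), "\n" for @seqs;
```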
Once an initial alignment exists, one can pass the factory
additional sequence(s) to be added (i.e. aligned) to the original alignment.
The alignment can be passed as either an alignment file or a Bio::SimpleAlign
object. The unaligned sequence(s) can be passed as a filename, as an array
of BioPerl sequence objects, or as a single BioPerl Seq object. For example
(to add a single sequence to an alignment),
$str = Bio::AlignIO->new(-file=> 't/data/cysprot1a.msf');
$aln = $str->next_aln();
$str1 = Bio::SeqIO->new(-file=> 't/data/cysprot1b.fa');
$seq = $str1->next_seq();
$aln = $factory->profile_align($aln,$seq);
Whichever input form is used, profile_align() returns a reference to a
new SimpleAlign object containing the original alignment with
the additional sequence(s) added in.
Finally one can pass the factory a pair of (sub)alignments to be
aligned against each other. The alignments can be passed in the form of
either a pair of alignment files or a pair of Bio::SimpleAlign objects. For
example,
$profile1 = 't/data/cysprot1a.msf';
$profile2 = 't/data/cysprot1b.msf';
$aln = $factory->profile_align($profile1,$profile2);
or
$str1 = Bio::AlignIO->new(-file=> 't/data/cysprot1a.msf');
$aln1 = $str1->next_aln();
$str2 = Bio::AlignIO->new(-file=> 't/data/cysprot1b.msf');
$aln2 = $str2->next_aln();
$aln = $factory->profile_align($aln1,$aln2);
In either case, profile_align() returns a reference to a
SimpleAlign object containing a (super)alignment of the two input
alignments.
For more examples of syntax and use of Clustalw, the user is
encouraged to look at the script Clustalw.t in the t/ directory.
Note: Clustalw.pm is still under development. Various features of the
clustalw program have not yet been implemented. If you would like a
specific clustalw feature to be added to this Perl interface, contact
bioperl-l@bioperl.org.
These can be specified as parameters when instantiating a new
Clustalw object, or through get/set methods of the same name
(lowercase).