Provided by: libbio-perl-perl_1.7.8-1_all

NAME

       Bio::Search::BlastUtils - Utility functions for Bio::Search:: BLAST objects

SYNOPSIS

        # This module is just a collection of subroutines, not an object.

       See Bio::Search::Hit::BlastHit.

DESCRIPTION

       The BlastUtils.pm module is a collection of subroutines used primarily by
       Bio::Search::Hit::BlastHit objects for some of their additional functionality, such as
       HSP tiling. At present, BlastUtils is just a collection of subroutines, not an object,
       and it is tightly coupled to Bio::Search::Hit::BlastHit. A goal for the future is to
       generalize it to work with the Bio::Search interfaces, so that it can work with any
       object that implements them.

AUTHOR

       Steve Chervitz <sac@bioperl.org>

   tile_hsps
        Usage     : tile_hsps( $sbjct );
                  : This is called automatically by Bio::Search::Hit::BlastHit
                  : during object construction or
                  : as needed by methods that rely on having tiled data.
        Purpose   : Collect statistics about the aligned sequences in a set of HSPs.
                  : Calculates the following data across all HSPs:
                  :    -- total alignment length
                  :    -- total identical residues
                  :    -- total conserved residues
        Returns   : n/a
        Argument  : A Bio::Search::Hit::BlastHit object
        Throws    : n/a
        Comments  :
                  : This method is *strongly* coupled to Bio::Search::Hit::BlastHit
                  : (it accesses BlastHit data members directly).
                  : TODO: Re-write this to the Bio::Search::Hit::HitI interface.
                  :
                  : This method performs more careful summing of data across
                  : all HSPs in the Sbjct object. Only HSPs that are in the same strand
                  : and frame are tiled. Simply summing the data from all HSPs
                  : in the same strand and frame will overestimate the actual
                  : length of the alignment if there is overlap between different HSPs
                  : (often the case).
                  :
                  : The strategy is to tile the HSPs and sum over the
                  : contigs, collecting data separately from overlapping and
                  : non-overlapping regions of each HSP. To facilitate this, the
                  : HSP.pm object now permits extraction of data from sub-sections
                  : of an HSP.
                  :
                  : Additional useful information is collected from the results
                  : of the tiling. It is possible that sub-sequences in
                  : different HSPs will overlap significantly. In this case, it
                  : is impossible to create a single unambiguous alignment by
                  : concatenating the HSPs. The ambiguity may indicate the
                  : presence of multiple, similar domains in one or both of the
                  : aligned sequences. This ambiguity is recorded using the
                  : ambiguous_aln() method.
                  :
                  : This method does not attempt to discern biologically
                  : significant vs. insignificant overlaps. The allowable amount of
                  : overlap can be set with the overlap() method or with the -OVERLAP
                  : parameter used when constructing the Blast & Sbjct objects.
                  :
                  : For a given hit, both the query and the sbjct sequences are
                  : tiled independently.
                  :
                  :    -- If only query sequence HSPs overlap,
                  :          this may suggest multiple domains in the sbjct.
                  :    -- If only sbjct sequence HSPs overlap,
                  :          this may suggest multiple domains in the query.
                  :    -- If both query & sbjct sequence HSPs overlap,
                  :          this suggests multiple domains in both.
                  :    -- If neither query nor sbjct sequence HSPs overlap,
                  :          this suggests either no multiple domains in either
                  :          sequence OR that both sequences have the same
                  :          distribution of multiple similar domains.
                  :
                  : This method can deal with the special case in which multiple
                  : HSPs overlap exactly.
                  :
                  : Efficiency concerns:
                  :  Speed will be an issue for sequences with numerous HSPs.
                  :
        Bugs      : Currently, tile_hsps() does not properly account for
                  : the number of non-tiled but overlapping HSPs, which becomes a problem
                  : as overlap() grows. Large values of overlap() may thus lead to
                  : incorrect statistics for some hits. For best results, keep overlap()
                  : below 5 (the default is 2). For more about this, see the "HSP Tiling and
                  : Ambiguous Alignments" section in Bio::Search::Hit::BlastHit.

       See Also   : _adjust_contigs(), Bio::Search::Hit::BlastHit
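
       The following is a minimal, self-contained sketch of the tiling idea described above,
       not the module's implementation. It merges inclusive [start, end] ranges on a single
       sequence into contigs and shows why summing HSP lengths directly overestimates the
       aligned length. The names tile_ranges_sketch() and total_length(), the sample
       coordinates, and the rule that any overlap larger than $max_overlap marks the
       alignment as ambiguous are assumptions made for illustration; strand, frame, and
       identity/conservation counts are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: merge inclusive [start, end] ranges into contigs.
# Returns (\@contigs, $ambiguous), where $ambiguous is 1 if any two ranges
# overlap by more than $max_overlap positions (an assumed rule).
sub tile_ranges_sketch {
    my ($max_overlap, @ranges) = @_;
    @ranges = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @ranges;

    my @contigs;
    my $ambiguous = 0;
    for my $range (@ranges) {
        my ($start, $end) = @$range;
        if (@contigs && $start <= $contigs[-1][1]) {
            # Overlap with the current contig: measure it, then extend the contig.
            my $shared_end = $end < $contigs[-1][1] ? $end : $contigs[-1][1];
            my $overlap    = $shared_end - $start + 1;
            $ambiguous = 1 if $overlap > $max_overlap;
            $contigs[-1][1] = $end if $end > $contigs[-1][1];
        }
        else {
            push @contigs, [ $start, $end ];
        }
    }
    return (\@contigs, $ambiguous);
}

# Sum the lengths of inclusive ranges.
sub total_length {
    my $sum = 0;
    $sum += $_->[1] - $_->[0] + 1 for @_;
    return $sum;
}

# Hypothetical query coordinates for three HSPs of one hit.
my @hsps = ([1, 100], [90, 180], [300, 350]);
my ($contigs, $ambiguous) = tile_ranges_sketch(2, @hsps);

printf "naive sum: %d, tiled: %d, ambiguous: %d\n",
    total_length(@hsps), total_length(@$contigs), $ambiguous;
```

       In this sketch the first two ranges share positions 90-100, so the naive sum counts
       those positions twice while the tiled total counts them once.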

   _adjust_contigs
        Usage     : n/a; called automatically during object construction.
        Purpose   : Builds HSP contigs for a given BLAST hit.
                  : Utility method called by tile_hsps()
        Returns   :
        Argument  :
        Throws    : Exceptions propagated from Bio::Search::Hit::BlastHSP::matches()
                  : for invalid sub-sequence ranges.
        Status    : Experimental
        Comments  : This method does not currently support gapped alignments.
                  : Also, it does not keep track of the number of HSPs that
                  : overlap within the amount specified by overlap().
                  : This will lead to significant tracking errors for large
                  : overlap values.

       See Also   : tile_hsps(), Bio::Search::Hit::BlastHSP::matches

   get_exponent
        Usage     : &get_exponent( number );
        Purpose   : Determines the power of 10 exponent of an integer, float,
                  : or scientific notation number.
        Example   : &get_exponent("4.0e-206");
                  : &get_exponent("0.00032");
                  : &get_exponent("10.");
                  : &get_exponent("1000.0");
                  : &get_exponent("e+83");
        Argument  : Float, Integer, or scientific notation number
        Returns   : Integer representing the exponent part of the number (+ or -).
                  : If argument == 0 (zero), return value is "-999".
        Comments  : Exponents are rounded up (less negative) if the mantissa is >= 5.
                  : Exponents are rounded down (more negative) if the mantissa is <= -5.
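
       The behavior described above can be approximated with the sketch below. It is not the
       module's code: exponent_sketch() is a hypothetical name, and applying the rounding rule
       to plain decimals as well as to scientific notation is an assumption. The explicit
       exponent is handled as a string, so inputs such as "4.0e-206" do not depend on the
       range of a native floating-point number.

```perl
use strict;
use warnings;

# Hypothetical helper: power-of-10 exponent of a number given as a string.
sub exponent_sketch {
    my ($value) = @_;

    # Split "4.0e-206" into mantissa "4.0" and exponent "-206".
    my ($mantissa, $exp) = split /[eE]/, $value, 2;
    $mantissa = 1 if !defined $mantissa || $mantissa eq '';   # "e+83" implies 1e+83
    $exp      = 0 if !defined $exp;

    return -999 if $mantissa == 0;    # documented sentinel for zero

    # Normalize the mantissa to d.ddd form and fold its own exponent in.
    my ($digits, $shift) =
        sprintf('%.10e', $mantissa) =~ /^(-?\d\.\d+)e([+-]\d+)$/;
    $exp += $shift;

    $exp++ if $digits >= 5;     # rounded up (less negative)
    $exp-- if $digits <= -5;    # rounded down (more negative)
    return $exp;
}

for my $input ("4.0e-206", "0.00032", "10.", "1000.0", "e+83", "0") {
    printf "%-10s => %d\n", $input, exponent_sketch($input);
}
```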

   collapse_nums
        Usage     : @cnums = collapse_nums( @numbers );
        Purpose   : Collapses a list of numbers into a set of ranges of consecutive terms:
                  : Useful for condensing long lists of consecutive numbers.
                  :  EXPANDED:
                  :     1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32
                  :  COLLAPSED:
                  :     1-6 10 12-15 17 18 20-22 24 26 30-32
        Argument  : List of numbers sorted numerically.
        Returns   : List of numbers mixed with ranges of numbers (see above).
        Throws    : n/a

       See Also   : Bio::Search::Hit::BlastHit::seq_inds()
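
       As an illustration of the collapsing rule, here is a minimal sketch that reproduces
       the EXPANDED/COLLAPSED example above. collapse_sketch() is a hypothetical name and this
       is not the module's implementation. Following the example, runs of three or more
       consecutive numbers become a range, while a run of only two (17 18) is left as
       separate numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a numerically sorted list into ranges.
sub collapse_sketch {
    my @nums = @_;
    my @out;
    my $i = 0;
    while ($i < @nums) {
        # Advance $j to the last element of the consecutive run starting at $i.
        my $j = $i;
        $j++ while $j < $#nums && $nums[$j + 1] == $nums[$j] + 1;

        if ($j - $i >= 2) {
            push @out, "$nums[$i]-$nums[$j]";    # run of three or more
        }
        else {
            push @out, @nums[$i .. $j];          # single number or a pair
        }
        $i = $j + 1;
    }
    return @out;
}

my @numbers = (1 .. 6, 10, 12 .. 15, 17, 18, 20 .. 22, 24, 26, 30 .. 32);
print join(' ', collapse_sketch(@numbers)), "\n";
```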

   strip_blast_html
        Usage     : $boolean = &strip_blast_html( string_ref );
                  : This method is exported.
        Purpose   : Removes HTML formatting from a supplied string.
                  : Attempts to restore the Blast report to enable
                  : parsing by Bio::SearchIO::blast.pm
        Returns   : Boolean: true if string was stripped, false if not.
        Argument  : string_ref = reference to a string containing the whole Blast
                  :              report containing HTML formatting.
        Throws    : Croaks if the argument is not a scalar reference.
        Comments  : Based on code originally written by Alex Dong Li
                  : (ali@genet.sickkids.on.ca).
                  : This method does some Blast-specific stripping
                  : (adds back a '>' character in front of each HSP
                  : alignment listing).
                  :
                  : THIS METHOD IS VERY SENSITIVE TO BLAST FORMATTING CHANGES!
                  :
                  : Removal of the HTML tags and accurate reconstitution of the
                  : non-HTML-formatted report is highly dependent on the structure of
                  : the HTML-formatted version. For example, it assumes that the first
                  : line of each alignment section (HSP listing) starts with a
                  : <a name=..> anchor tag. This permits the reconstruction of the
                  : original report in which these lines begin with a ">".
                  : This is required for parsing.
                  :
                  : If the structure of the Blast report itself is not intended to
                  : be a standard, the structure of the HTML-formatted version
                  : is even less so. Therefore, the use of this method to
                  : reconstitute parsable Blast reports from HTML-format versions
                  : should be considered a temporary solution.
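
       The anchor-based reconstruction described above can be sketched as follows. This is an
       illustration only, not the module's code: strip_html_sketch() is a hypothetical name,
       the sample report text is invented, and the two regular expressions are simplified
       assumptions about the HTML layout. Like the documented function, it modifies the
       string in place through a reference and returns a boolean.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: strip HTML from a report held in a scalar reference.
# Returns 1 if anything was changed, 0 otherwise.
sub strip_html_sketch {
    my ($string_ref) = @_;
    croak 'Expected a reference to a scalar' unless ref $string_ref eq 'SCALAR';

    my $changed = 0;

    # A line that starts with <a name=...> is assumed to open an alignment
    # section; replace the anchor with the '>' the plain-text report uses.
    $changed = 1 if $$string_ref =~ s/^<a\s+name\s*=[^>]*>[ ]?/>/mgi;

    # Remove all remaining tags.
    $changed = 1 if $$string_ref =~ s/<[^>]+>//g;

    return $changed;
}

# Invented sample input, not taken from a real BLAST report.
my $report = "<b>Query=</b> test\n<a name=12345></a>sp|P1| demo protein\nLength = 10\n";
my $stripped = strip_html_sketch(\$report);
print "stripped: $stripped\n$report";
```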