Provided by: libbio-perl-perl_1.7.2-3_all


       Bio::Assembly::IO::sam - An IO module for assemblies in Sam format *BETA*


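        use Bio::Assembly::IO;
        # -format is given explicitly rather than relying on the file extension
        $aio = Bio::Assembly::IO->new( -file   => "mysam.bam",
                                       -format => "sam",
                                       -refdb  => "myrefseqs.fas" );
        $assy = $aio->next_assembly;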


       This is a (currently) read-only IO module designed to convert Sequence/Alignment Map (SAM)
       formatted alignments to Bio::Assembly::Scaffold representations, containing
       Bio::Assembly::Contig and Bio::Assembly::Singlet objects. It uses lstein's Bio::DB::Sam
       to parse binary formatted SAM (.bam) files guided by a reference sequence fasta database.

       NB: "Bio::DB::Sam" is not a BioPerl module; it can be obtained via CPAN. It in turn
       requires the "libbam" library; source can be downloaded at


       ·   Required files

           A binary SAM (".bam") alignment and a reference sequence database in FASTA format are
           required. Various required indexes (".fai", ".bai") will be created as necessary (via

       ·   Compressed files

           ...can be specified directly, if IO::Uncompress::Gunzip is installed. Get it from
           your local CPAN mirror.

       ·   BAM vs. SAM

           The input alignment should be in (possibly gzipped) binary SAM (".bam") format. If it
           isn't, you will get a message explaining how to convert it, viz.:

            $ samtools view -Sb mysam.sam > mysam.bam

           The BAM file must also be sorted by coordinate:

            $ samtools sort mysam.unsorted.bam > mysam.bam

       ·   Contigs

           Contigs are calculated by this module, using the 'coverage' feature of the
           Bio::DB::Sam object. A contig represents a contiguous portion of a reference sequence
           having non-zero coverage at each base.

           The bwa aligner can assign read sequences to
           multiple reference sequence locations. The present implementation currently assigns
           such reads only to the first contig in which they appear.

       ·   Consensus sequences

           Consensus sequence and quality objects are calculated by this module, using the
           "pileup" callback feature of "Bio::DB::Sam". The consensus is (currently) simply the
           residue at a position that has the maximum sum of quality values. The consensus
           quality is the integer portion of the simple average of quality values for the
           consensus residue.

       ·   SeqFeatures

           Read sequences stored in contigs are accompanied by the following features:

            contig : name of associated contig
            cigar  : CIGAR string for this read

           If the read is paired with a successfully mapped mate, these features will also be
           available:

            mate_start  : coordinate to which the mate was aligned
            mate_len    : length of mate read
            mate_strand : strand of mate (-1 or 1)
            insert_size : size of insert spanned by the mate pair

           These features are obtained as follows:

            @ids = $contig->get_seq_ids;
            $an_id = $ids[0]; # or whatever
            $seq = $contig->get_seq_by_name($an_id);
            # Bio::LocatableSeq's aren't SeqFeature containers, so...
            $feat = $contig->get_seq_feat_by_tag(
                       $seq, "_aligned_coord:".$seq->id
            );
            ($cigar) = $feat->get_tag_values('cigar');
            # etc.


       ·   Supporting both text SAM (TAM) and binary SAM (BAM)


   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send
       your comments and suggestions preferably to the Bioperl mailing list.  Your participation
       is much appreciated.
                  - General discussion  - About the mailing lists

       Please direct usage questions or support issues to the mailing list rather than to the
       module maintainer directly. Many experienced and responsive experts will be able to look
       at the problem and quickly address it. Please include a thorough description of the
       problem with code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their
       resolution. Bug reports can be submitted via the web:

AUTHOR - Mark A. Jensen

       Email maj -at- fortinbras -dot- us


       The rest of the documentation details each of the object methods. Internal methods are
       usually prefixed with an underscore (_).

Bio::Assembly::IO compliance

           Title   : next_assembly
           Usage   : my $scaffold = $asmio->next_assembly();
           Function: return the next assembly in the sam-formatted stream
           Returns : Bio::Assembly::Scaffold object
           Args    : none

           Title   : next_contig
           Usage   : my $contig = $asmio->next_contig();
           Function: return the next contig or singlet from the sam stream
           Returns : Bio::Assembly::Contig or Bio::Assembly::Singlet
           Args    : none

        Title   : write_assembly
        Usage   :
        Function: not implemented (module currently read-only)
        Returns :
        Args    :


           Title   : _store_contig
           Usage   : my $contigobj = $self->_store_contig(\%contiginfo);
           Function: create and load a contig object
           Returns : Bio::Assembly::Contig object
           Args    : Bio::DB::Sam::Segment object

           Title   : _store_read
           Usage   : my $readobj = $self->_store_read($readobj, $contigobj);
           Function: store information of a read belonging to a contig in a contig object
           Returns : Bio::LocatableSeq
           Args    : Bio::DB::Bam::AlignWrapper, Bio::Assembly::Contig

           Title   : _store_singlet
           Usage   : my $singletobj = $self->_store_singlet($contigobj);
           Function: convert a contig object containing a single read into
                     a singlet object
           Returns : Bio::Assembly::Singlet
           Args    : Bio::Assembly::Contig (previously loaded with only one seq)

REALLY Internal

        Title   : _init_sam
        Usage   : $self->_init_sam($fasfile)
        Function: obtain a Bio::DB::Sam parsing of the associated sam file
        Returns : true on success
        Args    : [optional] name of the fasta reference db (scalar string)
        Note    : The associated file can be plain text (.sam) or binary (.bam).
                  If the fasta file is not specified, and no file is contained in
                  the refdb() attribute, a .fas file with the same
                  basename as the sam file will be searched for.

        Title   : _get_contig_segs_from_coverage
        Usage   :
        Function: calculates separate contigs using coverage info
                  in the segment
        Returns : array of Bio::DB::Sam::Segment objects, representing
                  each contig
        Args    : Bio::DB::Sam::Segment object
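
        The contig definition given under "Contigs" (a contiguous stretch of the
        reference with non-zero coverage at every base) can be illustrated with a
        minimal sketch in plain Perl. This is not the module's own code: the
        function name contig_intervals is hypothetical, and the sketch assumes one
        coverage depth per reference base supplied as a simple list, whereas the
        module obtains coverage from a Bio::DB::Sam object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Assembly::IO::sam).
# Splits a per-base coverage list into maximal runs of non-zero coverage.
# Returns a list of [start, end] pairs in 1-based reference coordinates.
sub contig_intervals {
    my @coverage = @_;
    my (@intervals, $start);
    for my $i (0 .. $#coverage) {
        if ($coverage[$i] > 0) {
            # Open a new run if we are not already inside one.
            $start = $i + 1 unless defined $start;
        }
        elsif (defined $start) {
            # Coverage dropped to zero: the run ended at the previous base,
            # whose 1-based coordinate equals the current 0-based index.
            push @intervals, [$start, $i];
            undef $start;
        }
    }
    # A run that reaches the end of the reference is closed here.
    push @intervals, [$start, scalar @coverage] if defined $start;
    return @intervals;
}

my @cov = (0, 2, 3, 1, 0, 0, 4, 4);
print join(", ", map { "$_->[0]-$_->[1]" } contig_intervals(@cov)), "\n";
# prints "2-4, 7-8"
```

        Each returned interval corresponds to one candidate contig; an interval
        covered by a single read would be the natural candidate for a singlet.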

        Title   : _calc_consensus_quality
        Usage   : @qual = $aio->_calc_consensus_quality( $contig_seg );
        Function: calculate an average or other data-reduced quality
                  over all sites represented by the features contained
                  in a Bio::DB::Sam::Segment
        Returns :
        Args    : a Bio::DB::Sam::Segment object

        Title   : _calc_consensus
        Usage   : $consensus = $aio->_calc_consensus( $contig_seg );
        Function: calculate a simple quality-weighted consensus sequence
                  for the segment
        Returns : a SeqWithQuality object
        Args    : a Bio::DB::Sam::Segment object
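
        The consensus rule described under "Consensus sequences" can be sketched
        for a single alignment column as follows. This is an illustration in plain
        Perl, not the module's implementation: the function name column_consensus
        and the [residue, quality] input layout are assumptions, and the
        alphabetical tie-break is an arbitrary choice that the documentation does
        not specify.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Assembly::IO::sam).
# Input : a list of [residue, quality] pairs observed at one position.
# Output: (consensus residue, consensus quality), or an empty list if
#         no calls were given.
sub column_consensus {
    my @calls = @_;
    my (%sum, %count);
    for my $call (@calls) {
        my ($residue, $qual) = @$call;
        $sum{$residue}   += $qual;   # total quality supporting this residue
        $count{$residue} += 1;       # number of reads supporting it
    }
    return unless %sum;

    # The residue with the largest summed quality wins.
    # Ties are broken alphabetically (an assumption made for this sketch).
    my ($best) = sort { $sum{$b} <=> $sum{$a} or $a cmp $b } keys %sum;

    # Consensus quality: integer part of the mean quality of the winner.
    return ($best, int($sum{$best} / $count{$best}));
}

my ($residue, $quality) =
    column_consensus(['A', 30], ['A', 25], ['G', 40], ['A', 21]);
print "$residue $quality\n";
# prints "A 25"  (A: 30+25+21 = 76 beats G: 40; int(76/3) = 25)
```

        Note that the single high-quality G call loses here because the rule
        compares summed, not maximum, quality values.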

        Title   : refdb
        Usage   : $obj->refdb($newval)
        Function: the name of the reference db fasta file
        Example :
        Returns : value of refdb (a scalar)
        Args    : on set, new value (a scalar or undef, optional)

        Title   : _segset
        Usage   : $segset_hashref = $self->_segset()
        Function: hash container for the Bio::DB::Sam::Segment objects that
                  represent each set of contigs for each seq_id
                  { $seq_id => [@contig_segments], ... }
        Example :
        Returns : value of _segset (a hashref) if no arg,
                  or the arrayref of contig segments, if arg == a seq id
        Args    : [optional] seq id (scalar string)
        Note    : readonly; hash elt set in _init_sam()

        Title   : _current_refseq_id
        Usage   : $obj->_current_refseq_id($newval)
        Function: the "current" reference sequence id
        Example :
        Returns : value of _current_refseq_id (a scalar)
        Args    : on set, new value (a scalar or undef, optional)

        Title   : current_contig_seg_idx
        Usage   : $obj->current_contig_seg_idx($newval)
        Function: the "current" segment index in the "current" refseq
        Example :
        Returns : value of current_contig_seg_idx (a scalar)
        Args    : on set, new value (a scalar or undef, optional)

        Title   : sam
        Usage   : $obj->sam($newval)
        Function: store the associated Bio::DB::Sam object
        Example :
        Returns : value of sam (a Bio::DB::Sam object)
        Args    : on set, new value (a scalar or undef, optional)