Provided by: libbio-perl-perl_1.6.901-2_all

NAME

       Bio::Assembly::IO::tigr - Driver to read and write assembly files in the TIGR Assembler v2
       default format.

SYNOPSIS

           # Building an input stream
           use Bio::Assembly::IO;

           # Assembly loading methods
           my $asmio = Bio::Assembly::IO->new( -file   => 'SGC0-424.tasm',
                                               -format => 'tigr' );
           my $scaffold = $asmio->next_assembly;

           # Do some things on contigs...

           # Assembly writing methods
           my $outasm = Bio::Assembly::IO->new( -file   => ">SGC0-modified.tasm",
                                                -format => 'tigr' );
           $outasm->write_assembly( -scaffold => $scaffold,
                                    -singlets => 1 );

DESCRIPTION

       This package loads assembly information from, and writes it to, files in the default TIGR
       Assembler v2 format. The files are lassie-formatted and often have the .tasm extension.
       This module was written to be used as a driver module for Bio::Assembly::IO input/output.

   Implementation
       Assemblies are loaded into Bio::Assembly::Scaffold objects composed of
       Bio::Assembly::Contig and Bio::Assembly::Singlet objects. Since aligned reads and contig
       gapped consensus can be obtained in the tasm files, only aligned/gapped sequences are
       added to the different BioPerl objects.

       Additional assembly information is stored as features. Contig objects have SeqFeature
       information associated with the primary_tag:

           _main_contig_feature:$contig_id -> misc contig information
           _quality_clipping:$read_id      -> quality clipping position

       Read objects have sub_seqFeature information associated with the primary_tag:

           _main_read_feature:$read_id     -> misc read information

       Singlets are considered by TIGR Assembler as contigs of one sequence. Singlets are
       represented here with features having these primary_tag values:

           _main_contig_feature:$contig_id
           _quality_clipping:$read_primary_id
           _main_read_feature:$read_primary_id
           _aligned_coord:$read_primary_id

THE TIGR TASM LASSIE FORMAT

   Description
       In the TIGR tasm lassie format, contigs are separated by a line containing a single pipe
       character "|", whereas the reads in a contig are separated by a blank line.  Singlets can
       be present in the file and are represented as a contig composed of a single sequence.

       Other than the two above-mentioned separators, each line has an attribute name, followed
       by a tab and then an attribute value.

       The tasm format is used by more TIGR applications than just TIGR Assembler.  Some of the
       attributes are not used by TIGR Assembler or have constant values.  They are indicated by
       an asterisk (*).
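
       As a rough illustration of the layout described above, the following minimal Perl sketch
       splits tasm text into contigs and reads using only the two separators and the
       tab-delimited attribute lines. It is not the parser used by this module: the sub name
       parse_tasm and the returned data structure are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Assembly::IO::tigr): parse tasm text
# into a list of contigs. Each contig is a hash with the contig attributes
# under 'attr' and an array of read attribute hashes under 'reads'.
sub parse_tasm {
    my ($text) = @_;
    my @contigs;
    my @blocks = ( {} );    # attribute blocks of the contig being read

    my $flush = sub {
        # Ignore empty blocks, e.g. those created by repeated blank lines
        my @kept = grep { scalar keys %$_ } @blocks;
        if (@kept) {
            my $attr = shift @kept;    # the first block describes the contig
            push @contigs, { attr => $attr, reads => [@kept] };
        }
        @blocks = ( {} );
    };

    for my $line ( split /\n/, $text ) {
        $line =~ s/^\s+//;    # tolerate indentation
        $line =~ s/\s+$//;    # an attribute without a value keeps only its name
        if ( $line eq '|' ) {
            $flush->();                 # contig separator
        }
        elsif ( $line eq '' ) {
            push @blocks, {};           # read separator
        }
        else {
            my ( $name, $value ) = split /\t/, $line, 2;
            $blocks[-1]{$name} = defined $value ? $value : '';
        }
    }
    $flush->();    # the last contig is not followed by a '|'
    return @contigs;
}

# Made-up input: one contig with two reads, then a singlet
my $tasm = join "\n",
    "asmbl_id\t93", "seq#\t2", "comment\t",
    "",
    "seq_name\tread1", "asm_lend\t1",
    "",
    "seq_name\tread2", "asm_lend\t5",
    "|",
    "asmbl_id\t94", "seq#\t1",
    "",
    "seq_name\tread3", "asm_lend\t1";

my @contigs = parse_tasm($tasm);
printf "%s: %d read(s)\n", $_->{attr}{asmbl_id}, scalar @{ $_->{reads} }
    for @contigs;
```

       In this sketch a singlet simply comes out as a contig whose 'reads' array holds a single
       entry, which mirrors how the format represents it.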

       Contigs have the following attributes:

           asmbl_id   -> contig ID
           sequence   -> contig ungapped consensus sequence (ambiguities are lowercase)
           lsequence  -> gapped consensus sequence (lowercase ambiguities)
           quality    -> gapped consensus quality score (in hexadecimal)
           seq_id     -> *
           com_name   -> *
           type       -> *
           method     -> always 'asmg' *
           ed_status  -> *
           redundancy -> fold coverage of the contig consensus
           perc_N     -> percent of ambiguities in the contig consensus
           seq#       -> number of sequences in the contig
           full_cds   -> *
           cds_start  -> start of coding sequence *
           cds_end    -> end of coding sequence *
           ed_pn      -> name of editor (always 'GRA') *
           ed_date    -> date and time of editing
           comment    -> some comments *
           frameshift -> *

       Each read has the following attributes:

           seq_name  -> read name
           asm_lend  -> position of first base on contig ungapped consensus sequence
           asm_rend  -> position of last base on contig ungapped consensus sequence
           seq_lend  -> start of quality-trimmed sequence (aligned read coordinates)
           seq_rend  -> end of quality-trimmed sequence (aligned read coordinates)
           best      -> always '0' *
           comment   -> some comments *
           db        -> database name associated with the sequence (e.g. >my_db|seq1234)
           offset    -> offset of the sequence (gapped consensus coordinates)
           lsequence -> aligned read sequence (ambiguities are uppercase)

       When asm_rend < asm_lend, the sequence was on the complementary DNA strand but its reverse
       complement is shown in the aligned sequence of the assembly file, not the original read.

       Ambiguities are reflected in the contig consensus sequence as lowercase IUPAC characters:
       a c g t u m r w s y k x n. In the read sequences, however, ambiguities are uppercase:
       M R W S Y K X N.

   Example
       Example of a contig containing three sequences:

           sequence    CGATGCTGTACGGCTGTTGCGACAGATTGCGCTGGGTCGATACCGCGTTGGTGATCGGCTTGTTCAGCGGGCTCTGGTTCGGCGACAGCGCGGCGATCTTGGCGGCTGCGAAGGTTGCCGGCGCAATCATGCGCTGCTGACCGTTGACCTGGTCCTGCCAGTACACCCAGTCGCCCACCATGACCTTCAGCGCGTAGCTGTCACAGCCGGCTGTGGTCAGCGCAGTGGCGACGGTGGTGTAGGAGGCGCCAGCAACACCTTGGGTGATCATGTAGCAGCCTTCTGACAGGCCGTAGGTCAGCATGGTCGGCCACTGGGTACCAGTCAGTCGGGTCAACCGAGATTCGCAsCTGAGCGCCACTGCCGCGCAGAGCGTACATGCCCTTGCGGGTCGCGCCGGTAACACCATCCACGCCGATCAGAACTGCGTCGGTGATGGTGGTGTTACCCGAGGTGCCAGTGGTGAAGGCGACGGTCTGGGTGCTGGCCACAGGCGCCAGAGTGGTCGCGCCAACGGTGGCGATGACCAGTTGCGATGGGCCACGGATACCTGACTGCCCGTTGTTCACGGCGCTGACGATGTTCTGCCACAGCGCCAGGCCAGAGCCGGTGATGTTGTCGAACACTTCGGGCGCAACGCCAGGGAGCGAGACGGTCAGCTTCCAGCTCGAAGCAGCGGAGCCAGTAGCCAGGGCGGCGCTGAGCGAGTTGCCGAGCGTGCCGGTGTAGAACGCGGTCAGCGTGGCGCCGGTGGCGGCGGCAGTGTCCTTCAGCGCACTGGTCGCGGCGGTGTCGGTGCCGTCAGTGACGCGCACGGCGCGGATGTTCGAGGCGCCGCCCTGGATTGATACCGCCAGCGCGGTGCACAGGTCGTACTTGCGCACGGTCyGAGTGCCGAACTTCTGCGATGCGTCACCTGGCGAGCCGATAaGCGTGGCGCTGTTCACCGGCCCCCAGTCAGCAATGCCGACGATGCCGAGAATGTCAGTCGGGACGCCATTGATGTAGCGGGTCTTGGGCGCCACTATTTGTATGTACAAATCTGGCGCAGATAAAGCCGCCGTATTCAAATAACCAGCAGGATAGATAGGCATCACGCCTCCAGAATGAAAAAGGCCACCGATTAGGTGGCCTTTGTTGTGTTCGGCTGGCTGTTAGAGCAGCAGCCCGTTTTCCCGCGCAAACGCGAATGGGTCCTTGTCATGCTTCCTGCAATTGCAGGTAGGACAAAGAATTTGCAGGTTGGATTTGTCGTTCGATCCGCCCTTTGCAAGCGGGAACACGTGGTCAACGTGATACCCATCCCTTATGGATATAGTGCACATGGCGCATTTCCAGCGCTGAGCAGCCAGCAAAAATTTTATGTCGTCGCCGGTGTGTGAGCCGACAGCATTTTTCTTGCGAGCCTTGTATGTCCGCGAGAGTGAACGAACTTGCTCCTTGTTGGCTGTCTTCCAGAGCTTTTGAGTAAGCGCACAGAGATCCTTGTTTCTTGATCTCCACTCTCTGGTTGCGGAAAT
           lsequence   CGATGCTGTACGGCTGTTGCGACAGATTGCGCTGGGTCGATACCGCGTTGGTGATCGGCTTGTTCAGCGGGCTCTGGTTCGGCGACAGCGCGGCGATCTTGGCGGCTGCGAAGGTTGCCGGCGCAATCATGCGCTGCTGACCGTTGACCTGGTCCTGCCAGTACACCCAGTCGCCCACCATGACCTTCAGCGCGTAGCTGTCACAGCCGGCTGTGGTCAGCGCAGTGGCGACGGTGGTGTAGGAGGCGCCAGCAACACCTTGGGTGATCATGTAGCAGCCTTCTGACAGGCCGTAGGTCAGCATGGTCGGCCACTGGGTACCAGTCAGTCGGGTCAACCGAGATTCG-CAsCTGAGCGCCACTGCCGCGCAGAGCGTACATGCCCTTGCGGGTCGCGCCGGTAACACCATCCACGCCGATCAGAACTGCGTCGGTGATGGTGGTGTTACCCGAGGTGCCAGTGGTGAAGGCGACGGTCTGGGTGCTGGCCACAGGCGCCAGAGTGGTCGCGCCAACGGTGGCGATGACCAGTTGCGATGGGCCACGGATACCTGACTGCCCGTTGTTCACGGCGCTGACGATGTTCTGCCACAGCGCCAGGCCAGAGCCGGTGATGTTGTCGAACACTTCGGGCGCAACGCCAGGGAGCGAGACGGTCAGCTTCCAGCTCGAAGCAGCGGAGCCAGTAGCCAGGGCGGCGCTGAGCGAGTTGCCGAGCGTGCCGGTGTAGAACGCGGTCAGCGTGGCGCCGGTGGCGGCGGCAGTGTCCTTCAGCGCACTGGTCGCGGCGGTGTCGGTGCCGTCAGTGACGCGCACGGCGCGGATGTTCGAGGCGCCGCCCTGGATTGATACCGCCAGCGCGGTGCACAGGTCGTACTTGCGCACGGTCyGAGTGCCGAACTTCTGCGATGCGTCACCTGGCGAGCCGATAaGCGTGGCGCTGTTCACCGGCCCCCAGTCAGCAATGCCGACGATGCCGAGAATGTCAGTCGGGACGCCATTGATGTAGCGGGTCTTGGGCGCCACTATTTGTATGTACAAATCTGGCGCAGATAAAGCCGCCGTATTCAAATAACCAGCAGGATAGATAGGCATCACGCCTCCAGAATGAAAAAGGCCACCGATTAGGTGGCCTTTGTTGTGTTCGGCTGGCTGTTAGAGCAGCAGCCCGTTTTCCCGCGCAAACGCGAATGGGTCCTTGTCATGCTTCCTGCAATTGCAGGTAGGACAAAGAATTTGCAGGTTGGATTTGTCGTTCGATCCGCCCTTTGCAAGCGGGAACACGTGGTCAACGTGATACCCATCCCTTATGGATATAGTGCACATGGCGCATTTCCAGCGCTGAGCAGCCAGCAAAAATTTTATGTCGTCGCCGGTGTGTGAGCCGACAGCATTTTTCTTGCGAGCCTTGTATGTCCGCGAGAGTGAACGAACTTGCTCCTTGTTGGCTGTCTTCCAGAGCTTTTGAGTAAGCGCACAGAGATCCTTGTTTCTTGATCTCCACTCTCTGGTTGCGGAAAT
           quality     0x
           asmbl_id    93
           seq_id
           com_name
           type
           method      asmg
           ed_status
           redundancy  1.11
           perc_N      0.20
           seq#        3
           full_cds
           cds_start
           cds_end
           ed_pn       GRA
           ed_date     08/16/07 17:10:12
           comment
           frameshift

           seq_name    SDSU_RFPERU_010_C09.x01.phd.1
           asm_lend    1
           asm_rend    442
           seq_lend    1
           seq_rend    442
           best        0
           comment
           db
           offset      0
           lsequence   CGATGCTGTACGGCTGTTGCGACAGATTGCGCTGGGTCGATACCGCGTTGGTGATCGGCTTGTTCAGCGGGCTCTGGTTCGGCGACAGCGCGGCGATCTTGGCGGCTGCGAAGGTTGCCGGCGCAATCATGCGCTGCTGACCGTTGACCTGGTCCTGCCAGTACACCCAGTCGCCCACCATGACCTTCAGCGCGTAGCTGTCACAGCCGGCTGTGGTCAGCGCAGTGGCGACGGTGGTGTAGGAGGCGCCAGCAACACCTTGGGTGATCATGTAGCAGCCTTCTGACAGGCCGTAGGTCAGCATGGTCGGCCACTGGGTACCAGTCAGTCGGGTCAACCGAGATTCG-CAGCTGAGCGCCACTGCCGCGCAGAGCGTACATGCCCTTGCGGGTCGCGCCGGTAACACCATCCACGCCGATCAGAACTGCGTCGGTGATGGTGG

           seq_name    SDSU_RFPERU_002_H12.x01.phd.1
           asm_lend    339
           asm_rend    940
           seq_lend    1
           seq_rend    602
           best        0
           comment
           db
           offset      338
           lsequence   CGAGATTCGCCACCTGAGCGCCACTGCCGCGCAGAGCGTACATGCCCTTGCGGGTCGCGCCGGTAACACCATCCACGCCGATCAGAACTGCGTCGGTGATGGTGGTGTTACCCGAGGTGCCAGTGGTGAAGGCGACGGTCTGGGTGCTGGCCACAGGCGCCAGAGTGGTCGCGCCAACGGTGGCGATGACCAGTTGCGATGGGCCACGGATACCTGACTGCCCGTTGTTCACGGCGCTGACGATGTTCTGCCACAGCGCCAGGCCAGAGCCGGTGATGTTGTCGAACACTTCGGGCGCAACGCCAGGGAGCGAGACGGTCAGCTTCCAGCTCGAAGCAGCGGAGCCAGTAGCCAGGGCGGCGCTGAGCGAGTTGCCGAGCGTGCCGGTGTAGAACGCGGTCAGCGTGGCGCCGGTGGCGGCGGCAGTGTCCTTCAGCGCACTGGTCGCGGCGGTGTCGGTGCCGTCAGTGACGCGCACGGCGCGGATGTTCGAGGCGCCGCCCTGGATTGATACCGCCAGCGCGGTGCACAGGTCGTACTTGCGCACGGTCCGAGTGCCGAACTTCTGCGATGCGTCACCTGGCGAGCCGATA-GCGTGGCGC

           seq_name    SDSU_RFPERU_009_E07.x01.phd.1
           asm_lend    880
           asm_rend    1520
           seq_lend    641
           seq_rend    1
           best        0
           comment
           db
           offset      880
           lsequence   CGCACGGTCTGAGTGCCGAACTTCTGCGATGCGTCACCTGGCGAGCCGATAAGCGTGGCGCTGTTCACCGGCCCCCAGTCAGCAATGCCGACGATGCCGAGAATGTCAGTCGGGACGCCATTGATGTAGCGGGTCTTGGGCGCCACTATTTGTATGTACAAATCTGGCGCAGATAAAGCCGCCGTATTCAAATAACCAGCAGGATAGATAGGCATCACGCCTCCAGAATGAAAAAGGCCACCGATTAGGTGGCCTTTGTTGTGTTCGGCTGGCTGTTAGAGCAGCAGCCCGTTTTCCCGCGCAAACGCGAATGGGTCCTTGTCATGCTTCCTGCAATTGCAGGTAGGACAAAGAATTTGCAGGTTGGATTTGTCGTTCGATCCGCCCTTTGCAAGCGGGAACACGTGGTCAACGTGATACCCATCCCTTATGGATATAGTGCACATGGCGCATTTCCAGCGCTGAGCAGCCAGCAAAAATTTTATGTCGTCGCCGGTGTGTGAGCCGACAGCATTTTTCTTGCGAGCCTTGTATGTCCGCGAGAGTGAACGAACTTGCTCCTTGTTGGCTGTCTTCCAGAGCTTTTGAGTAAGCGCACAGAGATCCTTGTTTCTTGATCTCCACTCTCTGGTTGCGGAAAT
           |

       ...

FEEDBACK

   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send
       your comments and suggestions preferably to the Bioperl mailing lists. Your participation
       is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and responsive experts will
       be able to look at the problem and quickly address it. Please include a thorough description
       of the problem with code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the BioPerl bug tracking system to help us keep track of the bugs and their
       resolution. Bug reports can be submitted via email or the web:

         bioperl-bugs@bioperl.org
         https://redmine.open-bio.org/projects/bioperl/

AUTHOR - Florent E Angly

       Email florent dot angly at gmail dot com

APPENDIX

       The rest of the documentation details each of the object methods. Internal methods are
       usually preceded with a "_".

   next_assembly
        Title   : next_assembly
        Usage   : my $scaffold = $asmio->next_assembly();
        Function: return the next assembly in the tasm-formatted stream
        Returns : Bio::Assembly::Scaffold object
        Args    : none

   next_contig
        Title   : next_contig
        Usage   : my $contig = $asmio->next_contig();
        Function: return the next contig or singlet from the TIGR-formatted stream
        Returns : Bio::Assembly::Contig or Bio::Assembly::Singlet
        Args    : none

   _qual_hex2dec
           Title   : _qual_hex2dec
           Usage   : my $dec_quality = $self->_qual_hex2dec($hex_quality);
           Function: convert a hexadecimal quality score into a decimal quality score
           Returns : string
           Args    : string

   _qual_dec2hex
           Title   : _qual_dec2hex
           Usage   : my $hex_quality = $self->_qual_dec2hex($dec_quality);
           Function: convert a decimal quality score into a hexadecimal quality score
           Returns : string
           Args    : string
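
           The encoding itself is not spelled out here. The sketch below assumes that a
           hexadecimal quality string is a '0x' prefix followed by two hexadecimal digits per
           position, and that the decimal form is a space-separated list of scores. The sub
           names qual_hex2dec and qual_dec2hex are hypothetical stand-ins, not the module's
           internal methods.

```perl
use strict;
use warnings;

# Assumed encoding: '0x' followed by two hex digits per position.
sub qual_hex2dec {
    my ($hex) = @_;
    $hex =~ s/^0x//i;    # drop the prefix
    return join ' ', map { hex($_) } $hex =~ /([0-9A-Fa-f]{2})/g;
}

# Inverse conversion: space-separated decimal scores to a '0x...' string.
sub qual_dec2hex {
    my ($dec) = @_;
    return '0x' . join '', map { sprintf '%02X', $_ } split ' ', $dec;
}

my $dec = qual_hex2dec('0x0A1E28');
print "$dec\n";                     # prints 10 30 40
print qual_dec2hex($dec), "\n";     # prints 0x0A1E28
```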

   _store_contig
           Title   : _store_contig
           Usage   : my $contigobj = $self->_store_contig(\%contiginfo, $contigobj);
           Function: store information of a contig belonging to a scaffold in the
                     appropriate object
           Returns : Bio::Assembly::Contig object
           Args    : hash, Bio::Assembly::Contig

   _store_read
           Title   : _store_read
           Usage   : my $readobj = $self->_store_read(\%readinfo, $contigobj);
           Function: store information of a read belonging to a contig in a contig object
           Returns : Bio::LocatableSeq
           Args    : hash, Bio::Assembly::Contig

   _store_singlet
           Title   : _store_singlet
           Usage   : my $singletobj = $self->_store_singlet(\%readinfo, \%contiginfo);
           Function: store information of a singlet belonging to a scaffold in a singlet object
           Returns : Bio::Assembly::Singlet
           Args    : hash, hash

   write_assembly
           Title   : write_assembly
           Usage   : $asmio->write_assembly($assembly)
           Function: Write the assembly object in TIGR Assembler compatible format. The
                     contig IDs are sorted naturally if the Sort::Naturally module is
                     present, or lexically otherwise. Internally, write_assembly uses
                     the write_contig, write_footer and write_header methods. Use these
                     methods if you want more control over the writing process.
           Returns : 1 on success, 0 for error
           Args    : A Bio::Assembly::Scaffold object
                     1 to write singlets in the assembly file, 0 otherwise
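
           The sorting behaviour described above can be sketched as follows. This is only an
           illustration of an optional-module fallback, not the module's own code; the sub name
           sort_ids is hypothetical.

```perl
use strict;
use warnings;

# Sort contig IDs naturally when Sort::Naturally can be loaded,
# and fall back to a plain lexical sort otherwise.
sub sort_ids {
    my @ids = @_;
    if ( eval { require Sort::Naturally; 1 } ) {
        return Sort::Naturally::nsort(@ids);
    }
    return sort @ids;
}

print join( ' ', sort_ids(qw(10 2 1)) ), "\n";
```

           With Sort::Naturally installed the IDs come out as 1 2 10; with the lexical fallback
           they come out as 1 10 2.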

   write_contig
           Title   : write_contig
           Usage   : $asmio->write_contig($contig)
           Function: Write a contig or singlet object in TIGR compatible format. Quality
                     scores are automatically generated if the contig does not contain
                     any.
           Returns : 1 on success, 0 for error
           Args    : A Bio::Assembly::Contig or Singlet object

   write_header
           Title   : write_header
           Usage   : $asmio->write_header($assembly)
           Function: In the TIGR Assembler format assembly driver, this does nothing. The
                     method is present for compatibility with other assembly drivers
                     that need to write a file header.
           Returns : 1 on success, 0 for error
           Args    : A Bio::Assembly::Scaffold object

   write_footer
           Title   : write_footer
           Usage   : $asmio->write_footer($assembly)
           Function: Write TIGR footer, i.e. do nothing except make sure that the
                     file does not end with a '|'.
           Returns : 1 on success, 0 for error
           Args    : A Bio::Assembly::Scaffold object

   _perc_N
           Title   : _perc_N
           Usage   : my $perc_N = $asmio->_perc_N($sequence_string)
           Function: Calculate the percent of ambiguities in a sequence.
                     M R W S Y K X N are regarded as ambiguities in an aligned read
                     sequence by TIGR Assembler. In the case of a gapped contig
                     consensus sequence, all lowercase symbols are ambiguities, i.e.:
                     a c g t u m r w s y k x n.
           Returns : decimal number
           Args    : string
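
           A minimal sketch of the two counting rules follows. The sub name perc_ambiguous and
           its second argument are hypothetical, and excluding gap characters from the sequence
           length is an assumption that the description above does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper: percent of ambiguous positions in a sequence.
# A true $is_consensus selects the lowercase rule (contig consensus);
# otherwise the uppercase M R W S Y K X N rule (aligned read) applies.
sub perc_ambiguous {
    my ( $seq, $is_consensus ) = @_;
    ( my $bases = $seq ) =~ tr/-//d;    # assumption: gaps are not counted
    return 0 unless length $bases;
    my $count = $is_consensus
        ? ( $bases =~ tr/a-z// )
        : ( $bases =~ tr/MRWSYKXN// );
    return 100 * $count / length($bases);
}

printf "%.2f\n", perc_ambiguous( 'ACGTsACG-TyA', 1 );    # prints 18.18
printf "%.2f\n", perc_ambiguous( 'ACGTNACGTA',   0 );    # prints 10.00
```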

   _redundancy
           Title   : _redundancy
           Usage   : my $ref = $asmio->_redundancy($contigobj)
           Function: Calculate the fold coverage (redundancy) of a contig consensus
                     (average number of read base pairs covering the consensus)
           Returns : decimal number
           Args    : Bio::Assembly::Contig
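
           As a worked example of this definition, the sketch below divides the total number of
           read bases by the consensus length. The sub name fold_coverage is hypothetical, and
           working from plain lengths rather than from a contig object is a simplification made
           for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: fold coverage = total read bases / consensus length.
sub fold_coverage {
    my ( $consensus_length, @read_lengths ) = @_;
    my $total = 0;
    $total += $_ for @read_lengths;
    return $total / $consensus_length;
}

# Read lengths (seq_lend to seq_rend) and consensus length from the example contig
printf "%.2f\n", fold_coverage( 1520, 442, 602, 641 );    # prints 1.11
```

           The result agrees with the redundancy value of 1.11 listed for the example contig.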

   _ungap
           Title   : _ungap
           Usage   : my $ungapped = $asmio->_ungap($gapped)
           Function: Remove the gaps from a sequence. Gaps are '-' in TIGR Assembler.
           Returns : string
           Args    : string
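
           Gap removal of this kind can be written in one transliteration. The sketch below is
           an illustration with a hypothetical sub name ungap, not the module's internal method.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the sequence without '-' gap characters.
sub ungap {
    my ($seq) = @_;
    $seq =~ tr/-//d;
    return $seq;
}

my $gapped = 'GATTCG-CAsCTG';    # fragment of the example gapped consensus
print ungap($gapped), "\n";      # prints GATTCGCAsCTG
```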

   _date_time
           Title   : _date_time
           Usage   : my $timepoint = $asmio->_date_time
           Function: Get date and time (MM/DD/YY HH:MM:SS)
           Returns : string
           Args    : none
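
           A timestamp in this layout can be produced with POSIX::strftime. The sketch below is
           one way to do it, not necessarily how the module does; the sub name date_time and its
           optional time-field arguments are assumptions added so the example can be checked
           against a fixed date.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format the current local time, or the given
# (sec, min, hour, mday, mon, year) fields, as MM/DD/YY HH:MM:SS.
sub date_time {
    my @t = @_ ? @_ : localtime();
    return strftime( '%m/%d/%y %H:%M:%S', @t );
}

print date_time(), "\n";
# Fixed fields for 16 August 2007, 17:10:12 (mon is 0-based, year is years since 1900)
print date_time( 12, 10, 17, 16, 7, 107 ), "\n";    # prints 08/16/07 17:10:12
```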

   _split_seq_name_and_db
           Title   : _split_seq_name_and_db
           Usage   : my ($seqname, $db) = $asmio->_split_seq_name_and_db($id)
           Function: Extract seq_name and db from sequence id
           Returns : seq_name, db
           Args    : id

   _merge_seq_name_and_db
           Title   : _merge_seq_name_and_db
           Usage   : my $id = $asmio->_merge_seq_name_and_db($seq_name, $db)
           Function: Construct id from seq_name and db
           Returns : id
           Args    : seq_name, db
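
           The layout of the id is not stated here. Going by the db attribute example
           (>my_db|seq1234), the sketch below assumes an id of the form "db|seq_name", with the
           database part optional. Both sub names are hypothetical stand-ins for the internal
           methods.

```perl
use strict;
use warnings;

# Assumed id layout: "db|seq_name", or just "seq_name" when there is no db.
sub split_seq_name_and_db {
    my ($id) = @_;
    my ( $db, $seq_name ) =
        $id =~ /^([^|]+)\|(.+)$/ ? ( $1, $2 ) : ( '', $id );
    return ( $seq_name, $db );
}

sub merge_seq_name_and_db {
    my ( $seq_name, $db ) = @_;
    return ( defined $db && length $db ) ? "$db|$seq_name" : $seq_name;
}

my ( $seq_name, $db ) = split_seq_name_and_db('my_db|seq1234');
print "$seq_name $db\n";                                # prints seq1234 my_db
print merge_seq_name_and_db( $seq_name, $db ), "\n";    # prints my_db|seq1234
```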

   _coord
           Title   : _coord
           Usage   : my @coords = $asmio->_coord($readobj, $contigobj)
           Function: Get different coordinates for the read
           Returns : number, number, number, number, number
           Args    : Bio::Assembly::Seq, Bio::Assembly::Contig