oracular (3) Bio::SeqIO::mbsout.3pm.gz

Provided by: libbio-perl-perl_1.7.8-1_all

NAME

       Bio::SeqIO::mbsout - input stream for output by Teshima et al.'s mbs.

SYNOPSIS

       Do not use this module directly.  Use it via the Bio::SeqIO class.

DESCRIPTION

       mbs (Teshima KM, Innan H (2009) mbs: modifying Hudson's ms software to generate samples of
       DNA sequences with a biallelic site under selection. BMC Bioinformatics 10: 166) can be
       found at http://www.biomedcentral.com/1471-2105/10/166/additional/.

       Currently this object can be used to read output from mbs into seq objects. However,
       because BioPerl has no support for haplotypes created using an infinite-sites model (where
       '1' identifies a derived allele and '0' identifies an ancestral allele), the sequences
       returned by mbsout are coded using A, T, C and G. To decode the bases, use the sequence
       conversion table (a hash) returned by get_base_conversion_table(). In the table, 4 and 5
       are used when the ancestry is unclear. This should never happen with files created by
       mbs, but these codes will be used when creating mbsOUT files from a collection of seq
       objects (to be added later). Alternatively, use get_next_hap() to get a string of 1's
       and 0's instead of a seq object.
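       As an illustration of the 0/1 coding, the sketch below counts derived ('1') alleles at
       each site across a set of haplotypes, such as strings already decoded with the conversion
       table. It is plain Perl and not part of Bio::SeqIO::mbsout: the helper name
       derived_allele_counts is hypothetical, and it assumes equal-length haplotypes that contain
       only 0 and 1, rejecting the ambiguous codes 4 and 5 rather than guessing at them.

```perl
use strict;
use warnings;

# Hypothetical helper: count derived ('1') alleles at each site across a set
# of 0/1 haplotype strings. Assumes all haplotypes have the same length and
# rejects any code other than '0' or '1' (e.g. the ambiguous 4 and 5).
sub derived_allele_counts {
    my @haps = @_;
    die "no haplotypes given\n" unless @haps;
    my $len    = length $haps[0];
    my @counts = (0) x $len;
    for my $hap (@haps) {
        die "haplotype length mismatch\n" unless length($hap) == $len;
        die "unexpected allele code in '$hap'\n" if $hap =~ /[^01]/;
        my @alleles = split //, $hap;
        # '1' adds one derived allele at that site, '0' adds nothing
        $counts[$_] += $alleles[$_] for 0 .. $#alleles;
    }
    return @counts;
}

my @counts = derived_allele_counts('1001', '1100', '1000');
print "@counts\n";    # 3 1 0 1
```

       Rejecting 4 and 5 outright keeps the counts honest; a caller that wants to tolerate
       unclear ancestry would have to decide how to treat those sites explicitly.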

FEEDBACK

   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send
       your comments and suggestions preferably to the Bioperl mailing list. Your participation
       is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their
       resolution. Bug reports can be submitted via the web:

         https://github.com/bioperl/bioperl-live/issues

AUTHOR - Warren Kretzschmar

       This module was written by Warren Kretzschmar

       email: wkretzsch@gmail.com

       This module grew out of a parser written by Aida Andres.

   Public Domain Notice
       This software/database is a "United States Government Work" under the terms of the United
       States Copyright Act. It was written as part of the authors' official duties for the
       United States Government and thus cannot be copyrighted. This software/database is freely
       available to the public for use without a copyright notice. Restrictions cannot be placed
       on its present or future use.

       Although all reasonable efforts have been taken to ensure the accuracy and reliability of
       the software and data, the National Human Genome Research Institute (NHGRI) and the U.S.
       Government do not and cannot warrant the performance or results that may be obtained by
       using this software or data. NHGRI and the U.S. Government disclaim all warranties as to
       performance, merchantability or fitness for any particular purpose.

METHODS

   INTERNAL METHODS
       _initialize

       Title   : _initialize
       Usage   : $stream = Bio::SeqIO::mbsout->new($infile)
       Function: extracts basic information about the file.
       Returns : Bio::SeqIO object
       Args    : no_og
       Details : set the 'no_og' flag to 0 if the last population of an mbsout
                 file contains only one haplotype and you want that haplotype
                 to be treated as the outgroup.

       _read_start

       Title   : _read_start
       Usage   : $stream->_read_start()
       Function: reads from the filehandle $stream->{_filehandle} all
                 information up to the first haplotype (sequence).
       Returns : void
       Args    : none

   Methods to retrieve mbsout data
       get_segsites

       Title   : get_segsites
       Usage   : $segsites = $stream->get_segsites()
       Function: returns the number of segsites in the mbsout file (according to
                 the mbsout header line).
       Returns : scalar
       Args    : NONE

       get_current_run_segsites

       Title   : get_current_run_segsites
       Usage   : $segsites = $stream->get_current_run_segsites()
       Function: returns the number of segsites in the run of the last read
                 haplotype (sequence).
       Returns : scalar
       Args    : NONE
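       get_current_run_segsites() reads the segsite count from the mbs output itself. To show
       what that number means, the hypothetical helper count_segsites below recomputes it from
       0/1 haplotype strings: a site is segregating when both the ancestral ('0') and the derived
       ('1') allele occur in the sample. This is a sketch for understanding or sanity-checking
       the data, not how mbsout obtains the value, and it assumes equal-length 0/1 strings.

```perl
use strict;
use warnings;

# Hypothetical helper: count sites at which both '0' and '1' occur among
# the given haplotypes. Assumes equal-length strings of 0s and 1s.
sub count_segsites {
    my @haps = @_;
    return 0 unless @haps;
    my $len      = length $haps[0];
    my $segsites = 0;
    for my $i (0 .. $len - 1) {
        my ($has_ancestral, $has_derived) = (0, 0);
        for my $hap (@haps) {
            my $allele = substr($hap, $i, 1);
            $has_ancestral = 1 if $allele eq '0';
            $has_derived   = 1 if $allele eq '1';
        }
        $segsites++ if $has_ancestral && $has_derived;
    }
    return $segsites;
}

print count_segsites('1001', '1100', '1000'), "\n";    # 2
```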

       get_pop_mut_param_per_site

       Title   : get_pop_mut_param_per_site
       Usage   : $pop_mut_param_per_site = $stream->get_pop_mut_param_per_site()
       Function: returns 4*N0*mu, the "population mutation parameter per site"
       Returns : scalar
       Args    : NONE

       get_pop_recomb_param_per_site

       Title   : get_pop_recomb_param_per_site
       Usage   : $pop_recomb_param_per_site = $stream->get_pop_recomb_param_per_site()
       Function: returns 4*N0*r, the "population recombination parameter per site"
       Returns : scalar
       Args    : NONE

       get_nsites

       Title   : get_nsites
       Usage   : $nsites = $stream->get_nsites()
       Function: returns the number of sites simulated by mbs.
       Returns : scalar
       Args    : NONE
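       Because get_pop_mut_param_per_site() and get_pop_recomb_param_per_site() are per-site
       values, multiplying one by get_nsites() gives the corresponding parameter for the whole
       simulated region, and dividing by 4*N0 recovers the per-generation rate. The sketch below
       assumes mu and r are per-site, per-generation rates; the helper names and all numbers are
       hypothetical, and N0 is not reported by these methods, so it has to be supplied by the
       user.

```perl
use strict;
use warnings;

# Hypothetical helper: scale a per-site parameter (4*N0*mu or 4*N0*r) to a
# region of $nsites sites.
sub region_param {
    my ($per_site, $nsites) = @_;
    return $per_site * $nsites;
}

# Hypothetical helper: recover the per-site, per-generation rate (mu or r)
# from 4*N0*x, given an assumed reference population size N0.
sub per_generation_rate {
    my ($per_site, $n0) = @_;
    return $per_site / (4 * $n0);
}

# Illustrative values only, not taken from any real mbs run.
my ($theta_site, $nsites, $n0) = (0.001, 10_000, 10_000);
printf "theta for region: %g\n", region_param($theta_site, $nsites);     # 10
printf "mu per site:      %g\n", per_generation_rate($theta_site, $n0);  # 2.5e-08
```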

       get_selpos

       Title   : get_selpos
       Usage   : $selpos = $stream->get_selpos()
       Function: returns the location on the chromosome of the allele that was
                 selected for by mbs.
       Returns : scalar
       Args    : NONE

       get_nreps

       Title   : get_nreps
       Usage   : $nreps = $stream->get_nreps()
       Function: returns the number of replications done by mbs on each
                 trajectory file to create the mbsout file.
       Returns : scalar
       Args    : NONE

       get_nfiles

       Title   : get_nfiles
       Usage   : $nfiles = $stream->get_nfiles()
       Function: returns the number of trajectory files used by mbs to create the
                 mbsout file.
       Returns : scalar
       Args    : NONE

       get_traj_filename

       Title   : get_traj_filename
       Usage   : $traj_filename = $stream->get_traj_filename()
       Function: returns the prefix of the trajectory files used by mbs to create
                 the mbsout file.
       Returns : scalar
       Args    : NONE

       get_runs

       Title   : get_runs
       Usage   : $runs = $stream->get_runs()
       Function: returns the number of runs in the mbsout file.
       Returns : scalar
       Args    : NONE

       get_Positions

       Title   : get_Positions
       Usage   : @positions = $stream->get_Positions()
       Function: returns an array of the names of each segsite of the run of the
                 last read hap.
       Returns : array
       Args    : NONE

       get_tot_run_haps

       Title   : get_tot_run_haps
       Usage   : $number_of_haps_per_run = $stream->get_tot_run_haps()
       Function: returns the number of haplotypes (sequences) in each run of the
                 mbsout file.
       Returns : scalar >= 0
       Args    : NONE

       get_mbs_info_line

       Title   : get_mbs_info_line
       Usage   : $mbs_info_line = $stream->get_mbs_info_line()
       Function: returns the header line of the mbsout file.
       Returns : scalar
       Args    : NONE

       tot_haps

       Title   : tot_haps
       Usage   : $number_of_haplotypes_in_file = $stream->tot_haps()
       Function: returns the number of haplotypes (sequences) in the mbsout file.
                 Information gathered from the mbsout header line.
       Returns : scalar
       Args    : NONE
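       Several of the header-derived counts should agree with each other. The sketch below
       assumes, from the method descriptions above, that the number of runs equals get_nreps()
       times get_nfiles() and that tot_haps() equals the number of runs times get_tot_run_haps().
       Both relationships and the helper expected_totals are assumptions for illustration, not
       confirmed behaviour of the module; comparing them with what a stream returns is one way
       to check a file.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumed relationships (not confirmed against the
# module's source):
#   runs     = nreps * nfiles        (nreps replications per trajectory file)
#   tot_haps = runs  * tot_run_haps  (tot_run_haps haplotypes per run)
sub expected_totals {
    my (%h) = @_;
    my $runs     = $h{nreps} * $h{nfiles};
    my $tot_haps = $runs * $h{tot_run_haps};
    return ($runs, $tot_haps);
}

# Illustrative header values only.
my ($runs, $tot_haps) = expected_totals(nreps => 10, nfiles => 2, tot_run_haps => 50);
print "runs=$runs tot_haps=$tot_haps\n";    # runs=20 tot_haps=1000
```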

       next_run_num

       Title   : next_run_num
       Usage   : $next_run_number = $stream->next_run_num()
       Function: returns the number of the mbs run that the next haplotype
                 (sequence) will be taken from (starting at 1). Returns undef if
                 the complete file has been read.
       Returns : scalar > 0 or undef
       Args    : NONE

       get_last_haps_run_num

       Title   : get_last_haps_run_num
       Usage   : $last_haps_run_number = $stream->get_last_haps_run_num()
       Function: returns the number of the mbs run that the last haplotype
                 (sequence) was taken from (starting at 1). Returns undef if no
                 hap has been read yet.
       Returns : scalar > 0 or undef
       Args    : NONE

       get_last_read_hap_num

       Title   : get_last_read_hap_num
       Usage   : $last_read_hap_num = $stream->get_last_read_hap_num()
       Function: returns the number (starting with 1) of the last haplotype read
                 from the mbs file.
       Returns : scalar >= 0
       Args    : NONE
       Details : 0 means that no haplotype has been read yet.
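       The run-numbering methods follow from simple arithmetic if every run holds the same
       number of haplotypes. The hypothetical helpers run_of_hap and next_run_of below are not
       part of the module; they assume fixed-size runs and 1-based numbering. Under those
       assumptions, run_of_hap applied to get_last_read_hap_num() should match
       get_last_haps_run_num(), and next_run_of mirrors next_run_num(), including returning
       undef once the whole file has been read.

```perl
use strict;
use warnings;

# Hypothetical helper: map a 1-based haplotype number to the 1-based run it
# belongs to, assuming every run holds exactly $haps_per_run haplotypes.
sub run_of_hap {
    my ($hap_num, $haps_per_run) = @_;
    return int(($hap_num - 1) / $haps_per_run) + 1;
}

# Hypothetical helper: run of the next haplotype to be read, or undef once
# all $tot_haps haplotypes have been read (cf. next_run_num()).
sub next_run_of {
    my ($last_read_hap_num, $haps_per_run, $tot_haps) = @_;
    return if $last_read_hap_num >= $tot_haps;
    return run_of_hap($last_read_hap_num + 1, $haps_per_run);
}

# Three haplotypes per run, six haplotypes in total (illustrative values).
for my $last (0 .. 6) {
    my $next = next_run_of($last, 3, 6);
    printf "last read %d -> next run %s\n", $last, defined $next ? $next : 'undef';
}
```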

       outgroup

       Title   : outgroup
       Usage   : $outgroup = $stream->outgroup()
       Function: returns '1' if the mbsout object has an outgroup. Returns '0'
                 otherwise.
       Returns : 1 or 0, currently always 0
       Args    : NONE
       Details : This method will return '1' only if the last population in the
                 mbsout file contains only one haplotype. If the last population
                 is not an outgroup, then create the mbsout object using
                 'no_outgroup' as input parameter for new() (see mbsout->new()).

                 Currently there is no way of introducing an outgroup into an
                 mbs file, so this method will always return '0'.

       get_next_seq

       Title   : get_next_seq
       Usage   : $seq = $stream->get_next_seq()
       Function: reads and returns the next sequence (haplotype) in the stream
       Returns : Bio::Seq object
       Args    : NONE
       Note    : This method is included only to conform to convention. It only
                 calls get_next_hap() and passes on that method's return value.
                 Use get_next_hap() instead for better performance.

       get_next_hap

       Title   : get_next_hap
       Usage   : $seq = $stream->get_next_hap()
       Function: reads and returns the next sequence (haplotype) in the stream.
                 Returns void if all sequences in the stream have been read.
       Returns : Bio::Seq object
       Args    : NONE
       Note    : Use this instead of get_next_seq().

       get_next_run

       Title   : get_next_run
       Usage   : @seqs = $stream->get_next_run()
       Function: reads and returns all the remaining sequences (haplotypes) in
                 the mbs run of the next sequence.
       Returns : array of Bio::Seq objects
       Args    : NONE
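       Under a fixed-run-size assumption, the haplotypes returned by get_next_run() are the next
       haplotype plus every remaining haplotype in its run. The hypothetical helper
       remaining_in_next_run below computes their 1-based numbers from the last-read haplotype
       number (as reported by get_last_read_hap_num()); it is a sketch of the bookkeeping, not
       the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based numbers of the haplotypes a get_next_run()
# call would return, i.e. the next haplotype and every remaining haplotype
# in the same run. Assumes every run holds exactly $haps_per_run haplotypes.
sub remaining_in_next_run {
    my ($last_read_hap_num, $haps_per_run, $tot_haps) = @_;
    my $next = $last_read_hap_num + 1;
    return () if $next > $tot_haps;
    my $run     = int(($next - 1) / $haps_per_run) + 1;
    my $run_end = $run * $haps_per_run;
    return ($next .. $run_end);
}

# Four haplotypes per run; haplotype 5 (the first of run 2) has been read.
print join(',', remaining_in_next_run(5, 4, 12)), "\n";    # 6,7,8
```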

   METHODS TO RETRIEVE CONSTANTS
       get_base_conversion_table

       Title   : get_base_conversion_table
       Usage   : $table_hash_ref = $stream->get_base_conversion_table()
       Function: returns a reference to a hash. The keys of the hash are the
                 letters 'A', 'T', 'G' and 'C'. The value associated with each
                 key is the value that the letter should be translated to in the
                 sequence of a seq object returned by a Bio::SeqIO::mbsout
                 stream.
       Returns : reference to a hash
       Args    : NONE
       Synopsis:

               # retrieve the Bio::Seq object's sequence
               my $haplotype = $seq->seq;
               my $rh_base_conversion_table = $stream->get_base_conversion_table();

               # need to convert all letters to their corresponding numbers.
               foreach my $base (keys %{$rh_base_conversion_table}){
                       $haplotype =~ s/($base)/$rh_base_conversion_table->{$base}/g;
               }

               # $haplotype is now an ms style haplotype. (e.g. '100101101455')