oracular (3) Bio::SeqIO::msout.3pm.gz

Provided by: libbio-perl-perl_1.7.8-1_all

NAME

       Bio::SeqIO::msout - input stream for output by Hudson's ms

SYNOPSIS

       Do not use this module directly.  Use it via the Bio::SeqIO class.
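       As a minimal sketch of that advice, the following opens an ms output file through
       Bio::SeqIO with the 'msout' format and collects each haplotype as a string. The helper
       name read_haplotypes and the command-line file argument are illustrative assumptions;
       -file and -format are the usual Bio::SeqIO constructor arguments, and get_next_hap() is
       described under METHODS.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every haplotype in an ms output file as a
# string of 0s and 1s, reading through the generic Bio::SeqIO front end.
sub read_haplotypes {
    my ($file) = @_;

    # Loaded at call time so this file compiles even without BioPerl installed.
    require Bio::SeqIO;

    my $stream = Bio::SeqIO->new( -file => $file, -format => 'msout' );

    my @haps;
    # get_next_hap() returns undef once the stream is exhausted.
    while ( defined( my $hap = $stream->get_next_hap ) ) {
        push @haps, $hap;
    }
    return @haps;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    print "$_\n" for read_haplotypes( $ARGV[0] );
}
```

       Replacing get_next_hap() with next_seq() in the loop would yield Bio::Seq objects whose
       sequences are coded in A, T, C and G, as explained in the DESCRIPTION.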

DESCRIPTION

       ms (Hudson, R. R. (2002) Generating samples under a Wright-Fisher neutral model.
       Bioinformatics 18:337-8) can be found at
       http://home.uchicago.edu/~rhudson1/source/mksamples.html.

       Currently, this object can be used to read output from ms into seq objects.  However,
       because bioperl has no support for haplotypes created using an infinite sites model (where
       '1' identifies a derived allele and '0' identifies an ancestral allele), the sequences
       returned by msout are coded using A, T, C and G. To decode the bases, use the sequence
       conversion table (a hash) returned by get_base_conversion_table(). In the table, 4 and 5
       are used when the ancestry is unclear. This should never happen in files created by ms,
       but these codes will be needed once msOUT files can be created from a collection of seq
       objects (to be added later). Alternatively, use get_next_hap() to get a string of 1's and
       0's instead of a seq object.
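       The coding scheme can be pictured as a simple two-way lookup. The sketch below is
       self-contained and does not use BioPerl; the table values (A=0, T=1, C=4, G=5) are an
       assumption made for illustration, so real code should take the hash returned by
       get_base_conversion_table() rather than hard-coding it. The sub names are hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: illustrative table only. In real code, use the hash returned by
# $stream->get_base_conversion_table() instead of these hard-coded values.
my %letter_to_state = ( A => 0, T => 1, C => 4, G => 5 );
my %state_to_letter = reverse %letter_to_state;

# Turn a coded sequence (letters) into an ms-style haplotype (digits).
sub decode_haplotype {
    my ($coded) = @_;
    ( my $hap = $coded ) =~ s/([ATCG])/$letter_to_state{$1}/g;
    return $hap;
}

# Turn an ms-style haplotype (digits) back into a coded sequence (letters).
sub encode_haplotype {
    my ($hap) = @_;
    ( my $coded = $hap ) =~ s/([0145])/$state_to_letter{$1}/g;
    return $coded;
}

print decode_haplotype('TAATCG'), "\n";    # prints 100145
print encode_haplotype('100145'), "\n";    # prints TAATCG
```

       Both directions use a single substitution pass, so a digit produced by one replacement
       is never matched again by a later one.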

   Mapping to Finite Sites
       This object can now also be used to map haplotypes created using an infinite sites model
       to sequences of arbitrary finite length.  See set_n_sites() for more detail.  Thanks to
       Filipe G. Vieira <fgvieira@berkeley.edu> for the idea and code.
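       To make the idea concrete, here is one plausible way to place infinite-sites alleles on
       a sequence of fixed length. It is only a sketch and is not taken from the module's
       source: the sub name map_to_finite_sites, the rule of scaling each ms position (a number
       between 0 and 1) by n_sites, and the handling of collisions are all assumptions. The
       module's own behaviour may differ, for instance in the final sequence length.

```perl
use strict;
use warnings;

# Hypothetical sketch: place each segregating site of $hap at the index
# int(position * n_sites) and fill every other site with the ancestral '0'.
sub map_to_finite_sites {
    my ( $hap, $positions, $n_sites ) = @_;

    my @alleles = split //, $hap;
    die "haplotype and positions differ in length\n"
        unless @alleles == @$positions;
    die "n_sites must be at least the number of segsites\n"
        if $n_sites < @alleles;

    my @sites = ('0') x $n_sites;    # non-segregating sites stay ancestral
    my @taken = (0) x $n_sites;

    for my $i ( 0 .. $#alleles ) {
        my $idx = int( $positions->[$i] * $n_sites );
        $idx = $n_sites - 1 if $idx >= $n_sites;    # a position of exactly 1.0

        # On a collision take the next free site, wrapping at the end.
        # (Wrapping can reorder sites; acceptable for a sketch only.)
        $idx = ( $idx + 1 ) % $n_sites while $taken[$idx];

        $taken[$idx] = 1;
        $sites[$idx] = $alleles[$i];
    }
    return join '', @sites;
}

print map_to_finite_sites( '101', [ 0.12, 0.50, 0.87 ], 10 ), "\n";    # prints 0100000010
```

       The collision handling shows why n_sites should be comfortably larger than the number of
       segregating sites: when the two are close, many positions compete for the same index.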

FEEDBACK

   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send
       your comments and suggestions preferably to the Bioperl mailing list. Your participation
       is much appreciated.

         bioperl-l@bioperl.org                  - General discussion
         http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their
       resolution. Bug reports can be submitted via the web:

         https://github.com/bioperl/bioperl-live/issues

AUTHOR - Warren Kretzschmar

       This module was written by Warren Kretzschmar.

       email: wkretzsch@gmail.com

       This module grew out of a parser written by Aida Andres.

   Public Domain Notice
       This software/database is "United States Government Work" under the terms of the United
       States Copyright Act. It was written as part of the authors' official duties for the
       United States Government and thus cannot be copyrighted. This software/database is freely
       available to the public for use without a copyright notice. Restrictions cannot be placed
       on its present or future use.

       Although all reasonable efforts have been taken to ensure the accuracy and reliability of
       the software and data, the National Human Genome Research Institute (NHGRI) and the U.S.
       Government do not and cannot warrant the performance or results that may be obtained by
       using this software or data.  NHGRI and the U.S. Government disclaim all warranties as to
       performance, merchantability or fitness for any particular purpose.

METHODS

   Methods for Internal Use
       _initialize

       Title   : _initialize
       Usage   : $stream = Bio::SeqIO::msout->new($infile)
       Function: extracts basic information about the file.
       Returns : Bio::SeqIO object
       Args    : no_og, gunzip, gzip, n_sites
       Details :
           - include 'no_og' flag if the last population of an msout file contains
             only one haplotype and you don't want the last haplotype to be
             treated as the outgroup (suggested when reading data created by ms).
           - including 'n_sites' (positive integer) causes all output haplotypes to be
             mapped to a sequence of length 'n_sites'. See set_n_sites() for more details.

       _read_start

       Title   : _read_start
       Usage   : $stream->_read_start()
       Function: reads from the filehandle $stream->{_filehandle} all information up to
                 the first haplotype (sequence).  Closes the filehandle if all lines
                 have been read.
       Returns : void
       Args    : none

   Methods to Access Data
       get_segsites

       Title   : get_segsites
       Usage   : $segsites = $stream->get_segsites()
       Function: returns the number of segsites in the msOUT file (according to the
                 msOUT header line's -s option), or the current run's segsites if -s
                 was not specified in the command line (in this case the number of
                 segsites varies from run to run).
       Returns : scalar
       Args    : NONE

       get_current_run_segsites

       Title   : get_current_run_segsites
       Usage   : $segsites = $stream->get_current_run_segsites()
       Function: returns the number of segsites in the run of the last read
                 haplotype (sequence).
       Returns : scalar
       Args    : NONE

       get_n_sites

       Title   : get_n_sites
       Usage   : $n_sites = $stream->get_n_sites()
       Function: Gets the number of total sites (variable or not) to be output.
       Returns : scalar if n_sites option is defined at call time of new()
       Args    : NONE
       Note    :
                 WARNING: Final sequence length might not be equal to n_sites if n_sites is
                          too close to number of segregating sites in the msout file.

       set_n_sites

       Title   : set_n_sites
       Usage   : $n_sites = $stream->set_n_sites($value)
       Function: Sets the number of total sites (variable or not) to be output.
       Returns : 1 on success; throws an error if $value is neither a positive integer
                 nor undef
       Args    : positive integer
       Note    :
                 WARNING: Final sequence length might not be equal to n_sites if it is
                          too close to number of segregating sites.
                 - n_sites needs to be at least as large as the number of segsites of
                   the next haplotype returned
                 - n_sites may also be set to undef, in which case haplotypes are returned
                   under the infinite sites model assumptions.

       get_runs

       Title   : get_runs
       Usage   : $runs = $stream->get_runs()
       Function: returns the number of runs in the msOUT file (according to the
                 msinfo line)
       Returns : scalar
       Args    : NONE

       get_Seeds

       Title   : get_Seeds
       Usage   : @seeds = $stream->get_Seeds()
       Function: returns an array of the seeds used in the creation of the msOUT file.
       Returns : array
       Args    : NONE
       Details : In older versions, ms used three seeds.  Newer versions of ms seem to
                 use only one (longer) seed.  This function will return all the seeds
                 found.

       get_Positions

       Title   : get_Positions
       Usage   : @positions = $stream->get_Positions()
       Function: returns an array of the names of each segsite of the run of the last
                 read hap.
       Returns : array
       Args    : NONE
       Details : The Positions may or may not vary from run to run depending on the
                 options used with ms.

       get_tot_run_haps

       Title   : get_tot_run_haps
       Usage   : $number_of_haps_per_run = $stream->get_tot_run_haps()
       Function: returns the number of haplotypes (sequences) in each run of the msOUT
                 file (according to the msinfo line).
       Returns : scalar >= 0
       Args    : NONE
       Details : This number should not vary from run to run.

       get_ms_info_line

       Title   : get_ms_info_line
       Usage   : $ms_info_line = $stream->get_ms_info_line()
       Function: returns the header line of the msOUT file.
       Returns : scalar
       Args    : NONE

       tot_haps

       Title   : tot_haps
       Usage   : $number_of_haplotypes_in_file = $stream->tot_haps()
       Function: returns the number of haplotypes (sequences) in the msOUT file.
                 Information gathered from msOUT header line.
       Returns : scalar
       Args    : NONE

       get_Pops

       Title   : get_Pops
       Usage   : @pops = $stream->get_Pops()
       Function: returns an array of population sizes (order taken from the -I flag in
                 the msOUT header line).  This array will include the last hap even if
                 it looks like an outgroup.
       Returns : array of scalars > 0
       Args    : NONE
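       Several of the accessors above report values taken from the header (ms command) line.
       The following self-contained sketch shows how such values can be read off an ms command
       line of the form "ms nsam nreps ... -s segsites -I npop n1 n2 ...". It is an
       illustration, not the module's parser: the sub name parse_ms_info_line, the hash keys,
       and the choice to report a single population when -I is absent are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: pull the basic quantities out of an ms command line.
sub parse_ms_info_line {
    my ($line) = @_;

    my @tok = split ' ', $line;
    shift @tok;                            # program name, e.g. 'ms' or a path to it
    my $haps_per_run = shift @tok;         # nsam: haplotypes per run
    my $runs         = shift @tok;         # nreps: number of runs

    my %info = (
        haps_per_run => $haps_per_run,
        runs         => $runs,
        tot_haps     => $haps_per_run * $runs,
        segsites     => undef,             # stays undef when -s is not given
        pops         => [],
    );

    while ( defined( my $t = shift @tok ) ) {
        if ( $t eq '-s' ) {
            $info{segsites} = shift @tok;
        }
        elsif ( $t eq '-I' ) {
            my $npop = shift @tok;
            $info{pops} = [ splice @tok, 0, $npop ];    # one size per population
        }
    }

    # Sketch's choice: without -I, treat all haplotypes as one population.
    $info{pops} = [$haps_per_run] unless @{ $info{pops} };

    return \%info;
}

my $info = parse_ms_info_line('ms 6 2 -s 7 -I 3 1 2 3');
print "runs=$info->{runs} haps_per_run=$info->{haps_per_run} ",
    "segsites=$info->{segsites} pops=@{ $info->{pops} }\n";
```

       With the example line, the last population has a single haplotype, which is the
       situation in which outgroup() would report an outgroup unless 'no_og' is given.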

       get_next_run_num

       Title   : get_next_run_num
       Usage   : $next_run_number = $stream->get_next_run_num()
       Function: returns the number of the ms run that the next haplotype (sequence)
                 will be taken from (starting at 1).  Returns undef if the complete
                 file has been read.
       Returns : scalar > 0 or undef
       Args    : NONE

       get_last_haps_run_num

       Title   : get_last_haps_run_num
       Usage   : $last_haps_run_number = $stream->get_last_haps_run_num()
       Function: returns the number of the ms run that the last haplotype (sequence)
                 was taken from (starting at 1).  Returns undef if no hap has been
                 read yet.
       Returns : scalar > 0 or undef
       Args    : NONE

       get_last_read_hap_num

       Title   : get_last_read_hap_num
       Usage   : $last_read_hap_num = $stream->get_last_read_hap_num()
       Function: returns the number (starting with 1) of the last haplotype read from
                 the ms file
       Returns : scalar >= 0
       Args    : NONE
       Details : 0 means that no haplotype has been read yet.  Is reset to 0 every run.

       outgroup

       Title   : outgroup
       Usage   : $outgroup = $stream->outgroup()
       Function: returns '1' if the msOUT stream has an outgroup.  Returns '0'
                 otherwise.
       Returns : '1' or '0'
       Args    : NONE
       Details : This method will return '1' only if the last population in the msOUT
                 file contains only one haplotype.  If the last population is not an
                 outgroup then create the msOUT object using 'no_og' as input flag.
                 Also, return 0, if the run has only one population.

       get_next_haps_pop_num

       Title   : get_next_haps_pop_num
       Usage   : ($next_haps_pop_num, $num_haps_left_in_pop) = $stream->get_next_haps_pop_num()
       Function: First return value is the population number (starting with 1) the
                 next hap will come from. The second return value is the number of haps
                 left to read in the population from which the next hap will come.
       Returns : (scalar > 0, scalar > 0)
       Args    : NONE

       get_next_seq

       Title   : get_next_seq
       Usage   : $seq = $stream->get_next_seq()
       Function: reads and returns the next sequence (haplotype) in the stream
       Returns : Bio::Seq object or void if end of file
       Args    : NONE
       Note    : This function is included only to conform to convention.  The
                 returned Bio::Seq object holds a haplotype in coded form. Use the hash
                 returned by get_base_conversion_table() to convert 'A', 'T', 'C', 'G'
                 back into 0, 1, 4 and 5. Use get_next_hap() to retrieve the haplotype as
                 a string of 0, 1, 4 and 5s instead.

       next_seq

       Title   : next_seq
       Usage   : $seq = $stream->next_seq()
       Function: Alias to get_next_seq()
       Returns : Bio::Seq object or void if end of file
       Args    : NONE
       Note    : This function is only included for convention.  It calls get_next_seq().
                 See get_next_seq() for details.

       get_next_hap

       Title   : get_next_hap
       Usage   : $hap = $stream->get_next_hap()
       Function: reads and returns the next sequence (haplotype) in the stream.
                 Returns undef if all sequences in stream have been read.
       Returns : Haplotype string (e.g. '110110000101101045454000101')
       Args    : NONE
       Note    : Use get_next_seq() if you want the haplotype returned as a
                 Bio::Seq object.

       get_next_pop

       Title   : get_next_pop
       Usage   : @seqs = $stream->get_next_pop()
       Function: reads and returns all the remaining sequences (haplotypes) in the
                 population of the next sequence.  Returns an empty list if no more
                 haps remain to be read in the stream
       Returns : array of Bio::Seq objects
       Args    : NONE

       next_run

       Title   : next_run
       Usage   : @seqs = $stream->next_run()
       Function: reads and returns all the remaining sequences (haplotypes) in the ms
                 run of the next sequence.  Returns an empty list if all haps have been
                 read from the stream.
       Returns : array of Bio::Seq objects
       Args    : NONE
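       Because next_run() returns an empty list once the stream is exhausted, it can drive a
       loop that groups sequences run by run. The sketch below assumes BioPerl is installed and
       that the file name passed in points to ms output; the helper name seqs_by_run is
       hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of array references, one per ms run,
# each holding the coded sequence strings of that run.
sub seqs_by_run {
    my ($file) = @_;

    # Loaded at call time so this file compiles even without BioPerl installed.
    require Bio::SeqIO;

    my $stream = Bio::SeqIO->new( -file => $file, -format => 'msout' );

    my @runs;
    # An empty list from next_run() ends the loop.
    while ( my @seqs = $stream->next_run ) {
        push @runs, [ map { $_->seq } @seqs ];
    }
    return @runs;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    my $n = 0;
    for my $run ( seqs_by_run( $ARGV[0] ) ) {
        $n++;
        print "run $n: ", scalar(@$run), " sequences\n";
    }
}
```

       The same loop shape should work with get_next_pop() to group sequences by population
       instead of by run.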

   Methods to Retrieve Constants
       get_base_conversion_table

       Title   : get_base_conversion_table
       Usage   : $table_hash_ref = $stream->get_base_conversion_table()
       Function: returns a reference to a hash.  The keys of the hash are the letters
                 'A', 'T', 'G', 'C'. The value associated with each key is the value that
                 each letter in the sequence of a seq object returned by a
                 Bio::SeqIO::msout stream should be translated to.
       Returns : reference to a hash
       Args    : NONE
       Synopsis:

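               # fetch the conversion table and the next coded sequence from the stream
               my $rh_base_conversion_table = $stream->get_base_conversion_table();
               my $seq = $stream->next_seq;

               # retrieve the Bio::Seq object's sequence
               my $haplotype = $seq->seq;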

               # need to convert all letters to their corresponding numbers.
               foreach my $base (keys %{$rh_base_conversion_table}){
                       $haplotype =~ s/($base)/$rh_base_conversion_table->{$base}/g;
               }

               # $haplotype is now an ms style haplotype. (e.g. '100101101455')