Provided by: grinder_0.5.4-4_all

NAME

       Grinder::KmerCollection - A collection of kmers from sequences

SYNOPSIS

         my $col = Grinder::KmerCollection->new( -k    => 10,
                                                 -file => 'seqs.fa' );

DESCRIPTION

       Manage a collection of kmers found in various sequences. Store information about what
       sequence a kmer was found in and its starting position on the sequence.
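
       The bookkeeping described above can be illustrated with a minimal, self-contained
       sketch. It does not use Grinder itself: index_kmers() is a hypothetical helper, the
       toy sequences are invented, and the 1-based start positions are an assumption rather
       than documented behaviour of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Grinder): record, for every kmer of
# length $k, which sequences contain it and where it starts.
# Start positions are 1-based here by assumption.
sub index_kmers {
    my ($k, %seqs) = @_;
    my %by_kmer;
    for my $id (sort keys %seqs) {
        my $seq = uc $seqs{$id};
        # Sequences shorter than $k give an empty range and are skipped
        for my $i (0 .. length($seq) - $k) {
            my $kmer = substr $seq, $i, $k;
            push @{ $by_kmer{$kmer}{$id} }, $i + 1;
        }
    }
    return \%by_kmer;
}

my $by_kmer = index_kmers(3, seq1 => 'ACGACG', seq2 => 'CGA');
```

       In this toy input, the kmer ACG starts at two positions on seq1, while CGA is found in
       both seq1 and seq2.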

AUTHOR

       Florent Angly <florent.angly@gmail.com>

APPENDIX

       The rest of the documentation details each of the object methods. Internal methods are
       usually preceded by an underscore (_).

   new
        Title   : new
        Usage   : my $col = Grinder::KmerCollection->new( -k => 10, -file => 'seqs.fa', -revcom => 1 );
        Function: Build a new kmer collection
        Args    : -k        set the kmer length (default: 10 bp)
                  -revcom   count kmers before and after reverse-complementing sequences
                            (default: 0)
                  -seqs     count kmers in the provided arrayref of sequences (Bio::Seq
                            or Bio::SeqFeature objects)
                  -ids      if specified, index the sequences provided to -seqs using the
                            IDs in this arrayref instead of using each sequence's
                            $seq->id() method
                  -file     count kmers in the provided file of sequences
                  -weights  if specified, assign the abundance of each sequence from the
                            values in this arrayref

        Returns : Grinder::KmerCollection object

   k
        Usage   : $col->k;
        Function: Get the length of the kmers
        Args    : None
        Returns : Positive integer

   weights
        Usage   : $col->weights({'seq1' => 3, 'seq10' => 0.45});
        Function: Get or set the weight of each sequence. Each sequence is given a
                  weight of 1 by default.
        Args    : hashref where the keys are sequence IDs and the values are the weights
                  of the corresponding sequences (e.g. their relative abundance)
        Returns : Grinder::KmerCollection object

   collection_by_kmer
        Usage   : $col->collection_by_kmer;
        Function: Get the collection of kmers, indexed by kmer
        Args    : None
        Returns : A hashref of hashref of arrayref:
                     hash->{kmer}->{ID of sequences with this kmer}->[starts of kmer on sequence]

   collection_by_seq
        Usage   : $col->collection_by_seq;
        Function: Get the collection of kmers, indexed by sequence ID
        Args    : None
        Returns : A hashref of hashref of arrayref:
                     hash->{ID of sequences with this kmer}->{kmer}->[starts of kmer on sequence]
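
        The two views hold the same information with the outer two key levels swapped. The
        sketch below uses invented toy data shaped like the documented kmer-indexed
        structure and derives the sequence-indexed one from it; how the module builds these
        structures internally is not stated here.

```perl
use strict;
use warnings;

# Toy data (invented) shaped like: hash->{kmer}->{sequence ID}->[starts]
my $by_kmer = {
    ACG => { seq1 => [1, 4] },
    CGA => { seq1 => [2], seq2 => [1] },
};

# Swap the two outer key levels: hash->{sequence ID}->{kmer}->[starts]
my %by_seq;
for my $kmer (keys %$by_kmer) {
    for my $id (keys %{ $by_kmer->{$kmer} }) {
        $by_seq{$id}{$kmer} = $by_kmer->{$kmer}{$id};
    }
}
```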

   add_file
        Usage   : $col->add_file('seqs.fa');
        Function: Process the kmers in the given file of sequences.
        Args    : filename
        Returns : Grinder::KmerCollection object

   add_seqs
        Usage   : $col->add_seqs([$seq1, $seq2]);
        Function: Process the kmers in the given sequences.
        Args    : * arrayref of Bio::Seq or Bio::SeqFeature objects
                  * arrayref of IDs to use for the indexing of the sequences
        Returns : Grinder::KmerCollection object

   filter_rare
        Usage   : $col->filter_rare( 2 );
        Function: Remove kmers whose (weighted) abundance is less than the specified value
        Args    : integer
        Returns : Grinder::KmerCollection object
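
        As a rough illustration of filtering by weighted abundance, the sketch below works
        on invented toy data rather than on a Grinder object. It assumes that the weighted
        abundance of a kmer is the sum, over sequences, of its number of occurrences times
        the sequence weight; the module may compute it differently.

```perl
use strict;
use warnings;

# Toy data (invented): hash->{kmer}->{sequence ID}->[starts]
my %by_kmer = (
    ACG => { seq1 => [1, 4] },
    CGA => { seq1 => [2], seq2 => [1] },
    GAC => { seq1 => [3] },
);
my %weight = (seq1 => 1, seq2 => 0.5);   # unlisted sequences default to 1
my $min    = 2;                          # abundance threshold

for my $kmer (keys %by_kmer) {
    # Assumed definition: occurrences on each sequence times its weight
    my $abundance = 0;
    for my $id (keys %{ $by_kmer{$kmer} }) {
        my $w = exists $weight{$id} ? $weight{$id} : 1;
        $abundance += $w * scalar @{ $by_kmer{$kmer}{$id} };
    }
    delete $by_kmer{$kmer} if $abundance < $min;
}
```

        With these values only ACG remains: its weighted abundance is 2, while CGA reaches
        1.5 and GAC reaches 1.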

   filter_shared
        Usage   : $col->filter_shared( 2 );
        Function: Remove kmers occurring in fewer than the specified number of sequences
        Args    : integer
        Returns : Grinder::KmerCollection object
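
        The same idea, sketched on invented toy data for the number of sequences sharing a
        kmer. This is an illustration of the criterion, not the module's implementation.

```perl
use strict;
use warnings;

# Toy data (invented): hash->{kmer}->{sequence ID}->[starts]
my %by_kmer = (
    ACG => { seq1 => [1, 4] },
    CGA => { seq1 => [2], seq2 => [1] },
    GAC => { seq1 => [3] },
);
my $min_seqs = 2;   # minimum number of sequences that must contain a kmer

for my $kmer (keys %by_kmer) {
    # keys() in scalar context gives the number of sequences with this kmer
    my $num_seqs = keys %{ $by_kmer{$kmer} };
    delete $by_kmer{$kmer} if $num_seqs < $min_seqs;
}
```

        Only CGA remains, since it is the only kmer present in two sequences. Note that ACG
        is removed even though it occurs twice, because both occurrences are on seq1.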

   counts
        Usage   : $col->counts;
        Function: Calculate the total count of each kmer. Counts are affected by the
                  weights given to the sequences.
        Args    : * restrict the search to the sequence with the specified ID (optional)
                  * starting position from which counting should start (optional)
                  * 0 to report counts (default), 1 to report frequencies (normalize to 1)
        Returns : * arrayref of the different kmers
                  * arrayref of the corresponding total counts
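
        The difference between counts and frequencies can be shown with a short sketch. The
        two parallel arrays below are invented stand-ins for the documented return values,
        and dividing by the grand total is an assumption about what "normalize to 1" means.

```perl
use strict;
use warnings;

# Invented stand-ins for the two parallel arrays returned by counts()
my @kmers  = ('ACG', 'CGA', 'GAC');
my @counts = (2, 1.5, 0.5);

# Normalize so that the frequencies sum to 1
my $total = 0;
$total += $_ for @counts;
my @freqs = $total > 0 ? map { $_ / $total } @counts : @counts;
```

        Here the total is 4, so the frequencies are 0.5, 0.375 and 0.125, in the same order
        as the kmers.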

   sources
        Usage   : $col->sources();
        Function: Return the sources of a kmer and their (weighted) abundance.
        Args    : * kmer to get the sources of
                  * sources to exclude from the results (optional)
                  * 0 to report counts (default), 1 to report frequencies (normalize to 1)
        Returns : * arrayref of the different sources
                  * arrayref of the corresponding total counts
                  If the kmer requested does not exist, the arrays will be empty.

   kmers
        Usage   : $col->kmers('seq1');
        Function: This is the inverse of sources(). Return the kmers found in a sequence
                  (given its ID) and their (weighted) abundance.
        Args    : * sequence ID to get the kmers of
                  * 0 to report counts (default), 1 to report frequencies (normalize to 1)
        Returns : * arrayref of the different kmers
                  * arrayref of the corresponding total counts
                  If the sequence ID requested does not exist, the arrays will be empty.

   positions
        Usage   : $col->positions();
        Function: Return the positions of the given kmer on a given sequence. An error
                  is reported if the kmer requested does not exist.
        Args    : * desired kmer
                  * desired sequence with this kmer
        Returns : Arrayref of the different positions. The array will be empty if the
                  desired combination of kmer and sequence was not found.