Provided by: grinder_0.5.4-4_all
NAME
Grinder::KmerCollection - A collection of kmers from sequences
SYNOPSIS
my $col = Grinder::KmerCollection->new( -k => 10, -file => 'seqs.fa' );
DESCRIPTION
Manage a collection of kmers found in various sequences. Store information about what sequence a kmer was found in and its starting position on the sequence.
AUTHOR
Florent Angly <florent.angly@gmail.com>
APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ new Title : new Usage : my $col = Grinder::KmerCollection->new( -k => 10, -file => 'seqs.fa', -revcom => 1 ); Function: Build a new kmer collection Args : -k set the kmer length (default: 10 bp) -revcom count kmers before and after reverse-complementing sequences (default: 0) -seqs count kmers in the provided arrayref of sequences (Bio::Seq or Bio::SeqFeature objects) -ids if specified, index the sequences provided to -seq using the the IDs in this arrayref instead of using the sequences $seq->id() method -file count kmers in the provided file of sequences -weights if specified, assign the abundance of each sequence from the values in this arrayref Returns : Grinder::KmerCollection object k Usage : $col->k; Function: Get the length of the kmers Args : None Returns : Positive integer weights Usage : $col->weights({'seq1' => 3, 'seq10' => 0.45}); Function: Get or set the weight of each sequence. Each sequence is given a weight of 1 by default. Args : hashref where the keys are sequence IDs and the values are the weight of the corresponding (e.g. their relative abundance) Returns : Grinder::KmerCollection object collection_by_kmer Usage : $col->collection_by_kmer; Function: Get the collection of kmers, indexed by kmer Args : None Returns : A hashref of hashref of arrayref: hash->{kmer}->{ID of sequences with this kmer}->[starts of kmer on sequence] collection_by_seq Usage : $col->collection_by_seq; Function: Get the collection of kmers, indexed by sequence ID Args : None Returns : A hashref of hashref of arrayref: hash->{ID of sequences with this kmer}->{kmer}->[starts of kmer on sequence] add_file Usage : $col->add_file('seqs.fa'); Function: Process the kmers in the given file of sequences. Args : filename Returns : Grinder::KmerCollection object add_seqs Usage : $col->add_seqs([$seq1, $seq2]); Function: Process the kmers in the given sequences. Args : * arrayref of Bio::Seq or Bio::SeqFeature objects * arrayref of IDs to use for the indexing of the sequences Returns : Grinder::KmerCollection object filter_rare Usage : $col->filter_rare( 2 ); Function: Remove kmers occurring at less than the (weighted) abundance specified Args : integer Returns : Grinder::KmerCollection object filter_shared Usage : $col->filter_shared( 2 ); Function: Remove kmers occurring in less than the number of sequences specified Args : integer Returns : Grinder::KmerCollection object counts Usage : $col->counts Function: Calculate the total count of each kmer. Counts are affected by the weights given to the sequences. Args : * restrict sequences to search to specified sequence ID (optional) * starting position from which counting should start (optional) * 0 to report counts (default), 1 to report frequencies (normalize to 1) Returns : * arrayref of the different kmers * arrayref of the corresponding total counts sources Usage : $col->sources() Function: Return the sources of a kmer and their (weighted) abundance. Args : * kmer to get the sources of * sources to exclude from the results (optional) * 0 to report counts (default), 1 to report frequencies (normalize to 1) Returns : * arrayref of the different sources * arrayref of the corresponding total counts If the kmer requested does not exist, the array will be empty. kmers Usage : $col->kmers('seq1'); Function: This is the inverse of sources(). Return the kmers found in a sequence (given its ID) and their (weighted) abundance. Args : * sequence ID to get the kmers of * 0 to report counts (default), 1 to report frequencies (normalize to 1) Returns : * arrayref of sequence IDs * arrayref of the corresponding total counts If the sequence ID requested does not exist, the arrays will be empty. positions Usage : $col->positions() Function: Return the positions of the given kmer on a given sequence. An error is reported if the kmer requested does not exist Args : * desired kmer * desired sequence with this kmer Returns : Arrayref of the different positions. The arrays will be empty if the desired combination of kmer and sequence was not found.