Provided by: libbio-db-gff-perl_1.7.4-1_all

NAME

       Bio::DB::GFF::Adaptor::berkeleydb -- Bio::DB::GFF database adaptor for berkeleydb-indexed databases

SYNOPSIS

         use Bio::DB::GFF;
         my $db = Bio::DB::GFF->new(-adaptor=> 'berkeleydb',
                                    -create => 1, # on initial build you need this
                                    -dsn    => '/usr/local/share/gff/dmel');

         # initialize an empty database, then load GFF and FASTA files
         $db->initialize(1);
         $db->load_gff('/home/drosophila_R3.2.gff');
         $db->load_fasta('/home/drosophila_R3.2.fa');

         # do queries
         my $segment  = $db->segment(Chromosome => '1R');
         my $subseg   = $segment->subseq(5000,6000);
         my @features = $subseg->features('gene');

       See Bio::DB::GFF for other methods.

DESCRIPTION

       This adaptor implements a berkeleydb-indexed version of Bio::DB::GFF.  It requires the
       DB_File and Storable modules. It can be used to store and retrieve small to medium-sized
       GFF data sets of up to several million features.

CONSTRUCTOR

       Use Bio::DB::GFF->new() to construct new instances of this class.  The following named
       arguments are recognized:

        Argument    Description
        --------    -----------

        -adaptor    Set to "berkeleydb" to create an instance of this class.

        -dsn        Path to directory where the database index files will be stored (alias -db)

        -autoindex  Monitor the indicated directory path for FASTA and GFF files, and update the
                      indexes automatically if they change (alias -dir)

        -write      Set to a true value in order to update the database.

        -create     Set to a true value to create the database the first time
                      (implies -write)

        -tmp        Location of temporary directory for storing intermediate files
                      during certain queries.

        -preferred_groups  Specify the grouping tag. See Bio::DB::GFF.

       The -dsn argument selects the directory in which to store the database index files. If the
       directory does not exist it will be created automatically, provided that the current
       process has sufficient privileges. If no -dsn argument is specified, a database named
       "test" will be created in your system's temporary files directory.

       The -tmp argument specifies the temporary directory to use for storing intermediate search
       results. If not specified, your system's temporary files directory will be used. On Unix
       systems, the TMPDIR environment variable is honored. Note that some queries can require a
       lot of space.
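
       As a rough illustration of the -tmp behavior described above, the following sketch
       resolves a temporary directory and checks it before handing it to the constructor. It
       assumes that the adaptor's default matches what the core File::Spec module reports,
       which the text above does not confirm. The database path is the one from the SYNOPSIS.

```perl
use strict;
use warnings;
use File::Spec;

# On Unix, File::Spec->tmpdir consults $ENV{TMPDIR} first and then falls back to /tmp.
# Assumption: the adaptor picks its default temporary directory in a similar way.
my $tmp = File::Spec->tmpdir;

# Fail early if the directory is missing or not writable, since some queries
# can write large intermediate files there.
die "temporary directory '$tmp' is not usable\n" unless -d $tmp && -w $tmp;

# Named arguments that could be passed to Bio::DB::GFF->new();
# only the -tmp value is computed here.
my @args = (
    -adaptor => 'berkeleydb',
    -dsn     => '/usr/local/share/gff/dmel',
    -tmp     => $tmp,
);

print "intermediate files would go to $tmp\n";
```

       Passing -tmp explicitly can be useful when the default location sits on a small
       partition.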

       The -autoindex argument, if present, selects a directory to be monitored for GFF and FASTA
       files (which can be compressed with the gzip program if desired). Whenever any file in
       this directory is changed, the index files will be updated. Note that the indexing can
       take a long time to run: anywhere from 5 to 10 minutes for a million features. An alias
       for this argument is -dir, which gives this adaptor a similar flavor to the "memory"
       adaptor.

       -dsn and -dir can point to the same directory. If -dir is given but -dsn is absent the
       index files will be stored into the directory containing the source files.  For
       autoindexing to work, you must specify the same -dir path each time you open the database.
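
       A minimal sketch of opening an autoindexed database follows. The helper name
       open_autoindexed is hypothetical, and the script takes its directories from the command
       line. It uses only the -adaptor, -dir and -dsn arguments described above; whether other
       arguments are needed in a particular setup is not covered here.

```perl
use strict;
use warnings;

# Open a database whose indexes track the GFF and FASTA files in $source_dir.
# $index_dir is optional; without it the index files are stored alongside the sources.
sub open_autoindexed {
    my ($source_dir, $index_dir) = @_;
    require Bio::DB::GFF;    # loaded lazily, so the script compiles without BioPerl
    my @args = (-adaptor => 'berkeleydb', -dir => $source_dir);
    push @args, (-dsn => $index_dir) if defined $index_dir;
    return Bio::DB::GFF->new(@args);
}

# Usage: perl open_autoindexed.pl SOURCE_DIR [INDEX_DIR]
if (@ARGV) {
    my $db       = open_autoindexed(@ARGV[0, 1]);
    my @features = $db->features(-types => ['gene']);
    printf "%d gene features found\n", scalar @features;
}
```

       Calling the script with the same SOURCE_DIR on every run keeps the autoindexing
       requirement stated above.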

       If you do not choose autoindexing, then you will want to load the database using the
       bp_load_gff.pl command-line tool. For example:

        bp_load_gff.pl -a berkeleydb -c -d /usr/local/share/gff/dmel dna1.fa dna2.fa features.gff

METHODS

       See Bio::DB::GFF for inherited methods.

BUGS

       The various get_Stream_* methods and the features() method with the -iterator argument
       only return an iterator after the query runs completely and the module has been able to
       generate a temporary results file on disk. This means that iteration is not as big a win
       as it is for the relational-database adaptors.
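
       For reference, iterating over features typically looks like the sketch below. The helper
       name count_features is hypothetical; $db is assumed to be an open Bio::DB::GFF handle.
       With this adaptor the loop only starts once the whole query has finished, so the benefit
       is lower memory use rather than a faster first result.

```perl
use strict;
use warnings;

# Walk all features of one type through an iterator and count them.
# $db is expected to be an open Bio::DB::GFF database handle.
sub count_features {
    my ($db, $type) = @_;

    # With -iterator set, features() returns an iterator object instead of a list.
    my $iterator = $db->features(-types => [$type], -iterator => 1);

    my $count = 0;
    while (my $feature = $iterator->next_seq) {
        $count++;    # a real script would process $feature here
    }
    return $count;
}
```

       A call such as count_features($db, 'gene') holds only one feature object in memory at a
       time.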

       Like the dbi::mysqlopt adaptor, this module uses a binning scheme to speed up range-based
       searches. The binning scheme used here imposes a hard-coded 1 gigabase (1000 Mbase) limit
       on the size of the largest chromosome or other reference sequence.
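
       To show where such a limit can come from, here is a sketch of one tiered binning scheme.
       It is not the module's code: the names smallest_bin, MIN_BIN and MAX_BIN, the 1 kilobase
       starting width and the power-of-ten tiers are all assumptions made for illustration.
       Only the 1 gigabase ceiling is taken from the text above.

```perl
use strict;
use warnings;

use constant MIN_BIN => 1_000;            # assumed smallest bin width, in bases
use constant MAX_BIN => 1_000_000_000;    # 1 gigabase ceiling from the text above

# Return (bin width, bin index) for the smallest bin that wholly contains the range.
sub smallest_bin {
    my ($start, $stop) = @_;
    die "range must satisfy 0 <= start <= stop\n"
        unless $start >= 0 && $start <= $stop;
    die "position $stop is beyond the 1 gigabase limit\n" if $stop >= MAX_BIN;

    # Try each tier in turn, widening the bins tenfold until both ends share a bin.
    for (my $width = MIN_BIN; $width <= MAX_BIN; $width *= 10) {
        my $first = int($start / $width);
        my $last  = int($stop / $width);
        return ($width, $first) if $first == $last;
    }
    die "no bin found\n";    # not reached, because the widest bin covers every valid range
}

my ($width, $index) = smallest_bin(5000, 6000);
print "5000..6000 fits bin $index of width $width\n";
```

       In a scheme like this, a range query only has to examine the few bins per tier that
       overlap the range, and no tier exists above the widest bin, hence the fixed ceiling.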

SEE ALSO

       Bio::DB::GFF, bioperl

AUTHORS

       Vsevolod (Simon) Ilyushchenko <simonf@cshl.edu>, Lincoln Stein <lstein@cshl.edu>

       Copyright (c) 2005 Cold Spring Harbor Laboratory.

       This library is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.

   _feature_by_name
        Title   : _feature_by_name
        Usage   : $db->_feature_by_name($class,$name,$callback)
        Function: get a list of features by name and class
        Returns : number of features retrieved
        Args    : class of feature, name of feature, and a callback
        Status  : protected

       This method is used internally.  The callback arguments are those used by make_feature().