oracular (3) Bio::DB::GFF::Adaptor::dbi::pg_fts.3pm.gz

Provided by: libbio-db-gff-perl_1.7.4-1_all bug

NAME

       Bio::DB::GFF::Adaptor::dbi::pg_fts -- Database adaptor for a specific postgres schema with
       a TSearch2 implementation

SYNOPSIS

           #create new GFF database connection
           my $db      = Bio::DB::GFF->new( -adaptor => 'dbi::pg_fts',
                                            -dsn     => 'dbi:Pg:dbname=worm');

           #add full text indexing 'stuff'
           #assumes that TSearch2 is available to PostgreSQL
           #this will take a VERY long time for a reasonably large database
           $db->install_TSearch2();

           ...some time later...
           #we don't like full text searching...
           $db->remove_TSearch2();

DESCRIPTION

       This adaptor is based on Bio::DB::GFF::Adaptor::dbi::pg but it implements the TSearch2
       PostgreSQL contrib module for fast full text searching.  To use this module with your
       PostgreSQL GFF database, you need to make TSearch2 available in the database.

       To use this adaptor, follow these steps:

       Install TSearch2 contrib module for Pg
           Can be as easy as `sudo yum install postgresql-contrib`, or you may need to recompile
           PostgreSQL to include it.  See
           <http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html>
           for more details

       Load the TSearch2 functions to you database
             % cat tsearch2.sql | psql <your database>

       Load your data using the pg adaptor:
            % bp_pg_bulk_load_gff.pl -c -d yeast saccharomyces_cerevisiae.gff

           or

            % bp_load_gff.pl -c -d yeast -a dbi::pg saccharomyces_cerevisiae.gff

       Add GFF/TSearch2 specific modifications
           Execute a perl script like this one:

             #!/usr/bin/perl -w
             use strict;

             use Bio::DB::GFF;

             my $db = Bio::DB::GFF->new(
                 -adaptor   => 'dbi::pg_fts',
                 -dsn       => 'dbi:Pg:dbname=yeast',
                 -user      => 'scott',
               );

             print "Installing TSearch2 columns...\n";

             $db->install_TSearch2();

             print "Done\n";

       Note that this last step will take a long time.  For a S. cerevisiae database with 15K
       rows, it took over an hour on my laptop, and with a C. elegans database (~10 million rows)
       it took well over a day.

       If at some point you add more data you your database, you need to run a similar script to
       the one above, only executing the update_TSearch2() method.  Finally, if you want to
       remove the TSearch2 columns from your database and go back to using the pg adaptor, you
       can execute a script like the one above, only executing the remove_TSearch2() method.

NOTES ABOUT TSearch2 SEARCHING

       You should know a few things about how searching with TSearch2 works in the GBrowse
       environment:

       1.  TSearch2 does not do wild cards, so you should encourage your users not to use them.
           If wild cards are used, the adaptor will fall back on an ILIKE search, which will be
           much slower.

       2.  However, TSearch2 does do 'word stemming'.  That is, if you search for 'copy', it will
           find 'copy', 'copies', and 'copied'.

       3.  TSearch2 does not do phrase searching; all of the terms in the search string are ANDed
           together.

ACKNOWLEDGEMENTS

       Special thanks to Russell Smithies and Paul Smale at AgResearch in New Zealand for giving
       me their recipe for doing full text indexing in a GFF database.

BUGS

       Please report bugs to the BioPerl and/or GBrowse mailing lists
       (<mailto:bioperl-l@lists.open-bio.org> and <mailto:gmod-gbrowse@lists.sourceforge.net>
       respectively).

SEE ALSO

       Please see Bio::DB::GFF::Adaptor::dbi::pg for more information about tuning your
       PostgreSQL server for GFF data, and for general information about GFF database access, see
       Bio::DB::GFF.

AUTHOR

       Scott Cain, cain@cshl.edu

APPENDIX

   search_notes
        Title   : search_notes
        Usage   : @search_results = $db->search_notes("full text string",$limit)
        Function: Search the notes for a text string, using PostgreSQL TSearch2
        Returns : array of results
        Args    : full text search string, and an optional row limit
        Status  : public

       This is based on the mysql-specific method that makes use of the TSearch2 functionality in
       PosgreSQL's contrib directory. Given a search string, it performs a full-text search of
       the notes table and returns an array of results.  Each row of the returned array is a
       arrayref containing the following fields:

         column 1   A Bio::DB::GFF::Featname object, for passing to segment()
         column 2   The text of the note
         column 3   A relevance score.

   make_features_by_name_where_part
        Title   : make_features_by_name_where_part
        Function: constructs a TSearch2-compliant WHERE clause for a name search
        Status  : protected

   install_TSearch2
        Title   : install_TSearch2
        Function: installs schema modifications for use with TSearch2
        Usage   : $db->install_TSearch2
        Status  : public

   update_TSearch2
        Title   : update_TSearch2
        Function: Updates TSearch2 columns
        Usage   : $db->update_TSearch2
        Status  : public

   remove_TSearch2
        Title   : remove_TSearch2
        Function: Removes TSearch2 columns
        Usage   : $db->remove_TSearch2
        Status  : public