Ubuntu Manpage: Tie::RDBM - Tie hashes to relational databases

NAME

       Tie::RDBM - Tie hashes to relational databases

SYNOPSIS

         use Tie::RDBM;
         tie %h,'Tie::RDBM','mysql:test',{table=>'Demo',create=>1,autocommit=>0};
         $h{'key1'} = 'Some data here';
         $h{'key2'} = 42;
         $h{'key3'} = { complex=>['data','structure','here'],works=>'true' };
         $h{'key4'} = new Foobar('Objects work too');
         print $h{'key3'}->{complex}->[0];
         tied(%h)->commit;
         untie %h;

DESCRIPTION

       This module allows you to tie Perl associative arrays (hashes) to SQL databases using the
       DBI interface.  The tied hash is associated with a table in a local or networked database.
       One field of the table becomes the hash key, and another becomes the value.  Once tied,
       all the standard hash operations work, including iteration over keys and values.

       If you have the Storable module installed, you may store arbitrarily complex Perl
       structures (including objects) into the hash and later retrieve them.  When used in
       conjunction with a network-accessible database, this provides a simple way to transmit
       data structures between Perl programs on two different machines.

TIEING A DATABASE

tie %VARIABLE,Tie::RDBM,DSN [,\%OPTIONS]

You tie a variable to a database by providing the variable name, the tie interface (always
"Tie::RDBM"), the data source name, and an optional hash reference containing various
options to be passed to the module and the underlying database driver.

The data source may be a valid DBI-style data source string of the form
"dbi:driver:database_name[:other information]", or a previously-opened database handle.
See the documentation for DBI and your DBD driver for details. Because the initial "dbi"
is always present in the data source, Tie::RDBM will automatically add it for you.

The options array contains a set of option/value pairs. If not provided, defaults are
assumed. The options are:

user ['']
Account name to use for database authentication, if necessary. Default is an empty
string (no authentication necessary).

password ['']
Password to use for database authentication, if necessary. Default is an empty string
(no authentication necessary).

db ['']
The data source, if not provided in the argument. This allows an alternative calling
style:

tie(%h,Tie::RDBM,{db=>'dbi:mysql:test',create=>1};

table ['pdata']
The name of the table in which the hash key/value pairs will be stored.

key ['pkey']
The name of the column in which the hash key will be found. If not provided, defaults
to "pkey".

value ['pvalue']
The name of the column in which the hash value will be found. If not provided,
defaults to "pvalue".

frozen ['pfrozen']
The name of the column that stores the boolean information indicating that a complex
data structure has been "frozen" using Storable's freeze() function. If not provided,
defaults to "pfrozen".

NOTE: if this field is not present in the database table, or if the database is
incapable of storing binary structures, Storable features will be disabled.

create [0]
If set to a true value, allows the module to create the database table if it does not
already exist. The module emits a CREATE TABLE command and gives the key, value and
frozen fields the data types most appropriate for the database driver (from a lookup
table maintained in a package global, see DATATYPES below).

The success of table creation depends on whether you have table create access for the
database.

The default is not to create a table. tie() will fail with a fatal error.

drop [0]
If the indicated database table exists, but does not have the required key and value
fields, Tie::RDBM can try to add the required fields to the table. Currently it does
this by the drastic expedient of DROPPING the table entirely and creating a new empty
one. If the drop option is set to true, Tie::RDBM will perform this radical
restructuring. Otherwise tie() will fail with a fatal error. "drop" implies
"create". This option defaults to false.

A future version of Tie::RDBM may implement a last radical restructuring method;
differences in DBI drivers and database capabilities make this task harder than it
would seem.

autocommit [1]
If set to a true value, the "autocommit" option causes the database driver to commit
after every store statement. If set to a false value, this option will not commit to
the database until you explicitly call the Tie::RDBM commit() method.

The autocommit option defaults to true.

DEBUG [0]
When the "DEBUG" option is set to a true value the module will echo the contents of
SQL statements and other debugging information to standard error.

USING THE TIED ARRAY

       The standard fetch, store, keys(), values() and each() functions will work as expected on
       the tied array.  In addition, the following methods are available on the underlying
       object, which you can obtain with the standard tie() operator:

       commit()
              (tied %h)->commit();

           When using a database with the autocommit option turned off, values that are stored
           into the hash will not become permanent until commit() is called.  Otherwise they are
           lost when the application terminates or the hash is untied.

           Some SQL databases don't support transactions, in which case you will see a warning
           message if you attempt to use this function.

       rollback()
              (tied %h)->rollback();

           When using a database with the autocommit option turned off, this function will roll
           back changes to the database to the state they were in at the last commit().  This
           function has no effect on database that don't support transactions.

DATABASES AND DATATYPES

Perl is a weakly typed language. Databases are strongly typed. When translating from
databases to Perl there is inevitably some data type conversion that you must worry about.
I have tried to keep the details as transparent as possible without sacrificing power;
this section discusses the tradeoffs.

If you wish to tie a hash to a preexisting database, specify the database name, the table
within the database, and the fields you wish to use for the keys and values. These fields
can be of any type that you choose, but the data type will limit what can be stored there.
For example, if the key field is of type "int", then any numeric value will be a valid
key, but an attempt to use a string as a key will result in a run time error. If a key or
value is too long to fit into the data field, it will be truncated silently.

For performance reasons, the key field should be a primary key, or at least an indexed
field. It should also be unique. If a key is present more than once in a table, an
attempt to fetch it will return the first record found by the SQL select statement.

If you wish to store Perl references in the database, the module needs an additional field
in which it can store a flag indicating whether the data value is a simple or a complex
type. This "frozen" field is treated as a boolean value. A "tinyint" data type is
recommended, but strings types will work as well.

In a future version of this module, the "frozen" field may be turned into a general
"datatype" field in order to minimize storage. For future compatibility, please use an
integer for the frozen field.

If you use the "create" and/or "drop" options, the module will automatically attempt to
create a table for its own use in the database if a suitable one isn't found. It uses
information defined in the package variable %Tie::RDBM::Types to determine what kind of
data types to create. This variable is indexed by database driver. Each index contains a
four-element array indicating what data type to use for each of the key, value and frozen
fields, and whether the database can support binary types. Since I have access to only a
limited number of databases, the table is currently short:

Driver Key Field Value Field Frozen Field Binary?

mysq varchar(127) longblob tinyint 1
mSQL char(255) char(255) int 0
Sybase varchar(255) varbinary(255) tinyint 1
default varchar(255) varbinary(255) tinyint 1

The "default" entry is used for any driver not specifically mentioned.

You are free to add your own entries to this table, or make corrections. Please send me
e-mail with any revisions you make so that I can share the wisdom.

STORABLE CAVEATS

       Because the Storable module packs Perl structures in a binary format, only those databases
       that support a "varbinary" or "blob" type can handle complex datatypes.  Furthermore, some
       databases have strict limitations on the size of these structures.  For example, SyBase
       and MS SQL Server have a "varbinary" type that maxes out at 255 bytes.  For structures
       larger than this, the databases provide an "image" type in which storage is allocated in
       2K chunks!  Worse, access to this image type uses a non-standard SQL extension that is not
       supported by DBI.

       Databases that do not support binary fields cannot use the Storable feature.  If you
       attempt to store a reference to a complex data type in one of these databases it will be
       converted into strings like "HASH(0x8222cf4)", just as it would be if you tried the same
       trick with a conventional tied DBM hash.  If the database supports binary fields of
       restricted length, large structures may be silently truncated.  Caveat emptor.

       It's also important to realize the limitations of the Storable mechanism.  You can store
       and retrieve entire data structures, but you can't twiddle with individual substructures
       and expect them to persist when the process exits.  To update a data structure, you must
       fetch it from the hash, make the desired modifications, then store it back into the hash,
       as the example below shows:

       Process #1:
          tie %h,'Tie::RDBM','mysql:Employees:host.somewhere.com',
                          {table=>'employee',user=>'fred',password=>'xyzzy'};
          $h{'Anne'} = { office=>'999 Infinity Drive, Rm 203',
                         age    =>  29,
                         salary =>  32100 };
          $h{'Mark'} = { office=>'000 Iteration Circle, Rm -123',
                         age    =>  32,
                         salary =>  35000 };

       Process #2:
          tie %i,'Tie::RDBM','mysql:Employees:host.somewhere.com',
                          {table=>'employee',user=>'george',
                           password=>'kumquat2'};
          foreach (keys %i) {
             $info = $i{$_};
             if ($info->{age} > 30) {
                # Give the oldies a $1000 raise
                $info->{salary} += 1000;
                $i{$_} = $info;
             }
          }

       This example also demonstrates how two Perl scripts running on different machines can use
       Tie::RDBM to share complex data structures (in this case, the employee record) without
       resorting to sockets, remote procedure calls, shared memory, or other gadgets

PERFORMANCE

       What is the performance hit when you use this module?  It can be significant.  I used a
       simple benchmark in which Perl parsed a 6180 word text file into individual words and
       stored them into a database, incrementing the word count with each store.  The benchmark
       then read out the words and their counts in an each() loop.  The database driver was
       mySQL, running on a 133 MHz Pentium laptop with Linux 2.0.30.  I compared Tie::RDBM, to
       DB_File, and to the same task using vanilla DBI SQL statements.  The results are shown
       below:

                     STORE       EACH() LOOP
         Tie::RDBM     28 s        2.7  s
         Vanilla DBI   15 s        2.0  s
         DB_File        3 s        1.08 s

       During stores, there is an approximately 2X penalty compared to straight DBI, and a 15X
       penalty over using DB_File databases.  For the each() loop (which is dominated by reads),
       the performance is 2-3 times worse than DB_File and much worse than a vanilla SQL
       statement.  I have not investigated the bottlenecks.

TO DO LIST

          - Store strings, numbers and data structures in separate
            fields for space and performance efficiency.

           - Expand data types table to other database engines.

           - Catch internal changes to data structures and write them into
             database automatically.

BUGS

       Yes.

AUTHOR

       Lincoln Stein, lstein@w3.org

COPYRIGHT

         Copyright (c) 1998, Lincoln D. Stein

       This library is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.

AVAILABILITY

       The latest version can be obtained from:

          http://www.genome.wi.mit.edu/~lstein/Tie-DBM/