Ubuntu Manpage: Paranoid::IO::FileMultiplexer

Provided by: libparanoid-perl_2.10-1_all

NAME

       Paranoid::IO::FileMultiplexer - File Multiplexer

VERSION

       $Id: lib/Paranoid/IO/FileMultiplexer.pm, 2.10 2022/03/08 00:01:04 acorliss Exp $

SYNOPSIS

           $obj = Paranoid::IO::FileMultiplexer->new(
               file        => $fn,
               readOnly    => 0,
               perms       => $perms,
               blockSize   => $bsize,
               );

           $header = $obj->header;

           $rv = $obj->chkConsistency;
           $rv = $obj->addStream($name);

           $rv = $obj->strmSeek($sname, $pos, $whence);
           $rv = $obj->strmTell($sname);
           $bw = $obj->strmWrite($sname, $content);
           $br = $obj->strmRead($stream, \$content, $bytes);
           $bw = $obj->strmAppend($sname, $content);
           $rv = $obj->strmTruncate($sname, $neos);

DESCRIPTION

This class produces file multiplexer objects that multiplex I/O streams into a single
file. This allows I/O patterns that would normally be applied to multiple files to be
applied to one, with full support for concurrent access by multiple processes on the same
system.

At its most basic, one could use these objects as an archive format for multiple files.
At its most complex, this could be a database backend file, similar to sqlite or Berkeley
DB.

This does require flock support for the file.

CAVEATS FOR USAGE
This class is built essentially as a block allocation tool, which does have some side
effects that must be anticipated. Full support is available for both 32-bit and 64-bit
file systems, and files produced can be exchanged across both types of platforms with no
special handling, at least until the point the file grows beyond the capabilities of a
32-bit platform. Similarly, files should be portable between big-endian and little-endian
platforms.

That said, the simplicity of this design did require some compromises, the first being the
number of supported "streams" that can be stored inside a single file. That is a function
of the block size chosen for the file. All allocated streams are tracked in the file
header block, so the number of streams is constrained by the number that can be recorded
in that block.

Likewise, the maximum size of a stream is also limited by the block size, since the stream
head block can only track so many block allocation tables, and each block allocation table
can only track so many data blocks.

Practically speaking, for many use cases this should not be an issue, but you can get an
idea of the impact on both 32-bit and 64-bit systems like so:

                     32b/4KB                  64b/4KB
--------------------------------------------------------------------------
Max File Size:       4294967295 (4.00GB)      18446744073709551615 (16.00EB)
Max Streams:         135                      135
Max Stream Size:     1052872704 (1004.10MB)   1052872704 (1004.10MB)

                     32b/8KB                  64b/8KB
--------------------------------------------------------------------------
Max File Size:       4294967295 (4.00GB)      18446744073709551615 (16.00EB)
Max Streams:         272                      272
Max Stream Size:     4294967295 (4.00GB)      8506253312 (7.92GB)

As you can see, 8KB blocks will provide full utilization of your file system capabilities
on a 32-bit platform, but on a 64-bit platform, you are still artificially capped on how
much data can be stored in an individual stream. The number of streams will always be
limited identically on both platforms based on the block size.

NOTE: The actual limits of file sizes aren't dependent upon the native size of longs or
quads, but upon the file system design itself. Some file systems designed for 32-bit
processors reserved the highest bit, which made the highest addressable space in a file
2GB instead of 4GB. Other file systems had limits that were a function of inode size and
other aspects of the formatted file system. In sum, the true limit for file size may be
outside of the ability of this module to detect and accommodate gracefully.

One final caveat should be noted regarding I/O performance. The supported block sizes are
intentionally limited in hopes of avoiding double-write penalties due to block alignment
issues on the underlying file system. At the same time, the block size also serves as a
kind of crude tuning capability for the size of I/O operations. No individual I/O,
whether read or write, will exceed the size of a block. You, as the developer, can call
the class API with reads of any size you wish, of course, but behind the scenes it will be
broken up into block-sized reads at most.
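
The splitting described above can be pictured with a short sketch. It is not the module's
code: split_into_block_io is a hypothetical helper, and it assumes that each piece stops at
a block boundary within the stream, which the text above does not state explicitly. It
only shows how one large request becomes several I/Os of at most one block each.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Paranoid::IO::FileMultiplexer):
# split a request of $bytes starting at stream position $pos into
# pieces that never cross a block boundary. Returns a list of
# [ position, length ] pairs.
sub split_into_block_io {
    my ( $pos, $bytes, $bsize ) = @_;
    my @chunks;

    while ( $bytes > 0 ) {
        my $room = $bsize - ( $pos % $bsize );    # bytes left in this block
        my $len  = $bytes < $room ? $bytes : $room;
        push @chunks, [ $pos, $len ];
        $pos   += $len;
        $bytes -= $len;
    }

    return @chunks;
}

# A 10000-byte request starting at position 1000 with 4KB blocks
printf "%d bytes at position %d\n", $_->[1], $_->[0]
    for split_into_block_io( 1000, 10000, 4096 );
```

Under that assumption, a larger block size means fewer, larger I/Os for the same request.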

For those reasons, choose your block size based on the best compromise between I/O
performance and the number of streams (or maximum stream size) you anticipate needing.

As a final note, one should also remember that space is allocated to the file in
block-sized chunks. That means creating a new file with a 1MB block size, containing one
stream, but with nothing written to the stream, will create a file 4MB in size. That's due
to the preallocation of the file header, a stream header, the stream's first block
allocation table, and an initial data block.
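
The arithmetic behind that figure can be sketched as follows. The function name
initial_file_size is hypothetical, and the sketch only counts the blocks named in this
document (one file header, plus a stream header, first BAT, and first data block per
stream). It says nothing about how the file grows once data is written.

```perl
use strict;
use warnings;

# Hypothetical helper: size of a freshly created file holding
# $streams empty streams, counting one file header block plus
# three preallocated blocks per stream.
sub initial_file_size {
    my ( $bsize, $streams ) = @_;
    return $bsize * ( 1 + 3 * $streams );
}

# 1MB blocks, one empty stream: four blocks in total
printf "%d bytes\n", initial_file_size( 1024 * 1024, 1 );
```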

SUBROUTINES/METHODS

new
    $obj = Paranoid::IO::FileMultiplexer->new(
        file        => $fn,
        readOnly    => 0,
        perms       => $perms,
        blockSize   => $bsize,
        );

This class method creates new objects for accessing the contents of the passed file. It
will create a new file if missing, or open an existing file and retrieve the metadata for
tuning.

Only the file name is mandatory. Block size defaults to 4KB. If specified, it must be a
multiple of 4KB, from 4KB up to 1MB.
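
A caller may want to check a candidate block size against that rule before calling new.
The following is a minimal sketch of such a check. valid_block_size and the two constants
are hypothetical names and are not provided by the module.

```perl
use strict;
use warnings;

use constant MIN_BSIZE => 4 * 1024;       # 4KB
use constant MAX_BSIZE => 1024 * 1024;    # 1MB

# Hypothetical helper: true if $bsize is a whole number of bytes,
# lies between 4KB and 1MB, and is a multiple of 4KB.
sub valid_block_size {
    my ($bsize) = @_;

    return 0 unless defined $bsize && $bsize =~ /^\d+$/;
    return ( $bsize >= MIN_BSIZE
          && $bsize <= MAX_BSIZE
          && $bsize % MIN_BSIZE == 0 ) ? 1 : 0;
}

for my $candidate ( 4096, 6000, 8192, 2 * 1024 * 1024 ) {
    printf "%8d: %s\n", $candidate,
        valid_block_size($candidate) ? 'ok' : 'rejected';
}
```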

header
$header = $obj->header;

This method returns a reference to the file header block object. Typically, this has no
practical value to the developer, but the file header does provide a model method that
returns a hash with some predicted sizing limitations. If you want to know the maximum
number of supported streams or the maximum size of an individual stream, this could be
useful. Calling any other method for that class, however, could cause corruption of your
file.

chkConsistency
$rv = $obj->chkConsistency;

This method performs a high-level consistency check of the file structure. At this time
it is limited to ensuring that every header block (file, stream, and BAT) has a viable
signature, and all records inside those blocks are allocated and match signatures where
appropriate.

If this method detects any inconsistencies it will mark the object as corrupted, which
will prevent any further writes to the file in hopes that further corruption can be
avoided.

The file format of this multiplexer is such that a good deal of data can be recovered even
with the complete loss of the file header. Corruption in a stream header can even be
recovered from. Only the loss of a BAT header can prevent data from being recovered, but
even then that will only impact the stream it belongs to. It should not impact other
streams.

Take this with a grain of salt, of course. There are always caveats to that rule,
depending on whether the corruption has been detected prior to dangerous writes. Every
read and write to a stream triggers a few basic consistency checks prior to progressing,
but they are not as thorough as this method's process, lest it have an adverse impact on
performance.

This returns a boolean value.
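
Because the full check is not run on every read and write, a caller could choose to run it
before writes it considers critical. The wrapper below is one possible sketch of that
pattern. checked_append is a hypothetical name, and it relies only on the chkConsistency
and strmAppend methods documented here.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run the full consistency check first and
# refuse to write if it fails. $obj is any object providing the
# chkConsistency and strmAppend methods.
sub checked_append {
    my ( $obj, $sname, $content ) = @_;

    unless ( $obj->chkConsistency ) {
        warn "consistency check failed; not writing to '$sname'\n";
        return 0;
    }

    return $obj->strmAppend( $sname, $content );
}
```

Whether the extra cost of a full check per write is acceptable depends on the application.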

addStream
$rv = $obj->addStream($name);

This method adds a stream to the file, triggering the automatic allocation of three blocks
(a stream header, the first stream BAT, and the first data block). It returns a boolean
value, denoting success or failure.

strmSeek
$rv = $obj->strmSeek($sname, $pos, $whence);

This method acts the same as the core sysseek, taking the same arguments, but with the
substitution of the stream name for the file handle. Its return value is also the same.

Note that the position returned is relative to the data stream, not the file itself.

strmTell
$rv = $obj->strmTell($sname);

This method acts the same as the core tell, taking the same arguments, but with the
substitution of the stream name for the file handle. Like strmSeek, the position returned
is relative to the data stream, not the file itself.

strmWrite
$bw = $obj->strmWrite($sname, $content);

This method acts similarly to a very simplified syswrite. It does not support length and
offset arguments, only the content itself. It will presume that the stream position has
been adjusted as needed prior to invocation.

This returns the number of bytes written. If everything is working appropriately, that
should match the byte length of the content itself.

strmRead
$br = $obj->strmRead($stream, \$content, $bytes);

This method acts similarly to a very simplified sysread. It does not support offset
arguments, only a scalar reference and the number of bytes to read. It also presumes that
the stream position has been adjusted as needed prior to invocation.

This returns the number of bytes read. Unless you've asked for more data than has been
written to the stream, this should match the number of bytes requested.
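
Putting the calls documented so far together, a write followed by a read of the same data
might look like the sketch below. The file name demo.pfm, the stream name notes, and the
round_trip function are illustrative only. The code loads the module at run time and
returns nothing if it is not installed or if any step fails.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempdir);

# Write $payload to a new stream, rewind, and read it back.
# Returns the bytes read, or nothing if the module is missing
# or a step fails.
sub round_trip {
    my ($payload) = @_;

    return unless eval { require Paranoid::IO::FileMultiplexer; 1 };

    my $dir = tempdir( CLEANUP => 1 );
    my $obj = Paranoid::IO::FileMultiplexer->new( file => "$dir/demo.pfm" )
        or return;

    $obj->addStream('notes') or return;

    my $bw = $obj->strmWrite( 'notes', $payload );
    return unless defined $bw && $bw == length $payload;

    # Rewind; the position is relative to the stream, not the file
    $obj->strmSeek( 'notes', 0, SEEK_SET );

    my $content;
    my $br = $obj->strmRead( 'notes', \$content, length $payload );
    return unless defined $br && $br == length $payload;

    return $content;
}

my $got = round_trip('hello, multiplexer');
print defined $got ? "read back: $got\n" : "skipped or failed\n";
```

Comparing the byte counts against the payload length follows the return values described
for strmWrite and strmRead.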

strmAppend
$bw = $obj->strmAppend($sname, $content);

This method acts similarly to Paranoid::IO's pappend. It always seeks to the end of the
written data stream before appending the requested content. Like strmWrite, it returns
the number of bytes written. Like pappend, it does not move the stream position, so any
additional reads or writes continue from wherever the position was before the append.

strmTruncate
$rv = $obj->strmTruncate($sname, $neos);

This method acts similarly to truncate. It returns a boolean value denoting failure or
success.

DESTROY
Obviously, one would never need to call this directly, but it is documented here to inform
the developer that once an object goes out of scope, it will call pclose on the file,
explicitly closing and purging any cached file handles from Paranoid::IO's internal cache.

DEPENDENCIES

       o   Carp

       o   Fcntl

       o   Paranoid

       o   Paranoid::Debug

       o   Paranoid::IO

       o   Paranoid::IO::FileMultiplexer::Block::FileHeader

       o   Paranoid::IO::FileMultiplexer::Block::StreamHeader

       o   Paranoid::IO::FileMultiplexer::Block::BATHeader

BUGS AND LIMITATIONS

AUTHOR

       Arthur Corliss (corliss@digitalmages.com)

LICENSE AND COPYRIGHT

       This software is free software.  Similar to Perl, you can redistribute it and/or modify it
       under the terms of either:

         a)     the GNU General Public License
                <https://www.gnu.org/licenses/gpl-1.0.html> as published by the
                Free Software Foundation <http://www.fsf.org/>; either version 1
                <https://www.gnu.org/licenses/gpl-1.0.html>, or any later version
                <https://www.gnu.org/licenses/license-list.html#GNUGPL>, or
         b)     the Artistic License 2.0
                <https://opensource.org/licenses/Artistic-2.0>,

       subject to the following additional term:  No trademark rights to "Paranoid" have been or
       are conveyed under any of the above licenses.  However, "Paranoid" may be used fairly to
       describe this unmodified software, in good faith, but not as a trademark.

       (c) 2005 - 2021, Arthur Corliss (corliss@digitalmages.com)
       (tm) 2008 - 2021, Paranoid Inc. (www.paranoid.com)