Provided by: libparanoid-perl_2.10-2_all

NAME

       Paranoid::IO::FileMultiplexer - File Multiplexer

VERSION

       $Id: lib/Paranoid/IO/FileMultiplexer.pm, 2.10 2022/03/08 00:01:04 acorliss Exp $

SYNOPSIS

           $obj = Paranoid::IO::FileMultiplexer->new(
               file        => $fn,
               readOnly    => 0,
               perms       => $perms,
               blockSize   => $bsize,
               );

           $header = $obj->header;

           $rv = $obj->chkConsistency;
           $rv = $obj->addStream($name);

           $rv = $obj->strmSeek($sname, $pos, $whence);
           $rv = $obj->strmTell($sname);
           $bw = $obj->strmWrite($sname, $content);
           $br = $obj->strmRead($sname, \$content, $bytes);
           $bw = $obj->strmAppend($sname, $content);
           $rv = $obj->strmTruncate($sname, $neos);

DESCRIPTION

       This class produces file multiplexer objects that multiplex I/O streams into a single file.  This allows
       I/O patterns that would normally be applied to multiple files to be applied to one, with full support for
       concurrent access by multiple processes on the same system.

       At its most basic, one could use these objects as an archive format for multiple files.  At its most
       complex, this could be a database backend file, similar to sqlite or Berkeley DB.

       This does require flock support for the file.

   CAVEATS FOR USAGE
       This class is built essentially as a block allocation tool, which does have some side effects that must
       be anticipated.  Full support is available for both 32-bit and 64-bit file systems, and files produced
       can be exchanged across both types of platforms with no special handling, at least until the point the
       file grows beyond the capabilities of a 32-bit platform.  Similarly, portability should work fine across
       big-endian and little-endian platforms.

       That said, the simplicity of this design did require some compromises, the first being the number of
       supported "streams" that can be stored inside a single file.  That is a function of the block size chosen
       for the file.  All allocated streams are tracked in the file header block, so the number of streams is
       constrained by the number that can be recorded in that block.

       Likewise, the maximum size of a stream is also limited by the block size, since the stream head block can
       only track so many block allocation tables, and each block allocation table can only track so many data
       blocks.

       Practically speaking, for many use cases this should not be an issue, but you can get an idea of the
       impact on both 32-bit and 64-bit systems like so:

                               32b/4KB                 64b/4KB
           --------------------------------------------------------------------------
           Max File Size:      4294967295 (4.00GB)     18446744073709551615 (16.00EX)
           Max Streams:        135                     135
           Max Stream Size:    1052872704 (1004.10MB)  1052872704 (1004.10MB)

                               32b/8KB                 64b/8KB
           --------------------------------------------------------------------------
           Max File Size:      4294967295 (4.00GB)     18446744073709551615 (16.00EX)
           Max Streams:        272                     272
           Max Stream Size:    4294967295 (4.00GB)     8506253312 (7.92GB)

       As you can see, 8KB blocks will provide full utilization of your file system capabilities on a 32-bit
       platform, but on a 64-bit platform, you are still artificially capped on how much data can be stored in
       an individual stream.  The number of streams will always be limited identically on both platforms based on
       the block size.

       NOTE: The actual limits of file sizes aren't dependent upon the native size of longs or quads, but the
       file system design, itself.  Some file systems designed for 32-bit processors reserved the highest bit,
       which made the highest addressable space in a file 2GB instead of 4GB.  Other filesystems had limits that
       were a function of inode size and other aspects of the formatted file system.  In sum, the true limit
       for file size may be beyond this module's ability to detect and accommodate gracefully.

       One final caveat should be noted regarding I/O performance.  The supported block sizes are intentionally
       limited in hopes of avoiding double-write penalties due to block alignment issues on the underlying file
       system.  At the same time, the block size also serves as a kind of crude tuning capability for the size
       of I/O operations.  No individual I/O, whether read or write, will exceed the size of a block.  You, as
       the developer, can call the class API with reads of any size you wish, of course, but behind the scenes
       it will be broken up into block-sized reads at most.

       For those reasons, choose the block size that offers the best compromise between I/O performance and
       the minimum number of streams (or maximum stream size) you anticipate needing.
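
       The effect of the block-size cap on a single request can be pictured with a small sketch.  The helper
       split_request below is hypothetical and not part of this class.  It also assumes that pieces are aligned
       to block boundaries within the stream, which this page does not state; the only documented guarantee is
       that no individual I/O exceeds one block.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: split a request of $len bytes
# starting at stream position $pos into pieces of at most one block each.
# ASSUMPTION: pieces never cross a block boundary.
sub split_request {
    my ( $pos, $len, $bsize ) = @_;
    my @chunks;
    while ( $len > 0 ) {
        my $room = $bsize - ( $pos % $bsize );    # bytes left in this block
        my $n    = $len < $room ? $len : $room;
        push @chunks, [ $pos, $n ];
        $pos += $n;
        $len -= $n;
    }
    return @chunks;
}

# A 10000-byte request at stream position 1000 with 4KB blocks
my @chunks = split_request( 1000, 10000, 4096 );
printf "offset %5d, %4d bytes\n", @$_ for @chunks;
```

       Under that assumption the request above becomes three operations (3096, 4096, and 2808 bytes), so
       larger blocks mean fewer operations for the same amount of data.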

       As a final note, one should also remember that space is allocated to the file in block-sized chunks.
       That means creating a new file with a 1MB block size, containing one stream, but with nothing written to the
       stream, will create a file 4MB in size.  That's due to the preallocation of the file header, a stream
       header, the stream's first block allocation table, and an initial data block.
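
       The arithmetic behind that figure can be sketched as follows.  The helper min_file_size is hypothetical
       and not part of this class; it simply counts one block for the file header plus three blocks per stream
       (stream header, first block allocation table, first data block).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: estimate the size of a file
# that holds $streams empty streams.
#   1 block  for the file header
#   3 blocks per stream (stream header, first BAT, first data block)
sub min_file_size {
    my ( $block_size, $streams ) = @_;
    return $block_size * ( 1 + 3 * $streams );
}

my $mb = 1024 * 1024;
printf "%d stream(s) at 1MB blocks: %dMB\n", $_, min_file_size( $mb, $_ ) / $mb
    for 0, 1, 2;
```

       With 1MB blocks this gives 1MB for an empty file, 4MB with one stream, and 7MB with two.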

SUBROUTINES/METHODS

   new
           $obj = Paranoid::IO::FileMultiplexer->new(
               file        => $fn,
               readOnly    => 0,
               perms       => $perms,
               blockSize   => $bsize,
               );

       This class method creates new objects for accessing the contents of the passed file.  It will create a new
       file if missing, or open an existing file and retrieve the metadata for tuning.

       Only the file name is mandatory.  Block size defaults to 4KB, but if specified, can support from 4KB to
       1MB block sizes, as long as the block size is a multiple of 4KB.
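
       A caller that takes the block size from user input may want to check it before calling new.  The
       following is a minimal sketch of such a check; valid_block_size is a hypothetical helper, not part of
       this class, and it assumes the block size is expressed in bytes.

```perl
use strict;
use warnings;

use constant KB => 1024;

# Hypothetical pre-flight check, not part of the module.
# ASSUMPTION: block sizes are given in bytes.
sub valid_block_size {
    my ($bsize) = @_;
    return 0 unless defined $bsize and $bsize =~ /^\d+$/;
    return 0 if $bsize < 4 * KB or $bsize > 1024 * KB;    # 4KB .. 1MB
    return $bsize % ( 4 * KB ) == 0 ? 1 : 0;              # multiple of 4KB
}

for my $bsize ( 4096, 6144, 8192, 1048576, 2097152 ) {
    printf "%8d: %s\n", $bsize, valid_block_size($bsize) ? 'ok' : 'rejected';
}
```

       Here 4096, 8192, and 1048576 pass, while 6144 (not a multiple of 4KB) and 2097152 (larger than 1MB) are
       rejected.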

   header
           $header = $obj->header;

       This method returns a reference to the file header block object.  Typically, this has no practical value
       to the developer, but the file header does provide a model method that returns a hash with some predicted
       sizing limitations.  If you want to know the maximum number of supported streams or the maximum size of
       an individual stream, this could be useful.  Calling any other method for that class, however, could
       cause corruption of your file.
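
       A minimal sketch of inspecting those predicted limits follows.  The key names in the returned hash are
       not listed on this page, so the example just prints whatever comes back.  The file name is
       hypothetical, and the example assumes that new returns a false value on failure.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Paranoid::IO::FileMultiplexer;

my $dir = tempdir( CLEANUP => 1 );
my $fn  = "$dir/model-demo.pfm";    # hypothetical name; does not exist yet

# ASSUMPTION: new returns a false value on failure.
my $obj = Paranoid::IO::FileMultiplexer->new( file => $fn )
    or die "could not create $fn\n";

# model is the only header method described here as safe to call.
my %model = $obj->header->model;
printf "%-20s %s\n", $_, $model{$_} // '(undef)' for sort keys %model;
```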

   chkConsistency
           $rv = $obj->chkConsistency;

       This method performs a high-level consistency check of the file structure.  At this time it is limited to
       ensuring that every header block (file, stream, and BAT) has a viable signature, and all records inside
       those blocks are allocated and match signatures where appropriate.

       If this method detects any inconsistencies it will mark the object as corrupted, which will prevent any
       further writes to the file in hopes that further corruption can be avoided.

       The file format of this multiplexer is such that a good deal of data can be recovered even with the
       complete loss of the file header.  Corruption in a stream header can even be recovered from.  Only the
       loss of a BAT header can prevent data from being recovered, but even then that will only impact the
       stream it belongs to.  It should not impact other streams.

       Take this with a grain of salt, of course.  There are always caveats to that rule, depending on whether
       the corruption has been detected prior to dangerous writes.  Every read and write to a stream triggers a
       few basic consistency checks prior to progressing, but they are not as thorough as this method's process,
       lest it have an adverse impact on performance.

       This returns a boolean value.
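
       One way to use this is to run the check once after opening a file and before doing any writes.  The
       sketch below assumes that a true return value means no inconsistencies were found and that new returns
       a false value on failure; the file and stream names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Paranoid::IO::FileMultiplexer;

my $dir = tempdir( CLEANUP => 1 );
my $fn  = "$dir/check-demo.pfm";    # hypothetical file name

# ASSUMPTION: new returns a false value on failure.
my $obj = Paranoid::IO::FileMultiplexer->new( file => $fn )
    or die "could not open $fn\n";
$obj->addStream('data') or die "could not add stream\n";

# ASSUMPTION: true means the structure passed every check.
my $ok = $obj->chkConsistency;
if ($ok) {
    print "structure looks consistent\n";
} else {
    warn "inconsistencies found; the object will now refuse writes\n";
}
```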

   addStream
           $rv = $obj->addStream($name);

       This method adds a stream to the file, triggering the automatic allocation of three blocks (a stream
       header, the first stream BAT, and the first data block).  It returns a boolean value, denoting success or
       failure.

   strmSeek
           $rv = $obj->strmSeek($sname, $pos, $whence);

       This method acts the same as the core sysseek, taking the same arguments, but with the substitution of
       the stream name for the file handle.  Its return value is also the same.

       Note that the position returned is relative to the data stream, not the file itself.

   strmTell
           $rv = $obj->strmTell($sname);

       This method acts the same as the core tell, taking the same arguments, but with the substitution of the
       stream name for the file handle.  Like strmSeek, the position returned is relative to the data stream,
       not the file itself.

   strmWrite
           $bw = $obj->strmWrite($sname, $content);

       This method acts similarly to a very simplified syswrite.  It does not support length and offset
       arguments, only the content itself.  It will presume that the stream position has been adjusted as needed
       prior to invocation.

       This returns the number of bytes written.  If everything is working appropriately, that should match the
       byte length of the content itself.

   strmRead
           $br = $obj->strmRead($sname, \$content, $bytes);

       This method acts similarly to a very simplified sysread.  It does not support offset arguments, only a
       scalar reference and the number of bytes to read.  It also presumes that the stream position has been
       adjusted as needed prior to invocation.

       This returns the number of bytes read.  Unless you've asked for more data than has been written to the
       stream, this should match the number of bytes requested.
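
       Taken together, strmWrite, strmSeek, and strmRead allow a simple round trip, sketched below.  The file
       and stream names are hypothetical.  The example assumes that new returns a false value on failure and
       that a write advances the stream position the way syswrite does, which is why it seeks back to the
       start before reading.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempdir);
use Paranoid::IO::FileMultiplexer;

my $dir = tempdir( CLEANUP => 1 );

# ASSUMPTION: new returns a false value on failure.
my $obj = Paranoid::IO::FileMultiplexer->new( file => "$dir/demo.pfm" )
    or die "could not create file\n";
$obj->addStream('notes') or die "could not add stream\n";

my $text = 'hello, stream';
my $bw   = $obj->strmWrite( 'notes', $text );
die "short write\n" unless defined $bw and $bw == length $text;

# Rewind; the position is relative to the stream, not the file.
$obj->strmSeek( 'notes', 0, SEEK_SET );

my $content;
my $br = $obj->strmRead( 'notes', \$content, length $text );
print "read $br bytes: $content\n";
```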

   strmAppend
           $bw = $obj->strmAppend($sname, $content);

       This method acts similarly to Paranoid::IO's pappend.  It always seeks to the end of the written data
       stream before appending the requested content.  Like strmWrite, it will return the number of bytes
       written.  Like pappend, it does not move the stream position, should you perform additional writes or
       reads.
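
       The point about the stream position can be checked with strmTell, as in this sketch.  The file and
       stream names are hypothetical, and the example assumes that new returns a false value on failure.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Paranoid::IO::FileMultiplexer;

my $dir = tempdir( CLEANUP => 1 );

# ASSUMPTION: new returns a false value on failure.
my $obj = Paranoid::IO::FileMultiplexer->new( file => "$dir/append-demo.pfm" )
    or die "could not create file\n";
$obj->addStream('log') or die "could not add stream\n";

$obj->strmWrite( 'log', "first\n" );
my $before = $obj->strmTell('log');

# Appends at the end of the written data without moving the position.
my $bw = $obj->strmAppend( 'log', "second\n" );
my $after = $obj->strmTell('log');

print "appended $bw bytes; position before: $before, after: $after\n";
```

       Since strmAppend is documented as leaving the position alone, the two positions should be equal.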

   strmTruncate
           $rv = $obj->strmTruncate($sname, $neos);

       This method acts similarly to truncate.  It returns a boolean value denoting failure or success.
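
       A minimal sketch of shortening a stream follows.  It assumes that $neos is the new end of the stream
       expressed as a byte offset, as the length argument is for truncate, and that new returns a false value
       on failure; this page does not spell out either point.  The file and stream names are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempdir);
use Paranoid::IO::FileMultiplexer;

my $dir = tempdir( CLEANUP => 1 );

# ASSUMPTION: new returns a false value on failure.
my $obj = Paranoid::IO::FileMultiplexer->new( file => "$dir/trunc-demo.pfm" )
    or die "could not create file\n";
$obj->addStream('scratch') or die "could not add stream\n";

$obj->strmWrite( 'scratch', 'abcdefghij' );

# ASSUMPTION: the second argument is the new stream length in bytes.
$obj->strmTruncate( 'scratch', 4 ) or die "truncate failed\n";

# Ask for more than remains; the read should come back short.
$obj->strmSeek( 'scratch', 0, SEEK_SET );
my $content;
my $br = $obj->strmRead( 'scratch', \$content, 10 );
print "read $br bytes: $content\n";
```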

   DESTROY
       Obviously, one would never need to call this directly, but it is documented here to inform the developer
       that once an object goes out of scope, it will call pclose on the file, explicitly closing and purging
       any cached file handles from Paranoid::IO's internal cache.

DEPENDENCIES

       o   Carp

       o   Fcntl

       o   Paranoid

       o   Paranoid::Debug

       o   Paranoid::IO

       o   Paranoid::IO::FileMultiplexer::Block::FileHeader

       o   Paranoid::IO::FileMultiplexer::Block::StreamHeader

       o   Paranoid::IO::FileMultiplexer::Block::BATHeader

BUGS AND LIMITATIONS

AUTHOR

       Arthur Corliss (corliss@digitalmages.com)

       This software is free software.  Similar to Perl, you can redistribute it and/or modify it under the
       terms of either:

         a)     the GNU General Public License
                <https://www.gnu.org/licenses/gpl-1.0.html> as published by the
                Free Software Foundation <http://www.fsf.org/>; either version 1
                <https://www.gnu.org/licenses/gpl-1.0.html>, or any later version
                <https://www.gnu.org/licenses/license-list.html#GNUGPL>, or
         b)     the Artistic License 2.0
                <https://opensource.org/licenses/Artistic-2.0>,

       subject to the following additional term:  No trademark rights to "Paranoid" have been or are conveyed
       under any of the above licenses.  However, "Paranoid" may be used fairly to describe this unmodified
       software, in good faith, but not as a trademark.

       (c) 2005 - 2021, Arthur Corliss (corliss@digitalmages.com)
       (tm) 2008 - 2021, Paranoid Inc. (www.paranoid.com)