Provided by: libparanoid-perl_0.34-1_all

NAME

       Paranoid::Input - Paranoid input functions

VERSION

       $Id: Input.pm,v 0.20 2011/04/13 22:01:43 acorliss Exp $

SYNOPSIS

         use Paranoid::Input;

         FSZLIMIT  = 64 * 1024;
         LNSZLIMIT = 2 * 1024;

         $rv = slurp($filename, \@lines);

         $rv = sip($filename, \@lines);
         $rv = sip($filename, \@lines, 1);
         $rv = tail($filename, \@lines);
         $rv = tail($filename, \@lines, -100);
         $rv = tail($filename, \@lines, -100, 1);
         $rv = closeFile($filename);

         addTaintRegex("telephone", qr/\(\d{3}\)\s+\d{3}-\d{4}/);
         $rv = detaint($userInput, "login", \$val);

         $rv = stringMatch($input, @strings);

DESCRIPTION

       This module provides safer routines for input activities such as reading files and
       detainting user input.

       The sip and tail functions keep file handles open.  Even so, they are specifically built to
       be safe for use in fork scenarios.  You can begin a tail or sip in a parent, fork children,
       and all processes can independently continue sipping with no confusion between processes.
       This is possible because we check whether the current PID matches the PID in effect when
       the file was opened.  If not, we reopen the file and seek to the same position so we can
       pick up where we left off.
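
       The PID check described above can be sketched roughly as follows.  This is only an
       illustration in core Perl, not the module's actual internals: the %handles registry and
       the get_handle name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical registry: filename => { fh => handle, pid => opener's PID }
my %handles;

sub get_handle {
    my ($file) = @_;
    my $rec = $handles{$file};

    # First use, or the handle was inherited across a fork: (re)open it
    if ( !defined $rec or $rec->{pid} != $$ ) {
        my $pos = defined $rec ? tell $rec->{fh} : 0;
        $pos = 0 if $pos < 0;              # tell returns -1 on error
        open my $fh, '<', $file or return;
        seek $fh, $pos, 0 or return;       # 0 == SEEK_SET
        $rec = $handles{$file} = { fh => $fh, pid => $$ };
    }
    return $rec->{fh};
}
```

       A child that calls get_handle after a fork gets a private handle positioned where the
       inherited one stood, so its reads no longer move the parent's file offset.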

       The slurp function isn't affected by this since it reads entire files in a single call;
       no file handles are kept open between calls.

       All file-reading functions use and obey flock.

       addTaintRegex is only exported if this module is used with the :all target.

SUBROUTINES/METHODS

   FSZLIMIT
       The value returned/set by this lvalue function is the maximum file size that will be read
       into memory.  This affects functions like slurp (documented below).  Unless explicitly set
       this defaults to 16KB.

   LNSZLIMIT
       The value returned/set by this lvalue function is the maximum line length supported by
       functions like sip (documented below).  Unless explicitly set this defaults to 2KB.

   slurp
         $rv = slurp($filename, \@lines);

       This function allows you to read a text file in its entirety into memory, the lines of
       which are placed into the passed array reference.  This function will only read files up
       to FSZLIMIT in size.  Flocking is used (with LOCK_SH) and the read is a blocking read.

       An optional third argument is a boolean flag which, if true, causes all lines to be
       chomped automatically.  If chomping is enabled this will strip both UNIX and DOS line
       separators.

       The return value is false if the read was unsuccessful or the file's size exceeded
       FSZLIMIT.  In the latter case the array reference will still be populated with what was
       read.  The reason for the failure can be retrieved from Paranoid::ERROR.
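
       For illustration, a size-limited read under a shared lock might look like the sketch
       below.  It uses only core Perl and Fcntl; read_limited and $size_limit are hypothetical
       stand-ins for slurp and FSZLIMIT, and the module's real implementation may differ.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical stand-in for FSZLIMIT; name and value are assumptions
my $size_limit = 16 * 1024;

sub read_limited {
    my ( $file, $lines, $chomp ) = @_;
    @$lines = ();

    open my $fh, '<', $file or return 0;
    flock $fh, LOCK_SH or return 0;    # blocks until the shared lock is granted

    # Read at most $size_limit bytes, then split into lines (terminators kept)
    my $buf = '';
    my $got = read $fh, $buf, $size_limit;
    return 0 unless defined $got;
    my $too_big = ( -s $fh ) > $size_limit;
    @$lines = split /(?<=\n)/, $buf;

    # Strip both UNIX (\n) and DOS (\r\n) terminators when asked to chomp
    if ($chomp) { s/\r?\n\z// for @$lines }

    flock $fh, LOCK_UN;
    close $fh;
    return $too_big ? 0 : 1;
}
```

       As in the description above, an oversized file yields a false return value while the
       array still holds whatever fit within the limit.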

   sip
         $rv = sip($filename, \@lines);
         $rv = sip($filename, \@lines, 1);

       This function allows you to read a text file into memory in chunks, the lines of which are
       placed into the passed array reference.  The chunks are read in at up to FSZLIMIT in size
       at a time.  Like slurp, file locking is used and autochomping is supported.

       This function returns true if there was input read, but if any or all of the input splits
       into lines greater than LNSZLIMIT it will discard that input and return -1 (which is still
       technically boolean true).

       The reason why we now care about line lengths is because it's very likely that line
       boundaries will not fall neatly along our chunk boundaries, so we need to take trailing
       portions of unterminated lines and store them to be joined with the remainder from the
       next sip.
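
       The carry-over of unterminated lines between chunks can be sketched as below.  This is a
       simplified illustration, not the module's code: next_lines, $chunk_size, $line_limit and
       $leftover are hypothetical names, the limits are tiny on purpose, and the sketch drops
       only the overlong lines themselves.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for FSZLIMIT and LNSZLIMIT (tiny values for demonstration)
my ( $chunk_size, $line_limit ) = ( 16, 8 );
my $leftover = '';    # unterminated tail of the previous chunk

sub next_lines {
    my ( $fh, $lines ) = @_;
    @$lines = ();
    my $chunk = '';
    my $got = read $fh, $chunk, $chunk_size;
    return 0 unless $got;    # read error or nothing new to read

    # Prepend what was held back, then split after each newline
    my @parts = split /(?<=\n)/, $leftover . $chunk;

    # Hold back a trailing fragment that has no newline yet
    $leftover = ( @parts && $parts[-1] !~ /\n\z/ ) ? pop @parts : '';

    my $n = @parts;
    @$lines = grep { length($_) <= $line_limit } @parts;
    return @$lines < $n ? -1 : 1;    # -1 signals that overlong lines were dropped
}
```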

       When sip reaches the end of the file it does not close the file; you're required to
       close it explicitly with closeFile.  This is done intentionally to allow the process to
       continue to effectively tail a growing file.  Unlike the tail function provided here,
       though, it does not perform any additional checks to see if the file you're reading was
       truncated or replaced.

       An optional third argument tells sip whether or not to chomp all the read lines before
       returning.

   tail
         $rv = tail($filename, \@lines);
         $rv = tail($filename, \@lines, -100);
         $rv = tail($filename, \@lines, -100, 1);

       The only difference between this function and sip is that tail opens the file and
       immediately seeks to the end.  If an optional third argument is passed it will seek
       backwards to extract and return that number of lines (if possible).  Depending on the
       number passed one must be prepared for enough memory to be allocated to store LNSZLIMIT *
       that number.
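
       The memory bound mentioned above follows from seeking backwards by a worst-case window.
       A minimal sketch in core Perl, assuming a positive line count (tail itself takes a
       negative number); last_lines and $line_limit are hypothetical names:

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

my $line_limit = 2 * 1024;    # hypothetical stand-in for LNSZLIMIT

sub last_lines {
    my ( $file, $count ) = @_;
    open my $fh, '<', $file or return;

    # Worst case: every wanted line is $line_limit bytes long, plus one
    # extra line of slack because the window may start mid-line
    my $size   = -s $fh;
    my $window = ( $count + 1 ) * $line_limit;
    $window = $size if $window > $size;
    seek $fh, $size - $window, SEEK_SET or return;

    my @lines = <$fh>;
    close $fh;

    # Drop a possibly partial first line when we started mid-file
    shift @lines if $window < $size and @lines;

    # Keep only the last $count lines
    splice @lines, 0, @lines - $count if @lines > $count;
    return @lines;
}
```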

       This function returns true if the file is successfully opened, regardless of whether any new
       input was there to be read.  It only returns false if there was a problem opening or
       reading the file.

       Tail should be called with the third argument for the first tail of a file.  Continuing to
       use it for subsequent calls will cause the number of lines returned to be truncated to fit
       within that limit.

       Like sip, one must explicitly close a file with closeFile.

   closeFile
         $rv = closeFile($filename);

       This function closes any open file descriptors that may have been opened via sip or tail
       for the named file.  This returns the value of the close function if the file was open,
       otherwise it returns true.

   addTaintRegex
         addTaintRegex("telephone", qr/\(\d{3}\)\s+\d{3}-\d{4}/);

       This adds a regular expression which can be used by name to detaint user input via the
       detaint function.  This allows you to overwrite the internally provided regexes as well
       as your own.

   detaint
         $rv = detaint($userInput, "login", \$val);

       This function populates the passed reference with the detainted input from the first
       argument.  The second argument specifies the type of data in the first argument, and is
       used to validate the input before detainting.  The following data types are currently
       known:

         alphabetic            ^([a-zA-Z]+)$
         alphanumeric          ^([a-zA-Z0-9]+)$
         email                 ^([a-zA-Z][\w\.\-]*\@
                               (?:[a-zA-Z0-9][a-zA-Z0-9\-]*\.)*
                               [a-zA-Z0-9]+)$
         filename              ^[/ \w\-\.:,@\+]+\[?$
         fileglob              ^[/ \w\-\.:,@\+\*\?\{\}\[\]]+\[?$
         hostname              ^((?:[a-zA-Z0-9][a-zA-Z0-9\-]*\.)*
                               [a-zA-Z0-9]+)$
         ipaddr                ^(?:\d+\.){3}\d+$
         netaddr               ^(?:\d+\.){3}\d+(?:/(?:\d+|
                               (?:\d+\.){3}\d+))?$
         login                 ^([a-zA-Z][\w\.\-]*)$
         nometa                ^([^\`\$\!\@]+)$
         number                ^([+\-]?[0-9]+(?:\.[0-9]+)?)$

       If the first argument fails to match against these regular expressions the function will
       return 0.  If the string passed is either undefined or a zero-length string it will also
       return 0.  And finally, if you attempt to use an unknown (or unregistered) data type it
       will also return 0, and log an error message in Paranoid::ERROR.

       NOTE:  This is a small alteration in previous behavior.  In previous versions if an undef
       or zero-length string was passed, or if the data type was unknown the code would croak.
       That was, perhaps, a tad overzealous on my part.
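
       The validate-then-capture approach can be sketched as follows.  This is an illustration
       only: detaint_sketch and the %regexes registry are hypothetical, and the sketch copies
       just two of the patterns listed above.

```perl
use strict;
use warnings;

# Hypothetical registry holding two of the patterns from the table above
my %regexes = (
    login  => qr/^([a-zA-Z][\w\.\-]*)$/,
    number => qr/^([+\-]?[0-9]+(?:\.[0-9]+)?)$/,
);

sub detaint_sketch {
    my ( $input, $type, $out ) = @_;
    $$out = undef;

    # Undefined or zero-length input, and unknown types, all return 0
    return 0 unless defined $input and length $input;
    return 0 unless exists $regexes{$type};

    # Only text captured by the regex is treated as clean
    my ($clean) = $input =~ $regexes{$type};
    return 0 unless defined $clean;
    $$out = $clean;
    return 1;
}

my $val;
print "clean login: $val\n" if detaint_sketch( 'jdoe-2', 'login', \$val );
```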

   stringMatch
         $rv = stringMatch($input, @strings);

       This function does a multiline, case-insensitive regex match against the input for every
       string passed for matching.  This does safe quoted matches (\Q$string\E) for all the
       strings, unless the string is a perl Regexp (defined with qr//) or begins and ends with /.

       NOTE: this performs a study on the input in the hope that matching against a large number
       of regexes will be faster.  This may not always be the case.
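
       The quoting rules can be sketched as below.  The name string_match_sketch is
       hypothetical, and two details are assumptions rather than documented behavior: the /smi
       flags, and returning true as soon as any one string matches.

```perl
use strict;
use warnings;

sub string_match_sketch {
    my ( $input, @strings ) = @_;
    return 0 unless defined $input and @strings;

    for my $s (@strings) {
        my $re;
        if ( ref $s eq 'Regexp' ) {
            $re = $s;                  # qr// object: use as given
        }
        elsif ( $s =~ m{\A/(.+)/\z}s ) {
            $re = qr/$1/smi;           # "/.../" string: body is a regex
        }
        else {
            $re = qr/\Q$s\E/smi;       # anything else: match literally
        }
        return 1 if $input =~ $re;
    }
    return 0;
}
```

       With these rules the plain string "a.c" matches only a literal "a.c", while "/a.c/"
       also matches "abc".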

DEPENDENCIES

       o   Fcntl

       o   Paranoid

       o   Paranoid::Debug

BUGS AND LIMITATIONS

       If you fork a process that's already opened a file with sip or tail a new file descriptor
       will be opened for the child process.  But what may be less obvious is that with a newly
       opened file descriptor you will be starting back from the beginning (or end, in the case
       of tail) of the file, rather than from wherever you were before the fork.

AUTHOR

       Arthur Corliss (corliss@digitalmages.com)

LICENSE AND COPYRIGHT

       This software is licensed under the same terms as Perl itself.  Please see
       http://dev.perl.org/licenses/ for more information.

       (c) 2005, Arthur Corliss (corliss@digitalmages.com)