Provided by: libdatetime-format-builder-perl_0.8100-1_all bug

NAME

       DateTime::Format::Builder - Create DateTime parser classes and objects.

VERSION

       version 0.81

SYNOPSIS

           package DateTime::Format::Brief;

           use DateTime::Format::Builder
           (
               parsers => {
                   parse_datetime => [
                   {
                       regex => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                       params => [qw( year month day hour minute second )],
                   },
                   {
                       regex => qr/^(\d{4})(\d\d)(\d\d)$/,
                       params => [qw( year month day )],
                   },
                   ],
               }
           );

DESCRIPTION

       DateTime::Format::Builder creates DateTime parsers.  Many string formats of dates and
       times are simple and just require a basic regular expression to extract the relevant
       information. Builder provides a simple way to do this without writing reams of structural
       code.

       Builder provides a number of methods, most of which you'll never need, or at least rarely
       need. They're provided more for exposing of the module's innards to any subclasses, or for
       when you need to do something slightly beyond what I expected.

       This creates the end methods. Coderefs die on bad parses, return "DateTime" objects on
       good parse.

TUTORIAL

       See DateTime::Format::Builder::Tutorial.

ERROR HANDLING AND BAD PARSES

       Often, I will speak of "undef" being returned, however that's not strictly true.

       When a simple single specification is given for a method, the method isn't given a single
       parser directly. It's given a wrapper that will call "on_fail()" if the single parser
       returns "undef". The single parser must return "undef" so that a multiple parser can work
       nicely and actual errors can be thrown from any of the callbacks.

       Similarly, any multiple parsers will only call "on_fail()" right at the end when it's
       tried all it could.

       "on_fail()" (see later) is defined, by default, to throw an error.

       Multiple parser specifications can also specify "on_fail" with a coderef as an argument in
       the options block. This will take precedence over the inheritable and over-ridable method.

       That said, don't throw real errors from callbacks in multiple parser specifications unless
       you really want parsing to stop right there and not try any other parsers.

       In summary: calling a method will result in either a "DateTime" object being returned or
       an error being thrown (unless you've overridden "on_fail()" or "create_method()", or
       you've specified a "on_fail" key to a multiple parser specification).

       Individual parsers (be they multiple parsers or single parsers) will return either the
       "DateTime" object or "undef".

SINGLE SPECIFICATIONS

       A single specification is a hash ref of instructions on how to create a parser.

       The precise set of keys and values varies according to parser type. There are some common
       ones though:

       •   length is an optional parameter that can be used to specify that this particular regex
           is only applicable to strings of a certain fixed length. This can be used to make
           parsers more efficient. It's strongly recommended that any parser that can use this
           parameter does.

           You may happily specify the same length twice. The parsers will be tried in order of
           specification.

           You can also specify multiple lengths by giving it an arrayref of numbers rather than
           just a single scalar.  If doing so, please keep the number of lengths to a minimum.

           If any specifications without lengths are given and the particular length parser
           fails, then the non-length parsers are tried.

           This parameter is ignored unless the specification is part of a multiple parser
           specification.

       •   label provides a name for the specification and is passed to some of the callbacks
           about to mentioned.

       •   on_match and on_fail are callbacks. Both routines will be called with parameters of:

           •   input, being the input to the parser (after any preprocessing callbacks).

           •   label, being the label of the parser, if there is one.

           •   self, being the object on which the method has been invoked (which may just be a
               class name). Naturally, you can then invoke your own methods on it do get
               information you want.

           •   args, being an arrayref of any passed arguments, if any.  If there were no
               arguments, then this parameter is not given.

           These routines will be called depending on whether the regex match succeeded or
           failed.

       •   preprocess is a callback provided for cleaning up input prior to parsing. It's given a
           hash as arguments with the following keys:

           •   input being the datetime string the parser was given (if using multiple
               specifications and an overall preprocess then this is the date after it's been
               through that preprocessor).

           •   parsed being the state of parsing so far. Usually empty at this point unless an
               overall preprocess was given.  Items may be placed in it and will be given to any
               postprocessor and "DateTime->new" (unless the postprocessor deletes it).

           •   self, args, label as per on_match and on_fail.

           The return value from the routine is what is given to the regex. Note that this is
           last code stop before the match.

           Note: mixing length and a preprocess that modifies the length of the input string is
           probably not what you meant to do. You probably meant to use the multiple parser
           variant of preprocess which is done before any length calculations. This "single
           parser" variant of preprocess is performed after any length calculations.

       •   postprocess is the last code stop before "DateTime->new()" is called. It's given the
           same arguments as preprocess. This allows it to modify the parsed parameters after the
           parse and before the creation of the object. For example, you might use:

               {
                   regex  => qr/^(\d\d) (\d\d) (\d\d)$/,
                   params => [qw( year  month  day   )],
                   postprocess => \&_fix_year,
               }

           where "_fix_year" is defined as:

               sub _fix_year
               {
                   my %args = @_;
                   my ($date, $p) = @args{qw( input parsed )};
                   $p->{year} += $p->{year} > 69 ? 1900 : 2000;
                   return 1;
               }

           This will cause the two digit years to be corrected according to the cut off. If the
           year was '69' or lower, then it is made into 2069 (or 2045, or whatever the year was
           parsed as). Otherwise it is assumed to be 19xx. The DateTime::Format::Mail module uses
           code similar to this (only it allows the cut off to be configured and it doesn't use
           Builder).

           Note: It is very important to return an explicit value from the postprocess callback.
           If the return value is false then the parse is taken to have failed. If the return
           value is true, then the parse is taken to have succeeded and "DateTime->new()" is
           called.

       See the documentation for the individual parsers for their valid keys.

       Parsers at the time of writing are:

       •   DateTime::Format::Builder::Parser::Regex - provides regular expression based parsing.

       •   DateTime::Format::Builder::Parser::Strptime - provides strptime based parsing.

   Subroutines / coderefs as specifications.
       A single parser specification can be a coderef. This was added mostly because it could be
       and because I knew someone, somewhere, would want to use it.

       If the specification is a reference to a piece of code, be it a subroutine, anonymous, or
       whatever, then it's passed more or less straight through. The code should return "undef"
       in event of failure (or any false value, but "undef" is strongly preferred), or a true
       value in the event of success (ideally a "DateTime" object or some object that has the
       same interface).

       This all said, I generally wouldn't recommend using this feature unless you have to.

   Callbacks
       I mention a number of callbacks in this document.

       Any time you see a callback being mentioned, you can, if you like, substitute an arrayref
       of coderefs rather than having the straight coderef.

MULTIPLE SPECIFICATIONS

       These are very easily described as an array of single specifications.

       Note that if the first element of the array is an arrayref, then you're specifying
       options.

       •   preprocess lets you specify a preprocessor that is called before any of the parsers
           are tried. This lets you do things like strip off timezones or any unnecessary data.
           The most common use people have for it at present is to get the input date to a
           particular length so that the length is usable (DateTime::Format::ICal would use it to
           strip off the variable length timezone).

           Arguments are as for the single parser preprocess variant with the exception that
           label is never given.

       •   on_fail should be a reference to a subroutine that is called if the parser fails. If
           this is not provided, the default action is to call
           "DateTime::Format::Builder::on_fail", or the "on_fail" method of the subclass of DTFB
           that was used to create the parser.

EXECUTION FLOW

       Builder allows you to plug in a fair few callbacks, which can make following how a parse
       failed (or succeeded unexpectedly) somewhat tricky.

   For Single Specifications
       A single specification will do the following:

       User calls parser:

              my $dt = $class->parse_datetime( $string );

       1.  preprocess is called. It's given $string and a reference to the parsing workspace
           hash, which we'll call $p. At this point, $p is empty. The return value is used as
           $date for the rest of this single parser.  Anything put in $p is also used for the
           rest of this single parser.

       2.  regex is applied.

       3.  If regex did not match, then on_fail is called (and is given $date and also label if
           it was defined). Any return value is ignored and the next thing is for the single
           parser to return "undef".

           If regex did match, then on_match is called with the same arguments as would be given
           to on_fail. The return value is similarly ignored, but we then move to step 4 rather
           than exiting the parser.

       4.  postprocess is called with $date and a filled out $p. The return value is taken as a
           indication of whether the parse was a success or not. If it wasn't a success then the
           single parser will exit at this point, returning undef.

       5.  "DateTime->new()" is called and the user is given the resultant "DateTime" object.

       See the section on error handling regarding the "undef"s mentioned above.

   For Multiple Specifications
       With multiple specifications:

       User calls parser:

             my $dt = $class->complex_parse( $string );

       1.  The overall preprocessor is called and is given $string and the hashref $p
           (identically to the per parser preprocess mentioned in the previous flow).

           If the callback modifies $p then a copy of $p is given to each of the individual
           parsers.  This is so parsers won't accidentally pollute each other's workspace.

       2.  If an appropriate length specific parser is found, then it is called and the single
           parser flow (see the previous section) is followed, and the parser is given a copy of
           $p and the return value of the overall preprocessor as $date.

           If a "DateTime" object was returned so we go straight back to the user.

           If no appropriate parser was found, or the parser returned "undef", then we progress
           to step 3!

       3.  Any non-length based parsers are tried in the order they were specified.

           For each of those the single specification flow above is performed, and is given a
           copy of the output from the overall preprocessor.

           If a real "DateTime" object is returned then we exit back to the user.

           If no parser could parse, then an error is thrown.

       See the section on error handling regarding the "undef"s mentioned above.

METHODS

       In the general course of things you won't need any of the methods. Life often throws
       unexpected things at us so the methods are all available for use.

   import
       "import()" is a wrapper for "create_class()". If you specify the class option (see
       documentation for "create_class()") it will be ignored.

   create_class
       This method can be used as the runtime equivalent of "import()". That is, it takes the
       exact same parameters as when one does:

          use DateTime::Format::Builder ( blah blah blah )

       That can be (almost) equivalently written as:

          use DateTime::Format::Builder;
          DateTime::Format::Builder->create_class( blah blah blah );

       The difference being that the first is done at compile time while the second is done at
       run time.

       In the tutorial I said there were only two parameters at present. I lied. There are
       actually three of them.

       •   parsers takes a hashref of methods and their parser specifications. See the
           DateTime::Format::Builder::Tutorial for details.

           Note that if you define a subroutine of the same name as one of the methods you define
           here, an error will be thrown.

       •   constructor determines whether and how to create a "new()" function in the new class.
           If given a true value, a constructor is created. If given a false value, one isn't.

           If given an anonymous sub or a reference to a sub then that is used as "new()".

           The default is 1 (that is, create a constructor using our default code which simply
           creates a hashref and blesses it).

           If your class defines its own "new()" method it will not be overwritten. If you define
           your own "new()" and also tell Builder to define one an error will be thrown.

       •   verbose takes a value. If the value is undef, then logging is disabled. If the value
           is a filehandle then that's where logging will go. If it's a true value, then output
           will go to "STDERR".

           Alternatively, call "$DateTime::Format::Builder::verbose()" with the relevant value.
           Whichever value is given more recently is adhered to.

           Be aware that verbosity is a global wide setting.

       •   class is optional and specifies the name of the class in which to create the specified
           methods.

           If using this method in the guise of "import()" then this field will cause an error so
           it is only of use when calling as "create_class()".

       •   version is also optional and specifies the value to give $VERSION in the class. It's
           generally not recommended unless you're combining with the class option. A
           "ExtUtils::MakeMaker" / "CPAN" compliant version specification is much better.

       In addition to creating any of the methods it also creates a "new()" method that can
       instantiate (or clone) objects.

SUBCLASSING

       In the rest of the documentation I've often lied in order to get some of the ideas across
       more easily. The thing is, this module's very flexible. You can get markedly different
       behaviour from simply subclassing it and overriding some methods.

   create_method
       Given a parser coderef, returns a coderef that is suitable to be a method.

       The default action is to call "on_fail()" in the event of a non-parse, but you can make it
       do whatever you want.

   on_fail
       This is called in the event of a non-parse (unless you've overridden "create_method()" to
       do something else.

       The single argument is the input string. The default action is to call "croak()". Above,
       where I've said parsers or methods throw errors, this is the method that is doing the
       error throwing.

       You could conceivably override this method to, say, return "undef".

USING BUILDER OBJECTS aka USERS USING BUILDER

       The methods listed in the METHODS section are all you generally need when creating your
       own class. Sometimes you may not want a full blown class to parse something just for this
       one program. Some methods are provided to make that task easier.

   new
       The basic constructor. It takes no arguments, merely returns a new
       "DateTime::Format::Builder" object.

           my $parser = DateTime::Format::Builder->new();

       If called as a method on an object (rather than as a class method), then it clones the
       object.

           my $clone = $parser->new();

   clone
       Provided for those who prefer an explicit "clone()" method rather than using "new()" as an
       object method.

           my $clone_of_clone = $clone->clone();

   parser
       Given either a single or multiple parser specification, sets the object to have a parser
       based on that specification.

           $parser->parser(
               regex  => qr/^ (\d{4}) (\d\d) (\d\d) $/x;
               params => [qw( year    month  day    )],
           );

       The arguments given to "parser()" are handed directly to "create_parser()". The resultant
       parser is passed to "set_parser()".

       If called as an object method, it returns the object.

       If called as a class method, it creates a new object, sets its parser and returns that
       object.

   set_parser
       Sets the parser of the object to the given parser.

          $parser->set_parser( $coderef );

       Note: this method does not take specifications. It also does not take anything except
       coderefs. Luckily, coderefs are what most of the other methods produce.

       The method return value is the object itself.

   get_parser
       Returns the parser the object is using.

          my $code = $parser->get_parser();

   parse_datetime
       Given a string, it calls the parser and returns the "DateTime" object that results.

          my $dt = $parser->parse_datetime( "1979 07 16" );

       The return value, if not a "DateTime" object, is whatever the parser wants to return.
       Generally this means that if the parse failed an error will be thrown.

   format_datetime
       If you call this function, it will throw an errror.

LONGER EXAMPLES

       Some longer examples are provided in the distribution. These implement some of the common
       parsing DateTime modules using Builder. Each of them are, or were, drop in replacements
       for the modules at the time of writing them.

THANKS

       Dave Rolsky (DROLSKY) for kickstarting the DateTime project, writing
       DateTime::Format::ICal and DateTime::Format::MySQL, and some much needed review.

       Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for writing the
       multilength code (both one length with multiple parsers and single parser with multiple
       lengths), blame for the Regex custom constructor code, spotting a bug in Dispatch, and
       more much needed review.

       Kellan Elliott-McCrea (KELLAN) for even more review, suggestions, DateTime::Format::W3CDTF
       and the encouragement to rewrite these docs almost 100%!

       Claus Faerber (CFAERBER) for having me get around to fixing the auto-constructor writing,
       providing the 'args'/'self' patch, and suggesting the multi-callbacks.

       Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now supports.

       Matthew McGillis for pointing out that "on_fail" overriding should be simpler.

       Simon Cozens (SIMON) for saying it was cool.

SUPPORT

       Support for this module is provided via the datetime@perl.org email list. See
       http://lists.perl.org/ for more details.

       Alternatively, log them via the CPAN RT system via the web or email:

           http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DateTime%3A%3AFormat%3A%3ABuilder
           bug-datetime-format-builder@rt.cpan.org

       This makes it much easier for me to track things and thus means your problem is less
       likely to be neglected.

SEE ALSO

       "datetime@perl.org" mailing list.

       http://datetime.perl.org/

       perl, DateTime, DateTime::Format::Builder::Tutorial, DateTime::Format::Builder::Parser

AUTHORS

       •   Dave Rolsky <autarch@urth.org>

       •   Iain Truskett

COPYRIGHT AND LICENSE

       This software is Copyright (c) 2013 by Dave Rolsky.

       This is free software, licensed under:

         The Artistic License 2.0 (GPL Compatible)