Provided by: libxml-sax-machines-perl_0.46-2_all
NAME
XML::SAX::ByRecord - Record oriented processing of (data) documents
VERSION
version 0.46
SYNOPSIS
    use XML::SAX::Machines qw( ByRecord ) ;

    my $m = ByRecord(
        "My::RecordFilter1",
        "My::RecordFilter2",
        ...
        {
            Handler => $h, ## optional
        }
    );

    $m->parse_uri( "foo.xml" );
DESCRIPTION
XML::SAX::ByRecord is a SAX machine that treats a document as a series of records. Everything before and after the records is emitted as-is, while the records are excerpted into mini-documents and run one at a time through the filter pipeline contained in ByRecord.

The output is a document that has exactly the same content before, after, and between the records as the input document, but in which each record has been run through the filter pipeline. So if a document has 10 records in it, the per-record filter pipeline will see 10 sets of ( start_document, body of record, end_document ) events. An example is below.

This has several use cases:

• Big, record oriented documents

  Big documents can be processed a record at a time with various DOM oriented processors like XML::Filter::XSLT.

• Streaming XML

  Small sections of an XML stream can be run through a document processor without holding up the stream.

• Record oriented style sheets / processors

  Sometimes it's just plain easier to write a style sheet or SAX filter that applies to a single record at a time, rather than having to run through a series of records.

Topology

Here's how the innards look:

    +-----------------------------------------------------------+
    |                   An XML::SAX::ByRecord                   |
    |    Intake                                                 |
    |   +----------+    +---------+         +--------+  Exhaust |
  --+-->| Splitter |--->| Stage_1 |-->...-->| Merger |----------+----->
    |   +----------+    +---------+         +--------+          |
    |               \                            ^              |
    |                \                           |              |
    |                 +---------->---------------+              |
    |                   Events not in any records               |
    |                                                           |
    +-----------------------------------------------------------+

The "Splitter" is an XML::Filter::DocSplitter by default, and the "Merger" is an XML::Filter::Merger by default. The line that bypasses the "Stage_1 ..." filter pipeline carries all events that do not occur in a record. All events that occur in a record pass through the filter pipeline.
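The routing between the bypass line and the per-record pipeline can be sketched without any SAX modules. What follows is a toy model in plain Perl, not the module's implementation: events are simple [ type, data ] pairs, route_by_record and its filter callback are hypothetical names, and a record is assumed to be any element directly beneath the root element.

```perl
use strict;
use warnings;

# Toy model only. Events are [ $type, $data ] pairs.
# Assumption: a "record" is any element directly beneath the root element.
sub route_by_record {
    my ( $events, $filter ) = @_;    # $filter takes and returns a list of events
    my ( @out, @record );
    my $depth = 0;

    for my $ev ( @$events ) {
        my ( $type ) = @$ev;
        $depth++ if $type eq 'start_element';

        if ( $depth >= 2 ) {
            push @record, $ev;       # inside a record: buffer for the pipeline
        }
        else {
            push @out, $ev;          # outside any record: bypass the pipeline
        }

        if ( $type eq 'end_element' ) {
            $depth--;
            if ( $depth == 1 && @record ) {
                # Record complete: wrap it as a mini-document and filter it.
                my @filtered = $filter->(
                    [ 'start_document' ], @record, [ 'end_document' ]
                );
                # "Merge": drop the mini-document's own document events.
                push @out, grep { $_->[0] !~ /_document\z/ } @filtered;
                @record = ();
            }
        }
    }
    return \@out;
}

my @seen;    # event types the per-record filter saw, one entry per record
my $uc = sub {
    push @seen, [ map { $_->[0] } @_ ];
    return map {
        $_->[0] eq 'characters' ? [ 'characters', uc( $_->[1] ) ] : $_
    } @_;
};

my @doc = (
    [ start_element => 'root' ],
    [ characters    => 'a' ],
    [ start_element => 'rec' ], [ characters => 'b' ], [ end_element => 'rec' ],
    [ characters    => 'c' ],
    [ start_element => 'rec' ], [ characters => 'd' ], [ end_element => 'rec' ],
    [ characters    => 'e' ],
    [ end_element   => 'root' ],
);

my $out  = route_by_record( \@doc, $uc );
my $text = join '', map { $_->[1] } grep { $_->[0] eq 'characters' } @$out;
print "$text\n";    # prints "aBcDe"
```

Only the text inside the two records is uppercased, and @seen ends up with one ( start_document ... end_document ) set per record. The real machine streams SAX events between filter objects rather than buffering arrays.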
Example

Here's a quick little filter to uppercase text content:

    package My::Filter::Uc;

    use vars qw( @ISA );
    @ISA = qw( XML::SAX::Base );

    use XML::SAX::Base;

    sub characters {
        my $self = shift;
        my ( $data ) = @_;
        $data->{Data} = uc $data->{Data};
        $self->SUPER::characters( @_ );
    }

And here's a little machine that uses it:

    $m = Pipeline(
        ByRecord( "My::Filter::Uc" ),
        \$out,
    );

When fed a document like:

    <root>
        a
        <rec>b</rec>
        c
        <rec>d</rec>
        e
        <rec>f</rec>
        g
    </root>

the output looks like:

    <root>
        a
        <rec>B</rec>
        c
        <rec>D</rec>
        e
        <rec>F</rec>
        g
    </root>

and My::Filter::Uc receives three sets of events like:

    start_document
    start_element: <rec>
    characters:    'b'
    end_element:   </rec>
    end_document

    start_document
    start_element: <rec>
    characters:    'd'
    end_element:   </rec>
    end_document

    start_document
    start_element: <rec>
    characters:    'f'
    end_element:   </rec>
    end_document
METHODS
new

    my $d = XML::SAX::ByRecord->new( @channels, \%options );

Longhand for calling the ByRecord function exported by XML::SAX::Machines.
CREDIT
Proposed by Matt Sergeant, with advice from Kip Hampton and Robin Berjon.
Writing an aggregator.
To be written. In short, an aggregator needs to provide "start_manifold_processing" and "end_manifold_processing". See XML::Filter::Merger and its source code for a starting point.
AUTHORS
• Barry Slaymaker

• Chris Prather <chris@prather.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by Barry Slaymaker. This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.