NAME

       flexml - generate validating XML processor and applications from DTD

SYNOPSIS

       flexml [-ASHDvdnLXV] [-s skel] [-p pubid] [-i init_header] [-u uri] [-r roottags]
       [-a actions] name[.dtd]

DESCRIPTION

       Flexml reads name.dtd, which must be a DTD (Document Type Definition) describing the format
       of XML (Extensible Markup Language) documents, and produces a "validating" XML processor
       with an interface to support XML applications.  Optionally, complete applications can be
       generated from special "action files", either as separate code to link with the processor
       or textually combined with it.

       The generated processor will only validate documents that conform strictly to the DTD,
       without extending it.  More precisely, in practice we restrict XML rule [28] to

         [28r] doctypedecl ::= '<!DOCTYPE' S Name S ExternalID S? '>'

       where the "ExternalID" denotes the DTD in use.  (One might say, in fact, that flexml
       implements "non-extensible" markup. :)

       The generated processor is a flex(1) scanner, by default named name.l, with a corresponding
       C header file name.h for separate compilation of generated applications.  Optionally,
       flexml takes an actions file with per-element actions and produces a C file with element
       functions for an XML application, with entry points called from the XML processor.  (It
       can also fold the XML application into the XML processor to make stand-alone XML
       applications, but this prevents sharing the processor between applications.)

       In "OPTIONS" we list the possible options, in "ACTION FILE FORMAT" we explain how to write
       applications, in "COMPILATION" we explain how to compile produced processors and
       applications into executables, and in "BUGS" we list the current limitations of the system
       before giving standard references.

OPTIONS

Flexml takes the following options.

--stand-alone, -A
Generate a stand-alone scanner application. If combined with -a actions, the
application will be named after actions with the extension replaced by .l; otherwise it
will be in name.l. Conflicts with -S, -H, and -D.

--actions actions, -a actions
Uses the actions file to produce an XML application in a file named after actions
with the extension replaced by .c. If combined with -A, the stand-alone application
will instead include the action functions.

--dummy [app_name], -D [app_name]
Generate a dummy application with just empty functions to be called by the XML
processor. If app_name is not specified on the command line, it defaults to
name-dummy.c. If combined with -a actions, the application will include the
specified actions and be named after actions with the extension replaced by .c.
Conflicts with -A; implied by -a unless any of -S, -H, or -D is specified.

--debug, -d
Turns on debug mode in the flex scanner and also prints out the details of the DTD
analysis performed by flexml.

--header [header_name], -H [header_name]
Generate the header file. If header_name is not specified on the command line, it
defaults to name.h. Conflicts with -A; on by default if none of -S, -H, or -D is specified.

--lineno, -L
Makes the XML processor (as produced by flex(1)) count the lines in the input and keep
it available to XML application actions in the integer "yylineno". (This is off by
default as the performance overhead is significant.)

--quiet, -q
Prevents the XML processor (as produced by flex(1)) from reporting the errors it runs
into on stderr. Instead, users will have to poll for error messages with the
parse_err_msg() function. By default, error messages are written on stderr.

--dry-run, -n
"Dry-run": do not produce any of the output files.

--pubid pubid, -p pubid
Sets the document type to be "PUBLIC" with the identifier pubid instead of "SYSTEM",
the default.

--init_header init_header, -i init_header
Puts a line of the form #include "init_header" in the "%{...%}" section at the top of
the generated .l file. This may be useful for making various flex "#define"s, for
example "YY_INPUT" or "YY_DECL".
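As an illustration, an init header could redefine YY_INPUT so that the generated scanner
reads its document from a string in memory instead of from yyin. This sketch assumes flex's
documented YY_INPUT(buf, result, max_size) contract (copy at most max_size bytes into buf and
set result to the number copied, or to YY_NULL at end of input). The header name my_input.h
and the variables xml_text and xml_pos are hypothetical, and the YY_NULL fallback exists only
so the fragment also compiles outside a scanner.

```c
/* my_input.h (hypothetical name): pass it to flexml with -i my_input.h */
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* flex defines this itself; fallback for stand-alone use */
#endif

/* Hypothetical in-memory document and current read position. */
static const char *xml_text = "<memo>hi</memo>";
static size_t xml_pos = 0;

/* Copy at most max_size bytes of the remaining text into buf and store the
   number of bytes copied in result, or YY_NULL once the text is exhausted. */
#define YY_INPUT(buf, result, max_size)                              \
    do {                                                             \
        size_t left_ = strlen(xml_text + xml_pos);                   \
        size_t n_ = left_ < (size_t)(max_size) ? left_               \
                                               : (size_t)(max_size); \
        memcpy((buf), xml_text + xml_pos, n_);                       \
        xml_pos += n_;                                               \
        (result) = n_ > 0 ? (int)n_ : YY_NULL;                       \
    } while (0)
```

With this file, flexml -i my_input.h name.dtd would place #include "my_input.h" in the
generated name.l, and flex would use this YY_INPUT in place of its default.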

--sysid=sysid
Overrides the "SYSTEM" id of the accepted DTD. Sometimes useful when your DTD is
placed in a subdirectory.

--root-tags roottags, -r roottags
Restricts the XML processor to validate only documents with one of the root elements
listed in the comma-separated roottags.
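The generated processor enforces this restriction in its scanner rules. Purely to illustrate
how the comma-separated roottags list is read, a membership test could look like the
hypothetical helper below; root_allowed is not part of flexml's output.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `root` is exactly one of the names in the
   comma-separated list `roottags` (e.g. "book,article"). */
static bool root_allowed(const char *roottags, const char *root)
{
    size_t root_len = strlen(root);
    const char *p = roottags;

    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        /* Compare whole names only, so "art" does not match "article". */
        if (len == root_len && strncmp(p, root, len) == 0)
            return true;
        if (comma == NULL)
            return false;
        p = comma + 1;   /* continue with the name after the comma */
    }
}
```

Under this reading, -r book,article accepts documents whose root element is book or article
and nothing else.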

--scanner [scanner_name], -S [scanner_name]
Generate the scanner. If scanner_name is not given on the command line, it defaults to
name.l. Conflicts with -A; on by default if none of -S, -H, or -D is specified.

--skel skel, -s skel
Use the skeleton scanner skel instead of the default.

--act-bin flexml-act, -T flexml-act
This is an internal option mainly used to test versions of flexml not installed yet.

--stack-increment stack_increment, -b stack_increment
Sets FLEXML_BUFFERSTACKSIZE to stack_increment (100000 by default). This controls
how much the data stack grows on each realloc().
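A buffer that grows by a fixed increment on each realloc(), as described above, can be
sketched as follows. The type data_stack, the function stack_push and the macro
STACK_INCREMENT are hypothetical; this is not flexml's skeleton code, only an illustration of
the growth policy the option controls.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_INCREMENT 100000  /* plays the role of FLEXML_BUFFERSTACKSIZE */

/* Hypothetical growable data stack holding character data. */
typedef struct {
    char  *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} data_stack;

/* Append n bytes, growing the allocation in STACK_INCREMENT steps as needed.
   Returns 0 on success, -1 if realloc() fails (the stack is left unchanged). */
static int stack_push(data_stack *s, const char *bytes, size_t n)
{
    if (s->len + n > s->cap) {
        size_t new_cap = s->cap;
        while (s->len + n > new_cap)
            new_cap += STACK_INCREMENT;
        char *grown = realloc(s->data, new_cap);
        if (grown == NULL)
            return -1;
        s->data = grown;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, bytes, n);
    s->len += n;
    return 0;
}
```

Because growth is by a fixed amount, a larger increment means fewer realloc() calls on large
documents at the cost of more allocated but unused memory.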

--tag-prefix STRING, -O STRING
Use STRING as a prefix to differentiate multiple flexml-generated processors in the same
C code, much like flex's -P option.

--uri uri, -u uri
Sets the URI of the DTD, used in the "DOCTYPE" header, to the specified uri (the
default is the DTD name).
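Together, -p and -u determine the external identifier in the DOCTYPE declaration, which
follows one of the two XML ExternalID forms: SYSTEM "uri" or PUBLIC "pubid" "uri". The
hypothetical helper format_doctype below only spells out those two shapes; it is not code
that flexml generates.

```c
#include <stdio.h>

/* Hypothetical helper: write the DOCTYPE declaration for root element `root`
   into out. With pubid == NULL the SYSTEM form (flexml's default) is used;
   otherwise the PUBLIC form, which carries both identifiers.
   Returns the snprintf() result (the full length, even if truncated). */
static int format_doctype(char *out, size_t cap, const char *root,
                          const char *pubid, const char *uri)
{
    if (pubid != NULL)
        return snprintf(out, cap, "<!DOCTYPE %s PUBLIC \"%s\" \"%s\">",
                        root, pubid, uri);
    return snprintf(out, cap, "<!DOCTYPE %s SYSTEM \"%s\">", root, uri);
}
```

Under that reading, -p "-//A//B" -u memo.dtd corresponds to the declaration
<!DOCTYPE memo PUBLIC "-//A//B" "memo.dtd"> for a document whose root element is memo.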

--verbose, -v
Be verbose: echo each DTD declaration (after parameter expansion).

--version, -V
Print the version of flexml and exit.

ACTION FILE FORMAT

       Action files, passed to the -a option, are XML documents conforming to the DTD
       flexml-act.dtd which is the following:

         <!ELEMENT actions ((top|start|end)*,main?)>
         <!ENTITY % C-code "(#PCDATA)">
         <!ELEMENT top   %C-code;>
         <!ELEMENT start %C-code;>  <!ATTLIST start tag NMTOKEN #REQUIRED>
         <!ELEMENT end   %C-code;>  <!ATTLIST end   tag NMTOKEN #REQUIRED>
         <!ELEMENT main  %C-code;>

       The elements should be used as follows:

       "top"
           Use for top-level C code such as global declarations, utility functions, etc.

       "start"
           Attaches the code as an action to the start tag of the element named by the required
           "tag" attribute.  The "%C-code;" component should be C code suitable for inclusion in
           a C block (i.e., within "{ ... }", so it may contain local variables); furthermore the
           following extensions are available:

           "{attribute}": Can be used to access the value of the attribute as set with
           attribute="value" in the start tag.  In C, "{attribute}" will be interpreted
           depending on the declaration of the attribute.  If the attribute is declared as an
           enumerated type like

             <!ATTLIST tag attribute (alt1 | alt2 | ...) ...>

           then the C attribute value is of an enumerated type with the elements written
           "{attribute=alt1}", "{attribute=alt2}", etc.; furthermore an unset attribute has
           the "value" "{!attribute}".  If the attribute is not an enumeration, then
           "{attribute}" is a null-terminated C string (of type "char*") and "{!attribute}"
           is "NULL".
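           To picture the two cases in plain C, consider a hypothetical declaration
           <!ATTLIST item kind (alt1 | alt2) #IMPLIED>.  The sketch below models the
           enumerated case as an enum with one constant per alternative plus a separate
           "unset" constant (the role of "{!kind}"), and the non-enumerated case as a char*
           that is NULL when unset.  The names kind_t, KIND_* and the helper functions are
           illustrative stand-ins, not the identifiers flexml generates for "{kind=alt1}".

```c
#include <stddef.h>
#include <string.h>

/* Model of an enumerated attribute: one constant per alternative plus a
   distinct value for "not set" (the role of {!kind}). */
typedef enum { KIND_UNSET, KIND_ALT1, KIND_ALT2 } kind_t;

/* Roughly what a processor must do with kind="..." from a start tag.
   NULL means the attribute was absent. */
static kind_t kind_from_text(const char *text)
{
    if (text == NULL)
        return KIND_UNSET;
    if (strcmp(text, "alt1") == 0)
        return KIND_ALT1;
    if (strcmp(text, "alt2") == 0)
        return KIND_ALT2;
    return KIND_UNSET;  /* a validating processor would report an error here */
}

/* Action-style code branching on the enumerated value. */
static const char *kind_label(kind_t kind)
{
    switch (kind) {
    case KIND_ALT1: return "alt1";
    case KIND_ALT2: return "alt2";
    default:        return "unset";
    }
}

/* Model of a non-enumerated (e.g. CDATA) attribute: a C string, NULL when unset. */
static const char *title_or_default(const char *title)
{
    return title != NULL ? title : "(untitled)";
}
```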

       "end"
           Similarly attaches the code as an action to the end tag of the element named by the
           required "tag" attribute; here too the "%C-code;" component should be C code suitable
           for inclusion in a C block.  If the element has "Mixed" contents, i.e., was declared
           to permit "#PCDATA", then the following variable is available:

           "{#PCDATA}": Contains the text ("#PCDATA") of the element as a null-terminated C
           string (of type "char*").  If an element with Mixed contents actually mixes text and
           child elements, then "{#PCDATA}" contains the plain concatenation of the text
           fragments as one string.
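           The concatenation can be pictured with the hypothetical helper below.  For an
           element such as <p>Hello, <b>big</b> world</p>, the fragments "Hello, " and
           " world" that belong to p itself are joined into one null-terminated string, while
           "big" is the text of the child b (the BUGS section suggests children cannot
           currently inject text into their parent).  This illustrates the result only; it
           is not flexml's implementation, and pcdata_buf and pcdata_append are made-up names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator for an element's own text fragments. */
typedef struct {
    char  *text;  /* null-terminated once anything has been appended */
    size_t len;   /* length excluding the terminator */
} pcdata_buf;

/* Append one text fragment; returns 0 on success, -1 on allocation failure. */
static int pcdata_append(pcdata_buf *b, const char *fragment)
{
    size_t n = strlen(fragment);
    char *grown = realloc(b->text, b->len + n + 1);
    if (grown == NULL)
        return -1;
    memcpy(grown + b->len, fragment, n + 1);  /* copies the terminator too */
    b->text = grown;
    b->len += n;
    return 0;
}
```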

       "main"
           Finally, an optional "main" element can contain the C "main" function of the XML
           application.  Normally the "main" function should include (at least) one call of the
           XML processor:

           "yylex()": Invokes the XML processor produced by flex(1) on the XML document found on
           the standard input (actually the "yyin" file handle: see the manual for flex(1) for
           information on how to change this as well as the name "yylex").

           If no "main" action is provided then the following is used:

             int main() { exit(yylex()); }

       It is advisable to use XML "<![CDATA[" ... "]]>" sections for the C code to make sure that
       all characters are properly passed to the output file.

       Finally, note that Flexml handles empty elements <tag/> as equivalent to <tag></tag>.

COMPILATION

       The following make(1) file fragment shows how one can compile flexml-generated programs:

         # Programs.
         FLEXML = flexml -v
         FLEX = flex

         # Generate linkable XML processor with header for application.
         %.l %.h: %.dtd
                 $(FLEXML) $<

         # Generate C source from flex scanner.
         %.c:    %.l
                 $(FLEX) -Bs -o"$@" "$<"

         # Generate XML application C source to link with processor.
         # Note: The dependency must be of the form "appl.c: appl.act proc.dtd".
         %.c:    %.act
                 $(FLEXML) -D -a $^

         # Direct generation of stand-alone XML processor+application.
         # Note: The dependency must be of the form "appl.l: appl.act proc.dtd".
         %.l:    %.act
                 $(FLEXML) -A -a $^

BUGS

       The present version of flexml is to be considered in "early beta" state thus bugs should
       be expected (and the author would like to hear about them).  Here are some known
       restrictions that we hope to overcome in the future:

       •   The character set is merely ASCII (actually flex(1) handles 8 bit characters but only
           the ASCII character set is common with the XML default UTF-8 encoding).

       •   "ID" type attributes are not validated for uniqueness; "IDREF" and "IDREFS" attributes
           are not validated for existence.

       •   The "ENTITY" and "ENTITIES" attribute types are not supported.

       •   "NOTATION" declarations are not supported.

       •   The various "xml:"-attributes are treated like any other attributes; in particular
           "xml:space" should be supported.

       •   The DTD parser is presently a Perl hack, so it may parse some DTDs badly; in particular
           the expansion of parameter entities may not conform fully to the XML specification.

       •   A child should be able to "return" a value for the parent (also called a synthesised
           attribute).  Similarly an element in Mixed contents should be able to inject text into
           the "pcdata" of the parent.

FILES

       /usr/share/flexml/skel
           The skeleton scanner with the generic parts of XML scanning.

       /usr/share/doc/flexml/flexml/
           License, further documentation, and examples.

AUTHOR

       Flexml was written by Kristoffer Rose, <krisrose@debian.org>.

COPYRIGHT

       The program is Copyright (c) 1999 Kristoffer Rose (all rights reserved) and distributed
       under the GNU General Public License (GPL, also known as "copyleft", which clarifies that
       the author provides absolutely no warranty for flexml and ensures that flexml is and will
       remain available for all uses, even commercial).

ACKNOWLEDGEMENT

       I am grateful to NTSys (France) for supporting the development of flexml.  Finally, I extend
       my sincere thanks to Jef Poskanzer, Vern Paxson, and the rest of the flex maintainers and
       GNU developers for a great tool.