Provided by: erlang-manpages_16.b.3-dfsg-1ubuntu2_all

NAME

       xmerl_sax_parser - XML SAX parser API

DESCRIPTION

       A SAX parser for XML that sends events through a callback interface. SAX (the Simple API
       for XML) was originally a Java-only API. It was the first widely adopted API for XML in
       Java and has become a de facto standard, with versions for several programming language
       environments other than Java.
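       A minimal sketch of the callback interface, assuming only the stream/2 function and the
       event_fun and event_state options documented below. The event fun receives each event and
       returns the new event state; here it simply collects every event. The module name
       sax_events is hypothetical.

```erlang
-module(sax_events).
-export([events/1]).

%% Parse an XML binary and return every SAX event in document order.
%% The event fun prepends each event to the list held in the event
%% state; the list is reversed once parsing has finished.
events(Xml) ->
    EventFun = fun(Event, _Location, Acc) -> [Event | Acc] end,
    case xmerl_sax_parser:stream(Xml, [{event_fun, EventFun},
                                       {event_state, []}]) of
        {ok, Acc, _Rest} ->
            {ok, lists:reverse(Acc)};
        {Tag, Location, Reason, _EndTags, _State} ->
            {error, {Tag, Location, Reason}}
    end.
```

       For example, sax_events:events(<<"<a>hi</a>">>) should yield a list that starts with
       startDocument and ends with endDocument.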

DATA TYPES

         option():
           Options used to customize the behaviour of the parser. Possible options are:

           {continuation_fun, ContinuationFun}:
             ContinuationFun is a callback function that decides what to do if the parser reaches
             EOF before the document is complete.

           {continuation_state, term()}:
             State that is accessible in the continuation callback function.

           {event_fun, EventFun}:
             EventFun is the callback function for parser events.

           {event_state, term()}:
             State that is accessible in the event callback function.

           {file_type, FileType}:
             Flag that tells the parser whether it is parsing a DTD or a normal XML file
             (default: normal).

             * FileType = normal | dtd

           {encoding, Encoding}:
             Sets the default character set (default: UTF-8). This character set is used only if
             the XML document does not explicitly declare one.

             * Encoding = utf8 | {utf16,big} | {utf16,little} | latin1 | list

           skip_external_dtd:
              Skips the external DTD during parsing.
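             A sketch of how the options combine, assuming a Latin-1 document without an XML
             declaration. Without {encoding, latin1} the parser would assume UTF-8, in which a
             lone byte 233 ("é" in ISO-8859-1) is not valid. The module name sax_options is
             hypothetical, and skip_external_dtd is included only to show a bare-atom option.

```erlang
-module(sax_options).
-export([count_elements/1]).

%% Count the elements in a Latin-1 encoded document that carries no
%% encoding declaration of its own.
count_elements(Latin1Bin) ->
    Count = fun({startElement, _, _, _, _}, _Loc, N) -> N + 1;
               (_Other, _Loc, N) -> N
            end,
    Options = [{event_fun, Count},      % called for every event
               {event_state, 0},        % initial value passed to Count
               {encoding, latin1},      % used because the document declares none
               skip_external_dtd],      % do not try to load an external DTD
    {ok, N, _Rest} = xmerl_sax_parser:stream(Latin1Bin, Options),
    N.
```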


         event():
           The SAX events that are sent to the user via the callback.

           startDocument:
             Receive notification of the beginning of a document. The SAX parser sends this
             event only once, before any other event callbacks.

           endDocument:
             Receive notification of the end of a document. The SAX parser sends this event
             only once, and it is the last event during the parse.

           {startPrefixMapping, Prefix, Uri}:
             Begin the scope of a prefix-URI Namespace mapping. Note that start/endPrefixMapping
             events are not guaranteed to be properly nested relative to each other: all
             startPrefixMapping events will occur immediately before the corresponding
             startElement event, and all endPrefixMapping events will occur immediately after the
             corresponding endElement event, but their order is not otherwise guaranteed. There
             will not be start/endPrefixMapping events for the "xml" prefix, since it is
             predeclared and immutable.

             * Prefix = string()

             * Uri = string()

           {endPrefixMapping, Prefix}:
              End the scope of a prefix-URI mapping.

             * Prefix = string()
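             A sketch that keeps only the namespace-related events so that their order relative
             to startElement and endElement can be inspected. The module name sax_ns and the
             {start, Name} and {'end', Name} markers are hypothetical simplifications.

```erlang
-module(sax_ns).
-export([ns_events/1]).

%% Return prefix-mapping events interleaved with simplified
%% element start/end markers, in the order the parser sent them.
ns_events(Xml) ->
    Keep = fun(E = {startPrefixMapping, _Prefix, _Uri}, _L, Acc) -> [E | Acc];
              (E = {endPrefixMapping, _Prefix}, _L, Acc) -> [E | Acc];
              ({startElement, _Uri, Local, _QName, _Attrs}, _L, Acc) -> [{start, Local} | Acc];
              ({endElement, _Uri, Local, _QName}, _L, Acc) -> [{'end', Local} | Acc];
              (_Other, _L, Acc) -> Acc
           end,
    {ok, Acc, _Rest} = xmerl_sax_parser:stream(Xml, [{event_fun, Keep},
                                                     {event_state, []}]),
    lists:reverse(Acc).
```

             For <r xmlns:p="urn:x"><p:c/></r>, the mapping for "p" is expected just before
             {start, "r"}, and its endPrefixMapping just after {'end', "r"}.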

           {startElement, Uri, LocalName, QualifiedName, Attributes}:
             Receive notification of the beginning of an element. The parser sends this event
             at the beginning of every element in the XML document; there will be a
             corresponding endElement event for every startElement event (even when the element
             is empty). All of the element's content will be reported, in order, before the
             corresponding endElement event.

             * Uri = string()

             * LocalName = string()

             * QualifiedName = {Prefix, LocalName}

             * Prefix = string()

             * Attributes = [{Uri, Prefix, AttributeName, Value}]

             * AttributeName = string()

             * Value = string()
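             A sketch that reads the Attributes list of a startElement event, assuming the
             four-tuple {Uri, Prefix, AttributeName, Value} shape described above. The result is
             sorted because this example does not rely on the order in which the parser lists
             attributes. The module name sax_attrs is hypothetical.

```erlang
-module(sax_attrs).
-export([attributes_of/2]).

%% Return {found, Pairs} for the first element whose local name is Name,
%% where Pairs is a sorted list of {AttributeName, Value}; otherwise none.
attributes_of(Xml, Name) ->
    Fun = fun({startElement, _Uri, Local, _QName, Attrs}, _Loc, none)
                when Local =:= Name ->
                  {found, lists:sort([{AttrName, Value}
                                      || {_AUri, _Prefix, AttrName, Value} <- Attrs])};
             (_Event, _Loc, State) ->
                  State
          end,
    {ok, State, _Rest} = xmerl_sax_parser:stream(Xml, [{event_fun, Fun},
                                                       {event_state, none}]),
    State.
```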

           {endElement, Uri, LocalName, QualifiedName}:
             Receive notification of the end of an element. The SAX parser will send this event
             at the end of every element in the XML document; there will be a corresponding
             startElement event for every endElement event (even when the element is empty).

             * Uri = string()

             * LocalName = string()

             * QualifiedName = {Prefix, LocalName}

             * Prefix = string()
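             Because every startElement has a matching endElement, an event fun can keep a stack
             of open elements. This sketch, with the hypothetical module name sax_paths, pairs
             each characters event with the slash-separated path of its enclosing element.

```erlang
-module(sax_paths).
-export([text_paths/1]).

%% The event state is {Stack, Acc}: Stack holds the local names of the
%% currently open elements (innermost first), Acc the collected pairs.
text_paths(Xml) ->
    Fun = fun({startElement, _U, Local, _Q, _A}, _L, {Stack, Acc}) ->
                  {[Local | Stack], Acc};
             ({endElement, _U, _Local, _Q}, _L, {[_ | Stack], Acc}) ->
                  {Stack, Acc};
             ({characters, Text}, _L, {Stack, Acc}) ->
                  Path = string:join(lists:reverse(Stack), "/"),
                  {Stack, [{Path, Text} | Acc]};
             (_Other, _L, State) ->
                  State
          end,
    {ok, {[], Acc}, _Rest} = xmerl_sax_parser:stream(Xml, [{event_fun, Fun},
                                                           {event_state, {[], []}}]),
    lists:reverse(Acc).
```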

           {characters, string()}:
              Receive notification of character data.

           {ignorableWhitespace, string()}:
              Receive notification of ignorable whitespace in element content.

           {processingInstruction, Target, Data}:
             Receive notification of a processing instruction. The parser sends this event once
             for each processing instruction found; note that processing instructions may occur
             before or after the main document element.

             * Target = string()

             * Data = string()

           {comment, string()}:
             Report an XML comment anywhere in the document (both inside and outside of the
             document element).
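             A sketch that collects processing instructions and comments, assuming the
             {processingInstruction, Target, Data} and {comment, Text} shapes above. The module
             name sax_misc is hypothetical.

```erlang
-module(sax_misc).
-export([pis_and_comments/1]).

%% Collect processing instructions and comments in document order.
pis_and_comments(Xml) ->
    Fun = fun({processingInstruction, Target, Data}, _L, Acc) -> [{pi, Target, Data} | Acc];
             ({comment, Text}, _L, Acc) -> [{comment, Text} | Acc];
             (_Other, _L, Acc) -> Acc
          end,
    {ok, Acc, _Rest} = xmerl_sax_parser:stream(Xml, [{event_fun, Fun},
                                                     {event_state, []}]),
    lists:reverse(Acc).
```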

           startCDATA:
             Report the start of a CDATA section. The contents of the CDATA section will be
             reported through the regular characters event.

           endCDATA:
              Report the end of a CDATA section.
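             Since CDATA content arrives as ordinary characters events, an event fun that needs
             to distinguish it can track whether it is between startCDATA and endCDATA. A sketch,
             with the hypothetical module name sax_cdata:

```erlang
-module(sax_cdata).
-export([cdata_text/1]).

%% Return only the text found inside CDATA sections. The event state is
%% {in | out, Chunks}, where Chunks collects text in reverse order.
cdata_text(Xml) ->
    Fun = fun(startCDATA, _L, {_, Acc}) -> {in, Acc};
             (endCDATA, _L, {_, Acc}) -> {out, Acc};
             ({characters, Text}, _L, {in, Acc}) -> {in, [Text | Acc]};
             (_Other, _L, State) -> State
          end,
    {ok, {out, Acc}, _Rest} = xmerl_sax_parser:stream(Xml, [{event_fun, Fun},
                                                            {event_state, {out, []}}]),
    lists:append(lists:reverse(Acc)).
```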

           {startDTD, Name, PublicId, SystemId}:
             Report the start of DTD declarations, that is, the start of the DOCTYPE
             declaration. If the document has no DOCTYPE declaration, this event is not sent.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

           endDTD:
             Report the end of DTD declarations, that is, the end of the DOCTYPE declaration.
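             A sketch that reads the document type name from startDTD and falls back to
             undefined when no such event arrives (that is, when there is no DOCTYPE). The module
             name sax_dtd is hypothetical; skip_external_dtd is passed so that an external DTD
             reference is not followed.

```erlang
-module(sax_dtd).
-export([doctype_name/1]).

%% Return the DOCTYPE name, or undefined if the document has none.
doctype_name(Xml) ->
    Fun = fun({startDTD, Name, _PublicId, _SystemId}, _L, _State) -> Name;
             (_Other, _L, State) -> State
          end,
    {ok, Name, _Rest} = xmerl_sax_parser:stream(Xml, [{event_fun, Fun},
                                                      {event_state, undefined},
                                                      skip_external_dtd]),
    Name.
```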

           {startEntity, SysId}:
             Report the beginning of some internal and external XML entities.

           {endEntity, SysId}:
             Report the end of an entity.

           {elementDecl, Name, Model}:
             Report an element type declaration. The content model will consist of the string
             "EMPTY", the string "ANY", or a parenthesised group, optionally followed by an
             occurrence indicator. The model will be normalized so that all parameter entities
             are fully resolved and all whitespace is removed, and will include the enclosing
             parentheses. Other normalization (such as removing redundant parentheses or
             simplifying occurrence indicators) is at the discretion of the parser.

             * Name = string()

             * Model = string()

           {attributeDecl, ElementName, AttributeName, Type, Mode, Value}:
              Report an attribute type declaration.

             * ElementName = string()

             * AttributeName = string()

             * Type = string()

             * Mode = string()

             * Value = string()

           {internalEntityDecl, Name, Value}:
              Report an internal entity declaration.

             * Name = string()

             * Value = string()

           {externalEntityDecl, Name, PublicId, SystemId}:
              Report a parsed external entity declaration.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

           {unparsedEntityDecl, Name, PublicId, SystemId, Ndata}:
              Receive notification of an unparsed entity declaration event.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

             * Ndata = string()

           {notationDecl, Name, PublicId, SystemId}:
              Receive notification of a notation declaration event.

             * Name = string()

             * PublicId = string()

             * SystemId = string()
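             A sketch that gathers two kinds of declaration events, elementDecl and
             internalEntityDecl, from a document with an internal DTD subset. The module name
             sax_decls is hypothetical, and the exact Model and Value strings are not relied on
             here.

```erlang
-module(sax_decls).
-export([declarations/1]).

%% Collect element type and internal entity declarations in the order
%% the parser reports them while reading the DTD.
declarations(Xml) ->
    Fun = fun({elementDecl, Name, Model}, _L, Acc) -> [{element, Name, Model} | Acc];
             ({internalEntityDecl, Name, Value}, _L, Acc) -> [{entity, Name, Value} | Acc];
             (_Other, _L, Acc) -> Acc
          end,
    {ok, Acc, _Rest} = xmerl_sax_parser:stream(Xml, [{event_fun, Fun},
                                                     {event_state, []},
                                                     skip_external_dtd]),
    lists:reverse(Acc).
```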

         unicode_char():
            Integer representing a valid Unicode code point.

         unicode_binary():
            Binary with characters encoded in UTF-8 or UTF-16.

         latin1_binary():
            Binary with characters encoded in ISO Latin-1.

EXPORTS

       file(Filename, Options) -> Result

              Types:

                 Filename = string()
                 Options = [option()]
                 Result = {ok, EventState, Rest} |
                  {Tag, Location, Reason, EndTags, EventState}
                 Rest = unicode_binary() | latin1_binary()
                 Tag = atom() (fatal_error or a user-defined tag)
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 EventState = term()
                 Reason = term()

              Parse a file containing an XML document. This function uses a default continuation
              function to read the file in blocks.
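              A sketch of file/2 that counts elements. Because file/2 supplies its own
              continuation function, no continuation_fun option is passed. The module name
              sax_file is hypothetical, and the final catch-all clause is an assumption added to
              cope with any result shape not listed above (for example, if the file cannot be
              opened).

```erlang
-module(sax_file).
-export([count_in_file/1]).

%% Count the elements in the XML file Filename (a string).
count_in_file(Filename) ->
    Count = fun({startElement, _, _, _, _}, _Loc, N) -> N + 1;
               (_Other, _Loc, N) -> N
            end,
    case xmerl_sax_parser:file(Filename, [{event_fun, Count}, {event_state, 0}]) of
        {ok, N, _Rest} ->
            {ok, N};
        {Tag, Location, Reason, _EndTags, _State} ->
            {error, {Tag, Location, Reason}};
        Other ->
            {error, Other}
    end.
```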

       stream(Xml, Options) -> Result

              Types:

                 Xml = unicode_binary() | latin1_binary() | [unicode_char()]
                 Options = [option()]
                 Result = {ok, EventState, Rest} |
                  {Tag, Location, Reason, EndTags, EventState}
                 Rest = unicode_binary() | latin1_binary() | [unicode_char()]
                 Tag = atom() (fatal_error or a user-defined tag)
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 EventState = term()
                 Reason = term()

              Parse a stream containing an XML document.
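              A sketch of handling both result shapes of stream/2: {ok, EventState, Rest} on
              success, and a five-tuple whose first element is the tag (fatal_error for parser
              errors) and whose second is {CurrentLocation, EntityName, LineNo}. The module name
              sax_errors is hypothetical.

```erlang
-module(sax_errors).
-export([check/1]).

%% Return ok for a well-formed document, otherwise {error, Tag, LineNo, Reason}.
check(Xml) ->
    Ignore = fun(_Event, _Loc, State) -> State end,
    case xmerl_sax_parser:stream(Xml, [{event_fun, Ignore}, {event_state, ok}]) of
        {ok, _State, _Rest} ->
            ok;
        {Tag, {_CurrentLocation, _EntityName, LineNo}, Reason, _EndTags, _State} ->
            {error, Tag, LineNo, Reason}
    end.
```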

CALLBACK FUNCTIONS

       The callback interface is based on the user passing a fun with the correct signature to
       the parser.

EXPORTS

       ContinuationFun(State) -> {NewBytes, NewState}

              Types:

                 State = NewState = term()
                 NewBytes = binary() | list() (should be the same type as the initial input to stream/2)

              This function is called whenever the parser runs out of input data. If the function
              cannot obtain more input, it returns an empty list or an empty binary (matching the
              type of the initial input to stream/2). Other errors are handled through
              exceptions: if the continuation function encounters a fatal error, it should call
              throw/1 with the tuple {Tag = atom(), Reason = string()}. Tag is an atom that
              identifies the functional entity that raises the exception, and Reason is a string
              that describes the problem.
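              A sketch of a continuation function that feeds a document to stream/2 in chunks.
              The first chunk is the initial input; the remaining chunks are kept in the
              continuation state and handed out one at a time, and <<>> signals that no more
              input exists. The module name sax_chunks and the chunk list are hypothetical
              stand-ins for, say, data arriving from a socket.

```erlang
-module(sax_chunks).
-export([count_chunked/1]).

%% Count the elements of a document given as a non-empty list of binaries.
count_chunked([First | Rest]) ->
    %% Continuation state: the chunks not yet delivered to the parser.
    Cont = fun([]) -> {<<>>, []};
              ([Chunk | More]) -> {Chunk, More}
           end,
    Count = fun({startElement, _, _, _, _}, _Loc, N) -> N + 1;
               (_Other, _Loc, N) -> N
            end,
    {ok, N, _Rest} = xmerl_sax_parser:stream(First, [{continuation_fun, Cont},
                                                     {continuation_state, Rest},
                                                     {event_fun, Count},
                                                     {event_state, 0}]),
    N.
```

              Note that chunk boundaries may fall in the middle of a tag, as in the test input
              below; the parser asks for more data whenever it needs it.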

       EventFun(Event, Location, State) -> NewState

              Types:

                 Event = event()
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 State = NewState = term()

              This function is called for every event sent by the parser. Errors are handled
              through exceptions: if the application encounters a fatal error, it should call
              throw/1 with the tuple {Tag = atom(), Reason = string()}. Tag is an atom that
              identifies the functional entity that raises the exception, and Reason is a string
              that describes the problem.
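              A sketch of aborting a parse from the event fun. When elements nest deeper than a
              limit, the fun throws {too_deep, Reason}; stream/2 then returns a five-tuple whose
              first element is that user-defined tag. The module name sax_limit and the tag
              too_deep are hypothetical choices for this example.

```erlang
-module(sax_limit).
-export([parse_with_depth_limit/2]).

%% Return ok, or {error, Tag, Reason} if nesting exceeds Max levels.
%% The event state is the current element depth.
parse_with_depth_limit(Xml, Max) ->
    Fun = fun({startElement, _, _, _, _}, _Loc, Depth) when Depth >= Max ->
                  throw({too_deep, "element nesting exceeds limit"});
             ({startElement, _, _, _, _}, _Loc, Depth) -> Depth + 1;
             ({endElement, _, _, _}, _Loc, Depth) -> Depth - 1;
             (_Other, _Loc, Depth) -> Depth
          end,
    case xmerl_sax_parser:stream(Xml, [{event_fun, Fun}, {event_state, 0}]) of
        {ok, _Depth, _Rest} -> ok;
        {Tag, _Location, Reason, _EndTags, _State} -> {error, Tag, Reason}
    end.
```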