Provided by: erlang-manpages_25.3.2.8+dfsg-1ubuntu4_all

NAME

       xmerl_sax_parser - XML SAX parser API

DESCRIPTION

       A SAX parser for XML that delivers parse events through a callback interface. SAX is the
       Simple API for XML, originally a Java-only API. SAX was the first widely adopted API for
       XML in Java and has become a de facto standard, with versions for several programming
       language environments other than Java.

DATA TYPES

         option():
           Options used to customize the behaviour of the parser. Possible options are:

           {continuation_fun, ContinuationFun}:
             ContinuationFun is a callback function that decides what to do if the parser runs
             into EOF before the document is complete.

           {continuation_state, term()}:
             State that is accessible in the continuation callback function.

           {event_fun, EventFun}:
             EventFun is the callback function for parser events.

           {event_state, term()}:
             State that is accessible in the event callback function.

           {file_type, FileType}:
             Flag that tells the parser whether it is parsing a DTD or a normal XML file
             (default normal).

             * FileType = normal | dtd

           {encoding, Encoding}:
             Sets the default character set (default UTF-8). This character set is used only if
             none is explicitly given by the XML document.

             * Encoding = utf8 | {utf16,big} | {utf16,little} | latin1 | list

           skip_external_dtd:
              Skips   the   external   DTD   during   parsing.   This   option  is  the  same  as
             {external_entities, none} and {fail_undeclared_ref, false} but just for the DTD.

           disallow_entities:
              Implies that parsing fails if an ENTITY declaration is found.

           {entity_recurse_limit, N}:
             Sets how many levels of recursion are allowed for entities. Default is 3 levels.

           {external_entities, AllowedType}:
             Sets which types of external entities are allowed. An external entity of a type
             that is not allowed is skipped.

             * AllowedType = all | file | none

           {fail_undeclared_ref, Boolean}:
             Decides how the parser should behave when an undeclared reference is found. Can be
             useful if external entities have been turned off so that an external DTD is not
             parsed. Default is true.
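
           The options above are passed together in one list. A minimal sketch, assuming
           untrusted input for which any ENTITY declaration should make parsing fail. The
           module name sax_options_example and the function parse_untrusted/1 are
           hypothetical names chosen for illustration, not part of xmerl:

```erlang
-module(sax_options_example).
-export([parse_untrusted/1]).

%% Parse Xml with entity declarations and external entities disabled.
%% The event fun only counts events; a real application would inspect them.
parse_untrusted(Xml) when is_binary(Xml) ->
    EventFun = fun(_Event, _Location, Count) -> Count + 1 end,
    Options = [disallow_entities,
               {external_entities, none},
               {event_fun, EventFun},
               {event_state, 0}],
    case xmerl_sax_parser:stream(Xml, Options) of
        {ok, EventCount, _Rest} ->
            {ok, EventCount};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            {error, Tag, Reason}
    end.
```

           With these options, a document that declares an entity in its DOCTYPE is expected
           to come back as an error tuple tagged fatal_error rather than {ok, ...}.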

         event():
           The SAX events that are sent to the user via the callback.

           startDocument:
              Receive notification of the beginning of a document. The SAX parser will send  this
             event only once before any other event callbacks.

           endDocument:
              Receive  notification of the end of a document. The SAX parser will send this event
             only once, and it will be the last event during the parse.

           {startPrefixMapping, Prefix, Uri}:
              Begin the scope of a prefix-URI Namespace mapping. Note that start/endPrefixMapping
             events  are  not  guaranteed  to  be  properly  nested  relative  to each other: all
             startPrefixMapping  events  will  occur   immediately   before   the   corresponding
             startElement event, and all endPrefixMapping events will occur immediately after the
             corresponding endElement event, but their order is not otherwise  guaranteed.  There
             will  not  be  start/endPrefixMapping  events  for  the  "xml"  prefix,  since it is
             predeclared and immutable.

             * Prefix = string()

             * Uri = string()

           {endPrefixMapping, Prefix}:
              End the scope of a prefix-URI mapping.

             * Prefix = string()

           {startElement, Uri, LocalName, QualifiedName, Attributes}:
             Receive notification of the beginning of an element. The parser will send this
             event at the beginning of every element in the XML document; there will be a
             corresponding endElement event for every startElement event (even when the element
             is empty). All of the element's content will be reported, in order, before the
             corresponding endElement event.

             * Uri = string()

             * LocalName = string()

             * QualifiedName = {Prefix, LocalName}

             * Prefix = string()

             * Attributes = [{Uri, Prefix, AttributeName, Value}]

             * AttributeName = string()

             * Value = string()

           {endElement, Uri, LocalName, QualifiedName}:
              Receive notification of the end of an element. The SAX parser will send this  event
             at  the  end  of  every  element  in the XML document; there will be a corresponding
             startElement event for every endElement event (even when the element is empty).

             * Uri = string()

             * LocalName = string()

             * QualifiedName = {Prefix, LocalName}

             * Prefix = string()

           {characters, string()}:
              Receive notification of character data.

           {ignorableWhitespace, string()}:
              Receive notification of ignorable whitespace in element content.

           {processingInstruction, Target, Data}:
             Receive notification of a processing instruction. The parser will send this event
             once for each processing instruction found. Note that processing instructions may
             occur before or after the main document element.

             * Target = string()

             * Data = string()

           {comment, string()}:
              Report an XML comment anywhere in the document (both  inside  and  outside  of  the
             document element).

           startCDATA:
              Report  the  start  of  a  CDATA section. The contents of the CDATA section will be
             reported through the regular characters event.

           endCDATA:
              Report the end of a CDATA section.

           {startDTD, Name, PublicId, SystemId}:
             Report the start of DTD declarations, that is, the start of the DOCTYPE
             declaration. If the document has no DOCTYPE declaration, this event will not be
             sent.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

           endDTD:
             Report the end of DTD declarations, that is, the end of the DOCTYPE declaration.

           {startEntity, SysId}:
             Report the beginning of some internal and external XML entities.

           {endEntity, SysId}:
             Report the end of an entity.

           {elementDecl, Name, Model}:
             Report an element type declaration. The content model will consist of the string
             "EMPTY", the string "ANY", or a parenthesised group, optionally followed by an
             occurrence indicator. The model will be normalized so that all parameter entities
             are fully resolved and all whitespace is removed, and will include the enclosing
             parentheses. Other normalization (such as removing redundant parentheses or
             simplifying occurrence indicators) is at the discretion of the parser.

             * Name = string()

             * Model = string()

           {attributeDecl, ElementName, AttributeName, Type, Mode, Value}:
              Report an attribute type declaration.

             * ElementName = string()

             * AttributeName = string()

             * Type = string()

             * Mode = string()

             * Value = string()

           {internalEntityDecl, Name, Value}:
              Report an internal entity declaration.

             * Name = string()

             * Value = string()

           {externalEntityDecl, Name, PublicId, SystemId}:
              Report a parsed external entity declaration.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

           {unparsedEntityDecl, Name, PublicId, SystemId, Ndata}:
              Receive notification of an unparsed entity declaration event.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

             * Ndata = string()

           {notationDecl, Name, PublicId, SystemId}:
              Receive notification of a notation declaration event.

             * Name = string()

             * PublicId = string()

             * SystemId = string()
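
           The event tuples above are matched directly in the event callback. A minimal
           sketch that pulls element and attribute names out of startElement events and
           ignores everything else. The module sax_events_example and the function
           attributes/1 are hypothetical names, not part of xmerl:

```erlang
-module(sax_events_example).
-export([attributes/1]).

%% Return {ok, [{LocalName, [{AttributeName, Value}]}]} for every element,
%% in document order.
attributes(Xml) when is_binary(Xml) ->
    EventFun =
        fun({startElement, _Uri, LocalName, _QualifiedName, Attributes}, _Loc, Acc) ->
                %% Each attribute is a 4-tuple {Uri, Prefix, AttributeName, Value}.
                Pairs = [{Name, Value} || {_AttrUri, _Prefix, Name, Value} <- Attributes],
                [{LocalName, Pairs} | Acc];
           (_OtherEvent, _Loc, Acc) ->
                %% startDocument, endElement, characters and so on are skipped.
                Acc
        end,
    case xmerl_sax_parser:stream(Xml, [{event_fun, EventFun}, {event_state, []}]) of
        {ok, Elements, _Rest} ->
            {ok, lists:reverse(Elements)};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            {error, Tag, Reason}
    end.
```

           The catch-all clause matters: the callback receives every event type, so a fun
           that only matches startElement would crash on the first startDocument event.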

         unicode_char():
            Integer representing a valid Unicode code point.

         unicode_binary():
            Binary with characters encoded in UTF-8 or UTF-16.

         latin1_binary():
            Binary with characters encoded in ISO Latin-1 (ISO 8859-1).

EXPORTS

       file(Filename, Options) -> Result

              Types:

                 Filename = string()
                 Options = [option()]
                 Result = {ok, EventState, Rest} |
                  {Tag, Location, Reason, EndTags, EventState}
                 Rest = unicode_binary() | latin1_binary()
                 Tag = atom() (fatal_error or user-defined tag)
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 EventState = term()
                 Reason = term()

              Parse a file containing an XML document. This function uses a default continuation
              function to read the file in blocks.
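
              A minimal sketch of calling file/2, assuming the goal is simply to count the
              elements in a document on disk. The module sax_file_example and the function
              count_elements/1 are hypothetical names, not part of xmerl:

```erlang
-module(sax_file_example).
-export([count_elements/1]).

%% Count the elements in the XML file Filename.
count_elements(Filename) ->
    EventFun = fun({startElement, _Uri, _LocalName, _QName, _Attrs}, _Loc, Count) ->
                       Count + 1;
                  (_Event, _Loc, Count) ->
                       Count
               end,
    Options = [{event_fun, EventFun}, {event_state, 0}],
    case xmerl_sax_parser:file(Filename, Options) of
        {ok, Count, _Rest} ->
            {ok, Count};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Reason}};
        Other ->
            %% Defensive clause for any result shape not listed under Types,
            %% for example when the file cannot be opened.
            {error, Other}
    end.
```

              Because file/2 supplies its own continuation function, no continuation_fun
              option is passed here.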

       stream(Xml, Options) -> Result

              Types:

                 Xml = unicode_binary() | latin1_binary() | [unicode_char()]
                 Options = [option()]
                 Result = {ok, EventState, Rest} |
                  {Tag, Location, Reason, EndTags, EventState}
                 Rest = unicode_binary() | latin1_binary() | [unicode_char()]
                 Tag = atom() (fatal_error or user-defined tag)
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 EventState = term()
                 Reason = term()

              Parse a stream containing an XML document.
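
              A minimal sketch of calling stream/2 on a binary that holds a whole document,
              collecting the local name of each element. The module sax_stream_example and
              the function element_names/1 are hypothetical names, not part of xmerl:

```erlang
-module(sax_stream_example).
-export([element_names/1]).

%% Collect the local names of all elements in document order.
element_names(Xml) when is_binary(Xml) ->
    EventFun = fun({startElement, _Uri, LocalName, _QName, _Attrs}, _Loc, Acc) ->
                       [LocalName | Acc];
                  (_Event, _Loc, Acc) ->
                       Acc
               end,
    Options = [{event_fun, EventFun}, {event_state, []}],
    case xmerl_sax_parser:stream(Xml, Options) of
        {ok, Names, _Rest} ->
            %% Names were accumulated in reverse order.
            {ok, lists:reverse(Names)};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Reason}}
    end.
```

              The value given as event_state is threaded through every callback invocation
              and comes back as EventState in the result tuple.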

CALLBACK FUNCTIONS

       The callback interface works by the user passing a fun with the correct signature to the
       parser.

EXPORTS

       Module:ContinuationFun(State) -> {NewBytes, NewState}

              Types:

                 State = NewState = term()
                 NewBytes = binary() | list() (should be the same type as the initial input to stream/2)

              This function is called whenever the parser runs out of input data. If the function
              cannot obtain more input, it returns an empty list or binary (matching the type of
              the initial input to stream/2). Other types of errors are handled through
              exceptions. Use throw/1 to send the tuple {Tag = atom(), Reason = string()} if the
              continuation function encounters a fatal error. Tag is an atom that identifies the
              functional entity that sends the exception and Reason is a string that describes
              the problem.
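
              A minimal sketch of a continuation function, assuming the document arrives as
              a list of binary chunks. The continuation state is the list of chunks not yet
              handed to the parser. The module sax_continuation_example and the function
              parse_chunks/1 are hypothetical names, not part of xmerl:

```erlang
-module(sax_continuation_example).
-export([parse_chunks/1]).

%% Parse an XML document supplied as a non-empty list of binary chunks
%% and return the concatenated character data as the event state.
parse_chunks([First | Remaining]) ->
    %% Hand out one chunk per call; return an empty binary when none are
    %% left, since the initial input to stream/2 is a binary.
    ContinuationFun = fun([Chunk | More]) -> {Chunk, More};
                         ([]) -> {<<>>, []}
                      end,
    EventFun = fun({characters, Text}, _Loc, Acc) -> Acc ++ Text;
                  (_Event, _Loc, Acc) -> Acc
               end,
    xmerl_sax_parser:stream(First,
                            [{continuation_fun, ContinuationFun},
                             {continuation_state, Remaining},
                             {event_fun, EventFun},
                             {event_state, ""}]).
```

              The same shape applies when the chunks come from a socket or a port: the
              continuation state would then hold the handle instead of a list.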

       Module:EventFun(Event, Location, State) -> NewState

              Types:

                 Event = event()
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 State = NewState = term()

              This function is called for every event sent by the parser. Errors are handled
              through exceptions. Use throw/1 to send the tuple {Tag = atom(), Reason = string()}
              if the application encounters a fatal error. Tag is an atom that identifies the
              functional entity that sends the exception and Reason is a string that describes
              the problem.
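
              A minimal sketch of reporting an application error from the event function
              with throw/1, assuming a hypothetical rule that script elements are not
              allowed. The module sax_event_error_example, the function validate/1 and the
              tag forbidden_element are illustrative names, not part of xmerl:

```erlang
-module(sax_event_error_example).
-export([validate/1]).

%% Count elements, aborting the parse if a "script" element is seen.
validate(Xml) when is_binary(Xml) ->
    EventFun =
        fun({startElement, _Uri, "script", _QName, _Attrs}, _Loc, _Count) ->
                %% {Tag, Reason} as described above; Tag identifies the sender.
                throw({forbidden_element, "script elements are not allowed"});
           ({startElement, _Uri, _LocalName, _QName, _Attrs}, _Loc, Count) ->
                Count + 1;
           (_Event, _Loc, Count) ->
                Count
        end,
    case xmerl_sax_parser:stream(Xml, [{event_fun, EventFun}, {event_state, 0}]) of
        {ok, Count, _Rest} ->
            {ok, Count};
        {Tag, _Location, Reason, _EndTags, _EventState} ->
            %% Tag is fatal_error for parser errors, or the thrown tag.
            {error, Tag, Reason}
    end.
```

              The thrown Tag and Reason come back in the first and third positions of the
              five-element result tuple of stream/2, so the caller can tell application
              errors apart from fatal_error results produced by the parser itself.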