Provided by: erlang-manpages_25.3.2.8+dfsg-1ubuntu4_all

NAME

       xmerl_sax_parser - XML SAX parser API

DESCRIPTION

       A SAX parser for XML that sends parse events through a callback interface. SAX is the Simple API for
       XML, originally a Java-only API. SAX was the first widely adopted API for XML in Java, and it is a de
       facto standard with versions for several programming language environments other than Java.

DATA TYPES

         option():
           Options used to customize the behaviour of the parser. Possible options are:

           {continuation_fun, ContinuationFun}:
             ContinuationFun is a callback function that decides what to do if the parser runs into EOF before
             the document is complete.

           {continuation_state, term()}:
             State that is accessible in the continuation callback function.

           {event_fun, EventFun}:
             EventFun is the callback function for parser events.

           {event_state, term()}:
             State that is accessible in the event callback function.

           {file_type, FileType}:
              Flag that tells the parser if it's parsing a DTD or a normal XML file (default normal).

             * FileType = normal | dtd

           {encoding, Encoding}:
             Sets the default character set (default UTF-8). This character set is used only if the XML
             document does not specify one explicitly.

             * Encoding = utf8 | {utf16,big} | {utf16,little} | latin1 | list

           skip_external_dtd:
              Skips  the  external  DTD during parsing. This option is the same as {external_entities, none} and
             {fail_undeclared_ref, false} but just for the DTD.

           disallow_entities:
             Makes parsing fail if an ENTITY declaration is found.

           {entity_recurse_limit, N}:
             Sets how many levels of recursion are allowed for entities. The default is 3 levels.

           {external_entities, AllowedType}:
             Sets which types of external entities are allowed. An entity of a type that is not allowed is
             skipped.

             * AllowedType = all | file | none

           {fail_undeclared_ref, Boolean}:
             Decides how the parser behaves when an undeclared reference is found. This can be useful if
             external entities have been turned off so that an external DTD is not parsed. Default is true.
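
         As a rough illustration of how these options combine, the sketch below builds a restrictive option
         list and passes it to stream/2. The module name sax_opts and its function names are hypothetical, and
         the chosen combination of options is only one possibility, not a recommendation made by this manual.

```erlang
-module(sax_opts).
-export([strict_options/0, parse/1]).

%% A restrictive option list (an illustrative choice, not a prescribed one):
%% UTF-8 as the fallback encoding, no external entities, and a failed parse
%% as soon as an ENTITY declaration is seen.
strict_options() ->
    [{encoding, utf8},
     {external_entities, none},
     disallow_entities].

%% Parse a binary or a list of code points with the options above.
%% No event_fun is given, so the parser's default event handling is used.
parse(Xml) ->
    xmerl_sax_parser:stream(Xml, strict_options()).
```

         Because the options are plain list elements, further tuples such as {event_fun, EventFun} can be
         prepended to the list returned by strict_options/0 before it is passed to the parser.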

         event():
           The SAX events that are sent to the user via the callback.

           startDocument:
              Receive notification of the beginning of a document. The SAX parser will send this event only once
             before any other event callbacks.

           endDocument:
              Receive notification of the end of a document. The SAX parser will send this event only once,  and
             it will be the last event during the parse.

           {startPrefixMapping, Prefix, Uri}:
              Begin the scope of a prefix-URI Namespace mapping. Note that start/endPrefixMapping events are not
             guaranteed to be properly nested relative to each other: all startPrefixMapping events  will  occur
             immediately before the corresponding startElement event, and all endPrefixMapping events will occur
             immediately after the corresponding endElement event, but their order is not otherwise  guaranteed.
             There  will  not be start/endPrefixMapping events for the "xml" prefix, since it is predeclared and
             immutable.

             * Prefix = string()

             * Uri = string()

           {endPrefixMapping, Prefix}:
              End the scope of a prefix-URI mapping.

             * Prefix = string()

           {startElement, Uri, LocalName, QualifiedName, Attributes}:
              Receive notification of the beginning of an element. The  Parser  will  send  this  event  at  the
             beginning  of every element in the XML document; there will be a corresponding endElement event for
             every startElement event (even when the element is empty). All of the  element's  content  will  be
             reported, in order, before the corresponding endElement event.

             * Uri = string()

             * LocalName = string()

             * QualifiedName = {Prefix, LocalName}

             * Prefix = string()

             * Attributes = [{Uri, Prefix, AttributeName, Value}]

             * AttributeName = string()

             * Value = string()

           {endElement, Uri, LocalName, QualifiedName}:
              Receive  notification  of the end of an element. The SAX parser will send this event at the end of
             every element in the XML document; there will be  a  corresponding  startElement  event  for  every
             endElement event (even when the element is empty).

             * Uri = string()

             * LocalName = string()

             * QualifiedName = {Prefix, LocalName}

             * Prefix = string()

           {characters, string()}:
              Receive notification of character data.

           {ignorableWhitespace, string()}:
              Receive notification of ignorable whitespace in element content.

           {processingInstruction, Target, Data}:
              Receive  notification  of  a processing instruction. The Parser will send this event once for each
             processing instruction found: note that processing instructions may occur before or after the  main
             document element.

             * Target = string()

             * Data = string()

           {comment, string()}:
              Report an XML comment anywhere in the document (both inside and outside of the document element).

           startCDATA:
              Report  the  start  of a CDATA section. The contents of the CDATA section will be reported through
             the regular characters event.

           endCDATA:
              Report the end of a CDATA section.

           {startDTD, Name, PublicId, SystemId}:
             Report the start of the DTD declarations, that is, the start of the DOCTYPE declaration. If the
             document has no DOCTYPE declaration, this event will not be sent.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

           endDTD:
             Report the end of the DTD declarations, that is, the end of the DOCTYPE declaration.

           {startEntity, SysId}:
             Report the beginning of some internal and external XML entities.

           {endEntity, SysId}:
             Report the end of an entity.

           {elementDecl, Name, Model}:
              Report  an  element  type  declaration.  The content model will consist of the string "EMPTY", the
             string "ANY", or a parenthesised group, optionally followed by an occurrence indicator.  The  model
             will  be  normalized  so  that  all  parameter  entities  are  fully resolved and all whitespace is
             removed, and will include the enclosing parentheses. Other normalization (such as removing redundant
             parentheses or simplifying occurrence indicators) is at the discretion of the parser.

             * Name = string()

             * Model = string()

           {attributeDecl, ElementName, AttributeName, Type, Mode, Value}:
              Report an attribute type declaration.

             * ElementName = string()

             * AttributeName = string()

             * Type = string()

             * Mode = string()

             * Value = string()

           {internalEntityDecl, Name, Value}:
              Report an internal entity declaration.

             * Name = string()

             * Value = string()

           {externalEntityDecl, Name, PublicId, SystemId}:
              Report a parsed external entity declaration.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

           {unparsedEntityDecl, Name, PublicId, SystemId, Ndata}:
              Receive notification of an unparsed entity declaration event.

             * Name = string()

             * PublicId = string()

             * SystemId = string()

             * Ndata = string()

           {notationDecl, Name, PublicId, SystemId}:
              Receive notification of a notation declaration event.

             * Name = string()

             * PublicId = string()

             * SystemId = string()
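
         The event tuples above are typically handled by pattern matching in the event fun. The following is a
         minimal sketch, assuming a hypothetical module sax_names, that reacts only to startElement events and
         passes every other event through unchanged.

```erlang
-module(sax_names).
-export([element_names/1]).

%% Return the local names of all elements in Xml, in document order.
element_names(Xml) ->
    %% Only startElement events are of interest here; every other event
    %% leaves the accumulator unchanged.
    EventFun = fun({startElement, _Uri, LocalName, _QName, _Attrs}, _Loc, Acc) ->
                       [LocalName | Acc];
                  (_Event, _Loc, Acc) ->
                       Acc
               end,
    {ok, Names, _Rest} =
        xmerl_sax_parser:stream(Xml, [{event_fun, EventFun},
                                      {event_state, []}]),
    %% Names were accumulated in reverse order.
    lists:reverse(Names).
```

         The catch-all clause matters: the parser sends every event type to the same fun, so a fun without it
         would fail on the first event it does not match.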

         unicode_char():
            Integer representing a valid Unicode code point.

         unicode_binary():
            Binary with characters encoded in UTF-8 or UTF-16.

         latin1_binary():
            Binary with characters encoded in iso-latin-1.

EXPORTS

       file(Filename, Options) -> Result

              Types:

                 Filename = string()
                 Options = [option()]
                 Result = {ok, EventState, Rest} |
                  {Tag, Location, Reason, EndTags, EventState}
                 Rest = unicode_binary() | latin1_binary()
                 Tag = atom() (fatal_error, or user defined tag)
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 EventState = term()
                 Reason = term()

              Parses a file containing an XML document. This function uses a default continuation function to
              read the file in blocks.
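
              A call to file/2 could look like the sketch below, which counts the elements in a file. The
              module name sax_file is hypothetical, and the final catch-all clause is a defensive assumption
              for any return value not listed under Result above (for example, if the file cannot be opened).

```erlang
-module(sax_file).
-export([count_elements/1]).

%% Count the elements in the XML file Filename.
count_elements(Filename) ->
    EventFun = fun({startElement, _Uri, _Name, _QName, _Attrs}, _Loc, N) ->
                       N + 1;
                  (_Event, _Loc, N) ->
                       N
               end,
    Options = [{event_fun, EventFun}, {event_state, 0}],
    case xmerl_sax_parser:file(Filename, Options) of
        {ok, Count, _Rest} ->
            {ok, Count};
        {Tag, Location, Reason, _EndTags, _EventState} ->
            {error, {Tag, Location, Reason}};
        Other ->
            %% Defensive: any result shape not documented above.
            {error, Other}
    end.
```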

       stream(Xml, Options) -> Result

              Types:

                 Xml = unicode_binary() | latin1_binary() | [unicode_char()]
                 Options = [option()]
                 Result = {ok, EventState, Rest} |
                  {Tag, Location, Reason, EndTags, EventState}
                 Rest = unicode_binary() | latin1_binary() | [unicode_char()]
                 Tag = atom() (fatal_error or user defined tag)
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 EventState = term()
                 Reason = term()

              Parse a stream containing an XML document.
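
              Both shapes of Result can be told apart by their arity. The sketch below, with the hypothetical
              module name sax_check, uses stream/2 with no options as a well-formedness check and extracts
              the line number from Location on failure.

```erlang
-module(sax_check).
-export([check/1]).

%% Return ok if Xml parses, otherwise {error, Tag, LineNo, Reason}.
check(Xml) ->
    case xmerl_sax_parser:stream(Xml, []) of
        {ok, _EventState, _Rest} ->
            ok;
        {Tag, {_CurrentLocation, _EntityName, LineNo}, Reason, _EndTags, _EventState} ->
            {error, Tag, LineNo, Reason}
    end.
```

              Xml can be a binary or a list of code points, as given by the Xml type above.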

CALLBACK FUNCTIONS

       The callback interface works by the user passing a fun with the correct signature to the parser.

EXPORTS

       Module:ContinuationFun(State) -> {NewBytes, NewState}

              Types:

                 State = NewState = term()
                 NewBytes = binary() | list() (should be same as start input in stream/2)

              This function is called whenever the parser runs out of input data. If the function cannot obtain
              more input, it returns an empty list or an empty binary as NewBytes (matching the type of the
              start input given to stream/2). Other types of errors are handled through exceptions. Use throw/1
              to send the tuple {Tag = atom(), Reason = string()} if the continuation function encounters a
              fatal error. Tag is an atom that identifies the functional entity that sends the exception, and
              Reason is a string that describes the problem.
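
              As an illustration, a continuation fun can hand the parser a document that arrives in pieces.
              In this sketch the hypothetical module sax_chunks keeps the not yet consumed binaries as the
              continuation state and returns an empty binary once they are used up.

```erlang
-module(sax_chunks).
-export([count_elements/1]).

%% Parse a document supplied as a non-empty list of binaries and
%% return the parser result; the event state counts the elements.
count_elements([First | More]) ->
    %% The continuation state is the list of chunks not yet consumed.
    ContinuationFun = fun([Chunk | Rest]) -> {Chunk, Rest};
                         ([]) -> {<<>>, []}   % no more input
                      end,
    EventFun = fun({startElement, _Uri, _Name, _QName, _Attrs}, _Loc, N) ->
                       N + 1;
                  (_Event, _Loc, N) ->
                       N
               end,
    xmerl_sax_parser:stream(First, [{continuation_fun, ContinuationFun},
                                    {continuation_state, More},
                                    {event_fun, EventFun},
                                    {event_state, 0}]).
```

              The start input is a binary, so the fun returns binaries throughout, in line with the note on
              NewBytes above.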

       Module:EventFun(Event, Location, State) -> NewState

              Types:

                 Event = event()
                 Location = {CurrentLocation, EntityName, LineNo}
                 CurrentLocation = string()
                 EntityName = string()
                 LineNo = integer()
                 State = NewState = term()

              This function is called for every event sent by the parser. Errors are handled through
              exceptions. Use throw/1 to send the tuple {Tag = atom(), Reason = string()} if the application
              encounters a fatal error. Tag is an atom that identifies the functional entity that sends the
              exception, and Reason is a string that describes the problem.
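
              The following sketch shows one way an event fun might abort a parse with a user-defined tag.
              The module name sax_limit, the tag too_many_elements, and the limit itself are all hypothetical
              choices made for the example.

```erlang
-module(sax_limit).
-export([parse_with_limit/2]).

%% Parse Xml, but give up once more than Max elements have been seen.
parse_with_limit(Xml, Max) ->
    EventFun = fun({startElement, _Uri, _Name, _QName, _Attrs}, _Loc, N)
                     when N >= Max ->
                       %% {Tag, Reason} as described above; Tag is user defined.
                       throw({too_many_elements, "element limit exceeded"});
                  ({startElement, _Uri, _Name, _QName, _Attrs}, _Loc, N) ->
                       N + 1;
                  (_Event, _Loc, N) ->
                       N
               end,
    xmerl_sax_parser:stream(Xml, [{event_fun, EventFun},
                                  {event_state, 0}]).
```

              The thrown Tag then appears as the first element of the five-element Result tuple of stream/2
              or file/2, so the caller can distinguish it from fatal_error.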