Ubuntu Manpage: erl_scan - The Erlang Token Scanner

Provided by: erlang-manpages_16.b.3-dfsg-1ubuntu2.2_all

NAME

       erl_scan - The Erlang Token Scanner

DESCRIPTION

       This module contains functions for tokenizing characters into Erlang tokens.

DATA TYPES

       attribute_info() = {column, column()}
                        | {length, integer() >= 1}
                        | {line, info_line()}
                        | {location, info_location()}
                        | {text, string()}

       attributes() = line() | attributes_data()

       attributes_data() = [{column, column()} |
                            {line, info_line()} |
                            {text, string()}]
                         | {line(), column()}

       category() = atom()

       column() = integer() >= 1

       error_description() = term()

       error_info() = {location(), module(), error_description()}

       info_line() = integer() | term()

       info_location() = location() | term()

       line() = integer()

       location() = line() | {line(), column()}

       option() = return
                | return_white_spaces
                | return_comments
                | text
                | {reserved_word_fun, resword_fun()}

       options() = option() | [option()]

       symbol() = atom() | float() | integer() | string()

       resword_fun() = fun((atom()) -> boolean())

       token() = {category(), attributes(), symbol()}
               | {category(), attributes()}

       token_info() = {category, category()}
                    | {symbol, symbol()}
                    | attribute_info()

       tokens() = [token()]

       tokens_result() = {ok,
                          Tokens :: tokens(),
                          EndLocation :: location()}
                       | {eof, EndLocation :: location()}
                       | {error,
                          ErrorInfo :: error_info(),
                          EndLocation :: location()}

EXPORTS

       string(String) -> Return

       string(String, StartLocation) -> Return

       string(String, StartLocation, Options) -> Return

              Types:

                 String = string()
                 Options = options()
                 Return = {ok, Tokens :: tokens(), EndLocation}
                        | {error, ErrorInfo :: error_info(), ErrorLocation}
                 StartLocation = EndLocation = ErrorLocation = location()

              Takes the list of characters String and tries to scan (tokenize) them. Returns {ok,
              Tokens, EndLocation}, where Tokens are the Erlang tokens from  String.  EndLocation
              is the first location after the last token.

              {error,  ErrorInfo, ErrorLocation} is returned if an error occurs. ErrorLocation is
              the first location after the erroneous token.

              string(String)   is   equivalent   to   string(String,   1),   and   string(String,
              StartLocation) is equivalent to string(String, StartLocation, []).

              StartLocation indicates the initial location when scanning starts. If StartLocation
              is a line, attributes() as well as EndLocation and ErrorLocation will be lines. If
              StartLocation is a pair of a line and a column, attributes() takes the form of an
              opaque compound data type, and EndLocation and ErrorLocation will be pairs of a
              line and a column. The token attributes contain information about the column and
              the line where the token begins, as well as the text of the token (if the text
              option is given). All of this information can be accessed by calling
              token_info/1,2 or attributes_info/1,2.

              A token is a tuple containing  information  about  syntactic  category,  the  token
              attributes,  and the actual terminal symbol. For punctuation characters (e.g. ;, |)
              and reserved words, the  category  and  the  symbol  coincide,  and  the  token  is
              represented  by  a  two-tuple. Three-tuples have one of the following forms: {atom,
              Info, atom()}, {char, Info, integer()}, {comment, Info,  string()},  {float,  Info,
              float()},  {integer, Info, integer()}, {var, Info, atom()}, and {white_space, Info,
              string()}.
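
              As a minimal sketch of the token shapes described above, the following module
              scans one clause with the default line-only start location and a second string
              with a line and column pair. The module name scan_basic and the function run/0
              are made up for illustration.

```erlang
-module(scan_basic).
-export([run/0]).

%% Scan two small inputs and return the tokens of the first together
%% with the end location reported for the second.
run() ->
    %% Line-only start location (the default, 1): the token attributes
    %% and the end location are plain line numbers.
    {ok, Tokens, 1} = erl_scan:string("foo(X) -> X + 1."),
    %% Start at line 1, column 1: the end location becomes a
    %% {Line, Column} pair as well.
    {ok, _ColumnTokens, EndLocation} = erl_scan:string("foo.", {1, 1}),
    {Tokens, EndLocation}.
```

              In the first result, punctuation such as '(' and '->' appears as two-tuples,
              while foo, X and 1 appear as atom, var and integer three-tuples.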

              The valid options are:

                {reserved_word_fun, resword_fun()}:
                  A callback function that is called when the scanner has found an unquoted atom.
                  If  the function returns true, the unquoted atom itself will be the category of
                  the token; if the function returns false, atom will  be  the  category  of  the
                  unquoted atom.

                return_comments:
                  Return comment tokens.

                return_white_spaces:
                  Return  white  space tokens. By convention, if there is a newline character, it
                  is always the first character of the  text  (there  cannot  be  more  than  one
                  newline in a white space token).

                return:
                  Short for [return_comments, return_white_spaces].

                text:
                  Include  the  token's text in the token attributes. The text is the part of the
                  input corresponding to the token.
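
              The effect of two of these options can be sketched as follows. The module name
              scan_options, the function run/0 and the sample inputs are assumptions made for
              illustration only.

```erlang
-module(scan_options).
-export([run/0]).

%% Scan the same input with and without options and return all
%% token lists so they can be compared.
run() ->
    Source = "X = 1. % note",
    %% By default the comment is skipped.
    {ok, Plain, _} = erl_scan:string(Source),
    %% With return_comments the comment is returned as a token.
    {ok, WithComments, _} = erl_scan:string(Source, 1, [return_comments]),
    %% A reserved_word_fun that always returns false makes the scanner
    %% treat every unquoted atom, including 'when', as an ordinary atom.
    NoReserved = {reserved_word_fun, fun(_Atom) -> false end},
    {ok, AsAtoms, _} = erl_scan:string("when foo", 1, [NoReserved]),
    %% Without the option, 'when' is a reserved word (a two-tuple).
    {ok, Default, _} = erl_scan:string("when foo"),
    {Plain, WithComments, AsAtoms, Default}.
```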

       tokens(Continuation, CharSpec, StartLocation) -> Return

       tokens(Continuation, CharSpec, StartLocation, Options) -> Return

              Types:

                 Continuation = return_cont() | []
                 CharSpec = char_spec()
                 StartLocation = location()
                 Options = options()
                 Return = {done,
                           Result :: tokens_result(),
                           LeftOverChars :: char_spec()}
                        | {more, Continuation1 :: return_cont()}
                 char_spec() = string() | eof
                 return_cont()
                   An opaque continuation

              This is the re-entrant scanner which scans characters until a dot ('.' followed  by
              a white space) or eof has been reached. It returns:

                {done, Result, LeftOverChars}:
                  This  return  indicates  that  there  is sufficient input data to get a result.
                  Result is:

                  {ok, Tokens, EndLocation}:
                    The scanning was successful. Tokens is the list of tokens including dot.

                  {eof, EndLocation}:
                    End of file was encountered before any more tokens.

                  {error, ErrorInfo, EndLocation}:
                    An error occurred. LeftOverChars are the remaining characters of the input
                    data, starting from EndLocation.

                {more, Continuation1}:
                  More  data  is  required for building a term. Continuation1 must be passed in a
                  new call to tokens/3,4 when more data is available.

              The CharSpec eof signals end of file. LeftOverChars will then take the value eof as
              well.

              tokens(Continuation, CharSpec, StartLocation) is equivalent to tokens(Continuation,
              CharSpec, StartLocation, []).

              See string/3 for a description of the various options.
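
              A minimal sketch of driving the re-entrant scanner: input arrives as a list of
              chunks, and each chunk is passed to tokens/3 together with the continuation from
              the previous call. The module scan_reentrant and its functions first_form/1 and
              feed/2 are hypothetical helpers, not part of erl_scan.

```erlang
-module(scan_reentrant).
-export([first_form/1]).

%% Feed Chunks (a list of strings) to the scanner until the first
%% complete, dot-terminated form has been scanned. Returns the
%% tokens_result() of that form.
first_form(Chunks) ->
    %% The continuation of the first call must be [].
    feed([], Chunks).

feed(Cont, [Chunk | Rest]) ->
    case erl_scan:tokens(Cont, Chunk, 1) of
        {more, Cont1} ->
            %% Not enough input yet: keep the continuation and go on.
            feed(Cont1, Rest);
        {done, Result, _LeftOverChars} ->
            Result
    end;
feed(Cont, []) ->
    %% No chunks left: signal end of file to flush the scanner.
    {done, Result, _LeftOverChars} = erl_scan:tokens(Cont, eof, 1),
    Result.
```

              For example, scan_reentrant:first_form(["foo(1, ", "bar).\n"]) needs both chunks
              before the dot is seen. A real reader would also keep LeftOverChars and pass
              them on when scanning the next form.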

       reserved_word(Atom :: atom()) -> boolean()

              Returns true if Atom is an Erlang reserved word, otherwise false.
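
              A short sketch of one possible use: splitting a list of atoms into reserved
              words and ordinary atoms. The module scan_reserved and the function classify/1
              are hypothetical names.

```erlang
-module(scan_reserved).
-export([classify/1]).

%% Split Atoms into {ReservedWords, OrdinaryAtoms}, keeping the
%% original order within each list.
classify(Atoms) ->
    lists:partition(fun erl_scan:reserved_word/1, Atoms).
```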

       token_info(Token) -> TokenInfo

              Types:

                 Token = token()
                 TokenInfo = [TokenInfoTuple :: token_info()]

              Returns a list containing information about the  token  Token.  The  order  of  the
              TokenInfoTuples  is  not  defined.  See token_info/2 for information about specific
              TokenInfoTuples.

              Note that if token_info(Token, TokenItem) returns undefined for some TokenItem, the
              item is not included in TokenInfo.

       token_info(Token, TokenItem) -> TokenInfoTuple | undefined

       token_info(Token, TokenItems) -> TokenInfo

              Types:

                 Token = token()
                 TokenItems = [TokenItem :: token_item()]
                 TokenInfo = [TokenInfoTuple :: token_info()]
                 token_item() = category | symbol | attribute_item()
                 attribute_item() = column | length | line | location | text

              Returns information about the token Token. If a single TokenItem is given, the
              returned value is the corresponding TokenInfoTuple, or undefined if the
              TokenItem has no value. If a list of TokenItems is given, the result is a list
              of TokenInfoTuples, in the same order as the corresponding TokenItems appear in
              the list of TokenItems. TokenItems with no value are not included in the list
              of TokenInfoTuples.

              The following TokenInfoTuples with corresponding TokenItems are valid:

                {category,  category()}:
                  The category of the token.

                {column,  column()}:
                  The column where the token begins.

                {length, integer() > 0}:
                  The length of the token's text.

                {line,  line()}:
                  The line where the token begins.

                {location,  location()}:
                  The line and column where the token begins, or just the line if the column
                  is unknown.

                {symbol,  symbol()}:
                  The token's symbol.

                {text, string()}:
                  The token's text.
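
              A sketch of token_info/2 as documented on this page. It assumes the Erlang/OTP
              release this manual belongs to; other releases may not provide the function.
              The module scan_token_info and the function run/0 are hypothetical names.

```erlang
-module(scan_token_info).
-export([run/0]).

%% Query a scanned token for individual items and for a list of items.
run() ->
    %% Scan with columns and the text option so that every item has a value.
    {ok, [Token | _], _} = erl_scan:string("foo.", {1, 1}, [text]),
    %% A single item yields one tuple.
    Category = erl_scan:token_info(Token, category),
    %% A list of items yields tuples in the order requested.
    Info = erl_scan:token_info(Token, [symbol, location, text]),
    %% A token scanned with a line-only location has no column.
    {ok, [LineOnly | _], _} = erl_scan:string("foo."),
    Column = erl_scan:token_info(LineOnly, column),
    {Category, Info, Column}.
```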

       attributes_info(Attributes) -> AttributesInfo

              Types:

                 Attributes = attributes()
                 AttributesInfo = [AttributeInfoTuple :: attribute_info()]

              Returns a list containing information about the token  attributes  Attributes.  The
              order  of  the  AttributeInfoTuples  is  not  defined.  See  attributes_info/2  for
              information about specific AttributeInfoTuples.

              Note that if attributes_info(Attributes, AttributeItem) returns undefined for
              some AttributeItem, the item is not included in AttributesInfo.

       attributes_info(Attributes, AttributeItem) ->
                          AttributeInfoTuple | undefined

       attributes_info(Attributes, AttributeItems) -> AttributeInfo

              Types:

                 Attributes = attributes()
                 AttributeItems = [AttributeItem :: attribute_item()]
                 AttributeInfo = [AttributeInfoTuple :: attribute_info()]
                 attribute_item() = column | length | line | location | text

              Returns information about the token attributes Attributes. If a single
              AttributeItem is given, the returned value is the corresponding
              AttributeInfoTuple, or undefined if the AttributeItem has no value. If a list of
              AttributeItems is given, the result is a list of AttributeInfoTuples, in the
              same order as the corresponding AttributeItems appear in the list of
              AttributeItems. AttributeItems with no value are not included in the list of
              AttributeInfoTuples.

              The following AttributeInfoTuples with corresponding AttributeItems are valid:

                {column,  column()}:
                  The column where the token begins.

                {length, integer() > 0}:
                  The length of the token's text.

                {line,  line()}:
                  The line where the token begins.

                {location,  location()}:
                  The line and column where the token begins, or just the line if the column
                  is unknown.

                {text, string()}:
                  The token's text.

       set_attribute(AttributeItem, Attributes, SetAttributeFun) ->
                        Attributes

              Types:

                 AttributeItem = line
                 Attributes = attributes()
                 SetAttributeFun = fun((info_line()) -> info_line())

              Sets the value of the line attribute of the token attributes Attributes.

              The SetAttributeFun is called with the value of  the  line  attribute,  and  is  to
              return the new value of the line attribute.
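
              As a sketch, set_attribute/3 can be used to shift the line of a token, for
              instance when scanned text is embedded at an offset in a larger file. The
              module scan_set_line and the function shift_line/2 are hypothetical, the line
              value is assumed to be an integer, and the code assumes the release this page
              documents.

```erlang
-module(scan_set_line).
-export([shift_line/2]).

%% Return Token with its line attribute moved down by Offset lines.
%% Handles both three-tuple and two-tuple tokens.
shift_line({Category, Attrs, Symbol}, Offset) ->
    {Category, shift(Attrs, Offset), Symbol};
shift_line({Category, Attrs}, Offset) ->
    {Category, shift(Attrs, Offset)}.

%% The fun receives the current line and returns the new one.
shift(Attrs, Offset) ->
    erl_scan:set_attribute(line, Attrs, fun(Line) -> Line + Offset end).
```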

       format_error(ErrorDescriptor) -> string()

              Types:

                 ErrorDescriptor = error_description()

              Takes an ErrorDescriptor and returns a string which describes the error or warning.
              This function is usually called implicitly when processing an  ErrorInfo  structure
              (see below).

ERROR INFORMATION

       The  ErrorInfo  mentioned above is the standard ErrorInfo structure which is returned from
       all IO modules. It has the following format:

       {ErrorLocation, Module, ErrorDescriptor}

       A string which describes the error is obtained with the following call:

       Module:format_error(ErrorDescriptor)
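
       A minimal sketch of this pattern, assuming a hypothetical module scan_errors with a
       function describe/1 that scans a string and turns a scan failure into a readable
       message:

```erlang
-module(scan_errors).
-export([describe/1]).

%% Scan String. Returns ok on success, or {error, Location, Message}
%% where Message is the text produced by Module:format_error/1.
describe(String) ->
    case erl_scan:string(String) of
        {ok, _Tokens, _EndLocation} ->
            ok;
        {error, {ErrorLocation, Module, ErrorDescriptor}, _EndLocation} ->
            %% Let the module named in ErrorInfo describe its own error.
            Message = Module:format_error(ErrorDescriptor),
            {error, ErrorLocation, lists:flatten(Message)}
    end.
```

       For example, scan_errors:describe("\"abc") reports the unterminated string, while
       scan_errors:describe("foo.") returns ok.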

NOTES

       The continuation of the first call to the re-entrant input functions must be []. Refer  to
       Armstrong,  Virding  and  Williams,  'Concurrent Programming in Erlang', Chapter 13, for a
       complete description of how the re-entrant input scheme works.
