Ubuntu Manpage: erl_scan - The Erlang Token Scanner

name
description
data types
exports
error information
notes
see also

Provided by: erlang-manpages_16.b.3-dfsg-1ubuntu2.2_all

NAME

       erl_scan - The Erlang Token Scanner

DESCRIPTION

       This module contains functions for tokenizing characters into Erlang tokens.

DATA TYPES

       attribute_info() = {column, column()}
                        | {length, integer() >= 1}
                        | {line, info_line()}
                        | {location, info_location()}
                        | {text, string()}

       attributes() = line() | attributes_data()

       attributes_data() = [{column, column()} |
                            {line, info_line()} |
                            {text, string()}]
                         | {line(), column()}

       category() = atom()

       column() = integer() >= 1

       error_description() = term()

       error_info() = {location(), module(), error_description()}

       info_line() = integer() | term()

       info_location() = location() | term()

       line() = integer()

       location() = line() | {line(), column()}

       option() = return
                | return_white_spaces
                | return_comments
                | text
                | {reserved_word_fun, resword_fun()}

       options() = option() | [option()]

       symbol() = atom() | float() | integer() | string()

       resword_fun() = fun((atom()) -> boolean())

       token() = {category(), attributes(), symbol()}
               | {category(), attributes()}

       token_info() = {category, category()}
                    | {symbol, symbol()}
                    | attribute_info()

       tokens() = [token()]

       tokens_result() = {ok,
                          Tokens :: tokens(),
                          EndLocation :: location()}
                       | {eof, EndLocation :: location()}
                       | {error,
                          ErrorInfo :: error_info(),
                          EndLocation :: location()}

EXPORTS

       string(String) -> Return

       string(String, StartLocation) -> Return

       string(String, StartLocation, Options) -> Return

              Types:

                 String = string()
                 Options = options()
                 Return = {ok, Tokens :: tokens(), EndLocation}
                        | {error, ErrorInfo :: error_info(), ErrorLocation}
                 StartLocation = EndLocation = ErrorLocation = location()

              Takes  the  list  of  characters  String  and  tries to scan (tokenize) them. Returns {ok, Tokens,
              EndLocation}, where Tokens are the Erlang tokens from String. EndLocation is  the  first  location
              after the last token.

              {error,  ErrorInfo,  ErrorLocation}  is  returned  if  an error occurs. ErrorLocation is the first
              location after the erroneous token.

              string(String) is equivalent to string(String, 1), and string(String, StartLocation) is equivalent
              to string(String, StartLocation, []).

              StartLocation  indicates  the  initial  location  when scanning starts. If StartLocation is a line
              attributes() as well as EndLocation and ErrorLocation will be lines. If StartLocation is a pair of
              a  line  and a column attributes() takes the form of an opaque compound data type, and EndLocation
              and ErrorLocation will be pairs of a line and a column. The token attributes  contain  information
              about  the  column  and  the line where the token begins, as well as the text of the token (if the
              text  option  is  given),  all  of  which  can  be   accessed   by   calling   token_info/1,2   or
              attributes_info/1,2.

              A  token is a tuple containing information about syntactic category, the token attributes, and the
              actual terminal symbol. For punctuation characters (e.g. ;, |) and reserved  words,  the  category
              and the symbol coincide, and the token is represented by a two-tuple. Three-tuples have one of the
              following forms: {atom, Info, atom()}, {char, Info, integer()}, {comment, Info, string()}, {float,
              Info,  float()},  {integer,  Info,  integer()},  {var,  Info,  atom()},  and  {white_space,  Info,
              string()}.

              The valid options are:

                {reserved_word_fun, reserved_word_fun()}:
                  A callback function that is called when the  scanner  has  found  an  unquoted  atom.  If  the
                  function  returns  true,  the  unquoted  atom itself will be the category of the token; if the
                  function returns false, atom will be the category of the unquoted atom.

                return_comments:
                  Return comment tokens.

                return_white_spaces:
                  Return white space tokens. By convention, if there is a newline character, it  is  always  the
                  first character of the text (there cannot be more than one newline in a white space token).

                return:
                  Short for [return_comments, return_white_spaces].

                text:
                  Include  the  token's  text  in  the  token  attributes.  The  text  is  the part of the input
                  corresponding to the token.

       tokens(Continuation, CharSpec, StartLocation) -> Return

       tokens(Continuation, CharSpec, StartLocation, Options) -> Return

              Types:

                 Continuation = return_cont() | []
                 CharSpec = char_spec()
                 StartLocation = location()
                 Options = options()
                 Return = {done,
                           Result :: tokens_result(),
                           LeftOverChars :: char_spec()}
                        | {more, Continuation1 :: return_cont()}
                 char_spec() = string() | eof
                 return_cont()
                   An opaque continuation

              This is the re-entrant scanner which scans characters until a dot ('.' followed by a white  space)
              or eof has been reached. It returns:

                {done, Result, LeftOverChars}:
                  This return indicates that there is sufficient input data to get a result. Result is:

                  {ok, Tokens, EndLocation}:
                    The scanning was successful. Tokens is the list of tokens including dot.

                  {eof, EndLocation}:
                    End of file was encountered before any more tokens.

                  {error, ErrorInfo, EndLocation}:
                    An  error  occurred.  LeftOverChars  is the remaining characters of the input data, starting
                    from EndLocation.

                {more, Continuation1}:
                  More data is required for building a term. Continuation1 must be  passed  in  a  new  call  to
                  tokens/3,4 when more data is available.

              The CharSpec eof signals end of file. LeftOverChars will then take the value eof as well.

              tokens(Continuation,  CharSpec,  StartLocation)  is  equivalent  to tokens(Continuation, CharSpec,
              StartLocation, []).

              See string/3 for a description of the various options.

       reserved_word(Atom :: atom()) -> boolean()

              Returns true if Atom is an Erlang reserved word, otherwise false.

       token_info(Token) -> TokenInfo

              Types:

                 Token = token()
                 TokenInfo = [TokenInfoTuple :: token_info()]

              Returns a list containing information about the token Token. The order of the  TokenInfoTuples  is
              not defined. See token_info/2 for information about specific TokenInfoTuples.

              Note  that  if  token_info(Token, TokenItem) returns undefined for some TokenItem, the item is not
              included in TokenInfo.

       token_info(Token, TokenItem) -> TokenInfoTuple | undefined

       token_info(Token, TokenItems) -> TokenInfo

              Types:

                 Token = token()
                 TokenItems = [TokenItem :: token_item()]
                 TokenInfo = [TokenInfoTuple :: token_info()]
                 token_item() = category | symbol | attribute_item()
                 attribute_item() = column | length | line | location | text

              Returns a list containing information about the token Token. If one single TokenItem is given  the
              returned value is the corresponding TokenInfoTuple, or undefined if the TokenItem has no value. If
              a list of TokenItems is given the result is a list of  TokenInfoTuple.  The  TokenInfoTuples  will
              appear with the corresponding TokenItems in the same order as the TokenItems appear in the list of
              TokenItems. TokenItems with no value are not included in the list of TokenInfoTuple.

              The following TokenInfoTuples with corresponding TokenItems are valid:

                {category,  category()}:
                  The category of the token.

                {column,  column()}:
                  The column where the token begins.

                {length, integer() > 0}:
                  The length of the token's text.

                {line,  line()}:
                  The line where the token begins.

                {location,  location()}:
                  The line and column where the token begins, or just the line if the column unknown.

                {symbol,  symbol()}:
                  The token's symbol.

                {text, string()}:
                  The token's text.

       attributes_info(Attributes) -> AttributesInfo

              Types:

                 Attributes = attributes()
                 AttributesInfo = [AttributeInfoTuple :: attribute_info()]

              Returns a list containing information about the token attributes  Attributes.  The  order  of  the
              AttributeInfoTuples   is  not  defined.  See  attributes_info/2  for  information  about  specific
              AttributeInfoTuples.

              Note that if attributes_info(Token, AttributeItem) returns undefined for some AttributeItem in the
              list above, the item is not included in AttributesInfo.

       attributes_info(Attributes, AttributeItem) ->
                          AttributeInfoTuple | undefined

       attributes_info(Attributes, AttributeItems) -> AttributeInfo

              Types:

                 Attributes = attributes()
                 AttributeItems = [AttributeItem :: attribute_item()]
                 AttributeInfo = [AttributeInfoTuple :: attribute_info()]
                 attribute_item() = column | length | line | location | text

              Returns  a  list  containing  information  about  the  token  attributes Attributes. If one single
              AttributeItem is given the returned value is the corresponding AttributeInfoTuple, or undefined if
              the  AttributeItem  has  no  value.  If  a  list of AttributeItem is given the result is a list of
              AttributeInfoTuple. The AttributeInfoTuples will appear with the corresponding  AttributeItems  in
              the  same order as the AttributeItems appear in the list of AttributeItems. AttributeItems with no
              value are not included in the list of AttributeInfoTuple.

              The following AttributeInfoTuples with corresponding AttributeItems are valid:

                {column,  column()}:
                  The column where the token begins.

                {length, integer() > 0}:
                  The length of the token's text.

                {line,  line()}:
                  The line where the token begins.

                {location,  location()}:
                  The line and column where the token begins, or just the line if the column unknown.

                {text, string()}:
                  The token's text.

       set_attribute(AttributeItem, Attributes, SetAttributeFun) ->
                        Attributes

              Types:

                 AttributeItem = line
                 Attributes = attributes()
                 SetAttributeFun = fun((info_line()) -> info_line())

              Sets the value of the line attribute of the token attributes Attributes.

              The SetAttributeFun is called with the value of the line attribute, and is to return the new value
              of the line attribute.

       format_error(ErrorDescriptor) -> string()

              Types:

                 ErrorDescriptor = error_description()

              Takes  an ErrorDescriptor and returns a string which describes the error or warning. This function
              is usually called implicitly when processing an ErrorInfo structure (see below).

ERROR INFORMATION

       The ErrorInfo mentioned above is the standard ErrorInfo structure which is returned from all IO  modules.
       It has the following format:

       {ErrorLocation, Module, ErrorDescriptor}

       A string which describes the error is obtained with the following call:

       Module:format_error(ErrorDescriptor)

NOTES

       The  continuation  of  the  first  call to the re-entrant input functions must be []. Refer to Armstrong,
       Virding and Williams, 'Concurrent Programming in Erlang', Chapter 13, for a complete description  of  how
       the re-entrant input scheme works.

NAME

DESCRIPTION

DATA TYPES

EXPORTS

ERROR INFORMATION

NOTES

SEE ALSO