Provided by: manpages-posix_2.16-1_all

NAME

       awk - pattern scanning and processing language

SYNOPSIS

       awk [-F ERE] [-v assignment] ... program [argument ...]

       awk [-F ERE] -f progfile ... [-v assignment] ... [argument ...]

DESCRIPTION

       The  awk utility shall execute programs written in the awk programming language, which is specialized for
       textual data manipulation. An awk program is a sequence of patterns and corresponding actions. When input
       is read that matches a pattern, the action associated with that pattern is carried out.

       Input shall be interpreted as a sequence of records. By default, a record is a line, less its terminating
       <newline>, but this can be changed by using the RS built-in variable.  Each  record  of  input  shall  be
       matched  in  turn  against  each  pattern in the program. For each pattern matched, the associated action
       shall be executed.

       The awk utility shall interpret each input record as a sequence of fields where, by default, a field is a
       string of non-<blank>s. This default white-space field delimiter can be changed by using the FS built-in
       variable or -F ERE. The awk utility shall denote the first field in a record $1, the second  $2,  and  so
       on.  The  symbol $0 shall refer to the entire record; setting any other field causes the re-evaluation of
       $0. Assigning to $0 shall reset the values of all other fields and the NF built-in variable.
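
       As a minimal sketch of the field rules above (the sample input and the shell variable names fields
       and rebuilt are illustrative, not part of this specification), the following prints $2 and NF for
       each record, then assigns to a field so that $0 is re-evaluated:

```shell
# Illustrative input; "fields" and "rebuilt" are hypothetical shell variable names.
fields=$(printf 'alpha beta gamma\none two\n' | awk '{ print $2, NF }')
printf '%s\n' "$fields"

# Assigning to $2 causes $0 to be rebuilt from the fields.
rebuilt=$(printf 'a   b   c\n' | awk '{ $2 = "X"; print $0 }')
printf '%s\n' "$rebuilt"
```

       With the default output field separator, the runs of blanks in the second input collapse to single
       spaces once $0 is rebuilt.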

OPTIONS

       The awk utility shall conform to the Base  Definitions  volume  of  IEEE Std 1003.1-2001,  Section  12.2,
       Utility Syntax Guidelines.

       The following options shall be supported:

       -F  ERE
              Define the input field separator to be the extended regular expression ERE, before any input is
              read; see Regular Expressions.

       -f  progfile
              Specify the pathname of the file progfile containing an awk program. If multiple instances of this
              option  are specified, the concatenation of the files specified as progfile in the order specified
              shall be the awk program. The awk program can alternatively be specified in the command line as  a
              single argument.

       -v  assignment
              The  application  shall  ensure  that the assignment argument is in the same form as an assignment
              operand. The specified variable assignment  shall  occur  prior  to  executing  the  awk  program,
              including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option
              can be specified.
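
       The two options can be sketched together as follows. The colon-separated input and the awk variable
       name prefix are assumptions made for illustration; the point is that a -v assignment is already in
       effect inside BEGIN, and that -F changes how records are split:

```shell
# "prefix" is a hypothetical awk variable set with -v; it is visible inside BEGIN.
users=$(printf 'root:x:0\ndaemon:x:1\n' |
    awk -F : -v prefix='user=' 'BEGIN { print "start " prefix } { print prefix $1 }')
printf '%s\n' "$users"
```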

OPERANDS

       The following operands shall be supported:

       program
              If no -f option is specified, the first operand to awk shall be the text of the awk  program.  The
              application shall supply the program operand as a single argument to awk. If the text does not end
              in a <newline>, awk shall interpret the text as if it did.

       argument
              Either of the following two types of argument can be intermixed:

       file
              A pathname of a file that contains the input to be read, which  is  matched  against  the  set  of
              patterns  in  the  program.  If  no file operands are specified, or if a file operand is '-' , the
              standard input shall be used.

       assignment
              An operand that begins with an underscore or alphabetic character from the portable character  set
              (see  the  table  in  the  Base  Definitions volume of IEEE Std 1003.1-2001, Section 6.1, Portable
              Character Set), followed by a sequence of underscores, digits, and alphabetics from  the  portable
              character  set,  followed  by the '=' character, shall specify a variable assignment rather than a
              pathname. The characters before the '=' represent the name of an awk variable; if that name is  an
              awk reserved word (see Grammar), the behavior is undefined. The characters following the equal
              sign shall be interpreted as if they appeared in the awk program preceded and followed by a
              double-quote ('"') character, as a STRING token (see Grammar), except that if the last character
              is an unescaped backslash, it shall be interpreted as a literal backslash rather than as the first
              character of the sequence "\"". The variable shall be assigned the value of that STRING token
              and, if appropriate, shall be considered a numeric string (see Expressions in awk), in which case
              the variable shall also be assigned its numeric value. Each such variable assignment shall occur
              just prior to
              the processing of the following file, if any. Thus, an assignment before the first  file  argument
              shall  be  executed  after  the  BEGIN  actions  (if any), while an assignment after the last file
              argument shall occur before the END actions (if any). If there are no file arguments,  assignments
              shall be executed before processing the standard input.
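
       The timing of assignment operands can be sketched as below. The file names a.txt and b.txt, the awk
       variable tag, and the use of mktemp -d (widely available, but not required by this volume) are all
       assumptions made for the sake of the example:

```shell
# Illustrative files in a scratch directory; "tag" is a hypothetical awk variable.
tmpdir=$(mktemp -d)
printf 'one\n' > "$tmpdir/a.txt"
printf 'two\n' > "$tmpdir/b.txt"

timing=$(cd "$tmpdir" && awk '
    BEGIN { print "BEGIN tag=[" tag "]" }
          { print FILENAME " tag=[" tag "]" }
    END   { print "END tag=[" tag "]" }
' tag=first a.txt tag=second b.txt tag=last)
printf '%s\n' "$timing"
rm -rf "$tmpdir"
```

       The BEGIN action sees tag unset, each file sees the assignment that precedes it, and the END action
       sees the assignment that follows the last file.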

STDIN

       The standard input shall be used only if no file operands are specified, or if a file operand is '-';
       see the INPUT FILES section. If the awk program contains no actions and no patterns, but is  otherwise  a
       valid  awk  program,  standard  input  and  any file operands shall not be read and awk shall exit with a
       return status of zero.

INPUT FILES

       Input files to the awk program from any of the following sources shall be text files:

        * Any file operands or their equivalents, achieved by modifying the awk variables ARGV and ARGC

        * Standard input in the absence of any file operands

        * Arguments to the getline function

       Whether the variable RS is set to a value other than a <newline> or not, for these files, implementations
       shall  support  records  terminated  with  the specified separator up to {LINE_MAX} bytes and may support
       longer records.

       If -f progfile is specified, the application shall ensure that the files named by each  of  the  progfile
       option-arguments  are  text  files  and  their  concatenation,  in  the  same order as they appear in the
       arguments, is an awk program.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of awk:

       LANG   Provide a default value for the internationalization variables that are unset or  null.  (See  the
              Base  Definitions  volume of IEEE Std 1003.1-2001, Section 8.2, Internationalization Variables for
              the  precedence  of  internationalization  variables  used  to  determine  the  values  of  locale
              categories.)

       LC_ALL If  set  to  a  non-empty  string value, override the values of all the other internationalization
              variables.

       LC_COLLATE
              Determine the locale  for  the  behavior  of  ranges,  equivalence  classes,  and  multi-character
              collating elements within regular expressions and in comparisons of string values.

       LC_CTYPE
              Determine  the locale for the interpretation of sequences of bytes of text data as characters (for
              example, single-byte as opposed to multi-byte  characters  in  arguments  and  input  files),  the
              behavior  of  character  classes  within  regular expressions, the identification of characters as
              letters, and the mapping of uppercase  and  lowercase  characters  for  the  toupper  and  tolower
              functions.

       LC_MESSAGES
              Determine  the locale that should be used to affect the format and contents of diagnostic messages
              written to standard error.

       LC_NUMERIC
              Determine the radix character used when interpreting numeric input, performing conversions between
              numeric  and  string  values,  and  formatting  numeric  output.  Regardless of locale, the period
              character (the decimal-point character  of  the  POSIX  locale)  is  the  decimal-point  character
              recognized in processing awk programs (including assignments in command line arguments).

       NLSPATH
              Determine the location of message catalogs for the processing of LC_MESSAGES.

       PATH   Determine  the search path when looking for commands executed by system(expr), or input and output
              pipes; see the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 8, Environment Variables.

       In addition, all environment variables shall be visible via the awk variable ENVIRON.

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       The nature of the output files depends on the awk program.

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       The nature of the output files depends on the awk program.

EXTENDED DESCRIPTION

   Overall Program Structure
       An awk program is composed of pairs of the form:

              pattern { action }

       Either the pattern or the action (including the enclosing brace characters) can be omitted.

       A missing pattern shall match any record of input, and a missing action shall be equivalent to:

              { print }

       Execution of the awk program shall start by  first  executing  the  actions  associated  with  all  BEGIN
       patterns  in  the  order they occur in the program. Then each file operand (or standard input if no files
       were specified) shall be processed in turn by reading data from the file until a record separator is seen
       (<newline> by default). Before the first reference to a field in the record is evaluated, the record
       shall be split into fields, according to the rules in Regular Expressions, using the value of FS that
       was  current  at the time the record was read. Each pattern in the program then shall be evaluated in the
       order of occurrence, and the action  associated  with  each  pattern  that  matches  the  current  record
       executed.  The  action  for  a  matching pattern shall be executed before evaluating subsequent patterns.
       Finally, the actions associated with all END patterns shall be executed in the order they  occur  in  the
       program.
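
       The execution order described above can be sketched with a small program. The numeric input and the
       awk variable name big are illustrative only:

```shell
# "big" is a hypothetical counter; the input is three one-field records.
summary=$(printf '3\n10\n7\n' | awk '
    BEGIN  { print "scanning" }     # before any input is read
    $1 > 5 { big++ }                # action runs only for matching records
    $1 > 5                          # missing action: equivalent to { print }
    END    { print "big:", big }    # after all input has been read
')
printf '%s\n' "$summary"
```

       Each record is tested against the patterns in program order, so the counter is incremented before
       the same record is printed by the pattern that has no action.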

   Expressions in awk
       Expressions describe computations used in patterns and actions.  In the following table, valid expression
       operations are given in groups from highest precedence first  to  lowest  precedence  last,  with  equal-
       precedence  operators  grouped  between  horizontal lines. In expression evaluation, where the grammar is
       formally ambiguous, higher precedence operators shall be evaluated before lower precedence operators.  In
       this  table  expr,  expr1,  expr2, and expr3 represent any expression, while lvalue represents any entity
       that can be assigned to (that is, on the left side of an assignment  operator).  The  precise  syntax  of
       expressions is given in Grammar.

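                                  Table: Expressions in Decreasing Precedence in awk

                    Syntax                Name                      Type of Result   Associativity
                    ------------------------------------------------------------------------------
                    ( expr )              Grouping                  Type of expr     N/A
                    ------------------------------------------------------------------------------
                    $expr                 Field reference           String           N/A
                    ------------------------------------------------------------------------------
                    ++ lvalue             Pre-increment             Numeric          N/A
                    -- lvalue             Pre-decrement             Numeric          N/A
                    lvalue ++             Post-increment            Numeric          N/A
                    lvalue --             Post-decrement            Numeric          N/A
                    ------------------------------------------------------------------------------
                    expr ^ expr           Exponentiation            Numeric          Right
                    ------------------------------------------------------------------------------
                    ! expr                Logical not               Numeric          N/A
                    + expr                Unary plus                Numeric          N/A
                    - expr                Unary minus               Numeric          N/A
                    ------------------------------------------------------------------------------
                    expr * expr           Multiplication            Numeric          Left
                    expr / expr           Division                  Numeric          Left
                    expr % expr           Modulus                   Numeric          Left
                    ------------------------------------------------------------------------------
                    expr + expr           Addition                  Numeric          Left
                    expr - expr           Subtraction               Numeric          Left
                    ------------------------------------------------------------------------------
                    expr expr             String concatenation      String           Left
                    ------------------------------------------------------------------------------
                    expr < expr           Less than                 Numeric          None
                    expr <= expr          Less than or equal to     Numeric          None
                    expr != expr          Not equal to              Numeric          None
                    expr == expr          Equal to                  Numeric          None
                    expr > expr           Greater than              Numeric          None
                    expr >= expr          Greater than or equal to  Numeric          None
                    ------------------------------------------------------------------------------
                    expr ~ expr           ERE match                 Numeric          None
                    expr !~ expr          ERE non-match             Numeric          None
                    ------------------------------------------------------------------------------
                    expr in array         Array membership          Numeric          Left
                    ( index ) in array    Multi-dimension array     Numeric          Left
                                          membership
                    ------------------------------------------------------------------------------
                    expr && expr          Logical AND               Numeric          Left
                    ------------------------------------------------------------------------------
                    expr || expr          Logical OR                Numeric          Left
                    ------------------------------------------------------------------------------
                    expr1 ? expr2 : expr3 Conditional expression    Type of selected Right
                                                                    expr2 or expr3
                    ------------------------------------------------------------------------------
                    lvalue ^= expr        Exponentiation assignment Numeric          Right
                    lvalue %= expr        Modulus assignment        Numeric          Right
                    lvalue *= expr        Multiplication assignment Numeric          Right
                    lvalue /= expr        Division assignment       Numeric          Right
                    lvalue += expr        Addition assignment       Numeric          Right
                    lvalue -= expr        Subtraction assignment    Numeric          Right
                    lvalue = expr         Assignment                Type of expr     Right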
       Each expression shall have either a string value, a numeric value, or both. Except as stated for specific
       contexts, the value of an expression shall be implicitly converted to the type needed for the context  in
       which it is used. A string value shall be converted to a numeric value by the equivalent of the following
       calls to functions defined by the ISO C standard:

              setlocale(LC_NUMERIC, "");
              numeric_value = atof(string_value);

       A numeric value that is exactly equal to the value of an integer (see Concepts Derived from the ISO C
       Standard) shall be converted to a string by the equivalent of a call to the sprintf function (see String
       Functions) with the string "%d" as the fmt argument and the numeric value being converted as the first
       and  only  expr  argument.  Any other numeric value shall be converted to a string by the equivalent of a
       call to the sprintf function with the value of the variable CONVFMT as the fmt argument and  the  numeric
       value being converted as the first and only expr argument. The result of the conversion is unspecified if
       the value of CONVFMT is not a floating-point format specification. This  volume  of  IEEE Std 1003.1-2001
       specifies  no explicit conversions between numbers and strings. An application can force an expression to
       be treated as a number by adding zero to it, or can force it to be treated as a string  by  concatenating
       the null string ( "" ) to it.
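
       A short sketch of these conversions follows. The awk variable names (x, y, s, t, u, n) and the
       sample values are illustrative; the example assumes a locale whose radix character is a period:

```shell
# All awk variable names here are hypothetical.
conv=$(awk 'BEGIN {
    x = 17;  y = 3.14159265
    s = x ""                # integer value: converted as if by "%d"
    t = y ""                # other values: converted using CONVFMT, "%.6g" by default
    CONVFMT = "%.2f"
    u = y ""                # converted again, now with the new CONVFMT
    n = "42abc" + 0         # adding zero forces a numeric value
    print s, t, u, n
}')
printf '%s\n' "$conv"
```

       Concatenating the null string forces a string value, and adding zero forces a numeric value, as
       described above.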

       A string value shall be considered a numeric string if it comes from one of the following:

        1. Field variables

        2. Input from the getline() function

        3. FILENAME

        4. ARGV array elements

        5. ENVIRON array elements

        6. Array elements created by the split() function

        7. A command line variable assignment

        8. Variable assignment from another numeric string variable

       and after all the following conversions have been applied, the resulting string would lexically be
       recognized as a NUMBER token as described by the lexical conventions in Grammar:

        * All leading and trailing <blank>s are discarded.

        * If the first non-<blank> is '+' or '-', it is discarded.

        * Each occurrence of the decimal-point character from the current locale is changed to a period.

       If a '-' character is ignored in the preceding description, the numeric value of the numeric string shall
       be the negation of the numeric value of the recognized NUMBER token.  Otherwise, the numeric value of the
       numeric string shall be the numeric value of the recognized NUMBER token. Whether or not a  string  is  a
       numeric string shall be relevant only in contexts where that term is used in this section.

       When  an  expression  is  used  in a Boolean context, if it has a numeric value, a value of zero shall be
       treated as false and any other value shall be treated as true. Otherwise, a  string  value  of  the  null
       string shall be treated as false and any other value shall be treated as true. A Boolean context shall be
       one of the following:

        * The first subexpression of a conditional expression

        * An expression operated on by logical NOT, logical AND, or logical OR

        * The second expression of a for statement

        * The expression of an if statement

        * The expression of the while clause in either a while or do...while statement

        * An expression used as a pattern (as in Overall Program Structure)
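
       The Boolean rules can be sketched with conditional expressions. The single input record "0" and
       the awk variable name never_set are assumptions made for illustration:

```shell
# "never_set" is a hypothetical variable that is deliberately left uninitialized.
truth=$(printf '0\n' | awk '{
    print (0         ? "true" : "false")    # numeric zero
    print ("0"       ? "true" : "false")    # non-null string constant
    print (""        ? "true" : "false")    # null string
    print ($1        ? "true" : "false")    # field "0": numeric string with value zero
    print (never_set ? "true" : "false")    # uninitialized value
}')
printf '%s\n' "$truth"
```

       Only the string constant "0" is treated as true here, because it has a string value only and that
       string is not null.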

       All arithmetic shall follow the semantics of floating-point arithmetic as specified by the ISO C standard
       (see Concepts Derived from the ISO C Standard).

       The value of the expression:

              expr1 ^ expr2

       shall be equivalent to the value returned by the ISO C standard function call:

              pow(expr1, expr2)

       The expression:

              lvalue ^= expr

       shall be equivalent to the ISO C standard expression:

              lvalue = pow(lvalue, expr)

       except that lvalue shall be evaluated only once. The value of the expression:

              expr1 % expr2

       shall be equivalent to the value returned by the ISO C standard function call:

              fmod(expr1, expr2)

       The expression:

              lvalue %= expr

       shall be equivalent to the ISO C standard expression:

              lvalue = fmod(lvalue, expr)

       except that lvalue shall be evaluated only once.
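
       The pow and fmod equivalences can be sketched as follows; the operand values and the awk variable
       names a through d are illustrative:

```shell
# Hypothetical variable names; each comment names the equivalent ISO C call.
arith=$(awk 'BEGIN {
    a = 2 ^ 10          # pow(2, 10)
    b = -7 % 3          # fmod(-7, 3): the result takes the sign of the dividend
    c = 7.5 % 2         # fmod(7.5, 2): operands need not be integers
    d = 2 ^ 3 ^ 2       # right associative: 2 ^ (3 ^ 2)
    print a, b, c, d
}')
printf '%s\n' "$arith"
```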

       Variables and fields shall be set by the assignment statement:

              lvalue = expression

       and the type of expression shall determine the resulting variable type. The assignment includes the
       arithmetic assignments ("+=", "-=", "*=", "/=", "%=", "^=", "++", "--"), all of which shall
       produce a numeric result. The left-hand side of an assignment and the target of increment and decrement
       operators can be one of a variable, an array with index, or a field selector.

       The awk language supplies arrays that are used for  storing  numbers  or  strings.  Arrays  need  not  be
       declared.  They  shall  initially  be empty, and their sizes shall change dynamically. The subscripts, or
       element identifiers, are strings, providing a  type  of  associative  array  capability.  An  array  name
       followed  by  a  subscript  within square brackets can be used as an lvalue and thus as an expression, as
       described in the grammar; see Grammar. Unsubscripted array names can be used in only the following
       contexts:

        * A parameter in a function definition or function call

        * The  NAME token following any use of the keyword in as specified in the grammar (see Grammar ); if the
          name used in this context is not an array name, the behavior is undefined

       A valid array index shall consist of one or more comma-separated expressions, similar to the way in which
       multi-dimensional  arrays  are indexed in some programming languages.  Because awk arrays are really one-
       dimensional, such a comma-separated list shall be converted to  a  single  string  by  concatenating  the
       string  values  of  the  separate  expressions,  each separated from the other by the value of the SUBSEP
       variable.  Thus, the following two index operations shall be equivalent:

              var[expr1, expr2, ... exprn]

              var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]

       The application shall ensure that a multi-dimensioned index used with the in operator  is  parenthesized.
       The  in  operator,  which  tests  for  the  existence of a particular array element, shall not cause that
       element to exist. Any other reference to a nonexistent array element shall automatically create it.
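
       A minimal sketch of subscripts and the in operator follows. The array names grid and seen and the
       other awk variable names are illustrative:

```shell
# "grid", "seen" and the result variables are hypothetical names.
arrays=$(awk 'BEGIN {
    grid[1, 2] = "x"
    by_hand = ((1 SUBSEP 2) in grid)    # same element, subscript built by hand
    by_list = ((1, 2) in grid)          # parenthesized multi-dimensional index
    before  = ("missing" in seen)       # "in" does not create the element
    dummy   = seen["missing"]           # any other reference creates it
    after   = ("missing" in seen)
    print by_hand, by_list, before, after
}')
printf '%s\n' "$arrays"
```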

       Comparisons (with the '<', "<=", "!=", "==", '>', and ">=" operators) shall be made numerically if
       both  operands  are numeric, if one is numeric and the other has a string value that is a numeric string,
       or if one is numeric and the other has the uninitialized value. Otherwise, operands shall be converted to
       strings  as  required and a string comparison shall be made using the locale-specific collation sequence.
       The value of the comparison expression shall be 1 if the relation is true, or 0 if the relation is false.
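
       The difference between numeric and string comparison can be sketched as below. The input record
       and the awk variable names num, str, and lit are assumptions made for illustration:

```shell
# Hypothetical variable names; the input fields "10" and "9" are numeric strings.
cmp=$(printf '10 9\n' | awk '{
    num = ($1 < $2)             # both fields are numeric strings: 10 < 9 is false
    str = ($1 "" < $2 "")       # concatenation yields plain strings: "10" sorts before "9"
    lit = ("10" < "9")          # string constants are compared as strings
    print num, str, lit
}')
printf '%s\n' "$cmp"
```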

   Variables and Special Variables
       Variables can be used in an awk program by referencing them.  With the exception of  function  parameters
       (see User-Defined Functions), they are not explicitly declared. Function parameter names shall be local
       to the function; all other variable names shall be global. The same name shall not  be  used  as  both  a
       function  parameter name and as the name of a function or a special awk variable. The same name shall not
       be used both as a variable name with global scope and as the name of a function. The same name shall  not
       be  used  within  the  same  scope  both  as a scalar variable and as an array.  Uninitialized variables,
       including scalar variables, array elements, and field variables, shall have an  uninitialized  value.  An
       uninitialized  value  shall  have  both  a  numeric value of zero and a string value of the empty string.
       Evaluation of variables with an uninitialized value, to either string or numeric, shall be determined  by
       the context in which they are used.

       Field  variables shall be designated by a '$' followed by a number or numerical expression. The effect of
       the field number expression evaluating to anything other than  a  non-negative  integer  is  unspecified;
       uninitialized  variables  or  string  values need not be converted to numeric values in this context. New
       field variables can be created by assigning a value to them.  References to nonexistent fields (that  is,
       fields  after  $NF),  shall  evaluate  to  the  uninitialized value. Such references shall not create new
       fields. However, assigning to a nonexistent field (for example, $(NF+2)=5) shall increase  the  value  of
       NF;  create  any  intervening  fields  with  the  uninitialized  value;  and  cause the value of $0 to be
       recomputed, with the fields being separated by the value of OFS. Each field variable shall have a  string
       value  or  an  uninitialized value when created.  Field variables shall have the uninitialized value when
       created from $0 using FS and the variable does not contain any  characters.  If  appropriate,  the  field
       variable shall be considered a numeric string (see Expressions in awk).
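
       The effect of referencing and assigning nonexistent fields can be sketched as follows; the input
       record, the choice of "-" for OFS, and the awk variable name x are illustrative:

```shell
# "x" is a hypothetical variable; OFS is set to "-" so the rebuilt record is easy to read.
grown=$(printf 'a b c\n' | awk 'BEGIN { OFS = "-" } {
    x = $(NF + 3); print NF     # a reference alone does not create fields
    $(NF + 2) = 5; print NF     # an assignment raises NF and creates $4 as well
    print $0                    # $0 is recomputed using OFS
}')
printf '%s\n' "$grown"
```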

       Implementations shall support the following other special variables that are set by awk:

       ARGC   The number of elements in the ARGV array.

       ARGV   An array of command line arguments, excluding options and the program argument, numbered from zero
              to ARGC-1.

       The arguments in ARGV can be modified or added to; ARGC can be altered. As  each  input  file  ends,  awk
       shall  treat the next non-null element of ARGV, up to the current value of ARGC-1, inclusive, as the name
       of the next input file. Thus, setting an element of ARGV to null means that it shall not be treated as an
       input file. The name '-' indicates the standard input. If an argument matches the format of an assignment
       operand, this argument shall be treated as an assignment rather than a file argument.
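
       A sketch of inspecting and modifying ARGV follows. The operands first, second, and no-such-file
       are hypothetical names; ARGV[0] is deliberately not printed because its value is not described in
       the text above:

```shell
# List the operands; a program with only a BEGIN action does not read any input.
listed=$(awk 'BEGIN { for (i = 1; i < ARGC; i++) printf "%d=%s\n", i, ARGV[i] }' first second)
printf '%s\n' "$listed"

# Setting ARGV[1] to null skips that operand; '-' then selects the standard input.
skipped=$(printf 'from stdin\n' | awk 'BEGIN { ARGV[1] = "" } { print }' no-such-file -)
printf '%s\n' "$skipped"
```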

       CONVFMT
              The printf format for converting numbers to strings (except for output statements, where  OFMT  is
              used); "%.6g" by default.

       ENVIRON
              An  array representing the value of the environment, as described in the exec functions defined in
              the System Interfaces volume of IEEE Std 1003.1-2001. The indices of the array  shall  be  strings
              consisting of the names of the environment variables, and the value of each array element shall be
              a string consisting of the value of that variable. If appropriate, the environment variable  shall
              be considered a numeric string (see Expressions in awk); the array element shall also have its
              numeric value.

       In all cases where the behavior of awk is affected by environment variables (including the environment of
       any  commands  that  awk  executes  via  the  system function or via pipeline redirections with the print
       statement, the printf statement, or the getline function), the environment used shall be the  environment
       at the time awk began executing; it is implementation-defined whether any modification of ENVIRON affects
       this environment.
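
       As a minimal sketch, an environment variable set for the awk invocation can be read through
       ENVIRON. The variable name GREETING is hypothetical:

```shell
# GREETING is a hypothetical environment variable, set only for this invocation.
greeting=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
printf '%s\n' "$greeting"
```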

       FILENAME
              A pathname of the current input file. Inside a BEGIN action the value is undefined. Inside an  END
              action the value shall be the name of the last input file processed.

       FNR    The  ordinal  number  of  the  current record in the current file. Inside a BEGIN action the value
              shall be zero. Inside an END action the value shall be the number of the last record processed  in
              the last file processed.

       FS     Input field separator regular expression; a <space> by default.

       NF     The  number  of  fields  in  the current record. Inside a BEGIN action, the use of NF is undefined
              unless a getline function without a var argument is executed previously.  Inside an END action, NF
              shall  retain  the value it had for the last record read, unless a subsequent, redirected, getline
              function without a var argument is performed prior to entering the END action.

       NR     The ordinal number of the current record from the start of input.  Inside a BEGIN action the value
              shall be zero. Inside an END action the value shall be the number of the last record processed.

       OFMT   The printf format for converting numbers to strings in output statements (see Output Statements);
              "%.6g" by default. The result of the conversion is unspecified if the  value  of  OFMT  is  not  a
              floating-point format specification.

       OFS    The print statement output field separator; a <space> by default.

       ORS    The print statement output record separator; a <newline> by default.

       RLENGTH
              The length of the string matched by the match function.

       RS     The first character of the string value of RS shall be the input record separator; a <newline> by
              default. If RS contains more than one character, the results are unspecified. If RS is null, then
              records are separated by sequences consisting of a <newline> plus one or more blank lines; leading
              or trailing blank lines shall not result in empty records at the beginning or end of the input,
              and a <newline> shall always be a field separator, no matter what the value of FS is.

       RSTART The  starting  position  of the string matched by the match function, numbering from 1. This shall
              always be equivalent to the return value of the match function.

       SUBSEP The subscript separator string for multi-dimensional arrays; the default value is  implementation-
              defined.
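
       Two of the variables above can be sketched in use. The sample input and the awk variable name r
       are illustrative. The first command sets RS to the null string so that blank lines separate
       records; the second shows RSTART and RLENGTH after a call to match:

```shell
# Null RS: blank-line-separated records, with <newline> acting as a field separator.
paras=$(printf '\nline1\nline2\n\n\nline3\n\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first=" $1 }')
printf '%s\n' "$paras"

# "r" is a hypothetical variable holding the return value of match.
pos=$(awk 'BEGIN { r = match("foobar", /o+/); print r, RSTART, RLENGTH }')
printf '%s\n' "$pos"
```

       Leading and trailing blank lines in the first input do not produce empty records, and RSTART
       equals the value returned by match.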

   Regular Expressions
       The  awk  utility  shall  make  use of the extended regular expression notation (see the Base Definitions
       volume of IEEE Std 1003.1-2001, Section 9.4, Extended Regular Expressions) except that it shall allow the
       use  of C-language conventions for escaping special characters within the EREs, as specified in the table
       in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format Notation ('\\', '\a',
       '\b', '\f', '\n', '\r', '\t', '\v') and the following table; these escape sequences shall be
       recognized both inside and outside bracket expressions.  Note that  records  need  not  be  separated  by
       <newline>s  and  string constants can contain <newline>s, so even the "\n" sequence is valid in awk EREs.
       Using a slash character within an ERE requires the escaping shown in the following table.

                                            Table: Escape Sequences in awk

                        Escape
                        Sequence Description                    Meaning
                        \"       Backslash quotation-mark       Quotation-mark character
                        \/       Backslash slash                Slash character
                        \ddd     A backslash character followed The character whose encoding
                                 by the longest sequence of     is represented by the one,
                                 one, two, or three octal-digit two, or three-digit octal
                                 characters (01234567). If all  integer. Multi-byte characters
                                 of the digits are 0 (that is,  require multiple, concatenated
                                 representation of the NUL      escape sequences of this type,
                                 character), the behavior is    including the leading '\' for
                                 undefined.                     each byte.
                        \c       A backslash character followed Undefined
                                 by any character not described
                                 in this table or in the table
                                 in the Base Definitions volume
                                 of IEEE Std 1003.1-2001,
                                 Chapter 5, File Format
                                 Notation ( '\\' , '\a' , '\b'
                                 , '\f' , '\n' , '\r' , '\t' ,
                                 '\v' ).

       A regular expression can be matched against a specific field or string by using one of  the  two  regular
       expression matching operators, '~' and "!~". These operators shall interpret their right-hand operand as
       a regular expression and their left-hand operand as a string.  If  the  regular  expression  matches  the
       string,  the  '~'  expression shall evaluate to a value of 1, and the "!~" expression shall evaluate to a
       value of 0. (The regular expression matching operation is as defined by the  term  matched  in  the  Base
       Definitions  volume  of  IEEE Std 1003.1-2001, Section 9.1, Regular Expression Definitions, where a match
       occurs on any part of the string unless the regular expression is limited with the circumflex  or  dollar
       sign  special  characters.) If the regular expression does not match the string, the '~' expression shall
       evaluate to a value of 0, and the "!~" expression shall evaluate to a  value  of  1.  If  the  right-hand
       operand  is  any expression other than the lexical token ERE, the string value of the expression shall be
       interpreted as an extended regular expression, including the escape conventions  described  above.   Note
       that  these  same  escape  conventions shall also be applied in determining the value of a string literal
       (the lexical token STRING), and thus shall be applied a second time when a string literal is used in this
       context.
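
       As an illustration of the matching operators and of the double escape processing applied to string
       literals, the following sketch assumes a POSIX shell with awk on PATH; the shell variable regex_demo is
       a name chosen only for this example.

```shell
# Match a literal dot with an ERE token and with a string operand.
regex_demo=$(awk 'BEGIN {
    print ("a.b" ~ /a\.b/)      # ERE token: 1, the dot is matched literally
    print ("axb" ~ /a\.b/)      # 0, "x" is not a literal dot
    print ("a.b" ~ "a\\.b")     # string operand: "\\." is first reduced to "\." and then used as an ERE
    print ("axb" !~ "a\\.b")    # 1, because the ERE does not match
}')
printf '%s\n' "$regex_demo"
```

       The doubled backslash in the string form is needed because the escape conventions are applied once to
       the string literal and again when the resulting string is interpreted as an ERE.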

       When an ERE token appears as an expression in any context other than as the right-hand side of the '~' or
       "!~" operator or as one of the built-in function arguments described below, the value of the resulting
       expression shall be the equivalent of:

              $0 ~ /ere/
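
       A small sketch of this equivalence, assuming a POSIX shell with awk on PATH (the counter names and the
       sample input are illustrative only):

```shell
# A bare ERE token used as a pattern behaves like an explicit match against $0.
ere_demo=$(printf 'apple\nbanana\ncherry\n' | awk '
    /an/       { bare++ }        # bare ERE token
    $0 ~ /an/  { explicit++ }    # the equivalent explicit form
    !/an/      { other++ }       # negation applies to the implied match against $0
    END        { print bare, explicit, other }')
printf '%s\n' "$ere_demo"
```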

       The ere argument to the gsub, match, and sub functions, and the fs argument to the split function (see
       String Functions), shall be interpreted as extended regular expressions. These can be either ERE tokens
       or arbitrary expressions, and shall be interpreted in the same manner as the right-hand side of the '~'
       or "!~" operator.

       An extended regular expression can be used to separate fields by using the -F ERE option or by  assigning
       a  string  containing  the  expression  to the built-in variable FS. The default value of the FS variable
       shall be a single <space>. The following describes FS behavior:

        1. If FS is a null string, the behavior is unspecified.

        2. If FS is a single character:

            a. If FS is <space>, skip leading and trailing <blank>s; fields shall be delimited by sets of one or
               more <blank>s.

            b. Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of
               c.

        3. Otherwise, the string value of FS shall be considered to be  an  extended  regular  expression.  Each
           occurrence of a sequence matching the extended regular expression shall delimit fields.
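
       The three defined FS cases above can be sketched as follows, assuming a POSIX shell with awk on PATH;
       the shell variable names and the sample records are hypothetical.

```shell
# Case 2a: default FS (a single space) skips leading/trailing blanks and splits on runs of blanks.
fs_space=$(printf '%s\n' '  alpha   beta:gamma,delta  ' | awk '{ print NF }')
# Case 2b: any other single character delimits on each occurrence, so "a::b" has an empty middle field.
fs_colon=$(printf '%s\n' 'a::b' | awk -F: '{ print NF }')
# Case 3: a longer FS is an ERE; each run of digits delimits fields here.
fs_ere=$(printf '%s\n' 'a1b22c333d' | awk -F'[0-9]+' '{ print NF, $3 }')
printf '%s\n' "$fs_space" "$fs_colon" "$fs_ere"
```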

       Except  for  the  '~'  and "!~" operators, and in the gsub, match, split, and sub built-in functions, ERE
       matching shall be based on input records; that is, record separator characters (the  first  character  of
       the  value  of  the  variable  RS,  <newline>  by  default)  cannot be embedded in the expression, and no
       expression shall match the record  separator  character.  If  the  record  separator  is  not  <newline>,
       <newline>s  embedded  in the expression can be matched. For the '~' and "!~" operators, and in those four
       built-in functions, ERE matching shall be based on  text  strings;  that  is,  any  character  (including
       <newline>  and  the  record  separator)  can be embedded in the pattern, and an appropriate pattern shall
       match any character. However, in all awk ERE matching, the use of one  or  more  NUL  characters  in  the
       pattern, input record, or text string produces undefined results.

   Patterns
       A  pattern  is any valid expression, a range specified by two expressions separated by a comma, or one of
       the two special patterns BEGIN or END.

   Special Patterns
       The awk utility shall recognize two special patterns, BEGIN and END. Each BEGIN pattern shall be matched
       once and its associated action executed before the first record of input is read (except possibly by use
       of the getline function, see Input/Output and General Functions, in a prior BEGIN action) and before
       command line assignment is done. Each END pattern shall be matched once and its associated action
       executed after the last record of input has been read. These two patterns shall have associated actions.

       BEGIN and END shall not combine with other patterns. Multiple BEGIN and END patterns  shall  be  allowed.
       The  actions  associated with the BEGIN patterns shall be executed in the order specified in the program,
       as are the END actions. An END pattern can precede a BEGIN pattern in a program.

       If an awk program consists of only actions with the pattern BEGIN,  and  the  BEGIN  action  contains  no
       getline  function,  awk  shall  exit  without reading its input when the last statement in the last BEGIN
       action is executed. If an awk program consists of only actions with the pattern END or only actions  with
       the  patterns  BEGIN  and  END,  the  input  shall  be  read before the statements in the END actions are
       executed.
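
       The ordering rules for the special patterns can be sketched as follows, assuming a POSIX shell with awk
       on PATH (the shell variable names are illustrative):

```shell
# A BEGIN-only program with no getline exits without reading its input.
begin_only=$(awk 'BEGIN { print "no input read" }')
# An END pattern may precede BEGIN in the source; BEGIN still runs first and END runs after all input.
counted=$(printf 'a\nb\nc\n' | awk '
    END   { print NR }
    BEGIN { print "start" }')
printf '%s\n' "$begin_only" "$counted"
```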

   Expression Patterns
       An expression pattern shall be evaluated as if it were an expression in a Boolean context. If the  result
       is  true, the pattern shall be considered to match, and the associated action (if any) shall be executed.
       If the result is false, the action shall not be executed.
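
       A brief sketch of expression patterns, assuming a POSIX shell with awk on PATH; the input data and the
       variable name expr_demo are made up for the example.

```shell
expr_demo=$(printf '1 5\n2 50\n3 0\n4 7\n' | awk '
    $2 > 6  { print "big:", $1 }      # comparison used as a pattern
    $2      { n++ }                   # true when the numeric string in $2 is non-zero
    END     { print "nonzero:", n }')
printf '%s\n' "$expr_demo"
```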

   Pattern Ranges
       A pattern range consists of two expressions separated by a comma; in  this  case,  the  action  shall  be
       performed  for  all records between a match of the first expression and the following match of the second
       expression, inclusive. At this point, the pattern  range  can  be  repeated  starting  at  input  records
       subsequent to the end of the matched range.
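
       The following sketch shows a range that matches twice in one input, assuming a POSIX shell with awk on
       PATH; the marker strings START and STOP are arbitrary.

```shell
# Each range is inclusive of the records that match the first and second expressions.
range_demo=$(printf 'a\nSTART\nb\nSTOP\nc\nSTART\nd\nSTOP\n' |
    awk '/START/,/STOP/ { print NR ": " $0 }')
printf '%s\n' "$range_demo"
```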

   Actions
       An action is a sequence of statements as shown in the grammar in Grammar. Any single statement can be
       replaced by a statement list enclosed in braces. The  application  shall  ensure  that  statements  in  a
       statement  list  are  separated  by  <newline>s  or  semicolons.  Statements in a statement list shall be
       executed sequentially in the order that they appear.

       The expression acting as the conditional in an if statement shall be evaluated and if it is  non-zero  or
       non-null,  the  following  statement  shall  be  executed;  otherwise,  if else is present, the statement
       following the else shall be executed.

       The if, while, do...while, for, break, and continue statements are based on the ISO C standard (see
       Concepts Derived from the ISO C Standard), except that the Boolean expressions shall be treated as
       described in Expressions in awk, and except in the case of:

              for (variable in array)

       which shall iterate, assigning each index of array to variable in an unspecified order.  The  results  of
       adding  new  elements  to  array  within  such a for loop are undefined. If a break or continue statement
       occurs outside of a loop, the behavior is undefined.

       The delete statement shall remove an individual array element.   Thus,  the  following  code  deletes  an
       entire array:

              for (i in array)
                  delete array[i]
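
       A runnable sketch of iterating over and then emptying an array, assuming a POSIX shell with awk on PATH.
       The names seen, key, count, and left are illustrative; a loop variable must not reuse the name of a
       built-in function such as index.

```shell
array_demo=$(awk 'BEGIN {
    seen["x"] = 1; seen["y"] = 2; seen["z"] = 3
    for (key in seen) count++            # visits each index once, in an unspecified order
    for (key in seen) delete seen[key]   # removes the elements one at a time
    for (key in seen) left++             # nothing remains to iterate over
    print count, left + 0
}')
printf '%s\n' "$array_demo"
```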

       The  next  statement  shall cause all further processing of the current input record to be abandoned. The
       behavior is undefined if a next statement appears or is invoked in a BEGIN or END action.

       The exit statement shall invoke all END actions in the order in which they occur in  the  program  source
       and  then  terminate  the  program  without reading further input. An exit statement inside an END action
       shall terminate the program without further execution of END actions. If an expression is specified in an
       exit  statement,  its  numeric  value  shall  be  the  exit  status  of awk, unless subsequent errors are
       encountered or a subsequent exit statement with an expression is executed.
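
       The interaction between exit and the END actions can be sketched as follows, assuming a POSIX shell with
       awk on PATH (the status value 7 and the variable names are arbitrary):

```shell
exit_out=$(printf '1\n2\n3\n' | awk '
    $1 == 2 { exit 7 }             # stops reading input; END actions still run
            { print "saw", $1 }
    END     { print "end reached" }')
exit_status=$?                      # exit status of awk, taken from the exit expression
printf '%s\n' "$exit_out" "$exit_status"
```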

   Output Statements
       Both print and printf statements shall write to standard output by default. The output shall  be  written
       to the location specified by output_redirection if one is supplied, as follows:

              > expression
              >> expression
              | expression

       In all cases, the expression shall be evaluated to produce a string that is used as a pathname into which
       to write (for '>' or ">>" ) or as a command to be executed (for '|' ). Using the first two forms, if  the
       file of that name is not currently open, it shall be opened, creating it if necessary and using the first
       form, truncating the file. The output then shall be appended to the file. As long  as  the  file  remains
       open,  subsequent calls in which expression evaluates to the same string value shall simply append output
       to the file. The file remains open until the close function (see Input/Output and General Functions) is
       called with an expression that evaluates to the same string value.

       The  third  form  shall  write  output onto a stream piped to the input of a command. The stream shall be
       created if no stream is currently open with the value of expression as  its  command  name.   The  stream
       created  shall  be  equivalent  to  one  created  by a call to the popen() function defined in the System
       Interfaces volume of IEEE Std 1003.1-2001 with the value of expression as  the  command  argument  and  a
       value of w as the mode argument. As long as the stream remains open, subsequent calls in which expression
       evaluates to the same string value shall write output to the existing stream.  The  stream  shall  remain
       open until the close function (see Input/Output and General Functions) is called with an expression that
       evaluates to the same string value.  At that time, the stream shall be closed as if  by  a  call  to  the
       pclose() function defined in the System Interfaces volume of IEEE Std 1003.1-2001.
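
       The three redirection forms can be sketched as follows. This assumes a POSIX shell with awk, sort, and
       mktemp available; the file names under the temporary directory and the variable names are hypothetical.

```shell
tmpdir=$(mktemp -d)
awk -v out="$tmpdir/log.txt" 'BEGIN {
    print "first"  > out      # opens the file, truncating it
    print "second" > out      # same string value: written after "first" in the open file
    close(out)
    print "third"  >> out     # reopened without truncation
    close(out)
    print "b" | "sort"        # stream piped to a command
    print "a" | "sort"
    close("sort")             # closing the stream lets sort finish and emit its output
}' > "$tmpdir/sorted.txt"
file_out=$(cat "$tmpdir/log.txt")
pipe_out=$(cat "$tmpdir/sorted.txt")
rm -rf "$tmpdir"
printf '%s\n' "$file_out" "$pipe_out"
```

       Storing the pathname in a variable keeps the string value identical across the print and close calls,
       which is what ties them to the same open file.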

       As described in detail by the grammar in Grammar, these output statements shall take a comma-separated
       list of expressions referred to in the grammar by the non-terminal symbols expr_list, print_expr_list, or
       print_expr_list_opt. This list is referred to here as the expression list, and each member is referred to
       as an expression argument.

       The print statement shall write the value of each expression argument onto the  indicated  output  stream
       separated  by  the  current output field separator (see variable OFS above), and terminated by the output
       record separator (see variable ORS above). All expression arguments shall  be  taken  as  strings,  being
       converted if necessary; this conversion shall be as described in Expressions in awk, with the exception
       that the printf format in OFMT shall be used instead of the value in CONVFMT. An  empty  expression  list
       shall stand for the whole input record ($0).
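
       A short sketch of how OFS, ORS, and OFMT affect print, assuming a POSIX shell with awk on PATH (the
       separator values are arbitrary):

```shell
print_demo=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    print "a", "b", "c"        # arguments joined by OFS and terminated by ORS
    OFMT = "%.2f"
    print 3.14159, 17          # the non-integer value is converted with OFMT; 17 stays an integer
}')
printf '%s\n' "$print_demo"
```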

       The printf statement shall produce output based on a notation similar to the File Format Notation used to
       describe file formats in this  volume  of  IEEE Std 1003.1-2001  (see  the  Base  Definitions  volume  of
       IEEE Std 1003.1-2001,  Chapter  5, File Format Notation).  Output shall be produced as specified with the
       first expression argument as the string format and subsequent expression arguments as the strings arg1 to
       argn, inclusive, with the following exceptions:

        1. The  format shall be an actual character string rather than a graphical representation. Therefore, it
           cannot contain empty character positions. The <space> in the format string, in any context other than
           a flag of a conversion specification, shall be treated as an ordinary character that is copied to the
           output.

        2. If the character set contains a 'Δ' (delta) character and that character appears in the format
           string, it shall be treated as an ordinary character that is copied to the output.

        3. The  escape  sequences beginning with a backslash character shall be treated as sequences of ordinary
           characters that are copied to the output.  Note  that  these  same  sequences  shall  be  interpreted
           lexically  by awk when they appear in literal strings, but they shall not be treated specially by the
           printf statement.

        4. A field width or precision can be specified as the '*' character instead of a digit string.  In  this
           case  the  next argument from the expression list shall be fetched and its numeric value taken as the
           field width or precision.

        5. The implementation shall not precede or follow output from the d or u conversion specifier characters
           with <blank>s not specified by the format string.

        6. The  implementation  shall  not precede output from the o conversion specifier character with leading
           zeros not specified by the format string.

        7. For the c conversion specifier character: if the argument has a numeric value,  the  character  whose
           encoding  is that value shall be output. If the value is zero or is not the encoding of any character
           in the character set, the behavior is undefined. If the argument does not have a numeric  value,  the
           first  character  of the string value shall be output; if the string does not contain any characters,
           the behavior is undefined.

        8. For each conversion specification that consumes an argument, the next expression  argument  shall  be
           evaluated.  With  the exception of the c conversion specifier character, the value shall be converted
           (according to the rules specified in Expressions in awk) to the appropriate type for the conversion
           specification.

        9. If  there  are  insufficient expression arguments to satisfy all the conversion specifications in the
           format string, the behavior is undefined.

       10. If any character sequence in the format string begins with a '%' character, but does not form a valid
           conversion specification, the behavior is unspecified.
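
       Some of the printf rules above can be sketched as follows, assuming a POSIX shell with awk on PATH and
       an ASCII-compatible character set (so that the value 65 encodes the letter A); the variable name is
       illustrative.

```shell
printf_demo=$(awk 'BEGIN {
    printf "%5.2f|%-4s|%03d\n", 3.14159, "ab", 7
    printf "%c%c\n", 65, "xyz"     # numeric argument: character with that encoding; string: first character
    printf "%d%%\n", "42abc"       # the string argument is converted to a number for %d
}')
printf '%s\n' "$printf_demo"
```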

       Both print and printf can output at least {LINE_MAX} bytes.

   Functions
       The awk language has a variety of built-in functions: arithmetic, string, input/output, and general.

   Arithmetic Functions
       The arithmetic functions, except for int, shall be based on the ISO C standard (see Concepts Derived from
       the ISO C Standard). The behavior is undefined in cases where the ISO C standard specifies that an error
       be returned or that the behavior is undefined. Although the grammar (see Grammar) permits built-in
       functions to appear with no arguments or parentheses, unless the argument or parentheses are indicated as
       optional in the following list (by displaying them within the "[]" brackets), such use is undefined.

       atan2(y,x)
              Return arctangent of y/x in radians in the range [-pi,pi].

       cos(x) Return cosine of x, where x is in radians.

       sin(x) Return sine of x, where x is in radians.

       exp(x) Return the exponential function of x.

       log(x) Return the natural logarithm of x.

       sqrt(x)
              Return the square root of x.

       int(x) Return the argument truncated to an integer. Truncation shall be toward 0 when x>0.

       rand() Return a random number n, such that 0<=n<1.

       srand([expr])
              Set  the  seed value for rand to expr or use the time of day if expr is omitted. The previous seed
              value shall be returned.
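
       A sketch exercising several arithmetic functions, assuming a POSIX shell with awk on PATH and the
       default OFMT; the seed value 42 and the variable names are arbitrary.

```shell
math_demo=$(awk 'BEGIN {
    print int(3.9), int(7 / 2)          # truncation toward zero for positive values
    print sqrt(16), exp(0), log(1)
    print atan2(0, -1)                  # pi, shown with the default output format
    srand(42); r = rand()
    print (r >= 0 && r < 1)             # rand() returns n such that 0 <= n < 1
}')
printf '%s\n' "$math_demo"
```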

   String Functions
       The string functions in the following list shall be supported. Although the grammar (see Grammar)
       permits built-in functions to appear with no arguments or parentheses, unless the argument or parentheses
       are indicated as optional in the following list (by displaying them within the "[]" brackets),  such  use
       is undefined.

       gsub(ere, repl[, in])
              Behave  like  sub  (see  below),  except  that  it  shall  replace  all occurrences of the regular
              expression (like the ed utility global substitute) in $0 or in the in argument, when specified.

       index(s, t)
              Return the position, in characters, numbering from 1, in string s where string t first occurs,  or
              zero if it does not occur at all.

       length[([s])]
              Return  the  length, in characters, of its argument taken as a string, or of the whole record, $0,
              if there is no argument.

       match(s, ere)
              Return the position, in characters, numbering from 1, in  string  s  where  the  extended  regular
              expression  ere  occurs,  or zero if it does not occur at all. RSTART shall be set to the starting
              position (which is the same as the returned value), zero if no match is found;  RLENGTH  shall  be
              set to the length of the matched string, -1 if no match is found.

       split(s, a[, fs])
              Split the string s into array elements a[1], a[2], ..., a[n], and return n. All elements of the
              array shall be deleted before the split is performed. The separation shall be done with the ERE fs
              or with the field separator FS if fs is not given. Each array element shall have a string value
              when created and, if appropriate, the array element shall be considered a numeric string (see
              Expressions in awk). The effect of a null string as the value of fs is unspecified.

       sprintf(fmt, expr, expr, ...)
              Format  the  expressions  according  to  the  printf  format given by fmt and return the resulting
              string.

       sub(ere, repl[, in])
              Substitute the string repl in place of the first instance of the extended regular expression ere
              in string in and return the number of substitutions. An ampersand ('&') appearing in the string
              repl shall be replaced by the string from in that matches the ERE. An ampersand preceded with a
              backslash ('\') shall be interpreted as the literal ampersand character. An occurrence of two
              consecutive backslashes shall be interpreted as just a single literal backslash character. Any
              other occurrence of a backslash (for example, preceding any other character) shall be treated as a
              literal backslash character. Note that if repl is a string literal (the lexical token STRING; see
              Grammar), the handling of the ampersand character occurs after any lexical processing, including
              any lexical backslash escape sequence processing. If in is specified and it is not an lvalue (see
              Expressions in awk), the behavior is undefined. If in is omitted, awk shall use the current
              record ($0) in its place.

       substr(s, m[, n])
              Return the at most n-character substring of s that begins at position m, numbering from 1. If n is
              omitted,  or  if  n  specifies  more  characters  than  are  left in the string, the length of the
              substring shall be limited by the length of the string s.

       tolower(s)
              Return a string based on the string s. Each character in s that is an uppercase  letter  specified
              to  have a tolower mapping by the LC_CTYPE category of the current locale shall be replaced in the
              returned string by the lowercase letter specified by the mapping. Other characters in s  shall  be
              unchanged in the returned string.

       toupper(s)
              Return a string based on the string s. Each character in s that is a lowercase letter specified to
              have a toupper mapping by the LC_CTYPE category of the current locale is replaced in the  returned
              string  by  the  uppercase letter specified by the mapping. Other characters in s are unchanged in
              the returned string.

       All of the preceding functions that take ERE  as  a  parameter  expect  a  pattern  or  a  string  valued
       expression that is a regular expression as defined in Regular Expressions.
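
       The string functions can be exercised together as in the following sketch, which assumes a POSIX shell
       with awk on PATH and single-byte (ASCII) sample text; the variable names s, t, parts, and changed are
       illustrative only.

```shell
string_demo=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor"), length(s), substr(s, 7), substr(s, 1, 4)
    print match(s, /o+/), RSTART, RLENGTH
    n = split("a:b:c", parts, ":"); print n, parts[1], parts[3]
    t = s; changed = gsub(/o/, "0", t); print changed, t
    t = s; sub(/world/, "[&]", t); print t        # & stands for the matched text
    t = s; sub(/world/, "\\&", t); print t        # "\\&" reaches sub as \& and yields a literal ampersand
    print toupper(s), tolower("ABC"), sprintf("%03d", 7)
}')
printf '%s\n' "$string_demo"
```

       Copying s into t before each substitution keeps the original string intact, since sub and gsub modify
       their third argument in place.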

   Input/Output and General Functions
       The input/output and general functions are:

       close(expression)
              Close  the  file  or pipe opened by a print or printf statement or a call to getline with the same
              string-valued expression. The limit on the number of open expression arguments is  implementation-
              defined.  If  the close was successful, the function shall return zero; otherwise, it shall return
              non-zero.

       expression | getline [var]
              Read a record of input from a stream piped from the output of a  command.   The  stream  shall  be
              created  if  no  stream  is  currently  open with the value of expression as its command name. The
              stream created shall be equivalent to one created by a call to the popen() function with the value
              of expression as the command argument and a value of r as the mode argument. As long as the stream
              remains open, subsequent calls in which expression evaluates to the same string value  shall  read
              subsequent  records  from  the  stream.  The  stream shall remain open until the close function is
              called with an expression that evaluates to the same string value. At that time, the stream  shall
              be  closed  as  if  by a call to the pclose() function. If var is omitted, $0 and NF shall be set;
              otherwise, var shall be set and, if appropriate, it shall be  considered  a  numeric  string  (see
              Expressions in awk).

       The  getline  operator  can form ambiguous constructs when there are unparenthesized operators (including
       concatenate) to the left of the '|' (to the beginning of  the  expression  containing  getline).  In  the
       context  of  the  '$' operator, '|' shall behave as if it had a lower precedence than '$' . The result of
       evaluating other operators is unspecified, and conforming applications shall  parenthesize  properly  all
       such usages.

       getline
              Set  $0  to  the next input record from the current input file. This form of getline shall set the
              NF, NR, and FNR variables.

       getline var
              Set variable var to the next input record from the current input file and, if appropriate, var
              shall be considered a numeric string (see Expressions in awk). This form of getline shall set the
              FNR and NR variables.

       getline [var] < expression
              Read the next record of input from a named file. The expression shall be evaluated  to  produce  a
              string  that  is  used  as a pathname. If the file of that name is not currently open, it shall be
              opened. As long as the stream remains open, subsequent calls in which expression evaluates to  the
              same  string  value  shall read subsequent records from the file. The file shall remain open until
              the close function is called with an expression that evaluates to the same string value. If var is
              omitted,  $0  and  NF  shall  be set; otherwise, var shall be set and, if appropriate, it shall be
              considered a numeric string (see Expressions in awk).

       The getline operator can form ambiguous  constructs  when  there  are  unparenthesized  binary  operators
       (including concatenate) to the right of the '<' (up to the end of the expression containing the getline).
       The result of evaluating such a construct is unspecified, and conforming applications shall  parenthesize
       properly all such usages.

       system(expression)
              Execute the command given by expression in a manner equivalent to the system() function defined in
              the System Interfaces volume of IEEE Std 1003.1-2001 and return the exit status of the command.

       All forms of getline shall return 1 for successful input, zero for end-of-file, and -1 for an error.

       Where strings are used as the name of a file or pipeline, the application shall ensure that  the  strings
       are  textually  identical.   The  terminology "same string value" implies that "equivalent strings", even
       those that differ only by <space>s, represent different files.
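
       The getline forms, close, and system can be sketched together as follows. This assumes a POSIX shell
       with awk, echo, and mktemp available; the temporary file, the command string, and the variable names
       are hypothetical.

```shell
tmpfile=$(mktemp)
printf 'one\ntwo\n' > "$tmpfile"
io_demo=$(awk -v path="$tmpfile" 'BEGIN {
    while ((getline line < path) > 0) nfile++     # parenthesized to avoid the ambiguity noted above
    close(path)
    cmd = "echo from-command"
    if ((cmd | getline reply) > 0) print reply    # read one record from the command output
    close(cmd)                                    # same string value as used to open the stream
    status = system("exit 3")                     # exit status of the command
    print nfile, status
}')
rm -f "$tmpfile"
printf '%s\n' "$io_demo"
```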

   User-Defined Functions
       The awk language also provides user-defined functions. Such functions can be defined as:

              function name([parameter, ...]) { statements }

       A function can be referred to anywhere in an  awk  program;  in  particular,  its  use  can  precede  its
       definition. The scope of a function is global.

       Function  parameters,  if present, can be either scalars or arrays; the behavior is undefined if an array
       name is passed as a parameter that the function uses as a scalar, or if a scalar expression is passed  as
       a  parameter  that  the function uses as an array. Function parameters shall be passed by value if scalar
       and by reference if array name.

       The number of parameters in the function definition need not  match  the  number  of  parameters  in  the
       function  call.  Excess formal parameters can be used as local variables. If fewer arguments are supplied
       in a function call than are in the function definition,  the  extra  parameters  that  are  used  in  the
       function  body as scalars shall evaluate to the uninitialized value until they are otherwise initialized,
       and the extra parameters that are used in the function body as arrays shall be treated  as  uninitialized
       arrays where each element evaluates to the uninitialized value until otherwise initialized.

       When invoking a function, no white space can be placed between the function name and the opening
       parenthesis. Function calls can be nested, and functions can be called recursively. Upon return
       from any nested or recursive function call, the values of all of the calling function's parameters shall
       be unchanged, except for array parameters passed by reference. The return statement can be used to return
       a value. If a return statement appears outside of a function definition, the behavior is undefined.

       In  the  function definition, <newline>s shall be optional before the opening brace and after the closing
       brace. Function definitions can appear anywhere in the program where a pattern-action pair is allowed.
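
       The parameter-passing rules can be sketched as follows, assuming a POSIX shell with awk on PATH. The
       function names sum, fill, bump, and fact are invented for this example.

```shell
func_demo=$(awk '
    # "i" and "total" are excess formal parameters used as local variables.
    function sum(values, n,    i, total) {
        for (i = 1; i <= n; i++) total += values[i]
        return total
    }
    function fill(values) { values[1] = 10; values[2] = 20 }     # arrays are passed by reference
    function bump(x) { x++; return x }                            # scalars are passed by value
    function fact(k) { return (k <= 1) ? 1 : k * fact(k - 1) }    # recursive call
    BEGIN {
        fill(data); v = 5; w = bump(v)
        print sum(data, 2), v, w, fact(5)
    }')
printf '%s\n' "$func_demo"
```

       Separating the excess parameters from the real ones with extra spaces is only a readability convention;
       the language does not require it.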

   Grammar
       The grammar in this section and the lexical conventions in the following section shall together  describe
       the  syntax  for awk programs. The general conventions for this style of grammar are described in Grammar
       Conventions. A valid program can be represented as the non-terminal symbol program in the grammar.  This
       formal syntax shall take precedence over the preceding text syntax description.

              %token NAME NUMBER STRING ERE
              %token FUNC_NAME   /* Name followed by '(' without white space. */

              /* Keywords  */
              %token       Begin   End
              /*          'BEGIN' 'END'                            */

              %token       Break   Continue   Delete   Do   Else
              /*          'break' 'continue' 'delete' 'do' 'else'  */

              %token       Exit   For   Function   If   In
              /*          'exit' 'for' 'function' 'if' 'in'        */

              %token       Next   Print   Printf   Return   While
              /*          'next' 'print' 'printf' 'return' 'while' */

              /* Reserved function names */
              %token BUILTIN_FUNC_NAME
                          /* One token for the following:
                           * atan2 cos sin exp log sqrt int rand srand
                           * gsub index length match split sprintf sub
                           * substr tolower toupper close system
                           */
              %token GETLINE
                          /* Syntactically different from other built-ins. */

              /* Two-character tokens. */
              %token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
              /*     '+='       '-='       '*='       '/='       '%='       '^=' */

              %token OR   AND  NO_MATCH   EQ   LE   GE   NE   INCR  DECR  APPEND
              /*     '||' '&&' '!~'       '==' '<=' '>=' '!=' '++'  '--'  '>>'   */

              /* One-character tokens. */
              %token '{' '}' '(' ')' '[' ']' ',' ';' NEWLINE
              %token '+' '-' '*' '%' '^' '!' '>' '<' '|' '?' ':' '~' '$' '='

              %start program
              %%

              program          : item_list
                               | actionless_item_list
                               ;

              item_list        : newline_opt
                               | actionless_item_list item terminator
                               | item_list            item terminator
                               | item_list          action terminator
                               ;

              actionless_item_list : item_list            pattern terminator
                               | actionless_item_list pattern terminator
                               ;

              item             : pattern action
                               | Function NAME      '(' param_list_opt ')'
                                     newline_opt action
                               | Function FUNC_NAME '(' param_list_opt ')'
                                     newline_opt action
                               ;

              param_list_opt   : /* empty */
                               | param_list
                               ;

              param_list       : NAME
                               | param_list ',' NAME
                               ;

              pattern          : Begin
                               | End
                               | expr
                               | expr ',' newline_opt expr
                               ;

              action           : '{' newline_opt                             '}'
                               | '{' newline_opt terminated_statement_list   '}'
                               | '{' newline_opt unterminated_statement_list '}'
                               ;

              terminator       : terminator ';'
                               | terminator NEWLINE
                               |            ';'
                               |            NEWLINE
                               ;

              terminated_statement_list : terminated_statement
                               | terminated_statement_list terminated_statement
                               ;

              unterminated_statement_list : unterminated_statement
                               | terminated_statement_list unterminated_statement
                               ;

              terminated_statement : action newline_opt
                               | If '(' expr ')' newline_opt terminated_statement
                               | If '(' expr ')' newline_opt terminated_statement
                                     Else newline_opt terminated_statement
                               | While '(' expr ')' newline_opt terminated_statement
                               | For '(' simple_statement_opt ';'
                                    expr_opt ';' simple_statement_opt ')' newline_opt
                                    terminated_statement
                               | For '(' NAME In NAME ')' newline_opt
                                    terminated_statement
                               | ';' newline_opt
                               | terminatable_statement NEWLINE newline_opt
                               | terminatable_statement ';'     newline_opt
                               ;

              unterminated_statement : terminatable_statement
                               | If '(' expr ')' newline_opt unterminated_statement
                               | If '(' expr ')' newline_opt terminated_statement
                                    Else newline_opt unterminated_statement
                               | While '(' expr ')' newline_opt unterminated_statement
                               | For '(' simple_statement_opt ';'
                                    expr_opt ';' simple_statement_opt ')' newline_opt
                                    unterminated_statement
                               | For '(' NAME In NAME ')' newline_opt
                                    unterminated_statement
                               ;

              terminatable_statement : simple_statement
                               | Break
                               | Continue
                               | Next
                               | Exit expr_opt
                               | Return expr_opt
                               | Do newline_opt terminated_statement While '(' expr ')'
                               ;

              simple_statement_opt : /* empty */
                               | simple_statement
                               ;

              simple_statement : Delete NAME '[' expr_list ']'
                               | expr
                               | print_statement
                               ;

              print_statement  : simple_print_statement
                               | simple_print_statement output_redirection
                               ;

              simple_print_statement : Print  print_expr_list_opt
                               | Print  '(' multiple_expr_list ')'
                               | Printf print_expr_list
                               | Printf '(' multiple_expr_list ')'
                               ;

              output_redirection : '>'    expr
                               | APPEND expr
                               | '|'    expr
                               ;

              expr_list_opt    : /* empty */
                               | expr_list
                               ;

              expr_list        : expr
                               | multiple_expr_list
                               ;

              multiple_expr_list : expr ',' newline_opt expr
                               | multiple_expr_list ',' newline_opt expr
                               ;

              expr_opt         : /* empty */
                               | expr
                               ;

              expr             : unary_expr
                               | non_unary_expr
                               ;

              unary_expr       : '+' expr
                               | '-' expr
                               | unary_expr '^'      expr
                               | unary_expr '*'      expr
                               | unary_expr '/'      expr
                               | unary_expr '%'      expr
                               | unary_expr '+'      expr
                               | unary_expr '-'      expr
                               | unary_expr          non_unary_expr
                               | unary_expr '<'      expr
                               | unary_expr LE       expr
                               | unary_expr NE       expr
                               | unary_expr EQ       expr
                               | unary_expr '>'      expr
                               | unary_expr GE       expr
                               | unary_expr '~'      expr
                               | unary_expr NO_MATCH expr
                               | unary_expr In NAME
                               | unary_expr AND newline_opt expr
                               | unary_expr OR  newline_opt expr
                               | unary_expr '?' expr ':' expr
                               | unary_input_function
                               ;

              non_unary_expr   : '(' expr ')'
                               | '!' expr
                               | non_unary_expr '^'      expr
                               | non_unary_expr '*'      expr
                               | non_unary_expr '/'      expr
                               | non_unary_expr '%'      expr
                               | non_unary_expr '+'      expr
                               | non_unary_expr '-'      expr
                               | non_unary_expr          non_unary_expr
                               | non_unary_expr '<'      expr
                               | non_unary_expr LE       expr
                               | non_unary_expr NE       expr
                               | non_unary_expr EQ       expr
                               | non_unary_expr '>'      expr
                               | non_unary_expr GE       expr
                               | non_unary_expr '~'      expr
                               | non_unary_expr NO_MATCH expr
                               | non_unary_expr In NAME
                               | '(' multiple_expr_list ')' In NAME
                               | non_unary_expr AND newline_opt expr
                               | non_unary_expr OR  newline_opt expr
                               | non_unary_expr '?' expr ':' expr
                               | NUMBER
                               | STRING
                               | lvalue
                               | ERE
                               | lvalue INCR
                               | lvalue DECR
                               | INCR lvalue
                               | DECR lvalue
                               | lvalue POW_ASSIGN expr
                               | lvalue MOD_ASSIGN expr
                               | lvalue MUL_ASSIGN expr
                               | lvalue DIV_ASSIGN expr
                               | lvalue ADD_ASSIGN expr
                               | lvalue SUB_ASSIGN expr
                               | lvalue '=' expr
                               | FUNC_NAME '(' expr_list_opt ')'
                                    /* no white space allowed before '(' */
                               | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
                               | BUILTIN_FUNC_NAME
                               | non_unary_input_function
                               ;

              print_expr_list_opt : /* empty */
                               | print_expr_list
                               ;

              print_expr_list  : print_expr
                               | print_expr_list ',' newline_opt print_expr
                               ;

              print_expr       : unary_print_expr
                               | non_unary_print_expr
                               ;

              unary_print_expr : '+' print_expr
                               | '-' print_expr
                               | unary_print_expr '^'      print_expr
                               | unary_print_expr '*'      print_expr
                               | unary_print_expr '/'      print_expr
                               | unary_print_expr '%'      print_expr
                               | unary_print_expr '+'      print_expr
                               | unary_print_expr '-'      print_expr
                               | unary_print_expr          non_unary_print_expr
                               | unary_print_expr '~'      print_expr
                               | unary_print_expr NO_MATCH print_expr
                               | unary_print_expr In NAME
                               | unary_print_expr AND newline_opt print_expr
                               | unary_print_expr OR  newline_opt print_expr
                               | unary_print_expr '?' print_expr ':' print_expr
                               ;

              non_unary_print_expr : '(' expr ')'
                               | '!' print_expr
                               | non_unary_print_expr '^'      print_expr
                               | non_unary_print_expr '*'      print_expr
                               | non_unary_print_expr '/'      print_expr
                               | non_unary_print_expr '%'      print_expr
                               | non_unary_print_expr '+'      print_expr
                               | non_unary_print_expr '-'      print_expr
                               | non_unary_print_expr          non_unary_print_expr
                               | non_unary_print_expr '~'      print_expr
                               | non_unary_print_expr NO_MATCH print_expr
                               | non_unary_print_expr In NAME
                               | '(' multiple_expr_list ')' In NAME
                               | non_unary_print_expr AND newline_opt print_expr
                               | non_unary_print_expr OR  newline_opt print_expr
                               | non_unary_print_expr '?' print_expr ':' print_expr
                               | NUMBER
                               | STRING
                               | lvalue
                               | ERE
                               | lvalue INCR
                               | lvalue DECR
                               | INCR lvalue
                               | DECR lvalue
                               | lvalue POW_ASSIGN print_expr
                               | lvalue MOD_ASSIGN print_expr
                               | lvalue MUL_ASSIGN print_expr
                               | lvalue DIV_ASSIGN print_expr
                               | lvalue ADD_ASSIGN print_expr
                               | lvalue SUB_ASSIGN print_expr
                               | lvalue '=' print_expr
                               | FUNC_NAME '(' expr_list_opt ')'
                                   /* no white space allowed before '(' */
                               | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
                               | BUILTIN_FUNC_NAME
                               ;

              lvalue           : NAME
                               | NAME '[' expr_list ']'
                               | '$' expr
                               ;

              non_unary_input_function : simple_get
                               | simple_get '<' expr
                               | non_unary_expr '|' simple_get
                               ;

              unary_input_function : unary_expr '|' simple_get
                               ;

              simple_get       : GETLINE
                               | GETLINE lvalue
                               ;

              newline_opt      : /* empty */
                               | newline_opt NEWLINE
                               ;

       This grammar has several ambiguities that shall be resolved as follows:

        * Operator precedence and associativity shall be as described in Expressions in Decreasing Precedence in
          awk.

        * In case of ambiguity, an else shall be associated with the most immediately preceding  if  that  would
          satisfy the grammar.

        * In some contexts, a slash ( '/' ) that is used to surround an ERE could also be the division operator.
          This shall be resolved in such a way that wherever the division operator  could  appear,  a  slash  is
          assumed to be the division operator. (There is no unary division operator.)
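
       As an informal illustration of the else rule (a sketch, not part of the standard text), the commands
       below nest two if statements without braces. The awk variables a and b and the shell variable names are
       arbitrary choices for the demonstration. Because the else pairs with the inner if, nothing is written
       when a is false:

```shell
# The else binds to the nearest preceding if, that is, to "if (b)".
inner_false=$(awk 'BEGIN { a = 1; b = 0; if (a) if (b) print "b true"; else print "b false" }')

# With a false, the whole inner if/else is skipped, so there is no output.
outer_false=$(awk 'BEGIN { a = 0; b = 0; if (a) if (b) print "b true"; else print "b false" }')

printf 'a=1 b=0: [%s]\n' "$inner_false"
printf 'a=0 b=0: [%s]\n' "$outer_false"
```

       Braces around the inner statement are the usual way to make the intended pairing explicit.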

       One  convention  that  might  not  be obvious from the formal grammar is where <newline>s are acceptable.
       There are several obvious placements such as terminating a statement, and a  backslash  can  be  used  to
       escape  <newline>s  between  any lexical tokens. In addition, <newline>s without backslashes can follow a
       comma, an open brace, logical AND operator ( "&&" ), logical OR operator ( "||" ), the  do  keyword,  the
       else keyword, and the closing parenthesis of an if, for, or while statement. For example:

              { print  $1,
                       $2 }
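
       A slightly longer sketch of the same convention follows; the two-line sample input is invented for the
       demonstration. Unescaped <newline>s appear after "&&", after the closing parenthesis of the if, and
       after else, while the <newline> in the middle of the addition has to be escaped with a backslash:

```shell
# Newlines after "&&", after the ")" of if, and after else need no backslash.
# The newline inside "$1 + $2" is between ordinary tokens, so it is escaped.
out=$(printf '1 2\n0 5\n' | awk '{
    if ($1 > 0 &&
        $2 > 0)
        print "sum", $1 + \
                     $2
    else
        print "skipped"
}')
printf '%s\n' "$out"
```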

   Lexical Conventions
       The lexical conventions for awk programs, with respect to the preceding grammar, shall be as follows:

        1. Except  as  noted,  awk  shall recognize the longest possible token or delimiter beginning at a given
           point.

        2. A comment shall consist of any characters beginning with the number sign character and terminated by,
           but  excluding  the next occurrence of, a <newline>. Comments shall have no effect, except to delimit
           lexical tokens.

        3. The <newline> shall be recognized as the token NEWLINE.

        4. A backslash character immediately followed by a <newline> shall have no effect.

        5. The token STRING shall represent a string constant. A string constant shall begin with the  character
           '"'.  Within a string constant, a backslash character shall be considered to begin an escape sequence
           as specified in the table in the Base Definitions volume of  IEEE Std 1003.1-2001,  Chapter  5,  File
           Format  Notation  (  '\\' , '\a' , '\b' , '\f' , '\n' , '\r' , '\t' , '\v' ). In addition, the escape
           sequences in Escape Sequences in awk shall be recognized. A <newline> shall not
           occur  within  a  string  constant.  A  string  constant  shall  be terminated by the first unescaped
           occurrence of the character '"' after the one that begins the string constant. The value of the string
           shall  be  the  sequence  of all unescaped characters and values of escape sequences between, but not
           including, the two delimiting '"' characters.

        6. The token ERE represents an extended regular expression constant.  An ERE constant shall  begin  with
           the  slash  character.  Within an ERE constant, a backslash character shall be considered to begin an
           escape sequence as specified in the table in the Base  Definitions  volume  of  IEEE Std 1003.1-2001,
           Chapter 5, File Format Notation. In addition, the escape sequences in Escape Sequences in  awk  shall
           be recognized. The application shall ensure that a <newline> does  not  occur
           within  an ERE constant. An ERE constant shall be terminated by the first unescaped occurrence of the
           slash character after the  one  that  begins  the  ERE  constant.  The  extended  regular  expression
           represented  by  the  ERE  constant  shall  be the sequence of all unescaped characters and values of
           escape sequences between, but not including, the two delimiting slash characters.

        7. A <blank> shall have no effect, except to delimit lexical tokens or within STRING or ERE tokens.

        8. The token NUMBER shall represent a numeric constant. Its form and numeric value shall  be  equivalent
           to  either  of  the  tokens floating-constant or integer-constant as specified by the ISO C standard,
           with the following exceptions:

            a. An integer constant cannot begin with 0x or include the hexadecimal digits 'a' , 'b' , 'c' ,  'd'
               , 'e' , 'f' , 'A' , 'B' , 'C' , 'D' , 'E' , or 'F' .

            b. The value of an integer constant beginning with 0 shall be taken in decimal rather than octal.

            c. An integer constant cannot include a suffix ( 'u' , 'U' , 'l' , or 'L' ).

            d. A floating constant cannot include a suffix ( 'f' , 'F' , 'l' , or 'L' ).

           If the value is too large or too small to be representable (see Concepts Derived from the  ISO C
           Standard), the behavior is undefined.

        9. A sequence of underscores, digits, and alphabetics from the portable  character  set  (see  the  Base
           Definitions  volume  of IEEE Std 1003.1-2001, Section 6.1, Portable Character Set), beginning with an
           underscore or alphabetic, shall be considered a word.

       10. The following words are keywords that shall be recognized as individual tokens; the name of the token
           is the same as the keyword:

               BEGIN           delete          END             function        in              printf
               break           do              exit            getline         next            return
               continue        else            for             if              print           while

       11. The  following  words  are  names  of  built-in  functions  and  shall  be  recognized  as  the token
           BUILTIN_FUNC_NAME:

               atan2           gsub            log             split           sub             toupper
               close           index           match           sprintf         substr
               cos             int             rand            sqrt            system
               exp             length          sin             srand           tolower

       The above-listed keywords and names of built-in functions are considered reserved words.

       12. The token NAME shall consist of a word that is not a keyword or a name of a built-in function and  is
           not followed immediately (without any delimiters) by the '(' character.

       13. The  token  FUNC_NAME shall consist of a word that is not a keyword or a name of a built-in function,
           followed immediately (without any delimiters) by the '(' character. The '(' character  shall  not  be
           included as part of the token.

       14. The following two-character sequences shall be recognized as the named tokens:

                                       Token Name   Sequence   Token Name   Sequence
                                       ADD_ASSIGN   +=         NO_MATCH     !~
                                       SUB_ASSIGN   -=         EQ           ==
                                       MUL_ASSIGN   *=         LE           <=
                                       DIV_ASSIGN   /=         GE           >=
                                       MOD_ASSIGN   %=         NE           !=
                                       POW_ASSIGN   ^=         INCR         ++
                                       OR           ||         DECR         --
                                       AND          &&         APPEND       >>

       15. The following single characters shall be recognized as tokens whose names are the character:

           <newline> { } ( ) [ ] , ; + - * % ^ ! > < | ? : ~ $ =

       There  is  a  lexical  ambiguity  between  the token ERE and the tokens '/' and DIV_ASSIGN. When an input
       sequence begins with a slash character in any syntactic context where the token '/' or  DIV_ASSIGN  could
       appear  as the next token in a valid program, the longer of those two tokens that can be recognized shall
       be recognized. In any other syntactic context where the token ERE could appear as the  next  token  in  a
       valid program, the token ERE shall be recognized.
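
       The following sketch exercises a few of these conventions together: comments, the FUNC_NAME rule, and
       the two readings of a slash. The function name twice and the sample values are hypothetical and chosen
       only for illustration:

```shell
out=$(awk '
# A comment runs from the number sign to the end of the line.
function twice(x) { return 2 * x }   # twice is a hypothetical helper
BEGIN {
    n = 8 / 2 / 2         # after an operand, each slash is the division operator
    print twice(n)        # FUNC_NAME: no blank between the name and the parenthesis
    print "a/b" ~ /\//    # after an operator, a slash begins an ERE token
}')
printf '%s\n' "$out"
```

       The first line written is the doubled quotient and the second is the result of the match, which is 1
       because the string contains a slash.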

EXIT STATUS

       The following exit values shall be returned:

        0     All input files were processed successfully.

       >0     An error occurred.

       The exit status can be altered within the program by using an exit expression.
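
       A minimal sketch of this, using arbitrary shell variable names: the first command ends normally and the
       second supplies its own status through an exit expression.

```shell
# No exit statement: the status is 0 when everything was processed successfully.
default_status=0
awk 'BEGIN { }' </dev/null || default_status=$?

# An exit expression becomes the exit status of awk.
custom_status=0
awk 'BEGIN { exit 3 }' </dev/null || custom_status=$?

printf 'default=%s custom=%s\n' "$default_status" "$custom_status"
```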

CONSEQUENCES OF ERRORS

       If  any  file  operand  is  specified and the named file cannot be accessed, awk shall write a diagnostic
       message to standard error and terminate without any further action.

       If the program specified by either the program operand or a progfile operand is not a valid  awk  program
       (as specified in the EXTENDED DESCRIPTION section), the behavior is undefined.

       The following sections are informative.

APPLICATION USAGE

       The index, length, match, and substr functions should not be confused with similar functions in the ISO C
       standard; the awk versions deal with characters, while the ISO C standard deals with bytes.

       Because the concatenation operation is represented  by  adjacent  expressions  rather  than  an  explicit
       operator, it is often necessary to use parentheses to enforce the proper evaluation precedence.
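
       As a small illustration (the values are arbitrary), addition binds more tightly than concatenation, so
       the two assignments below produce different results depending on where the parentheses are placed:

```shell
# Without parentheses the addition is done first: "x" concatenated with 3.
implicit=$(awk 'BEGIN { v = "x" 1 + 2; print v }')

# With parentheses the concatenation is done first: "x1" has numeric value 0,
# so the sum is 2.
forced=$(awk 'BEGIN { v = ("x" 1) + 2; print v }')

printf 'implicit=%s forced=%s\n' "$implicit" "$forced"
```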

EXAMPLES

       The awk program specified in the command line is most easily specified within single-quotes (for example,
       'program') for applications using sh, because awk programs commonly contain characters that are  special
       to  the  shell,  including  double-quotes.  In the cases where an awk program contains single-quote
       characters, it is usually easiest to specify most of the program as strings within single-quotes
       concatenated by the shell with quoted single-quote characters. For example:

              awk '/'\''/ { print "quote:", $0 }'

       prints all lines from the standard input containing a single-quote character, prefixed with quote:.

       The following are examples of simple awk programs:

        1. Write to the standard output all input lines for which field 3 is greater than 5:

           $3 > 5

        2. Write every tenth line:

           (NR % 10) == 0

        3. Write any line with a substring matching the regular expression:

           /(G|D)(2[0-9][[:alpha:]]*)/

        4. Print  any  line  with  a  substring  containing  a 'G' or 'D' , followed by a sequence of digits and
           characters.  This example uses character classes digit and alpha to match language-independent  digit
           and alphabetic characters respectively:

           /(G|D)([[:digit:][:alpha:]]*)/

        5. Write  any  line  in  which the second field matches the regular expression and the fourth field does
           not:

           $2 ~ /xyz/ && $4 !~ /xyz/

        6. Write any line in which the second field contains a backslash:

           $2 ~ /\\/

        7. Write any line in which the second field contains  a  backslash.  Note  that  backslash  escapes  are
           interpreted  twice;  once  in  lexical  processing  of  the string and once in processing the regular
           expression:

           $2 ~ "\\\\"

        8. Write the second to the last and the last field in each line. Separate the fields by a colon:

           {OFS=":";print $(NF-1), $NF}

        9. Write the line number and number of fields in each line. The  three  strings  representing  the  line
           number,  the  colon, and the number of fields are concatenated and that string is written to standard
           output:

           {print NR ":" NF}

       10. Write lines longer than 72 characters:

           length($0) > 72

       11. Write the first two fields in opposite order separated by OFS:

           { print $2, $1 }

       12. Same, with input fields separated by a comma or <space>s and <tab>s, or both:

           BEGIN { FS = ",[ \t]*|[ \t]+" }
                 { print $2, $1 }

       13. Add up the first column, print sum, and average:

                 {s += $1 }
           END   {print "sum is ", s, " average is", s/NR}

       14. Write fields in reverse order, one per line (many lines out for each line in):

           { for (i = NF; i > 0; --i) print $i }

       15. Write all lines between occurrences of the strings start and stop:

           /start/, /stop/

       16. Write all lines whose first field is different from the previous one:

           $1 != prev { print; prev = $1 }

       17. Simulate echo:

           BEGIN  {
                   for (i = 1; i < ARGC; ++i)
                       printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ")
           }

       18. Write the path prefixes contained in the PATH environment variable, one per line:

           BEGIN  {
                   n = split (ENVIRON["PATH"], path, ":")
                   for (i = 1; i <= n; ++i)
                       print path[i]
           }

       19. If there is a file named input containing page headers of the form:

           Page #

       and a file named program that contains:

              /Page/   { $2 = n++; }
                       { print }

       then the command line:

              awk -f program n=5 input

       prints the file input, filling in page numbers starting at 5.
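
       Example 19 can be tried end to end with the sketch below. The file contents are invented sample data,
       and the use of mktemp -d to obtain a scratch directory is an assumption about the host system rather
       than something this page specifies:

```shell
# Build the two files from example 19 in a scratch directory.
dir=$(mktemp -d)
printf 'Page #\nfirst body line\nPage #\n' > "$dir/input"
printf '/Page/   { $2 = n++; }\n         { print }\n' > "$dir/program"

# The operand n=5 is an assignment processed before the file input is read.
out=$(cd "$dir" && awk -f program n=5 input)
printf '%s\n' "$out"

rm -rf "$dir"
```

       Each header line has its second field replaced by the current value of n, which is then incremented.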

RATIONALE

       This description is based on the new awk, "nawk", (see the  referenced  The  AWK  Programming  Language),
       which introduced a number of new features to the historical awk:

        1. New keywords: delete, do, function, return

        2. New built-in functions: atan2, close, cos, gsub, match, rand, sin, srand, sub, system

        3. New predefined variables: FNR, ARGC, ARGV, RSTART, RLENGTH, SUBSEP

        4. New expression operators: ?, :, ,, ^

        5. The FS variable and the third argument to split, now treated as extended regular expressions.

        6. The  operator  precedence,  changed  to more closely match the C language.  Two examples of code that
           operate differently are:

           while ( n /= 10 > 1) ...
           if (!"wk" ~ /bwk/) ...
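
       To make the first of these concrete, the sketch below (with an arbitrary starting value of 100)
       compares the current grouping of the expression with the grouping that has to be written out explicitly
       to obtain the other reading:

```shell
# Current precedence: the relational operator binds before the assignment,
# so n is divided by the value of (10 > 1), which is 1.
new=$(awk 'BEGIN { n = 100; n /= 10 > 1; print n }')
explicit=$(awk 'BEGIN { n = 100; n /= (10 > 1); print n }')

# The other grouping must now be requested with parentheses.
old=$(awk 'BEGIN { n = 100; r = (n /= 10) > 1; print r, n }')

printf 'new=%s explicit=%s old=%s\n' "$new" "$explicit" "$old"
```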

       Several features have been added based on newer implementations of awk:

        * Multiple instances of -f progfile are permitted.

        * The new option -v assignment.

        * The new predefined variable ENVIRON.

        * New built-in functions toupper and tolower.

        * More formatting capabilities are added to printf to match the ISO C standard.

       The overall awk syntax has always been based on the C language,  with  a  few  features  from  the  shell
       command  language  and  other  sources.  Because  of this, it is not completely compatible with any other
       language, which has caused confusion for some users.  It is not the intent of the standard developers  to
       address  such issues.  A few relatively minor changes toward making the language more compatible with the
       ISO C standard were made; most of these changes are based on similar changes in  recent  implementations,
       as  described  above. There remain several C-language conventions that are not in awk. One of the notable
       ones is the comma operator, which is commonly used to specify multiple expressions in the C language  for
       statement. Also, there are various places where awk is more restrictive than the C language regarding the
       type of expression that can be used in a given context.  These  limitations  are  due  to  the  different
       features that the awk language does provide.

       Regular  expressions  in  awk  have been extended somewhat from historical implementations to make them a
       pure superset of  extended  regular  expressions,  as  defined  by  IEEE Std 1003.1-2001  (see  the  Base
       Definitions  volume  of  IEEE Std 1003.1-2001,  Section  9.4,  Extended  Regular  Expressions).  The main
       extensions are internationalization features and interval expressions.  Historical implementations of awk
       have  long supported backslash escape sequences as an extension to extended regular expressions, and this
       extension has been retained despite inconsistency with other utilities. The number  of  escape  sequences
       recognized  in  both extended regular expressions and strings has varied (generally increasing with time)
       among implementations. The set specified by IEEE Std 1003.1-2001 includes  most  sequences  known  to  be
       supported  by  popular  implementations  and by the ISO C standard. One sequence that is not supported is
       hexadecimal value escapes beginning with '\x' . This would allow values expressed in more than 9 bits  to
       be used within awk as in the ISO C standard. However, because this syntax has a non-deterministic length,
       it does not permit the subsequent character to be a hexadecimal digit. This limitation can be dealt  with
       in  the  C  language by the use of lexical string concatenation. In the awk language, concatenation could
       also be a solution for strings, but not for extended regular expressions (either lexical  ERE  tokens  or
       strings  used  dynamically  as regular expressions). Because of this limitation, the feature has not been
       added to IEEE Std 1003.1-2001.

       When a string variable is used in a context where an extended regular expression normally appears  (where
       the lexical token ERE is used in the grammar) the string does not contain the literal slashes.

       Some versions of awk allow the form:

              func name(args, ... ) { statements }

       This has been deprecated by the authors of the language, who asked that it not be specified.

       Historical implementations of awk produce an error if a next statement is executed in a BEGIN action, and
       cause awk to terminate if a next statement is executed in an END  action.  This  behavior  has  not  been
       documented, and it was not believed that it was necessary to standardize it.

       The  specification  of  conversions  between  string and numeric values is much more detailed than in the
       documentation of historical implementations or in the referenced The AWK Programming Language.   Although
       most of the behavior is designed to be intuitive, the details are necessary to ensure compatible behavior
       from different implementations. This is especially important in relational expressions since the types of
       the  operands  determine  whether a string or numeric comparison is performed. From the perspective of an
       application writer, it is usually sufficient to expect intuitive behavior and to  force  conversions  (by
       adding  zero or concatenating a null string) when the type of an expression does not obviously match what
       is needed. The intent has been to specify historical practice in almost all cases. The one  exception  is
       that,  in  historical  implementations,  variables  and constants maintain both string and numeric values
       after their original value is converted by any use. This means that referencing a  variable  or  constant
       can have unexpected side effects. For example, with historical implementations the following program:

              {
                  a = "+2"
                  b = 2
                  if (NR % 2)
                      c = a + b
                  if (a == b)
                      print "numeric comparison"
                  else
                      print "string comparison"
              }

       would  perform  a  numeric  comparison  (and  output  numeric comparison) for each odd-numbered line, but
       perform  a  string  comparison  (and   output   string   comparison)   for   each   even-numbered   line.
       IEEE Std 1003.1-2001   ensures   that   comparisons   will  be  numeric  if  necessary.  With  historical
       implementations, the following program:

              BEGIN {
                  OFMT = "%e"
                  print 3.14
                  OFMT = "%f"
                  print 3.14
              }

       would output "3.140000e+00" twice, because in the second print statement the constant "3.14" would have a
       string  value  from  the previous conversion. IEEE Std 1003.1-2001 requires that the output of the second
       print statement be "3.140000" . The behavior of historical implementations was seen  as  too  unintuitive
       and unpredictable.

       It  was  pointed  out  that  with  the  rules contained in early drafts, the following script would print
       nothing:

              BEGIN {
                  y[1.5] = 1
                  OFMT = "%e"
                  print y[1.5]
              }

       Therefore, a new variable, CONVFMT, was introduced. The OFMT variable  is  now  restricted  to  affecting
       output  conversions  of  numbers  to  strings  and  CONVFMT  is  used  for  internal conversions, such as
       comparisons or array indexing. The default value is the same as  that  for  OFMT,  so  unless  a  program
       changes  CONVFMT  (which  no  historical  program  would  do),  it  will  receive the historical behavior
       associated with internal string conversions.
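
       As a hedged illustration of this split (a sketch, not text from the standard; the shell variable out
       and the awk names x and s are chosen here for illustration only), the following fragment changes OFMT
       and CONVFMT independently. The array element is still found after OFMT changes, the concatenation
       follows CONVFMT, and print follows OFMT:

```shell
out=$(awk 'BEGIN {
    y[1.5] = 1
    OFMT = "%e"        # changes only how print formats non-integer numbers
    print y[1.5]       # subscript is converted with CONVFMT, so the element is found
    x = 3.14159
    CONVFMT = "%.2g"
    s = x ""           # internal number-to-string conversion uses CONVFMT
    print s
    print x            # output conversion uses OFMT
}')
printf '%s\n' "$out"
```

       Integral values such as the stored 1 are converted as integers, so neither variable affects the first
       line of output.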

       The POSIX awk lexical and syntactic conventions are specified more formally than in other sources.  Again
       the  intent  has  been  to  specify  historical practice. One convention that may not be obvious from the
       formal grammar, as in other verbal descriptions, is where <newline>s are acceptable. There are several
       obvious  placements  such  as  terminating  a statement, and a backslash can be used to escape <newline>s
       between any lexical tokens. In addition, <newline>s without backslashes  can  follow  a  comma,  an  open
       brace, a logical AND operator ( "&&" ), a logical OR operator ( "||" ), the do keyword, the else keyword,
       and the closing parenthesis of an if, for, or while statement. For example:

              { print $1,
                      $2 }
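
       A slightly longer sketch, under the assumption of a two-field input line (the shell variable out is a
       name chosen here), exercises several of the placements listed above in one action:

```shell
out=$(printf '3 4\n' | awk '{
    if ($1 > 0 &&          # newline after the logical AND operator
        $2 > 0)            # newline after the closing parenthesis of if
        print $1,          # newline after a comma
              $2
    else                   # newline after the else keyword
        print "no"
}')
printf '%s\n' "$out"
```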

       The requirement that awk add a trailing <newline> to  the  program  argument  text  is  to  simplify  the
       grammar,  making  it  match  a  text  file  in  form. There is no way for an application or test suite to
       determine whether a literal <newline> is added or whether awk simply acts as if it did.

       IEEE Std 1003.1-2001 requires several  changes  from  historical  implementations  in  order  to  support
       internationalization.  Probably  the  most  subtle  of  these  is the use of the decimal-point character,
       defined by the LC_NUMERIC category of the locale, in representations  of  floating-point  numbers.   This
       locale-specific character is used in recognizing numeric input, in converting between strings and numeric
       values, and in formatting output. However, regardless of locale, the period character (the  decimal-point
       character  of  the  POSIX  locale)  is  the decimal-point character recognized in processing awk programs
       (including assignments in command line arguments). This is essentially the same  convention  as  the  one
       used  in  the  ISO C  standard.  The difference is that the C language includes the setlocale() function,
       which permits an application to modify its locale. Because of this capability,  a  C  application  begins
       executing  with  its  locale  set  to the C locale, and only executes in the environment-specified locale
       after an explicit call to setlocale(). However, adding such an elaborate new feature to the awk  language
       was  seen  as inappropriate for IEEE Std 1003.1-2001. It is possible to execute an awk program explicitly
       in any desired locale by setting the environment in the shell.
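
       A minimal sketch of setting the locale from the shell follows (the shell variable out is a name chosen
       here). It pins the locale to C so that the input field is read with a period as the decimal point;
       substituting another locale name is possible only if that locale is installed on the system, which is
       an assumption about the environment:

```shell
# The program text always uses a period; LC_ALL=C makes input and output use one as well.
out=$(echo '1.5' | LC_ALL=C awk '{ print $1 + 1.25 }')
printf '%s\n' "$out"
```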

       The undefined behavior resulting from NULs in extended regular expressions allows future  extensions  for
       the GNU gawk program to process binary data.

       The  behavior  in the case of invalid awk programs (including lexical, syntactic, and semantic errors) is
       undefined because it was considered overly limiting on implementations to specify.  In  most  cases  such
       errors  can be expected to produce a diagnostic and a non-zero exit status. However, some implementations
       may choose to extend the language in ways that make use of  certain  invalid  constructs.  Other  invalid
       constructs  might  be  deemed  worthy  of a warning, but otherwise cause some reasonable behavior.  Still
       other  constructs  may  be  very  difficult  to  detect  in  some   implementations.    Also,   different
       implementations  might  detect a given error during an initial parsing of the program (before reading any
       input files) while others  might  detect  it  when  executing  the  program  after  reading  some  input.
       Implementors should be aware that diagnosing errors as early as possible and producing useful diagnostics
       can ease debugging of applications, and thus make an implementation more usable.

       The unspecified behavior from using multi-character RS values is  to  allow  possible  future  extensions
       based  on  extended  regular  expressions used for record separators. Historical implementations take the
       first character of the string and ignore the others.
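
       Because only single-character values of RS have specified behavior, a portable program might restrict
       itself to them, as in this sketch (the input text and the shell variable out are illustrative):

```shell
# A one-character RS splits the input into records at each semicolon.
out=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$out"
```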

       Unspecified behavior when split( string, array, <null>) is used is to allow a proposed  future  extension
       that would split up a string into an array of individual characters.
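
       Until such an extension can be relied upon, one portable way to obtain the individual characters is a
       loop over substr, sketched below (the names s, n, chars, and out are chosen here for illustration):

```shell
out=$(awk 'BEGIN {
    s = "abc"
    n = length(s)
    for (i = 1; i <= n; i++)
        chars[i] = substr(s, i, 1)   # one character per array element
    print n, chars[1], chars[2], chars[3]
}')
printf '%s\n' "$out"
```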

       In  the  context of the getline function, equally good arguments for different precedences of the | and <
       operators can be made. Historical practice has been that:

              getline < "a" "b"

       is parsed as:

              ( getline < "a" ) "b"

       although many would argue that the intent was that the file ab should be read. However:

              getline < "x" + 1

       parses as:

              getline < ( "x" + 1 )

       Similar problems occur with the | version of getline, particularly in combination with $. For example:

              $"echo hi" | getline

       (This situation is particularly problematic when used in a print statement, where the |getline part might
       be a redirection of the print.)

       Since in most cases such constructs are not (or at least should not be) used (because they have a natural
       ambiguity for which there is no conventional parsing), the meaning of  these  constructs  has  been  made
       explicitly  unspecified.  (The  effect  is  that a conforming application that runs into the problem must
       parenthesize to resolve the ambiguity.) There appeared to be few if any actual uses of such constructs.
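
       The following sketch shows the parenthesized forms for both the < and | versions of getline. It
       assumes that mktemp -d is available to create a scratch directory; the file name ab, the variable
       names, and the echo command are illustrative:

```shell
dir=$(mktemp -d)
printf 'from ab\n' > "$dir/ab"
out=$(awk -v d="$dir/" 'BEGIN {
    # Parentheses make the whole concatenation the file name operand.
    if ((getline line < (d "a" "b")) > 0)
        print line
    # Likewise for the command operand of the pipe form.
    if ((("echo " "hi") | getline word) > 0)
        print word
}')
rm -rf "$dir"
printf '%s\n' "$out"
```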

       Grammars can be  written  that  would  cause  an  error  under  these  circumstances.   Where  backwards-
       compatibility is not a large consideration, implementors may wish to use such grammars.

       Some  historical  implementations  have  allowed some built-in functions to be called without an argument
       list, the result being a default argument list chosen in some  "reasonable"  way.  Use  of  length  as  a
       synonym  for length($0) is the only one of these forms that is thought to be widely known or widely used;
       this particular form is documented in various places (for example, most historical awk  reference  pages,
       although not in the referenced The AWK Programming Language) as legitimate practice. With this exception,
       default argument lists have always been undocumented and vaguely defined, and it is not at all clear  how
       (or  if)  they  should  be  generalized  to user-defined functions.  They add no useful functionality and
       preclude possible future extensions that  might  need  to  name  functions  without  calling  them.   Not
       standardizing  them  seems  the  simplest  course. The standard developers considered that length merited
       special treatment, however, since it has been documented in the past and sees possibly substantial use in
       historical  programs.  Accordingly,  this  usage  has  been  made  legitimate,  but  Issue 5  removed the
       obsolescent marking for XSI-conforming implementations and many otherwise conforming applications  depend
       on this feature.
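
       As a brief illustration (the input text and the shell variable out are chosen here), both spellings
       yield the length of the current record:

```shell
# length with no argument list is equivalent to length($0).
out=$(echo 'hello' | awk '{ print length, length($0) }')
printf '%s\n' "$out"
```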

       In  sub  and gsub, if repl is a string literal (the lexical token STRING), then two consecutive backslash
       characters should be used in the string to ensure a single backslash will precede the ampersand when  the
       resultant  string  is  passed  to  the  function.  (For  example, to specify one literal ampersand in the
       replacement string, use gsub( ERE, "\\&" ).)
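
       A short sketch contrasting the escaped and unescaped ampersand follows (the input text and the names
       lit, mat, and out are illustrative). The program is passed in single quotes so that the shell leaves
       the backslashes untouched:

```shell
out=$(echo 'a+b' | awk '{
    lit = $0
    gsub(/[+]/, " \\& ", lit)   # the lexer turns "\\&" into \&, a literal ampersand
    mat = $0
    gsub(/[+]/, "[&]", mat)     # an unescaped & stands for the matched text
    print lit
    print mat
}')
printf '%s\n' "$out"
```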

       Historically the only special character in the repl argument of sub and gsub  string  functions  was  the
       ampersand  (  '&'  )  character  and  preceding  it with the backslash character was used to turn off its
       special meaning.

       The description in the ISO POSIX-2:1993 standard introduced behavior such that  the  backslash  character
       was  another  special  character  and it was unspecified whether there were any other special characters.
       This description introduced several portability problems, some of which are described below,  and  so  it
       has been replaced with the more historical description. Some of the problems include:

        * Historically,  to  create  the replacement string, a script could use gsub( ERE, "\\&" ), but with the
          ISO POSIX-2:1993 standard wording, it was necessary to use gsub( ERE, "\\\\&" ). Backslash  characters
          are  doubled here because all string literals are subject to lexical analysis, which would reduce each
          pair of backslash characters to a single backslash before being passed to gsub.

        * Since it was unspecified what the special characters were, for  portable  scripts  to  guarantee  that
          characters  are printed literally, each character had to be preceded with a backslash. (For example, a
          portable script had to use gsub( ERE, "\\h\\i" ) to produce a replacement string of "hi".)

       The description for comparisons in the ISO POSIX-2:1993 standard did  not  properly  describe  historical
       practice because of the way numeric strings are compared as numbers. The rules in that standard cause
       the following code:

              if (0 == "000")
                  print "strange, but true"
              else
                  print "not true"

       to do a numeric comparison, causing the if to succeed. It should be  intuitively  obvious  that  this  is
       incorrect behavior, and indeed, no historical implementation of awk actually behaves this way.

       To  fix this problem, the definition of numeric string was enhanced to include only those values obtained
       from specific circumstances (mostly external sources) where it is not possible to determine unambiguously
       whether the value is intended to be a string or a numeric.
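
       The effect of the revised definition can be sketched as follows (the names r1 to r4 and out are chosen
       here). A string constant is compared as a string, a field read from input is a numeric string, and the
       two forcing idioms (concatenating a null string or adding zero) select the comparison explicitly:

```shell
out=$(echo '000' | awk '{
    r1 = (0 == "000") ? "equal" : "differ"            # string constant: compared as strings
    r2 = ($1 == 0) ? "equal" : "differ"               # field read from input: numeric string
    r3 = (($1 "") == 0) ? "equal" : "differ"          # concatenating "" forces a string
    r4 = (("000" + 0) == 0) ? "equal" : "differ"      # adding zero forces a number
    print r1, r2, r3, r4
}')
printf '%s\n' "$out"
```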

       Variables that are assigned to a numeric string shall also be treated as a numeric string. (That is,
       the notion of a numeric string can be propagated  across  assignments.)  In  comparisons,  all  variables
       having  the  uninitialized  value  are to be treated as a numeric operand evaluating to the numeric value
       zero.
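
       A minimal sketch of the uninitialized value in comparisons (the variable x is deliberately never
       assigned; out is a shell variable chosen here): it compares equal to zero when the other operand is
       numeric and equal to the null string when the other operand is a string:

```shell
out=$(awk 'BEGIN {
    if (x == 0)  print "equals 0"               # numeric comparison with zero
    if (x == "") print "equals empty string"    # string comparison with the null string
}')
printf '%s\n' "$out"
```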

       Uninitialized variables include all types of variables including scalars, array elements, and fields. The
       definition  of  an  uninitialized  value  in Variables and Special Variables is necessary to describe the
       value placed on uninitialized variables and on fields that are valid (for example, fields numbered no
       higher than NF) but have no
       characters in them and to describe how these variables are to be used in comparisons. A valid field, such
       as $1, that has no characters in it can be obtained from an input line of "\t\t" when FS='\t'.
       Historically, the comparison ($1<10) was done numerically after evaluating $1 to the value zero.

       The  phrase  "...  also  shall  have  the  numeric  value of the numeric string" was removed from several
       sections of the ISO POSIX-2:1993 standard because it specifies an unnecessary implementation  detail.  It
       is not necessary for IEEE Std 1003.1-2001 to specify that these objects be assigned two different values.
       It is only necessary to specify that these objects may evaluate to  two  different  values  depending  on
       context.

       The description of numeric string processing is based on the behavior of the atof() function in the ISO C
       standard. While it is not a requirement for an implementation  to  use  this  function,  many  historical
       implementations  of  awk  do.  In  the ISO C standard, floating-point constants use a period as a decimal
       point character for the language itself, independent of the current locale, but the atof()  function  and
       the  associated  strtod()  function use the decimal point character of the current locale when converting
       strings to numeric values. Similarly in awk, floating-point constants in  an  awk  script  use  a  period
       independent of the locale, but input strings use the decimal point character of the locale.

FUTURE DIRECTIONS

       None.

SEE ALSO

       Grammar Conventions, grep, lex, sed, the System Interfaces volume of IEEE Std 1003.1-2001, atof(),
       exec, popen(), setlocale(), strtod()

COPYRIGHT

       Portions of this text are reprinted and reproduced in electronic form from IEEE Std 1003.1, 2003 Edition,
       Standard  for  Information Technology -- Portable Operating System Interface (POSIX), The Open Group Base
       Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of Electrical and Electronics Engineers,
       Inc  and  The  Open Group. In the event of any discrepancy between this version and the original IEEE and
       The Open Group Standard, the original IEEE and The Open Group  Standard  is  the  referee  document.  The
       original Standard can be obtained online at http://www.opengroup.org/unix/online.html .