Provided by: manpages-posix_2.16-1_all

NAME

       awk - pattern scanning and processing language

SYNOPSIS

       awk [-F ERE][-v assignment] ... program [argument ...]

       awk [-F ERE] -f progfile ... [-v assignment] ... [argument ...]

DESCRIPTION

       The  awk utility shall execute programs written in the awk programming language, which is specialized for
       textual data manipulation. An awk program is a sequence of patterns and corresponding actions. When input
       is read that matches a pattern, the action associated with that pattern is carried out.

       Input shall be interpreted as a sequence of records. By default, a record is a line, less its terminating
       <newline>, but this can be changed by using the RS built-in variable.  Each  record  of  input  shall  be
       matched  in  turn  against  each  pattern in the program. For each pattern matched, the associated action
       shall be executed.

       The awk utility shall interpret each input record as a sequence of fields where, by default, a field is a
       string of non-<blank>s. This default white-space field delimiter can be changed by using the FS built-in
       variable or -F ERE. The awk utility shall denote the first field in a record $1, the second  $2,  and  so
       on.  The  symbol $0 shall refer to the entire record; setting any other field causes the re-evaluation of
       $0. Assigning to $0 shall reset the values of all other fields and the NF built-in variable.
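
       The field rules above can be sketched with a few one-liners. This is only an illustration, assuming a
       POSIX-conforming awk is available on PATH; the sample input lines and the shell variable names are
       invented for the example.

```shell
# Default splitting: a field is a run of non-blank characters.
second=$(printf 'alpha  beta\tgamma\n' | awk '{ print $2 }')
echo "$second"        # beta

# -F sets the field separator to an ERE before any input is read.
user=$(printf 'root:x:0:0\n' | awk -F: '{ print $1, NF }')
echo "$user"          # root 4

# Setting a field other than $0 causes $0 to be re-evaluated.
rebuilt=$(printf 'a b c\n' | awk '{ $2 = "X"; print $0 }')
echo "$rebuilt"       # a X c

# Assigning to $0 resets the other fields and NF.
reset=$(printf 'a b c\n' | awk '{ $0 = "one two"; print NF, $2 }')
echo "$reset"         # 2 two
```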

OPTIONS

       The awk utility shall conform to the Base  Definitions  volume  of  IEEE Std 1003.1-2001,  Section  12.2,
       Utility Syntax Guidelines.

       The following options shall be supported:

       -F  ERE
              Define  the  input  field separator to be the extended regular expression ERE, before any input is
              read; see Regular Expressions.

       -f  progfile
              Specify the pathname of the file progfile containing an awk program. If multiple instances of this
              option are specified, the concatenation of the files specified as progfile in the order  specified
              shall  be the awk program. The awk program can alternatively be specified in the command line as a
              single argument.

       -v  assignment
              The application shall ensure that the assignment argument is in the same  form  as  an  assignment
              operand.  The  specified  variable  assignment  shall  occur  prior  to executing the awk program,
              including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option
              can be specified.

OPERANDS

       The following operands shall be supported:

       program
              If no -f option is specified, the first operand to awk shall be the text of the awk  program.  The
              application shall supply the program operand as a single argument to awk. If the text does not end
              in a <newline>, awk shall interpret the text as if it did.

       argument
              Either of the following two types of argument can be intermixed:

       file
              A  pathname  of  a  file  that  contains the input to be read, which is matched against the set of
              patterns in the program. If no file operands are specified, or if a file operand is '-', the
              standard input shall be used.

       assignment
              An  operand that begins with an underscore or alphabetic character from the portable character set
              (see the table in the Base Definitions  volume  of  IEEE Std 1003.1-2001,  Section  6.1,  Portable
              Character  Set),  followed by a sequence of underscores, digits, and alphabetics from the portable
              character set, followed by the '=' character, shall specify a variable assignment  rather  than  a
              pathname.  The characters before the '=' represent the name of an awk variable; if that name is an
              awk reserved word (see Grammar) the behavior is undefined. The characters following the equal
              sign shall be interpreted as if they appeared in the awk program preceded and followed by a
              double-quote ( '"' ) character, as a STRING token (see Grammar), except that if the last character
              is an unescaped backslash, it shall be interpreted as a literal backslash rather than as the first
              character of the sequence "\"". The variable shall be assigned the value of that STRING token
              and, if appropriate, shall be considered a numeric string (see Expressions in awk), in which case
              the variable shall also be assigned its numeric value. Each such variable assignment shall occur just prior to
              the  processing  of the following file, if any. Thus, an assignment before the first file argument
              shall be executed after the BEGIN actions (if any),  while  an  assignment  after  the  last  file
              argument  shall occur before the END actions (if any). If there are no file arguments, assignments
              shall be executed before processing the standard input.
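
       The difference in timing between a -v assignment and an assignment operand can be sketched as follows.
       This is a minimal illustration, assuming a POSIX-conforming awk on PATH; the variable names a and b
       and the input text are made up for the example.

```shell
# -v assignments occur before the BEGIN actions run. An assignment operand
# takes effect just before the following file operand ('-' here) is read,
# so it is not yet visible inside BEGIN.
out=$(printf 'line\n' | awk -v a=1 '
    BEGIN { print "BEGIN: a=" a " b=" b }
          { print "main: a=" a " b=" b }
' b=2 -)
echo "$out"
# BEGIN: a=1 b=
# main: a=1 b=2
```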

STDIN

       The standard input shall be used only if no file operands are specified, or if a file operand is '-';
       see  the  INPUT FILES section. If the awk program contains no actions and no patterns, but is otherwise a
       valid awk program, standard input and any file operands shall not be read  and  awk  shall  exit  with  a
       return status of zero.

INPUT FILES

       Input files to the awk program from any of the following sources shall be text files:

        * Any file operands or their equivalents, achieved by modifying the awk variables ARGV and ARGC

        * Standard input in the absence of any file operands

        * Arguments to the getline function

       Whether the variable RS is set to a value other than a <newline> or not, for these files, implementations
       shall  support  records  terminated  with  the specified separator up to {LINE_MAX} bytes and may support
       longer records.

       If -f progfile is specified, the application shall ensure that the files named by each  of  the  progfile
       option-arguments  are  text  files  and  their  concatenation,  in  the  same order as they appear in the
       arguments, is an awk program.

ENVIRONMENT VARIABLES

       The following environment variables shall affect the execution of awk:

       LANG   Provide a default value for the internationalization variables that are unset or  null.  (See  the
              Base  Definitions  volume of IEEE Std 1003.1-2001, Section 8.2, Internationalization Variables for
              the  precedence  of  internationalization  variables  used  to  determine  the  values  of  locale
              categories.)

       LC_ALL If  set  to  a  non-empty  string value, override the values of all the other internationalization
              variables.

       LC_COLLATE
              Determine the locale  for  the  behavior  of  ranges,  equivalence  classes,  and  multi-character
              collating elements within regular expressions and in comparisons of string values.

       LC_CTYPE
              Determine  the locale for the interpretation of sequences of bytes of text data as characters (for
              example, single-byte as opposed to multi-byte  characters  in  arguments  and  input  files),  the
              behavior  of  character  classes  within  regular expressions, the identification of characters as
              letters, and the mapping of uppercase  and  lowercase  characters  for  the  toupper  and  tolower
              functions.

       LC_MESSAGES
              Determine  the locale that should be used to affect the format and contents of diagnostic messages
              written to standard error.

       LC_NUMERIC
              Determine the radix character used when interpreting numeric input, performing conversions between
              numeric and string values, and  formatting  numeric  output.  Regardless  of  locale,  the  period
              character  (the  decimal-point  character  of  the  POSIX  locale)  is the decimal-point character
              recognized in processing awk programs (including assignments in command line arguments).

       NLSPATH
              Determine the location of message catalogs for the processing of LC_MESSAGES.

       PATH   Determine the search path when looking for commands executed by system(expr), or input and  output
              pipes; see the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 8, Environment Variables.

       In addition, all environment variables shall be visible via the awk variable ENVIRON.
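
       As a small sketch of the ENVIRON array (assuming a POSIX-conforming awk on PATH), the following reads
       one environment variable from inside a BEGIN action. GREETING is a hypothetical variable name chosen
       only for this illustration.

```shell
# GREETING is a made-up variable, set only for this one awk invocation.
greeting=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$greeting"      # hello
```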

ASYNCHRONOUS EVENTS

       Default.

STDOUT

       The nature of the output files depends on the awk program.

STDERR

       The standard error shall be used only for diagnostic messages.

OUTPUT FILES

       The nature of the output files depends on the awk program.

EXTENDED DESCRIPTION

   Overall Program Structure
       An awk program is composed of pairs of the form:

              pattern { action }

       Either the pattern or the action (including the enclosing brace characters) can be omitted.

       A missing pattern shall match any record of input, and a missing action shall be equivalent to:

              { print }

       Execution  of  the  awk  program  shall  start  by  first executing the actions associated with all BEGIN
       patterns in the order they occur in the program. Then each file operand (or standard input  if  no  files
       were specified) shall be processed in turn by reading data from the file until a record separator is seen
       (<newline> by default). Before the first reference to a field in the record is evaluated, the record
       shall be split into fields, according to the rules in Regular Expressions , using the value  of  FS  that
       was  current  at the time the record was read. Each pattern in the program then shall be evaluated in the
       order of occurrence, and the action  associated  with  each  pattern  that  matches  the  current  record
       executed.  The  action  for  a  matching pattern shall be executed before evaluating subsequent patterns.
       Finally, the actions associated with all END patterns shall be executed in the order they  occur  in  the
       program.
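
       The execution order described above can be sketched with a short program that uses a BEGIN pattern, a
       pattern with no action, an action with no pattern, and an END pattern. This is an illustration only,
       assuming a POSIX-conforming awk on PATH; the input lines are invented.

```shell
out=$(printf 'apple\nbanana\ncherry\n' | awk '
    BEGIN { print "start" }         # runs before any input is read
    /an/                            # pattern only: same as /an/ { print }
          { n++ }                   # action only: runs for every record
    END   { print "records:", n }   # runs after all input is read
')
echo "$out"
# start
# banana
# records: 3
```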

   Expressions in awk
       Expressions describe computations used in patterns and actions.  In the following table, valid expression
       operations  are  given  in  groups  from  highest precedence first to lowest precedence last, with equal-
       precedence operators grouped between horizontal lines. In expression evaluation,  where  the  grammar  is
       formally  ambiguous, higher precedence operators shall be evaluated before lower precedence operators. In
       this table expr, expr1, expr2, and expr3 represent any expression, while  lvalue  represents  any  entity
       that  can  be  assigned  to  (that is, on the left side of an assignment operator). The precise syntax of
       expressions is given in Grammar .

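                                  Table: Expressions in Decreasing Precedence in awk
                    Syntax                Name                      Type of Result   Associativity
                    ------------------------------------------------------------------------------
                    ( expr )              Grouping                  Type of expr     N/A
                    ------------------------------------------------------------------------------
                    $expr                 Field reference           String           N/A
                    ------------------------------------------------------------------------------
                    ++ lvalue             Pre-increment             Numeric          N/A
                    -- lvalue             Pre-decrement             Numeric          N/A
                    lvalue ++             Post-increment            Numeric          N/A
                    lvalue --             Post-decrement            Numeric          N/A
                    ------------------------------------------------------------------------------
                    expr ^ expr           Exponentiation            Numeric          Right
                    ------------------------------------------------------------------------------
                    ! expr                Logical not               Numeric          N/A
                    + expr                Unary plus                Numeric          N/A
                    - expr                Unary minus               Numeric          N/A
                    ------------------------------------------------------------------------------
                    expr * expr           Multiplication            Numeric          Left
                    expr / expr           Division                  Numeric          Left
                    expr % expr           Modulus                   Numeric          Left
                    ------------------------------------------------------------------------------
                    expr + expr           Addition                  Numeric          Left
                    expr - expr           Subtraction               Numeric          Left
                    ------------------------------------------------------------------------------
                    expr expr             String concatenation      String           Left
                    ------------------------------------------------------------------------------
                    expr < expr           Less than                 Numeric          None
                    expr <= expr          Less than or equal to     Numeric          None
                    expr != expr          Not equal to              Numeric          None
                    expr == expr          Equal to                  Numeric          None
                    expr > expr           Greater than              Numeric          None
                    expr >= expr          Greater than or equal to  Numeric          None
                    ------------------------------------------------------------------------------
                    expr ~ expr           ERE match                 Numeric          None
                    expr !~ expr          ERE non-match             Numeric          None
                    ------------------------------------------------------------------------------
                    expr in array         Array membership          Numeric          Left
                    ( index ) in array    Multi-dimension array     Numeric          Left
                                          membership
                    ------------------------------------------------------------------------------
                    expr && expr          Logical AND               Numeric          Left
                    ------------------------------------------------------------------------------
                    expr || expr          Logical OR                Numeric          Left
                    ------------------------------------------------------------------------------
                    expr1 ? expr2 : expr3 Conditional expression    Type of selected Right
                                                                    expr2 or expr3
                    ------------------------------------------------------------------------------
                    lvalue ^= expr        Exponentiation assignment Numeric          Right
                    lvalue %= expr        Modulus assignment        Numeric          Right
                    lvalue *= expr        Multiplication assignment Numeric          Right
                    lvalue /= expr        Division assignment       Numeric          Right
                    lvalue += expr        Addition assignment       Numeric          Right
                    lvalue -= expr        Subtraction assignment    Numeric          Right
                    lvalue = expr         Assignment                Type of expr     Right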
       Each expression shall have either a string value, a numeric value, or both. Except as stated for specific
       contexts, the value of an expression shall be implicitly converted to the type needed for the context  in
       which it is used. A string value shall be converted to a numeric value by the equivalent of the following
       calls to functions defined by the ISO C standard:

              setlocale(LC_NUMERIC, "");
              numeric_value = atof(string_value);

       A  numeric  value  that  is exactly equal to the value of an integer (see Concepts Derived from the ISO C
       Standard ) shall be converted to a string by the equivalent of a call to the sprintf function (see String
       Functions ) with the string "%d" as the fmt argument and the numeric value being converted as  the  first
       and  only  expr  argument.  Any other numeric value shall be converted to a string by the equivalent of a
       call to the sprintf function with the value of the variable CONVFMT as the fmt argument and  the  numeric
       value being converted as the first and only expr argument. The result of the conversion is unspecified if
       the  value  of  CONVFMT is not a floating-point format specification. This volume of IEEE Std 1003.1-2001
       specifies no explicit conversions between numbers and strings. An application can force an expression  to
       be  treated  as a number by adding zero to it, or can force it to be treated as a string by concatenating
       the null string ( "" ) to it.
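
       The conversion rules above can be sketched as follows. This is a minimal illustration, assuming a
       POSIX-conforming awk on PATH; LC_ALL=C is set only so that the radix character in the output is a
       period, and the values are invented.

```shell
conv=$(LC_ALL=C awk 'BEGIN {
    CONVFMT = "%.2f"
    a = 12; b = 3.14159
    s = a ""        # integral value: converted as if by "%d"
    t = b ""        # non-integral value: converted using CONVFMT
    print s, t
}')
echo "$conv"          # 12 3.14

# Adding zero forces a string to be treated as a number (atof semantics).
num=$(LC_ALL=C awk 'BEGIN { s = "42abc"; print s + 0 }')
echo "$num"           # 42
```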

       A string value shall be considered a numeric string if it comes from one of the following:

        1. Field variables

        2. Input from the getline() function

        3. FILENAME

        4. ARGV array elements

        5. ENVIRON array elements

        6. Array elements created by the split() function

        7. A command line variable assignment

        8. Variable assignment from another numeric string variable

       and after all the following conversions have been  applied,  the  resulting  string  would  lexically  be
       recognized as a NUMBER token as described by the lexical conventions in Grammar :

        * All leading and trailing <blank>s are discarded.

        * If the first non-<blank> is '+' or '-', it is discarded.

        * Each occurrence of the decimal point character from the current locale is changed to a period.

       If a '-' character is ignored in the preceding description, the numeric value of the numeric string shall
       be the negation of the numeric value of the recognized NUMBER token.  Otherwise, the numeric value of the
       numeric  string  shall  be the numeric value of the recognized NUMBER token. Whether or not a string is a
       numeric string shall be relevant only in contexts where that term is used in this section.

       When an expression is used in a Boolean context, if it has a numeric value, a  value  of  zero  shall  be
       treated  as  false  and  any  other value shall be treated as true. Otherwise, a string value of the null
       string shall be treated as false and any other value shall be treated as true. A Boolean context shall be
       one of the following:

        * The first subexpression of a conditional expression

        * An expression operated on by logical NOT, logical AND, or logical OR

        * The second expression of a for statement

        * The expression of an if statement

        * The expression of the while clause in either a while or do...while statement

        * An expression used as a pattern (as in Overall Program Structure)

       All arithmetic shall follow the semantics of floating-point arithmetic as specified by the ISO C standard
       (see Concepts Derived from the ISO C Standard ).

       The value of the expression:

              expr1 ^ expr2

       shall be equivalent to the value returned by the ISO C standard function call:

              pow(expr1, expr2)

       The expression:

              lvalue ^= expr

       shall be equivalent to the ISO C standard expression:

              lvalue = pow(lvalue, expr)

       except that lvalue shall be evaluated only once. The value of the expression:

              expr1 % expr2

       shall be equivalent to the value returned by the ISO C standard function call:

              fmod(expr1, expr2)

       The expression:

              lvalue %= expr

       shall be equivalent to the ISO C standard expression:

              lvalue = fmod(lvalue, expr)

       except that lvalue shall be evaluated only once.
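
       A short sketch of the exponentiation and modulus operators follows, assuming a POSIX-conforming awk on
       PATH. It also shows the right associativity of '^' and the fmod-style sign of '%' for a negative
       left operand; the operand values are arbitrary.

```shell
# 2 ^ 3 ^ 2 groups as 2 ^ (3 ^ 2); -7 % 3 behaves like fmod(-7, 3).
arith=$(LC_ALL=C awk 'BEGIN { print 2 ^ 10, 2 ^ 3 ^ 2, 7 % 3, -7 % 3, 5.5 % 2 }')
echo "$arith"         # 1024 512 1 -1 1.5
```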

       Variables and fields shall be set by the assignment statement:

              lvalue = expression

       and the type of expression shall determine the resulting  variable  type.  The  assignment  includes  the
       arithmetic assignments ("+=", "-=", "*=", "/=", "%=", "^=", "++", "--"), all of which shall
       produce a numeric result. The left-hand side of an assignment and the target of increment  and  decrement
       operators can be one of a variable, an array with index, or a field selector.

       The  awk  language  supplies  arrays  that  are  used  for storing numbers or strings. Arrays need not be
       declared. They shall initially be empty, and their sizes shall change  dynamically.  The  subscripts,  or
       element  identifiers,  are  strings,  providing  a  type  of  associative array capability. An array name
       followed by a subscript within square brackets can be used as an lvalue and thus  as  an  expression,  as
       described in the grammar; see Grammar. Unsubscripted array names can be used in only the following
       contexts:

        * A parameter in a function definition or function call

        * The NAME token following any use of the keyword in as specified in the grammar (see Grammar ); if  the
          name used in this context is not an array name, the behavior is undefined

       A valid array index shall consist of one or more comma-separated expressions, similar to the way in which
       multi-dimensional  arrays  are indexed in some programming languages.  Because awk arrays are really one-
       dimensional, such a comma-separated list shall be converted to  a  single  string  by  concatenating  the
       string  values  of  the  separate  expressions,  each separated from the other by the value of the SUBSEP
       variable.  Thus, the following two index operations shall be equivalent:

              var[expr1, expr2, ... exprn]

              var[expr1 SUBSEP expr2 SUBSEP ... SUBSEP exprn]

       The application shall ensure that a multi-dimensioned index used with the in operator  is  parenthesized.
       The  in  operator,  which  tests  for  the  existence of a particular array element, shall not cause that
       element to exist. Any other reference to a nonexistent array element shall automatically create it.
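
       The array rules above can be sketched as follows, assuming a POSIX-conforming awk on PATH. The array
       names grid and seen are hypothetical. The example shows that a comma-separated index is equivalent to
       joining the parts with SUBSEP, that the in operator does not create an element, and that any other
       reference does.

```shell
out=$(awk 'BEGIN {
    grid[1, 2] = "x"
    if ((1, 2) in grid)        print "comma index found"
    if ((1 SUBSEP 2) in grid)  print "SUBSEP index found"

    if (!("k" in seen))        print "k absent"
    n = 0; for (i in seen) n++
    print "after in-test:", n      # the in operator created nothing

    dummy = seen["k"]              # a plain reference creates the element
    n = 0; for (i in seen) n++
    print "after reference:", n
}')
echo "$out"
# comma index found
# SUBSEP index found
# k absent
# after in-test: 0
# after reference: 1
```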

       Comparisons (with the '<', "<=", "!=", "==", '>', and ">=" operators) shall be made numerically if
       both  operands  are numeric, if one is numeric and the other has a string value that is a numeric string,
       or if one is numeric and the other has the uninitialized value. Otherwise, operands shall be converted to
       strings as required and a string comparison shall be made using the locale-specific  collation  sequence.
       The value of the comparison expression shall be 1 if the relation is true, or 0 if the relation is false.
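
       The comparison rules can be sketched as follows, assuming a POSIX-conforming awk on PATH. The input
       "10 9" is invented; LC_ALL=C is set so that the string comparisons use a predictable collation
       sequence.

```shell
cmp=$(printf '10 9\n' | LC_ALL=C awk '{
    print ($1 < $2)          # both fields are numeric strings: 10 < 9 is false
    print ($1 "" < $2 "")    # concatenation forces strings: "10" < "9" is true
    print ("10" < 9)         # a string constant is not a numeric string
}')
echo "$cmp"
# 0
# 1
# 1
```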

   Variables and Special Variables
       Variables  can  be used in an awk program by referencing them.  With the exception of function parameters
       (see User-Defined Functions ), they are not explicitly declared. Function parameter names shall be  local
       to  the  function;  all  other  variable names shall be global. The same name shall not be used as both a
       function parameter name and as the name of a function or a special awk variable. The same name shall  not
       be  used both as a variable name with global scope and as the name of a function. The same name shall not
       be used within the same scope both as a scalar  variable  and  as  an  array.   Uninitialized  variables,
       including  scalar  variables,  array elements, and field variables, shall have an uninitialized value. An
       uninitialized value shall have both a numeric value of zero and a  string  value  of  the  empty  string.
       Evaluation  of variables with an uninitialized value, to either string or numeric, shall be determined by
       the context in which they are used.

       Field variables shall be designated by a '$' followed by a number or numerical expression. The effect  of
       the  field  number  expression  evaluating  to anything other than a non-negative integer is unspecified;
       uninitialized variables or string values need not be converted to numeric values  in  this  context.  New
       field  variables can be created by assigning a value to them.  References to nonexistent fields (that is,
       fields after $NF), shall evaluate to the uninitialized  value.  Such  references  shall  not  create  new
       fields.  However,  assigning  to a nonexistent field (for example, $(NF+2)=5) shall increase the value of
       NF; create any intervening fields with the  uninitialized  value;  and  cause  the  value  of  $0  to  be
       recomputed,  with the fields being separated by the value of OFS. Each field variable shall have a string
       value or an uninitialized value when created.  Field variables shall have the  uninitialized  value  when
       created  from  $0  using  FS  and the variable does not contain any characters. If appropriate, the field
       variable shall be considered a numeric string (see Expressions in awk ).
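
       The behavior of nonexistent fields can be sketched as follows, assuming a POSIX-conforming awk on
       PATH. The input and the OFS value "-" are chosen only to make the recomputed $0 easy to read.

```shell
out=$(printf 'a b\n' | awk 'BEGIN { OFS = "-" } {
    x = $5            # referencing a nonexistent field does not create it
    print NF
    $(NF + 2) = 5     # assigning to one raises NF and rebuilds $0 with OFS
    print NF
    print $0
}')
echo "$out"
# 2
# 4
# a-b--5
```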

       Implementations shall support the following other special variables that are set by awk:

       ARGC   The number of elements in the ARGV array.

       ARGV   An array of command line arguments, excluding options and the program argument, numbered from zero
              to ARGC-1.

       The arguments in ARGV can be modified or added to; ARGC can be altered. As  each  input  file  ends,  awk
       shall  treat the next non-null element of ARGV, up to the current value of ARGC-1, inclusive, as the name
       of the next input file. Thus, setting an element of ARGV to null means that it shall not be treated as an
       input file. The name '-' indicates the standard input. If an argument matches the format of an assignment
       operand, this argument shall be treated as an assignment rather than a file argument.
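
       As a sketch of modifying ARGV (assuming a POSIX-conforming awk on PATH), the following sets one
       element to null in a BEGIN action so that it is not treated as an input file. The operand name
       no-such-file is hypothetical and is never opened.

```shell
out=$(printf 'from stdin\n' | awk '
    BEGIN { print ARGC - 1, ARGV[1], ARGV[2]; ARGV[1] = "" }
          { print "read:", $0 }
' no-such-file -)
echo "$out"
# 2 no-such-file -
# read: from stdin
```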

       CONVFMT
              The printf format for converting numbers to strings (except for output statements, where  OFMT  is
              used); "%.6g" by default.

       ENVIRON
              An  array representing the value of the environment, as described in the exec functions defined in
              the System Interfaces volume of IEEE Std 1003.1-2001. The indices of the array  shall  be  strings
              consisting of the names of the environment variables, and the value of each array element shall be
              a  string consisting of the value of that variable. If appropriate, the environment variable shall
              be considered a numeric string (see Expressions in awk ); the array element shall  also  have  its
              numeric value.

       In all cases where the behavior of awk is affected by environment variables (including the environment of
       any  commands  that  awk  executes  via  the  system function or via pipeline redirections with the print
       statement, the printf statement, or the getline function), the environment used shall be the  environment
       at the time awk began executing; it is implementation-defined whether any modification of ENVIRON affects
       this environment.

       FILENAME
              A  pathname of the current input file. Inside a BEGIN action the value is undefined. Inside an END
              action the value shall be the name of the last input file processed.

       FNR    The ordinal number of the current record in the current file. Inside  a  BEGIN  action  the  value
              shall  be zero. Inside an END action the value shall be the number of the last record processed in
              the last file processed.

       FS     Input field separator regular expression; a <space> by default.

       NF     The number of fields in the current record. Inside a BEGIN action, the  use  of  NF  is  undefined
              unless a getline function without a var argument is executed previously.  Inside an END action, NF
              shall  retain  the value it had for the last record read, unless a subsequent, redirected, getline
              function without a var argument is performed prior to entering the END action.

       NR     The ordinal number of the current record from the start of input.  Inside a BEGIN action the value
              shall be zero. Inside an END action the value shall be the number of the last record processed.

       OFMT   The printf format for converting numbers to strings in output statements (see Output Statements );
              "%.6g" by default. The result of the conversion is unspecified if the  value  of  OFMT  is  not  a
              floating-point format specification.

       OFS    The print statement output field separator; a <space> by default.

       ORS    The print statement output record separator; a <newline> by default.

       RLENGTH
              The length of the string matched by the match function.

       RS     The  first character of the string value of RS shall be the input record separator; a <newline> by
              default. If RS contains more than one character, the results are unspecified. If RS is null, then
              records are separated by sequences consisting of a <newline> plus one or more blank lines; leading
              or trailing blank lines shall not result in empty records at the beginning or end of the input;
              and a <newline> shall always be a field separator, no matter what the value of FS is.

       RSTART The  starting  position  of the string matched by the match function, numbering from 1. This shall
              always be equivalent to the return value of the match function.

       SUBSEP The subscript separator string for multi-dimensional arrays; the default value is  implementation-
              defined.
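
       Two of the special variables above can be sketched briefly, assuming a POSIX-conforming awk on PATH.
       The first command uses a null RS, so blank-line-separated blocks become records and each <newline>
       acts as a field separator; the second shows RSTART and RLENGTH after a call to match. All input
       text is invented.

```shell
paras=$(printf '\nname A\nage 1\n\n\nname B\nage 2\n\n' |
    awk 'BEGIN { RS = "" } { print NR ": " $1 "=" $2 ", " $3 "=" $4 }')
echo "$paras"
# 1: name=A, age=1
# 2: name=B, age=2

pos=$(awk 'BEGIN { r = match("foobar", /o+/); print r, RSTART, RLENGTH }')
echo "$pos"           # 2 2 2
```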

   Regular Expressions
       The  awk  utility  shall  make  use of the extended regular expression notation (see the Base Definitions
       volume of IEEE Std 1003.1-2001, Section 9.4, Extended Regular Expressions) except that it shall allow the
       use of C-language conventions for escaping special characters within the EREs, as specified in the  table
       in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format Notation ('\\', '\a',
       '\b', '\f', '\n', '\r', '\t', '\v') and the following table; these escape sequences shall be
       recognized  both  inside  and  outside  bracket  expressions.  Note that records need not be separated by
       <newline>s and string constants can contain <newline>s, so even the "\n" sequence is valid in  awk  EREs.
       Using a slash character within an ERE requires the escaping shown in the following table.

                                            Table: Escape Sequences in awk
                        Escape
                        Sequence Description                    Meaning
                        \"       Backslash quotation-mark       Quotation-mark character
                        \/       Backslash slash                Slash character
                        \ddd     A backslash character followed The character whose encoding
                                 by the longest sequence of     is represented by the one,
                                 one, two, or three octal-digit two, or three-digit octal
                                 characters (01234567). If all  integer. Multi-byte characters
                                 of the digits are 0 (that is,  require multiple, concatenated
                                 representation of the NUL      escape sequences of this type,
                                 character), the behavior is    including the leading '\' for
                                 undefined.                     each byte.
                        \c       A backslash character followed Undefined
                                 by any character not described
                                 in this table or in the table
                                 in the Base Definitions volume
                                 of IEEE Std 1003.1-2001,
                                 Chapter 5, File Format
                                 Notation ( '\\' , '\a' , '\b'
                                 , '\f' , '\n' , '\r' , '\t' ,
                                 '\v' ).

       A  regular  expression  can be matched against a specific field or string by using one of the two regular
       expression matching operators, '~' and "!~". These operators shall interpret their right-hand operand as
       a regular expression and their left-hand operand as a string.  If  the  regular  expression  matches  the
       string,  the  '~'  expression shall evaluate to a value of 1, and the "!~" expression shall evaluate to a
       value of 0. (The regular expression matching operation is as defined by the  term  matched  in  the  Base
       Definitions  volume  of  IEEE Std 1003.1-2001, Section 9.1, Regular Expression Definitions, where a match
       occurs on any part of the string unless the regular expression is limited with the circumflex  or  dollar
       sign  special  characters.) If the regular expression does not match the string, the '~' expression shall
       evaluate to a value of 0, and the "!~" expression shall evaluate to a  value  of  1.  If  the  right-hand
       operand  is  any expression other than the lexical token ERE, the string value of the expression shall be
       interpreted as an extended regular expression, including the escape conventions  described  above.   Note
       that  these  same  escape  conventions shall also be applied in determining the value of a string literal
       (the lexical token STRING), and thus shall be applied a second time when a string literal is used in this
       context.
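
       The matching operators and the double application of escape conventions can be sketched as follows.
       This is an illustrative example only; the input lines and the variable names out_ops, out_str, and re
       are hypothetical.

```shell
# '~' yields 1 on a match and 0 otherwise; '!~' yields the opposite.
out_ops=$(printf 'alpha\nbeta 42\n' |
    awk '{ print ($0 ~ /[0-9]+/), ($0 !~ /[0-9]+/) }')
printf '%s\n' "$out_ops"

# A string used as a regular expression is processed twice: the literal
# "a\\.b" becomes a\.b as a string, which as an ERE matches only a real dot.
out_str=$(awk 'BEGIN { re = "a\\.b"; print ("a.b" ~ re), ("axb" ~ re) }')
printf '%s\n' "$out_str"
```

       The first command prints "0 1" for the line without digits and "1 0" for the line with digits; the
       second prints "1 0".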

       When an ERE token appears as an expression in any context other than as the right-hand side of the '~' or
       "!~" operator or as one of the built-in function arguments described below, the value of the resulting
       expression shall be the equivalent of:

              $0 ~ /ere/

       The ere argument to the gsub, match, and sub functions, and the fs argument to the split function (see
       String Functions), shall be interpreted as extended regular expressions. These can be either ERE tokens
       or arbitrary expressions, and shall be interpreted in the same manner as the right-hand side of the '~'
       or "!~" operator.

       An  extended regular expression can be used to separate fields by using the -F ERE option or by assigning
       a string containing the expression to the built-in variable FS. The default  value  of  the  FS  variable
       shall be a single <space>. The following describes FS behavior:

        1. If FS is a null string, the behavior is unspecified.

        2. If FS is a single character:

            a. If FS is <space>, skip leading and trailing <blank>s; fields shall be delimited by sets of one or
               more <blank>s.

            b. Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of
               c.

        3. Otherwise,  the  string  value  of  FS shall be considered to be an extended regular expression. Each
           occurrence of a sequence matching the extended regular expression shall delimit fields.
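
       The three defined FS cases above can be sketched as follows. The sample inputs and the shell variable
       names are hypothetical, and the null-string case is omitted because its behavior is unspecified.

```shell
# Default FS (a single <space>): leading and trailing blanks are skipped
# and runs of blanks count as one delimiter.
nf_default=$(printf '  a   b  \n' | awk '{ print NF }')

# Any other single character: every occurrence delimits, so "a::b"
# has an empty second field.
nf_colon=$(printf 'a::b\n' | awk -F: '{ print NF }')

# Longer value: treated as an ERE; each match of [0-9]+ is one separator.
fields_ere=$(printf 'a1b22c\n' | awk -F'[0-9]+' '{ print $1 "-" $2 "-" $3 }')

printf '%s %s %s\n' "$nf_default" "$nf_colon" "$fields_ere"
```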

       Except for the '~' and "!~" operators, and in the gsub, match, split, and  sub  built-in  functions,  ERE
       matching  shall  be  based on input records; that is, record separator characters (the first character of
       the value of the variable RS, <newline> by  default)  cannot  be  embedded  in  the  expression,  and  no
       expression  shall  match  the  record  separator  character.  If  the  record separator is not <newline>,
       <newline>s embedded in the expression can be matched. For the '~' and "!~" operators, and in  those  four
       built-in  functions,  ERE  matching  shall  be  based  on text strings; that is, any character (including
       <newline> and the record separator) can be embedded in the pattern,  and  an  appropriate  pattern  shall
       match  any  character.  However,  in  all  awk ERE matching, the use of one or more NUL characters in the
       pattern, input record, or text string produces undefined results.

   Patterns
       A pattern is any valid expression, a range specified by two expressions separated by a comma, or  one  of
       the two special patterns BEGIN or END.

   Special Patterns
       The awk utility shall recognize two special patterns, BEGIN and END. Each BEGIN pattern shall be matched
       once and its associated action executed before the first record of input is read (except possibly by use
       of the getline function, see Input/Output and General Functions, in a prior BEGIN action) and before
       command line assignment is done. Each END pattern shall be matched once and its associated action
       executed after the last record of input has been read. These two patterns shall have associated actions.

       BEGIN  and  END  shall not combine with other patterns. Multiple BEGIN and END patterns shall be allowed.
       The actions associated with the BEGIN patterns shall be executed in the order specified in  the  program,
       as are the END actions. An END pattern can precede a BEGIN pattern in a program.

       If  an  awk  program  consists  of  only actions with the pattern BEGIN, and the BEGIN action contains no
       getline function, awk shall exit without reading its input when the last  statement  in  the  last  BEGIN
       action  is executed. If an awk program consists of only actions with the pattern END or only actions with
       the patterns BEGIN and END, the input shall be  read  before  the  statements  in  the  END  actions  are
       executed.
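
       A small sketch of the ordering rules, with hypothetical input: the END pattern is written first in the
       program, yet both BEGIN actions run first, in program order, and the END action runs after all input
       has been read.

```shell
out=$(printf 'x\ny\n' | awk '
    END   { print "end", NR }
    BEGIN { print "first" }
    BEGIN { print "second" }
')
printf '%s\n' "$out"
```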

   Expression Patterns
       An  expression pattern shall be evaluated as if it were an expression in a Boolean context. If the result
       is true, the pattern shall be considered to match, and the associated action (if any) shall be  executed.
       If the result is false, the action shall not be executed.

   Pattern Ranges
       A  pattern  range  consists  of  two  expressions separated by a comma; in this case, the action shall be
       performed for all records between a match of the first expression and the following match of  the  second
       expression,  inclusive.  At  this  point,  the  pattern  range  can be repeated starting at input records
       subsequent to the end of the matched range.
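
       For illustration only, the following sketch applies a pattern range to hypothetical input in which the
       lines "on" and "off" mark the start and end of each range. The range is inclusive and restarts at the
       second "on".

```shell
out=$(printf 'a\non\nb\noff\nc\non\nd\noff\n' |
    awk '/^on$/,/^off$/ { print NR ": " $0 }')
printf '%s\n' "$out"
```

       Records 1 and 5 fall outside both ranges and are not printed.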

   Actions
       An action is a sequence of statements as shown in the grammar in Grammar . Any single  statement  can  be
       replaced  by  a  statement  list  enclosed  in  braces. The application shall ensure that statements in a
       statement list are separated by <newline>s or  semicolons.  Statements  in  a  statement  list  shall  be
       executed sequentially in the order that they appear.

       The  expression  acting as the conditional in an if statement shall be evaluated and if it is non-zero or
       non-null, the following statement shall be  executed;  otherwise,  if  else  is  present,  the  statement
       following the else shall be executed.

       The  if,  while,  do...  while,  for, break, and continue statements are based on the ISO C standard (see
       Concepts Derived from the ISO C Standard ), except that the  Boolean  expressions  shall  be  treated  as
       described in Expressions in awk , and except in the case of:

              for (variable in array)

       which  shall  iterate,  assigning each index of array to variable in an unspecified order. The results of
       adding new elements to array within such a for loop are undefined.  If  a  break  or  continue  statement
       occurs outside of a loop, the behavior is undefined.
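
       Because the iteration order of for (variable in array) is unspecified, a program that needs a stable
       order has to impose one itself. The sketch below, using hypothetical input and array names, counts
       words and passes the result through sort.

```shell
out=$(printf 'a b a\nc a b\n' |
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (word in count) print word, count[word] }' |
    sort)
printf '%s\n' "$out"
```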

       The  delete  statement  shall  remove  an  individual array element.  Thus, the following code deletes an
       entire array:

              for (i in array)
                  delete array[i]

       The next statement shall cause all further processing of the current input record to  be  abandoned.  The
       behavior is undefined if a next statement appears or is invoked in a BEGIN or END action.

       The  exit  statement  shall invoke all END actions in the order in which they occur in the program source
       and then terminate the program without reading further input. An exit  statement  inside  an  END  action
       shall terminate the program without further execution of END actions. If an expression is specified in an
       exit  statement,  its  numeric  value  shall  be  the  exit  status  of awk, unless subsequent errors are
       encountered or a subsequent exit statement with an expression is executed.
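
       The next and exit statements can be sketched as follows. The marker lines "skip" and "stop" and the
       exit status 3 are hypothetical choices made for this example.

```shell
status=0
out=$(printf '1\nskip\n2\nstop\n3\n' | awk '
    $0 == "skip" { next }      # abandon this record and read the next one
    $0 == "stop" { exit 3 }    # stop reading input; END actions still run
    { print }
    END { print "done" }
') || status=$?
printf '%s\nstatus=%s\n' "$out" "$status"
```

       The record "3" is never read, the END action still prints "done", and the exit status of awk is 3.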

   Output Statements
       Both print and printf statements shall write to standard output by default. The output shall  be  written
       to the location specified by output_redirection if one is supplied, as follows:

              > expression
              >> expression
              | expression

       In all cases, the expression shall be evaluated to produce a string that is used as a pathname into which
       to write (for '>' or ">>") or as a command to be executed (for '|'). Using the first two forms, if the
       file of that name is not currently open, it shall be opened, creating it if necessary and, using the
       first form, truncating the file. The output then shall be appended to the file. As long as the file
       remains open, subsequent calls in which expression evaluates to the same string value shall simply
       append output to the file. The file remains open until the close function (see Input/Output and General
       Functions) is called with an expression that evaluates to the same string value.

       The  third  form  shall  write  output onto a stream piped to the input of a command. The stream shall be
       created if no stream is currently open with the value of expression as  its  command  name.   The  stream
       created  shall  be  equivalent  to  one  created  by a call to the popen() function defined in the System
       Interfaces volume of IEEE Std 1003.1-2001 with the value of expression as  the  command  argument  and  a
       value of w as the mode argument. As long as the stream remains open, subsequent calls in which expression
       evaluates  to  the  same  string value shall write output to the existing stream. The stream shall remain
       open until the close function (see Input/Output and General Functions ) is called with an expression that
       evaluates to the same string value.  At that time, the stream shall be closed as if  by  a  call  to  the
       pclose() function defined in the System Interfaces volume of IEEE Std 1003.1-2001.
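
       A minimal sketch of output redirection to a file and to a command follows. The temporary directory,
       the per-key file names, and the variable names are assumptions made for the example. The pathname is
       built in a variable first so that the redirection target is a single unambiguous expression.

```shell
tmp=$(mktemp -d)
printf 'b 2\na 1\nb 3\n' | awk -v dir="$tmp" '{
    file = dir "/" $1 ".txt"   # hypothetical per-key output file
    print $2 > file            # truncated when first opened, appended to afterwards
}'
b_contents=$(cat "$tmp/b.txt")

# Output piped to a command; close() ends the stream as if by pclose().
sorted=$(awk 'BEGIN {
    print "pear"  | "sort"
    print "apple" | "sort"
    close("sort")
}')
rm -r "$tmp"
printf '%s\n%s\n' "$b_contents" "$sorted"
```

       Both values written for key "b" end up in the same file because the file stays open between the two
       print statements.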

       As  described  in detail by the grammar in Grammar , these output statements shall take a comma-separated
       list of expressions referred to in the grammar by the non-terminal symbols expr_list, print_expr_list, or
       print_expr_list_opt. This list is referred to here as the expression list, and each member is referred to
       as an expression argument.

       The print statement shall write the value of each expression argument onto the  indicated  output  stream
       separated  by  the  current output field separator (see variable OFS above), and terminated by the output
       record separator (see variable ORS above). All expression arguments shall  be  taken  as  strings,  being
       converted  if necessary; this conversion shall be as described in Expressions in awk , with the exception
       that the printf format in OFMT shall be used instead of the value in CONVFMT. An  empty  expression  list
       shall stand for the whole input record ($0).
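
       The roles of OFS, ORS, and OFMT in the print statement can be sketched as follows. The values chosen
       are arbitrary, and the last line relies on the default CONVFMT of "%.6g".

```shell
out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    print "a", "b"             # arguments joined by OFS, record ended by ORS
    ORS = "\n"; OFMT = "%.2f"
    x = 3.14159
    print x                    # a number is converted using OFMT
    print x ""                 # concatenation converts using CONVFMT instead
}')
printf '%s\n' "$out"
```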

       The printf statement shall produce output based on a notation similar to the File Format Notation used to
       describe  file  formats  in  this  volume  of  IEEE Std 1003.1-2001  (see  the Base Definitions volume of
       IEEE Std 1003.1-2001, Chapter 5, File Format Notation).  Output shall be produced as specified  with  the
       first expression argument as the string format and subsequent expression arguments as the strings arg1 to
       argn, inclusive, with the following exceptions:

        1. The  format shall be an actual character string rather than a graphical representation. Therefore, it
           cannot contain empty character positions. The <space> in the format string, in any context other than
           a flag of a conversion specification, shall be treated as an ordinary character that is copied to the
           output.

        2. If the character set contains a 'Δ' character and that character appears in the format string, it
           shall be treated as an ordinary character that is copied to the output.

        3. The  escape  sequences beginning with a backslash character shall be treated as sequences of ordinary
           characters that are copied to the output.  Note  that  these  same  sequences  shall  be  interpreted
           lexically  by awk when they appear in literal strings, but they shall not be treated specially by the
           printf statement.

        4. A field width or precision can be specified as the '*' character instead of a digit string.  In  this
           case  the  next argument from the expression list shall be fetched and its numeric value taken as the
           field width or precision.

        5. The implementation shall not precede or follow output from the d or u conversion specifier characters
           with <blank>s not specified by the format string.

        6. The implementation shall not precede output from the o conversion specifier  character  with  leading
           zeros not specified by the format string.

        7. For  the  c  conversion specifier character: if the argument has a numeric value, the character whose
           encoding is that value shall be output. If the value is zero or is not the encoding of any  character
           in  the  character set, the behavior is undefined. If the argument does not have a numeric value, the
           first character of the string value shall be output; if the string does not contain  any  characters,
           the behavior is undefined.

        8. For  each  conversion  specification that consumes an argument, the next expression argument shall be
           evaluated. With the exception of the c conversion specifier character, the value shall  be  converted
           (according  to the rules specified in Expressions in awk ) to the appropriate type for the conversion
           specification.

        9. If there are insufficient expression arguments to satisfy all the conversion  specifications  in  the
           format string, the behavior is undefined.

       10. If any character sequence in the format string begins with a '%' character, but does not form a valid
           conversion specification, the behavior is unspecified.
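
       Some of the printf rules above (items 4 and 7 in particular) can be sketched as follows. The example
       assumes an ASCII-compatible character set, in which the value 65 encodes "A"; the arguments are
       arbitrary.

```shell
out=$(awk 'BEGIN {
    printf "%*d|%-5s|%.2f\n", 4, 42, "ab", 2.5   # field width 4 taken from the argument list
    printf "%c%c\n", 65, "xyz"                   # numeric value, then first character of a string
    printf "%d%%\n", 99
}')
printf '%s\n' "$out"
```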

       Both print and printf can output at least {LINE_MAX} bytes.

   Functions
       The awk language has a variety of built-in functions: arithmetic, string, input/output, and general.

   Arithmetic Functions
       The arithmetic functions, except for int, shall be based on the ISO C standard (see Concepts Derived from
       the ISO C Standard ). The behavior is undefined in cases where the ISO C standard specifies that an error
       be  returned  or  that  the  behavior  is undefined. Although the grammar (see Grammar ) permits built-in
       functions to appear with no arguments or parentheses, unless the argument or parentheses are indicated as
       optional in the following list (by displaying them within the "[]" brackets), such use is undefined.

       atan2(y,x)
              Return arctangent of y/x in radians in the range [-pi,pi].

       cos(x) Return cosine of x, where x is in radians.

       sin(x) Return sine of x, where x is in radians.

       exp(x) Return the exponential function of x.

       log(x) Return the natural logarithm of x.

       sqrt(x)
              Return the square root of x.

       int(x) Return the argument truncated to an integer. Truncation shall be toward 0 when x>0.

       rand() Return a random number n, such that 0<=n<1.

       srand([expr])
              Set the seed value for rand to expr or use the time of day if expr is omitted. The  previous  seed
              value shall be returned.
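
       A short sketch of several arithmetic functions follows. Only a positive argument is passed to int,
       since that is the case the text above specifies. The seed value 1 is arbitrary.

```shell
out=$(awk 'BEGIN {
    printf "%d %d %d\n", int(3.9), sqrt(16), exp(0)
    printf "%.4f\n", atan2(0, -1)          # pi, rounded to four decimals
    srand(1); a = rand()
    srand(1); b = rand()                   # same seed, same sequence
    print (a == b), (a >= 0 && a < 1)
}')
printf '%s\n' "$out"
```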

   String Functions
       The  string  functions  in  the  following  list  shall be supported. Although the grammar (see Grammar )
       permits built-in functions to appear with no arguments or parentheses, unless the argument or parentheses
       are indicated as optional in the following list (by displaying them within the "[]" brackets),  such  use
       is undefined.

       gsub(ere, repl[, in])
              Behave  like  sub  (see  below),  except  that  it  shall  replace  all occurrences of the regular
              expression (like the ed utility global substitute) in $0 or in the in argument, when specified.

       index(s, t)
              Return the position, in characters, numbering from 1, in string s where string t first occurs,  or
              zero if it does not occur at all.

       length[([s])]
              Return  the  length, in characters, of its argument taken as a string, or of the whole record, $0,
              if there is no argument.

       match(s, ere)
              Return the position, in characters, numbering from 1, in  string  s  where  the  extended  regular
              expression  ere  occurs,  or zero if it does not occur at all. RSTART shall be set to the starting
              position (which is the same as the returned value), zero if no match is found;  RLENGTH  shall  be
              set to the length of the matched string, -1 if no match is found.

       split(s, a[, fs])
              Split  the  string  s into array elements a[1], a[2], ..., a[n], and return n. All elements of the
              array shall be deleted before the split is performed. The separation shall be done with the ERE fs
              or with the field separator FS if fs is not given. Each array element shall have  a  string  value
              when  created  and,  if  appropriate,  the array element shall be considered a numeric string (see
              Expressions in awk ). The effect of a null string as the value of fs is unspecified.

       sprintf(fmt, expr, expr, ...)
              Format the expressions according to the printf format  given  by  fmt  and  return  the  resulting
              string.

       sub(ere, repl[, in])
              Substitute  the  string repl in place of the first instance of the extended regular expression ERE
              in string in and return the number of substitutions. An ampersand ( '&' ) appearing in the  string
              repl  shall  be  replaced by the string from in that matches the ERE. An ampersand preceded with a
              backslash ( '\' ) shall be interpreted as the literal ampersand character. An  occurrence  of  two
              consecutive  backslashes  shall  be  interpreted as just a single literal backslash character. Any
              other occurrence of a backslash (for example, preceding any other character) shall be treated as a
              literal backslash character. Note that if repl is a string literal (the lexical token STRING;  see
              Grammar  ), the handling of the ampersand character occurs after any lexical processing, including
              any lexical backslash escape sequence processing. If in is specified and it is not an lvalue  (see
              Expressions  in  awk  ),  the  behavior  is undefined. If in is omitted, awk shall use the current
              record ($0) in its place.

       substr(s, m[, n])
              Return the at most n-character substring of s that begins at position m, numbering from 1. If n is
              omitted, or if n specifies more characters than  are  left  in  the  string,  the  length  of  the
              substring shall be limited by the length of the string s.

       tolower(s)
              Return  a  string based on the string s. Each character in s that is an uppercase letter specified
              to have a tolower mapping by the LC_CTYPE category of the current locale shall be replaced in  the
              returned  string  by the lowercase letter specified by the mapping. Other characters in s shall be
              unchanged in the returned string.

       toupper(s)
              Return a string based on the string s. Each character in s that is a lowercase letter specified to
              have a toupper mapping by the LC_CTYPE category of the current locale is replaced in the  returned
              string  by  the  uppercase letter specified by the mapping. Other characters in s are unchanged in
              the returned string.

       All of the preceding functions that take ERE  as  a  parameter  expect  a  pattern  or  a  string  valued
       expression that is a regular expression as defined in Regular Expressions .
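
       The following sketch exercises several of the string functions with hypothetical data. Note how the
       string literal "\\&" is reduced to \& by lexical processing and then yields a literal ampersand in the
       replacement.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "[&]", s)                # & stands for the matched text
    print n, s
    t = "a.b"
    sub(/\./, "\\&", t)                    # replaces the dot with a literal &
    print t
    where = match("foobar", /o+/)
    print where, RSTART, RLENGTH
    count = split("x:y:z", parts, ":")
    print count, parts[2]
    print index("abcdef", "cd"), length("abc"), substr("abcdef", 2, 3), toupper("abc")
}')
printf '%s\n' "$out"
```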

   Input/Output and General Functions
       The input/output and general functions are:

       close(expression)
              Close  the  file  or pipe opened by a print or printf statement or a call to getline with the same
              string-valued expression. The limit on the number of open expression arguments is  implementation-
              defined.  If  the close was successful, the function shall return zero; otherwise, it shall return
              non-zero.

       expression | getline [var]
              Read a record of input from a stream piped from the output of a  command.   The  stream  shall  be
              created  if  no  stream  is  currently  open with the value of expression as its command name. The
              stream created shall be equivalent to one created by a call to the popen() function with the value
              of expression as the command argument and a value of r as the mode argument. As long as the stream
              remains open, subsequent calls in which expression evaluates to the same string value  shall  read
              subsequent  records  from  the  stream.  The  stream shall remain open until the close function is
              called with an expression that evaluates to the same string value. At that time, the stream  shall
              be  closed  as  if  by a call to the pclose() function. If var is omitted, $0 and NF shall be set;
              otherwise, var shall be set and, if appropriate, it shall be  considered  a  numeric  string  (see
              Expressions in awk ).

       The  getline  operator  can form ambiguous constructs when there are unparenthesized operators (including
       concatenate) to the left of the '|' (to the beginning of  the  expression  containing  getline).  In  the
       context  of  the  '$' operator, '|' shall behave as if it had a lower precedence than '$' . The result of
       evaluating other operators is unspecified, and conforming applications shall  parenthesize  properly  all
       such usages.
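
       A minimal sketch of reading from a command follows. The command string is hypothetical. The getline
       expression is parenthesized before its result is compared, which avoids the ambiguity described above.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)
        print "got:", line
    close(cmd)
}')
printf '%s\n' "$out"
```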

       getline
              Set  $0  to  the next input record from the current input file. This form of getline shall set the
              NF, NR, and FNR variables.

       getline var
              Set variable var to the next input record from the current input file  and,  if  appropriate,  var
              shall be considered a numeric string (see Expressions in awk ). This form of getline shall set the
              FNR and NR variables.

       getline [var] < expression
              Read  the  next  record of input from a named file. The expression shall be evaluated to produce a
              string that is used as a pathname. If the file of that name is not currently  open,  it  shall  be
              opened.  As long as the stream remains open, subsequent calls in which expression evaluates to the
              same string value shall read subsequent records from the file. The file shall  remain  open  until
              the close function is called with an expression that evaluates to the same string value. If var is
              omitted,  $0  and  NF  shall  be set; otherwise, var shall be set and, if appropriate, it shall be
              considered a numeric string (see Expressions in awk ).

       The getline operator can form ambiguous  constructs  when  there  are  unparenthesized  binary  operators
       (including concatenate) to the right of the '<' (up to the end of the expression containing the getline).
       The  result of evaluating such a construct is unspecified, and conforming applications shall parenthesize
       properly all such usages.

       system(expression)
              Execute the command given by expression in a manner equivalent to the system() function defined in
              the System Interfaces volume of IEEE Std 1003.1-2001 and return the exit status of the command.

       All forms of getline shall return 1 for successful input, zero for end-of-file, and -1 for an error.
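
       The return values of getline can be sketched with the file-reading form. The temporary file and the
       nonexistent pathname below are assumptions made for the example. Each getline expression is
       parenthesized so that the '<' is not ambiguous.

```shell
tmp=$(mktemp)
printf 'k1 v1\nk2 v2\n' > "$tmp"
out=$(awk -v file="$tmp" 'BEGIN {
    while ((getline line < file) > 0) {    # 1 for each record, 0 at end-of-file
        split(line, kv, " ")
        print kv[2]
    }
    close(file)
    status = (getline line < "/nonexistent/hypothetical/path")
    print status                           # an error is reported as -1
}')
rm -f "$tmp"
printf '%s\n' "$out"
```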

       Where strings are used as the name of a file or pipeline, the application shall ensure that  the  strings
       are  textually  identical.   The  terminology "same string value" implies that "equivalent strings", even
       those that differ only by <space>s, represent different files.

   User-Defined Functions
       The awk language also provides user-defined functions. Such functions can be defined as:

              function name([parameter, ...]) { statements }

       A function can be referred to anywhere in an  awk  program;  in  particular,  its  use  can  precede  its
       definition. The scope of a function is global.

       Function  parameters,  if present, can be either scalars or arrays; the behavior is undefined if an array
       name is passed as a parameter that the function uses as a scalar, or if a scalar expression is passed  as
       a  parameter  that  the function uses as an array. Function parameters shall be passed by value if scalar
       and by reference if array name.

       The number of parameters in the function definition need not  match  the  number  of  parameters  in  the
       function  call.  Excess formal parameters can be used as local variables. If fewer arguments are supplied
       in a function call than are in the function definition,  the  extra  parameters  that  are  used  in  the
       function  body as scalars shall evaluate to the uninitialized value until they are otherwise initialized,
       and the extra parameters that are used in the function body as arrays shall be treated  as  uninitialized
       arrays where each element evaluates to the uninitialized value until otherwise initialized.

       When  invoking  a  function,  no  white  space  can  be  placed between the function name and the opening
       parenthesis. Function calls can be nested and recursive calls can be made  upon  functions.  Upon  return
       from  any nested or recursive function call, the values of all of the calling function's parameters shall
       be unchanged, except for array parameters passed by reference. The return statement can be used to return
       a value. If a return statement appears outside of a function definition, the behavior is undefined.

       In the function definition, <newline>s shall be optional before the opening brace and after  the  closing
       brace. Function definitions can appear anywhere in the program where a pattern-action pair is allowed.
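
       The parameter-passing rules can be sketched as follows. The function names fact, bump, and fill are
       hypothetical. By common convention, extra formal parameters used as locals are set apart by additional
       white space in the parameter list.

```shell
out=$(awk '
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }   # recursive call
function bump(x) { x++; return x }                         # scalar: passed by value
function fill(arr, n,    i) {                              # i: extra parameter used as a local
    for (i = 1; i <= n; i++)
        arr[i] = i * i                                     # array: passed by reference
}
BEGIN {
    v = 1; r = bump(v); fill(squares, 3)
    print fact(5), v, r, squares[3], (i == "")
}')
printf '%s\n' "$out"
```

       The caller's v is still 1 after bump(v), squares was filled in by fill, and the global i remains
       uninitialized because the i inside fill is local.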

   Grammar
       The  grammar in this section and the lexical conventions in the following section shall together describe
       the syntax for awk programs. The general conventions for this style of grammar are described  in  Grammar
       Conventions  . A valid program can be represented as the non-terminal symbol program in the grammar. This
       formal syntax shall take precedence over the preceding text syntax description.

              %token NAME NUMBER STRING ERE
              %token FUNC_NAME   /* Name followed by '(' without white space. */

              /* Keywords  */
              %token       Begin   End
              /*          'BEGIN' 'END'                            */

              %token       Break   Continue   Delete   Do   Else
              /*          'break' 'continue' 'delete' 'do' 'else'  */

              %token       Exit   For   Function   If   In
              /*          'exit' 'for' 'function' 'if' 'in'        */

              %token       Next   Print   Printf   Return   While
              /*          'next' 'print' 'printf' 'return' 'while' */

              /* Reserved function names */
              %token BUILTIN_FUNC_NAME
                          /* One token for the following:
                           * atan2 cos sin exp log sqrt int rand srand
                           * gsub index length match split sprintf sub
                           * substr tolower toupper close system
                           */
              %token GETLINE
                          /* Syntactically different from other built-ins. */

              /* Two-character tokens. */
              %token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
              /*     '+='       '-='       '*='       '/='       '%='       '^=' */

              %token OR   AND  NO_MATCH   EQ   LE   GE   NE   INCR  DECR  APPEND
              /*     '||' '&&' '!~'       '==' '<=' '>=' '!=' '++'  '--'  '>>'   */

              /* One-character tokens. */
              %token '{' '}' '(' ')' '[' ']' ',' ';' NEWLINE
              %token '+' '-' '*' '%' '^' '!' '>' '<' '|' '?' ':' '~' '$' '='

              %start program
              %%

              program          : item_list
                               | actionless_item_list
                               ;

              item_list        : newline_opt
                               | actionless_item_list item terminator
                               | item_list            item terminator
                               | item_list          action terminator
                               ;

              actionless_item_list : item_list            pattern terminator
                               | actionless_item_list pattern terminator
                               ;

              item             : pattern action
                               | Function NAME      '(' param_list_opt ')'
                                     newline_opt action
                               | Function FUNC_NAME '(' param_list_opt ')'
                                     newline_opt action
                               ;

              param_list_opt   : /* empty */
                               | param_list
                               ;

              param_list       : NAME
                               | param_list ',' NAME
                               ;

              pattern          : Begin
                               | End
                               | expr
                               | expr ',' newline_opt expr
                               ;

              action           : '{' newline_opt                             '}'
                               | '{' newline_opt terminated_statement_list   '}'
                               | '{' newline_opt unterminated_statement_list '}'
                               ;

              terminator       : terminator ';'
                               | terminator NEWLINE
                               |            ';'
                               |            NEWLINE
                               ;

              terminated_statement_list : terminated_statement
                               | terminated_statement_list terminated_statement
                               ;

              unterminated_statement_list : unterminated_statement
                               | terminated_statement_list unterminated_statement
                               ;

              terminated_statement : action newline_opt
                               | If '(' expr ')' newline_opt terminated_statement
                               | If '(' expr ')' newline_opt terminated_statement
                                     Else newline_opt terminated_statement
                               | While '(' expr ')' newline_opt terminated_statement
                               | For '(' simple_statement_opt ';'
                                    expr_opt ';' simple_statement_opt ')' newline_opt
                                    terminated_statement
                               | For '(' NAME In NAME ')' newline_opt
                                    terminated_statement
                               | ';' newline_opt
                               | terminatable_statement NEWLINE newline_opt
                               | terminatable_statement ';'     newline_opt
                               ;

              unterminated_statement : terminatable_statement
                               | If '(' expr ')' newline_opt unterminated_statement
                               | If '(' expr ')' newline_opt terminated_statement
                                    Else newline_opt unterminated_statement
                               | While '(' expr ')' newline_opt unterminated_statement
                               | For '(' simple_statement_opt ';'
                                    expr_opt ';' simple_statement_opt ')' newline_opt
                                    unterminated_statement
                               | For '(' NAME In NAME ')' newline_opt
                                    unterminated_statement
                               ;

              terminatable_statement : simple_statement
                               | Break
                               | Continue
                               | Next
                               | Exit expr_opt
                               | Return expr_opt
                               | Do newline_opt terminated_statement While '(' expr ')'
                               ;

              simple_statement_opt : /* empty */
                               | simple_statement
                               ;

              simple_statement : Delete NAME '[' expr_list ']'
                               | expr
                               | print_statement
                               ;

              print_statement  : simple_print_statement
                               | simple_print_statement output_redirection
                               ;

              simple_print_statement : Print  print_expr_list_opt
                               | Print  '(' multiple_expr_list ')'
                               | Printf print_expr_list
                               | Printf '(' multiple_expr_list ')'
                               ;

              output_redirection : '>'    expr
                               | APPEND expr
                               | '|'    expr
                               ;

              expr_list_opt    : /* empty */
                               | expr_list
                               ;

              expr_list        : expr
                               | multiple_expr_list
                               ;

              multiple_expr_list : expr ',' newline_opt expr
                               | multiple_expr_list ',' newline_opt expr
                               ;

              expr_opt         : /* empty */
                               | expr
                               ;

              expr             : unary_expr
                               | non_unary_expr
                               ;

              unary_expr       : '+' expr
                               | '-' expr
                               | unary_expr '^'      expr
                               | unary_expr '*'      expr
                               | unary_expr '/'      expr
                               | unary_expr '%'      expr
                               | unary_expr '+'      expr
                               | unary_expr '-'      expr
                               | unary_expr          non_unary_expr
                               | unary_expr '<'      expr
                               | unary_expr LE       expr
                               | unary_expr NE       expr
                               | unary_expr EQ       expr
                               | unary_expr '>'      expr
                               | unary_expr GE       expr
                               | unary_expr '~'      expr
                               | unary_expr NO_MATCH expr
                               | unary_expr In NAME
                               | unary_expr AND newline_opt expr
                               | unary_expr OR  newline_opt expr
                               | unary_expr '?' expr ':' expr
                               | unary_input_function
                               ;

              non_unary_expr   : '(' expr ')'
                               | '!' expr
                               | non_unary_expr '^'      expr
                               | non_unary_expr '*'      expr
                               | non_unary_expr '/'      expr
                               | non_unary_expr '%'      expr
                               | non_unary_expr '+'      expr
                               | non_unary_expr '-'      expr
                               | non_unary_expr          non_unary_expr
                               | non_unary_expr '<'      expr
                               | non_unary_expr LE       expr
                               | non_unary_expr NE       expr
                               | non_unary_expr EQ       expr
                               | non_unary_expr '>'      expr
                               | non_unary_expr GE       expr
                               | non_unary_expr '~'      expr
                               | non_unary_expr NO_MATCH expr
                               | non_unary_expr In NAME
                               | '(' multiple_expr_list ')' In NAME
                               | non_unary_expr AND newline_opt expr
                               | non_unary_expr OR  newline_opt expr
                               | non_unary_expr '?' expr ':' expr
                               | NUMBER
                               | STRING
                               | lvalue
                               | ERE
                               | lvalue INCR
                               | lvalue DECR
                               | INCR lvalue
                               | DECR lvalue
                               | lvalue POW_ASSIGN expr
                               | lvalue MOD_ASSIGN expr
                               | lvalue MUL_ASSIGN expr
                               | lvalue DIV_ASSIGN expr
                               | lvalue ADD_ASSIGN expr
                               | lvalue SUB_ASSIGN expr
                               | lvalue '=' expr
                               | FUNC_NAME '(' expr_list_opt ')'
                                    /* no white space allowed before '(' */
                               | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
                               | BUILTIN_FUNC_NAME
                               | non_unary_input_function
                               ;

              print_expr_list_opt : /* empty */
                               | print_expr_list
                               ;

              print_expr_list  : print_expr
                               | print_expr_list ',' newline_opt print_expr
                               ;

              print_expr       : unary_print_expr
                               | non_unary_print_expr
                               ;

              unary_print_expr : '+' print_expr
                               | '-' print_expr
                               | unary_print_expr '^'      print_expr
                               | unary_print_expr '*'      print_expr
                               | unary_print_expr '/'      print_expr
                               | unary_print_expr '%'      print_expr
                               | unary_print_expr '+'      print_expr
                               | unary_print_expr '-'      print_expr
                               | unary_print_expr          non_unary_print_expr
                               | unary_print_expr '~'      print_expr
                               | unary_print_expr NO_MATCH print_expr
                               | unary_print_expr In NAME
                               | unary_print_expr AND newline_opt print_expr
                               | unary_print_expr OR  newline_opt print_expr
                               | unary_print_expr '?' print_expr ':' print_expr
                               ;

              non_unary_print_expr : '(' expr ')'
                               | '!' print_expr
                               | non_unary_print_expr '^'      print_expr
                               | non_unary_print_expr '*'      print_expr
                               | non_unary_print_expr '/'      print_expr
                               | non_unary_print_expr '%'      print_expr
                               | non_unary_print_expr '+'      print_expr
                               | non_unary_print_expr '-'      print_expr
                               | non_unary_print_expr          non_unary_print_expr
                               | non_unary_print_expr '~'      print_expr
                               | non_unary_print_expr NO_MATCH print_expr
                               | non_unary_print_expr In NAME
                               | '(' multiple_expr_list ')' In NAME
                               | non_unary_print_expr AND newline_opt print_expr
                               | non_unary_print_expr OR  newline_opt print_expr
                               | non_unary_print_expr '?' print_expr ':' print_expr
                               | NUMBER
                               | STRING
                               | lvalue
                               | ERE
                               | lvalue INCR
                               | lvalue DECR
                               | INCR lvalue
                               | DECR lvalue
                               | lvalue POW_ASSIGN print_expr
                               | lvalue MOD_ASSIGN print_expr
                               | lvalue MUL_ASSIGN print_expr
                               | lvalue DIV_ASSIGN print_expr
                               | lvalue ADD_ASSIGN print_expr
                               | lvalue SUB_ASSIGN print_expr
                               | lvalue '=' print_expr
                               | FUNC_NAME '(' expr_list_opt ')'
                                   /* no white space allowed before '(' */
                               | BUILTIN_FUNC_NAME '(' expr_list_opt ')'
                               | BUILTIN_FUNC_NAME
                               ;

              lvalue           : NAME
                               | NAME '[' expr_list ']'
                               | '$' expr
                               ;

              non_unary_input_function : simple_get
                               | simple_get '<' expr
                               | non_unary_expr '|' simple_get
                               ;

              unary_input_function : unary_expr '|' simple_get
                               ;

              simple_get       : GETLINE
                               | GETLINE lvalue
                               ;

              newline_opt      : /* empty */
                               | newline_opt NEWLINE
                               ;

       This grammar has several ambiguities that shall be resolved as follows:

        * Operator precedence and associativity shall be as described in Expressions in Decreasing Precedence in
          awk.

        * In case of ambiguity, an else shall be associated with the most immediately preceding  if  that  would
          satisfy the grammar.

        * In some contexts, a slash ( '/' ) that is used to surround an ERE could also be the division operator.
          This  shall  be  resolved  in  such a way that wherever the division operator could appear, a slash is
          assumed to be the division operator. (There is no unary division operator.)
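
       The else and slash rules above can be observed with a short sketch. The variable names (x, y, a, b, c)
       and the shell variables holding the output are illustrative only and do not come from the standard:

```shell
# Dangling else: the else pairs with the nearest if, that is "if (y)".
else_out=$(awk 'BEGIN {
    x = 1; y = 0
    if (x)
        if (y) print "inner-if"
        else print "inner-else"
}')
printf '%s\n' "$else_out"

# Slash after an operand: division could appear here, so both slashes
# are division operators and no ERE token is formed.
slash_out=$(awk 'BEGIN { a = 12; b = 3; c = 2; print a /b/ c }')
printf '%s\n' "$slash_out"
```

       If the else were attached to the outer if, the first program would print nothing; if /b/ were read as
       an ERE, the second would not compute (12 / 3) / 2.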

       One convention that might not be obvious from the formal grammar  is  where  <newline>s  are  acceptable.
       There  are  several  obvious  placements  such as terminating a statement, and a backslash can be used to
       escape <newline>s between any lexical tokens. In addition, <newline>s without backslashes  can  follow  a
       comma,  an  open  brace, logical AND operator ( "&&" ), logical OR operator ( "||" ), the do keyword, the
       else keyword, and the closing parenthesis of an if, for, or while statement. For example:

              { print  $1,
                       $2 }
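
       A slightly larger sketch of the same convention, with unescaped <newline>s after "&&", an open brace, a
       comma, the closing parenthesis of an if, and the else keyword. The sample input and the name nl_out are
       assumptions made for illustration:

```shell
nl_out=$(printf '1 2\n-1 2\n0 0\n' | awk '
$1 > 0 &&
$2 > 0 {
    print $1,
          $2
}
$1 == 0 {
    if ($2 == 0)
        print "zero"
    else
        print "nonzero"
}')
printf '%s\n' "$nl_out"
```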

   Lexical Conventions
       The lexical conventions for awk programs, with respect to the preceding grammar, shall be as follows:

        1. Except as noted, awk shall recognize the longest possible token or delimiter  beginning  at  a  given
           point.

        2. A comment shall consist of any characters beginning with the number sign character and terminated by,
           but  excluding  the next occurrence of, a <newline>. Comments shall have no effect, except to delimit
           lexical tokens.

        3. The <newline> shall be recognized as the token NEWLINE.

        4. A backslash character immediately followed by a <newline> shall have no effect.

        5. The token STRING shall represent a string constant. A string constant shall begin with the character
           '"'. Within a string constant, a backslash character shall be considered to begin an escape sequence
           as specified in the table in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File
           Format Notation ( '\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v' ). In addition, the escape
           sequences in Escape Sequences in awk shall be recognized. A <newline> shall not occur within a
           string constant. A string constant shall be terminated by the first unescaped occurrence of the
           character '"' after the one that begins the string constant. The value of the string shall be the
           sequence of all unescaped characters and values of escape sequences between, but not including, the
           two delimiting '"' characters.

        6. The  token  ERE represents an extended regular expression constant.  An ERE constant shall begin with
           the slash character.  Within an ERE constant, a backslash character shall be considered to  begin  an
           escape  sequence  as  specified  in the table in the Base Definitions volume of IEEE Std 1003.1-2001,
           Chapter 5, File Format Notation. In addition, the escape sequences in Escape Sequences in awk shall
           be recognized. The application shall ensure that a <newline> does not occur
           within an ERE constant. An ERE constant shall be terminated by the first unescaped occurrence of  the
           slash  character  after  the  one  that  begins  the  ERE  constant.  The extended regular expression
           represented by the ERE constant shall be the sequence of  all  unescaped  characters  and  values  of
           escape sequences between, but not including, the two delimiting slash characters.

        7. A <blank> shall have no effect, except to delimit lexical tokens or within STRING or ERE tokens.

        8. The  token  NUMBER shall represent a numeric constant. Its form and numeric value shall be equivalent
           to either of the tokens floating-constant or integer-constant as specified  by  the  ISO C  standard,
           with the following exceptions:

            a. An integer constant cannot begin with 0x or include the hexadecimal digits 'a', 'b', 'c', 'd',
               'e', 'f', 'A', 'B', 'C', 'D', 'E', or 'F'.

            b. The value of an integer constant beginning with 0 shall be taken in decimal rather than octal.

            c. An integer constant cannot include a suffix ( 'u' , 'U' , 'l' , or 'L' ).

            d. A floating constant cannot include a suffix ( 'f' , 'F' , 'l' , or 'L' ).

           If the value is too large or too small to be representable (see Concepts Derived from the ISO C
           Standard), the behavior is undefined.

        9. A  sequence  of  underscores,  digits,  and alphabetics from the portable character set (see the Base
           Definitions volume of IEEE Std 1003.1-2001, Section 6.1, Portable Character Set), beginning  with  an
           underscore or alphabetic, shall be considered a word.

       10. The following words are keywords that shall be recognized as individual tokens; the name of the token
           is the same as the keyword:

               BEGIN           delete          END             function        in              printf
               break           do              exit            getline         next            return
               continue        else            for             if              print           while

       11. The  following  words  are  names  of  built-in  functions  and  shall  be  recognized  as  the token
           BUILTIN_FUNC_NAME:

               atan2           gsub            log             split           sub             toupper
               close           index           match           sprintf         substr
               cos             int             rand            sqrt            system
               exp             length          sin             srand           tolower

       The above-listed keywords and names of built-in functions are considered reserved words.

       12. The token NAME shall consist of a word that is not a keyword or a name of a built-in function and  is
           not followed immediately (without any delimiters) by the '(' character.

       13. The  token  FUNC_NAME shall consist of a word that is not a keyword or a name of a built-in function,
           followed immediately (without any delimiters) by the '(' character. The '(' character  shall  not  be
           included as part of the token.

       14. The following two-character sequences shall be recognized as the named tokens:

                                       Token Name   Sequence   Token Name   Sequence
                                       ADD_ASSIGN   +=         NO_MATCH     !~
                                       SUB_ASSIGN   -=         EQ           ==
                                       MUL_ASSIGN   *=         LE           <=
                                       DIV_ASSIGN   /=         GE           >=
                                       MOD_ASSIGN   %=         NE           !=
                                       POW_ASSIGN   ^=         INCR         ++
                                       OR           ||         DECR         --
                                       AND          &&         APPEND       >>

       15. The following single characters shall be recognized as tokens whose names are the character:

           <newline> { } ( ) [ ] , ; + - * % ^ ! > < | ? : ~ $ =

       There  is  a  lexical  ambiguity  between  the token ERE and the tokens '/' and DIV_ASSIGN. When an input
       sequence begins with a slash character in any syntactic context where the token '/' or  DIV_ASSIGN  could
       appear  as the next token in a valid program, the longer of those two tokens that can be recognized shall
       be recognized. In any other syntactic context where the token ERE could appear as the  next  token  in  a
       valid program, the token ERE shall be recognized.
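
       Several of the lexical conventions above can be seen together in one small program. This is a sketch
       only; the function name twice and the variables s, n, and lex_out are hypothetical:

```shell
lex_out=$(awk '
function twice(v) { return 2 * v }   # hypothetical helper function
BEGIN {
    # A comment runs to the end of the line.
    s = "tab:\t|" \
        "backslash:\\"
    print s
    print twice(4)      # FUNC_NAME: "(" follows the name with no blank
    n = 7; n += 3       # "+=" is recognized as the single token ADD_ASSIGN
    print n
}')
printf '%s\n' "$lex_out"
```

       The backslash-<newline> pair between the two string constants has no effect, so the constants are
       concatenated; "\t" and "\\" inside them are replaced by a <tab> and a single backslash.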

EXIT STATUS

       The following exit values shall be returned:

        0     All input files were processed successfully.

       >0     An error occurred.

       The exit status can be altered within the program by using an exit expression.
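
       As a minimal sketch of an exit expression setting the exit status (the value 3 and the shell variable
       status are arbitrary choices for illustration):

```shell
status=0
awk 'BEGIN { exit 3 }' || status=$?
echo "awk exit status: $status"
```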

CONSEQUENCES OF ERRORS

       If  any  file  operand  is  specified and the named file cannot be accessed, awk shall write a diagnostic
       message to standard error and terminate without any further action.

       If the program specified by either the program operand or a progfile operand is not a valid  awk  program
       (as specified in the EXTENDED DESCRIPTION section), the behavior is undefined.

       The following sections are informative.

APPLICATION USAGE

       The index, length, match, and substr functions should not be confused with similar functions in the ISO C
       standard; the awk versions deal with characters, while the ISO C standard deals with bytes.

       Because  the  concatenation  operation  is  represented  by  adjacent expressions rather than an explicit
       operator, it is often necessary to use parentheses to enforce the proper evaluation precedence.
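
       A brief sketch of the parenthesization advice, with illustrative variable names. In the grammar the
       right operand of a concatenation is a non_unary_expr, so without the parentheses around -y the text
       " " -y would be read as a subtraction rather than as two operands to concatenate:

```shell
cat_out=$(awk 'BEGIN {
    x = 12; y = 24
    print x " " (-y)        # parentheses keep -y as a separate operand
    print "total: " (x + y) # parentheses make the intended grouping explicit
}')
printf '%s\n' "$cat_out"
```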

EXAMPLES

       The awk program specified in the command line is most easily specified within single-quotes (for example,
       'program') for applications using sh, because awk programs commonly contain characters that are special
       to the shell, including double-quotes. In the cases where an awk program contains single-quote
       characters, it is usually easiest to specify most of the program as strings within single-quotes
       concatenated by the shell with quoted single-quote characters. For example:

              awk '/'\''/ { print "quote:", $0 }'

       prints all lines from the standard input containing a single-quote character, prefixed with quote:.
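
       The command can be tried on two lines of sample input, one of which contains a single-quote. The input
       text and the variable quote_out are assumptions for illustration:

```shell
quote_out=$(printf '%s\n' "it's here" "nothing" |
    awk '/'\''/ { print "quote:", $0 }')
printf '%s\n' "$quote_out"
```

       The shell joins the three pieces '/', \', and '/ { ... }' into one argument, so awk receives the
       program text /'/ { print "quote:", $0 }.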

       The following are examples of simple awk programs:

        1. Write to the standard output all input lines for which field 3 is greater than 5:

           $3 > 5

        2. Write every tenth line:

           (NR % 10) == 0

        3. Write any line with a substring matching the regular expression:

           /(G|D)(2[0-9][[:alpha:]]*)/

        4. Print any line with a substring containing a 'G' or 'D' ,  followed  by  a  sequence  of  digits  and
           characters.   This example uses character classes digit and alpha to match language-independent digit
           and alphabetic characters respectively:

           /(G|D)([[:digit:][:alpha:]]*)/

        5. Write any line in which the second field matches the regular expression and  the  fourth  field  does
           not:

           $2 ~ /xyz/ && $4 !~ /xyz/

        6. Write any line in which the second field contains a backslash:

           $2 ~ /\\/

        7. Write  any  line  in  which  the  second  field contains a backslash. Note that backslash escapes are
           interpreted twice; once in lexical processing of the  string  and  once  in  processing  the  regular
           expression:

           $2 ~ "\\\\"

        8. Write the second to the last and the last field in each line. Separate the fields by a colon:

           {OFS=":";print $(NF-1), $NF}

        9. Write  the  line  number  and  number of fields in each line. The three strings representing the line
           number, the colon, and the number of fields are concatenated and that string is written  to  standard
           output:

           {print NR ":" NF}

       10. Write lines longer than 72 characters:

           length($0) > 72

       11. Write the first two fields in opposite order separated by OFS:

           { print $2, $1 }

       12. Same, with input fields separated by a comma or <space>s and <tab>s, or both:

           BEGIN { FS = ",[ \t]*|[ \t]+" }
                 { print $2, $1 }

       13. Add up the first column, print sum, and average:

                {s += $1 }
           END   {print "sum is ", s, " average is", s/NR}

       14. Write fields in reverse order, one per line (many lines out for each line in):

           { for (i = NF; i > 0; --i) print $i }

       15. Write all lines between occurrences of the strings start and stop:

           /start/, /stop/

       16. Write all lines whose first field is different from the previous one:

           $1 != prev { print; prev = $1 }

       17. Simulate echo:

           BEGIN  {
                   for (i = 1; i < ARGC; ++i)
                           printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ")
           }

       18. Write the path prefixes contained in the PATH environment variable, one per line:

           BEGIN  {
                   n = split (ENVIRON["PATH"], path, ":")
                   for (i = 1; i <= n; ++i)
                           print path[i]
           }

       19. If there is a file named input containing page headers of the form:

           Page #

       and a file named program that contains:

              /Page/   { $2 = n++; }
                       { print }

       then the command line:

              awk -f program n=5 input

       prints the file input, filling in page numbers starting at 5.
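
       A few of the programs above can be run from the shell as follows. This is a sketch: the sample data,
       the shell variable names, and the use of the "-" operand to read standard input in place of the file
       named input are assumptions made for illustration:

```shell
data='10 a x
20 b y
30 c z'

# Example 13 (labels slightly simplified): sum and average of column 1.
sum_out=$(printf '%s\n' "$data" |
    awk '{ s += $1 } END { print "sum is", s, "average is", s/NR }')
printf '%s\n' "$sum_out"

# Example 8: second-to-last and last field, separated by a colon.
last_out=$(printf '%s\n' "$data" | awk '{ OFS = ":"; print $(NF-1), $NF }')
printf '%s\n' "$last_out"

# Example 19: the operand n=5 is an assignment processed before the
# following input operand is read.
page_out=$(printf 'Page #\ntext\nPage #\n' |
    awk '/Page/ { $2 = n++; } { print }' n=5 -)
printf '%s\n' "$page_out"
```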

RATIONALE

       This  description  is  based  on  the new awk, "nawk", (see the referenced The AWK Programming Language),
       which introduced a number of new features to the historical awk:

        1. New keywords: delete, do, function, return

        2. New built-in functions: atan2, close, cos, gsub, match, rand, sin, srand, sub, system

        3. New predefined variables: FNR, ARGC, ARGV, RSTART, RLENGTH, SUBSEP

        4. New expression operators: ?, :, ,, ^

        5. The FS variable and the third argument to split, now treated as extended regular expressions.

        6. The operator precedence, changed to more closely match the C language.  Two  examples  of  code  that
           operate differently are:

           while ( n /= 10 > 1) ...
           if (!"wk" ~ /bwk/) ...
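
           Under the current precedence rules the two expressions group as n /= (10 > 1) and
           (!"wk") ~ /bwk/. A sketch that makes the grouping visible (the starting value 500 and the name
           prec_out are arbitrary):

```shell
prec_out=$(awk 'BEGIN {
    n = 500
    n /= 10 > 1            # divides n by the result of (10 > 1), which is 1
    print n
    print (!"wk" ~ /bwk/)  # matches the value of !"wk", which is 0, against /bwk/
}')
printf '%s\n' "$prec_out"
```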

       Several features have been added based on newer implementations of awk:

        * Multiple instances of -f progfile are permitted.

        * The new option -v assignment.

        * The new predefined variable ENVIRON.

        * New built-in functions toupper and tolower.

        * More formatting capabilities are added to printf to match the ISO C standard.

       The  overall  awk  syntax  has  always  been  based on the C language, with a few features from the shell
       command language and other sources. Because of this, it is  not  completely  compatible  with  any  other
       language,  which has caused confusion for some users.  It is not the intent of the standard developers to
       address such issues.  A few relatively minor changes toward making the language more compatible with  the
       ISO C  standard  were made; most of these changes are based on similar changes in recent implementations,
       as described above. There remain several C-language conventions that are not in awk. One of  the  notable
       ones  is the comma operator, which is commonly used to specify multiple expressions in the C language for
       statement. Also, there are various places where awk is more restrictive than the C language regarding the
       type of expression that can be used in a given context.  These  limitations  are  due  to  the  different
       features that the awk language does provide.

       Regular  expressions  in  awk  have been extended somewhat from historical implementations to make them a
       pure superset of  extended  regular  expressions,  as  defined  by  IEEE Std 1003.1-2001  (see  the  Base
       Definitions  volume  of  IEEE Std 1003.1-2001,  Section  9.4,  Extended  Regular  Expressions).  The main
       extensions are internationalization features and interval expressions.  Historical implementations of awk
       have long supported backslash escape sequences as an extension to extended regular expressions, and  this
       extension  has  been  retained despite inconsistency with other utilities. The number of escape sequences
       recognized in both extended regular expressions and strings has varied (generally increasing  with  time)
       among  implementations.  The  set  specified  by IEEE Std 1003.1-2001 includes most sequences known to be
       supported by popular implementations and by the ISO C standard. One sequence that  is  not  supported  is
       hexadecimal  value escapes beginning with '\x' . This would allow values expressed in more than 9 bits to
       be used within awk as in the ISO C standard. However, because this syntax has a non-deterministic length,
       it does not permit the subsequent character to be a hexadecimal digit. This limitation can be dealt  with
       in  the  C  language by the use of lexical string concatenation. In the awk language, concatenation could
       also be a solution for strings, but not for extended regular expressions (either lexical  ERE  tokens  or
       strings  used  dynamically  as regular expressions). Because of this limitation, the feature has not been
       added to IEEE Std 1003.1-2001.

       When a string variable is used in a context where an extended regular expression normally appears  (where
       the lexical token ERE is used in the grammar) the string does not contain the literal slashes.
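
       For instance, in the following sketch (variable names are illustrative) the string held in re is used
       as a regular expression without slashes, while slashes placed inside a string are matched as ordinary
       characters:

```shell
re_out=$(awk 'BEGIN {
    re = "^a.c$"              # no surrounding slashes in the string
    m1 = ("abc" ~ re)
    slashed = "/b/"           # these slashes are ordinary characters
    m2 = ("abc" ~ slashed)
    m3 = ("a/b/c" ~ slashed)
    print m1, m2, m3
}')
printf '%s\n' "$re_out"
```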

       Some versions of awk allow the form:

              func name(args, ... ) { statements }

       This has been deprecated by the authors of the language, who asked that it not be specified.

       Historical implementations of awk produce an error if a next statement is executed in a BEGIN action, and
       cause  awk  to  terminate  if  a  next statement is executed in an END action. This behavior has not been
       documented, and it was not believed that it was necessary to standardize it.

       The specification of conversions between string and numeric values is much  more  detailed  than  in  the
       documentation  of historical implementations or in the referenced The AWK Programming Language.  Although
       most of the behavior is designed to be intuitive, the details are necessary to ensure compatible behavior
       from different implementations. This is especially important in relational expressions since the types of
       the operands determine whether a string or numeric comparison is performed. From the  perspective  of  an
       application  writer,  it  is usually sufficient to expect intuitive behavior and to force conversions (by
       adding zero or concatenating a null string) when the type of an expression does not obviously match  what
       is  needed.  The intent has been to specify historical practice in almost all cases. The one exception is
       that, in historical implementations, variables and constants maintain  both  string  and  numeric  values
       after  their  original  value is converted by any use. This means that referencing a variable or constant
       can have unexpected side effects. For example, with historical implementations the following program:

              {
                  a = "+2"
                  b = 2
                  if (NR % 2)
                      c = a + b
                  if (a == b)
                      print "numeric comparison"
                  else
                      print "string comparison"
              }

       would perform a numeric comparison (and output  numeric  comparison)  for  each  odd-numbered  line,  but
       perform   a   string   comparison   (and   output   string   comparison)  for  each  even-numbered  line.
       IEEE Std 1003.1-2001  ensures  that  comparisons  will  be  numeric   if   necessary.   With   historical
       implementations, the following program:

              BEGIN {
                  OFMT = "%e"
                  print 3.14
                  OFMT = "%f"
                  print 3.14
              }

       would output "3.140000e+00" twice, because in the second print statement the constant "3.14" would have a
       string  value  from  the previous conversion. IEEE Std 1003.1-2001 requires that the output of the second
       print statement be "3.140000" . The behavior of historical implementations was seen  as  too  unintuitive
       and unpredictable.
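
       As an illustration of forcing conversions, the following sketch compares the same two values three
       ways. The variable names and labels are illustrative only, and the output shown by the script assumes
       an implementation that follows the comparison rules of this volume of IEEE Std 1003.1-2001:

```shell
# Sketch: force the comparison type instead of relying on implicit rules.
result=$(awk 'BEGIN {
    a = "+2"    # string constant
    b = 2       # numeric constant
    # String constant against a number: compared as strings.
    print "plain:   " ((a == b) ? "equal" : "different")
    # Adding zero forces a numeric comparison.
    print "numeric: " (((a + 0) == b) ? "equal" : "different")
    # Concatenating a null string forces a string comparison.
    print "string:  " ((a == (b "")) ? "equal" : "different")
}')
printf '%s\n' "$result"
```

       Only the second comparison reports the operands as equal, since "+2" and "2" differ as strings.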

       It  was  pointed  out  that  with  the  rules contained in early drafts, the following script would print
       nothing:

              BEGIN {
                  y[1.5] = 1
                  OFMT = "%e"
                  print y[1.5]
              }

       Therefore, a new variable, CONVFMT, was introduced. The OFMT variable  is  now  restricted  to  affecting
       output  conversions  of  numbers  to  strings  and  CONVFMT  is  used  for  internal conversions, such as
       comparisons or array indexing. The default value of CONVFMT is the same as that for OFMT, so unless a
       program changes CONVFMT (which no historical program would do), it will receive the historical
       behavior associated with internal string conversions.
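
       The division of labor between the two variables can be sketched as follows. The values chosen are
       arbitrary, and the script assumes a conforming implementation:

```shell
# Sketch: OFMT governs print, CONVFMT governs internal conversions.
result=$(awk 'BEGIN {
    y[1.5] = 1
    OFMT = "%e"          # affects only how print formats numbers
    print y[1.5]         # the subscript is still converted with CONVFMT
    OFMT = "%.6g"        # restore the default output format
    x = 3.14159
    CONVFMT = "%.2g"     # affects number-to-string conversions
    s = x ""             # concatenation converts x using CONVFMT
    print s
    print x              # print converts x using OFMT
}')
printf '%s\n' "$result"
```

       The first print finds the array element and outputs 1; the last two lines show the same number
       converted once through CONVFMT (3.1) and once through OFMT (3.14159).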

       The POSIX awk lexical and syntactic conventions are specified more formally than in other sources.  Again
       the intent has been to specify historical practice. One convention that may not be obvious from the
       formal grammar, as in other verbal descriptions, is where <newline>s are acceptable. There are several
       obvious  placements  such  as  terminating  a statement, and a backslash can be used to escape <newline>s
       between any lexical tokens. In addition, <newline>s without backslashes  can  follow  a  comma,  an  open
       brace, a logical AND operator ("&&"), a logical OR operator ("||"), the do keyword, the else keyword,
       and the closing parenthesis of an if, for, or while statement. For example:

              { print $1,
                      $2 }
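
       A slightly larger sketch, using several of the permitted positions at once (after "&&", after the
       closing parenthesis of an if, after a comma, and after else), might look like this; the variable x is
       illustrative only:

```shell
# Sketch: unescaped newlines in positions the grammar permits.
result=$(awk 'BEGIN {
    x = 3
    if (x > 1 &&
        x < 5)
        print "in range",
              x
    else
        print "out of range"
}')
printf '%s\n' "$result"
```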

       The  requirement  that  awk  add  a  trailing  <newline>  to the program argument text is to simplify the
       grammar, making it match a text file in form. There is no  way  for  an  application  or  test  suite  to
       determine whether a literal <newline> is added or whether awk simply acts as if it did.

       IEEE Std 1003.1-2001  requires  several  changes  from  historical  implementations  in  order to support
       internationalization. Probably the most subtle of these  is  the  use  of  the  decimal-point  character,
       defined  by  the  LC_NUMERIC  category of the locale, in representations of floating-point numbers.  This
       locale-specific character is used in recognizing numeric input, in converting between strings and numeric
       values, and in formatting output. However, regardless of locale, the period character (the  decimal-point
       character  of  the  POSIX  locale)  is  the decimal-point character recognized in processing awk programs
       (including assignments in command line arguments). This is essentially the same  convention  as  the  one
       used  in  the  ISO C  standard.  The difference is that the C language includes the setlocale() function,
       which permits an application to modify its locale. Because of this capability,  a  C  application  begins
       executing  with  its  locale  set  to the C locale, and only executes in the environment-specified locale
       after an explicit call to setlocale(). However, adding such an elaborate new feature to the awk  language
       was  seen  as inappropriate for IEEE Std 1003.1-2001. It is possible to execute an awk program explicitly
       in any desired locale by setting the environment in the shell.
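
       A minimal sketch of selecting the locale from the shell follows. It uses the POSIX locale because
       that is the only one guaranteed to exist; any other locale name would be an assumption about the
       system:

```shell
# Sketch: run one awk program under an explicitly chosen locale.
result=$(LC_ALL=POSIX awk 'BEGIN { printf "%.2f\n", 3.14159 }')
printf '%s\n' "$result"
```

       Under the POSIX locale the decimal-point character is the period, so this prints 3.14.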

       The undefined behavior resulting from NULs in extended regular expressions allows future  extensions  for
       the GNU gawk program to process binary data.

       The  behavior  in the case of invalid awk programs (including lexical, syntactic, and semantic errors) is
       undefined because it was considered overly limiting on implementations to specify.  In  most  cases  such
       errors  can be expected to produce a diagnostic and a non-zero exit status. However, some implementations
       may choose to extend the language in ways that make use of  certain  invalid  constructs.  Other  invalid
       constructs  might  be  deemed  worthy  of a warning, but otherwise cause some reasonable behavior.  Still
       other  constructs  may  be  very  difficult  to  detect  in  some   implementations.    Also,   different
       implementations  might  detect a given error during an initial parsing of the program (before reading any
       input files) while others  might  detect  it  when  executing  the  program  after  reading  some  input.
       Implementors should be aware that diagnosing errors as early as possible and producing useful diagnostics
       can ease debugging of applications, and thus make an implementation more usable.

       The  unspecified  behavior  from  using  multi-character RS values is to allow possible future extensions
       based on extended regular expressions used for record separators.  Historical  implementations  take  the
       first character of the string and ignore the others.
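
       Applications that need to stay within specified behavior can restrict RS to a single character, as
       in this sketch (the input data is made up for illustration):

```shell
# Sketch: a single-character RS is portable; multi-character values are not.
result=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$result"
```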

       Unspecified  behavior  when split( string, array, <null>) is used is to allow a proposed future extension
       that would split up a string into an array of individual characters.
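
       Until such an extension is standardized, a portable way to obtain the individual characters is a
       loop over substr(). This is a sketch only; the names s and chars are illustrative:

```shell
# Sketch: split a string into characters without a null separator.
result=$(awk 'BEGIN {
    s = "awk"
    n = length(s)
    for (i = 1; i <= n; i++)
        chars[i] = substr(s, i, 1)    # one character per element
    for (i = 1; i <= n; i++)
        print i, chars[i]
}')
printf '%s\n' "$result"
```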

       In the context of the getline function, equally good arguments for different precedences of the |  and  <
       operators can be made. Historical practice has been that:

              getline < "a" "b"

       is parsed as:

              ( getline < "a" ) "b"

       although many would argue that the intent was that the file ab should be read. However:

              getline < "x" + 1

       parses as:

              getline < ( "x" + 1 )

       Similar problems occur with the | version of getline, particularly in combination with $. For example:

              $"echo hi" | getline

       (This situation is particularly problematic when used in a print statement, where the |getline part might
       be a redirection of the print.)

       Since in most cases such constructs are not (or at least should not be) used (because they have a natural
       ambiguity  for  which  there  is  no conventional parsing), the meaning of these constructs has been made
       explicitly unspecified. (The effect is that a conforming application that  runs  into  the  problem  must
       parenthesize to resolve the ambiguity.) There appeared to be few if any actual uses of such constructs.

       Grammars  can  be  written  that  would  cause  an  error  under  these  circumstances.  Where backwards-
       compatibility is not a large consideration, implementors may wish to use such grammars.

       Some historical implementations have allowed some built-in functions to be  called  without  an  argument
       list,  the  result  being  a  default  argument  list chosen in some "reasonable" way. Use of length as a
       synonym for length($0) is the only one of these forms that is thought to be widely known or widely  used;
       this  particular  form is documented in various places (for example, most historical awk reference pages,
       although not in the referenced The AWK Programming Language) as legitimate practice. With this exception,
       default argument lists have always been undocumented and vaguely defined, and it is not at all clear  how
       (or  if)  they  should  be  generalized  to user-defined functions.  They add no useful functionality and
       preclude possible future extensions that  might  need  to  name  functions  without  calling  them.   Not
       standardizing  them  seems  the  simplest  course. The standard developers considered that length merited
       special treatment, however, since it has been documented in the past and sees possibly substantial use in
       historical programs. Accordingly, this usage has been made legitimate. Issue 5 removed the
       obsolescent marking for XSI-conforming implementations, and many otherwise conforming applications
       depend on this feature.
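
       The equivalence of the two spellings can be checked with a short sketch (the input line is
       arbitrary):

```shell
# Sketch: length with no argument list is a synonym for length($0).
result=$(printf 'hello\n' | awk '{ n = length; print n, length($0) }')
printf '%s\n' "$result"
```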

       In sub and gsub, if repl is a string literal (the lexical token STRING), then two  consecutive  backslash
       characters  should be used in the string to ensure a single backslash will precede the ampersand when the
       resultant string is passed to the function. (For  example,  to  specify  one  literal  ampersand  in  the
       replacement string, use gsub( ERE, "\\&" ).)
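
       The difference between an escaped and an unescaped ampersand can be sketched as follows. The awk
       program is enclosed in single quotes so that the shell passes the backslashes through unchanged; the
       variable names are illustrative:

```shell
# Sketch: "\\&" yields a literal ampersand, a bare & yields the matched text.
result=$(printf 'a b\n' | awk '{
    s = $0; t = $0
    gsub(/ /, "\\&", s)    # replacement is a literal ampersand
    gsub(/ /, "[&]", t)    # unescaped & stands for the matched text
    print s
    print t
}')
printf '%s\n' "$result"
```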

       Historically  the  only  special  character in the repl argument of sub and gsub string functions was the
       ampersand ('&') character and preceding it with the backslash character was used to turn off its
       special meaning.

       The  description  in  the ISO POSIX-2:1993 standard introduced behavior such that the backslash character
       was another special character and it was unspecified whether there were  any  other  special  characters.
       This  description  introduced  several portability problems, some of which are described below, and so it
       has been replaced with the more historical description. Some of the problems include:

        * Historically, to create the replacement string, a script could use gsub( ERE, "\\&" ),  but  with  the
          ISO POSIX-2:1993  standard wording, it was necessary to use gsub( ERE, "\\\\&" ). Backslash characters
          are doubled here because all string literals are subject to lexical analysis, which would reduce  each
          pair of backslash characters to a single backslash before being passed to gsub.

        * Since  it  was  unspecified  what  the special characters were, for portable scripts to guarantee that
          characters are printed literally, each character had to be preceded with a backslash. (For example,  a
          portable script had to use gsub( ERE, "\\h\\i" ) to produce a replacement string of "hi".)

       The description for comparisons in the ISO POSIX-2:1993 standard did not properly describe historical
       practice because of the way numeric strings are compared as numbers. The rules in that standard cause
       the following code:

              if (0 == "000")
                  print "strange, but true"
              else
                  print "not true"

       to  do  a  numeric  comparison,  causing the if to succeed. It should be intuitively obvious that this is
       incorrect behavior, and indeed, no historical implementation of awk actually behaves this way.

       To fix this problem, the definition of numeric string was enhanced to include only those values  obtained
       from specific circumstances (mostly external sources) where it is not possible to determine unambiguously
       whether the value is intended to be a string or a numeric.

       Variables that are assigned to a numeric string shall also be treated as a numeric string. (That is,
       the notion of a numeric string can be propagated across assignments.) In comparisons, all variables
       having  the  uninitialized  value  are to be treated as a numeric operand evaluating to the numeric value
       zero.
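
       The resulting rules can be sketched with one input field and one string constant that have the same
       spelling. The names x and u are illustrative, u is deliberately never assigned, and the output assumes
       a conforming implementation:

```shell
# Sketch: numeric strings come from input, not from string constants.
result=$(printf '000\n' | awk '{
    # A string constant is not a numeric string: string comparison.
    if (0 == "000")
        print "constant: equal"
    else
        print "constant: different"
    # A field that looks like a number is a numeric string: numeric comparison.
    if ($1 == 0)
        print "field: equal"
    else
        print "field: different"
    # The numeric-string property survives assignment.
    x = $1
    if (x == 0)
        print "copy: equal"
    else
        print "copy: different"
    # An uninitialized variable compares equal to zero and to the null string.
    if (u == 0 && u == "")
        print "uninitialized: equal to both 0 and the null string"
}')
printf '%s\n' "$result"
```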

       Uninitialized variables include all types of variables including scalars, array elements, and fields. The
       definition of an uninitialized value in Variables and Special Variables  is  necessary  to  describe  the
       value  placed  on  uninitialized  variables and on fields that are valid (for example, < $NF) but have no
       characters in them and to describe how these variables are to be used in comparisons. A valid field, such
       as $1, that has no characters in it can be obtained from an input line of "\t\t" when FS='\t'.
       Historically, the comparison ($1<10) was done numerically after evaluating $1 to the value zero.
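
       The empty-field case can be reproduced with a short sketch. The comparison shown is true whether an
       implementation evaluates the empty field as the number zero or as the null string, so the example
       does not depend on that choice:

```shell
# Sketch: a line of two tabs yields three valid, empty fields.
result=$(printf '\t\t\n' | awk -F '\t' '{
    print NF, length($1)
    # True whether $1 is compared as the number 0 or as the string "".
    if ($1 < 10)
        print "$1 < 10"
}')
printf '%s\n' "$result"
```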

       The  phrase  "...  also  shall  have  the  numeric  value of the numeric string" was removed from several
       sections of the ISO POSIX-2:1993 standard because it specifies an unnecessary implementation detail. It
       is not necessary for IEEE Std 1003.1-2001 to specify that these objects be assigned two different values.
       It  is  only  necessary  to  specify that these objects may evaluate to two different values depending on
       context.

       The description of numeric string processing is based on the behavior of the atof() function in the ISO C
       standard. While it is not a requirement for an implementation  to  use  this  function,  many  historical
       implementations  of  awk  do.  In  the ISO C standard, floating-point constants use a period as a decimal
       point character for the language itself, independent of the current locale, but the atof()  function  and
       the  associated  strtod()  function use the decimal point character of the current locale when converting
       strings to numeric values. Similarly in awk, floating-point constants in  an  awk  script  use  a  period
       independent of the locale, but input strings use the decimal point character of the locale.
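
       In the POSIX locale the two conventions coincide, which the following sketch relies on. The input
       value is arbitrary; in a locale with a different decimal-point character, only the spelling expected
       in the input would change, not the constant in the program text:

```shell
# Sketch: an input numeric string added to a constant from the program text.
result=$(printf '1.5\n' | LC_ALL=POSIX awk '{ print $1 + 0.25 }')
printf '%s\n' "$result"
```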

FUTURE DIRECTIONS

       None.

SEE ALSO

       Grammar Conventions, grep, lex, sed, the System Interfaces volume of IEEE Std 1003.1-2001, atof(),
       exec, popen(), setlocale(), strtod()

COPYRIGHT

       Portions of this text are reprinted and reproduced in electronic form from IEEE Std 1003.1, 2003 Edition,
       Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open  Group  Base
       Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of Electrical and Electronics Engineers,
       Inc  and  The  Open Group. In the event of any discrepancy between this version and the original IEEE and
       The Open Group Standard, the original IEEE and The Open Group  Standard  is  the  referee  document.  The
       original Standard can be obtained online at http://www.opengroup.org/unix/online.html .

IEEE/The Open Group                                   2003                                                AWK(P)