Provided by: manpages-posix-dev_2.16-1_all bug

NAME

       regcomp, regerror, regexec, regfree - regular expression matching

SYNOPSIS

       #include <regex.h>

       int regcomp(regex_t *restrict preg, const char *restrict pattern,
              int cflags);
       size_t regerror(int errcode, const regex_t *restrict preg,
              char *restrict errbuf, size_t errbuf_size);
       int regexec(const regex_t *restrict preg, const char *restrict string,
              size_t nmatch, regmatch_t pmatch[restrict], int eflags);
       void regfree(regex_t *preg);

DESCRIPTION

       These  functions  interpret  basic  and extended regular expressions as described in the Base Definitions
       volume of IEEE Std 1003.1-2001, Chapter 9, Regular Expressions.

       The regex_t structure is defined in <regex.h> and contains at least the following member:

                           Member Type  Member Name  Description
                           size_t       re_nsub      Number of parenthesized subexpressions.

       The regmatch_t structure is defined in <regex.h> and contains at least the following members:

                           Member Type Member Name Description
                           regoff_t    rm_so       Byte offset from start of string to
                                                   start of substring.
                           regoff_t    rm_eo       Byte offset from start of string of the
                                                   first character after the end of
                                                   substring.

       The  regcomp()  function  shall  compile the regular expression contained in the string pointed to by the
       pattern argument and place the results in the structure pointed to by preg.  The cflags argument  is  the
       bitwise-inclusive OR of zero or more of the following flags, which are defined in the <regex.h> header:

       REG_EXTENDED
              Use Extended Regular Expressions.

       REG_ICASE
              Ignore case in match. (See the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 9, Regular
              Expressions.)

       REG_NOSUB
              Report only success/fail in regexec().

       REG_NEWLINE
              Change the handling of <newline>s, as described in the text.

       The default regular expression type for pattern is  a  Basic  Regular  Expression.  The  application  can
       specify Extended Regular Expressions using the REG_EXTENDED cflags flag.

       If  the  REG_NOSUB  flag  was  not  set  in  cflags,  then  regcomp()  shall set re_nsub to the number of
       parenthesized subexpressions (delimited by "\(\)" in  basic  regular  expressions  or  "()"  in  extended
       regular expressions) found in pattern.

       The  regexec() function compares the null-terminated string specified by string with the compiled regular
       expression preg initialized by a previous call to regcomp().  If it finds a match, regexec() shall return
       0; otherwise, it shall return non-zero indicating either no match or an error. The eflags argument is the
       bitwise-inclusive OR of zero or more of the following flags, which are defined in the <regex.h> header:

       REG_NOTBOL
              The first character of the string pointed  to  by  string  is  not  the  beginning  of  the  line.
              Therefore,  the  circumflex  character ( '^' ), when taken as a special character, shall not match
              the beginning of string.

       REG_NOTEOL
              The last character of the string pointed to by string is not the end of the line.  Therefore,  the
              dollar sign ( '$' ), when taken as a special character, shall not match the end of string.

       If  nmatch is 0 or REG_NOSUB was set in the cflags argument to regcomp(), then regexec() shall ignore the
       pmatch argument. Otherwise, the application shall ensure that the pmatch argument points to an array with
       at  least  nmatch  elements,  and  regexec() shall fill in the elements of that array with offsets of the
       substrings of string that correspond to the parenthesized subexpressions of pattern:  pmatch[  i].  rm_so
       shall be the byte offset of the beginning and pmatch[ i]. rm_eo shall be one greater than the byte offset
       of the end of substring i. (Subexpression i begins at the ith matched open parenthesis, counting from 1.)
       Offsets  in  pmatch[0]  identify  the substring that corresponds to the entire regular expression. Unused
       elements of pmatch up to pmatch[ nmatch-1] shall be filled  with  -1.  If  there  are  more  than  nmatch
       subexpressions  in pattern ( pattern itself counts as a subexpression), then regexec() shall still do the
       match, but shall record only the first nmatch substrings.

       When matching a basic or extended regular expression, any given parenthesized  subexpression  of  pattern
       might  participate  in  the  match  of  several different substrings of string, or it might not match any
       substring even though the pattern as a whole did match. The following rules shall be  used  to  determine
       which substrings to report in pmatch when matching regular expressions:

        1. If  subexpression  i  in  a  regular expression is not contained within another subexpression, and it
           participated in the match several times, then the byte offsets in pmatch[ i] shall delimit  the  last
           such match.

        2. If  subexpression  i  is not contained within another subexpression, and it did not participate in an
           otherwise successful match, the byte offsets in pmatch[ i] shall be  -1.  A  subexpression  does  not
           participate  in  the match when: '*' or "\{\}" appears immediately after the subexpression in a basic
           regular expression, or '*' , '?' , or "{}" appears immediately after the subexpression in an extended
           regular expression, and the subexpression did not match (matched 0 times)

       or:  '|' is used in an extended regular expression to select this subexpression or another, and the other
       subexpression matched.

        3. If subexpression i is contained within another subexpression j, and i is  not  contained  within  any
           other subexpression that is contained within j, and a match of subexpression j is reported in pmatch[
           j], then the match or non-match of subexpression i reported in pmatch[ i] shall be as described in 1.
           and  2.   above,  but  within  the substring reported in pmatch[ j] rather than the whole string. The
           offsets in pmatch[ i] are still relative to the start of string.

        4. If subexpression i is contained in subexpression j, and the byte offsets in pmatch[ j] are  -1,  then
           the pointers in pmatch[ i] shall also be -1.

        5. If  subexpression  i  matched a zero-length string, then both byte offsets in pmatch[ i] shall be the
           byte offset of the character or null terminator immediately following the zero-length string.

       If, when regexec() is called, the locale is different from when the regular expression was compiled,  the
       result is undefined.

       If  REG_NEWLINE  is  not  set  in  cflags,  then  a <newline> in pattern or string shall be treated as an
       ordinary character. If REG_NEWLINE is set, then <newline> shall  be  treated  as  an  ordinary  character
       except as follows:

        1. A <newline> in string shall not be matched by a period outside a bracket expression or by any form of
           a non-matching list (see the Base Definitions volume  of  IEEE Std 1003.1-2001,  Chapter  9,  Regular
           Expressions).

        2. A  circumflex ( '^' ) in pattern, when used to specify expression anchoring (see the Base Definitions
           volume of IEEE Std 1003.1-2001, Section 9.3.8, BRE Expression Anchoring), shall match the zero-length
           string immediately after a <newline> in string, regardless of the setting of REG_NOTBOL.

        3. A  dollar  sign  ( '$' ) in pattern, when used to specify expression anchoring, shall match the zero-
           length string immediately before a <newline> in string, regardless of the setting of REG_NOTEOL.

       The regfree() function frees any memory allocated by regcomp() associated with preg.

       The following constants are defined as error return values:

       REG_NOMATCH
              regexec() failed to match.

       REG_BADPAT
              Invalid regular expression.

       REG_ECOLLATE
              Invalid collating element referenced.

       REG_ECTYPE
              Invalid character class type referenced.

       REG_EESCAPE
              Trailing '\' in pattern.

       REG_ESUBREG
              Number in "\digit" invalid or in error.

       REG_EBRACK
              "[]" imbalance.

       REG_EPAREN
              "\(\)" or "()" imbalance.

       REG_EBRACE
              "\{\}" imbalance.

       REG_BADBR
              Content of "\{\}" invalid: not a number, number too large, more than  two  numbers,  first  larger
              than second.

       REG_ERANGE
              Invalid endpoint in range expression.

       REG_ESPACE
              Out of memory.

       REG_BADRPT
              '?' , '*' , or '+' not preceded by valid regular expression.

       The  regerror()  function  provides  a  mapping  from  error codes returned by regcomp() and regexec() to
       unspecified printable strings. It generates a string corresponding to the value of the errcode  argument,
       which the application shall ensure is the last non-zero value returned by regcomp() or regexec() with the
       given value of preg. If errcode is not such a value, the content of the generated string is unspecified.

       If preg is a null pointer, but errcode is a value returned by a previous call to regexec() or  regcomp(),
       the regerror() still generates an error string corresponding to the value of errcode, but it might not be
       as detailed under some implementations.

       If the errbuf_size argument is not 0, regerror() shall place the generated string into the buffer of size
       errbuf_size  bytes pointed to by errbuf. If the string (including the terminating null) cannot fit in the
       buffer, regerror() shall truncate the string and null-terminate the result.

       If errbuf_size is 0, regerror() shall ignore the errbuf argument, and  return  the  size  of  the  buffer
       needed to hold the generated string.

       If  the  preg  argument  to  regexec()  or  regfree()  is  not  a compiled regular expression returned by
       regcomp(), the result is undefined. A preg is no longer treated as a compiled regular expression after it
       is given to regfree().

RETURN VALUE

       Upon  successful completion, the regcomp() function shall return 0. Otherwise, it shall return an integer
       value indicating an error as described in <regex.h>, and the content of preg is undefined. If a  code  is
       returned, the interpretation shall be as given in <regex.h>.

       If  regcomp()  detects  an  invalid RE, it may return REG_BADPAT, or it may return one of the error codes
       that more precisely describes the error.

       Upon successful completion, the regexec() function shall return 0. Otherwise, it shall return REG_NOMATCH
       to indicate no match.

       Upon  successful  completion, the regerror() function shall return the number of bytes needed to hold the
       entire generated string, including the null termination. If the return value is greater than errbuf_size,
       the string returned in the buffer pointed to by errbuf has been truncated.

       The regfree() function shall not return a value.

ERRORS

       No errors are defined.

       The following sections are informative.

EXAMPLES

              #include <regex.h>

              /*
               * Match string against the extended regular expression in
               * pattern, treating errors as no match.
               *
               * Return 1 for match, 0 for no match.
               */

              int
              match(const char *string, char *pattern)
              {
                  int    status;
                  regex_t    re;

                  if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
                      return(0);      /* Report error. */
                  }
                  status = regexec(&re, string, (size_t) 0, NULL, 0);
                  regfree(&re);
                  if (status != 0) {
                      return(0);      /* Report error. */
                  }
                  return(1);
              }

       The following demonstrates how the REG_NOTBOL flag could be used with regexec() to find all substrings in
       a line that match a pattern supplied by a user.  (For  simplicity  of  the  example,  very  little  error
       checking is done.)

              (void) regcomp (&re, pattern, 0);
              /* This call to regexec() finds the first match on the line. */
              error = regexec (&re, &buffer[0], 1, &pm, 0);
              while (error == 0) {  /* While matches found. */
                  /* Substring found between pm.rm_so and pm.rm_eo. */
                  /* This call to regexec() finds the next match. */
                  error = regexec (&re, buffer + pm.rm_eo, 1, &pm, REG_NOTBOL);
              }

APPLICATION USAGE

       An application could use:

              regerror(code,preg,(char *)NULL,(size_t)0)

       to  find  out  how big a buffer is needed for the generated string, malloc() a buffer to hold the string,
       and then call regerror() again to get the string. Alternatively, it could allocate a fixed, static buffer
       that  is  big  enough to hold most strings, and then use malloc() to allocate a larger buffer if it finds
       that this is too small.

       To match a pattern as described in the Shell and Utilities volume of IEEE Std 1003.1-2001, Section  2.13,
       Pattern Matching Notation, use the fnmatch() function.

RATIONALE

       The  regexec()  function must fill in all nmatch elements of pmatch, where nmatch and pmatch are supplied
       by the application, even if some elements of pmatch do not correspond to subexpressions in  pattern.  The
       application  writer  should  note  that  there  is probably no reason for using a value of nmatch that is
       larger than preg-> re_nsub+1.

       The REG_NEWLINE flag supports a use of RE matching that is needed in some applications like text editors.
       In  such  applications,  the  user  supplies an RE asking the application to find a line that matches the
       given expression. An anchor in such an RE  anchors  at  the  beginning  or  end  of  any  line.  Such  an
       application  can  pass  a  sequence of <newline>-separated lines to regexec() as a single long string and
       specify REG_NEWLINE to regcomp() to get the desired behavior. The application must ensure that there  are
       no  explicit  <newline>s  in pattern if it wants to ensure that any match occurs entirely within a single
       line.

       The REG_NEWLINE flag affects the behavior of regexec(), but it is in the cflags parameter to regcomp() to
       allow  flexibility  of implementation. Some implementations will want to generate the same compiled RE in
       regcomp() regardless of the setting of REG_NEWLINE and have regexec() handle anchors differently based on
       the  setting  of  the  flag.  Other  implementations  will  generate  different compiled REs based on the
       REG_NEWLINE.

       The REG_ICASE flag supports the operations taken by the grep -i option and the historical implementations
       of  ex  and vi.  Including this flag will make it easier for application code to be written that does the
       same thing as these utilities.

       The substrings reported in pmatch[] are defined using offsets from the start of the  string  rather  than
       pointers.  Since  this  is  a  new  interface, there should be no impact on historical implementations or
       applications, and offsets should be just as easy to use as pointers. The change to offsets  was  made  to
       facilitate  future  extensions  in  which  the string to be searched is presented to regexec() in blocks,
       allowing a string to be searched that is not all in memory at once.

       The type regoff_t is used for the elements of pmatch[] to  ensure  that  the  application  can  represent
       either  the  largest  possible  array in memory (important for an application conforming to the Shell and
       Utilities volume of IEEE Std 1003.1-2001) or the largest possible  file  (important  for  an  application
       using the extension where a file is searched in chunks).

       The  standard  developers  rejected  the  inclusion  of  a  regsub()  function  that  would be used to do
       substitutions for a matched RE. While such a routine would be useful to some  applications,  its  utility
       would  be  much  more limited than the matching function described here. Both RE parsing and substitution
       are possible to implement without support other than that required by the ISO C standard, but matching is
       much  more  complex  than  substituting.   The only difficult part of substitution, given the information
       supplied by regexec(), is finding the next character in a string when there can be multi-byte characters.
       That is a much larger issue, and one that needs a more general solution.

       The  errno  variable  has  not been used for error returns to avoid filling the errno name space for this
       feature.

       The interface is defined so that the matched substrings rm_sp and rm_ep  are  in  a  separate  regmatch_t
       structure  instead  of  in regex_t. This allows a single compiled RE to be used simultaneously in several
       contexts; in main() and a signal handler, perhaps, or in multiple threads of lightweight processes.  (The
       preg argument to regexec() is declared with type const, so the implementation is not permitted to use the
       structure to store intermediate results.) It also allows an application to request an arbitrary number of
       substrings  from an RE. The number of subexpressions in the RE is reported in re_nsub in preg.  With this
       change to regexec(), consideration was given to dropping the  REG_NOSUB  flag  since  the  user  can  now
       specify  this  with  a  zero  nmatch  argument  to  regexec().   However,  keeping  REG_NOSUB  allows  an
       implementation to use a different (perhaps more efficient) algorithm if it knows  in  regcomp()  that  no
       subexpressions  need  be reported. The implementation is only required to fill in pmatch if nmatch is not
       zero and if REG_NOSUB is not specified. Note that the size_t type, as defined in the ISO C  standard,  is
       unsigned, so the description of regexec() does not need to address negative values of nmatch.

       REG_NOTBOL  was  added to allow an application to do repeated searches for the same pattern in a line. If
       the pattern contains a circumflex character that should match the beginning of a line, then  the  pattern
       should  only  match  when  matched  against  the  beginning of the line. Without the REG_NOTBOL flag, the
       application could rewrite the expression for subsequent matches, but  in  the  general  case  this  would
       require parsing the expression. The need for REG_NOTEOL is not as clear; it was added for symmetry.

       The addition of the regerror() function addresses the historical need for conforming application programs
       to have access to error information more than "Function failed  to  compile/match  your  RE  for  unknown
       reasons".

       This  interface  provides  for two different methods of dealing with error conditions. The specific error
       codes (REG_EBRACE, for example), defined in <regex.h>, allow an application to recover from an  error  if
       it  is so able. Many applications, especially those that use patterns supplied by a user, will not try to
       deal with specific error cases, but will just use regerror() to obtain a human-readable error message  to
       present to the user.

       The  regerror() function uses a scheme similar to confstr() to deal with the problem of allocating memory
       to hold the generated string. The scheme  used  by  strerror()  in  the  ISO C  standard  was  considered
       unacceptable since it creates difficulties for multi-threaded applications.

       The  preg  argument  is  provided to regerror() to allow an implementation to generate a more descriptive
       message than would be possible with errcode  alone.  An  implementation  might,  for  example,  save  the
       character  offset  of the offending character of the pattern in a field of preg, and then include that in
       the generated message string. The implementation may also ignore preg.

       A REG_FILENAME flag was considered, but  omitted.  This  flag  caused  regexec()  to  match  patterns  as
       described  in  the  Shell  and  Utilities  volume of IEEE Std 1003.1-2001, Section 2.13, Pattern Matching
       Notation instead of REs. This service is now provided by the fnmatch() function.

       Notice  that  there  is  a  difference  in  philosophy  between   the   ISO POSIX-2:1993   standard   and
       IEEE Std 1003.1-2001 in how to handle a "bad" regular expression. The ISO POSIX-2:1993 standard says that
       many  bad  constructs  "produce  undefined  results",  or  that  "the   interpretation   is   undefined".
       IEEE Std 1003.1-2001,  however,  says  that  the  interpretation  of  such  REs  is unspecified. The term
       "undefined" means that the action by the application is an error, of similar severity to  passing  a  bad
       pointer to a function.

       The  regcomp()  and  regexec() functions are required to accept any null-terminated string as the pattern
       argument. If the meaning of the string is "undefined", the behavior of  the  function  is  "unspecified".
       IEEE Std 1003.1-2001  does  not  specify  how the functions will interpret the pattern; they might return
       error codes, or they might do pattern matching in some completely unexpected way, but they should not  do
       something like abort the process.

FUTURE DIRECTIONS

       None.

SEE ALSO

       fnmatch()  ,  glob() , Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.13, Pattern Matching
       Notation, Base Definitions volume of IEEE Std 1003.1-2001, Chapter  9,  Regular  Expressions,  <regex.h>,
       <sys/types.h>

COPYRIGHT

       Portions of this text are reprinted and reproduced in electronic form from IEEE Std 1003.1, 2003 Edition,
       Standard for Information Technology -- Portable Operating System Interface (POSIX), The Open  Group  Base
       Specifications Issue 6, Copyright (C) 2001-2003 by the Institute of Electrical and Electronics Engineers,
       Inc and The Open Group. In the event of any discrepancy between this version and the  original  IEEE  and
       The  Open  Group  Standard,  the  original  IEEE and The Open Group Standard is the referee document. The
       original Standard can be obtained online at http://www.opengroup.org/unix/online.html .