Provided by: libpcre3-dev_8.12-4_amd64

NAME

       PCRE - Perl-compatible regular expressions

PCRE NATIVE API


       #include <pcre.h>

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       void pcre_free_substring(const char *stringptr);

       void pcre_free_substring_list(const char **stringptr);

       const unsigned char *pcre_maketables(void);

       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

       int pcre_refcount(pcre *code, int adjust);

       int pcre_config(int what, void *where);

       const char *pcre_version(void);

       void *(*pcre_malloc)(size_t);

       void (*pcre_free)(void *);

       void *(*pcre_stack_malloc)(size_t);

       void (*pcre_stack_free)(void *);

       int (*pcre_callout)(pcre_callout_block *);

PCRE API OVERVIEW


       PCRE  has  its  own  native  API, which is described in this document. There are also some
       wrapper functions that correspond to the POSIX regular expression API. These are described
       in the pcreposix documentation. Both of these APIs define a set of C function calls. A C++
       wrapper is distributed with PCRE. It is documented in the pcrecpp page.

       The native API C function prototypes are defined in the header file pcre.h,  and  on  Unix
       systems  the  library  itself  is  called  libpcre.  It can normally be accessed by adding
       -lpcre to the command for linking an application that uses PCRE. The header  file  defines
       the  macros  PCRE_MAJOR  and PCRE_MINOR to contain the major and minor release numbers for
       the library.  Applications can use these to include  support  for  different  releases  of
       PCRE.

       In  a Windows environment, if you want to statically link an application program against a
       non-dll pcre.a file, you must define PCRE_STATIC before  including  pcre.h  or  pcrecpp.h,
       because  otherwise  the  pcre_malloc() and pcre_free() exported functions will be declared
       __declspec(dllimport), with unwanted results.

       The functions pcre_compile(), pcre_compile2(), pcre_study(), and pcre_exec() are used  for
       compiling  and  matching regular expressions in a Perl-compatible manner. A sample program
       that demonstrates the simplest way of using them is provided in the file called pcredemo.c
       in  the  PCRE  source  distribution.  A  listing  of this program is given in the pcredemo
       documentation, and the pcresample documentation describes how to compile and run it.

       A second matching  function,  pcre_dfa_exec(),  which  is  not  Perl-compatible,  is  also
       provided.  This  uses  a  different  algorithm for the matching. The alternative algorithm
       finds all possible matches (at a given point in the subject), and scans the  subject  just
       once  (unless  there  are  lookbehind assertions). However, this algorithm does not return
       captured substrings. A description of the two matching algorithms and their advantages and
       disadvantages is given in the pcrematching documentation.

       In  addition to the main compiling and matching functions, there are convenience functions
       for extracting captured substrings from a subject string that is matched  by  pcre_exec().
       They are:

         pcre_copy_substring()
         pcre_copy_named_substring()
         pcre_get_substring()
         pcre_get_named_substring()
         pcre_get_substring_list()
         pcre_get_stringnumber()
         pcre_get_stringtable_entries()

       pcre_free_substring() and pcre_free_substring_list() are also provided, to free the memory
       used for extracted strings.

       The function pcre_maketables() is used to build a set of character tables in  the  current
       locale for passing to pcre_compile(), pcre_exec(), or pcre_dfa_exec(). This is an optional
       facility that is provided for specialist use. Most commonly, no special tables are passed,
       in which case internal tables that are generated when PCRE is built are used.

       The  function  pcre_fullinfo()  is  used to find out information about a compiled pattern;
       pcre_info() is an obsolete version that returns only some of  the  available  information,
       but  is  retained  for  backwards  compatibility.   The  function pcre_version() returns a
       pointer to a string containing the version of PCRE and its date of release.

       The function pcre_refcount() maintains a reference count in  a  data  block  containing  a
       compiled pattern. This is provided for the benefit of object-oriented applications.

       The  global  variables pcre_malloc and pcre_free initially contain the entry points of the
       standard malloc() and free() functions, respectively. PCRE  calls  the  memory  management
       functions  via  these  variables,  so  a  calling program can replace them if it wishes to
       intercept the calls. This should be done before calling any PCRE functions.

       The global variables pcre_stack_malloc and pcre_stack_free are also indirections to memory
       management  functions.  These special functions are used only when PCRE is compiled to use
       the heap for remembering data, instead of  recursive  function  calls,  when  running  the
       pcre_exec() function. See the pcrebuild documentation for details of how to do this. It is
       a non-standard way of building PCRE, for use in environments  that  have  limited  stacks.
       Because  of  the greater use of memory management, it runs more slowly. Separate functions
       are provided so that special-purpose external code can be used for this case.  When  used,
       these functions are always called in a stack-like manner (last obtained, first freed), and
       always for memory blocks of the same size. There is a discussion about PCRE's stack  usage
       in the pcrestack documentation.

       The global variable pcre_callout initially contains NULL. It can be set by the caller to a
       "callout" function, which PCRE will then  call  at  specified  points  during  a  matching
       operation. Details are given in the pcrecallout documentation.

NEWLINES


       PCRE  supports  five different conventions for indicating line breaks in strings: a single
       CR (carriage return) character,  a  single  LF  (linefeed)  character,  the  two-character
       sequence  CRLF,  any  of the three preceding, or any Unicode newline sequence. The Unicode
       newline sequences are the three just mentioned, plus the single  characters  VT  (vertical
       tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028),
       and PS (paragraph separator, U+2029).

       Each of the first three conventions is used by at least one operating system as its
       standard newline sequence. When PCRE is built, a default can be specified. If none is
       specified, the default is LF, which is the Unix standard. When PCRE is run, the default
       can be overridden, either when a pattern is compiled, or when it is matched.

       At  compile  time,  the  newline  convention  can  be specified by the options argument of
       pcre_compile(), or it can be specified by special text at the start of the pattern itself;
       this  overrides  any  other  settings. See the pcrepattern page for details of the special
       character sequences.

       In the PCRE documentation the word "newline" is used to mean "the  character  or  pair  of
       characters  that  indicate  a  line  break".  The choice of newline convention affects the
       handling of the dot, circumflex, and dollar metacharacters, the handling of #-comments  in
       /x  mode,  and,  when  CRLF  is  a  recognized  line  ending  sequence, the match position
       advancement for a non-anchored pattern. There is more detail about this in the section  on
       pcre_exec() options below.

       The choice of newline convention does not affect the interpretation of the \n or \r escape
       sequences, nor does it affect what \R matches, which is controlled in a similar  way,  but
       by separate options.

MULTITHREADING


       The  PCRE functions can be used in multi-threading applications, with the proviso that the
       memory management functions pointed to by pcre_malloc, pcre_free,  pcre_stack_malloc,  and
       pcre_stack_free,  and  the  callout function pointed to by pcre_callout, are shared by all
       threads.

       The compiled form of a regular expression is not altered  during  matching,  so  the  same
       compiled pattern can safely be used by several threads at once.

SAVING PRECOMPILED PATTERNS FOR LATER USE


       The  compiled  form  of  a  regular  expression  can be saved and re-used at a later time,
       possibly by a different program, and even on a host other than the one  on  which  it  was
       compiled.  Details  are  given  in  the pcreprecompile documentation. However, compiling a
       regular expression with one version of PCRE for  use  with  a  different  version  is  not
       guaranteed to work and may cause crashes.

CHECKING BUILD-TIME OPTIONS


       int pcre_config(int what, void *where);

       The  function pcre_config() makes it possible for a PCRE client to discover which optional
       features have been compiled into the PCRE library. The pcrebuild  documentation  has  more
       details about these optional features.

       The  first  argument  for  pcre_config()  is  an  integer, specifying which information is
       required; the second argument is a pointer to a variable into  which  the  information  is
       placed. The following information is available:

         PCRE_CONFIG_UTF8

       The output is an integer that is set to one if UTF-8 support is available; otherwise it is
       set to zero.

         PCRE_CONFIG_UNICODE_PROPERTIES

       The output is an integer that is set to one if support for Unicode character properties is
       available; otherwise it is set to zero.

         PCRE_CONFIG_NEWLINE

       The output is an integer whose value specifies the default character sequence that is
       recognized as meaning "newline". The five values that are supported are: 10 for LF, 13 for
       CR, 3338 for CRLF, -2 for ANYCRLF, and -1 for ANY. Though they are derived from ASCII,
       the same values are returned in EBCDIC environments. The default should normally
       correspond to the standard sequence for your operating system.

         PCRE_CONFIG_BSR

       The  output  is  an  integer  whose value indicates what character sequences the \R escape
       sequence matches by default. A value of 0 means that \R matches any  Unicode  line  ending
       sequence;  a  value  of  1  means that \R matches only CR, LF, or CRLF. The default can be
       overridden when a pattern is compiled or matched.

         PCRE_CONFIG_LINK_SIZE

       The output is an integer that contains the number of bytes used for  internal  linkage  in
       compiled  regular expressions. The value is 2, 3, or 4. Larger values allow larger regular
       expressions to be compiled, at the expense of slower matching. The default value of  2  is
       sufficient  for all but the most massive patterns, since it allows the compiled pattern to
       be up to 64K in size.

         PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

       The output is an integer that contains the threshold above which the POSIX interface  uses
       malloc() for output vectors. Further details are given in the pcreposix documentation.

         PCRE_CONFIG_MATCH_LIMIT

       The  output  is  a  long  integer  that gives the default limit for the number of internal
       matching function calls in  a  pcre_exec()  execution.  Further  details  are  given  with
       pcre_exec() below.

         PCRE_CONFIG_MATCH_LIMIT_RECURSION

       The  output is a long integer that gives the default limit for the depth of recursion when
       calling the internal matching function in a pcre_exec()  execution.  Further  details  are
       given with pcre_exec() below.

         PCRE_CONFIG_STACKRECURSE

       The output is an integer that is set to one if internal recursion when running pcre_exec()
       is implemented by recursive function calls that use the stack  to  remember  their  state.
       This  is  the  usual way that PCRE is compiled. The output is zero if PCRE was compiled to
       use blocks of data on the  heap  instead  of  recursive  function  calls.  In  this  case,
       pcre_stack_malloc and pcre_stack_free are called to manage memory blocks on the heap, thus
       avoiding the use of the stack.

COMPILING A PATTERN


       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       Either of the functions pcre_compile() or pcre_compile2()  can  be  called  to  compile  a
       pattern  into  an  internal  form.  The only difference between the two interfaces is that
       pcre_compile2() has an additional argument, errorcodeptr, via which a numerical error code
       can  be returned. To avoid too much repetition, we refer just to pcre_compile() below, but
       the information applies equally to pcre_compile2().

       The pattern is a C string terminated by a binary  zero,  and  is  passed  in  the  pattern
       argument.  A  pointer  to  a  single  block  of memory that is obtained via pcre_malloc is
       returned. This contains the compiled code and related data. The pcre type is  defined  for
       the  returned  block;  this is a typedef for a structure whose contents are not externally
       defined. It is up to the caller to free the memory (via pcre_free) when it  is  no  longer
       required.

       Although  the compiled code of a PCRE regex is relocatable, that is, it does not depend on
       memory location, the complete pcre data block is not fully  relocatable,  because  it  may
       contain a copy of the tableptr argument, which is an address (see below).

       The  options argument contains various bit settings that affect the compilation. It should
       be zero if no options are required. The available options are  described  below.  Some  of
       them  (in  particular,  those  that are compatible with Perl, but some others as well) can
       also be set and unset from within  the  pattern  (see  the  detailed  description  in  the
       pcrepattern documentation). For those options that can be different in different parts of
       the pattern, the contents of the options argument specify their settings at the start of
       compilation and execution. The PCRE_ANCHORED, PCRE_BSR_xxx, PCRE_NEWLINE_xxx,
       PCRE_NO_UTF8_CHECK, and PCRE_NO_START_OPTIMIZE options can be set at the time of matching
       as well as at compile time.

       If  errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise, if compilation of
       a pattern fails, pcre_compile() returns NULL, and sets the variable pointed to  by  errptr
       to  point to a textual error message. This is a static string that is part of the library.
       You must not try to free it. The offset from the start of the pattern to the byte that was
       being  processed  when  the  error  was discovered is placed in the variable pointed to by
       erroffset, which must not be NULL. If it is, an immediate error is given. Some errors  are
       not detected until checks are carried out when the whole pattern has been scanned; in this
       case the offset is set to the end of the pattern.

       Note that the offset is in bytes, not characters, even in UTF-8 mode. It  may  point  into
       the  middle  of a UTF-8 character (for example, when PCRE_ERROR_BADUTF8 is returned for an
       invalid UTF-8 string).

       If pcre_compile2() is used instead of pcre_compile(), and the errorcodeptr argument is not
       NULL, a non-zero error code number is returned via this argument in the event of an error.
       This is in addition to the textual error message. Error  codes  and  messages  are  listed
       below.

       If the final argument, tableptr, is NULL, PCRE uses a default set of character tables that
       are built when PCRE is compiled, using the default C locale. Otherwise, tableptr  must  be
       an  address  that  is the result of a call to pcre_maketables(). This value is stored with
       the compiled pattern, and used again by  pcre_exec(),  unless  another  table  pointer  is
       passed to it. For more discussion, see the section on locale support below.

       This code fragment shows a typical straightforward call to pcre_compile():

         pcre *re;
         const char *error;
         int erroffset;
         re = pcre_compile(
           "^A.*Z",          /* the pattern */
           0,                /* default options */
           &error,           /* for error message */
           &erroffset,       /* for error offset */
           NULL);            /* use default character tables */

       The following names for option bits are defined in the pcre.h header file:

         PCRE_ANCHORED

       If  this bit is set, the pattern is forced to be "anchored", that is, it is constrained to
       match only at the first matching point in the string that is being searched (the  "subject
       string").  This  effect  can  also  be  achieved  by appropriate constructs in the pattern
       itself, which is the only way to do it in Perl.

         PCRE_AUTO_CALLOUT

       If this bit is set, pcre_compile() automatically inserts callout items,  all  with  number
       255, before each pattern item. For discussion of the callout facility, see the pcrecallout
       documentation.

         PCRE_BSR_ANYCRLF
         PCRE_BSR_UNICODE

       These options (which are mutually exclusive) control what the \R escape sequence  matches.
       The  choice  is  either  to  match  only  CR, LF, or CRLF, or to match any Unicode newline
       sequence. The default is specified when PCRE is built. It can be  overridden  from  within
       the pattern, or by setting an option when a compiled pattern is matched.

         PCRE_CASELESS

       If  this bit is set, letters in the pattern match both upper and lower case letters. It is
       equivalent to Perl's /i option, and it can be changed within a pattern by  a  (?i)  option
       setting.  In  UTF-8 mode, PCRE always understands the concept of case for characters whose
       values are less than 128, so caseless matching is always  possible.  For  characters  with
       higher  values, the concept of case is supported if PCRE is compiled with Unicode property
       support, but not otherwise. If you want to use caseless matching for  characters  128  and
       above, you must ensure that PCRE is compiled with Unicode property support as well as with
       UTF-8 support.

         PCRE_DOLLAR_ENDONLY

       If this bit is set, a dollar metacharacter in the pattern matches only at the end  of  the
       subject string. Without this option, a dollar also matches immediately before a newline at
       the end of the string (but not before any other newlines). The PCRE_DOLLAR_ENDONLY  option
       is  ignored  if PCRE_MULTILINE is set.  There is no equivalent to this option in Perl, and
       no way to set it within a pattern.

         PCRE_DOTALL

       If this bit is set, a dot metacharacter in the pattern matches a character of  any  value,
       including  one that indicates a newline. However, it only ever matches one character, even
       if newlines are coded as CRLF. Without this option, a dot does not match when the  current
       position  is  at  a  newline. This option is equivalent to Perl's /s option, and it can be
       changed within a pattern by a (?s) option setting. A negative class such  as  [^a]  always
       matches newline characters, independent of the setting of this option.

         PCRE_DUPNAMES

       If  this bit is set, names used to identify capturing subpatterns need not be unique. This
       can be helpful for certain types of pattern when it is known that only one instance of the
       named  subpattern  can ever be matched. There are more details of named subpatterns below;
       see also the pcrepattern documentation.

         PCRE_EXTENDED

       If this bit is set, whitespace data characters in the pattern are totally  ignored  except
       when  escaped  or  inside  a character class. Whitespace does not include the VT character
       (code 11). In addition, characters between an unescaped # outside a  character  class  and
       the next newline, inclusive, are also ignored. This is equivalent to Perl's /x option, and
       it can be changed within a pattern by a (?x) option setting.

       Which characters are interpreted as newlines  is  controlled  by  the  options  passed  to
       pcre_compile()  or  by a special sequence at the start of the pattern, as described in the
       section entitled "Newline conventions" in the pcrepattern documentation. Note that the end
       of  this  type  of  comment is a literal newline sequence in the pattern; escape sequences
       that happen to represent a newline do not count.

       This option makes it possible to include  comments  inside  complicated  patterns.   Note,
       however, that this applies only to data characters. Whitespace characters may never appear
       within special character sequences in a pattern, for example within the sequence (?(  that
       introduces a conditional subpattern.

         PCRE_EXTRA

       This  option  was  invented  in  order to turn on additional functionality of PCRE that is
       incompatible with Perl, but it is currently of very little use. When set, any backslash in
       a  pattern  that is followed by a letter that has no special meaning causes an error, thus
       reserving these combinations for future expansion. By default, as  in  Perl,  a  backslash
       followed  by a letter with no special meaning is treated as a literal. (Perl can, however,
       be persuaded to give an error for this, by running it with the -w option.)  There  are  at
       present  no  other features controlled by this option. It can also be set by a (?X) option
       setting within a pattern.

         PCRE_FIRSTLINE

       If this option is set, an unanchored pattern is required to match before or at  the  first
       newline in the subject string, though the matched text may continue over the newline.

         PCRE_JAVASCRIPT_COMPAT

       If  this  option is set, PCRE's behaviour is changed in some ways so that it is compatible
       with JavaScript rather than Perl. The changes are as follows:

       (1) A lone closing square bracket in a pattern causes a compile-time error,  because  this
       is illegal in JavaScript (by default it is treated as a data character). Thus, the pattern
       AB]CD becomes illegal when this option is set.

       (2) At run time, a back reference to an unset subpattern group matches an empty string (by
       default  this  causes the current matching alternative to fail). A pattern such as (\1)(a)
       succeeds when this option is set (assuming it can find an "a" in the subject), whereas  it
       fails by default, for Perl compatibility.

         PCRE_MULTILINE

       By  default,  PCRE  treats the subject string as consisting of a single line of characters
       (even if it actually contains newlines). The "start of  line"  metacharacter  (^)  matches
       only at the start of the string, while the "end of line" metacharacter ($) matches only at
       the end of the string, or before a  terminating  newline  (unless  PCRE_DOLLAR_ENDONLY  is
       set). This is the same as Perl.

       When PCRE_MULTILINE is set, the "start of line" and "end of line" constructs match
       immediately following or immediately before  internal  newlines  in  the  subject  string,
       respectively,  as  well  as  at  the  very  start and end. This is equivalent to Perl's /m
       option, and it can be changed within a pattern by a (?m) option setting. If there  are  no
       newlines  in  a  subject  string,  or  no  occurrences  of  ^  or  $ in a pattern, setting
       PCRE_MULTILINE has no effect.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF
         PCRE_NEWLINE_ANYCRLF
         PCRE_NEWLINE_ANY

       These options override the default newline definition that was chosen when PCRE was built.
       Setting  the  first  or  the  second  specifies  that  a  newline is indicated by a single
       character (CR or LF, respectively). Setting PCRE_NEWLINE_CRLF specifies that a newline  is
       indicated  by the two-character CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies that
       any of the three  preceding  sequences  should  be  recognized.  Setting  PCRE_NEWLINE_ANY
       specifies  that  any  Unicode  newline  sequence should be recognized. The Unicode newline
       sequences are the three just mentioned, plus  the  single  characters  VT  (vertical  tab,
       U+000B),  FF (formfeed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and
       PS (paragraph separator, U+2029). The last two are recognized only in UTF-8 mode.

       The newline setting in the options word uses three bits that  are  treated  as  a  number,
       giving  eight  possibilities.  Currently  only  six are used (default plus the five values
       above). This means that if you set more than one newline option, the  combination  may  or
       may  not  be  sensible. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
       PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers and cause an error.

       The only time that a line break in a pattern is specially  recognized  when  compiling  is
       when PCRE_EXTENDED is set. CR and LF are whitespace characters, and so are ignored in this
       mode. Also, an unescaped # outside a character class indicates a comment that lasts  until
       after  the  next  line  break  sequence.  In  other circumstances, line break sequences in
       patterns are treated as literal data.

       The newline option that is set at compile time  becomes  the  default  that  is  used  for
       pcre_exec() and pcre_dfa_exec(), but it can be overridden.

         PCRE_NO_AUTO_CAPTURE

       If  this  option  is  set,  it  disables  the use of numbered capturing parentheses in the
       pattern. Any opening parenthesis that is not followed by ? behaves as if it were  followed
       by  ?:  but named parentheses can still be used for capturing (and they acquire numbers in
       the usual way). There is no equivalent of this option in Perl.

         PCRE_NO_START_OPTIMIZE

       This is an option that acts at matching  time;  that  is,  it  is  really  an  option  for
       pcre_exec()  or  pcre_dfa_exec().  If it is set at compile time, it is remembered with the
       compiled pattern and  assumed  at  matching  time.  For  details  see  the  discussion  of
       PCRE_NO_START_OPTIMIZE below.

         PCRE_UCP

       This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W, \w, and some of the
       POSIX character classes. By default, only ASCII characters are recognized, but if PCRE_UCP
       is set, Unicode properties are used instead to classify characters. More details are given
       in the section on generic character types in the pcrepattern page. If  you  set  PCRE_UCP,
       matching  one  of  the items it affects takes much longer. The option is available only if
       PCRE has been compiled with Unicode property support.

         PCRE_UNGREEDY

       This option inverts the "greediness" of the quantifiers so that they  are  not  greedy  by
       default, but become greedy if followed by "?". It is not compatible with Perl. It can also
       be set by a (?U) option setting within the pattern.

         PCRE_UTF8

       This option causes PCRE to regard both the pattern and the subject  as  strings  of  UTF-8
       characters  instead  of  single-byte character strings. However, it is available only when
       PCRE is built to include UTF-8 support. If not, the use of this option provokes an  error.
       Details of how this option changes the behaviour of PCRE are given in the section on UTF-8
       support in the main pcre page.

         PCRE_NO_UTF8_CHECK

       When PCRE_UTF8 is set, the validity of the pattern as  a  UTF-8  string  is  automatically
       checked.  There is a discussion about the validity of UTF-8 strings in the main pcre page.
       If an invalid UTF-8 sequence of bytes is found, pcre_compile() returns an  error.  If  you
       already  know  that your pattern is valid, and you want to skip this check for performance
       reasons, you can set the PCRE_NO_UTF8_CHECK option. When it is set, the effect of  passing
       an  invalid  UTF-8  string  as a pattern is undefined. It may cause your program to crash.
       Note that this option can also be passed to pcre_exec() and pcre_dfa_exec(),  to  suppress
       the UTF-8 validity checking of subject strings.

COMPILATION ERROR CODES


       The following table lists the error codes that may be returned by pcre_compile2(), along
       with the error messages that may be returned by both  compiling  functions.  As  PCRE  has
       developed, some error codes have fallen out of use. To avoid confusion, they have not been
       re-used.

          0  no error
          1  \ at end of pattern
          2  \c at end of pattern
          3  unrecognized character follows \
          4  numbers out of order in {} quantifier
          5  number too big in {} quantifier
          6  missing terminating ] for character class
          7  invalid escape sequence in character class
          8  range out of order in character class
          9  nothing to repeat
         10  [this code is not in use]
         11  internal error: unexpected repeat
         12  unrecognized character after (? or (?-
         13  POSIX named classes are supported only within a class
         14  missing )
         15  reference to non-existent subpattern
         16  erroffset passed as NULL
         17  unknown option bit(s) set
         18  missing ) after comment
         19  [this code is not in use]
         20  regular expression is too large
         21  failed to get memory
         22  unmatched parentheses
         23  internal error: code overflow
         24  unrecognized character after (?<
         25  lookbehind assertion is not fixed length
         26  malformed number or name after (?(
         27  conditional group contains more than two branches
         28  assertion expected after (?(
         29  (?R or (?[+-]digits must be followed by )
         30  unknown POSIX class name
         31  POSIX collating elements are not supported
         32  this version of PCRE is not compiled with PCRE_UTF8 support
         33  [this code is not in use]
         34  character value in \x{...} sequence is too large
         35  invalid condition (?(0)
         36  \C not allowed in lookbehind assertion
         37  PCRE does not support \L, \l, \N, \U, or \u
         38  number after (?C is > 255
         39  closing ) for (?C expected
         40  recursive call could loop indefinitely
         41  unrecognized character after (?P
         42  syntax error in subpattern name (missing terminator)
         43  two named subpatterns have the same name
         44  invalid UTF-8 string
         45  support for \P, \p, and \X has not been compiled
         46  malformed \P or \p sequence
         47  unknown property name after \P or \p
         48  subpattern name is too long (maximum 32 characters)
         49  too many named subpatterns (maximum 10000)
         50  [this code is not in use]
         51  octal value is greater than \377 (not in UTF-8 mode)
         52  internal error: overran compiling workspace
         53  internal error: previously-checked referenced subpattern
               not found
         54  DEFINE group contains more than one branch
         55  repeating a DEFINE group is not allowed
         56  inconsistent NEWLINE options
         57  \g is not followed by a braced, angle-bracketed, or quoted
               name/number or by a plain number
         58  a numbered reference must not be zero
         59  an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)
         60  (*VERB) not recognized
         61  number is too big
         62  subpattern name expected
         63  digit expected after (?+
         64  ] is an invalid data character in JavaScript compatibility mode
         65  different names for subpatterns of the same number are
               not allowed
         66  (*MARK) must have an argument
         67  this version of PCRE is not compiled with PCRE_UCP support

       The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may be used if
       the limits were changed when PCRE was built.

STUDYING A PATTERN


       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       If  a  compiled  pattern is going to be used several times, it is worth spending more time
       analyzing it in order to speed up the time taken for matching. The  function  pcre_study()
       takes  a  pointer  to  a  compiled  pattern as its first argument. If studying the pattern
       produces additional information that will help speed up matching, pcre_study()  returns  a
       pointer  to a pcre_extra block, in which the study_data field points to the results of the
       study.

       The  returned  value  from  pcre_study()  can  be  passed  directly  to   pcre_exec()   or
       pcre_dfa_exec(). However, a pcre_extra block also contains other fields that can be set by
       the caller before the block is passed;  these  are  described  below  in  the  section  on
       matching a pattern.

       If  studying  the  pattern  does  not produce any useful information, pcre_study() returns
       NULL. In that circumstance, if the calling program wants to pass any of the  other  fields
       to pcre_exec() or pcre_dfa_exec(), it must set up its own pcre_extra block.

       The  second  argument  of  pcre_study()  contains  option bits. At present, no options are
       defined, and this argument should always be zero.

       The third argument for pcre_study() is  a  pointer  for  an  error  message.  If  studying
       succeeds  (even  if  no  data  is  returned),  the  variable  it points to is set to NULL.
       Otherwise it is set to point to a textual error message. This is a static string  that  is
       part  of  the  library. You must not try to free it. You should test the error pointer for
       NULL after calling pcre_study(), to be sure that it has run successfully.

       This is a typical call to pcre_study():

         pcre_extra *pe;
         pe = pcre_study(
           re,             /* result of pcre_compile() */
           0,              /* no options exist */
           &error);        /* set to NULL or points to a message */

       Studying a pattern does two things: first, a lower bound for the length of the subject string
       that  is  needed  to  match the pattern is computed. This does not mean that there are any
       strings of that length that match, but it does guarantee that no  shorter  strings  match.
       The  value  is  used by pcre_exec() and pcre_dfa_exec() to avoid wasting time by trying to
       match strings that are shorter than the lower bound. You can  find  out  the  value  in  a
       calling program via the pcre_fullinfo() function.

       Studying  a  pattern  is  also  useful for non-anchored patterns that do not have a single
       fixed starting character. A bitmap of possible starting bytes is created. This  speeds  up
       finding a position in the subject at which to start matching.

       The two optimizations just described can be disabled by setting the PCRE_NO_START_OPTIMIZE
       option when calling pcre_exec() or pcre_dfa_exec(). You might want  to  do  this  if  your
       pattern  contains  callouts  or  (*MARK),  and you want to make use of these facilities in
       cases where matching fails. See the discussion of PCRE_NO_START_OPTIMIZE below.

LOCALE SUPPORT


       PCRE handles caseless matching, and determines whether characters are letters, digits,  or
       whatever,  by  reference  to  a set of tables, indexed by character value. When running in
       UTF-8 mode, this applies only to characters with codes less than 128. By default,  higher-
       valued  codes never match escapes such as \w or \d, but they can be tested with \p if PCRE
       is built with Unicode character property support. Alternatively, the PCRE_UCP  option  can
       be set at compile time; this causes \w and friends to use Unicode property support instead
       of built-in tables. The use of locales with Unicode is discouraged. If  you  are  handling
       characters  with  codes  greater than 128, you should either use UTF-8 and Unicode, or use
       locales, but not try to mix the two.

       PCRE contains an internal set  of  tables  that  are  used  when  the  final  argument  of
       pcre_compile()  is  NULL.  These  are  sufficient  for  many  applications.  Normally, the
       internal tables recognize only ASCII characters.  However,  when  PCRE  is  built,  it  is
       possible to cause the internal tables to be rebuilt in the default "C" locale of the local
       system, which may cause them to be different.

       The internal tables can always be overridden by tables supplied by  the  application  that
       calls  PCRE. These may be created in a different locale from the default. As more and more
       applications change to using Unicode, the need for this locale support is expected to  die
       away.

       External  tables  are  built  by  calling  the  pcre_maketables()  function,  which has no
       arguments, in the relevant locale. The result can then  be  passed  to  pcre_compile()  or
       pcre_exec()  as  often  as  necessary.  For  example,  to  build  and  use tables that are
       appropriate for the French locale (where accented characters with values greater than  128
       are treated as letters), the following code could be used:

         setlocale(LC_CTYPE, "fr_FR");
         tables = pcre_maketables();
         re = pcre_compile(..., tables);

       The  locale  name  "fr_FR"  is used on Linux and other Unix-like systems; if you are using
       Windows, the name for the French locale is "french".

       When pcre_maketables() runs,  the  tables  are  built  in  memory  that  is  obtained  via
       pcre_malloc.  It  is  the caller's responsibility to ensure that the memory containing the
       tables remains available for as long as it is needed.

       The pointer that is passed to pcre_compile() is saved with the compiled pattern,  and  the
       same  tables  are  used via this pointer by pcre_study() and normally also by pcre_exec().
       Thus, by default, for any single pattern, compilation, studying and matching all happen in
       the same locale, but different patterns can be compiled in different locales.

       It is possible to pass a table pointer or NULL (indicating the use of the internal tables)
       to pcre_exec(). Although not intended for this purpose, this facility  could  be  used  to
       match a pattern in a different locale from the one in which it was compiled. Passing table
       pointers at run time is discussed below in the section on matching a pattern.

INFORMATION ABOUT A PATTERN


       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       The pcre_fullinfo() function returns information about a compiled pattern. It replaces the
       obsolete  pcre_info()  function, which is nevertheless retained for backwards compatibility
       (and is documented below).

       The first argument for pcre_fullinfo() is a pointer to the compiled  pattern.  The  second
       argument  is the result of pcre_study(), or NULL if the pattern was not studied. The third
       argument specifies which piece of information is required, and the fourth  argument  is  a
       pointer  to a variable to receive the data. The yield of the function is zero for success,
       or one of the following negative numbers:

         PCRE_ERROR_NULL       the argument code was NULL
                               the argument where was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found
         PCRE_ERROR_BADOPTION  the value of what was invalid

       The "magic number" is placed at the start of each compiled  pattern  as  a  simple  check
       against passing an arbitrary memory pointer. Here is a typical call of pcre_fullinfo(), to
       obtain the length of the compiled pattern:

         int rc;
         size_t length;
         rc = pcre_fullinfo(
           re,               /* result of pcre_compile() */
           pe,               /* result of pcre_study(), or NULL */
           PCRE_INFO_SIZE,   /* what is required */
           &length);         /* where to put the data */

       The possible values for the third argument are defined in pcre.h, and are as follows:

         PCRE_INFO_BACKREFMAX

       Return the number of the highest back reference in the pattern. The fourth argument should
       point to an int variable. Zero is returned if there are no back references.

         PCRE_INFO_CAPTURECOUNT

       Return  the  number  of  capturing  subpatterns in the pattern. The fourth argument should
       point to an int variable.

         PCRE_INFO_DEFAULT_TABLES

       Return a pointer to the internal default character tables within PCRE. The fourth argument
       should  point  to  an  unsigned  char  *  variable.  This information call is provided for
       internal use by the pcre_study() function. External callers can  cause  PCRE  to  use  its
       internal tables by passing a NULL table pointer.

         PCRE_INFO_FIRSTBYTE

       Return information about the first byte of any matched string, for a non-anchored pattern.
       The fourth argument should point to an int  variable.  (This  option  used  to  be  called
       PCRE_INFO_FIRSTCHAR; the old name is still recognized for backwards compatibility.)

       If  there is a fixed first byte, for example, from a pattern such as (cat|cow|coyote), its
       value is returned. Otherwise, if either

       (a) the pattern was compiled with the PCRE_MULTILINE option, and every branch starts  with
       "^", or

       (b)  every  branch  of the pattern starts with ".*" and PCRE_DOTALL is not set (if it were
       set, the pattern would be anchored),

       -1 is returned, indicating that the pattern matches only at the start of a subject  string
       or  after  any newline within the string. Otherwise -2 is returned. For anchored patterns,
       -2 is returned.

         PCRE_INFO_FIRSTTABLE

       If the pattern was studied, and this resulted in  the  construction  of  a  256-bit  table
       indicating  a  fixed  set of bytes for the first byte in any matching string, a pointer to
       the table is returned. Otherwise NULL is returned. The fourth argument should point to  an
       unsigned char * variable.

         PCRE_INFO_HASCRORLF

       Return  1  if the pattern contains any explicit matches for CR or LF characters, otherwise
       0. The fourth argument should point to an int variable. An  explicit  match  is  either  a
       literal CR or LF character, or \r or \n.

         PCRE_INFO_JCHANGED

       Return  1  if  the  (?J)  or (?-J) option setting is used in the pattern, otherwise 0. The
       fourth argument should point to an int variable. (?J) and (?-J) set and  unset  the  local
       PCRE_DUPNAMES option, respectively.

         PCRE_INFO_LASTLITERAL

       Return  the  value  of  the  rightmost literal byte that must exist in any matched string,
       other than at its start, if such a byte has been  recorded.  The  fourth  argument  should
       point to an int variable. If there is no such byte, -1 is returned. For anchored patterns,
       a last literal byte is recorded only if it  follows  something  of  variable  length.  For
       example,  for  the  pattern  /^a\d+z\d+/  the returned value is "z", but for /^a\dz\d/ the
       returned value is -1.

         PCRE_INFO_MINLENGTH

       If the pattern was studied and a minimum length for matching subject strings was computed,
       its  value  is  returned.  Otherwise  the  returned  value is -1. The value is a number of
       characters, not bytes (this may be relevant in UTF-8 mode).  The  fourth  argument  should
       point  to  an  int  variable.  A  non-negative value is a lower bound to the length of any
       matching string. There may not be any strings of that length that do actually  match,  but
       every string that does match is at least that long.

         PCRE_INFO_NAMECOUNT
         PCRE_INFO_NAMEENTRYSIZE
         PCRE_INFO_NAMETABLE

       PCRE  supports  the  use of named as well as numbered capturing parentheses. The names are
       just an additional way of  identifying  the  parentheses,  which  still  acquire  numbers.
       Several   convenience  functions  such  as  pcre_get_named_substring()  are  provided  for
       extracting captured substrings by name. It is also possible to extract the data  directly,
       by  first  converting  the name to a number in order to access the correct pointers in the
       output vector (described with pcre_exec() below). To do the conversion, you  need  to  use
       the name-to-number map, which is described by these three values.

       The  map  consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives the number
       of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size of each entry; both of these return
       an   int   value.   The   entry   size   depends  on  the  length  of  the  longest  name.
       PCRE_INFO_NAMETABLE returns a pointer to the first entry of the table (a pointer to char).
       The  first  two  bytes  of  each  entry  are the number of the capturing parenthesis, most
       significant byte first. The rest of the entry is the corresponding name, zero terminated.

       The names are in alphabetical order. Duplicate names may appear if (?| is used  to  create
       multiple  groups with the same number, as described in the section on duplicate subpattern
       numbers in the pcrepattern page. Duplicate names for subpatterns  with  different  numbers
       are  permitted  only if PCRE_DUPNAMES is set. In all cases of duplicate names, they appear
       in the table in the order in which they were found in the pattern. In the absence  of  (?|
       this  is the order of increasing number; when (?| is used this is not necessarily the case
       because later subpatterns may have lower numbers.

       As a simple example of the name/number  table,  consider  the  following  pattern  (assume
       PCRE_EXTENDED is set, so white space - including newlines - is ignored):

         (?<date> (?<year>(\d\d)?\d\d) -
         (?<month>\d\d) - (?<day>\d\d) )

       There  are  four  named  subpatterns, so the table has four entries, and each entry in the
       table is eight bytes long. The table is as  follows,  with  non-printing  bytes  shown  in
       hexadecimal, and undefined bytes shown as ??:

         00 01 d  a  t  e  00 ??
         00 05 d  a  y  00 ?? ??
         00 04 m  o  n  t  h  00
         00 02 y  e  a  r  00 ??

       When  writing  code  to  extract data from named subpatterns using the name-to-number map,
       remember that the length of the entries is  likely  to  be  different  for  each  compiled
       pattern.
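
       The entry layout described above can be walked with a few lines of code. The following is a
       minimal sketch, not the library's own implementation; the helper name
       group_number_from_table() is hypothetical. In a real program the three arguments would come
       from PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE, and the library
       function pcre_get_stringnumber() performs a comparable lookup.

```c
#include <string.h>

/* Hypothetical helper: look up a group number in a name table laid out as
   described above. Returns the number of the first entry whose name
   matches, or -1 if the name is absent. */
int group_number_from_table(const unsigned char *table, int name_count,
                            int entry_size, const char *name)
{
    int i;
    for (i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * entry_size;
        /* Bytes 0-1: group number, most significant byte first. */
        int number = (entry[0] << 8) | entry[1];
        /* Bytes 2 onward: the zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return number;
    }
    return -1;
}
```

       Because the entry size is passed in rather than fixed, the same function works for any
       compiled pattern.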

         PCRE_INFO_OKPARTIAL

       Return  1  if  the pattern can be used for partial matching with pcre_exec(), otherwise 0.
       The fourth argument should point to an  int  variable.  From  release  8.00,  this  always
       returns  1, because the restrictions that previously applied to partial matching have been
       lifted. The pcrepartial documentation gives details of partial matching.

         PCRE_INFO_OPTIONS

       Return a copy of the options with which the pattern  was  compiled.  The  fourth  argument
       should  point  to  an unsigned long int variable. These option bits are those specified in
       the call to pcre_compile(), modified by any top-level option settings at the start of  the
       pattern  itself.  In other words, they are the options that will be in force when matching
       starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled  with  the  PCRE_EXTENDED
       option, the result is PCRE_CASELESS, PCRE_MULTILINE, and PCRE_EXTENDED.

       A  pattern  is  automatically  anchored by PCRE if all of its top-level alternatives begin
       with one of the following:

         ^     unless PCRE_MULTILINE is set
         \A    always
         \G    always
         .*    if PCRE_DOTALL is set and there are no back
                 references to the subpattern in which .* appears

       For  such  patterns,  the  PCRE_ANCHORED  bit  is  set  in   the   options   returned   by
       pcre_fullinfo().

         PCRE_INFO_SIZE

       Return  the  size  of  the  compiled  pattern,  that  is, the value that was passed as the
       argument to pcre_malloc() when PCRE was getting memory in  which  to  place  the  compiled
       data. The fourth argument should point to a size_t variable.

         PCRE_INFO_STUDYSIZE

       Return  the  size  of  the  data  block pointed to by the study_data field in a pcre_extra
       block. That is, it is the value that was passed to pcre_malloc()  when  PCRE  was  getting
       memory  into  which  to  place the data created by pcre_study(). If pcre_extra is NULL, or
       there is no study data, zero is returned. The fourth argument should  point  to  a  size_t
       variable.

OBSOLETE INFO FUNCTION


       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

       The  pcre_info()  function  is  now  obsolete  because its interface is too restrictive to
       return all  the  available  data  about  a  compiled  pattern.  New  programs  should  use
       pcre_fullinfo()  instead. The yield of pcre_info() is the number of capturing subpatterns,
       or one of the following negative numbers:

         PCRE_ERROR_NULL       the argument code was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found

       If the optptr argument is not NULL, a copy of the  options  with  which  the  pattern  was
       compiled is placed in the integer it points to (see PCRE_INFO_OPTIONS above).

       If  the  pattern  is not anchored and the firstcharptr argument is not NULL, it is used to
       pass  back  information  about  the  first  character   of   any   matched   string   (see
       PCRE_INFO_FIRSTBYTE above).

REFERENCE COUNTS


       int pcre_refcount(pcre *code, int adjust);

       The  pcre_refcount() function is used to maintain a reference count in the data block that
       contains a compiled pattern. It is provided for the benefit of applications  that  operate
       in  an  object-oriented  manner, where different parts of the application may be using the
       same compiled pattern, but you want to free the block when they are all done.

       When a pattern is compiled, the reference count field  is  initialized  to  zero.   It  is
       changed  only by calling this function, whose action is to add the adjust value (which may
       be positive or negative) to it. The yield of the function is the new value.  However,  the
       value  of the count is constrained to lie between 0 and 65535, inclusive. If the new value
       is outside these limits, it is forced to the appropriate limit value.

       Except when it is zero, the reference count is not correctly preserved  if  a  pattern  is
       compiled  on  one host and then transferred to a host whose byte-order is different. (This
       seems a highly unlikely scenario.)

MATCHING A PATTERN: THE TRADITIONAL FUNCTION


       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       The function pcre_exec() is called to match a subject string against a  compiled  pattern,
       which  is passed in the code argument. If the pattern was studied, the result of the study
       should be passed in the extra argument. This function is the main matching facility of the
       library,  and  it  operates  in  a  Perl-like  manner. For specialist use there is also an
       alternative matching  function,  which  is  described  below  in  the  section  about  the
       pcre_dfa_exec() function.

       In  most applications, the pattern will have been compiled (and optionally studied) in the
       same process that calls pcre_exec(). However, it is possible to save compiled patterns and
       study  data,  and  then  use them later in different processes, possibly even on different
       hosts. For a discussion about this, see the pcreprecompile documentation.

       Here is an example of a simple call to pcre_exec():

         int rc;
         int ovector[30];
         rc = pcre_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn't study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           30);            /* number of elements (NOT size in bytes) */

   Extra data for pcre_exec()

       If the extra argument is not  NULL,  it  must  point  to  a  pcre_extra  data  block.  The
       pcre_study() function returns such a block (when it doesn't return NULL), but you can also
       create one for yourself, and pass additional  information  in  it.  The  pcre_extra  block
       contains the following fields (not necessarily in this order):

         unsigned long int flags;
         void *study_data;
         unsigned long int match_limit;
         unsigned long int match_limit_recursion;
         void *callout_data;
         const unsigned char *tables;
         unsigned char **mark;

       The  flags  field  is  a bitmap that specifies which of the other fields are set. The flag
       bits are:

         PCRE_EXTRA_STUDY_DATA
         PCRE_EXTRA_MATCH_LIMIT
         PCRE_EXTRA_MATCH_LIMIT_RECURSION
         PCRE_EXTRA_CALLOUT_DATA
         PCRE_EXTRA_TABLES
         PCRE_EXTRA_MARK

       Other flag bits should be set to zero. The study_data field is set in the pcre_extra block
       that  is  returned by pcre_study(), together with the appropriate flag bit. You should not
       set this yourself, but you may add to the block by setting  the  other  fields  and  their
       corresponding flag bits.

       The  match_limit  field provides a means of preventing PCRE from using up a vast amount of
       resources when running patterns that are not going to match, but which have a  very  large
       number  of possibilities in their search trees. The classic example is a pattern that uses
       nested unlimited repeats.

       Internally, PCRE uses a function called  match()  which  it  calls  repeatedly  (sometimes
       recursively). The limit set by match_limit is imposed on the number of times this function
       is called during a match, which has the effect of limiting the amount of backtracking that
       can  take place. For patterns that are not anchored, the count restarts from zero for each
       position in the subject string.

       The default value for the limit can be set when PCRE is built; the built-in default is 10
       million,  which  handles  all  but the most extreme cases. You can override the default by
       supplying pcre_exec()  with  a  pcre_extra  block  in  which  match_limit  is   set,   and
       PCRE_EXTRA_MATCH_LIMIT  is  set  in the flags field. If the limit is exceeded, pcre_exec()
       returns PCRE_ERROR_MATCHLIMIT.

       The match_limit_recursion field is similar to match_limit, but  instead  of  limiting  the
       total  number  of  times  that  match()  is  called, it limits the depth of recursion. The
       recursion depth is a smaller number than the total number of calls, because not all  calls
       to  match()  are  recursive.   This  limit  is  of  use  only  if  it  is set smaller than
       match_limit.

       Limiting the recursion depth limits the amount of stack that can be used,  or,  when  PCRE
       has  been  compiled  to  use  memory  on the heap instead of the stack, the amount of heap
       memory that can be used.

       The default value for match_limit_recursion can be set when PCRE is built; the built-in
       default is the same value as the default for match_limit. You can override the default by
       supplying pcre_exec() with a pcre_extra block in which match_limit_recursion is set, and
       PCRE_EXTRA_MATCH_LIMIT_RECURSION  is  set  in  the  flags field. If the limit is exceeded,
       pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.

       The callout_data field is used in conjunction with the "callout" feature, and is described
       in the pcrecallout documentation.

       The tables field is used to pass a character tables pointer to pcre_exec(); this overrides
       the value that is stored with the compiled pattern. A non-NULL value is  stored  with  the
       compiled  pattern  only  if custom tables were supplied to pcre_compile() via its tableptr
       argument.  If NULL is passed  to  pcre_exec()  using  this  mechanism,  it  forces  PCRE's
       internal tables to be used. This facility is helpful when re-using patterns that have been
       saved after compiling with an external set of tables, because the external tables might be
       at  a  different  address when pcre_exec() is called. See the pcreprecompile documentation
       for a discussion of saving compiled patterns for later use.

       If PCRE_EXTRA_MARK is set in the flags field, the mark field must be set  to  point  to  a
       char  *  variable.  If  the  pattern  contains  any  backtracking  control  verbs  such as
       (*MARK:NAME), and the execution ends up with a name to pass back, a pointer  to  the  name
       string (zero terminated) is placed in the variable pointed to by the mark field. The names
       are within the compiled pattern; if you wish to retain such a name you must copy it before
       freeing  the  memory of a compiled pattern. If there is no name to pass back, the variable
       pointed to by the mark field is set to NULL. For details of the backtracking control verbs,
       see the section entitled "Backtracking control" in the pcrepattern documentation.

   Option bits for pcre_exec()

       The unused bits of the options argument for pcre_exec() must be zero. The only bits that
       may be set are PCRE_ANCHORED, PCRE_BSR_xxx, PCRE_NEWLINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL,
       PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK,
       PCRE_PARTIAL_SOFT, and PCRE_PARTIAL_HARD.

         PCRE_ANCHORED

       The PCRE_ANCHORED option limits pcre_exec() to matching at the first matching position. If
       a  pattern  was compiled with PCRE_ANCHORED, or turned out to be anchored by virtue of its
       contents, it cannot be made unanchored at matching time.

         PCRE_BSR_ANYCRLF
         PCRE_BSR_UNICODE

       These options (which are mutually exclusive) control what the \R escape sequence  matches.
       The  choice  is  either  to  match  only  CR, LF, or CRLF, or to match any Unicode newline
       sequence. These options override the choice that was made or defaulted  when  the  pattern
       was compiled.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF
         PCRE_NEWLINE_ANYCRLF
         PCRE_NEWLINE_ANY

       These  options  override  the  newline  definition  that  was chosen or defaulted when the
       pattern was compiled. For details, see the description  of  pcre_compile()  above.  During
       matching,  the  newline  choice  affects  the behaviour of the dot, circumflex, and dollar
       metacharacters. It may also alter the way the match position is  advanced  after  a  match
       failure for an unanchored pattern.

       When  PCRE_NEWLINE_CRLF,  PCRE_NEWLINE_ANYCRLF,  or  PCRE_NEWLINE_ANY  is set, and a match
       attempt for an unanchored pattern fails when the current position is at a  CRLF  sequence,
       and  the  pattern contains no explicit matches for CR or LF characters, the match position
       is advanced by two characters instead of one, in other words, to after the CRLF.

       The above rule is a compromise that makes the most common  cases  work  as  expected.  For
       example,  if the pattern is .+A (and the PCRE_DOTALL option is not set), it does not match
       the string "\r\nA" because, after failing at the start, it skips both the CR  and  the  LF
       before  retrying. However, the pattern [\r\n]A does match that string, because it contains
       an explicit CR or LF reference, and so advances only by  one  character  after  the  first
       failure.

       An  explicit match for CR or LF is either a literal appearance of one of those characters,
       or one of the \r or \n escape sequences. Implicit matches such as [^X] do not  count,  nor
       does \s (which includes CR and LF in the characters that it matches).

       Notwithstanding  the above, anomalous effects may still occur when CRLF is a valid newline
       sequence and explicit \r or \n escapes appear in the pattern.

         PCRE_NOTBOL

       This option specifies that the first character of the subject string is not the beginning
       of a line, so the circumflex metacharacter should not match before it. Setting this
       without PCRE_MULTILINE (at compile time) causes circumflex never to match. This option
       affects only the behaviour of the circumflex metacharacter. It does not affect \A.

         PCRE_NOTEOL

       This  option specifies that the end of the subject string is not the end of a line, so the
       dollar metacharacter should not  match  it  nor  (except  in  multiline  mode)  a  newline
       immediately before it. Setting this without PCRE_MULTILINE (at compile time) causes dollar
       never to match. This option affects only the behaviour of  the  dollar  metacharacter.  It
       does not affect \Z or \z.

         PCRE_NOTEMPTY

       An  empty string is not considered to be a valid match if this option is set. If there are
       alternatives in the pattern, they are tried. If  all  the  alternatives  match  the  empty
       string, the entire match fails. For example, if the pattern

         a?b?

       is  applied  to  a string not beginning with "a" or "b", it matches an empty string at the
       start of the subject. With PCRE_NOTEMPTY set, this match is not valid,  so  PCRE  searches
       further into the string for occurrences of "a" or "b".

         PCRE_NOTEMPTY_ATSTART

       This  is like PCRE_NOTEMPTY, except that an empty string match that is not at the start of
       the subject is permitted. If the pattern is anchored, such a match can occur only  if  the
       pattern contains \K.

       Perl  has no direct equivalent of PCRE_NOTEMPTY or PCRE_NOTEMPTY_ATSTART, but it does make
       a special case of a pattern match of the empty string within  its  split()  function,  and
       when  using  the  /g modifier. It is possible to emulate Perl's behaviour after matching a
       null string by first trying the match again at the same offset with  PCRE_NOTEMPTY_ATSTART
       and  PCRE_ANCHORED,  and  then if that fails, by advancing the starting offset (see below)
       and trying an ordinary match again. There is some code that demonstrates how to do this in
       the  pcredemo  sample  program.  In the most general case, you have to check to see if the
       newline convention recognizes CRLF as a newline, and if so, and the current  character  is
       CR followed by LF, advance the starting offset by two characters instead of one.
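
       The offset adjustment described above can be sketched as a small helper. This is an
       illustration only, not code taken from pcredemo: the function name and the
       crlf_is_newline flag are hypothetical, and the caller is assumed to have worked out the
       newline convention already. The sketch advances by bytes, so in UTF-8 mode a caller
       would also need to move on to the start of the next character.

```c
/*
 * Hypothetical helper (not part of PCRE): choose the next starting offset
 * after an empty match at `offset` when the retry at the same offset has
 * failed. `crlf_is_newline` is supplied by the caller and should be
 * non-zero when the newline convention recognizes CRLF as a newline.
 * Returns -1 when the end of the subject has been reached.
 */
static int advance_after_empty_match(const char *subject, int length,
                                     int offset, int crlf_is_newline)
{
    if (offset >= length)
        return -1;                  /* nothing left to search */

    if (crlf_is_newline &&
        offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;          /* step over the whole CRLF pair */

    return offset + 1;              /* ordinary one-byte advance */
}
```

       The value returned would be passed as startoffset in the next ordinary call to
       pcre_exec(); a return of -1 signals that the search loop can stop.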

         PCRE_NO_START_OPTIMIZE

       There  are  a  number  of  optimizations that pcre_exec() uses at the start of a match, in
       order to speed up the process. For example, if it is known that an unanchored  match  must
       start  with  a  specific  character, it searches the subject for that character, and fails
       immediately if it cannot find it, without actually running  the  main  matching  function.
       This  means  that  a  special  item  such  as  (*COMMIT)  at the start of a pattern is not
       considered until after a suitable starting point  for  the  match  has  been  found.  When
       callouts  or (*MARK) items are in use, these "start-up" optimizations can cause them to be
       skipped if the pattern is never actually used. The start-up optimizations are in effect  a
       pre-scan of the subject that takes place before the pattern is run.

       The  PCRE_NO_START_OPTIMIZE  option  disables the start-up optimizations, possibly causing
       performance to suffer, but ensuring that in cases where the  result  is  "no  match",  the
       callouts  do  occur,  and that items such as (*COMMIT) and (*MARK) are considered at every
       possible starting position in the subject string.  If  PCRE_NO_START_OPTIMIZE  is  set  at
       compile time, it cannot be unset at matching time.

       Setting  PCRE_NO_START_OPTIMIZE  can change the outcome of a matching operation.  Consider
       the pattern

         (*COMMIT)ABC

       When this is compiled, PCRE records the fact that a match must start  with  the  character
       "A".  Suppose  the  subject  string is "DEFABC". The start-up optimization scans along the
       subject, finds "A" and runs the first match attempt from there. The (*COMMIT)  item  means
       that  the  pattern  must match the current starting position, which in this case, it does.
       However, if the same match is run with PCRE_NO_START_OPTIMIZE set, the initial scan  along
       the  subject  string does not happen. The first match attempt is run starting from "D" and
       when this fails, (*COMMIT) prevents any further matches being tried, so the overall result
       is  "no  match".  If  the pattern is studied, more start-up optimizations may be used. For
       example, a minimum length for the subject may be recorded. Consider the pattern

         (*MARK:A)(X|Y)

       The minimum length for a match is one character. If the subject is "ABC",  there  will  be
       attempts  to  match "ABC", "BC", "C", and then finally an empty string.  If the pattern is
       studied, the final attempt does not take place, because PCRE knows that the subject is too
       short,  and  so the (*MARK) is never encountered.  In this case, studying the pattern does
       not affect the overall match result, which is still "no match", but  it  does  affect  the
       auxiliary information that is returned.

         PCRE_NO_UTF8_CHECK

       When  PCRE_UTF8  is  set at compile time, the validity of the subject as a UTF-8 string is
       automatically checked when pcre_exec() is subsequently called.  The value  of  startoffset
       is  also  checked  to  ensure that it points to the start of a UTF-8 character. There is a
       discussion about the validity of UTF-8 strings in the section on UTF-8 support in the main
       pcre  page.  If an invalid UTF-8 sequence of bytes is found, pcre_exec() returns the error
       PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is  a  truncated  UTF-8
       character at the end of the subject, PCRE_ERROR_SHORTUTF8. If startoffset contains a value
       that does not point to the start of a UTF-8 character (or to  the  end  of  the  subject),
       PCRE_ERROR_BADUTF8_OFFSET is returned.

       If  you  already  know  that  your subject is valid, and you want to skip these checks for
       performance reasons, you can set the PCRE_NO_UTF8_CHECK option when  calling  pcre_exec().
       You  might  want  to do this for the second and subsequent calls to pcre_exec() if you are
       making repeated calls to find all the matches in a single  subject  string.  However,  you
       should  be sure that the value of startoffset points to the start of a UTF-8 character (or
       the end of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an  invalid
       UTF-8  string  as  a subject or an invalid value of startoffset is undefined. Your program
       may crash.
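
       Before setting PCRE_NO_UTF8_CHECK, a caller may wish to confirm that startoffset is
       acceptable. The following is a minimal sketch under the assumption that the subject is
       already known to be valid UTF-8; the function name is hypothetical and the check covers
       only the offset, not the validity of the whole string.

```c
/*
 * Hypothetical helper (not part of PCRE): returns 1 if `startoffset` is in
 * range and lands on the first byte of a UTF-8 character or on the end of
 * the subject, and 0 otherwise. It does NOT validate the subject itself.
 */
static int offset_is_utf8_boundary(const char *subject, int length,
                                   int startoffset)
{
    if (startoffset < 0 || startoffset > length)
        return 0;                       /* outside the subject */
    if (startoffset == length)
        return 1;                       /* the end of the subject is allowed */
    /* UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
    return ((unsigned char)subject[startoffset] & 0xC0) != 0x80;
}
```

       The test relies on the UTF-8 property that only continuation bytes have their top two
       bits set to 10, so any other byte begins a character.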

         PCRE_PARTIAL_HARD
         PCRE_PARTIAL_SOFT

       These  options  turn  on  the  partial  matching  feature.  For  backwards  compatibility,
       PCRE_PARTIAL  is a synonym for PCRE_PARTIAL_SOFT. A partial match occurs if the end of the
       subject string is reached successfully, but there are not  enough  subject  characters  to
       complete  the match. If this happens when PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is
       set, matching continues by testing any remaining alternatives. Only if no  complete  match
       can be found is PCRE_ERROR_PARTIAL returned instead of PCRE_ERROR_NOMATCH. In other words,
       PCRE_PARTIAL_SOFT says that the caller is prepared to handle a partial match, but only  if
       no complete match can be found.

       If  PCRE_PARTIAL_HARD  is  set, it overrides PCRE_PARTIAL_SOFT. In this case, if a partial
       match is found, pcre_exec() immediately returns  PCRE_ERROR_PARTIAL,  without  considering
       any  other alternatives. In other words, when PCRE_PARTIAL_HARD is set, a partial match is
       considered to be more important than an alternative complete match.

       In both cases, the portion of the string that was inspected when  the  partial  match  was
       found  is set as the first matching string. There is a more detailed discussion of partial
       and multi-segment matching, with examples, in the pcrepartial documentation.

   The string to be matched by pcre_exec()

       The subject string is passed to pcre_exec() as a pointer in subject, a length  (in  bytes)
       in  length, and a starting byte offset in startoffset. If this is negative or greater than
       the length of the subject, pcre_exec() returns  PCRE_ERROR_BADOFFSET.  When  the  starting
       offset is zero, the search for a match starts at the beginning of the subject, and this is
       by far the most common case. In UTF-8 mode, the byte offset must point to the start  of  a
       UTF-8  character  (or  the end of the subject). Unlike the pattern string, the subject may
       contain binary zero bytes.

       A non-zero starting offset is useful when searching for another match in the same  subject
       by  calling  pcre_exec() again after a previous success.  Setting startoffset differs from
       just passing over a shortened string and setting PCRE_NOTBOL in the case of a pattern that
       begins with any kind of lookbehind. For example, consider the pattern

         \Biss\B

       which  finds  occurrences of "iss" in the middle of words. (\B matches only if the current
       position in the subject is not a word boundary.) When applied to the string "Mississippi"
       the first call to pcre_exec() finds the first occurrence. If pcre_exec() is called again
       with just the remainder of the subject, namely "issippi", it does not match, because \B is
       always  false at the start of the subject, which is deemed to be a word boundary. However,
       if pcre_exec() is passed the entire string again, but with startoffset set to 4, it  finds
       the  second  occurrence  of  "iss" because it is able to look behind the starting point to
       discover that it is preceded by a letter.

       Finding all the matches in a subject is tricky when the pattern can match an empty string.
       It  is possible to emulate Perl's /g behaviour by first trying the match again at the same
       offset, with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED options, and then if that  fails,
       advancing  the starting offset and trying an ordinary match again. There is some code that
       demonstrates how to do this in the pcredemo sample program. In the most general case,  you
       have  to  check  to see if the newline convention recognizes CRLF as a newline, and if so,
       and the current character is CR followed  by  LF,  advance  the  starting  offset  by  two
       characters instead of one.

       If a non-zero starting offset is passed when the pattern is anchored, one attempt to match
       at the given offset is made. This can only succeed if the pattern  does  not  require  the
       match to be at the start of the subject.

   How pcre_exec() returns captured substrings

       In  general,  a pattern matches a certain portion of the subject, and in addition, further
       substrings from the subject may be picked out by parts of the pattern. Following the usage
       in  Jeffrey  Friedl's  book,  this  is  called "capturing" in what follows, and the phrase
       "capturing subpattern" is used for a fragment of a pattern that  picks  out  a  substring.
       PCRE supports several other kinds of parenthesized subpattern that do not cause substrings
       to be captured.

       Captured substrings are returned to the caller via a vector of integers whose  address  is
       passed  in ovector. The number of elements in the vector is passed in ovecsize, which must
       be a non-negative number. Note: this argument is NOT the size of ovector in bytes.

       The first two-thirds of the  vector  is  used  to  pass  back  captured  substrings,  each
       substring using a pair of integers. The remaining third of the vector is used as workspace
       by pcre_exec() while matching capturing subpatterns, and is not available for passing back
       information.  The number passed in ovecsize should always be a multiple of three. If it is
       not, it is rounded down.

       When a match is successful, information about captured substrings is returned in pairs  of
       integers,  starting  at  the  beginning of ovector, and continuing up to two-thirds of its
       length at the most. The first element of each pair is set to the byte offset of the  first
       character  in a substring, and the second is set to the byte offset of the first character
       after the end of a substring. Note: these values are always byte offsets,  even  in  UTF-8
       mode. They are not character counts.

       The first pair of integers, ovector[0] and ovector[1], identifies the portion of the subject
       string matched by the entire pattern. The next  pair  is  used  for  the  first  capturing
       subpattern,  and  so  on.  The  value returned by pcre_exec() is one more than the highest
       numbered pair that has been set.  For example, if two substrings have been  captured,  the
       returned  value  is  3.  If  there  are  no capturing subpatterns, the return value from a
       successful match is 1, indicating that just the first pair of offsets has been set.

       If a capturing subpattern is matched repeatedly, it is the last portion of the string that
       it matched that is returned.

       If  the  vector is too small to hold all the captured substring offsets, it is used as far
       as possible (up to two-thirds of its length), and the function returns a value of zero. If
       the  substring  offsets are not of interest, pcre_exec() may be called with ovector passed
       as NULL and ovecsize as zero. However, if the pattern contains  back  references  and  the
       ovector  is  not big enough to remember the related substrings, PCRE has to get additional
       memory for use during matching. Thus it is usually advisable to supply an ovector.

       The pcre_fullinfo() function can be used to find out how many capturing subpatterns  there
       are  in  a  compiled pattern. The smallest size for ovector that will allow for n captured
       substrings, in addition to the offsets of the substring matched by the whole  pattern,  is
       (n+1)*3.
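
       The sizing arithmetic can be written out as two one-line helpers. Both names are
       hypothetical and nothing here calls into PCRE; in real code the value of n would
       typically come from pcre_fullinfo().

```c
/* Smallest ovecsize that holds the whole match plus n captured substrings. */
static int ovecsize_for_captures(int n)
{
    return (n + 1) * 3;
}

/*
 * Number of offset pairs that can be passed back for a given ovecsize.
 * The size is rounded down to a multiple of three; two-thirds of that is
 * used for pairs of integers, which works out to ovecsize / 3 pairs.
 */
static int usable_pairs(int ovecsize)
{
    return ovecsize / 3;
}
```

       For instance, a pattern with two capturing subpatterns needs an ovector of at least 9
       elements, while a vector of 10 elements still holds only 3 pairs.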

       It  is possible for capturing subpattern number n+1 to match some part of the subject when
       subpattern n has not been used at all. For example, if the string "abc" is matched against
       the  pattern  (a|(z))(bc)  the  return from the function is 4, and subpatterns 1 and 3 are
       matched, but 2 is not. When this happens, both values in the offset pairs corresponding to
       unused subpatterns are set to -1.

       Offset  values that correspond to unused subpatterns at the end of the expression are also
       set to -1. For example, if the string "abc" is matched against the pattern  (abc)(x(yz)?)?
       subpatterns  2  and  3  are  not  matched.  The return from the function is 2, because the
       highest used capturing subpattern number is 1, and the offsets for the second and
       third  capturing  subpatterns  (assuming the vector is large enough, of course) are set to
       -1.
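
       As an illustration of how the offset pairs might be read, the sketch below walks the
       pairs that were set and reports unset subpatterns. The function name is hypothetical;
       rc is assumed to be a positive return value from pcre_exec() and ovector the vector
       that was passed to it. Because it prints with %.*s, output stops early at a binary
       zero inside a substring.

```c
#include <stdio.h>

/*
 * Hypothetical helper: print every pair reported by a successful match
 * and return how many of them are unset (both offsets equal to -1).
 */
static int report_groups(const char *subject, const int *ovector, int rc)
{
    int i, unset = 0;

    for (i = 0; i < rc; i++) {
        int start = ovector[2 * i];       /* offset of first character */
        int end   = ovector[2 * i + 1];   /* offset just past the end  */

        if (start < 0) {
            printf("%d: <unset>\n", i);
            unset++;
        } else {
            printf("%d: \"%.*s\"\n", i, end - start, subject + start);
        }
    }
    return unset;
}
```

       For the (a|(z))(bc) example above, the vector would hold the pairs (0,3), (0,1), (-1,-1),
       and (1,3), so the helper reports one unset subpattern.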

       Note: Elements of ovector that do not correspond to capturing parentheses in  the  pattern
       are  never  changed.  That is, if a pattern contains n capturing parentheses, no more than
       ovector[0] to ovector[2n+1] are set by pcre_exec(). The  other  elements  retain  whatever
       values they previously had.

       Some convenience functions are provided for extracting the captured substrings as separate
       strings. These are described below.

   Error return values from pcre_exec()

       If pcre_exec() fails, it returns a negative number.  The  following  are  defined  in  the
       header file:

         PCRE_ERROR_NOMATCH        (-1)

       The subject string did not match the pattern.

         PCRE_ERROR_NULL           (-2)

       Either code or subject was passed as NULL, or ovector was NULL and ovecsize was not zero.

         PCRE_ERROR_BADOPTION      (-3)

       An unrecognized bit was set in the options argument.

         PCRE_ERROR_BADMAGIC       (-4)

       PCRE  stores  a 4-byte "magic number" at the start of the compiled code, to catch the case
       when it is passed a junk pointer and to detect when a pattern  that  was  compiled  in  an
       environment  of one endianness is run in an environment with the other endianness. This is
       the error that PCRE gives when the magic number is not present.

         PCRE_ERROR_UNKNOWN_OPCODE (-5)

       While running the pattern match, an unknown item was encountered in the compiled  pattern.
       This error could be caused by a bug in PCRE or by overwriting of the compiled pattern.

         PCRE_ERROR_NOMEMORY       (-6)

       If  a  pattern  contains back references, but the ovector that is passed to pcre_exec() is
       not big enough to remember the referenced substrings, PCRE gets a block of memory  at  the
       start of matching to use for this purpose. If the call via pcre_malloc() fails, this error
       is given. The memory is automatically freed at the end of matching.

       This error is also given if pcre_stack_malloc() fails in pcre_exec(). This can happen only
       when PCRE has been compiled with --disable-stack-for-recursion.

         PCRE_ERROR_NOSUBSTRING    (-7)

       This   error   is   used   by   the   pcre_copy_substring(),   pcre_get_substring(),   and
       pcre_get_substring_list() functions (see below). It is never returned by pcre_exec().

         PCRE_ERROR_MATCHLIMIT     (-8)

       The backtracking limit, as specified by the match_limit field in  a  pcre_extra  structure
       (or defaulted) was reached. See the description above.

         PCRE_ERROR_CALLOUT        (-9)

       This  error  is  never  generated by pcre_exec() itself. It is provided for use by callout
       functions that want to yield a distinctive error code. See the  pcrecallout  documentation
       for details.

         PCRE_ERROR_BADUTF8        (-10)

       A  string  that contains an invalid UTF-8 byte sequence was passed as a subject.  However,
       if PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8 character at the  end  of
       the subject, PCRE_ERROR_SHORTUTF8 is used instead.

         PCRE_ERROR_BADUTF8_OFFSET (-11)

       The  UTF-8  byte  sequence  that  was  passed  as  a  subject  was valid, but the value of
       startoffset did not point to the beginning of a UTF-8 character or the end of the subject.

         PCRE_ERROR_PARTIAL        (-12)

       The subject string did not  match,  but  it  did  match  partially.  See  the  pcrepartial
       documentation for details of partial matching.

         PCRE_ERROR_BADPARTIAL     (-13)

       This  code  is no longer in use. It was formerly returned when the PCRE_PARTIAL option was
       used with a compiled  pattern  containing  items  that  were  not  supported  for  partial
       matching. From release 8.00 onwards, there are no restrictions on partial matching.

         PCRE_ERROR_INTERNAL       (-14)

       An  unexpected internal error has occurred. This error could be caused by a bug in PCRE or
       by overwriting of the compiled pattern.

         PCRE_ERROR_BADCOUNT       (-15)

       This error is given if the value of the ovecsize argument is negative.

         PCRE_ERROR_RECURSIONLIMIT (-21)

       The internal recursion limit,  as  specified  by  the  match_limit_recursion  field  in  a
       pcre_extra structure (or defaulted) was reached. See the description above.

         PCRE_ERROR_BADNEWLINE     (-23)

       An invalid combination of PCRE_NEWLINE_xxx options was given.

         PCRE_ERROR_BADOFFSET      (-24)

       The  value of startoffset was negative or greater than the length of the subject, that is,
       the value in length.

         PCRE_ERROR_SHORTUTF8      (-25)

       The subject  string  ended  with  an  incomplete  (truncated)  UTF-8  character,  and  the
       PCRE_PARTIAL_HARD  option  was set. Without this option, PCRE_ERROR_BADUTF8 is returned in
       this situation.

       Error numbers -16 to -20 and -22 are not used by pcre_exec().
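
       One possible way of sorting the return value of pcre_exec() into broad outcomes is
       sketched below. The enum and function names are hypothetical. The two numeric
       constants are copied from the list above so that the fragment compiles without the
       PCRE header; real code would include pcre.h and use the PCRE_ERROR_xxx names.

```c
/* Values copied from the list above (normally taken from <pcre.h>). */
enum {
    SKETCH_ERROR_NOMATCH = -1,
    SKETCH_ERROR_PARTIAL = -12
};

enum exec_outcome {
    OUTCOME_MATCH,          /* rc > 0: rc pairs of offsets were set      */
    OUTCOME_OVECTOR_FULL,   /* rc == 0: matched, but ovector was too small */
    OUTCOME_NOMATCH,
    OUTCOME_PARTIAL,
    OUTCOME_ERROR           /* any other negative value */
};

static enum exec_outcome classify_exec_result(int rc)
{
    if (rc > 0)
        return OUTCOME_MATCH;
    if (rc == 0)
        return OUTCOME_OVECTOR_FULL;
    if (rc == SKETCH_ERROR_NOMATCH)
        return OUTCOME_NOMATCH;
    if (rc == SKETCH_ERROR_PARTIAL)
        return OUTCOME_PARTIAL;
    return OUTCOME_ERROR;
}
```

       Treating "no match" and "partial match" separately from the remaining negative values
       reflects the fact that they are normal results rather than failures of the call.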

EXTRACTING CAPTURED SUBSTRINGS BY NUMBER


       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       Captured substrings can be accessed directly by using the offsets returned by  pcre_exec()
       in  ovector.  For  convenience, the functions pcre_copy_substring(), pcre_get_substring(),
       and pcre_get_substring_list() are provided for  extracting  captured  substrings  as  new,
       separate, zero-terminated strings. These functions identify substrings by number. The next
       section describes functions for extracting named substrings.
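
       Direct access through the offsets might look like the following sketch, which locates a
       substring inside the subject without copying it. The function name is hypothetical and
       it is not one of the PCRE convenience functions; the result is a pointer and a length,
       not a zero-terminated string.

```c
/*
 * Hypothetical helper: find captured substring `n` inside the subject.
 * On success, stores a pointer to its first byte in *start and returns
 * its length in bytes. Returns -1 if `n` is out of range or the
 * substring is unset (its offsets are negative).
 */
static int substring_view(const char *subject, const int *ovector,
                          int stringcount, int n, const char **start)
{
    if (n < 0 || n >= stringcount || ovector[2 * n] < 0)
        return -1;

    *start = subject + ovector[2 * n];
    return ovector[2 * n + 1] - ovector[2 * n];
}
```

       Since the length is returned separately, this approach works unchanged for substrings
       that contain binary zeros.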

       A substring that contains a binary zero is correctly extracted  and  has  a  further  zero
       added  on the end, but the result is not, of course, a C string.  However, you can process
       such a string by referring to the length that is  returned  by  pcre_copy_substring()  and
       pcre_get_substring().   Unfortunately,  the  interface to pcre_get_substring_list() is not
       adequate for handling strings containing binary zeros, because the end of the final string
       is not independently indicated.

       The  first  three  arguments are the same for all three of these functions: subject is the
       subject string that has just been successfully matched, ovector is a pointer to the vector
       of  integer  offsets  that  was  passed  to  pcre_exec(), and stringcount is the number of
       substrings that were captured by the match,  including  the  substring  that  matched  the
       entire regular expression. This is the value returned by pcre_exec() if it is greater than
       zero. If pcre_exec() returned zero, indicating that it ran out of space  in  ovector,  the
       value  passed  as  stringcount  should  be the number of elements in the vector divided by
       three.
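
       The rule for choosing stringcount can be expressed as a short helper. The name is
       hypothetical, and returning zero for a negative rc is a choice made for this sketch,
       on the grounds that there is nothing to extract after a failed match.

```c
/*
 * Hypothetical helper: derive the stringcount argument from the value
 * returned by pcre_exec() and the ovecsize that was passed to it.
 */
static int stringcount_for(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;              /* normal case: use the return value */
    if (rc == 0)
        return ovecsize / 3;    /* ovector filled up: use its capacity */
    return 0;                   /* no match or error: nothing to extract */
}
```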

       The functions pcre_copy_substring() and pcre_get_substring() extract a  single  substring,
       whose number is given as stringnumber. A value of zero extracts the substring that matched
       the  entire  pattern,  whereas  higher  values  extract  the  captured   substrings.   For
       pcre_copy_substring(),  the  string  is  placed  in  buffer,  whose  length  is  given  by
       buffersize, while  for  pcre_get_substring()  a  new  block  of  memory  is  obtained  via
       pcre_malloc,  and  its address is returned via stringptr. The yield of the function is the
       length of the string, not including the terminating zero, or one of these error codes:

         PCRE_ERROR_NOMEMORY       (-6)

       The buffer was too small for pcre_copy_substring(), or the attempt to  get  memory  failed
       for pcre_get_substring().

         PCRE_ERROR_NOSUBSTRING    (-7)

       There is no substring whose number is stringnumber.

       The pcre_get_substring_list() function extracts all available substrings and builds a list
       of pointers to them. All this is done in a single block of memory  that  is  obtained  via
       pcre_malloc.  The  address  of the memory block is returned via listptr, which is also the
       start of the list of string pointers. The end of the list is marked by a NULL pointer. The
       yield of the function is zero if all went well, or the error code

         PCRE_ERROR_NOMEMORY       (-6)

       if the attempt to get the memory block failed.

       When any of these functions encounters a substring that is unset, which  can  happen  when
       capturing subpattern number n+1 matches some part of the subject, but subpattern n has not
       been  used  at  all, they return an empty string. This can be distinguished from a genuine
       zero-length substring by inspecting the appropriate offset in ovector, which  is  negative
       for unset substrings.

       The  two convenience functions pcre_free_substring() and pcre_free_substring_list() can be
       used  to  free  the  memory  returned  by  a  previous  call  of  pcre_get_substring()  or
       pcre_get_substring_list(),  respectively.  They  do  nothing  more  than call the function
       pointed to by pcre_free, which of course could  be  called  directly  from  a  C  program.
       However,  PCRE  is  used  in some situations where it is linked via a special interface to
       another programming language that cannot use pcre_free directly; it  is  for  these  cases
       that the functions are provided.

EXTRACTING CAPTURED SUBSTRINGS BY NAME


       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       To extract a substring by name, you first have to find the associated number. For example,
       for this pattern

         (a+)b(?<xxx>\d+)...

       the number of the subpattern called "xxx" is  2.  If  the  name  is  known  to  be  unique
       (PCRE_DUPNAMES  was  not  set),  you  can  find  the  number  from  the  name  by  calling
       pcre_get_stringnumber(). The first argument is the compiled pattern, and the second is the
       name.  The  yield of the function is the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7)
       if there is no subpattern of that name.

       Given the number, you can extract the substring directly, or  use  one  of  the  functions
       described  in  the previous section. For convenience, there are also two functions that do
       the whole job.

       Most of the arguments of pcre_copy_named_substring()  and  pcre_get_named_substring()  are
       the  same  as those for the similarly named functions that extract by number. As these are
       described in the previous section, they are not re-described  here.  There  are  just  two
       differences:

       First, instead of a substring number, a substring name is given. Second, there is an extra
       argument, given at the start, which is a pointer to the compiled pattern. This  is  needed
       in order to gain access to the name-to-number translation table.

       These  functions  call  pcre_get_stringnumber(),  and  if  it  succeeds,  they  then  call
       pcre_copy_substring() or pcre_get_substring(), as appropriate. NOTE: If  PCRE_DUPNAMES  is
       set  and  there  are duplicate names, the behaviour may not be what you want (see the next
       section).

       Warning: If the pattern uses the (?| feature to set up multiple subpatterns with the  same
       number,  as  described  in  the section on duplicate subpattern numbers in the pcrepattern
       page, you cannot use names to distinguish the different subpatterns, because names are not
       included  in  the  compiled code. The matching process uses only numbers. For this reason,
       the use of different names for subpatterns of the same number causes an error  at  compile
       time.

DUPLICATE SUBPATTERN NAMES


       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       When  a  pattern  is compiled with the PCRE_DUPNAMES option, names for subpatterns are not
       required to be unique. (Duplicate names are always allowed for subpatterns with  the  same
       number,  created by using the (?| feature. Indeed, if such subpatterns are named, they are
       required to use the same names.)

       Normally, patterns with duplicate names are such that in any one match, only  one  of  the
       named subpatterns participates. An example is shown in the pcrepattern documentation.

       When  duplicates  are  present, pcre_copy_named_substring() and pcre_get_named_substring()
       return the first substring corresponding to the given name that is set. If none  are  set,
       PCRE_ERROR_NOSUBSTRING  (-7) is returned; no data is returned. The pcre_get_stringnumber()
       function returns one of the numbers that are associated with  the  name,  but  it  is  not
       defined which it is.

       If  you want to get full details of all captured substrings for a given name, you must use
       the pcre_get_stringtable_entries() function. The first argument is the  compiled  pattern,
       and  the  second  is  the  name.  The third and fourth are pointers to variables which are
       updated by the function. After it has run, they point to the first and last entries in the
       name-to-number  table  for  the given name. The function itself returns the length of each
       entry, or PCRE_ERROR_NOSUBSTRING (-7) if there are  none.  The  format  of  the  table  is
       described  above  in  the  section  entitled  Information  about a pattern.  Given all the
       relevant entries for the name, you can extract  each  of  their  numbers,  and  hence  the
       captured data, if any.
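
       Walking the returned entries might be done along the following lines. This is a sketch
       under stated assumptions: the function name is hypothetical, and each entry is assumed
       to begin with the two-byte subpattern number, most significant byte first, followed by
       the zero-terminated name. The arguments first, last, and entrysize stand for the two
       pointers and the return value obtained from pcre_get_stringtable_entries().

```c
/*
 * Hypothetical helper: among the name-table entries from `first` to
 * `last` inclusive, return the number of the first subpattern that was
 * set in the match described by `ovector`, or -1 if none was set.
 */
static int first_set_group(const char *first, const char *last,
                           int entrysize, const int *ovector,
                           int stringcount)
{
    int count = (int)((last - first) / entrysize) + 1;
    int i;

    for (i = 0; i < count; i++) {
        const unsigned char *entry =
            (const unsigned char *)first + (long)i * entrysize;
        int n = (entry[0] << 8) | entry[1];   /* subpattern number */

        if (n < stringcount && ovector[2 * n] >= 0)
            return n;
    }
    return -1;
}
```

       Once a number has been found, the corresponding substring can be taken from ovector or
       passed to one of the functions that extract by number.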

FINDING ALL POSSIBLE MATCHES


       The  traditional  matching  function uses a similar algorithm to Perl, which stops when it
       finds the first match, starting at a given point in the subject. If you want to  find  all
       possible  matches,  or the longest possible match, consider using the alternative matching
       function (see below) instead. If you cannot use the alternative function, but  still  need
       to  find all possible matches, you can kludge it up by making use of the callout facility,
       which is described in the pcrecallout documentation.

       What you have to do is to insert a callout right at the end of  the  pattern.   When  your
       callout function is called, extract and save the current matched substring. Then return 1,
       which forces pcre_exec() to backtrack and try other alternatives. Ultimately, when it runs
       out of matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.

MATCHING A PATTERN: THE ALTERNATIVE FUNCTION


       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       The  function  pcre_dfa_exec()  is  called  to  match  a subject string against a compiled
       pattern, using a matching algorithm that scans the subject string just once, and does  not
       backtrack.  This  has  different  characteristics  to  the  normal  algorithm,  and is not
       compatible  with  Perl.  Some  of  the  features  of  PCRE  patterns  are  not  supported.
       Nevertheless,  there  are times when this kind of matching can be useful. For a discussion
       of the two matching algorithms, and a list  of  features  that  pcre_dfa_exec()  does  not
       support, see the pcrematching documentation.

       The  arguments  for the pcre_dfa_exec() function are the same as for pcre_exec(), plus two
       extras. The ovector argument is used in a different way, and this is described below.  The
       other  common  arguments are used in the same way as for pcre_exec(), so their description
       is not repeated here.

       The two additional arguments provide workspace for  the  function.  The  workspace  vector
       should  contain  at  least  20  elements.  It  is used for keeping track of multiple paths
       through the pattern tree. More workspace will be needed for patterns  and  subjects  where
       there are a lot of potential matches.
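
       Because the amount of workspace needed depends on the pattern and the subject, one
       possible approach is to retry with a larger vector whenever the function reports that it
       ran out of workspace (PCRE_ERROR_DFA_WSSIZE, described below). The sketch below shows
       only the retry logic. The callback type, the function names, and the doubling policy are
       assumptions made for illustration; simulated_attempt() stands in for a real callback that
       would call pcre_dfa_exec() with the supplied workspace.

```c
#include <assert.h>
#include <stdlib.h>

#define DFA_WSSIZE_ERROR   (-19)  /* PCRE_ERROR_DFA_WSSIZE, as listed on this page */
#define DFA_NOMEMORY_ERROR  (-6)  /* PCRE_ERROR_NOMEMORY in pcre.h */

/* Hypothetical callback: make one match attempt with the given workspace and
   return whatever pcre_dfa_exec() returned. */
typedef int (*dfa_attempt_fn)(int *workspace, int wscount, void *user_data);

/* Retry the attempt, doubling the workspace (up to max_count elements) each
   time it reports that the workspace was too small. */
int run_with_growing_workspace(dfa_attempt_fn attempt, void *user_data,
                               int initial_count, int max_count)
{
    /* Never start below the documented minimum of 20 elements. */
    int wscount = initial_count < 20 ? 20 : initial_count;

    assert(attempt != NULL);

    for (;;) {
        int rc;
        int *workspace = malloc((size_t)wscount * sizeof *workspace);

        if (workspace == NULL)
            return DFA_NOMEMORY_ERROR;

        rc = attempt(workspace, wscount, user_data);
        free(workspace);

        if (rc != DFA_WSSIZE_ERROR || wscount >= max_count)
            return rc;

        /* Double the size, clamping at max_count without overflowing. */
        wscount = (wscount > max_count / 2) ? max_count : wscount * 2;
    }
}

/* Stand-in for demonstration only: pretends the match succeeds once the
   workspace has at least *user_data elements. */
int simulated_attempt(int *workspace, int wscount, void *user_data)
{
    int needed = *(const int *)user_data;
    (void)workspace;
    return (wscount >= needed) ? 1 : DFA_WSSIZE_ERROR;
}
```

       This sketch frees the workspace after each attempt, so it is not suitable for
       multi-segment matching with PCRE_DFA_RESTART, where the same workspace must be kept
       between calls.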

       Here is an example of a simple call to pcre_dfa_exec():

         int rc;
         int ovector[10];
         int wspace[20];
         rc = pcre_dfa_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn't study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           10,             /* number of elements (NOT size in bytes) */
           wspace,         /* working space vector */
           20);            /* number of elements (NOT size in bytes) */

   Option bits for pcre_dfa_exec()

       The  unused  bits  of the options argument for pcre_dfa_exec() must be zero. The only bits
       that  may  be  set  are   PCRE_ANCHORED,   PCRE_NEWLINE_xxx,   PCRE_NOTBOL,   PCRE_NOTEOL,
       PCRE_NOTEMPTY,      PCRE_NOTEMPTY_ATSTART,      PCRE_NO_UTF8_CHECK,      PCRE_BSR_ANYCRLF,
       PCRE_BSR_UNICODE,    PCRE_NO_START_OPTIMIZE,     PCRE_PARTIAL_HARD,     PCRE_PARTIAL_SOFT,
       PCRE_DFA_SHORTEST,  and  PCRE_DFA_RESTART.  All but the last four of these are exactly the
       same as for pcre_exec(), so their description is not repeated here.

         PCRE_PARTIAL_HARD
         PCRE_PARTIAL_SOFT

       These have the same general effect as  they  do  for  pcre_exec(),  but  the  details  are
       slightly  different.  When  PCRE_PARTIAL_HARD  is  set  for  pcre_dfa_exec(),  it  returns
       PCRE_ERROR_PARTIAL if the end of the subject is reached and there is still  at  least  one
       matching  possibility  that  requires  additional  characters.  This  happens even if some
       complete matches have also been found. When PCRE_PARTIAL_SOFT  is  set,  the  return  code
       PCRE_ERROR_NOMATCH  is  converted  into  PCRE_ERROR_PARTIAL  if  the end of the subject is
       reached, there have been no complete matches, but there is still  at  least  one  matching
       possibility.  The  portion of the string that was inspected when the longest partial match
       was found is set as the first matching string in both cases.  There  is  a  more  detailed
       discussion  of  partial  and  multi-segment  matching,  with  examples, in the pcrepartial
       documentation.

         PCRE_DFA_SHORTEST

       Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as soon  as  it
       has  found  one  match.  Because  of  the  way  the  alternative  algorithm works, this is
       necessarily the shortest possible match at  the  first  possible  matching  point  in  the
       subject string.

         PCRE_DFA_RESTART

       When  pcre_dfa_exec()  returns  a  partial  match,  it  is possible to call it again, with
       additional  subject  characters,  and  have  it  continue  with  the   same   match.   The
       PCRE_DFA_RESTART option requests this action; when it is set, the workspace and wscount
       arguments must refer to the same vector as before, because data about the match so far is
       left in the workspace after a partial match. There is more discussion of this facility in
       the pcrepartial documentation.

   Successful returns from pcre_dfa_exec()

       When pcre_dfa_exec() succeeds, it may have matched more than one substring in the subject.
       Note,  however,  that all the matches from one run of the function start at the same point
       in the subject. The shorter matches are all initial substrings of the longer matches.  For
       example, if the pattern

         <.*>

       is matched against the string

         This is <something> <something else> <something further> no more

       the three matched strings are

         <something>
         <something> <something else>
         <something> <something else> <something further>

       On  success,  the yield of the function is a number greater than zero, which is the number
       of matched substrings. The substrings themselves are returned in ovector. Each string uses
       two  elements;  the  first is the offset to the start, and the second is the offset to the
       end. In fact, all the strings have the same start offset. (Space could have been saved  by
       giving  this  only  once,  but  it  was  decided to retain some compatibility with the way
       pcre_exec() returns data, even though the meaning of the strings is different.)

       The strings are returned in decreasing order of length; that is, the longest matching
       string is given first. If there were too many matches to fit into ovector, the yield of
       the function is zero, and the vector is filled with the longest matches.
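
       Reading the results back out of ovector might look like the sketch below. The helper
       names dfa_match_count() and copy_dfa_match() are hypothetical, not PCRE functions. For a
       zero return, the sketch assumes that ovecsize / 2 pairs were filled in; treat that as an
       assumption to verify against your PCRE version.

```c
#include <assert.h>
#include <string.h>

/* Number of (start, end) pairs available in ovector after pcre_dfa_exec()
   returned rc. A zero return means there were too many matches to fit; this
   sketch assumes ovecsize / 2 pairs were filled in that case. */
int dfa_match_count(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;
    if (rc == 0)
        return ovecsize / 2;
    return 0;                       /* negative: no match or an error */
}

/* Copy match number `index` (0 is the longest) into buffer as a C string.
   Returns the length of the match, or -1 if buffer is too small. */
int copy_dfa_match(const char *subject, const int *ovector, int index,
                   char *buffer, int buffersize)
{
    int start = ovector[2 * index];             /* same for every match */
    int length = ovector[2 * index + 1] - start;

    assert(subject != NULL && buffer != NULL && length >= 0);

    if (length >= buffersize)
        return -1;
    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

       For the <.*> example above, the three offset pairs would be (8,56), (8,36), and (8,19),
       so index 0 yields the longest string and index 2 yields <something>.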

   Error returns from pcre_dfa_exec()

       The pcre_dfa_exec() function returns a negative number when it fails.  Many of the  errors
       are the same as for pcre_exec(), and these are described above.  There are in addition the
       following errors that are specific to pcre_dfa_exec():

         PCRE_ERROR_DFA_UITEM      (-16)

       This return is given if pcre_dfa_exec() encounters an item in the pattern that it does not
       support, for instance, the use of \C or a back reference.

         PCRE_ERROR_DFA_UCOND      (-17)

       This  return  is  given  if  pcre_dfa_exec()  encounters a condition item that uses a back
       reference for the condition, or a test for recursion in a specific group.  These  are  not
       supported.

         PCRE_ERROR_DFA_UMLIMIT    (-18)

       This  return  is  given  if  pcre_dfa_exec() is called with an extra block that contains a
       setting of the match_limit field. This is not supported (it is meaningless).

         PCRE_ERROR_DFA_WSSIZE     (-19)

       This return is given if pcre_dfa_exec() runs out of space in the workspace vector.

         PCRE_ERROR_DFA_RECURSE    (-20)

       When a recursive subpattern is processed, the matching function calls itself  recursively,
       using  private vectors for ovector and workspace. This error is given if the output vector
       is not large enough. This should be extremely rare, as a vector of size 1000 is used.
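
       For diagnostics, the error codes listed above can be turned into readable names. The
       following is a minimal sketch; dfa_error_name() is a hypothetical helper, and the numeric
       values are those given on this page. Code that includes pcre.h would normally use the
       symbolic constants in the case labels instead of the literal numbers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the name of a pcre_dfa_exec()-specific error
   code, or NULL if rc is not one of the five codes listed above. Intended to
   be called only with a negative (failure) return value. */
const char *dfa_error_name(int rc)
{
    assert(rc < 0);

    switch (rc) {
    case -16: return "PCRE_ERROR_DFA_UITEM";    /* unsupported pattern item */
    case -17: return "PCRE_ERROR_DFA_UCOND";    /* unsupported condition */
    case -18: return "PCRE_ERROR_DFA_UMLIMIT";  /* match_limit set in extra */
    case -19: return "PCRE_ERROR_DFA_WSSIZE";   /* workspace vector too small */
    case -20: return "PCRE_ERROR_DFA_RECURSE";  /* private output vector too small */
    default:  return NULL;                      /* shared with pcre_exec() or unknown */
    }
}
```

       A NULL result indicates one of the errors shared with pcre_exec(), or an unknown code,
       which the caller can report numerically.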

SEE ALSO


       pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepartial(3),
       pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).

AUTHOR


       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION


       Last updated: 21 November 2010
       Copyright (c) 1997-2010 University of Cambridge.

                                                                                       PCREAPI(3)