Provided by: libpcre3-dev_6.4-1.1ubuntu4_i386 bug


       PCRE - Perl-compatible regular expressions


       The  PCRE  library  is  a  set  of  functions  that  implement  regular
       expression pattern matching using the  same  syntax  and  semantics  as
       Perl,  with  just a few differences. The current implementation of PCRE
       (release  6.x)  corresponds  approximately  with  Perl  5.8,  including
       support   for  UTF-8  encoded  strings  and  Unicode  general  category
       properties. However, this support has to be explicitly enabled;  it  is
       not the default.

       In  addition  to  the  Perl-compatible  matching  function,  PCRE  also
       contains  an  alternative  matching  function  that  matches  the  same
       compiled  patterns  in  a  different way. In certain circumstances, the
       alternative function has some advantages. For a discussion of  the  two
       matching algorithms, see the pcrematching page.

       PCRE  is  written  in C and released as a C library. A number of people
       have written wrappers and interfaces of various kinds.  In  particular,
       Google  Inc.   have  provided  a comprehensive C++ wrapper. This is now
       included as part of the PCRE distribution. The pcrecpp page has details
       of  this  interface.  Other  people’s contributions can be found in the
       Contrib directory at the primary FTP site, which is:

       Details of exactly which Perl regular expression features are  and  are
       not  supported  by  PCRE  are  given  in  separate  documents.  See the
       pcrepattern and pcrecompat pages.

       Some features of PCRE can be included, excluded, or  changed  when  the
       library  is  built.  The pcre_config() function makes it possible for a
       client  to  discover  which  features  are  available.   The   features
       themselves  are  described  in  the pcrebuild page. Documentation about
       building PCRE for various operating systems can be found in the  README
       file in the source distribution.

       The  library  contains  a number of undocumented internal functions and
       data tables that are used by more than one  of  the  exported  external
       functions,  but  which  are  not  intended for use by external callers.
       Their names all begin with "_pcre_", which hopefully will  not  provoke
       any name clashes. In some environments, it is possible to control which
       external symbols are exported when a shared library is  built,  and  in
       these cases the undocumented symbols are not exported.


       The  user  documentation  for  PCRE  comprises  a  number  of different
       sections. In the "man" format, each of these is a separate "man  page".
       In  the  HTML  format,  each  is a separate page, linked from the index
       page. In the plain text format, all the sections are concatenated,  for
       ease of searching. The sections are as follows:

         pcre              this document
         pcreapi           details of PCRE’s native C API
         pcrebuild         options for building PCRE
         pcrecallout       details of the callout feature
         pcrecompat        discussion of Perl compatibility
         pcrecpp           details of the C++ wrapper
         pcregrep          description of the pcregrep command
         pcrematching      discussion of the two matching algorithms
         pcrepartial       details of the partial matching facility
         pcrepattern       syntax and semantics of supported
                             regular expressions
         pcreperform       discussion of performance issues
         pcreposix         the POSIX-compatible C API
         pcreprecompile    details of saving and re-using precompiled patterns
         pcresample        discussion of the sample program
         pcretest          description of the pcretest testing command

       In addition, in the "man" and HTML formats, there is a short  page  for
       each C library function, listing its arguments and results.


       There  are some size limitations in PCRE but it is hoped that they will
       never in practice be relevant.

       The maximum length of a compiled pattern is 65539 (sic) bytes  if  PCRE
       is compiled with the default internal linkage size of 2. If you want to
       process regular expressions that are truly enormous,  you  can  compile
       PCRE  with  an  internal linkage size of 3 or 4 (see the README file in
       the source distribution and the pcrebuild documentation  for  details).
       In  these  cases the limit is substantially larger.  However, the speed
       of execution will be slower.

       All values in repeating quantifiers  must  be  less  than  65536.   The
       maximum number of capturing subpatterns is 65535.

       There  is  no limit to the number of non-capturing subpatterns, but the
       maximum depth of nesting of  all  kinds  of  parenthesized  subpattern,
       including   capturing  subpatterns,  assertions,  and  other  types  of
       subpattern, is 200.

       The maximum length of a subject string is the largest  positive  number
       that  an integer variable can hold. However, when using the traditional
       matching function,  PCRE  uses  recursion  to  handle  subpatterns  and
       indefinite  repetition.   This means that the available stack space may
       limit the size of a subject string that can  be  processed  by  certain


       From  release  3.3,  PCRE  has  had  some support for character strings
       encoded in the UTF-8 format. For release 4.0 this was greatly  extended
       to  cover  most  common  requirements,  and  in  release 5.0 additional
       support for Unicode general category properties was added.

       In order process UTF-8 strings, you must build PCRE  to  include  UTF-8
       support  in  the  code,  and, in addition, you must call pcre_compile()
       with the PCRE_UTF8 option flag. When you do this, both the pattern  and
       any  subject  strings  that are matched against it are treated as UTF-8
       strings instead of just strings of bytes.

       If you compile PCRE with UTF-8 support, but do not use it at run  time,
       the  library will be a bit bigger, but the additional run time overhead
       is limited to testing the PCRE_UTF8 flag in several places,  so  should
       not be very large.

       If PCRE is built with Unicode character property support (which implies
       UTF-8 support),  the  escape  sequences  \p{..},  \P{..},  and  \X  are
       supported.   The available properties that can be tested are limited to
       the general category properties such as Lu for an upper case letter  or
       Nd  for  a  decimal  number.  A  full  list is given in the pcrepattern
       documentation. The PCRE library is increased in size by about 90K  when
       Unicode property support is included.

       The following comments apply when PCRE is running in UTF-8 mode:

       1.  When you set the PCRE_UTF8 flag, the strings passed as patterns and
       subjects are checked for validity on entry to the  relevant  functions.
       If an invalid UTF-8 string is passed, an error return is given. In some
       situations, you may already know  that  your  strings  are  valid,  and
       therefore want to skip these checks in order to improve performance. If
       you set the PCRE_NO_UTF8_CHECK flag at compile time  or  at  run  time,
       PCRE  assumes  that  the  pattern or subject it is given (respectively)
       contains only valid UTF-8 codes. In this case, it does not diagnose  an
       invalid  UTF-8 string. If you pass an invalid UTF-8 string to PCRE when
       PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program  may

       2. In a pattern, the escape sequence \x{...}, where the contents of the
       braces is a string of hexadecimal digits, is  interpreted  as  a  UTF-8
       character  whose  code  number  is  the  given  hexadecimal number, for
       example: \x{1234}. If  a  non-hexadecimal  digit  appears  between  the
       braces,  the  item is not recognized.  This escape sequence can be used
       either as a literal, or within a character class.

       3. The original hexadecimal escape sequence, \xhh, matches  a  two-byte
       UTF-8 character if the value is greater than 127.

       4.  Repeat  quantifiers  apply  to  complete  UTF-8  characters, not to
       individual bytes, for example: \x{100}{3}.

       5. The dot metacharacter matches  one  UTF-8  character  instead  of  a
       single byte.

       6.  The  escape sequence \C can be used to match a single byte in UTF-8
       mode, but its use can lead to some strange effects.  This  facility  is
       not available in the alternative matching function, pcre_dfa_exec().

       7.  The  character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
       test characters of  any  code  value,  but  the  characters  that  PCRE
       recognizes as digits, spaces, or word characters remain the same set as
       before, all with values less than 256. This remains true even when PCRE
       includes  Unicode  property support, because to do otherwise would slow
       down PCRE in many common cases. If you really want to test for a  wider
       sense  of,  say,  "digit",  you must use Unicode property tests such as

       8. Similarly, characters that match the POSIX named  character  classes
       are all low-valued characters.

       9.  Case-insensitive  matching  applies only to characters whose values
       are less than 128, unless PCRE is built with Unicode property  support.
       Even  when  Unicode  property support is available, PCRE still uses its
       own character tables when checking the case of  low-valued  characters,
       so  as not to degrade performance.  The Unicode property information is
       used only for characters with higher values.


       Philip Hazel
       University Computing Service,
       Cambridge CB2 3QG, England.

       Putting an actual email address here seems to have been a spam  magnet,
       so  I’ve  taken  it  away.  If you want to email me, use my initial and
       surname, separated by a dot, at the domain