Provided by: libpcre3_7.6-2.1ubuntu1_i386

NAME

       PCRE - Perl-compatible regular expressions

PCRE REGULAR EXPRESSION DETAILS


       The  syntax and semantics of the regular expressions that are supported
       by PCRE are described in  detail  below.  There  is  a  quick-reference
       syntax  summary  in the pcresyntax page. Perl’s regular expressions are
       described in its own documentation, and regular expressions in  general
       are  covered in a number of books, some of which have copious examples.
       Jeffrey  Friedl’s  "Mastering  Regular   Expressions",   published   by
       O’Reilly,  covers regular expressions in great detail. This description
       of PCRE’s regular expressions is intended as reference material.

       The original operation of PCRE was on strings of  one-byte  characters.
       However,  there is now also support for UTF-8 character strings. To use
       this, you must build PCRE to  include  UTF-8  support,  and  then  call
       pcre_compile()  with  the  PCRE_UTF8  option.  How this affects pattern
       matching is mentioned in several places below. There is also a  summary
       of  UTF-8  features  in  the  section on UTF-8 support in the main pcre
       page.

       The  remainder  of  this  document  discusses  the  patterns  that  are
       supported  by  PCRE  when  its  main matching function, pcre_exec(), is
       used.  From release  6.0,  PCRE  offers  a  second  matching  function,
       pcre_dfa_exec(),  which matches using a different algorithm that is not
       Perl-compatible. Some of the features discussed below are not available
       when  pcre_dfa_exec()  is used. The advantages and disadvantages of the
       alternative function, and how it differs from the normal function,  are
       discussed in the pcrematching page.

NEWLINE CONVENTIONS


       PCRE  supports five different conventions for indicating line breaks in
       strings:  a  single  CR  (carriage  return)  character,  a  single   LF
       (linefeed) character, the two-character sequence CRLF, any of the three
       preceding, or any  Unicode  newline  sequence.  The  pcreapi  page  has
       further  discussion  about  newlines,  and shows how to set the newline
       convention in the options arguments  for  the  compiling  and  matching
       functions.

       It  is  also  possible  to  specify  a newline convention by starting a
       pattern string with one of the following five sequences:

         (*CR)        carriage return
         (*LF)        linefeed
         (*CRLF)      carriage return, followed by linefeed
         (*ANYCRLF)   any of the three above
         (*ANY)       all Unicode newline sequences

       These override the default and the options given to pcre_compile(). For
       example, on a Unix system where LF is the default newline sequence, the
       pattern

         (*CR)a.b

       changes the convention to CR. That pattern matches "a\nb" because LF is
       no  longer  a  newline. Note that these special settings, which are not
       Perl-compatible, are recognized only at the very start  of  a  pattern,
       and  that  they  must  be  in  upper  case. If more than one of them is
       present, the last one is used.
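
       The prefix rules above can be sketched  in  Python.  This  is  only  an
       illustration:  Python's re module is not PCRE and does not recognize
       these  prefixes,  so  the  hypothetical  helper  split_newline_prefix()
       parses  them  by  hand,  and the dot of the CR convention is emulated
       with [^\r] for this one simple pattern.

```python
import re

# The five prefixes listed above; matching is case-sensitive on purpose.
_PREFIX = re.compile(r"\(\*(CR|LF|CRLF|ANYCRLF|ANY)\)")

def split_newline_prefix(pattern, default="LF"):
    """Hypothetical helper: return (convention, rest_of_pattern).

    Prefixes are consumed only at the very start of the pattern, and
    when several are present the last one wins.
    """
    convention, pos = default, 0
    while True:
        m = _PREFIX.match(pattern, pos)
        if m is None:
            return convention, pattern[pos:]
        convention, pos = m.group(1), m.end()

convention, body = split_newline_prefix("(*CR)a.b")

# Emulation valid only for this simple body: under the CR convention
# the dot excludes CR rather than LF.
cr_dot_body = body.replace(".", r"[^\r]")

print(convention, body)
print(re.fullmatch(body, "a\nb"))         # Python's own dot excludes LF
print(re.fullmatch(cr_dot_body, "a\nb"))  # emulated CR convention
```

       A lower-case prefix such as (*cr), or a prefix that is not at the start
       of the pattern, is left in place by this helper and treated as ordinary
       pattern text.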

       The newline convention does not affect  what  the  \R  escape  sequence
       matches.  By  default,  this  is any Unicode newline sequence, for Perl
       compatibility. However, this can be changed; see the description of  \R
       in  the  section  entitled  "Newline  sequences"  below. A change of \R
       setting can be combined with a change of newline convention.

CHARACTERS AND METACHARACTERS


       A regular expression is a pattern that is  matched  against  a  subject
       string  from  left  to right. Most characters stand for themselves in a
       pattern, and match the corresponding characters in the  subject.  As  a
       trivial example, the pattern

         The quick brown fox

       matches a portion of a subject string that is identical to itself. When
       caseless matching is specified (the PCRE_CASELESS option), letters  are
       matched  independently  of case. In UTF-8 mode, PCRE always understands
       the concept of case for characters whose values are less than  128,  so
       caseless  matching  is  always  possible.  For  characters  with higher
       values, the concept of case is  supported  if  PCRE  is  compiled  with
       Unicode  property  support,  but  not  otherwise.   If  you want to use
       caseless matching for characters 128 and above, you  must  ensure  that
       PCRE  is  compiled  with Unicode property support as well as with UTF-8
       support.
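
       As  a  rough  analogy  only,  the  same  distinction can be observed in
       Python's re module, which is not PCRE. Here re.IGNORECASE plays the role
       of  PCRE_CASELESS,  and  adding re.ASCII is assumed to approximate a
       build without Unicode property support, where case is  known  only  for
       characters below 128.

```python
import re

pattern = "the quick brown fox"
subject = "Watch The Quick Brown Fox jump"

# Case-sensitive by default: the capitalised subject does not match.
exact = re.search(pattern, subject)

# Caseless matching finds the portion of the subject that corresponds
# to the pattern, whatever its case.
caseless = re.search(pattern, subject, re.IGNORECASE)

# Characters 128 and above: U+00E9 is a lower-case e-acute and U+00C9
# is its upper-case form.
unicode_aware = re.search("\u00e9", "\u00c9", re.IGNORECASE)
ascii_only = re.search("\u00e9", "\u00c9", re.IGNORECASE | re.ASCII)

print(exact, caseless.group(0), unicode_aware is not None, ascii_only)
```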

       The power of regular expressions comes  from  the  ability  to  include
       alternatives  and  repetitions in the pattern. These are encoded in the
       pattern by the use of metacharacters, which do not stand for themselves
       but instead are interpreted in some special way.

       There  are  two  different  sets  of  metacharacters:  those  that  are
       recognized anywhere in the pattern except within square  brackets,  and
       those  that  are  recognized  within  square  brackets.  Outside square
       brackets, the metacharacters are as follows:

         \      general escape character with several uses
         ^      assert start of string (or line, in multiline mode)
         $      assert end of string (or line, in multiline mode)
         .      match any character except newline (by default)
         [      start character class definition
         |      start of alternative branch
         (      start subpattern
         )      end subpattern
         ?      extends the meaning of (
                also 0 or 1 quantifier
                also quantifier minimizer
         *      0 or more quantifier
         +      1 or more quantifier
                also "possessive quantifier"
         {      start min/max quantifier

       Part of a pattern that is in square brackets  is  called  a  "character
       class". In a character class the only metacharacters are:

         \      general escape character
         ^      negate the class, but only if the first character
         -      indicates character range
         [      POSIX character class (only if followed by POSIX
                  syntax)
         ]      terminates the character class

       The  following sections describe the use of each of the metacharacters.
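
       The  difference  between  the two sets of metacharacters can be seen in
       a short Python sketch. Python's re module is not PCRE, but it is assumed
       here to treat these particular constructs in the same way.

```python
import re

# Outside a class, '.' is a metacharacter that matches any character.
dot_outside = re.fullmatch(r"a.c", "abc")

# Inside a class, '.' stands for itself.
dot_inside_other = re.fullmatch(r"a[.]c", "abc")
dot_inside_literal = re.fullmatch(r"a[.]c", "a.c")

# '^' negates the class only when it is the first character.
negated = re.findall(r"[^ab]", "abc^")
caret_literal = re.findall(r"[a^]", "abc^")

# '-' between two characters indicates a range.
in_range = re.findall(r"[a-c]", "a-d")

print(negated, caret_literal, in_range)
```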

BACKSLASH


       The backslash character has several uses. Firstly, if it is followed by
       a  non-alphanumeric  character,  it takes away any special meaning that
       character may have. This  use  of  backslash  as  an  escape  character
       applies both inside and outside character classes.

       For  example,  if  you want to match a * character, you write \* in the
       pattern.  This escaping action applies whether  or  not  the  following
       character  would  otherwise be interpreted as a metacharacter, so it is
       always safe to precede a non-alphanumeric  with  backslash  to  specify
       that  it  stands  for  itself.  In  particular,  if you want to match a
       backslash, you write \\.
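
       A minimal Python sketch of this escaping rule follows.  Python's  re
       module  is  not PCRE; it is assumed here to behave the same way for a
       backslash followed by a non-alphanumeric character.

```python
import re

# An escaped '*' matches a literal asterisk.
star = re.search(r"\*", "2*3")

# Unescaped, '*' is a quantifier: "2*3" means zero or more '2' then '3'.
quantified = re.search(r"2*3", "2*3")

# Escaping a non-alphanumeric that has no special meaning is harmless.
percent = re.search(r"\%", "100%")

# Two backslashes in the pattern match one backslash in the subject.
backslash = re.search(r"\\", "C:\\temp")

print(star.group(0), quantified.group(0), percent.group(0), backslash.start())
```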

       If a pattern is compiled with the PCRE_EXTENDED option,  whitespace  in
       the  pattern (other than in a character class) and characters between a
       # outside a character class  and  the  next  newline  are  ignored.  An
       escaping  backslash  can be used to include a whitespace or # character
       as part of the pattern.
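
       The following Python sketch shows the same idea using re.VERBOSE, which
       is assumed here to be a close analogue of PCRE_EXTENDED. Python's re
       module is not PCRE, and the pattern itself is only an illustration.

```python
import re

pattern = r"""
    \d{3}     # three digits
    [ -]      # the space inside the character class is significant
    \d{4}     # four digits
    \ \#      # an escaped space and an escaped '#' are part of the pattern
    \d+       # one or more digits
"""

extended = re.compile(pattern, re.VERBOSE)

with_hyphen = extended.fullmatch("555-1234 #42")
with_space = extended.fullmatch("555 1234 #42")

# Without the extended option, the whitespace and comments are taken
# literally, so the same subject no longer matches.
not_extended = re.compile(pattern).fullmatch("555-1234 #42")

print(with_hyphen is not None, with_space is not None, not_extended)
```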

       If  you  want  to  remove  the  special  meaning  from  a  sequence  of
       characters,  you  can  do so by putting them between \Q and \E. This is
       different from Perl in that $ and @ are handled as literals in  \Q...\E
       sequences   in   PCRE,   whereas  in  Perl,  $  and  @  cause  variable
       interpolation. Note the following examples:

         Pattern            PCRE matches   Perl matches

         \Qabc$xyz\E        abc$xyz        abc followed by the
                                             contents of $xyz
         \Qabc\$xyz\E       abc\$xyz       abc\$xyz
         \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz

       The \Q...\E sequence is recognized both inside  and  outside  character
       classes.
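
       The  "PCRE  matches"  column  above  can  be checked with a small Python
       sketch. Python's re module is not PCRE and has no \Q...\E  syntax,  so
       the  hypothetical  helper  expand_quoted() rewrites each quoted span
       with re.escape() before compiling. It  is  an  approximation  of  the
       behaviour described here, not PCRE's implementation.

```python
import re

def expand_quoted(pattern):
    """Hypothetical helper: replace each \\Q...\\E span with escaped text.

    Simplification: a \\Q that is itself preceded by an escaping
    backslash is not treated specially.
    """
    return re.sub(r"\\Q(.*?)\\E",
                  lambda m: re.escape(m.group(1)),
                  pattern, flags=re.DOTALL)

# (pattern, subject expected to match) pairs taken from the table above.
rows = [
    (r"\Qabc$xyz\E", "abc$xyz"),
    (r"\Qabc\$xyz\E", r"abc\$xyz"),
    (r"\Qabc\E\$\Qxyz\E", "abc$xyz"),
]

results = [re.fullmatch(expand_quoted(p), s) is not None for p, s in rows]
print(results)
```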

   Non-printing characters

       A  second  use  of  backslash  provides  a way of encoding non-printing
       characters in patterns in a visible manner. There is no restriction  on
       the  appearance  of non-printing characters, apart from the binary zero
       that terminates a pattern, but when a pattern is being prepared by text
       editing,  it  is  usually  easier  to  use  one of the following escape
       sequences than the binary character it represents:

         \a        alarm, that is, the BEL character (hex 07)
         
