Provided by: libbobcat-dev_4.01.03-2ubuntu1_amd64

NAME

       FBB::Pattern - Performs RE pattern matching

SYNOPSIS

       #include <bobcat/pattern>
       Linking option: -lbobcat

DESCRIPTION

       Pattern  objects  may be used for Regular Expression (RE) pattern matching. The class is a
       wrapper around the regcomp(3) family of functions. By default it  uses  `extended  regular
       expressions’, requiring you to escape multipliers and bounding-characters when they should
       be interpreted as ordinary characters (i.e., *, +, ?, ^, $, |, (, ), [, ], {, } should  be
       escaped when used as literal characters).

       The Pattern class supports the use of the following (Perl-like) special escape sequences:
       \b - indicating a word-boundary
       \d - indicating a digit ([[:digit:]]) character
       \s - indicating a white-space ([[:space:]]) character
       \w - indicating a word ([[:alnum:]]) character

       The  corresponding  capitals  (e.g.,  \W)  define  the  complementary  character sets. The
       capitalized character set shorthands are not expanded  inside  explicit  character-classes
       (i.e., [ ... ] constructions). So [\W] represents a set of two characters: \ and W.

       As  the  backslash  (\)  is treated as a special character it should be handled carefully.
       Pattern converts the escape sequences \d \s \w (and outside of explicit character  classes
       the  sequences \D \S \W) to their respective character classes. All other escape sequences
       are kept as is, and the resulting regular expression is offered to  the  pattern  matching
       compilation  function  regcomp(3).  This  function  will again interpret escape sequences.
       Consequently some care should  be  exercised  when  defining  patterns  containing  escape
       sequences. Here are the rules:

       o      Special escape sequences (like \d) are converted to character classes. E.g.,

                  ---------------------------------------------------------
                  Specify:    Converts to:    regcomp uses:      Matches:
                  ---------------------------------------------------------
                  \d          [[:digit:]]     [[:digit:]]        3
                  ---------------------------------------------------------

       o      Ordinary escape sequences (like \x) are kept as-is. E.g.,

                  ---------------------------------------------------------
                  Specify:    Converts to:    regcomp uses:      Matches:
                  ---------------------------------------------------------
                  \x          \x              x                  x
                  ---------------------------------------------------------

       o      To specify a literal escape sequence, it must be written twice. E.g.,

                  ---------------------------------------------------------
                  Specify:    Converts to:    regcomp uses:      Matches:
                  ---------------------------------------------------------
                  \\x         \\x             \x                 \x
                  ---------------------------------------------------------
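
       The conversion rules above can be illustrated by a small stand-alone sketch. The  function
       expandEscapes()  is hypothetical and is not part of Bobcat; it only mimics the documented
       rewriting step that precedes regcomp(3). Two details are assumptions, since the text above
       does  not  spell  them  out:  capitals  are rewritten to negated sets ([^[:digit:]]), and
       inside an explicit character class \d is rewritten to the bare class  name  ([:digit:]).
       Bracket tracking is deliberately simplistic.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, NOT part of Bobcat: rewrites \d \s \w (and, outside of
// [ ... ] sets, \D \S \W) to POSIX character classes. All other escape
// sequences, including \b and \\, are copied unchanged.
std::string expandEscapes(std::string const &in)
{
    std::string out;
    bool inSet = false;                     // inside an explicit [ ... ] set?

    for (std::string::size_type idx = 0; idx != in.length(); ++idx)
    {
        char ch = in[idx];

        if (ch == '\\' && idx + 1 != in.length())
        {
            char next = in[++idx];          // the escaped character
            bool capital = std::isupper(static_cast<unsigned char>(next)) != 0;
            char const *name = 0;

            switch (std::tolower(static_cast<unsigned char>(next)))
            {
                case 'd': name = "[:digit:]"; break;
                case 's': name = "[:space:]"; break;
                case 'w': name = "[:alnum:]"; break;
            }

            if (name == 0 || (capital && inSet))
            {
                out += '\\';                // ordinary escape: kept as is
                out += next;
            }
            else if (inSet)
                out += name;                // assumption: bare class inside a set
            else
            {
                out += capital ? "[^" : "["; // assumption: capitals negate the set
                out += name;
                out += ']';
            }
            continue;
        }

        if (ch == '[')                      // simplistic: ignores []...] and [[:x:]]
            inSet = true;
        else if (ch == ']')
            inSet = false;

        out += ch;
    }
    return out;
}

// Checks the three table rows and the [\W] remark from the text above.
void checkDocumentedRows()
{
    assert(expandEscapes("\\d") == "[[:digit:]]");
    assert(expandEscapes("\\x") == "\\x");
    assert(expandEscapes("\\\\x") == "\\\\x");
    assert(expandEscapes("[\\W]") == "[\\W]");
}
```

       Note that the patterns in checkDocumentedRows() are C++ string literals, so each backslash
       that should reach the helper is itself written twice in the source code.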

NAMESPACE

       FBB
       All  constructors,  members,  operators  and manipulators, mentioned in this man-page, are
       defined in the namespace FBB.

INHERITS FROM

       -

TYPEDEF

       o      Pattern::Position:
              A nested type representing the offsets of the first character and the offset beyond
              the  last  character  of  the  matched  text  or  indexed subexpression, defined as
              std::pair<std::string::size_type, std::string::size_type>.
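
              As the second offset points beyond the last matched character, the length  of  the
              matched  text  is  second  -  first.  The  sketch below repeats the typedef so that
              it is self-contained; the helper textAt() is hypothetical and not part of Bobcat.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Same shape as Pattern::Position, repeated here to keep the sketch stand-alone.
typedef std::pair<std::string::size_type, std::string::size_type> Position;

// Hypothetical helper: returns the text covered by pos, or an empty string
// when pos holds two std::string::npos values (i.e., nothing was matched).
std::string textAt(std::string const &text, Position const &pos)
{
    if (pos.first == std::string::npos)
        return "";

    // first is the offset of the first character, second the offset beyond
    // the last character
    assert(pos.first <= pos.second && pos.second <= text.length());

    return text.substr(pos.first, pos.second - pos.first);
}
```

              For example, textAt("id 42-abc!", Position(3, 5)) selects the two characters at
              offsets 3 and 4.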

CONSTRUCTORS

       o      Pattern():
              The default constructor defines no pattern, but is available as a placeholder  for,
              e.g.,  containers requiring default constructors. A Pattern object thus constructed
              cannot be used to match patterns until it has received a pattern, either by being
              the lvalue in an assignment where another Pattern object is the rvalue, or by using
              the member setPattern() (see below). An FBB::Exception object is thrown if the
              object could not be constructed.

       o      Pattern(std::string  const  &pattern,  bool caseSensitive = true, size_t nSub = 10,
              int options = REG_EXTENDED | REG_NEWLINE):
              This constructor  compiles  pattern,  preparing  the  Pattern  object  for  pattern
              matches.  The  second  parameter determines whether case sensitive matching will be
              used (the default) or not. Subexpressions are defined by  parentheses  pairs.  Each
              matching  pair  defines  a  subexpression,  where the order-number of their opening
              parentheses  determines  the  subexpression’s  index.  By  default   at   most   10
              subexpressions are recognized.  The options flags may be:

              REG_EXTENDED:
              Use POSIX Extended Regular Expression syntax when interpreting  the  pattern.  If
              not set, POSIX Basic Regular Expression syntax is used.

              REG_NOSUB:
              Support for substring addressing of matches is  not required.    The   nmatch   and
              pmatch   parameters  to  regexec  are  ignored  if the pattern buffer  supplied was
              compiled with this flag set.

              REG_NEWLINE:
              Match-any-character operators don’t match a newline. A non-matching list  ([^...])
              not  containing  a  newline  does not match a newline. The match-beginning-of-line
              operator (^) matches the empty string immediately after a newline,  regardless  of
              whether  eflags,  the  execution  flags  of  regexec,  contains  REG_NOTBOL.  The
              match-end-of-line operator ($) matches the empty  string  immediately  before  a
              newline, regardless of whether eflags contains REG_NOTEOL.

       Pattern offers  copy and move constructors.
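
       The effect of the compilation flags can be checked directly against regcomp(3),  which  is
       what  the  constructor  eventually  calls. The sketch below does not use Pattern itself;
       compilesAndMatches() is a hypothetical helper. Using  REG_ICASE  for  case  insensitive
       matching  is  an  assumption  about  how  the  caseSensitive  parameter  is  mapped;  the
       description above does not state it.

```cpp
#include <cassert>
#include <regex.h>
#include <string>

// Hypothetical helper: compiles pattern using the compilation flags cflags and
// reports whether it matches anywhere in text. Returns false as well when the
// pattern cannot be compiled.
bool compilesAndMatches(std::string const &pattern, std::string const &text,
                        int cflags)
{
    regex_t re;

    if (regcomp(&re, pattern.c_str(), cflags) != 0)
        return false;

    // nmatch == 0: no subexpression offsets are requested
    bool found = regexec(&re, text.c_str(), 0, 0, 0) == 0;

    regfree(&re);
    return found;
}

void checkFlags()
{
    std::string const lines = "a\nb\nc";

    // REG_NEWLINE: ^ and $ also match just after and just before a newline
    assert(compilesAndMatches("^b$", lines, REG_EXTENDED | REG_NEWLINE));
    assert(!compilesAndMatches("^b$", lines, REG_EXTENDED));

    // REG_NEWLINE: the match-any-character operator does not match a newline
    assert(!compilesAndMatches("a.b", "a\nb", REG_EXTENDED | REG_NEWLINE));
    assert(compilesAndMatches("a.b", "a\nb", REG_EXTENDED));

    // without REG_EXTENDED the + is an ordinary character
    assert(compilesAndMatches("a+", "aaa", REG_EXTENDED));
    assert(!compilesAndMatches("a+", "aaa", 0));

    // assumption: caseSensitive == false corresponds to REG_ICASE
    assert(compilesAndMatches("NOOT", "noot", REG_EXTENDED | REG_ICASE));
}
```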

MEMBER FUNCTIONS

       The members listed below are Pattern’s own: as stated under INHERITS FROM,  Pattern  does
       not inherit from other classes.

       o      std::string before() const:
              Following a successful match, before() returns the text before the matched text.

       o      std::string beyond() const:
              Following a successful match, beyond() returns the text beyond the matched text.

       o      size_t end() const:
              Returns the number of matched elements (text  and  subexpressions).  end()  is  the
              lowest  index  value for which position() returns two std::string::npos values (see
              the position() member function, below).

       o      void match(std::string const &text, int options = 0):
              Matches text against the object’s pattern. If the text could  not  be  matched,  an
              FBB::Exception exception is thrown, using Pattern::match() as its prefix-text.

              Options may be:

              REG_NOTBOL:
              The match-beginning-of-line operator always fails to match (but see the compilation
              flag REG_NEWLINE above). This flag may be used when different portions of a  string
              are  passed to regexec and the beginning of the string should not be interpreted as
              the beginning of the line.

              REG_NOTEOL:
              The match-end-of-line operator always fails to match (but see the compilation  flag
              REG_NEWLINE above).

       o      std::string matched() const:
              Following a successful match, this function returns the matched text.

       o      std::string const &pattern() const:
              This  member function returns the pattern that is offered to regcomp(3). It returns
              the contents of a static string that is  overwritten  at  each  construction  of  a
              Pattern object and at each call of the setPattern() member function.

       o      Pattern::Position position(size_t index) const:
              With index == 0 the position of the fully matched text is returned (i.e., the
              offsets of the text returned by matched()). Other index values return the positions
              of the corresponding subexpressions. The pair (std::string::npos,
              std::string::npos) is returned if index is at least end() (which may happen at
              index value 0).

       o      void setPattern(std::string const &pattern, bool caseSensitive = true, size_t  nSub
              = 10, int options = REG_EXTENDED | REG_NEWLINE):
              This  member  function installs a new  compiled pattern in its Pattern object. This
              member’s parameters are identical to the second constructor’s parameters. Refer  to
              that  constructor  for  details  about  the  parameters.  Like  the constructor, an
              FBB::Exception exception is thrown if the new pattern could not be compiled.

       o      void swap(Pattern &other):
              The contents of the current object and the other object are swapped.
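
       The relation between before(), matched(), beyond(), position() and end() can  be  sketched
       in  terms  of  the  regmatch_t  offsets filled in by regexec(3). The struct MatchParts and
       the function split() are hypothetical and not part of Bobcat; the sketch only follows  the
       descriptions  given  above,  using  the  default  compilation  flags  and  the default of
       10 recognized subexpressions.

```cpp
#include <cassert>
#include <cstddef>
#include <regex.h>
#include <string>
#include <vector>

struct MatchParts                       // hypothetical, not part of Bobcat
{
    std::string before;                 // text before the matched text
    std::string matched;                // the matched text itself
    std::string beyond;                 // text beyond the matched text
    std::vector<std::string> parts;     // parts[0]: full match, then subexpressions
};

// Matches text against pattern. On success dest is filled and true is
// returned; dest.parts.size() then plays the role of end().
bool split(std::string const &pattern, std::string const &text,
           MatchParts &dest, std::size_t nSub = 10)
{
    assert(nSub > 0);                   // element 0 holds the full match

    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NEWLINE) != 0)
        return false;

    std::vector<regmatch_t> sub(nSub);
    bool found = regexec(&re, text.c_str(), sub.size(), &sub[0], 0) == 0;
    regfree(&re);

    if (!found)
        return false;

    // rm_so: offset of the first character, rm_eo: offset beyond the last one
    dest.before = text.substr(0, sub[0].rm_so);
    dest.matched = text.substr(sub[0].rm_so, sub[0].rm_eo - sub[0].rm_so);
    dest.beyond = text.substr(sub[0].rm_eo);

    // stop at the lowest index not taking part in the match (rm_so == -1)
    dest.parts.clear();
    for (std::size_t idx = 0; idx != sub.size() && sub[idx].rm_so != -1; ++idx)
        dest.parts.push_back(
            text.substr(sub[idx].rm_so, sub[idx].rm_eo - sub[idx].rm_so));

    return true;
}
```

       Matching "([0-9]+)-([a-z]+)" against "id 42-abc!" in this way yields "id " as  the  text
       before  the  match,  "42-abc"  as  the  matched text, "!" as the text beyond it, and three
       parts: the full match and the two subexpressions "42" and "abc".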

OVERLOADED OPERATORS

       o      Pattern &operator=(Pattern &other):
              A standard overloaded assignment operator.

       o      std::string operator[](size_t index) const:
              Returns the matched text (for index 0) or the text of  a  subexpression.  An  empty
              string is returned for index values which are at least end().

       o      Pattern &operator<<(int matchOptions):
              Defines match-options to be used with the following overloaded operator.

       o      bool operator<<(std::string const &text):
              Performs  a  match(text,  matchOptions)  call, catching any exception that might be
              thrown. If no matchOptions were set using the above overloaded operator,  none  are
              used.  The  options  set this way are not `sticky’: when necessary, they have to be
              re-inserted before each new pattern matching. The  function  returns  true  if  the
              matching was successful, false otherwise.
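
       The interplay of the two insertion operators  can  be  sketched  with  a  small  stand-in
       class. MiniPattern is hypothetical: it is not FBB::Pattern, it throws std::runtime_error
       instead of FBB::Exception, and it only follows the behavior  described  above  (match()
       throws  on  failure,  the  string insertion operator catches the exception, and the match
       options are reset after each match).

```cpp
#include <cassert>
#include <exception>
#include <regex.h>
#include <stdexcept>
#include <string>

class MiniPattern                       // hypothetical stand-in for FBB::Pattern
{
    regex_t d_regex;
    int d_matchOptions;

    public:
        explicit MiniPattern(std::string const &pattern)
        :
            d_matchOptions(0)
        {
            if (regcomp(&d_regex, pattern.c_str(),
                        REG_EXTENDED | REG_NEWLINE) != 0)
                throw std::runtime_error("MiniPattern: compilation failed");
        }

        MiniPattern(MiniPattern const &) = delete;      // keeps the sketch short
        MiniPattern &operator=(MiniPattern const &) = delete;

        ~MiniPattern()
        {
            regfree(&d_regex);
        }

        // throws if text cannot be matched
        void match(std::string const &text, int options = 0) const
        {
            if (regexec(&d_regex, text.c_str(), 0, 0, options) != 0)
                throw std::runtime_error("MiniPattern::match(): no match");
        }

        // stores the options for the next string insertion only
        MiniPattern &operator<<(int matchOptions)
        {
            d_matchOptions = matchOptions;
            return *this;
        }

        // matches text, converting a thrown exception into false
        bool operator<<(std::string const &text)
        {
            int options = d_matchOptions;
            d_matchOptions = 0;                         // options are not sticky

            try
            {
                match(text, options);
                return true;
            }
            catch (std::exception const &)
            {
                return false;
            }
        }
};

void checkInsertion()
{
    MiniPattern pattern("^b");

    assert(pattern << "b");                     // no options: ^ matches
    assert(!(pattern << REG_NOTBOL << "b"));    // ^ fails at the string's start
    assert(pattern << "b");                     // REG_NOTBOL was not retained
}
```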

EXAMPLE

       /*
                                     driver.cc
       */

       #include <exception>
       #include <iostream>
       #include <string>

       #include <bobcat/pattern>

       using namespace std;
       using namespace FBB;

       void showSubstr(string const &str)
       {
           static int
               count = 1;

           cout << "String " << count++ << " is '" << str << "'\n";
       }

       int main(int argc, char **argv)
       {
           {
               Pattern one("one");
               Pattern two(one);
               Pattern three("a");
               Pattern four;
               three = two;
           }

           try
           {
               Pattern pattern("aap|noot|mies");

               {
                   Pattern extra{Pattern(pattern)};    // braces: parentheses would
                                                       // declare a function here
               }

               if (pattern << "noot")
                   cout << "noot matches\n";
               else
                   cout << "noot doesn't match\n";
           }
           catch (exception const &e)
           {
               cout << e.what() << ": compilation failed" << endl;
           }

           string pat = "\\d+";

           while (true)
           {
               cout << "Pattern: '" << pat << "'\n";

               try
               {
                   Pattern patt(pat, argc == 1);   // case sensitive by default,
                                                   // any arg for case insensitive

                   cout << "Compiled pattern: " << patt.pattern() << endl;

                   Pattern pattern;
                   pattern = patt;                 // assignment operator

                   while (true)
                   {
                       cout << "string to match : ";

                       string st;
                       getline(cin, st);
                       if (st == "")
                           break;
                       cout << "String: '" << st << "'\n";
                       try
                       {
                           pattern.match(st);

                           Pattern p3(pattern);

                           cout << "before:  " << p3.before() << "\n"
                                   "matched: " << p3.matched() << "\n"
                                   "beyond:  " << pattern.beyond() << "\n"
                                   "end() = " << pattern.end() << endl;

                           for (size_t idx = 0; idx < pattern.end(); ++idx)
                           {
                               string str = pattern[idx];

                               if (str == "")
                                   cout << "part " << idx << " not present\n";
                               else
                               {
                                   Pattern::Position pos = pattern.position(idx);

                                   cout << "part " << idx << ": '" << str << "' (" <<
                                           pos.first << "-" << pos.second << ")\n";
                               }
                           }
                       }
                       catch (exception const &e)
                       {
                           cout << e.what() << ": " << st << " doesn't match" << endl;
                           continue;
                       }
                   }
               }
               catch (exception const &e)
               {
                   cout << e.what() << ": compilation failed" << endl;
               }

               cout << "New pattern: ";

               if (!getline(cin, pat) || !pat.length())
                   return 0;
           }
       }

FILES

       bobcat/pattern - defines the class interface

SEE ALSO

       bobcat(7), regcomp(3), regex(3), regex(7)

BUGS

       Using  Pattern  objects  as  static  data  members  of  classes  (or as global objects) is
       potentially dangerous. If the object files defining these static data members  are  stored
       in  a  dynamic  library they may not be initialized properly or timely, and their eventual
       destruction may result in a segmentation fault. This is a well-known problem  with  static
       data, see, e.g., http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.15. In situations
       like this prefer the use of a (shared, unique)  pointer  to  a  Pattern,  initializing  the
       pointer when, e.g., first used.
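
       A minimal sketch of this advice is shown below.  To  keep  it  self-contained  the  struct
       Matcher  is  a  hypothetical  stand-in  for  FBB::Pattern,  and  the class Parser and its
       member names are illustrative only. The sketch is not thread-safe.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Matcher                          // hypothetical stand-in for FBB::Pattern
{
    std::string d_pattern;

    explicit Matcher(std::string const &pattern)
    :
        d_pattern(pattern)
    {}
};

class Parser                            // illustrative class owning the pointer
{
    // Only the (initially empty) pointer is a static data member; the
    // object it points to is not constructed during static initialization.
    static std::unique_ptr<Matcher> s_matcher;

    public:
        // Constructs the Matcher at its first use, then returns the same
        // object at each subsequent call.
        static Matcher &matcher()
        {
            if (!s_matcher)
                s_matcher.reset(new Matcher("\\d+"));

            return *s_matcher;
        }
};

std::unique_ptr<Matcher> Parser::s_matcher;

void checkLazyInit()
{
    Matcher &first = Parser::matcher();
    Matcher &second = Parser::matcher();

    assert(&first == &second);          // constructed only once
}
```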

DISTRIBUTION FILES

       o      bobcat_4.01.03-x.dsc: detached signature;

       o      bobcat_4.01.03-x.tar.gz: source archive;

       o      bobcat_4.01.03-x_i386.changes: change log;

       o      libbobcat1_4.01.03-x_*.deb: debian package holding the libraries;

       o      libbobcat1-dev_4.01.03-x_*.deb:  debian  package holding the libraries, headers and
              manual pages;

       o      http://sourceforge.net/projects/bobcat: public archive location;

BOBCAT

       Bobcat is an acronym of `Brokken’s Own Base Classes And Templates’.

COPYRIGHT

       This is free software, distributed under the terms  of  the  GNU  General  Public  License
       (GPL).

AUTHOR

       Frank B. Brokken (f.b.brokken@rug.nl).