Provided by: libbobcat-dev_6.03.02-2_amd64

NAME

       FBB::Pattern - Performs RE pattern matching

SYNOPSIS

       #include <bobcat/pattern>
       Linking option: -lbobcat

DESCRIPTION

       Pattern  objects  may be used for Regular Expression (RE) pattern matching. The class is a
       wrapper around the regcomp(3) family of functions. By default it  uses  `extended  regular
       expressions’, requiring you to escape multipliers and bounding-characters when they should
       be interpreted as ordinary characters (i.e., *, +, ?, ^, $, |, (, ), [, ], {, } should  be
       escaped when used as literal characters).

       The Pattern class supports the use of the following (Perl-like) special escape sequences:
       \b - indicating a word-boundary
       \d - indicating a digit ([[:digit:]]) character
       \s - indicating a white-space ([[:space:]]) character
       \w - indicating a word ([[:alnum:]]) character

       The  corresponding  capitals  (e.g.,  \W)  define  the  complementary  character sets. The
       capitalized character set shorthands are not expanded  inside  explicit  character-classes
       (i.e., [ ... ] constructions). So [\W] represents a set of two characters: \ and W.

       As  the  backslash  (\)  is treated as a special character it should be handled carefully.
       Pattern converts the escape sequences \d \s \w (and outside of explicit character  classes
       the  sequences \D \S \W) to their respective character classes. All other escape sequences
       are kept as-is, and the resulting regular expression is offered to  the  pattern  matching
       compilation  function  regcomp(3). This function interprets escape sequences. Consequently
       some care should be exercised when defining patterns containing escape sequences. Here are
       the rules:

       o      Special escape sequences (like \d) are converted to character classes. E.g.,

                  ---------------------------------------------------------
                  Specify:    Converts to:    regcomp uses:      Matches:
                  ---------------------------------------------------------
                  \d          [[:digit:]]     [[:digit:]]        3
                  ---------------------------------------------------------

       o      Ordinary escape sequences (like \x) are kept as-is. E.g.,

                  ---------------------------------------------------------
                  Specify:    Converts to:    regcomp uses:      Matches:
                  ---------------------------------------------------------
                  \x          \x              x                  x
                  ---------------------------------------------------------

       o      To specify literal escape sequences, Raw String Literals are advised, as they don’t
              require doubling escape sequences. E.g., the following regular  expression  matches
              an (alpha-numeric) word, followed by optional blanks, a colon, more optional blanks
              and a (decimal) number:

                  R"((\w+)\s*:\s*\d+)"
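
       The conversion rules above can be illustrated with a minimal sketch. It is not
       Bobcat's implementation: expandShorthands() and matchesExpanded() are hypothetical
       helpers, only the lowercase sequences \d, \s and \w are handled, and explicit
       character classes ([ ... ]) are not tracked. All other escape sequences are passed
       unchanged to regcomp(3), as described above.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper (not part of Bobcat): expand \d, \s and \w to POSIX
// character classes and keep every other escape sequence as-is.
std::string expandShorthands(std::string const &pattern)
{
    std::string out;

    for (std::size_t idx = 0; idx != pattern.size(); ++idx)
    {
        char ch = pattern[idx];

        // ordinary character, or a backslash at the very end: copy it
        if (ch != '\\' || idx + 1 == pattern.size())
        {
            out += ch;
            continue;
        }

        char next = pattern[++idx];
        switch (next)
        {
            case 'd': out += "[[:digit:]]"; break;
            case 's': out += "[[:space:]]"; break;
            case 'w': out += "[[:alnum:]]"; break;
            default:                        // e.g. \x stays \x
                out += '\\';
                out += next;
            break;
        }
    }
    return out;
}

// Hypothetical helper: true if text contains a match of the expanded pattern.
bool matchesExpanded(std::string const &pattern, std::string const &text)
{
    regex_t compiled;
    std::string expanded = expandShorthands(pattern);

    if (regcomp(&compiled, expanded.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    bool found = regexec(&compiled, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&compiled);
    return found;
}
```

       With these helpers, expandShorthands(R"(\d+)") yields [[:digit:]]+, and
       matchesExpanded(R"((\w+)\s*:\s*\d+)", "width : 42") returns true.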

NAMESPACE

       FBB
       All constructors, members, operators and manipulators, mentioned  in  this  man-page,  are
       defined in the namespace FBB.

INHERITS FROM

       -

TYPEDEF

       o      Pattern::Position:
              A nested type representing the offsets of the first character and the offset beyond
              the last character of  the  matched  text  or  indexed  subexpression,  defined  as
              std::pair<std::string::size_type, std::string::size_type>.
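
       The half-open offset convention of Position can be sketched with plain regexec(3)
       calls. The names Position (a local alias with the same shape as Pattern::Position),
       findFirst() and slice() are hypothetical and only serve this illustration; the
       sketch does not claim to show how Bobcat computes its positions.

```cpp
#include <regex.h>
#include <string>
#include <utility>

// Local alias with the same shape as the Pattern::Position type described above
using Position = std::pair<std::string::size_type, std::string::size_type>;

// Hypothetical helper: offsets of the first match of an extended RE in text,
// or (npos, npos) if the RE cannot be compiled or does not match.
Position findFirst(std::string const &re, std::string const &text)
{
    Position none(std::string::npos, std::string::npos);

    regex_t compiled;
    if (regcomp(&compiled, re.c_str(), REG_EXTENDED) != 0)
        return none;

    regmatch_t whole;       // rm_so: first offset, rm_eo: offset beyond the last
    int rc = regexec(&compiled, text.c_str(), 1, &whole, 0);
    regfree(&compiled);

    if (rc != 0)
        return none;

    return Position(static_cast<std::string::size_type>(whole.rm_so),
                    static_cast<std::string::size_type>(whole.rm_eo));
}

// The text covered by the half-open range [pos.first, pos.second)
std::string slice(std::string const &text, Position const &pos)
{
    return text.substr(pos.first, pos.second - pos.first);
}
```

       For the text "abc 123 def" and the RE [0-9]+ this gives the pair (4, 7), and
       slice() returns "123". The text before the match is text.substr(0, pos.first),
       the text beyond it is text.substr(pos.second).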

CONSTRUCTORS

       o      Pattern():
              The default constructor defines no pattern, but is available as a placeholder for,
              e.g., containers requiring default constructors. A Pattern object thus constructed
              cannot be used to match patterns until it has received a pattern, either by being
              the lvalue in an assignment where another Pattern object is the rvalue, or through
              the member setPattern() (see below). An FBB::Exception object is thrown if the
              object could not be constructed.

       o      Pattern(std::string const &pattern, bool caseSensitive = true, size_t  nSub  =  10,
              int options = REG_EXTENDED | REG_NEWLINE):
              This  constructor  compiles  pattern,  preparing  the  Pattern  object  for pattern
              matches. The second parameter determines whether case sensitive  matching  will  be
              used  (the  default)  or not. Subexpressions are defined by parentheses pairs. Each
              matching pair defines a subexpression, where  the  order-number  of  their  opening
              parentheses   determines   the   subexpression’s  index.  By  default  at  most  10
              subexpressions are recognized.  The options flags may be:

              REG_EXTENDED:
              Use POSIX Extended Regular Expression syntax when interpreting pattern. If not set,
              POSIX Basic Regular Expression syntax is used.

              REG_NOSUB:
              Support  for  substring  addressing of matches is  not required.   The  nmatch  and
              pmatch  parameters to regexec are ignored  if  the  pattern  buffer   supplied  was
              compiled with this flag set.

              REG_NEWLINE:
              Match-any-character  operators  don’t  match a newline.

              A non-matching list ([^...])  not containing a newline does not match a newline.

              Match-beginning-of-line  operator  (^) matches the empty string immediately after a
              newline, regardless of whether eflags, the execution  flags  of  regexec,  contains
              REG_NOTBOL.

              Match-end-of-line  operator  ($)   matches  the  empty string  immediately before a
              newline, regardless of whether eflags contains REG_NOTEOL.
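
       The effect of these compilation flags can be tried directly with regcomp(3). The
       helper contains() below is hypothetical. REG_ICASE is a standard regcomp flag;
       that a false caseSensitive argument corresponds to it is an assumption of this
       sketch, not something this page states.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper: compile re using the given regcomp flags and report
// whether text contains a match. REG_NOSUB is added because only a yes/no
// answer is needed here.
bool contains(std::string const &re, std::string const &text, int cflags)
{
    regex_t compiled;

    if (regcomp(&compiled, re.c_str(), cflags | REG_NOSUB) != 0)
        return false;

    bool found = regexec(&compiled, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&compiled);
    return found;
}
```

       With REG_EXTENDED | REG_NEWLINE, contains("^b", "a\nb", ...) is true and
       contains("a.b", "a\nb", ...) is false; with REG_EXTENDED alone the results are
       reversed. Without REG_EXTENDED the + in a+ is an ordinary character, so the RE
       matches the text "a+" but not "aaa".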

       Copy and move constructors (and assignment operators) are available.

MEMBER FUNCTIONS

       Pattern does not inherit from other classes (see INHERITS FROM above); the following
       member functions are available.

       o      std::string before() const:
              Following a successful match, before() returns the text before the matched text.

       o      std::string beyond() const:
              Following a successful match, beyond() returns the text beyond the matched text.

       o      size_t end() const:
              Returns  the  number  of  matched  elements (text and subexpressions). end() is the
              lowest index value for which position() returns two std::string::npos  values  (see
              the position() member function, below).

       o      void match(std::string const &text, int options = 0):
              Match a string with a pattern. If the text could not be matched, an FBB::Exception
              exception is thrown, using Pattern::match() as its prefix-text.

              Options may be:

              REG_NOTBOL:
              The match-beginning-of-line operator always fails to match (but see the compilation
              flag REG_NEWLINE above). This flag may be used when different portions of a string
              are passed to regexec and the beginning of the string should not be interpreted as
              the beginning of the line.

              REG_NOTEOL:
              The match-end-of-line operator always fails to match (but see the compilation flag
              REG_NEWLINE above).
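
       The use of REG_NOTBOL for successive portions of a string can be sketched as
       follows. countMatches() is a hypothetical helper that calls regexec(3) directly;
       it illustrates the flag and is not taken from Bobcat.

```cpp
#include <regex.h>
#include <cstddef>
#include <string>

// Hypothetical helper: count the non-overlapping matches of an extended RE by
// passing successive portions of text to regexec. Every portion except the
// first is matched using REG_NOTBOL, as it does not start at the beginning of
// the line.
std::size_t countMatches(std::string const &re, std::string const &text)
{
    regex_t compiled;
    if (regcomp(&compiled, re.c_str(), REG_EXTENDED) != 0)
        return 0;

    std::size_t count = 0;
    std::size_t offset = 0;
    regmatch_t found;

    while (offset <= text.size())
    {
        int eflags = offset == 0 ? 0 : REG_NOTBOL;

        if (regexec(&compiled, text.c_str() + offset, 1, &found, eflags) != 0)
            break;

        ++count;

        // continue beyond the match; skip one character after an empty match
        // to guarantee progress
        std::size_t step = static_cast<std::size_t>(found.rm_eo);
        offset += found.rm_eo > found.rm_so ? step : step + 1;
    }

    regfree(&compiled);
    return count;
}
```

       For the text "ababab" the RE ab is counted three times, but ^ab only once: the
       second and third portions also start with ab, but REG_NOTBOL prevents ^ from
       matching there.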

       o      std::string matched() const:
              Following a successful match, this function returns the matched text.

       o      std::string const &pattern() const:
              This member function returns the pattern that is offered to regcomp(3). It  returns
              the  content  of  a  static  string  that  is overwritten at each construction of a
              Pattern object and at each call of the setPattern() member function.

       o      Pattern::Position position(size_t index) const:
              With index == 0 the position of the fully matched text (the text returned by
              matched()) is returned. Other index values return the positions of the
              corresponding subexpressions. The pair (std::string::npos, std::string::npos) is
              returned if index is at least end() (which may happen at index value 0).
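
       How index values relate to subexpressions can be sketched with the regmatch_t
       array filled by regexec(3). The helper positions() and the local alias Position
       are hypothetical. In this sketch nSub counts all slots, including index 0 for
       the whole match; whether Bobcat counts nSub the same way is an assumption.

```cpp
#include <regex.h>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Local alias with the same shape as Pattern::Position
using Position = std::pair<std::string::size_type, std::string::size_type>;

// Hypothetical helper: element 0 holds the offsets of the whole match, element
// n those of the subexpression whose opening parenthesis is the n-th one in
// re. Unused slots hold (npos, npos). An empty vector means: no match (or the
// RE could not be compiled).
std::vector<Position> positions(std::string const &re, std::string const &text,
                                std::size_t nSub = 10)
{
    std::vector<Position> result;

    regex_t compiled;
    if (nSub == 0 || regcomp(&compiled, re.c_str(), REG_EXTENDED) != 0)
        return result;

    std::vector<regmatch_t> found(nSub);

    if (regexec(&compiled, text.c_str(), nSub, found.data(), 0) == 0)
    {
        for (regmatch_t const &slot : found)
        {
            if (slot.rm_so == -1)           // regexec marks unused slots by -1
                result.emplace_back(std::string::npos, std::string::npos);
            else
                result.emplace_back(
                    static_cast<std::string::size_type>(slot.rm_so),
                    static_cast<std::string::size_type>(slot.rm_eo));
        }
    }

    regfree(&compiled);
    return result;
}
```

       Matching ([a-z]+)=([0-9]+) against "x: key=42;" gives (3, 9) at index 0, (3, 6)
       at index 1 and (7, 9) at index 2. The remaining slots hold two
       std::string::npos values.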

       o      void  setPattern(std::string const &pattern, bool caseSensitive = true, size_t nSub
              = 10, int options = REG_EXTENDED | REG_NEWLINE):
              This member function installs a new compiled pattern in its  Pattern  object.  This
              member’s  parameters are identical to the second constructor’s parameters. Refer to
              that constructor for  details  about  the  parameters.  Like  the  constructor,  an
              FBB::Exception exception is thrown if the new pattern could not be compiled.

       o      void swap(Pattern &other):
              The content of the current object and the other object are swapped.

OVERLOADED OPERATORS

       o      std::string operator[](size_t index) const:
              Returns  the  matched  text  (for index 0) or the text of a subexpression. An empty
              string is returned for index values which are at least end().

       o      Pattern &operator<<(int matchOptions):
              Defines match-options to be used with the following overloaded operator.

       o      bool operator<<(std::string const &text):
              Performs a match(text, matchOptions) call, catching any exception that might be
              thrown. If no matchOptions were set using the above overloaded operator, none are
              used. The options set this way are not `sticky’: when necessary, they have to be
              re-inserted before each new match. The function returns true if the match was
              successful, false otherwise.

EXAMPLE

       #include <exception>
       #include <iostream>
       #include <string>

       #include <bobcat/pattern>

       using namespace std;
       using namespace FBB;

       void showSubstr(string const &str)
       {
           static int count = 0;

           cout << "String " << ++count << " is '" << str << "'\n";
       }

       void match(Pattern const &patt, string const &text)
       try
       {
           Pattern pattern{ patt };

           pattern.match(text);

           Pattern p3(pattern);

           cout << "before:  " << p3.before() << "\n"
                   "matched: " << p3.matched() << "\n"
                   "beyond:  " << pattern.beyond() << "\n"
                   "end() = " << pattern.end() << '\n';

           for (size_t idx = 0; idx != pattern.end(); ++idx)
           {
               string str = pattern[idx];

               if (str.empty())
                   cout << "part " << idx << " not present\n";
               else
               {
                   Pattern::Position pos = pattern.position(idx);

                   cout << "part " << idx << ": '" << str << "' (" <<
                               pos.first << "-" << pos.second << ")\n";
               }
           }
       }
       catch (exception const &exc)
       {
           cout << exc.what() << '\n';
       }

       int main(int argc, char **argv)
       {
           string patStr = R"(\d+)";

           do
           {
               cout << "Pattern: '" << patStr << "'\n";
               try
               {
                       // by default: case sensitive
                       // use any args. for case insensitive
                   Pattern patt(patStr, argc == 1);

                   cout << "Compiled pattern: " << patt.pattern() << '\n';

                   while (true)
                   {
                       cout << "string to match : ";

                       string text;
                       getline(cin, text);
                       if (text.empty())
                           break;
                       cout << "String: '" << text << "'\n";
                       match(patt, text);
                   }
               }
               catch (exception const &exc)
               {
                   cout << exc.what() << ": compilation failed\n";
               }

               cout << "New pattern: ";
           }
           while (getline(cin, patStr) and not patStr.empty());
       }

FILES

       bobcat/pattern - defines the class interface

SEE ALSO

       bobcat(7), regcomp(3), regex(3), regex(7)

BUGS

       Using Pattern objects as static  data  members  of  classes  (or  as  global  objects)  is
       potentially  dangerous.  If the object files defining these static data members are stored
       in a dynamic library they may not be initialized properly or timely,  and  their  eventual
       destruction  may  result in a segmentation fault. This is a well-known problem with static
       data, see, e.g., http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.15. In situations
       like  this  prefer  the  use  of a (shared, unique) pointer to a Pattern, initializing the
       pointer when, e.g., first used.
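
       The advice above can be sketched as follows. Matcher is a hypothetical stand-in
       for a class compiling a pattern in its constructor (it is not FBB::Pattern), and
       digits() is a hypothetical accessor. Instead of defining the object at namespace
       scope, a pointer is initialized when the accessor is first called.

```cpp
#include <regex.h>
#include <memory>
#include <string>

// Hypothetical stand-in for an object that compiles a pattern when constructed
class Matcher
{
    regex_t d_regex;
    bool d_valid;

    public:
        explicit Matcher(std::string const &re)
        :
            d_valid(regcomp(&d_regex, re.c_str(),
                            REG_EXTENDED | REG_NOSUB) == 0)
        {}

        ~Matcher()
        {
            if (d_valid)
                regfree(&d_regex);
        }

        Matcher(Matcher const &other) = delete;
        Matcher &operator=(Matcher const &other) = delete;

        bool matches(std::string const &text) const
        {
            return d_valid &&
                   regexec(&d_regex, text.c_str(), 0, nullptr, 0) == 0;
        }
};

// Rather than a global `Matcher digitsMatcher{ "^[0-9]+$" };` the object is
// created when digits() is called for the first time. Initialization of a
// function-local static object is performed once (and is thread-safe since
// C++11).
Matcher &digits()
{
    static std::unique_ptr<Matcher> const ptr(new Matcher("^[0-9]+$"));
    return *ptr;
}
```

       Every call of digits() refers to the same object: digits().matches("12345")
       returns true, digits().matches("12a") returns false.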

BOBCAT PROJECT FILES

       o      https://fbb-git.gitlab.io/bobcat/: gitlab project page;

       o      bobcat_6.03.02-x.dsc: detached signature;

       o      bobcat_6.03.02-x.tar.gz: source archive;

       o      bobcat_6.03.02-x_i386.changes: change log;

       o      libbobcat1_6.03.02-x_*.deb: debian package containing the libraries;

       o      libbobcat1-dev_6.03.02-x_*.deb: debian package containing  the  libraries,  headers
              and manual pages;

BOBCAT

       Bobcat is an acronym of `Brokken’s Own Base Classes And Templates’.

COPYRIGHT

       This  is  free  software,  distributed  under  the terms of the GNU General Public License
       (GPL).

AUTHOR

       Frank B. Brokken (f.b.brokken@rug.nl).