Provided by: inn2_2.6.3-3_amd64

NAME

       uwildmat, uwildmat_simple, uwildmat_poison - Perform wildmat matching

SYNOPSIS

           #include <inn/libinn.h>

           bool uwildmat(const char *text, const char *pattern);

           bool uwildmat_simple(const char *text, const char *pattern);

           enum uwildmat uwildmat_poison(const char *text, const char *pattern);

DESCRIPTION

       uwildmat compares text against the wildmat expression pattern, returning true if and only
       if the expression matches the text.  "@" has no special meaning in pattern when passed to
       uwildmat.  Both text and pattern are assumed to be in the UTF-8 character encoding,
       although malformed UTF-8 sequences are treated in a way that attempts to be mostly
       compatible with single-octet character sets like ISO 8859-1.  (In other words, if you try
       to match ISO 8859-1 text with these routines everything should work as expected unless the
       ISO 8859-1 text contains valid UTF-8 sequences, which thankfully is somewhat rare.)
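
       The fallback described above can be pictured as a length check driven by the lead
       byte.  The following is a minimal sketch and not INN's actual code: sketch_utf8_len
       is a hypothetical helper that reports how many bytes one character occupies and falls
       back to a single byte when the sequence is malformed.  It assumes a NUL-terminated
       string and does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libinn): return the number of bytes
 * occupied by the character starting at s.  A well-formed UTF-8 sequence
 * yields 2, 3, or 4; anything else, including plain ASCII and bytes from a
 * single-octet character set, is treated as one byte.
 */
size_t
sketch_utf8_len(const unsigned char *s)
{
    size_t want, i;

    assert(s != NULL);
    if (s[0] < 0x80)
        return 1;               /* ASCII */
    if ((s[0] & 0xE0) == 0xC0)
        want = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        want = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        want = 4;
    else
        return 1;               /* stray continuation or invalid lead byte */

    /* Each following byte must look like 10xxxxxx.  A NUL terminator
       fails this test, so the loop never reads past the end of the string. */
    for (i = 1; i < want; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 1;
    return want;
}
```

       Under these assumptions, the ISO 8859-1 byte 0xE9 followed by an ASCII letter counts
       as a one-byte character, while the UTF-8 sequence 0xC3 0xA9 counts as one two-byte
       character.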

       uwildmat_simple is identical to uwildmat except that neither "!" nor "," has any special
       meaning and pattern is always treated as a single pattern.  This function exists solely to
       support legacy interfaces like NNTP's XPAT command, and should be avoided when
       implementing new features.

       uwildmat_poison works similarly to uwildmat, except that "@" as the first character of one
       of the patterns in the expression (see below) "poisons" the match if it matches.
       uwildmat_poison returns UWILDMAT_MATCH if the expression matches the text, UWILDMAT_FAIL
       if it doesn't, and UWILDMAT_POISON if the expression doesn't match because a poisoned
       pattern matched the text.  These enumeration constants are defined in the inn/libinn.h
       header.

WILDMAT EXPRESSIONS

       A wildmat expression follows rules similar to those of shell filename wildcards but with
       some additions and changes.  A wildmat expression is composed of one or more wildmat
       patterns separated by commas.  Each character in the wildmat pattern matches a literal
       occurrence of that same character in the text, with the exception of the following
       metacharacters:

       ?       Matches any single character (including a single UTF-8 multibyte character, so "?"
               can match more than one byte).

       *       Matches any sequence of zero or more characters.

       \       Turns off any special meaning of the following character; the following character
               will match itself in the text.  "\" will escape any character, including another
               backslash or a comma that otherwise would separate a pattern from the next pattern
               in an expression.  Note that "\" is not special inside a character set (no
               metacharacters are).

       [...]   A character set, which matches any single character that falls within that set.
               The presence of a character between the brackets adds that character to the set;
               for example, "[amv]" specifies the set containing the characters "a", "m", and
               "v".  A range of characters may be specified using "-"; for example, "[0-5abc]" is
               equivalent to "[012345abc]".  The order of characters is as defined in the UTF-8
               character set; if the start character of a range falls after its ending
               character in that ordering, the results of attempting a match with that pattern
               are undefined.

               In order to include a literal "]" character in the set, it must be the first
               character of the set (possibly following "^"); for example, "[]a]" matches either
               "]" or "a".  To include a literal "-" character in the set, it must be either the
               first or the last character of the set.  Backslashes have no special meaning
               inside a character set, nor do any other of the wildmat metacharacters.

       [^...]  A negated character set.  Follows the same rules as a character set above, but
               matches any character not contained in the set.  So, for example, "[^]-]" matches
               any character except "]" and "-".

       In addition, "!" (and possibly "@") have special meaning as the first character of a
       pattern; see below.

       When matching a wildmat expression against some text, each comma-separated pattern is
       matched in order from left to right.  In order to match, the pattern must match the whole
       text; in regular expression terminology, it's implicitly anchored at both the beginning
       and the end.  For example, the pattern "a" matches only the text "a"; it doesn't match
       "ab" or "ba" or even "aa".  If none of the patterns match, the whole expression doesn't
       match.  Otherwise, whether the expression matches is determined entirely by the rightmost
       matching pattern; the expression matches the text if and only if the rightmost matching
       pattern is not negated.

       For example, consider the text "news.misc".  The expression "*" matches this text, of
       course, as does "comp.*,news.*" (because the second pattern matches).  "news.*,!news.misc"
       does not match this text because both patterns match, meaning that the rightmost takes
       precedence, and the rightmost matching pattern is negated.  "news.*,!news.misc,*.misc"
       does match this text, since the rightmost matching pattern is not negated.
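
       The "rightmost matching pattern wins" rule can be sketched in a few lines of C.  The
       code below is an illustration only and is not how libinn implements uwildmat: the
       names sketch_match_one and sketch_wildmat are hypothetical, the matcher is
       byte-oriented, and it understands only "?", "*", "\", "," and a leading "!" (no
       character sets, no UTF-8, no "@").  Its handling of "*" is a naive recursion that
       can be slow on patterns with many stars.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: match all of text against one pattern of plen bytes. */
static bool
sketch_match_one(const char *text, const char *pat, size_t plen)
{
    while (plen > 0) {
        if (*pat == '*') {
            /* Let the star absorb zero or more bytes of text. */
            for (;; text++) {
                if (sketch_match_one(text, pat + 1, plen - 1))
                    return true;
                if (*text == '\0')
                    return false;
            }
        }
        if (*text == '\0')
            return false;
        if (*pat == '\\' && plen > 1) {
            /* Escaped byte: must match literally. */
            pat++;
            plen--;
            if (*pat != *text)
                return false;
        } else if (*pat != '?' && *pat != *text) {
            return false;
        }
        pat++;
        plen--;
        text++;
    }
    /* Implicitly anchored: the pattern must consume the whole text. */
    return *text == '\0';
}

/* Hypothetical stand-in for uwildmat, restricted as described above. */
bool
sketch_wildmat(const char *text, const char *expr)
{
    const char *start = expr;
    bool matched = false;

    assert(text != NULL && expr != NULL);
    for (;;) {
        /* Find the end of this pattern: an unescaped comma or the NUL. */
        const char *end = start;
        while (*end != '\0' && *end != ',') {
            if (*end == '\\' && end[1] != '\0')
                end++;
            end++;
        }

        bool negated = (*start == '!');
        const char *pat = negated ? start + 1 : start;

        /* A later matching pattern overrides any earlier result. */
        if (sketch_match_one(text, pat, (size_t) (end - pat)))
            matched = !negated;

        if (*end == '\0')
            break;
        start = end + 1;
    }
    return matched;
}
```

       Called with the text "news.misc", this sketch returns true for "comp.*,news.*" and
       for "news.*,!news.misc,*.misc", and false for "news.*,!news.misc".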

       Note that the expression "!news.misc" can't match anything.  Either the pattern doesn't
       match, in which case no patterns match and the expression doesn't match, or the pattern
       does match, in which case the expression doesn't match because the pattern is negated.
       "*,!news.misc", on the other hand, is a useful expression that matches anything except
       "news.misc".

       "!" has significance only as the first character of a pattern; anywhere else in the
       pattern, it matches a literal "!" in the text like any other non-metacharacter.

       If the uwildmat_poison interface is used, then "@" behaves the same as "!" except that if
       an expression fails to match because the rightmost matching pattern began with "@",
       UWILDMAT_POISON is returned instead of UWILDMAT_FAIL.
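
       One way to read the poison rule is as a fold over the per-pattern results, where the
       last pattern that matched decides the outcome.  The sketch below encodes only that
       reading of the paragraph above; it is not taken from libinn.  The enum sketch_result,
       the struct sketch_pattern_result, and sketch_combine are hypothetical names, and the
       per-pattern matching itself is assumed to have been done elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for UWILDMAT_FAIL, UWILDMAT_MATCH, UWILDMAT_POISON. */
enum sketch_result { SKETCH_FAIL, SKETCH_MATCH, SKETCH_POISON };

/* Outcome of testing one comma-separated pattern against the text. */
struct sketch_pattern_result {
    char prefix;                /* '!', '@', or '\0' for no prefix */
    bool matched;               /* did the rest of the pattern match? */
};

/*
 * Combine per-pattern results from left to right.  Patterns that did not
 * match are ignored; the rightmost matching pattern determines the result.
 */
enum sketch_result
sketch_combine(const struct sketch_pattern_result *r, size_t n)
{
    enum sketch_result result = SKETCH_FAIL;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!r[i].matched)
            continue;
        if (r[i].prefix == '@')
            result = SKETCH_POISON;
        else if (r[i].prefix == '!')
            result = SKETCH_FAIL;
        else
            result = SKETCH_MATCH;
    }
    return result;
}
```

       Under this reading, a plain match followed by a matching "@" pattern yields the
       poison result, whereas a matching "@" pattern followed by a matching plain pattern
       yields an ordinary match.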

       If the uwildmat_simple interface is used, the matching rules are the same as above except
       that none of "!", "@", or "," have any special meaning at all and only match those literal
       characters.

BUGS

       All of these functions internally convert the passed arguments to const unsigned char
       pointers.  The only reason why they take regular char pointers instead of unsigned char is
       for the convenience of INN and other callers that may not be using unsigned char
       everywhere they should.  In a future revision, the public interface should be changed to
       just take unsigned char pointers.

HISTORY

       Written by Rich $alz <rsalz@uunet.uu.net> in 1986, and posted to Usenet several times
       since then, most notably in comp.sources.misc in March, 1991.

       Lars Mathiesen <thorinn@diku.dk> enhanced the multi-asterisk failure mode in early 1991.

       Rich and Lars increased the efficiency of star patterns and reposted it to
       comp.sources.misc in April, 1991.

       Robert Elz <kre@munnari.oz.au> added minus sign and close bracket handling in June, 1991.

       Russ Allbery <eagle@eyrie.org> added support for comma-separated patterns and the "!" and
       "@" metacharacters to the core wildmat routines in July, 2000.  He also added support for
       UTF-8 characters, changed the default behavior to assume that both the text and the
       pattern are in UTF-8, and largely rewrote this documentation to expand and clarify the
       description of how a wildmat expression matches.

       Please note that the interfaces to these functions are named uwildmat and the like rather
       than wildmat to distinguish them from the wildmat function provided by Rich $alz's
       original implementation.  While this code is heavily based on Rich's original code, it has
       substantial differences, including the extension to support UTF-8 characters, and has
       noticeable functionality changes.  Any bugs present in it aren't Rich's fault.

       $Id: uwildmat.pod 10283 2018-05-14 12:43:05Z iulius $

SEE ALSO

       grep(1), fnmatch(3), regex(3), regexp(3).