Provided by: re2c_3.1-1build1_amd64

NAME

       re2c - generate fast lexical analyzers for C/C++, Go and Rust

SYNOPSIS

       Note: examples are in C++ (but can be easily adapted to C).

          re2c    [ OPTIONS ] [ WARNINGS ] INPUT
          re2go   [ OPTIONS ] [ WARNINGS ] INPUT
          re2rust [ OPTIONS ] [ WARNINGS ] INPUT

       Input can be either a file or - for stdin.

INTRODUCTION

       re2c works as a preprocessor. It reads the input file (which is usually a program in the target language,
       but can be anything) and looks for blocks of code enclosed in special-form comments. The text outside  of
       these  blocks  is copied verbatim into the output file. The contents of the blocks are processed by re2c.
       It translates them to code in the target language and outputs the generated code in place of the block.

       Here is an example of a small program that checks if a given string starts with a decimal number:

          // re2c $INPUT -o $OUTPUT -i --case-ranges
          #include <assert.h>

          bool lex(const char *s) {
              const char *YYCURSOR = s;
              /*!re2c
                  re2c:yyfill:enable = 0;
                  re2c:define:YYCTYPE = char;

                  number = [1-9][0-9]*;

                  number { return true; }
                  *      { return false; }
              */
          }

          int main() {
              assert(lex("1234"));
              return 0;
          }

       In the output everything between /*!re2c and */ has been replaced with the generated code:

          /* Generated by re2c */
          // re2c $INPUT -o $OUTPUT -i --case-ranges
          #include <assert.h>

          bool lex(const char *s) {
              const char *YYCURSOR = s;

          {
              char yych;
              yych = *YYCURSOR;
              switch (yych) {
                  case '1' ... '9': goto yy2;
                  default: goto yy1;
              }
          yy1:
              ++YYCURSOR;
              { return false; }
          yy2:
              yych = *++YYCURSOR;
              switch (yych) {
                  case '0' ... '9': goto yy2;
                  default: goto yy3;
              }
          yy3:
              { return true; }
          }

          }

          int main() {
              assert(lex("1234"));
              return 0;
          }
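
       The generated code above relies on the case-range extension enabled by --case-ranges. As a rough,
       hand-written equivalent in standard C (not actual re2c output; the function name lex_portable is made
       up for this sketch), the same states can be written with plain comparisons, which also makes it easy to
       try inputs other than "1234":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hand-written sketch of the automaton for number = [1-9][0-9]*.
 * It mirrors the states of the generated code above, but it is not
 * produced by re2c and uses only standard C comparisons. */
bool lex_portable(const char *s) {
    assert(s != NULL);
    const char *cursor = s;
    char yych = *cursor;

    /* Initial state: the first character must be a nonzero digit. */
    if (yych < '1' || yych > '9') {
        return false; /* default rule `*` */
    }

    /* State yy2: consume the remaining digits. */
    do {
        yych = *++cursor;
    } while (yych >= '0' && yych <= '9');

    /* State yy3: the `number` rule matched a prefix of the input. */
    return true;
}
```

       With this sketch lex_portable("1234") and lex_portable("12ab") return true (the rule matches a prefix of
       the input), while lex_portable("0"), lex_portable("a1") and lex_portable("") return false.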

SYNTAX

       A re2c program consists of a sequence of blocks intermixed with code in the target  language.  There  are
       three main kinds of blocks:

          /*!re2c[:<name>] ... */
                 A  global  block  contains  definitions,  configurations,  directives and rules.  re2c compiles
                 regular expressions associated with each rule into a deterministic finite automaton, encodes it
                 in  the  form  of  conditional  jumps  in  the  target language and replaces the block with the
                 generated code. Names and configurations defined in a global block  are  added  to  the  global
                 scope  and become visible to subsequent blocks. At the start of the program the global scope is
                 initialized with command-line options.  The :<name> part is optional: if  specified,  the  name
                 can be used to refer to the block in another part of the program.

          /*!local:re2c[:<name>] ... */
                 A  local  block is like a global block, but the names and configurations in it have local scope
                 (they do not affect other blocks).

          /*!rules:re2c[:<name>] ... */
                 A rules block is like a local block, but it does not generate any  code  and  is  meant  to  be
                 reused  in  other  blocks.  This  is a way of sharing code (more details in the reusable blocks
                 section).

       There are also many auxiliary blocks; see section blocks and directives for a full list of them. A  block
       may contain the following kinds of statements:

          <name> = <regular expression>;
                 A  definition  binds  a name to a regular expression. Names may contain alphanumeric characters
                 and underscore. The regular expressions section gives an overview of re2c  syntax  for  regular
                 expressions.  Once  defined,  the  name  can be used in other regular expressions and in rules.
                 Recursion in named definitions is not allowed, and each name should be  defined  before  it  is
                 used.  A block inherits named definitions from the global scope.  Redefining a name that exists
                 in the current scope is an error.

          <configuration> = <value>;
                 A configuration allows one to change re2c behavior and customize the generated code. For a full
                 list  of  configurations  supported  by  re2c  see  the  configurations section. Depending on a
                 particular configuration, the value can be  a  keyword,  a  nonnegative  integer  number  or  a
                 one-line  string  which  should  be  enclosed  in double or single quotes unless it consists of
                 alphanumeric characters. A block inherits configurations from the global scope and may redefine
                 them  or add new ones. Configurations defined inside of a block affect the whole block, even if
                 they appear at the end of it.

          <regular expression> { <code> }
                 A rule binds a regular expression to  a  semantic  action  (a  block  of  code  in  the  target
                 language).  If  the  regular expression matches, the associated semantic action is executed. If
                 multiple rules match, the longest match takes precedence. If  multiple  rules  match  the  same
                 string,  the earliest one takes precedence. There are two special rules: the default rule * and
                 the end of input rule $. The default rule should always be defined; it has the lowest  priority
                 regardless  of  its  place  in the block, and it matches any code unit (not necessarily a valid
                 character, see the encoding support section). The end of input rule should be  defined  if  the
                 corresponding method for handling the end of input is used. If start conditions are used, rules
                 have more complex syntax.

          !<directive>;
                 A directive is one of the special predefined statements. Each directive has a  unique  purpose.
                 For  example,  the  !use  directive merges a rules block into the current one (see the reusable
                 blocks section), and the !include directive allows one  to  include  an  outer  file  (see  the
                 include files section).
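
       The precedence among rules described above (the longest match first, then the earliest rule) can be
       illustrated with a small hand-written simulation. This is only a sketch and not how re2c works
       internally (re2c compiles all rules of a block into a single automaton). It assumes two hypothetical
       rules, "if" { KEYWORD } followed by [a-z]+ { IDENT }, plus the default rule; the enum and function
       names are made up for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum rule { RULE_DEFAULT, RULE_KEYWORD, RULE_IDENT };

/* Length of the match of the literal "if" at the start of s, or 0. */
static size_t match_keyword(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Length of the longest match of [a-z]+ at the start of s, or 0. */
static size_t match_ident(const char *s) {
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z') ++n;
    return n;
}

/* Pick the winning rule for input s and store the match length in *len. */
enum rule pick_rule(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    size_t kw = match_keyword(s);
    size_t id = match_ident(s);

    if (kw == 0 && id == 0) {
        *len = 1; /* the default rule consumes a single code unit */
        return RULE_DEFAULT;
    }
    /* The longest match wins; on a tie the earlier rule (keyword) wins. */
    if (kw >= id) {
        *len = kw;
        return RULE_KEYWORD;
    }
    *len = id;
    return RULE_IDENT;
}
```

       On the input "if (x)" both rules match two characters, so the earlier keyword rule wins. On "iffy" the
       identifier rule matches four characters and wins as the longest match. On "42" neither rule matches and
       the default rule consumes one code unit.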

PROGRAM INTERFACE

       The  generated  code interfaces with the outer program with the help of primitives -- symbolic names that
       can be defined as variables, functions or macros in the target language (collectively referred to as  the
       API).  The definition of primitives is left for the user: this gives them both freedom in customizing the
       lexer and responsibility to understand how it works.  Not all primitives have  to  be  defined  --  only
       those  used  by  a  given program.  The manual provides definitions for the most popular use cases. For a
       full list of primitives and their meaning see the API primitives section.

       There are two API flavors that define the set of primitives used by re2c:

          Pointer API
                 This API is based on C pointer arithmetic. It was historically the first, and for a  long  time
                 the  only  one. It consists of pointer-like primitives YYCURSOR, YYMARKER, YYCTXMARKER, YYLIMIT
                 (which are normally defined as pointers of type YYCTYPE*) and YYFILL. This API  is  enabled  by
                 default  for  C,  and  it  cannot  be  used  with  other  backends  that do not support pointer
                 arithmetic.

          Generic API
                 This API is more flexible. It consists of generic operations and does not assume any particular
                 implementation.  The  primitives  are  YYPEEK, YYSKIP, YYBACKUP, YYBACKUPCTX, YYSTAGP, YYSTAGN,
                 YYMTAGP, YYMTAGN, YYRESTORE, YYRESTORECTX,  YYRESTORETAG,  YYSHIFT,  YYSHIFTSTAG,  YYSHIFTMTAG,
                 YYLESSTHAN  and  YYFILL.   For the C backend generic API is enabled with --api custom option or
                 re2c:api = custom; configuration; for Go and Rust it is enabled by  default.  Generic  API  was
                 added in version 0.14.

       There are two API styles that determine the form in which the primitives should be defined:

          Free-form
                 Free-form  style  is  enabled  with  configuration  re2c:api:style = free-form;.  In this style
                 interface primitives should be defined as free-form pieces of code with interpolated  variables
                 of  the form @@{var} or optionally just @@ if there is a single variable.  The set of variables
                 is specific to each primitive.  Generic API can be defined in terms of pointers cursor,  limit,
                 marker and ctxmarker as follows:

                     /*!re2c
                       re2c:define:YYPEEK       = "*cursor";
                       re2c:define:YYSKIP       = "++cursor;";
                       re2c:define:YYBACKUP     = "marker = cursor;";
                       re2c:define:YYRESTORE    = "cursor = marker;";
                       re2c:define:YYBACKUPCTX  = "ctxmarker = cursor;";
                       re2c:define:YYRESTORECTX = "cursor = ctxmarker;";
                       re2c:define:YYRESTORETAG = "cursor = @@{tag};";
                       re2c:define:YYLESSTHAN   = "limit - cursor < @@{len}";
                       re2c:define:YYSTAGP      = "@@{tag} = cursor;";
                       re2c:define:YYSTAGN      = "@@{tag} = NULL;";
                       re2c:define:YYSHIFT      = "cursor += @@{shift};";
                       re2c:define:YYSHIFTSTAG  = "@@{tag} += @@{shift};";
                     */

          Function-like
                 Function-like  style  is  enabled with configuration re2c:api:style = functions;. In this style
                 primitives should be defined as functions or macros with parentheses, accepting  the  necessary
                 arguments.   For  historical  reasons this API style is the default for C/C++ backend.  Generic
                 API can be defined in terms of pointers cursor, limit, marker and ctxmarker as follows:

                     #define  YYPEEK()                 *cursor
                     #define  YYSKIP()                 ++cursor
                     #define  YYBACKUP()               marker = cursor
                     #define  YYRESTORE()              cursor = marker
                     #define  YYBACKUPCTX()            ctxmarker = cursor
                     #define  YYRESTORECTX()           cursor = ctxmarker
                     #define  YYRESTORETAG(tag)        cursor = tag
                     #define  YYLESSTHAN(len)          limit - cursor < len
                     #define  YYSTAGP(tag)             tag = cursor
                     #define  YYSTAGN(tag)             tag = NULL
                     #define  YYSHIFT(shift)           cursor += shift
                     #define  YYSHIFTSTAG(tag, shift)  tag += shift

       For the definition of YYFILL and instructions on how to customize or disable end-of-input checks, see the
       handling the end of input and buffer refilling sections.
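
       To show how the generic API primitives fit together, here is a hand-written approximation of a lexer
       for the hypothetical rule [0-9]+ ("." [0-9]+)? that uses function-like YYPEEK, YYSKIP, YYBACKUP and
       YYRESTORE defined over local pointers. It is a sketch of the idea, not re2c output; the function name
       lex_decimal and the return convention (match length, or 0 if the input does not start with a digit) are
       assumptions made for the example, and the macros are parenthesized for safety:

```c
#include <assert.h>
#include <stddef.h>

/* Function-like generic API defined in terms of local pointers. */
#define YYPEEK()    (*cursor)
#define YYSKIP()    (++cursor)
#define YYBACKUP()  (marker = cursor)
#define YYRESTORE() (cursor = marker)

/* Hand-written approximation of a lexer for [0-9]+ ("." [0-9]+)?
 * Returns the match length, or 0 if s does not start with a digit. */
size_t lex_decimal(const char *s) {
    assert(s != NULL);
    const char *cursor = s, *marker = s;
    char yych = YYPEEK();

    if (yych < '0' || yych > '9') return 0; /* no match */

    /* Integer part. */
    do {
        YYSKIP();
        yych = YYPEEK();
    } while (yych >= '0' && yych <= '9');

    if (yych != '.') return (size_t)(cursor - s);

    /* A "." may or may not start a fraction: remember the end of the
     * shorter match before looking further. */
    YYBACKUP();
    YYSKIP();
    yych = YYPEEK();
    if (yych < '0' || yych > '9') {
        YYRESTORE(); /* no digit after ".": fall back to the integer */
        return (size_t)(cursor - s);
    }

    /* Fractional part. */
    do {
        YYSKIP();
        yych = YYPEEK();
    } while (yych >= '0' && yych <= '9');
    return (size_t)(cursor - s);
}
```

       For example, lex_decimal("12.5x") returns 4, while lex_decimal("12.x") backs up to the position saved by
       YYBACKUP and returns 2.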

OPTIONS

       Some of the options have corresponding configurations, others are global and cannot be changed after re2c
       starts reading the input file.  Debug options generally require building  re2c  in  debug  configuration.
       Internal options are useful for experimenting with the algorithms used in re2c.

       -? --help -h
              Show help message.

       --api --input <default | custom>
              Specify the API used by the generated code to interface with user-defined code: default is the API
              based on pointer arithmetic (the default for C), and custom is the generic API (the default for Go
              and Rust).

       --bit-vectors -b
              Optimize conditional jumps using bit masks.  This option implies --nested-ifs.

       --case-insensitive
              Treat single-quoted and double-quoted strings as case-insensitive.

       --case-inverted
              Invert  the  meaning  of  single-quoted  and double-quoted strings: treat single-quoted strings as
              case-sensitive and double-quoted strings as case-insensitive.

       --case-ranges
              Collapse consecutive cases in a switch statement into a range of the  form  low  ...  high.  This
              syntax  is  a C/C++ language extension that is supported by compilers like GCC, Clang and Tcc. The
              main advantage over using single cases is smaller  generated  code  and  faster  generation  time,
              although  for  some  compilers  like  Tcc  it also results in smaller binary size.  This option is
              supported only for C.

       --computed-gotos -g
              Optimize conditional jumps using non-standard "computed goto" extension (which must  be  supported
              by  the  compiler).  re2c  generates  jump  tables only in complex cases with a lot of conditional
              branches. Complexity threshold can be configured with cgoto:threshold configuration.  This  option
              implies --bit-vectors. It is supported only for C.

       --conditions --start-conditions -c
              Enable  support  of Flex-like "conditions": multiple interrelated lexers within one block. This is
              an alternative to manually specifying different re2c blocks connected with goto or function calls.

       --depfile FILE
              Write dependency information to FILE in the form of a Makefile rule <output-file>  :  <input-file>
              [include-file  ...].  This  allows one to track build dependencies in the presence of include:re2c
              directives, so that updating include files triggers regeneration of the output file.  This  option
              depends on the --output option.

       --ebcdic --ecb -e
              Generate  a  lexer that reads input in EBCDIC encoding. re2c assumes that the character range is 0
              -- 0xFF and character size is 1 byte.

       --empty-class <match-empty | match-none | error>
              Define the way re2c treats empty character classes. With match-empty  (the  default)  empty  class
              matches  empty  input  (which is illogical, but backwards-compatible). With match-none empty class
              always fails to match.  With error empty class raises a compilation error.

       --encoding-policy <fail | substitute | ignore>
              Define the way re2c treats Unicode surrogates.  With  fail  re2c  aborts  with  an  error  when  a
              surrogate  is  encountered.  With substitute re2c silently replaces surrogates with the error code
              point 0xFFFD. With ignore (the default) re2c treats surrogates as normal code points. The  Unicode
              standard says that standalone surrogates are invalid, but real-world libraries and programs behave
              in different ways.

       --flex-syntax -F
              Partial support for Flex syntax: in this mode named definitions don't need the equal sign and  the
              terminating  semicolon,  and  when  used  they must be surrounded with curly braces. Names without
              curly braces are treated as double-quoted strings.

       --header --type-header -t HEADER
              Generate a HEADER file. The contents of the file can be specified with  directives  header:re2c:on
              and  header:re2c:off.   If conditions are used the header will have a condition enum automatically
              appended to it (unless there is an explicit conditions:re2c directive).

       -I PATH
              Add PATH to the list of locations which are used when searching for include files. This option  is
              useful  in  combination  with  include:re2c directive. re2c looks for FILE in the directory of the
              parent file and in the include locations specified with -I option.

       --input-encoding <ascii | utf8>
              Specify the way re2c parses regular expressions.  With ascii (the default) re2c handles  input  as
              ASCII-encoded:  any  sequence  of  code units is a sequence of standalone 1-byte characters.  With
              utf8 re2c handles input as UTF8-encoded and recognizes multibyte characters.

       --invert-captures
              Invert the meaning of capturing and non-capturing groups. By default (...)  is  capturing  and  (!
              ...) is non-capturing. With this option (! ...) is capturing and (...) is non-capturing.

       --lang <c | go | rust>
              Specify  the  output language. Supported languages are C, Go and Rust.  The default is C for re2c,
              Go for re2go and Rust for re2rust.

       --leftmost-captures
              Enable submatch extraction with leftmost greedy capturing groups.

       --location-format <gnu | msvc>
              Specify location format in messages.  With gnu locations  are  printed  as  'filename:line:column:
              ...'.  With msvc locations are printed as 'filename(line,column) ...'.  The default is gnu.

       --loop-switch
              Encode the DFA in the form of a loop over a switch statement. Individual states are switch cases. The
              current state is stored in a variable yystate.  Transitions between states update yystate  to  the
              case  label  of  the destination state and continue to the head of the loop. This option is always
              enabled for Rust, as it has no goto statement and cannot use the goto/label approach which is  the
              default for C and Go backends.

       --nested-ifs -s
              Use  nested  if statements instead of switch statements in conditional jumps. This usually results
              in more efficient code with non-optimizing compilers.

       --no-debug-info -i
              Do not output line directives. This may be useful when the generated code is stored in  a  version
              control  system (to avoid huge autogenerated diffs on small changes). This option is on by default
              for Rust, as it does not have line directives.

       --no-generation-date
              Suppress date output in the generated file.

       --no-version
              Suppress version output in the generated file.

       --no-unsafe
              Do not generate unsafe wrapper over YYPEEK (this option is  specific  to  Rust).  For  performance
              reasons  YYPEEK should avoid bounds-checking, as the lexer already performs end-of-input checks in
              a more efficient way.  The user may choose to provide a safe YYPEEK definition,  or  a  definition
              that  is  unsafe  only  in  release  builds,  in  which case the --no-unsafe option helps to avoid
              warnings about redundant unsafe blocks.

       --output -o OUTPUT
              Specify the OUTPUT file.

       --posix-captures -P
              Enable submatch extraction with POSIX-style capturing groups.

       --reusable -r
              Deprecated since version 2.2 (reusable blocks are allowed by default now).

       --skeleton -S
              Ignore user-defined interface code and generate a self-contained "skeleton" program. Additionally,
              generate  input  files  with strings derived from the regular grammar and compressed match results
              that are used to verify "skeleton" behavior on all inputs. This option is useful for finding  bugs
              in optimizations and code generation. This option is supported only for C.

       --storable-state -f
              Generate  a  lexer which can store its inner state.  This is useful in push-model lexers which are
              stopped by an outer program when there is not enough input,  and  then  resumed  when  more  input
              becomes  available.  In  this  mode  users  should  additionally  define YYGETSTATE and YYSETSTATE
              primitives, and variables yych, yyaccept and state should be part of the stored lexer state.

       --tags -T
              Enable submatch extraction with tags.

       --ucs2 --wide-chars -w
              Generate a lexer that reads UCS2-encoded input. re2c assumes that the  character  range  is  0  --
              0xFFFF and character size is 2 bytes.  This option implies --nested-ifs.

       --utf8 --utf-8 -8
              Generate a lexer that reads input in UTF-8 encoding. re2c assumes that the character range is 0 --
              0x10FFFF and character size is 1 byte.

       --utf16 --utf-16 -x
              Generate a lexer that reads UTF16-encoded input. re2c assumes that the character  range  is  0  --
              0x10FFFF and character size is 2 bytes.  This option implies --nested-ifs.

       --utf32 --unicode -u
              Generate  a  lexer  that  reads UTF32-encoded input. re2c assumes that the character range is 0 --
              0x10FFFF and character size is 4 bytes.  This option implies --nested-ifs.

       --verbose
              Output a short message in case of success.

       --vernum -V
              Show version information in MMmmpp format (major, minor, patch).

       --version -v
              Show version information.

       --single-pass -1
              Deprecated. Does nothing (single pass is the default now).

       --debug-output -d
              Emit YYDEBUG invocations in the generated code. This is useful to trace lexer execution.

       --dump-adfa
              Debug option: output DFA after tunneling (in .dot format).

       --dump-cfg
              Debug option: output control flow graph of tag variables (in .dot format).

       --dump-closure-stats
              Debug option: output statistics on the number of states in closure.

       --dump-dfa-det
              Debug option: output DFA immediately after determinization (in .dot format).

       --dump-dfa-min
              Debug option: output DFA after minimization (in .dot format).

       --dump-dfa-tagopt
              Debug option: output DFA after tag optimizations (in .dot format).

       --dump-dfa-tree
              Debug option: output DFA under construction with states represented as tag history trees (in  .dot
              format).

       --dump-dfa-raw
              Debug option: output DFA under construction with expanded state-sets (in .dot format).

       --dump-interf
              Debug option: output interference table produced by liveness analysis of tag variables.

       --dump-nfa
              Debug option: output NFA (in .dot format).

       --emit-dot -D
              Instead  of  normal output generate lexer graph in .dot format.  The output can be converted to an
              image with the help of Graphviz (e.g. something like dot -Tpng -odfa.png dfa.dot).

       --dfa-minimization <moore | table>
              Internal option: DFA minimization algorithm used by re2c. The moore option is the Moore  algorithm
              (it  is  the  default).  The table option is the "table filling" algorithm. Both algorithms should
              produce the same DFA up to states relabeling; table filling is simpler and much slower and  serves
              as a reference implementation.

       --eager-skip
              Internal  option: make the generated lexer advance the input position eagerly -- immediately after
              reading the input symbol. This changes the default behavior when the input  position  is  advanced
              lazily -- after transition to the next state.

       --no-lookahead
              Internal  option,  deprecated.   It  used  to  enable  TDFA(0)  algorithm. Unlike TDFA(1), TDFA(0)
              algorithm does not use one-symbol lookahead.  It  applies  register  operations  to  the  incoming
              transitions  rather  than  the  outgoing  ones.  Benchmarks  showed that TDFA(0) algorithm is less
              efficient than TDFA(1).

       --no-optimize-tags
              Internal option: suppress optimization of tag variables (useful for debugging).

       --posix-closure <gor1 | gtop>
              Internal option: specify shortest-path algorithm used for the construction of epsilon-closure with
              POSIX  disambiguation semantics: gor1 (the default) stands for Goldberg-Radzik algorithm, and gtop
              stands for "global topological order" algorithm.

       --posix-prectable <complex | naive>
              Internal option: specify the algorithm  used  to  compute  POSIX  precedence  table.  The  complex
              algorithm  computes  precedence  table  in  one  traversal  of  tag history tree and has quadratic
              complexity in the number of TNFA states; it is the default. The  naive  algorithm  has  worst-case
              cubic  complexity  in  the  number  of TNFA states, but it is much simpler than complex and may be
              slightly faster in non-pathological cases.

       --stadfa
              Internal option, deprecated.  It used to enable staDFA algorithm, which differs from TDFA in  that
              register operations are placed in states rather than on transitions. Benchmarks showed that staDFA
              algorithm is less efficient than TDFA.

       --fixed-tags <none | toplevel | all>
              Internal option: specify whether the fixed-tag optimization should be applied to all  tags  (all),
              none  of  them  (none),  or  only  those in toplevel concatenation (toplevel). The default is all.
              "Fixed" tags are those that are located within a fixed distance to some other tag (called "base").
              In  such  cases  only  the  base  tag  needs  to be tracked, and the value of the fixed tag can be
              computed as the value of the base tag plus a static offset. For tags that are under alternative or
              repetition  it is also necessary to check if the base tag has a no-match value (in that case fixed
              tag should also be set to no-match, disregarding the offset). For tags in top-level  concatenation
              the check is not needed, because they always match.
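
       To make the difference between the goto/label encoding and the --loop-switch encoding described above
       more concrete, here is a hand-written sketch of the automaton for [1-9][0-9]* written as a loop over a
       switch on yystate. It only imitates the shape of such code and is not actual re2c output; the function
       name lex_loop_switch and the state numbering are made up for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hand-written loop-over-switch encoding of [1-9][0-9]*. */
bool lex_loop_switch(const char *s) {
    assert(s != NULL);
    const char *cursor = s;
    unsigned int yystate = 0;

    for (;;) {
        char yych = *cursor;
        switch (yystate) {
        case 0: /* initial state */
            if (yych >= '1' && yych <= '9') {
                ++cursor;
                yystate = 1; /* transition: update the state ... */
                continue;    /* ... and go back to the head of the loop */
            }
            return false; /* default rule */
        case 1: /* inside a number */
            if (yych >= '0' && yych <= '9') {
                ++cursor;
                continue; /* stay in state 1 */
            }
            return true; /* number rule */
        default:
            return false; /* not reached: only states 0 and 1 exist */
        }
    }
}
```

       Each transition assigns the destination state to yystate and continues the loop, which is why this
       encoding needs no goto statement.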

WARNINGS

       Warnings can be individually enabled, disabled and turned into an error.

       -W     Turn on all warnings.

       -Werror
              Turn  warnings  into  errors.  Note  that  this option alone doesn't turn on any warnings; it only
              affects those warnings that have been turned on so far or will be turned on later.

       -W<warning>
              Turn on warning.

       -Wno-<warning>
              Turn off warning.

       -Werror-<warning>
              Turn on warning and treat it as an error (this implies -W<warning>).

       -Wno-error-<warning>
              Don't treat this particular warning as an error. This doesn't turn off the warning itself.

       -Wcondition-order
              Warn if the generated program makes implicit assumptions about condition numbering. One should use
              either  the  --header  option  or the conditions:re2c directive to generate a mapping of condition
              names to numbers and then use the autogenerated condition names.

       -Wempty-character-class
              Warn if a regular expression contains an empty character class. Trying to match an empty character
              class  makes  no  sense: it should always fail.  However, for backwards compatibility reasons re2c
              permits empty character classes and treats them as empty strings. Use the --empty-class option  to
              change the default behavior.

       -Wmatch-empty-string
              Warn  if  a rule is nullable (matches an empty string).  If the lexer runs in a loop and the empty
              match is unintentional, the lexer may unexpectedly hang in an infinite loop.

       -Wswapped-range
              Warn if the lower bound of a range is greater than its upper bound. The  default  behavior  is  to
              silently swap the range bounds.

       -Wundefined-control-flow
              Warn  if  some  input  strings  cause undefined control flow in the lexer (the faulty patterns are
              reported). This is a dangerous and common mistake. It can be easily fixed by  adding  the  default
              rule  *  which  has  the lowest priority, matches any code unit, and always consumes a single code
              unit.

       -Wunreachable-rules
              Warn about rules that are shadowed by other rules and will never match.

       -Wuseless-escape
              Warn if a symbol is escaped when it shouldn't be.  By default, re2c silently ignores such escapes,
              but this may as well indicate a typo or an error in the escape sequence.

       -Wnondeterministic-tags
              Warn if a tag has n-th degree of nondeterminism, where n is greater than 1.

       -Wsentinel-in-midrule
              Warn  if  the sentinel symbol occurs in the middle of a rule -- this may cause reads past the end
              of buffer, crashes or memory corruption in the generated lexer. This warning is only applicable if
              the  sentinel  method  of  checking  for  the  end  of  input  is  used.  It is set to an error if
              re2c:sentinel configuration is used.
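
       The hazard behind -Wmatch-empty-string can be sketched in plain C. The function match_digits below is a
       hand-written stand-in for a lexer whose only rule is [0-9]* (so it may match the empty string); both
       function names are hypothetical and the code is not re2c output. A driver loop that advances only by
       the match length would never make progress on a non-digit, so the sketch handles a zero-length match
       explicitly:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a nullable rule [0-9]*: returns the match length,
 * which is 0 when s does not start with a digit. */
static size_t match_digits(const char *s) {
    size_t n = 0;
    while (s[n] >= '0' && s[n] <= '9') ++n;
    return n;
}

/* Count the non-empty digit runs in s. */
size_t count_tokens(const char *s) {
    assert(s != NULL);
    size_t tokens = 0;
    while (*s != '\0') {
        size_t len = match_digits(s);
        if (len == 0) {
            /* Empty match: without this step the loop would never
             * advance and the program would hang. */
            ++s;
            continue;
        }
        ++tokens;
        s += len;
    }
    return tokens;
}
```

       For example, count_tokens("12 345 x6") returns 3. Removing the len == 0 branch would make the loop spin
       forever on the first space.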

BLOCKS AND DIRECTIVES

       Below is the list of re2c directives (syntactic constructs that mark the beginning and end  of  the  code
       that should be processed by re2c). Named blocks were added in re2c version 2.2. They are exactly the same
       as unnamed blocks, except that the name can be used to reference a block in other parts of  the  program.
       More information on each directive can be found in the related sections.

       /*!re2c[:<name>] ... */
              A global re2c block with an optional name. The block may contain named definitions, configurations
              and rules in any order. Named definitions and configurations are defined in the global  scope,  so
              they  are  inherited  by  subsequent blocks. The code for a global block is generated at the point
              where the block is specified.

       /*!local:re2c[:<name>] ... */
              A local re2c block with an optional name. Unlike global  blocks,  definitions  and  configurations
              inside  of  a  local block are not added into the global scope. In all other respects local blocks
              are the same as global blocks.

       /*!rules:re2c[:<name>] ... */
              A reusable block with an optional name. Rules blocks have the same structure as  local  or  global
              blocks,  but  they  do  not produce any code and they can be reused multiple times in other blocks
              with the help of a !use:<name>; directive or a /*!use:re2c[:<name>] ... */ block. A rules block on
              its  own  does  not add any definitions into the global scope. The code for it is generated at the
              point of use. Prior to re2c version 2.2 rules blocks required -r --reusable option.

       /*!use:re2c[:<name>] ... */
              A use block that references a previously defined rules block. If the name is specified, re2c looks
              for  a  rules block with this name. Otherwise the most recent rules block is used (either a named
              or an unnamed one). A use block can add definitions, configurations and rules of  its  own,  which
              are added to those of the referenced rules block. Prior to re2c version 2.2 use blocks required -r
              --reusable option.

       !use:<name>;
              An in-block use directive that merges a previously defined rules block  with  the  specified  name
              into  the  current  block. Named definitions, configurations and rules of the referenced block are
              added to the current ones. Conflicts between overlapping rules and configurations are resolved  in
              the usual way: the first rule takes priority, and the latest configuration overrides the preceding
              ones. One exception is the special rules *, $ and <!> for which a  block-local  definition  always
              takes  priority.  A  use  directive  can  be  placed  anywhere inside of a block, and multiple use
              directives are allowed.

       /*!max:re2c[:<name1>[:<name2>...]] ... */
              A directive that generates YYMAXFILL definition.  An optional list of block names specifies  which
              blocks  should  be  included  when computing YYMAXFILL value (if the list is empty, all blocks are
              included).  By default the generated code is a macro-definition for C (#define YYMAXFILL <n>),  or
              a  global  variable  for  Go  (var  YYMAXFILL  int  =  <n>). It can be customized with an optional
              configuration format that specifies a template string where @@{max} (or @@ for short) is  replaced
              with the numeric value of YYMAXFILL.

       /*!maxnmatch:re2c[:<name1>[:<name2>...]] ... */
              A  directive  that  generates YYMAXNMATCH definition (it requires -P --posix-captures option).  An
              optional list of block names specifies which blocks should be included when computing  YYMAXNMATCH
              value  (if  the  list  is  empty,  all  blocks  are included).  By default the generated code is a
              macro-definition for C (#define YYMAXNMATCH <n>), or a global variable for Go (var YYMAXNMATCH int
              =  <n>).  It  can  be  customized  with an optional configuration format that specifies a template
              string where @@{max} (or @@ for short) is replaced with the numeric value of YYMAXNMATCH.

       /*!stags:re2c[:<name1>[:<name2>...]] ... */, /*!mtags:re2c[:<name1>[:<name2>...]] ... */
              Directives that specify a template piece of code that is expanded for  each  s-tag/m-tag  variable
              generated by re2c.  An optional list of block names specifies which blocks should be included when
              computing the set of tag variables (if the list is empty, all blocks are included).  There are two
              optional  configurations:  format and separator.  Configuration format specifies a template string
              where @@{tag} (or @@ for short) is replaced with the name of  each  tag  variable.   Configuration
              separator  specifies  a  piece  of code used to join the generated format pieces for different tag
              variables.
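
              As a rough sketch of how format and separator combine, the fragment below assumes a  block  with
              two s-tags whose variables are named yyt1 and yyt2 (illustrative names based on the default yyt
              prefix). The expansions shown in the comments are what the description above implies, not output
              captured from re2c.

```cpp
// Assumed expansion for tag variables yyt1 and yyt2:
//     const char *yyt1;
//     const char *yyt2;
/*!stags:re2c format = 'const char *@@;\n'; */

// Assumed expansion: yyt1, yyt2
/*!stags:re2c format = "@@"; separator = ", "; */
```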

       /*!getstate:re2c[:<name1>[:<name2>...]] ... */
              A directive that generates conditional dispatch on the lexer state (it  requires  --storable-state
              option).   An  optional list of block names specifies which blocks should be included in the state
              dispatch. The default transition goes to the start label of the first block on the  list.  If  the
              list  is empty, all blocks are included, and the default transition goes to the first block in the
              file that has a start label.  This directive is incompatible with  the  --loop-switch  option  and
              Rust, as it requires cross-block transitions that are unsupported without the goto statement.

       /*!conditions:re2c[:<name1>[:<name2>...]] ... */, /*!types:re2c... */
              A  directive  that generates condition enumeration (it requires --conditions option).  An optional
              list of block names specifies which blocks should be included when computing the set of conditions
              (if  the list is empty, all blocks are included).  By default the generated code is an enumeration
              YYCONDTYPE. It can be customized with optional configurations format and separator.  Configuration
              format  specifies  a template string where @@{cond} (or @@ for short) is replaced with the name of
              each condition, and @@{num} is replaced with a numeric index  of  that  condition.   Configuration
              separator  specifies  a  piece  of  code  used  to  join the generated format pieces for different
              conditions.

       /*!include:re2c <file> */
              This directive allows one to include <file>, which must be a double-quoted file path. The contents
              of the file are literally substituted in place of the directive, in the same way as #include works
              in C/C++. This directive can be used together with the --depfile option to generate  build  system
              dependencies on the included files.

       !include <file>;
              This directive is the same as /*!include:re2c <file> */, except that it should be used inside of a
              re2c block.

       /*!header:re2c:on*/
              This directive marks the start of a header file. Everything after it and up to the following
              /*!header:re2c:off*/  directive is processed by re2c and written to the header file specified with
              -t --type-header option.

       /*!header:re2c:off*/
              This directive marks the end of the header file started with /*!header:re2c:on*/.

       /*!ignore:re2c ... */
              A block whose contents are ignored and removed from the output file.

       %{ ... %}
              A global re2c block in the --flex-support  mode.  This  is  deprecated  and  exists  for  backward
              compatibility.

API PRIMITIVES

       Here  is  a  list of API primitives that may be used by the generated code in order to interface with the
       outer program.  Which primitives are needed depends on multiple  factors,  including  the  complexity  of
       regular  expressions,  input  representation,  buffering, the use of various features and so on.  All the
       necessary primitives should be defined by the user in the form of macros, functions, variables, free-form
       pieces  of  code,  or  any  other suitable form.  re2c does not (and cannot) check the definitions, so if
       anything is missing or defined incorrectly the generated code will not compile.

       YYCTYPE
              The type of the input characters (code units).  For ASCII, EBCDIC and UTF-8 encodings it should be
              1-byte  unsigned integer.  For UTF-16 or UCS-2 it should be 2-byte unsigned integer. For UTF-32 it
              should be 4-byte unsigned integer.

       YYCURSOR
              A pointer-like l-value that  stores  the  current  input  position  (usually  a  pointer  of  type
              YYCTYPE*).  Initially  YYCURSOR  should  point to the first input character. It is advanced by the
              generated code.  When a rule matches, YYCURSOR points to  the  position  after  the  last  matched
              character. It is used only in C pointer API.

       YYLIMIT
              A pointer-like r-value that stores the end of input position (usually a pointer of type YYCTYPE*).
              Initially YYLIMIT should point to the position after the last available input character. It is not
              changed  by  the  generated  code. The lexer compares YYCURSOR to YYLIMIT in order to determine if
              there are enough input characters left.  YYLIMIT is used only in C pointer API.

       YYMARKER
              A pointer-like l-value (usually a pointer of type YYCTYPE*) that stores the position of the latest
              matched  rule. It is used to restore the YYCURSOR position if the longer match fails and the lexer
              needs to roll back. Initialization is not needed. YYMARKER is used only in C pointer API.
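
              The interplay of YYCURSOR, YYLIMIT and YYMARKER can be sketched by hand. The function below is
              not re2c output: it is a hand-written imitation of a lexer with two  hypothetical  rules,  "ab"
              and "abcd", and the Match struct and the function name are made up for illustration. It shows
              the rollback to YYMARKER when the longer rule fails after the shorter one has matched.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char YYCTYPE;

struct Match {
    int rule;      // 0: no match, 1: "ab", 2: "abcd"
    size_t length; // number of characters consumed
};

// Hand-written imitation of a lexer with two rules, "ab" and "abcd".
static Match lex_ab_abcd(const char *s) {
    const YYCTYPE *const start = (const YYCTYPE *)s;
    const YYCTYPE *YYCURSOR = start;
    const YYCTYPE *const YYLIMIT = start + strlen(s);
    const YYCTYPE *YYMARKER = NULL; // assigned before use; no initial value is required
    Match m = {0, 0};

    if (YYLIMIT - YYCURSOR >= 2 && YYCURSOR[0] == 'a' && YYCURSOR[1] == 'b') {
        YYCURSOR += 2;
        m.rule = 1;
        YYMARKER = YYCURSOR; // remember where the shorter rule ended
        if (YYCURSOR < YYLIMIT && *YYCURSOR == 'c') {
            ++YYCURSOR;
            if (YYCURSOR < YYLIMIT && *YYCURSOR == 'd') {
                ++YYCURSOR;
                m.rule = 2;
            } else {
                YYCURSOR = YYMARKER; // the longer match failed: roll back
            }
        }
    }
    m.length = (size_t)(YYCURSOR - start);
    return m;
}
```

              For the input "abcx" the cursor advances past 'c', fails on 'x', and is restored to the  marker,
              so the sketch reports rule 1 with two characters consumed.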

       YYCTXMARKER
              A pointer-like l-value that stores the position of the trailing context (usually a pointer of type
              YYCTYPE*).  No  initialization  is  needed.   It  is used only in C pointer API, and only with the
              lookahead operator /.

       YYFILL A generic API primitive with one argument len.  YYFILL should provide  at  least  len  more  input
              characters or fail.  If re2c:eof is used, then len is always 1 and  YYFILL should always return to
              the calling function; zero return value indicates success.  If re2c:eof is not used,  then  YYFILL
              return  value  is  ignored  and  it  should  not  return  on  failure. The maximum value of len is
              YYMAXFILL.  The definition of YYFILL can be either function-like or free-form depending on the API
              style (see re2c:api:style and re2c:define:YYFILL:naked).
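
              A minimal sketch of a refill routine for the re2c:eof case (len is always 1, and a zero  return
              value means success) might look as follows. Everything here is an assumption made for the sake
              of a self-contained example: the Input struct, the tiny BUF_SIZE, the in-memory src standing in
              for a file, and count_a, which is a hand-written stand-in for generated code. YYMARKER handling
              is omitted; a real lexer would shift it together with the other pointers.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 8 }; // deliberately tiny so that refills actually happen

struct Input {
    const char *src;        // hypothetical data source: bytes not yet buffered
    size_t src_len;
    char buf[BUF_SIZE + 1]; // one extra byte for the sentinel
    char *cur, *tok, *lim;  // cursor, start of current lexeme, end of data
    bool eof;
};

static void init(Input &in, const char *src) {
    in.src = src;
    in.src_len = strlen(src);
    in.cur = in.tok = in.lim = in.buf + BUF_SIZE; // empty buffer
    in.eof = false;
    *in.lim = 0; // sentinel
}

// Returns 0 on success, nonzero on failure.
static int fill(Input &in) {
    if (in.eof) return 1;                      // nothing more to read
    const size_t shift = (size_t)(in.tok - in.buf);
    const size_t used = (size_t)(in.lim - in.tok);
    if (shift < 1) return 2;                   // lexeme too long for the buffer
    memmove(in.buf, in.tok, used);             // keep the unfinished lexeme
    in.lim -= shift;
    in.cur -= shift;
    in.tok -= shift;
    size_t n = (size_t)BUF_SIZE - used;        // free space
    if (n > in.src_len) n = in.src_len;
    memcpy(in.lim, in.src, n);
    in.src += n;
    in.src_len -= n;
    in.lim += n;
    *in.lim = 0;                               // sentinel after the last byte
    in.eof = in.lim < in.buf + BUF_SIZE;
    return 0;
}

#define YYFILL() fill(in)

// Hand-written stand-in for generated code: counts 'a' characters.
static int count_a(const char *text) {
    Input in;
    init(in, text);
    int count = 0;
    for (;;) {
        in.tok = in.cur;                       // each character is its own lexeme
        const char c = *in.cur;
        if (c == 0 && in.cur >= in.lim) {      // sentinel at the end of data
            if (YYFILL() == 0) continue;       // refilled: look again
            return count;                      // end of input
        }
        ++in.cur;
        if (c == 'a') ++count;
    }
}
```

              The buffer is shifted so that the unfinished lexeme stays available, and the sentinel is rewritten
              after every refill so that the end-of-data check keeps working.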

       YYMAXFILL
              An  integral  constant  equal to the maximum value of the argument to YYFILL.  It can be generated
              with /*!max:re2c*/ directive.
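
              One common use of YYMAXFILL is to size the padding that is appended to the input when YYFILL is
              not supposed to provide more data. The sketch below assumes a stand-in value of 4 (the real value
              comes from the max:re2c directive) and a hypothetical helper named pad_input.

```cpp
#include <assert.h>
#include <string>

#define YYMAXFILL 4 // stand-in value; normally emitted by the max:re2c directive

// Returns a copy of the input followed by YYMAXFILL zero characters.
static std::string pad_input(const std::string &text) {
    return text + std::string(YYMAXFILL, '\0');
}
```

              With such a buffer, YYLIMIT would point to the end of the padding rather than to the end of the
              original text.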

       YYLESSTHAN
              A generic API primitive with one argument len.  It should be defined as an r-value of boolean type
              that equals true if and only if there are fewer than len input characters left. The definition can
              be either function-like or free-form depending on the API style (see re2c:api:style).

       YYPEEK A generic API primitive with no arguments.  It should be defined as an  r-value  of  type  YYCTYPE
              that  is  equal  to  the  character  at  the  current input position. The definition can be either
              function-like or free-form depending on the API style (see re2c:api:style).

       YYSKIP A generic API primitive with no arguments.  YYSKIP should advance the current  input  position  by
              one  character. The definition can be either function-like or free-form depending on the API style
              (see re2c:api:style).

       YYBACKUP
              A generic API primitive with no arguments.  YYBACKUP should save the current input position, which
              is  later  restored  with  YYRESTORE.   The definition should be either function-like or free-form
              depending on the API style (see re2c:api:style).

       YYRESTORE
              A generic API primitive with no arguments.  YYRESTORE should restore the current input position to
              the value saved by YYBACKUP.  The definition should be either function-like or free-form depending
              on the API style (see re2c:api:style).
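
              As an illustration of function-like definitions for the generic API, the sketch  below  defines
              YYPEEK, YYSKIP, YYBACKUP, YYRESTORE and YYLESSTHAN over an index into a std::string. The variable
              names text, cursor and marker are assumptions, and match_number is a hand-written  stand-in  for
              generated code that recognizes digits with an optional fraction.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string>

typedef char YYCTYPE;

// Function-like definitions; they rely on local variables named
// `text`, `cursor` and `marker` (names chosen for this sketch).
#define YYPEEK()        (cursor < text.size() ? text[cursor] : '\0')
#define YYSKIP()        (++cursor)
#define YYBACKUP()      (marker = cursor)
#define YYRESTORE()     (cursor = marker)
#define YYLESSTHAN(len) (text.size() - cursor < (size_t)(len))

static bool is_digit(YYCTYPE c) { return c >= '0' && c <= '9'; }

// Returns the length of the longest prefix matching [0-9]+ ("." [0-9]+)?
static size_t match_number(const std::string &text) {
    size_t cursor = 0, marker = 0;
    if (YYLESSTHAN(1)) return 0;
    YYCTYPE yych = YYPEEK();
    if (!is_digit(yych)) return 0;
    do { YYSKIP(); yych = YYPEEK(); } while (is_digit(yych));
    if (yych != '.') return cursor;
    YYBACKUP();                 // the integer part matched; try the longer form
    YYSKIP();
    yych = YYPEEK();
    if (!is_digit(yych)) {
        YYRESTORE();            // no digits after '.': fall back to the integer
        return cursor;
    }
    do { YYSKIP(); yych = YYPEEK(); } while (is_digit(yych));
    return cursor;
}
```

              Because the primitives hide the input representation, the same matching logic would work with a
              different storage type if only the five definitions were changed.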

       YYBACKUPCTX
              A generic API primitive with zero arguments.  YYBACKUPCTX should save the current  input  position
              as  the position of the trailing context, which is later restored by YYRESTORECTX.  The definition
              should be either function-like or free-form depending on the API style (see re2c:api:style).

       YYRESTORECTX
              A generic API primitive with no arguments.   YYRESTORECTX  should  restore  the  trailing  context
              position  saved  with  YYBACKUPCTX.   The  definition  should be either function-like or free-form
              depending on the API style (see re2c:api:style).

       YYRESTORETAG
              A generic API primitive with one argument tag.  YYRESTORETAG should restore the  trailing  context
              position  to  the  value  of  tag.   The  definition  should  be either function-like or free-form
              depending on the API style (see re2c:api:style).

       YYSTAGP
              A generic API primitive with one argument tag, where tag can  be  a  pointer  or  an  offset  (see
              submatch  extraction  section for details).  YYSTAGP should set tag to the current input position.
              The definition should be either function-like  or  free-form  depending  on  the  API  style  (see
              re2c:api:style).

       YYSTAGN
              A  generic  API  primitive  with  one  argument  tag, where tag can be a pointer or an offset (see
              submatch extraction section for details).  YYSTAGN should set tag to a value that represents
              non-existent input position.  The definition should be either function-like or free-form depending
              on the API style (see re2c:api:style).

       YYMTAGP
              A generic API primitive with one argument tag.  YYMTAGP should append the current position to  the
              submatch history of tag (see the submatch extraction section for details). The definition should
              be either function-like or free-form depending on the API style (see re2c:api:style).

       YYMTAGN
              A generic API primitive with one argument tag.  YYMTAGN should  append  a  value  that  represents
              non-existent input position to the submatch history of tag (see the submatch extraction
              section for details). The definition can be either function-like or free-form depending on the
              API style (see re2c:api:style).

       YYSHIFT
              A  generic API primitive with one argument shift.  YYSHIFT should shift the current input position
              by shift characters (the shift value may be negative). The definition can be either  function-like
              or free-form depending on the API style (see re2c:api:style).

       YYSHIFTSTAG
              A generic  API primitive with two arguments, tag and shift.  YYSHIFTSTAG should shift tag by shift
              characters (the shift value may be negative).  The  definition  can  be  either  function-like  or
              free-form depending on the API style (see re2c:api:style).

       YYSHIFTMTAG
              A  generic  API  primitive with two arguments, tag and shift.  YYSHIFTMTAG should shift the latest
              value in the history of tag by shift characters (the shift value may be negative).  The definition
              should be either function-like or free-form depending on the API style (see re2c:api:style).
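
              The difference between s-tags and m-tags can be sketched with offset-based  definitions.  All
              names below are assumptions: NO_POS is a hypothetical value for a non-existent position, the
              m-tag history is simplified to a std::vector of offsets, and scan is a hand-written stand-in
              for generated code rather than re2c output.

```cpp
#include <assert.h>
#include <vector>

static const long NO_POS = -1; // hypothetical "non-existent position" value

// Offset-based definitions; they rely on a local variable named `cursor`.
#define YYSTAGP(tag) (tag) = cursor
#define YYSTAGN(tag) (tag) = NO_POS
#define YYMTAGP(tag) (tag).push_back(cursor)
#define YYMTAGN(tag) (tag).push_back(NO_POS)

struct Fields {
    long eq;                  // s-tag: offset of the first '=' or NO_POS
    std::vector<long> commas; // m-tag: history with the offset of every ','
};

// Hand-written stand-in for a lexer that tags '=' once and ',' repeatedly.
static Fields scan(const char *s) {
    Fields f;
    long cursor = 0;
    YYSTAGN(f.eq);
    for (; s[cursor] != '\0'; ++cursor) {
        if (s[cursor] == '=' && f.eq == NO_POS) {
            YYSTAGP(f.eq);
        } else if (s[cursor] == ',') {
            YYMTAGP(f.commas);
        }
    }
    if (f.commas.empty()) {
        YYMTAGN(f.commas);    // record that the repetition never matched
    }
    return f;
}
```

              An s-tag holds a single position that is overwritten, while an m-tag accumulates one entry  per
              occurrence.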

       YYMAXNMATCH
              An  integral  constant  equal  to  the  maximal  number of POSIX capturing groups in a rule. It is
              generated with /*!maxnmatch:re2c*/ directive.

       YYCONDTYPE
              The type of the condition enum.  It should be generated either with the /*!types:re2c*/  directive
              or the -t --type-header option.

       YYGETCONDITION
              An  API primitive with zero arguments.  It should be defined as an r-value of type YYCONDTYPE that
              is equal to the current condition identifier.  The  definition  can  be  either  function-like  or
              free-form depending on the API style (see re2c:api:style and re2c:define:YYGETCONDITION:naked).

       YYSETCONDITION
              An  API  primitive  with  one  argument cond.  The meaning of YYSETCONDITION is to set the current
              condition identifier to  cond.   The  definition  should  be  either  function-like  or  free-form
              depending on the API style (see re2c:api:style and re2c:define:YYSETCONDITION@cond).

       YYGETSTATE
              An  API primitive with zero arguments.  It should be defined as an r-value of integer type that is
              equal to the current lexer state. The state should be initialized to -1. The definition can be either
              function-like    or   free-form   depending   on   the   API   style   (see   re2c:api:style   and
              re2c:define:YYGETSTATE:naked).

       YYSETSTATE
              An API primitive with one argument state.  The meaning of YYSETSTATE is to set the  current  lexer
              state  to  state.  The definition should be either function-like or free-form depending on the API
              style (see re2c:api:style and re2c:define:YYSETSTATE@state).

       YYDEBUG
              A debug API primitive with two arguments. It can be used to debug  the  generated  code  (with  -d
              --debug-output  option).  YYDEBUG should return no value and accept two arguments: state (either a
              DFA state index or -1) and symbol (the current input symbol).

       yych   An l-value of type YYCTYPE that stores the current input character.  User definition is  necessary
              only with -f --storable-state option.

       yyaccept
              An  l-value  of  unsigned  integral  type that stores the number of the latest matched rule.  User
              definition is necessary only with -f --storable-state option.

       yynmatch
              An l-value of unsigned integral type that stores the number  of  POSIX  capturing  groups  in  the
              matched rule.  Used only with -P --posix-captures option.

       yypmatch
              An  array  of  l-values  that  are  used  to  hold  the  tag values corresponding to the capturing
              parentheses in the matching rule. Array length must be at least yynmatch * 2 (usually  YYMAXNMATCH
              * 2 is a good choice).  Used only with -P --posix-captures option.

CONFIGURATIONS

       re2c:api, re2c:flags:input
              Same as the --api option.

       re2c:api:sigil
              Specify  the  marker  ("sigil")  that is used for argument placeholders in the API primitives. The
              default is @@. A placeholder starts with sigil followed by the argument name in curly braces.  For
              example,  if sigil is set to $, then placeholders will have the form ${name}. Single-argument APIs
              may use shorthand notation without the name in braces. This option can be  overridden  by  options
              for individual API primitives, e.g.  re2c:define:YYFILL@len for YYFILL.

       re2c:api:style
              Specify  API  style.  Possible values are functions (the default for C) and free-form (the default
              for Go and Rust).  In functions style API primitives  are  generated  with  an  argument  list  in
              parentheses following the name of the primitive. The arguments are provided only for autogenerated
              parameters (such as the number of characters passed to YYFILL), but  not  for  the  general  lexer
              context,  so  the  primitives  behave  more  like  macros in C/C++ or closures in Go and Rust.  In
              free-form style API primitives do not have a  fixed  form:  they  should  be  defined  as  strings
              containing  free-form  pieces  of code with interpolated variables of the form @@{var} or @@ (they
              correspond to arguments in  function-like  style).   This  configuration  may  be  overridden  for
              individual API primitives, see for example re2c:define:YYFILL:naked configuration for YYFILL.
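
              A configuration fragment in free-form style might look like the sketch below. The variable names
              cur and mar are assumptions that must exist in the surrounding code, and the generic API  itself
              has to be enabled separately (with the --api option or the re2c:api configuration).

```cpp
/*!re2c
    re2c:api:style = free-form;
    re2c:yyfill:enable = 0;
    re2c:define:YYCTYPE   = char;
    re2c:define:YYPEEK    = "*cur";
    re2c:define:YYSKIP    = "++cur;";
    re2c:define:YYBACKUP  = "mar = cur;";
    re2c:define:YYRESTORE = "cur = mar;";

    // ... named definitions and rules go here ...
*/
```

              In free-form style each definition is pasted as written, so statements carry their own semicolons
              while YYPEEK, which is used as an expression, does not.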

       re2c:bit-vectors, re2c:flags:bit-vectors, re2c:flags:b
              Same as the --bit-vectors option, but can be configured on a per-block basis.

       re2c:case-insensitive, re2c:flags:case-insensitive
              Same as the --case-insensitive option, but can be configured on a per-block basis.

       re2c:case-inverted, re2c:flags:case-inverted
              Same as the --case-inverted option, but can be configured on a per-block basis.

       re2c:case-ranges, re2c:flags:case-ranges
              Same as the --case-ranges option, but can be configured on a per-block basis.

       re2c:computed-gotos, re2c:flags:computed-gotos, re2c:flags:g
              Same as the --computed-gotos option, but can be configured on a per-block basis.

       re2c:computed-gotos:threshold, re2c:cgoto:threshold
              If  computed goto is used, this configuration specifies the complexity threshold that triggers the
              generation of jump tables instead of nested if statements and bitmaps. The default value is 9.

       re2c:cond:goto
              Specifies a piece of code used for the autogenerated shortcut rules :=> in conditions. The default
              is  goto  @@;.   The  @@  placeholder  is  substituted  with  condition  name  (see configurations
              re2c:api:sigil and re2c:cond:goto@cond).

       re2c:cond:goto@cond
              Specifies the sigil used for argument substitution in re2c:cond:goto definition. The default value
              is @@.  Overrides the more generic re2c:api:sigil configuration.

       re2c:cond:divider
              Defines     the     divider    for    condition    blocks.     The    default    value    is    /*
              *********************************** */.  Placeholders are substituted  with  condition  name  (see
              re2c:api:sigil and re2c:cond:divider@cond).

       re2c:cond:divider@cond
              Specifies the sigil used for argument substitution in re2c:cond:divider definition. The default is
              @@.  Overrides the more generic re2c:api:sigil configuration.

       re2c:cond:prefix, re2c:condprefix
              Specifies the prefix used for condition labels.  The default is yyc_.

       re2c:cond:enumprefix, re2c:condenumprefix
              Specifies the prefix used for condition identifiers.  The default is yyc.

       re2c:debug-output, re2c:flags:debug-output, re2c:flags:d
              Same as the --debug-output option, but can be configured on a per-block basis.

       re2c:define:YYBACKUP
              Defines generic API primitive YYBACKUP (see the API primitives section).

       re2c:define:YYBACKUPCTX
              Defines generic API primitive YYBACKUPCTX (see the API primitives section).

       re2c:define:YYCONDTYPE
              Defines YYCONDTYPE (see the API primitives section).

       re2c:define:YYCTYPE
              Defines YYCTYPE (see the API primitives section).

       re2c:define:YYCTXMARKER
              Defines API primitive YYCTXMARKER (see the API primitives section).

       re2c:define:YYCURSOR
              Defines API primitive YYCURSOR (see the API primitives section).

       re2c:define:YYDEBUG
              Defines API primitive YYDEBUG (see the API primitives section).

       re2c:define:YYFILL
              Defines API primitive YYFILL (see the API primitives section).

       re2c:define:YYFILL@len
              Specifies the sigil  used  for  argument  substitution  in  YYFILL  definition.  Defaults  to  @@.
              Overrides the more generic re2c:api:sigil configuration.

       re2c:define:YYFILL:naked
              Overrides  the  more  generic  re2c:api:style configuration for YYFILL.  Zero value corresponds to
              free-form API style.

       re2c:define:YYGETCONDITION
              Defines API primitive YYGETCONDITION (see the API primitives section).

       re2c:define:YYGETCONDITION:naked
              Overrides the more generic re2c:api:style configuration for YYGETCONDITION. Zero value corresponds
              to free-form API style.

       re2c:define:YYGETSTATE
              Defines API primitive YYGETSTATE (see the API primitives section).

       re2c:define:YYGETSTATE:naked
              Overrides  the more generic re2c:api:style configuration for YYGETSTATE. Zero value corresponds to
              free-form API style.

       re2c:define:YYLESSTHAN
              Defines generic API primitive YYLESSTHAN (see the API primitives section).

       re2c:define:YYLIMIT
              Defines API primitive YYLIMIT (see the API primitives section).

       re2c:define:YYMARKER
              Defines API primitive YYMARKER (see the API primitives section).

       re2c:define:YYMTAGN
              Defines generic API primitive YYMTAGN (see the API primitives section).

       re2c:define:YYMTAGP
              Defines generic API primitive YYMTAGP (see the API primitives section).

       re2c:define:YYPEEK
              Defines generic API primitive YYPEEK (see the API primitives section).

       re2c:define:YYRESTORE
              Defines generic API primitive YYRESTORE (see the API primitives section).

       re2c:define:YYRESTORECTX
              Defines generic API primitive YYRESTORECTX (see the API primitives section).

       re2c:define:YYRESTORETAG
              Defines generic API primitive YYRESTORETAG (see the API primitives section).

       re2c:define:YYSETCONDITION
              Defines API primitive YYSETCONDITION (see the API primitives section).

       re2c:define:YYSETCONDITION@cond
              Specifies the sigil used for argument substitution in YYSETCONDITION definition. The default value
              is @@.  Overrides the more generic re2c:api:sigil configuration.

       re2c:define:YYSETCONDITION:naked
              Overrides the more generic re2c:api:style configuration for YYSETCONDITION. Zero value corresponds
              to free-form API style.

       re2c:define:YYSETSTATE
              Defines API primitive YYSETSTATE (see the API primitives section).

       re2c:define:YYSETSTATE@state
              Specifies the sigil used for argument substitution in YYSETSTATE definition. The default value  is
              @@.  Overrides the more generic re2c:api:sigil configuration.

       re2c:define:YYSETSTATE:naked
              Overrides  the more generic re2c:api:style configuration for YYSETSTATE. Zero value corresponds to
              free-form API style.

       re2c:define:YYSKIP
              Defines generic API primitive YYSKIP (see the API primitives section).

       re2c:define:YYSHIFT
              Defines generic API primitive YYSHIFT (see the API primitives section).

       re2c:define:YYSHIFTMTAG
              Defines generic API primitive YYSHIFTMTAG (see the API primitives section).

       re2c:define:YYSHIFTSTAG
              Defines generic API primitive YYSHIFTSTAG (see the API primitives section).

       re2c:define:YYSTAGN
              Defines generic API primitive YYSTAGN (see the API primitives section).

       re2c:define:YYSTAGP
              Defines generic API primitive YYSTAGP (see the API primitives section).

       re2c:empty-class, re2c:flags:empty-class
              Same as the --empty-class option, but can be configured on a per-block basis.

       re2c:encoding:ebcdic, re2c:flags:ecb, re2c:flags:e
              Same as the --ebcdic option, but can be configured on a per-block basis.

       re2c:encoding:ucs2, re2c:flags:wide-chars, re2c:flags:w
              Same as the --ucs2 option, but can be configured on a per-block basis.

       re2c:encoding:utf8, re2c:flags:utf-8, re2c:flags:8
              Same as the --utf8 option, but can be configured on a per-block basis.

       re2c:encoding:utf16, re2c:flags:utf-16, re2c:flags:x
              Same as the --utf16 option, but can be configured on a per-block basis.

       re2c:encoding:utf32, re2c:flags:unicode, re2c:flags:u
              Same as the --utf32 option, but can be configured on a per-block basis.

       re2c:encoding-policy, re2c:flags:encoding-policy
              Same as the --encoding-policy option, but can be configured on a per-block basis.

       re2c:eof
              Specifies the sentinel symbol used with the end-of-input rule $. The default value is -1  ($  rule
              is  not  used).  Other  possible  values  include  all  valid code units. Only decimal numbers are
              recognized.
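
              A small sketch of the end-of-input rule with a zero sentinel is shown below. It is re2c input
              that has to be processed by re2c before it compiles, and the function name, the rules and the
              assumed build command are illustrative. The input is expected to be a null-terminated  string
              with len characters before the terminator.

```cpp
// Assumed build step: re2c input.re -o output.cpp
// Counts lowercase words in a null-terminated string; returns -1 on error.
static int count_words(const char *str, unsigned int len) {
    const char *YYCURSOR = str, *YYLIMIT = str + len, *YYMARKER;
    int count = 0;

    for (;;) {
    /*!re2c
        re2c:define:YYCTYPE = char;
        re2c:yyfill:enable = 0;
        re2c:eof = 0;          // the terminating NUL is the sentinel

        *      { return -1; }
        $      { return count; }
        [a-z]+ { ++count; continue; }
        [ ]+   { continue; }
    */
    }
}
```

              The $ rule fires only when the sentinel is found at YYLIMIT, which is what distinguishes the end
              of input from a stray zero character in the middle of the string.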

       re2c:header, re2c:flags:type-header, re2c:flags:t
              Specifies the name of the generated header file relative to the directory of the output file. Same
              as the --header option except that the file path is relative.

       re2c:indent:string
              Specifies  the  string  used  for  indentation. The default is a single tab character "\t". Indent
              string should contain whitespace characters only.   To  disable  indentation  entirely,  set  this
              configuration to an empty string.

       re2c:indent:top
              Specifies the minimum amount of indentation to use. The default value is zero. The value should be
              a non-negative integer number.

       re2c:invert-captures
              Same as the --invert-captures option, but can be configured on a per-block basis.

       re2c:label:prefix, re2c:labelprefix
              Specifies the prefix used for DFA state labels. The default is yy.

       re2c:label:start, re2c:startlabel
              Controls the generation of a block start label. The default value is zero, which  means  that  the
              start  label  is  generated  only  if  it  is  used. An integer value greater than zero forces the
              generation of start label even if it is unused by the lexer. A  string  value  also  forces  start
              label  generation and sets the label name to the specified string. This configuration applies only
              to the current block (it is reset to default for the next block).

       re2c:label:yyFillLabel
              Specifies the prefix of YYFILL labels used with re2c:eof and in storable state mode.

       re2c:label:yyloop
              Specifies the name of the label marking the start of the lexer loop with --loop-switch option. The
              default is yyloop.

       re2c:label:yyNext
              Specifies  the  name  of  the optional label that follows YYGETSTATE switch in storable state mode
              (enabled with re2c:state:nextlabel). The default is yyNext.

       re2c:leftmost-captures
              Same as the --leftmost-captures option, but can be configured on a per-block basis.

       re2c:lookahead, re2c:flags:lookahead
              Deprecated (see the deprecated --no-lookahead option).

       re2c:nested-ifs, re2c:flags:nested-ifs, re2c:flags:s
              Same as the --nested-ifs option, but can be configured on a per-block basis.

       re2c:posix-captures, re2c:flags:posix-captures, re2c:flags:P
              Same as the --posix-captures option, but can be configured on a per-block basis.

       re2c:tags, re2c:flags:tags, re2c:flags:T
              Same as the --tags option, but can be configured on a per-block basis.

       re2c:tags:expression
              Specifies the expression used for tag variables.  By default re2c  generates  expressions  of  the
              form  yyt<N>.  This might be inconvenient, for example if tag variables are defined as fields in a
              struct. All occurrences of @@{tag} or @@ are replaced with  the  actual  tag  name.  For  example,
              re2c:tags:expression  = "s.@@"; results in expressions of the form s.yyt<N> in the generated code.
              See also re2c:api:sigil configuration.

       re2c:tags:prefix
              Specifies the prefix for tag variable names. The default is yyt.

       re2c:sentinel
              Specifies the sentinel symbol used for the end-of-input checks (when bounds  checks  are  disabled
              with  re2c:yyfill:enable  =  0;  and re2c:eof is not set). This configuration does not affect code
              generation: its purpose is to verify that the sentinel is not allowed in the middle of a rule, and
              ensure that the lexer won't read past the end of buffer. The default value is -1 (in that case
              re2c assumes that the sentinel is zero, which is the most common case). Only decimal  numbers  are
              recognized.

       re2c:state:abort
              If set to a positive integer value, changes the default case in the YYGETSTATE switch: the default
              case aborts the program, and an explicit -1 case contains the transition to the start of the block.

       re2c:state:nextlabel
              Controls if the YYGETSTATE switch is followed by a yyNext label (the default value is zero, which
              corresponds to no label).  Alternatively one can use re2c:label:start to generate a specific start
              label, or an explicit getstate:re2c directive to generate the YYGETSTATE  switch  separately  from
              the lexer block.

       re2c:unsafe, re2c:flags:unsafe
              Same  as  the  --no-unsafe  option,  but can be configured on per-block basis.  If set to zero, it
              suppresses the generation of unsafe wrappers around YYPEEK. The default is non-zero (wrappers  are
              generated).  This configuration is specific to Rust.

       re2c:variable:yyaccept
              Specifies the name of the yyaccept variable (see the API primitives section).

       re2c:variable:yybm
              Specifies the name of the yybm variable (used for bitmaps).

       re2c:variable:yybm:hex, re2c:yybm:hex
              If  set  to nonzero, bitmaps for the --bit-vectors option are generated in hexadecimal format. The
              default is zero (bitmaps are in decimal format).

       re2c:variable:yych
              Specifies the name of the yych variable (see the API primitives section).

       re2c:variable:yych:emit, re2c:yych:emit
              If set to zero, yych definition is not generated.  The default is non-zero.

       re2c:variable:yych:conversion, re2c:yych:conversion
              If set to non-zero, re2c automatically generates a conversion to YYCTYPE every time yych is  read.
              The default is zero (no conversion).

       re2c:variable:yyctable
              Specifies  the  name  of the yyctable variable (the jump table generated for YYGETCONDITION switch
              with --computed-gotos option).

       re2c:variable:yytarget
              Specifies the name of the yytarget variable.

       re2c:variable:yystable
              Deprecated.

       re2c:variable:yystate
              Specifies the name of the yystate variable (used  with  the  --loop-switch  option  to  store  the
              current DFA state).

       re2c:yyfill:check
              If  set  to zero, suppresses the generation of pre-YYFILL check for the number of input characters
              (the YYLESSTHAN definition in generic API and the YYLIMIT-based comparison in C pointer API).  The
              default is non-zero (generate the check).

       re2c:yyfill:enable
              If set to zero, suppresses the generation of YYFILL (together with the check). This should be used
              when the whole input fits into one piece of memory (there  is  no  need  for  buffering)  and  the
              end-of-input  checks do not rely on the YYFILL checks (e.g. if a sentinel character is used).  Use
              warnings (-W option) and re2c:sentinel configuration to verify that  the  generated  lexer  cannot
              read past the end of input.  The default is non-zero (YYFILL is enabled).

       re2c:yyfill:parameter
              If  set  to  zero,  suppresses the generation of parameter passed to YYFILL.  The parameter is the
              minimum number of characters that must be  supplied.   Defaults  to  non-zero  (the  parameter  is
              generated).  This configuration can be overridden with re2c:define:YYFILL:naked or re2c:api:style.
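
       To make the placeholder substitution described for re2c:tags:expression above more concrete, here is  a
       minimal  sketch  in plain C++. The helper expand_tag_expression is hypothetical and is not part of re2c;
       it only mimics how @@ and @@{tag} would be replaced by a tag name, assuming the default sigil @@.

```cpp
#include <string>

// Hypothetical helper (not part of re2c): expands the placeholders of a
// re2c:tags:expression template, assuming the default sigil "@@".
static std::string expand_tag_expression(const std::string &tmpl,
                                         const std::string &tag) {
    static const std::string sigil = "@@", named = "@@{tag}";
    std::string out;
    size_t pos = 0;
    while (pos < tmpl.size()) {
        if (tmpl.compare(pos, named.size(), named) == 0) {
            out += tag;                 // named placeholder @@{tag}
            pos += named.size();
        } else if (tmpl.compare(pos, sigil.size(), sigil) == 0) {
            out += tag;                 // bare placeholder @@
            pos += sigil.size();
        } else {
            out += tmpl[pos++];         // ordinary character, copied as is
        }
    }
    return out;
}
```

       With the template "s.@@" and the tag name yyt1 the sketch yields s.yyt1, which matches the form  of  the
       expressions described for re2c:tags:expression.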

REGULAR EXPRESSIONS

       re2c uses the following syntax for regular expressions:

       • "foo" case-sensitive string literal

       • 'foo' case-insensitive string literal

       • [a-xyz], [^a-xyz] character class (possibly negated)

       • . any character except newline

       • R \ S difference of character classes R and S

       • R* zero or more occurrences of R

       • R+ one or more occurrences of R

       • R? optional R

       • R{n} repetition of R exactly n times

       • R{n,} repetition of R at least n times

       • R{n,m} repetition of R from n to m times

       • (R)  just  R; parentheses are used to override precedence.  If submatch extraction is enabled, (R) is a
         capturing or a non-capturing group depending on --invert-captures option.

       • (!R) If submatch extraction is enabled, (!R) is a non-capturing  or  a  capturing  group  depending  on
         --invert-captures option.

       • R S concatenation: R followed by S

       • R | S alternative: R or S

       • R / S lookahead: R followed by S, but S is not consumed

       • name the regular expression defined as name (or literal string "name" in Flex compatibility mode)

       • {name} the regular expression defined as name in Flex compatibility mode

       • @stag an s-tag: saves the last input position at which @stag matches in a variable named stag

       • #mtag an m-tag: saves all input positions at which #mtag matches in a variable named mtag

       Character classes and string literals may contain the following escape sequences: \a, \b, \f, \n, \r, \t,
       \v, \\, octal escapes \ooo and hexadecimal escapes \xhh, \uhhhh and \Uhhhhhhhh.
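
       As an illustration of the syntax above, here is a small fragment of named definitions as they might appear
       inside a re2c block. All names (digit, consonant, ident, keyword, number) are made up for this sketch, and
       the fragment is not a complete lexer: it has no rules and no configurations.

```cpp
/*!re2c
    // Named definitions; a name must be defined before it is used.
    digit     = [0-9];
    consonant = [a-z] \ [aeiou];            // difference of character classes
    ident     = [a-zA-Z_] [a-zA-Z_0-9]*;    // concatenation and repetition
    hex4      = [0-9a-fA-F]{4};             // exactly four hexadecimal digits
    keyword   = 'select';                   // case-insensitive: select, SELECT, SeLeCt
    number    = "-"? digit+ ("." digit+)?;  // optional parts, grouping with (R)
*/
```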

HANDLING THE END OF INPUT

       One of the main problems for the lexer is to know when to stop.  There are a few terminating conditions:

       • the lexer may match some rule (including default rule *) and come to a final state

       • the lexer may fail to match any rule and come to a default state

       • the lexer may reach the end of input

       The first two conditions terminate the lexer in a "natural" way: it comes to a  state  with  no  outgoing
       transitions,  and  the  matching automatically stops. The third condition, end of input, is different: it
       may happen in any state, and the lexer should be able to  handle  it.  Checking  for  the  end  of  input
       interrupts the normal lexer workflow and adds conditional branches to the generated program, therefore it
       is necessary to minimize the number of such checks. re2c supports a few different  methods  for  handling
       the  end  of  input.  Which  one  to  use  depends on the complexity of regular expressions, the need for
       buffering, performance considerations and other factors. Here is a list of methods:

       • Sentinel.  This method eliminates the need for the end of input checks altogether.  It  is  simple  and
         efficient, but limited to the case when there is a natural "sentinel" character that can never occur in
         valid input. This character may still occur in invalid input, but it  should  not  be  allowed  by  the
         regular  expressions,  except  perhaps as the last character of a rule. The sentinel is appended at the
         end of input and serves as a stop signal: when the lexer reads this character, it is  either  a  syntax
         error  or  the  end  of  input.  In  both cases the lexer should stop. This method is used if YYFILL is
         disabled with re2c:yyfill:enable = 0; and re2c:eof has the default value -1.

       • Sentinel with bounds checks.  This method is generic:  it  allows  one  to  handle  any  input  without
         restrictions  on  the  regular  expressions. The idea is to reduce the number of end of input checks by
         performing them only on certain characters. Similar to the "sentinel" method, one of the characters  is
         chosen  as a "sentinel" and appended at the end of input. However, there is no restriction on where the
         sentinel may occur (in fact, any character can be chosen for a sentinel).  When the  lexer  reads  this
         character,  it  additionally  performs  a  bounds check.  If the current position is within bounds, the
         lexer resumes matching and handles the sentinel as a regular character.  Otherwise  it  invokes  YYFILL
         (unless  it  is  disabled).  If  more  input is supplied, the lexer will rematch the last character and
         continue as if the sentinel wasn't there. Otherwise it must be the real end of  input,  and  the  lexer
         stops.  This method is used when re2c:eof has non-negative value (it should be set to the numeric value
         of the sentinel). YYFILL is optional.

       • Bounds checks with padding.  This method is generic, and it may  be  faster  than  the  "sentinel  with
         bounds  checks"  method, but it is also more complex. The idea is to partition DFA states into strongly
         connected components (SCCs) and generate a single check per SCC for  enough  characters  to  cover  the
         longest  non-looping  path  in this SCC. This reduces the number of checks, but there is a problem with
         short lexemes at the end of input, as the check requires enough characters to cover the longest lexeme.
         This  can  be  fixed  by  padding  the input with a few fake characters that do not form a valid lexeme
         suffix (so that the lexer cannot match them). The length of padding should be YYMAXFILL, generated with
         /*!max:re2c*/.  If there is not enough input, the lexer invokes YYFILL which should supply at least the
         required number of characters or not return.  This method is used if YYFILL is enabled and re2c:eof  is
         -1 (this is the default configuration).

       • Custom  checks.   Generic  API  allows one to override basic operations like reading a character, which
         makes it possible to include the end-of-input checks as part of them.  This approach is error-prone and
         should be used with caution. To use a custom method, enable generic API with --api custom or re2c:api =
         custom; and disable default bounds checks with re2c:yyfill:enable = 0; or re2c:yyfill:check = 0;.

       The following subsections contain an example of each method.

   Sentinel
       This example uses a sentinel character to handle the end of input.  The  program  counts  space-separated
       words  in  a null-terminated string. The sentinel is null: it is the last character of each input string,
       and it is not allowed in the middle of a lexeme by any of the rules (in particular, it is not included in
       character  ranges  where  it  is  easy  to overlook). If a null occurs in the middle of a string, it is a
       syntax error and the lexer will match default rule *, but it won't read past the end of  input  or  crash
       (use  -Wsentinel-in-midrule  warning  and  re2c:sentinel  configuration  to  verify  this). Configuration
       re2c:yyfill:enable = 0; suppresses the generation of bounds checks and YYFILL invocations.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>

          // Expect a null-terminated string.
          static int lex(const char *YYCURSOR) {
              int count = 0;

              for (;;) {
              /*!re2c
                  re2c:define:YYCTYPE = char;
                  re2c:yyfill:enable = 0;

                  *      { return -1; }
                  [\x00] { return count; }
                  [a-z]+ { ++count; continue; }
                  [ ]+   { continue; }
              */
              }
          }

          int main() {
              assert(lex("") == 0);
              assert(lex("one two three") == 3);
              assert(lex("f0ur") == -1);
              return 0;
          }

   Sentinel with bounds checks
       This example uses sentinel with bounds checks to handle the end  of  input  (this  method  was  added  in
       version  1.2).  The program counts space-separated single-quoted strings. The sentinel character is null,
       which is specified with re2c:eof = 0; configuration.  As  in  the  sentinel  method,  null  is  the  last
       character  of  each  input string, but it is allowed in the middle of a rule (for example, 'aaa\0aa'\0 is
       valid input, but 'aaa\0 is a syntax error).  Bounds checks are generated in each state  that  matches  an
       input  character,  but  they  are  scoped  to the branch that handles null. Bounds checks are of the form
       YYLIMIT <= YYCURSOR or YYLESSTHAN(1) with generic API. If the check condition is true, the lexer has reached
       the  end of input and should stop (YYFILL is disabled with re2c:yyfill:enable = 0; as the input fits into
       one buffer, see the YYFILL with sentinel section for an example that uses YYFILL). Reaching  the  end  of
       input opens three possibilities: if the lexer is in the initial state it will match the end-of-input rule
       $, otherwise it may fall back to a previously matched rule (including default rule *) or go to a  default
       state, causing -Wundefined-control-flow.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>

          // Expect a null-terminated string.
          static int lex(const char *str, unsigned int len) {
              const char *YYCURSOR = str, *YYLIMIT = str + len, *YYMARKER;
              int count = 0;

              for (;;) {
              /*!re2c
                  re2c:define:YYCTYPE = char;
                  re2c:yyfill:enable = 0;
                  re2c:eof = 0;

                  str = ['] ([^'\\] | [\\][^])* ['];

                  *    { return -1; }
                  $    { return count; }
                  str  { ++count; continue; }
                  [ ]+ { continue; }
              */
              }
          }

          #define TEST(s, r) assert(lex(s, sizeof(s) - 1) == r)
          int main() {
              TEST("", 0);
              TEST("'qu\0tes' 'are' 'fine: \\'' ", 3);
              TEST("'unterminated\\'", -1);
              return 0;
          }
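
       To show where the bounds check sits, here is a hand-written approximation of the same lexer in  plain
       C++.  It  is a sketch of the idea and not the code that re2c generates; the function name count_quoted is
       hypothetical. The check lim <= cur is evaluated only on the branch where the character just read is null.

```cpp
#include <assert.h>
#include <stddef.h>

// Hand-written sketch of "sentinel with bounds checks" (not re2c output).
// Precondition: str[len] is the null sentinel.
static int count_quoted(const char *str, size_t len) {
    assert(str[len] == '\0');
    const char *cur = str, *lim = str + len;
    int count = 0;

    for (;;) {
        char c = *cur;                        // the read itself is unchecked
        if (c == '\0' && lim <= cur) {        // bounds check only on null
            return count;                     // real end of input (the $ rule)
        } else if (c == ' ') {
            ++cur;                            // skip a space
        } else if (c == '\'') {
            ++cur;                            // opening quote
            for (;;) {
                c = *cur;
                if (c == '\0' && lim <= cur) return -1;  // unterminated string
                ++cur;                        // null within bounds is a regular character
                if (c == '\'') break;         // closing quote
                if (c == '\\') {              // escape: skip the next character
                    if (*cur == '\0' && lim <= cur) return -1;
                    ++cur;
                }
            }
            ++count;
        } else {
            return -1;                        // anything else (the * rule)
        }
    }
}
```

       Characters other than null never pay for a bounds check, which is what makes this method cheaper than
       checking on every read.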

   Bounds checks with padding
       This  example  uses  bounds  checks  with  padding  to handle the end of input (this method is enabled by
       default). The program counts space-separated single-quoted strings. There is a padding of YYMAXFILL  null
       characters appended at the end of input, where YYMAXFILL value is autogenerated with /*!max:re2c*/. It is
       not necessary to use null for padding --- any characters can be used as long as they do not form a  valid
       lexeme  suffix  (in  this example padding should not contain single quotes, as they may be mistaken for a
       suffix of a single-quoted string). There is a "stop" rule that matches the first padding character (null)
       and  terminates  the lexer (note that it checks if null is at the beginning of padding, otherwise it is a
       syntax error). Bounds checks are generated only in some  states  that  are  determined  by  the  strongly
       connected  components  of  the  underlying  automaton.  Checks  have the form (YYLIMIT - YYCURSOR) < n or
       YYLESSTHAN(n) with generic API, where n is the minimum number of characters that are needed for the lexer
       to  proceed  (it  also means that the next bounds check will occur in at most n characters). If the check
       condition is true, the lexer has reached the end of input and will invoke YYFILL(n)  that  should  either
       supply  at least n input characters or not return. In this example YYFILL always fails and terminates the
       lexer with an error (which is fine because the input fits into one buffer). See the YYFILL  with  padding
       section for an example that refills the input buffer with YYFILL.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdlib.h>
          #include <string.h>

          /*!max:re2c*/

          static int lex(const char *str, unsigned int len) {
              // Make a copy of the string with YYMAXFILL zeroes at the end.
              char *buf = (char*) malloc(len + YYMAXFILL);
              memcpy(buf, str, len);
              memset(buf + len, 0, YYMAXFILL);

              const char *YYCURSOR = buf, *YYLIMIT = buf + len + YYMAXFILL;
              int count = 0;

          loop:
              /*!re2c
                  re2c:api:style = free-form;
                  re2c:define:YYCTYPE = char;
                  re2c:define:YYFILL  = "goto fail;";

                  str = ['] ([^'\\] | [\\][^])* ['];

                  [\x00] {
                      // Check that it is the sentinel, not some unexpected null.
                      if (YYCURSOR - 1 == buf + len) goto exit; else goto fail;
                  }
                  str  { ++count; goto loop; }
                  [ ]+ { goto loop; }
                  *    { goto fail; }
              */

          fail:
              count = -1;

          exit:
              free(buf);
              return count;
          }

          #define TEST(s, r) assert(lex(s, sizeof(s) - 1) == r)
          int main() {
              TEST("", 0);
              TEST("'qu\0tes' 'are' 'fine: \\'' ", 3);
              TEST("'unterminated\\'", -1);
              TEST("'unexpected \0 null\\'", -1);
              return 0;
          }
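
       The reason for padding can be seen from the bounds check alone. The sketch below is plain C++  with  two
       hypothetical  helpers:  pad_input  appends  maxfill null characters (maxfill stands in for the generated
       YYMAXFILL value) and less_than models the check (YYLIMIT - YYCURSOR) < n.

```cpp
#include <stddef.h>
#include <string>

// Hypothetical helper: a copy of the input followed by `maxfill` nulls.
static std::string pad_input(const char *str, size_t len, size_t maxfill) {
    std::string buf(str, len);
    buf.append(maxfill, '\0');
    return buf;
}

// Models the bounds check: true if fewer than n characters remain.
static bool less_than(const char *cur, const char *lim, size_t n) {
    return static_cast<size_t>(lim - cur) < n;
}
```

       For a three-character input and a check for n = 4 characters, less_than is true  at  the  very  start  of
       the  unpadded  input,  although  a  complete  lexeme may be present. With four characters of padding the
       check becomes true only once the cursor has moved into the padding.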

   Custom checks
       This  example  uses  a  custom  end-of-input  handling  method  based on generic API.  The program counts
       space-separated single-quoted strings. It is the same as the sentinel with bounds checks example,  except
       that  the  input  is not null-terminated (this method can be used if padding is not an option, not even a
       single character). To compensate for the absence of a sentinel character at the end of input,  YYPEEK  is
       redefined to perform a bounds check before it reads the next input character. This is inefficient because
       the check is performed on every read. If the current position is within bounds, YYPEEK returns the  real
       character, otherwise it returns a fake sentinel character.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdlib.h>
          #include <string.h>

          static int lex(const char *str, unsigned int len) {
              // For the sake of example create a string without terminating null.
              char *buf = (char*) malloc(len);
              memcpy(buf, str, len);

              const char *cur = buf, *lim = buf + len, *mar;
              int count = 0;

              for (;;) {
              /*!re2c
                  re2c:yyfill:enable = 0;
                  re2c:eof = 0;
                  re2c:api = custom;
                  re2c:api:style = free-form;
                  re2c:define:YYCTYPE = char;
                  re2c:define:YYLESSTHAN = "cur >= lim";
                  re2c:define:YYPEEK = "cur < lim ? *cur : 0";  // fake null
                  re2c:define:YYSKIP = "++cur;";
                  re2c:define:YYBACKUP = "mar = cur;";
                  re2c:define:YYRESTORE = "cur = mar;";

                  str = ['] ([^'\\] | [\\][^])* ['];

                  *    { count = -1; break; }
                  $    { break; }
                  str  { ++count; continue; }
                  [ ]+ { continue; }
              */
              }

              free(buf);
              return count;
          }

          #define TEST(s, r) assert(lex(s, sizeof(s) - 1) == r)
          int main() {
              TEST("", 0);
              TEST("'qu\0tes' 'are' 'fine: \\'' ", 3);
              TEST("'unterminated\\'", -1);
              return 0;
          }
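
       The checked read can also be tried in isolation, without re2c. The following plain C++ sketch  uses  two
       hypothetical  functions:  checked_peek  plays  the  role  of  the  redefined  YYPEEK, and count_words is a
       hand-written scanner for lowercase words in a buffer that has no terminating null.

```cpp
#include <stddef.h>

// Returns the character at cur, or a fake null sentinel at or past lim.
static char checked_peek(const char *cur, const char *lim) {
    return cur < lim ? *cur : '\0';
}

// Counts space-separated lowercase words in a buffer that is NOT
// null-terminated. Returns -1 on any other character.
static int count_words(const char *buf, size_t len) {
    const char *cur = buf, *lim = buf + len;
    int count = 0;
    for (;;) {
        char c = checked_peek(cur, lim);
        if (c == '\0') {
            return cur >= lim ? count : -1;   // end of input, or a stray null
        } else if (c == ' ') {
            ++cur;
        } else if (c >= 'a' && c <= 'z') {
            do {
                ++cur;
                c = checked_peek(cur, lim);   // every read pays for a check
            } while (c >= 'a' && c <= 'z');
            ++count;
        } else {
            return -1;
        }
    }
}
```

       The buffer is never read at or beyond lim, so no padding or sentinel byte has to  be  allocated,  at  the
       cost of one comparison per character.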

BUFFER REFILLING

       The  need  for  buffering  arises when the input cannot be mapped in memory all at once: either it is too
       large, or it comes in a streaming fashion (like reading from a socket). The usual technique in such cases
       is to allocate a fixed-sized memory buffer and process input in chunks that fit into the buffer. When the
       current chunk is processed, it is moved out and new data is moved in. In practice  it  is  somewhat  more
       complex,  because  lexer  state  consists  not  of  a  single  input  position, but a set of interrelated
       positions:

       • cursor: the next input character to be read (YYCURSOR in C pointer API or YYSKIP/YYPEEK in generic API)

       • limit: the position after the last available input character (YYLIMIT  in  C  pointer  API,  implicitly
         handled by YYLESSTHAN in generic API)

       • marker: the position of the most recent match, if any (YYMARKER in C pointer API or YYBACKUP/YYRESTORE in
         generic API)

       • token: the start of the current lexeme (implicit in re2c API, as it is not needed for the normal  lexer
         operation and can be defined and updated by the user)

       • context   marker:   the   position   of   the  trailing  context  (YYCTXMARKER  in  C  pointer  API  or
         YYBACKUPCTX/YYRESTORECTX in generic API)

       • tag variables: submatch positions (defined with  /*!stags:re2c*/  and  /*!mtags:re2c*/  directives  and
         YYSTAGP/YYSTAGN/YYMTAGP/YYMTAGN in generic API)

       Not  all  these are used in every case, but if used, they must be updated by YYFILL. All active positions
       are contained in the segment between token and cursor, therefore  everything  between  buffer  start  and
       token  can  be  discarded,  the  segment  from  token and up to limit should be moved to the beginning of
       buffer, and the free space at the end of buffer should be filled  with  new  data.   In  order  to  avoid
       frequent  YYFILL  calls  it  is  best  to fill in as many input characters as possible (even though fewer
       characters might suffice to resume  the  lexer).  The  details  of  YYFILL  implementation  are  slightly
       different  depending  on which EOF handling method is used: the case of EOF rule is somewhat simpler than
       the case of bounds-checking with padding. Also note that if the -f (--storable-state) option is used, YYFILL
       has slightly different semantics (described in the section about storable state).
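
       The shift-and-refill step described above can be sketched in plain C++ with an in-memory  source  instead
       of  a  file, which makes it easy to step through. Everything here is an assumption made for illustration:
       the Buffer struct, the tiny capacity CAP, and the functions init and refill. The sketch follows the  EOF
       rule convention (a null sentinel is kept at the limit position).

```cpp
#include <stddef.h>
#include <string.h>

static const size_t CAP = 8;   // tiny capacity so the shift is easy to follow

struct Buffer {
    const char *src;           // hypothetical in-memory input source
    size_t src_len, src_pos;
    char buf[CAP + 1];         // +1 for the sentinel at the limit position
    char *tok, *mar, *cur, *lim;
    bool eof;
};

static void init(Buffer &b, const char *src, size_t len) {
    b.src = src; b.src_len = len; b.src_pos = 0;
    b.tok = b.mar = b.cur = b.lim = b.buf + CAP;  // empty: all at buffer end
    *b.lim = 0;
    b.eof = false;
}

// Returns 0 on success, 1 at end of input, 2 if the lexeme fills the buffer.
static int refill(Buffer &b) {
    if (b.eof) return 1;
    const size_t shift = b.tok - b.buf;   // bytes before token: discarded
    const size_t used = b.lim - b.tok;    // bytes from token to limit: kept
    if (shift < 1) return 2;

    // Move the live segment to the buffer start and adjust all positions.
    memmove(b.buf, b.tok, used);
    b.tok -= shift; b.mar -= shift; b.cur -= shift; b.lim -= shift;

    // Fill the free space with as much new data as fits.
    size_t n = b.src_len - b.src_pos;
    if (n > CAP - used) n = CAP - used;
    memcpy(b.lim, b.src + b.src_pos, n);
    b.src_pos += n;
    b.lim += n;
    *b.lim = 0;                           // keep the sentinel at the limit
    b.eof = (b.src_pos == b.src_len);     // source exhausted
    return 0;
}
```

       Unlike a file-based YYFILL, this sketch marks the end of input as soon as the source is  exhausted  rather
       than on a short read; the pointer arithmetic is the part that carries over.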

   YYFILL with sentinel
       If  EOF  rule  is used, YYFILL is a function-like primitive that accepts no arguments and returns a value
       which is checked against zero. YYFILL invocation is triggered by  condition  YYLIMIT  <=  YYCURSOR  in  C
       pointer  API  and  YYLESSTHAN()  in  generic API. A non-zero return value means that YYFILL has failed. A
       successful YYFILL call must supply at least one character and adjust input positions  accordingly.  Limit
       must  always  be  set  to  one  after  the  last input position in buffer, and the character at the limit
       position must be the sentinel symbol specified by re2c:eof configuration. The  pictures  below  show  the
       relative  locations  of input positions in buffer before and after YYFILL call (sentinel symbol is marked
       with #, and the second picture shows the case when there is not enough input to fill the whole buffer).

                         <-- shift -->
                       >-A------------B---------C-------------D#-----------E->
                       buffer       token    marker         limit,
                                                            cursor
          >-A------------B---------C-------------D------------E#->
                       buffer,  marker        cursor        limit
                       token

                         <-- shift -->
                       >-A------------B---------C-------------D#--E (EOF)
                       buffer       token    marker         limit,
                                                            cursor
          >-A------------B---------C-------------D---E#........
                       buffer,  marker       cursor limit
                       token

       Here is an example of a program that reads the input file (named input in the code) through a  4096-byte
       buffer, one byte of which is reserved for the sentinel, and uses the EOF rule.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdio.h>
          #include <string.h>

          #define BUFSIZE 4095

          struct Input {
              FILE *file;
              char buf[BUFSIZE + 1], *lim, *cur, *mar, *tok; // +1 for sentinel
              bool eof;
          };

          static int fill(Input &in) {
              if (in.eof) return 1;

              const size_t shift = in.tok - in.buf;
              const size_t used = in.lim - in.tok;

              // Error: lexeme too long. In real life could reallocate a larger buffer.
              if (shift < 1) return 2;

              // Shift buffer contents (discard everything up to the current token).
              memmove(in.buf, in.tok, used);
              in.lim -= shift;
              in.cur -= shift;
              in.mar -= shift;
              in.tok -= shift;

              // Fill free space at the end of buffer with new data from file.
              in.lim += fread(in.lim, 1, BUFSIZE - used, in.file);
              in.lim[0] = 0;
              in.eof = in.lim < in.buf + BUFSIZE;
              return 0;
          }

          static int lex(Input &in) {
              int count = 0;
              for (;;) {
                  in.tok = in.cur;
              /*!re2c
                  re2c:api:style = free-form;
                  re2c:define:YYCTYPE  = char;
                  re2c:define:YYCURSOR = in.cur;
                  re2c:define:YYMARKER = in.mar;
                  re2c:define:YYLIMIT  = in.lim;
                  re2c:define:YYFILL   = "fill(in) == 0";
                  re2c:eof = 0;

                  str = ['] ([^'\\] | [\\][^])* ['];

                  *    { return -1; }
                  $    { return count; }
                  str  { ++count; continue; }
                  [ ]+ { continue; }
              */
              }
          }

          int main() {
              const char *fname = "input";
              const char content[] = "'qu\0tes' 'are' 'fine: \\'' ";

              // Prepare input file: a few times the size of the buffer, containing
              // strings with zeroes and escaped quotes.
              FILE *f = fopen(fname, "w");
              for (int i = 0; i < BUFSIZE; ++i) {
                  fwrite(content, 1, sizeof(content) - 1, f);
              }
              fclose(f);
              int count = 3 * BUFSIZE; // number of quoted strings written to file

              // Initialize lexer state: all pointers are at the end of buffer.
              Input in;
              in.file = fopen(fname, "r");
              in.cur = in.mar = in.tok = in.lim = in.buf + BUFSIZE;
              in.eof = 0;
              // Sentinel (at YYLIMIT pointer) is set to zero, which triggers YYFILL.
              in.lim[0] = 0;

              // Run the lexer.
              assert(lex(in) == count);

              // Cleanup: remove input file.
              fclose(in.file);
              remove(fname);
              return 0;
          }

   YYFILL with padding
       In the default case (when EOF rule is not used) YYFILL is a function-like primitive that accepts a single
       argument and does not return any value.  YYFILL invocation is triggered by condition (YYLIMIT - YYCURSOR)
       <  n  in  C  pointer  API  and YYLESSTHAN(n) in generic API. The argument passed to YYFILL is the minimal
       number of characters that must be supplied. If it fails to do so, YYFILL must not  return  to  the  lexer
       (for  that  reason  it is best implemented as a macro that returns from the calling function on failure).
       In case of a successful YYFILL invocation the limit position must be set either to  one  after  the  last
       input  position  in  buffer,  or to the end of YYMAXFILL padding (in case YYFILL has successfully read at
       least n characters, but not enough to fill the entire buffer).  The  pictures  below  show  the  relative
       locations  of  input  positions  in  buffer  before and after YYFILL invocation (YYMAXFILL padding on the
       second picture is marked with # symbols).

                         <-- shift -->                 <-- need -->
                       >-A------------B---------C-----D-------E---F--------G->
                       buffer       token    marker cursor  limit

          >-A------------B---------C-----D-------E---F--------G->
                       buffer,  marker cursor               limit
                       token

                         <-- shift -->                 <-- need -->
                       >-A------------B---------C-----D-------E-F        (EOF)
                       buffer       token    marker cursor  limit

          >-A------------B---------C-----D-------E-F###############
                       buffer,  marker cursor                   limit
                       token                        <- YYMAXFILL ->

       Here is an example of a program that reads the input file (named input in the code) through a  4096-byte
       buffer, YYMAXFILL bytes of which are reserved for padding, and uses bounds-checking with padding.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stdio.h>
          #include <string.h>

          /*!max:re2c*/
          #define BUFSIZE (4096 - YYMAXFILL)

          struct Input {
              FILE *file;
              char buf[BUFSIZE + YYMAXFILL], *lim, *cur, *tok;
              bool eof;
          };

          static int fill(Input &in, size_t need) {
              if (in.eof) return 1;

              const size_t shift = in.tok - in.buf;
              const size_t used = in.lim - in.tok;

              // Error: lexeme too long. In real life could reallocate a larger buffer.
              if (shift < need) return 2;

              // Shift buffer contents (discard everything up to the current token).
              memmove(in.buf, in.tok, used);
              in.lim -= shift;
              in.cur -= shift;
              in.tok -= shift;

              // Fill free space at the end of buffer with new data from file.
              in.lim += fread(in.lim, 1, BUFSIZE - used, in.file);

              // If read less than expected, this is end of input => add zero padding
              // so that the lexer can access characters at the end of buffer.
              if (in.lim < in.buf + BUFSIZE) {
                  in.eof = true;
                  memset(in.lim, 0, YYMAXFILL);
                  in.lim += YYMAXFILL;
              }

              return 0;
          }

          static int lex(Input &in) {
              int count = 0;
              for (;;) {
                  in.tok = in.cur;
              /*!re2c
                  re2c:api:style = free-form;
                  re2c:define:YYCTYPE  = char;
                  re2c:define:YYCURSOR = in.cur;
                  re2c:define:YYLIMIT  = in.lim;
                  re2c:define:YYFILL   = "if (fill(in, @@) != 0) return -1;";

                  str = ['] ([^'\\] | [\\][^])* ['];

                  [\x00] {
                      // Check that it is the sentinel, not some unexpected null.
                      return in.tok == in.lim - YYMAXFILL ? count : -1;
                  }
                  str  { ++count; continue; }
                  [ ]+ { continue; }
                  *    { return -1; }
              */
              }
          }

          int main() {
              const char *fname = "input";
              const char content[] = "'qu\0tes' 'are' 'fine: \\'' ";

              // Prepare input file: a few times the size of the buffer, containing
              // strings with zeroes and escaped quotes.
              FILE *f = fopen(fname, "w");
              for (int i = 0; i < BUFSIZE; ++i) {
                  fwrite(content, 1, sizeof(content) - 1, f);
              }
              fclose(f);
              int count = 3 * BUFSIZE; // number of quoted strings written to file

              // Initialize lexer state: all pointers are at the end of buffer.
              // This immediately triggers YYFILL, as the check `in.cur < in.lim` fails.
              Input in;
              in.file = fopen(fname, "r");
              in.cur = in.tok = in.lim = in.buf + BUFSIZE;
              in.eof = 0;

              // Run the lexer.
              assert(lex(in) == count);

              // Cleanup: remove input file.
              fclose(in.file);
              remove(fname);
              return 0;
          }

MULTIPLE BLOCKS

       Sometimes  it  is  necessary  to have multiple interrelated lexers (for example, if there is a high-level
       state machine that transitions between lexer modes). This can be  implemented  using  multiple  connected
       re2c blocks. Another option is to use start conditions.

       The implementation of connections between blocks depends on the target language. In languages that have
       a goto statement (such as C/C++ and Go) one can put all blocks in one function, each of them prefixed
       with a label. A transition from one block to another is then a simple goto. In languages that do not
       have goto (such as Rust) it is necessary to use a loop with a switch on a state variable, similar to the
       yystate loop/switch generated by re2c, or else to wrap each block in a function and use function calls.
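
       The loop/switch alternative can be sketched by hand as follows. This is only an illustration written
       in C++ for consistency with the other examples; it is not code generated by re2c. The Mode
       enumeration, the function name parse_u32_loop, the constant PARSE_ERROR and the restriction to binary
       and decimal input are assumptions made to keep the sketch short.

```cpp
#include <assert.h>
#include <stdint.h>

static const uint64_t PARSE_ERROR = UINT64_MAX;

// Hypothetical lexer modes: each case below stands in for one re2c block.
enum Mode { MODE_INIT, MODE_BIN, MODE_DEC };

static uint64_t parse_u32_loop(const char *s) {
    Mode mode = MODE_INIT; // state variable that replaces the goto labels
    uint64_t u = 0;
    for (;;) {
        switch (mode) {
        case MODE_INIT: // determine the base and select the next mode
            if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B')
                    && (s[2] == '0' || s[2] == '1')) {
                s += 2; // skip the prefix
                mode = MODE_BIN;
            } else if (s[0] >= '1' && s[0] <= '9') {
                mode = MODE_DEC;
            } else {
                return PARSE_ERROR;
            }
            break;
        case MODE_BIN: // one binary digit per iteration
            if (*s == 0) return u;
            if (*s != '0' && *s != '1') return PARSE_ERROR;
            u = u * 2 + (uint64_t)(*s++ - '0');
            if (u > UINT32_MAX) return PARSE_ERROR;
            break; // stay in the same mode (the analogue of `goto bin`)
        case MODE_DEC: // one decimal digit per iteration
            if (*s == 0) return u;
            if (*s < '0' || *s > '9') return PARSE_ERROR;
            u = u * 10 + (uint64_t)(*s++ - '0');
            if (u > UINT32_MAX) return PARSE_ERROR;
            break; // stay in the same mode (the analogue of `goto dec`)
        }
    }
}
```

       Each case plays the role of one labelled block, and assigning to mode plays the role of goto. With
       these definitions parse_u32_loop("0b1101") yields 13 and parse_u32_loop("") yields PARSE_ERROR.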

       The example below uses multiple blocks to parse binary, octal, decimal and hexadecimal numbers. Each base
       has its own block. The initial block determines the base and dispatches to the other blocks. Common
       configurations and the named definition end are set in the initial block; they are inherited by the
       blocks that follow.

          // re2c $INPUT -o $OUTPUT -i
          #include <stdint.h>
          #include <limits.h>
          #include <assert.h>

          static const uint64_t ERROR = UINT64_MAX;

          template<int BASE> static void add(uint64_t &u, char d) {
              u = u * BASE + d;
              if (u > UINT32_MAX) u = ERROR;
          }

          static uint64_t parse_u32(const char *s) {
              const char *YYCURSOR = s, *YYMARKER;
              uint64_t u = 0;

              /*!re2c
                  re2c:yyfill:enable = 0;
                  re2c:define:YYCTYPE = char;

                  end = "\x00";

                  '0b' / [01]        { goto bin; }
                  "0"                { goto oct; }
                  "" / [1-9]         { goto dec; }
                  '0x' / [0-9a-fA-F] { goto hex; }
                  *                  { return ERROR; }
              */
          bin:
              /*!re2c
                  end   { return u; }
                  [01]  { add<2>(u, YYCURSOR[-1] - '0'); goto bin; }
                  *     { return ERROR; }
              */
          oct:
              /*!re2c
                  end   { return u; }
                  [0-7] { add<8>(u, YYCURSOR[-1] - '0'); goto oct; }
                  *     { return ERROR; }
              */
          dec:
              /*!re2c
                  end   { return u; }
                  [0-9] { add<10>(u, YYCURSOR[-1] - '0'); goto dec; }
                  *     { return ERROR; }
              */
          hex:
              /*!re2c
                  end   { return u; }
                  [0-9] { add<16>(u, YYCURSOR[-1] - '0');      goto hex; }
                  [a-f] { add<16>(u, YYCURSOR[-1] - 'a' + 10); goto hex; }
                  [A-F] { add<16>(u, YYCURSOR[-1] - 'A' + 10); goto hex; }
                  *     { return ERROR; }
              */
          }

          int main() {
              assert(parse_u32("") == ERROR);
              assert(parse_u32("1234567890") == 1234567890);
              assert(parse_u32("0b1101") == 13);
              assert(parse_u32("0x7Fe") == 2046);
              assert(parse_u32("0644") == 420);
              assert(parse_u32("9999999999") == ERROR);
              return 0;
          }

START CONDITIONS

       Start conditions are enabled with --start-conditions option.  They  provide  a  way  to  encode  multiple
       interrelated automata within the same re2c block.

       Each condition corresponds to a single automaton and has a unique name specified by the user and a unique
       internal number defined by re2c. The numbers are used to switch between conditions:  the  generated  code
       uses  YYGETCONDITION  and  YYSETCONDITION  primitives to get the current condition or set it to the given
       number. Use  /*!conditions:re2c*/  directive  or  the  --header  option  to  generate  numeric  condition
       identifiers. Configuration re2c:cond:enumprefix specifies the generated identifier prefix.

       In  condition  mode  every  rule must be prefixed with a list of comma-separated condition names in angle
       brackets, or a wildcard <*> to denote all conditions. The rule syntax is extended as follows:

          < cond-list > regexp action
                 A rule that is merged to every condition on the cond-list.  It matches regexp and executes  the
                 associated action.

          < cond-list > regexp => cond action
                 A rule that is merged to every condition on the cond-list.  It matches regexp, sets the current
                 condition to cond and executes the associated action.

          < cond-list > regexp :=> cond
                 A rule that is merged to every condition on the cond-list.  It matches regexp  and  immediately
                 transitions to cond (there is no semantic action).

          <! cond-list > action
                 The  action is prepended to semantic actions of all rules for every condition on the cond-list.
                 This may be used to deduplicate common code.

          < > action
                 A rule that is merged to a special entry condition with number zero and name  "0".  It  matches
                 empty string and executes the action.

          < > => cond action
                 A  rule  that  is merged to a special entry condition with number zero and name "0". It matches
                 empty string, sets the current condition to cond and executes the action.

          < > :=> cond
                 A rule that is merged to a special entry condition with number zero and name  "0".  It  matches
                 empty string and immediately transitions to cond.

       The  code  re2c  generates for conditions depends on whether re2c uses goto/label approach or loop/switch
       approach to encode the automata.

       In languages that have goto statement (such as C/C++ and Go)  conditions  are  naturally  implemented  as
       blocks  of code prefixed with labels of the form yyc_<cond>, where cond is a condition name (label prefix
       can be changed with re2c:cond:prefix). Transitions between conditions  are  implemented  using  goto  and
       condition labels. Before all conditions re2c generates an initial switch on YYGETCONDITION that jumps to the
       start state of the current condition.  The shortcut rules :=> bypass the initial switch and jump directly
       to  the  specified  condition (re2c:cond:goto can be used to change the default behavior). The rules with
       semantic actions do not automatically jump to the next condition; this should be done by the user-defined
       action code.

       In  languages  that  do  not have goto (such as Rust) re2c reuses the yystate variable to store condition
       numbers. Each condition gets a numeric identifier equal to the number of its start state,  and  a  switch
       between  conditions  is  no different than a switch between DFA states of a single condition. There is no
       need for a separate initial condition switch.  (Since the same approach is  used  to  implement  storable
       states, YYGETCONDITION/YYSETCONDITION are redundant if both storable states and conditions are used).

       The program below uses start conditions to parse binary, octal, decimal and hexadecimal numbers. There is
       a single block where each base has its own condition, and the initial condition is connected to all of
       them. The user-defined variable c stores the current condition number; it is initialized to yycinit, the
       number of the initial condition generated with /*!conditions:re2c*/.

          // re2c $INPUT -o $OUTPUT -ci
          #include <stdint.h>
          #include <limits.h>
          #include <assert.h>

          static const uint64_t ERROR = UINT64_MAX;
          /*!conditions:re2c*/

          template<int BASE> static void add(uint64_t &u, char d) {
              u = u * BASE + d;
              if (u > UINT32_MAX) u = ERROR;
          }

          static uint64_t parse_u32(const char *s) {
              const char *YYCURSOR = s, *YYMARKER;
              int c = yycinit;
              uint64_t u = 0;

              /*!re2c
                  re2c:api:style = free-form;
                  re2c:define:YYCTYPE        = char;
                  re2c:define:YYGETCONDITION = "c";
                  re2c:define:YYSETCONDITION = "c = @@;";
                  re2c:yyfill:enable = 0;

                  <*> * { return ERROR; }

                  <init> '0b' / [01]        :=> bin
                  <init> "0"                :=> oct
                  <init> "" / [1-9]         :=> dec
                  <init> '0x' / [0-9a-fA-F] :=> hex

                  <bin, oct, dec, hex> "\x00" { return u; }

                  <bin> [01]  { add<2>(u,  YYCURSOR[-1] - '0');      goto yyc_bin; }
                  <oct> [0-7] { add<8>(u,  YYCURSOR[-1] - '0');      goto yyc_oct; }
                  <dec> [0-9] { add<10>(u, YYCURSOR[-1] - '0');      goto yyc_dec; }
                  <hex> [0-9] { add<16>(u, YYCURSOR[-1] - '0');      goto yyc_hex; }
                  <hex> [a-f] { add<16>(u, YYCURSOR[-1] - 'a' + 10); goto yyc_hex; }
                  <hex> [A-F] { add<16>(u, YYCURSOR[-1] - 'A' + 10); goto yyc_hex; }
              */
          }

          int main() {
              assert(parse_u32("") == ERROR);
              assert(parse_u32("1234567890") == 1234567890);
              assert(parse_u32("0b1101") == 13);
              assert(parse_u32("0x7Fe") == 2046);
              assert(parse_u32("0644") == 420);
              assert(parse_u32("9999999999") == ERROR);
              return 0;
          }

STORABLE STATE

       With --storable-state option re2c generates a lexer that can store  its  current  state,  return  to  the
       caller, and later resume operations exactly where it left off. The default mode of operation in re2c is a
       "pull" model, in which the lexer "pulls" more input whenever it needs it. This  may  be  unacceptable  in
       cases  when  the  input  becomes  available  piece  by piece (for example, if the lexer is invoked by the
       parser, or if the lexer program communicates via a socket protocol with some other program that must wait
       for  a  reply  from  the  lexer before it transmits the next message). Storable state feature is intended
       exactly for such cases: it allows one to generate lexers that work in a  "push"  model.  When  the  lexer
       needs  more  input,  it  stores  its  state  and  returns  to  the caller. Later, when more input becomes
       available, the caller resumes the lexer exactly where it stopped.  There  are  a  few  changes  necessary
       compared to the "pull" model:

       • Define YYGETSTATE() and YYSETSTATE(state) primitives.

       • Define  yych,  yyaccept  (if  used)  and state variables as a part of persistent lexer state. The state
         variable should be initialized to -1.

       • YYFILL should return to the outer program instead of trying to supply more input.  Return  code  should
         indicate that lexer needs more input.

       • The outer program should recognize situations when lexer needs more input and respond appropriately.

       • Optionally use the /*!getstate:re2c*/ directive to generate the YYGETSTATE switch detached from the main
         lexer. This only works for languages that have goto (not in --loop-switch mode).

       • Use re2c:eof and the sentinel with bounds checks method to  handle  the  end  of  input.  Padding-based
         method  may  not work because it is unclear when to append padding: the current end of input may not be
         the ultimate end of input, and appending padding too early may cut off a partially read greedy  lexeme.
         Furthermore,  due to high-level program logic getting more input may depend on processing the lexeme at
         the end of buffer (which already is blocked due to the end-of-input condition).
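
       The control flow of the "push" model can be illustrated with a small hand-written sketch. It does not
       use re2c: a lexer generated with --storable-state saves and restores a DFA state number through
       YYSETSTATE and YYGETSTATE, whereas this sketch keeps a two-valued state by hand. The names PushLexer,
       FeedStatus and feed are hypothetical; the packet format [a-z]+[;] is borrowed from the example in this
       section.

```cpp
#include <assert.h>
#include <stddef.h>

enum FeedStatus { NEED_MORE, BAD_INPUT };

// Persistent lexer state: it survives between calls to feed().
struct PushLexer {
    int state;        // -1: between packets, 0: inside a packet
    unsigned packets; // number of complete packets seen so far
};

// Consume one chunk of input and return to the caller when it is exhausted.
// Packets have the form [a-z]+[;] and may be split across chunks.
static FeedStatus feed(PushLexer &lx, const char *chunk, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        const char c = chunk[i];
        if (c >= 'a' && c <= 'z') {
            lx.state = 0;     // inside a packet
        } else if (c == ';' && lx.state == 0) {
            ++lx.packets;     // packet complete
            lx.state = -1;
        } else {
            return BAD_INPUT; // unexpected symbol or empty packet
        }
    }
    // Out of input: the state stays in `lx`, and the caller resumes lexing
    // by calling feed() again with the next chunk.
    return NEED_MORE;
}
```

       The caller owns the loop: it calls feed with each new chunk and decides what to do when the lexer
       reports that it needs more input.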

       Here is an example of a "push" model lexer that simulates reading packets from a socket. The lexer  loops
       until  it  encounters the end of input and returns to the calling function. The calling function provides
       more input by "sending" the next packet and resumes lexing. This process stops when all the packets  have
       been sent, or when there is an error.

          // re2c $INPUT -o $OUTPUT -f
          #include <assert.h>
          #include <stdio.h>
          #include <string.h>

          #define DEBUG 0
          #define LOG(...) if (DEBUG) fprintf(stderr, __VA_ARGS__);

          // Use a small buffer to cover the case when a lexeme doesn't fit.
          // In real world use a larger buffer.
          #define BUFSIZE 10

          struct State {
              FILE *file;
              char buf[BUFSIZE + 1], *lim, *cur, *mar, *tok;
              int state;
          };

          typedef enum {END, READY, WAITING, BAD_PACKET, BIG_PACKET} Status;

          static Status fill(State &st) {
              const size_t shift = st.tok - st.buf;
              const size_t used = st.lim - st.tok;
              const size_t free = BUFSIZE - used;

              // Error: no space. In real life can reallocate a larger buffer.
              if (free < 1) return BIG_PACKET;

              // Shift buffer contents (discard already processed data).
              memmove(st.buf, st.tok, used);
              st.lim -= shift;
              st.cur -= shift;
              st.mar -= shift;
              st.tok -= shift;

              // Fill free space at the end of buffer with new data.
              const size_t read = fread(st.lim, 1, free, st.file);
              st.lim += read;
              st.lim[0] = 0; // append sentinel symbol

              return READY;
          }

          static Status lex(State &st, unsigned int *recv) {
              char yych;
              /*!getstate:re2c*/

              for (;;) {
                  st.tok = st.cur;
              /*!re2c
                  re2c:api:style = free-form;
                  re2c:define:YYCTYPE    = "char";
                  re2c:define:YYCURSOR   = "st.cur";
                  re2c:define:YYMARKER   = "st.mar";
                  re2c:define:YYLIMIT    = "st.lim";
                  re2c:define:YYGETSTATE = "st.state";
                  re2c:define:YYSETSTATE = "st.state = @@;";
                  re2c:define:YYFILL     = "return WAITING;";
                  re2c:eof = 0;

                  packet = [a-z]+[;];

                  *      { return BAD_PACKET; }
                  $      { return END; }
                  packet { *recv = *recv + 1; continue; }
              */
              }
          }

          void test(const char **packets, Status expect) {
              // Create a "socket" (open the same file for reading and writing).
              const char *fname = "pipe";
              FILE *fw = fopen(fname, "w");
              FILE *fr = fopen(fname, "r");
              setvbuf(fw, NULL, _IONBF, 0);
              setvbuf(fr, NULL, _IONBF, 0);

              // Initialize lexer state: `state` value is -1, all pointers are at the end
              // of buffer.
              State st;
              st.file = fr;
              st.cur = st.mar = st.tok = st.lim = st.buf + BUFSIZE;
              // Sentinel (at YYLIMIT pointer) is set to zero, which triggers YYFILL.
              st.lim[0] = 0;
              st.state = -1;

              // Main loop. The buffer contains incomplete data which appears packet by
              // packet. When the lexer needs more input it saves its internal state and
              // returns to the caller which should provide more input and resume lexing.
              Status status;
              unsigned int send = 0, recv = 0;
              for (;;) {
                  status = lex(st, &recv);
                  if (status == END) {
                      LOG("done: got %u packets\n", recv);
                      break;
                  } else if (status == WAITING) {
                      LOG("waiting...\n");
                      if (*packets) {
                          LOG("sent packet %u\n", send);
                          fprintf(fw, "%s", *packets++);
                          ++send;
                      }
                      status = fill(st);
                      LOG("queue: '%s'\n", st.buf);
                      if (status == BIG_PACKET) {
                          LOG("error: packet too big\n");
                          break;
                      }
                      assert(status == READY);
                  } else {
                      assert(status == BAD_PACKET);
                      LOG("error: ill-formed packet\n");
                      break;
                  }
              }

              // Check results.
              assert(status == expect);
              if (status == END) assert(recv == send);

              // Cleanup: remove input file.
              fclose(fw);
              fclose(fr);
              remove(fname);
          }

          int main() {
              const char *packets1[] = {0};
              const char *packets2[] = {"zero;", "one;", "two;", "three;", "four;", 0};
              const char *packets3[] = {"zer0;", 0};
              const char *packets4[] = {"looooooooooong;", 0};

              test(packets1, END);
              test(packets2, END);
              test(packets3, BAD_PACKET);
              test(packets4, BIG_PACKET);

              return 0;
          }

REUSABLE BLOCKS

       Reusable  blocks  are  re2c  blocks  that  can be reused any number of times and combined with other re2c
       blocks. They are defined with /*!rules:re2c[:<name>] ... */ (the <name> is optional). A rules  block  can
       be  used  in two contexts: either in a use block, or in a use directive inside of another block. The code
       for a rules block is generated at every point of use.

       Use blocks are defined with /*!use:re2c[:<name>] ... */. The <name> is optional; if  not  specified,  the
       associated  rules  block  is  the  most  recent one (whether named or unnamed). A use block can add named
       definitions, configurations and rules of its own.  An important use case for use blocks is a  lexer  that
       supports  multiple  input encodings: the same rules block is reused multiple times with encoding-specific
       configurations (see the example below).

       The in-block use directive !use:<name>; can be used from inside of a re2c block. It merges the referenced
       block <name> into the current one. If some of the merged rules and configurations overlap with the
       previously defined ones, conflicts are resolved in the usual way: the earliest rule takes priority, and
       the latest configuration overrides preceding ones. The exceptions are the special rules *, $ and (in
       condition mode) <!>, for which a block-local definition overrides any inherited ones. The use directive
       allows one to combine different re2c blocks together in one block (see the example below).

       Named  blocks  and  in-block  use  directive were added in re2c version 2.2.  Since that version reusable
       blocks are allowed by default (no special option is needed). Before version 2.2 reuse  mode  was  enabled
       with -r --reusable option. Before version 1.2 reusable blocks could not be mixed with normal blocks.

   Example of a !use directive
          // re2c $INPUT -o $OUTPUT
          #include <assert.h>

          // This example shows how to combine reusable re2c blocks: two blocks
          // ('colors' and 'fish') are merged into one. The 'salmon' rule occurs
          // in both blocks; the 'fish' block takes priority because it is used
          // earlier. Default rule * occurs in all three blocks; the local (not
          // inherited) definition takes priority.

          enum What { COLOR, FISH, DUNNO };

          /*!rules:re2c:colors
              *                            { assert(false); }
              "red" | "salmon" | "magenta" { return COLOR; }
          */

          /*!rules:re2c:fish
              *                            { assert(false); }
              "haddock" | "salmon" | "eel" { return FISH; }
          */

          static What lex(const char *s) {
              const char *YYCURSOR = s, *YYMARKER;
              /*!re2c
                  re2c:yyfill:enable = 0;
                  re2c:define:YYCTYPE = char;

                  !use:fish;
                  !use:colors;
                  * { return DUNNO; }  // overrides inherited '*' rules
              */
          }

          int main() {
              assert(lex("salmon") == FISH);
              assert(lex("what?") == DUNNO);
              return 0;
          }

   Example of a /*!use:re2c ... */ block
          // re2c $INPUT -o $OUTPUT --input-encoding utf8
          #include <assert.h>
          #include <stdint.h>

          // This example supports multiple input encodings: UTF-8 and UTF-32.
          // Both lexers are generated from the same rules block, and the use
          // blocks add only encoding-specific configurations.
          /*!rules:re2c
              re2c:yyfill:enable = 0;

              "∀x ∃y" { return 0; }
              *       { return 1; }
          */

          static int lex_utf8(const uint8_t *s) {
              const uint8_t *YYCURSOR = s, *YYMARKER;
              /*!use:re2c
                  re2c:define:YYCTYPE = uint8_t;
                  re2c:encoding:utf8 = 1;
              */
          }

          static int lex_utf32(const uint32_t *s) {
              const uint32_t *YYCURSOR = s, *YYMARKER;
              /*!use:re2c
                  re2c:define:YYCTYPE = uint32_t;
                  re2c:encoding:utf32 = 1;
              */
          }

          int main() {
              static const uint8_t s8[] = // UTF-8
                  { 0xe2, 0x88, 0x80, 0x78, 0x20, 0xe2, 0x88, 0x83, 0x79 };

              static const uint32_t s32[] = // UTF-32
                  { 0x00002200, 0x00000078, 0x00000020, 0x00002203, 0x00000079 };

              assert(lex_utf8(s8) == 0);
              assert(lex_utf32(s32) == 0);
              return 0;
          }

SUBMATCH EXTRACTION

       re2c has two options for submatch extraction.

       The  first option is -T --tags. With this option one can use standalone tags of the form @stag and #mtag,
       where stag and mtag are arbitrary user-defined names. Tags can be used anywhere inside of a regular
       expression;  semantically  they are just position markers. Tags of the form @stag are called s-tags: they
       denote a single submatch value (the last input position where this tag matched). Tags of the  form  #mtag
       are  called  m-tags: they denote multiple submatch values (the whole history of repetitions of this tag).
       All tags should be defined by the user as variables with the corresponding names.  With  standalone  tags
       re2c  uses  leftmost  greedy  disambiguation: submatch positions correspond to the leftmost matching path
       through the regular expression.

       The second option is -P --posix-captures: it enables  POSIX-compliant  capturing  groups.  In  this  mode
       parentheses  in  regular  expressions  denote  the  beginning  and the end of capturing groups; the whole
       regular expression is group number zero. The number of groups for  the  matching  rule  is  stored  in  a
       variable  yynmatch,  and submatch results are stored in yypmatch array. Both yynmatch and yypmatch should
       be defined by the user, and yypmatch size must be at least [yynmatch *  2].  re2c  provides  a  directive
       /*!maxnmatch:re2c*/  that  defines  YYMAXNMATCH: a constant  equal to the maximal value of yynmatch among
       all rules. Note that re2c implements POSIX-compliant disambiguation: each subexpression matches  as  long
       as  possible,  and  subexpressions  that  start  earlier  in  regular expression have priority over those
       starting later. Capturing groups are translated into s-tags under the hood, therefore  we  use  the  word
       "tag" to describe them as well.

       With both -P --posix-captures and -T --tags options re2c uses the efficient submatch extraction algorithm
       described in the Tagged Deterministic Finite Automata with Lookahead paper. The overhead on submatch
       extraction in the generated lexer grows with the number of tags; if this number is moderate, the
       overhead is barely noticeable. In the lexer  tags  are  implemented  using  a  number  of  tag  variables
       generated  by  re2c.  There  is  no  one-to-one  correspondence  between tag variables and tags: a single
       variable may be reused for different tags, and one tag may require multiple variables  to  hold  all  its
       ambiguous  values. Eventually ambiguity is resolved, and only one final variable per tag survives. When a
       rule matches, all its tags are set to the values of the corresponding tag variables.  The exact number of
       tag variables is unknown to the user; this number is determined by re2c. However, tag variables should be
       defined by the user as a part of  the  lexer  state  and  updated  by  YYFILL,  therefore  re2c  provides
       directives /*!stags:re2c*/ and /*!mtags:re2c*/ that can be used to declare, initialize and manipulate tag
       variables. These directives have two optional configurations: format  =  "@@";  (specifies  the  template
       where  @@ is substituted with the name of each tag variable), and separator = ""; (specifies the piece of
       code used to join the generated pieces for different tag variables).
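
       For POSIX capturing groups, reading the yypmatch array described above can be sketched as follows.
       The sketch assumes that the bounds of group k are stored at indices 2k and 2k+1, and that a group
       which did not take part in the match has NULL bounds (pointer API). The helper name group and its
       parameters pmatch and nmatch are hypothetical stand-ins for the user-defined yypmatch and yynmatch.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string>

// Return the text of capturing group `k`, or an empty string if the group
// is out of range or did not take part in the match.
// Assumption: pmatch[2 * k] is the start and pmatch[2 * k + 1] is the end
// of group `k`; `nmatch` is the number of groups of the matched rule.
static std::string group(const char *const *pmatch, size_t nmatch, size_t k) {
    if (k >= nmatch) return std::string();
    const char *b = pmatch[2 * k];
    const char *e = pmatch[2 * k + 1];
    if (b == NULL || e == NULL) return std::string();
    return std::string(b, e);
}
```

       Group zero is the whole match, so for a rule with three pairs of parentheses the array needs at least
       eight elements.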

       S-tags support the following operations:

       • save input position to an s-tag: t = YYCURSOR with C pointer API or a user-defined operation YYSTAGP(t)
         with generic API

       • save default value to an s-tag: t = NULL with C pointer API or a user-defined operation YYSTAGN(t) with
         generic API

       • copy one s-tag to another: t1 = t2

       M-tags support the following operations:

       • append input position to an m-tag: a user-defined operation YYMTAGP(t) with both  default  and  generic
         API

       • append default value to an m-tag: a user-defined operation YYMTAGN(t) with both default and generic API

       • copy one m-tag to another: t1 = t2

       S-tags  can  be  implemented  as  scalar  values  (pointers  or  offsets).  M-tags  need  a  more complex
       representation, as they need to  store  a  sequence  of  tag  values.  The  most  naive  and  inefficient
       representation of an m-tag is a list (array, vector) of tag values; a more efficient representation is to
       store all m-tags in a prefix-tree represented as array of nodes (v, p), where v is tag value and p  is  a
       pointer to parent node.
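
       A minimal sketch of the prefix-tree representation is given below. It is one possible layout, not the
       only one: nodes live in a std::vector, the "pointer" to the parent is an index into that vector, and
       tag values are stored as offsets. The names MtagNode, MtagTrie, MTAG_ROOT, mtag_append and mtag_unfold
       are hypothetical.

```cpp
#include <algorithm>
#include <assert.h>
#include <stddef.h>
#include <vector>

static const int MTAG_ROOT = -1; // empty history (no parent node)

// Trie node (v, p): a tag value and the index of the parent node.
struct MtagNode { size_t value; int parent; };
typedef std::vector<MtagNode> MtagTrie;

// Append `value` to the history identified by `mtag`. The m-tag itself is
// just the index of the last node of its history.
static void mtag_append(MtagTrie &trie, int &mtag, size_t value) {
    MtagNode node = {value, mtag};
    mtag = (int)trie.size();
    trie.push_back(node);
}

// Unfold the history of an m-tag into a list of values, oldest first.
static std::vector<size_t> mtag_unfold(const MtagTrie &trie, int mtag) {
    std::vector<size_t> history;
    for (; mtag != MTAG_ROOT; mtag = trie[mtag].parent) {
        history.push_back(trie[mtag].value);
    }
    std::reverse(history.begin(), history.end());
    return history;
}
```

       With this layout copying one m-tag to another (t1 = t2) is a plain integer assignment, and two m-tags
       that diverge after the copy share the nodes of their common prefix.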

       Here  is  a  simple  example  of  using  s-tags  to  parse  semantic versions consisting of three numeric
       components: major, minor, patch (the latter is optional).  See below for a more complex example that uses
       YYFILL.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stddef.h>

          struct SemVer { int major, minor, patch; };

          static int s2n(const char *s, const char *e) { // pre-parsed string to number
              int n = 0;
              for (; s < e; ++s) n = n * 10 + (*s - '0');
              return n;
          }

          static bool lex(const char *str, SemVer &ver) {
              const char *YYCURSOR = str, *YYMARKER;

              // User-defined tag variables that are available in semantic action.
              const char *t1, *t2, *t3, *t4, *t5;

              // Autogenerated tag variables used by the lexer to track tag values.
              /*!stags:re2c format = 'const char *@@;\n'; */

              /*!re2c
                  re2c:yyfill:enable = 0;
                  re2c:define:YYCTYPE = char;
                  re2c:tags = 1;

                  num = [0-9]+;

                  @t1 num @t2 "." @t3 num @t4 ("." @t5 num)? [\x00] {
                      ver.major = s2n(t1, t2);
                      ver.minor = s2n(t3, t4);
                      ver.patch = t5 != NULL ? s2n(t5, YYCURSOR - 1) : 0;
                      return true;
                  }
                  * { return false; }
              */
          }

          int main() {
              SemVer v;
              assert(lex("23.34", v) && v.major == 23 && v.minor == 34 && v.patch == 0);
              assert(lex("1.2.999", v) && v.major == 1 && v.minor == 2 && v.patch == 999);
              assert(!lex("1.a", v));
              return 0;
          }

       Here is a more complex example of using s-tags with YYFILL to parse a file with newline-separated
       semantic versions. Tag variables are part of the lexer state, and they are adjusted in YYFILL like other
       input positions. This adjustment is necessary for s-tags that store pointers into the buffer, because
       such pointers are invalidated when the buffer contents are shifted. It may not be necessary in a custom
       implementation where tag variables store offsets relative to the start of the input string rather than
       the buffer, which may be the case with m-tags.
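
       The difference between pointer tags and offset tags can be seen in the following hand-written sketch,
       which is independent of re2c. The Window structure and the helpers shift_window and tag_ptr are
       hypothetical; base is assumed to hold the offset of the first buffer byte from the start of the input.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct Window {
    char buf[8];
    size_t len;  // number of valid bytes in buf
    size_t base; // offset of buf[0] from the start of the input
};

// Discard the first `shift` bytes of the window (as YYFILL does when it
// makes room for new data).
static void shift_window(Window &w, size_t shift) {
    memmove(w.buf, w.buf + shift, w.len - shift);
    w.len -= shift;
    w.base += shift;
}

// Translate a tag stored as an offset from the start of the input into a
// position in the buffer. The tag itself never needs to be adjusted.
static const char *tag_ptr(const Window &w, size_t off) {
    return w.buf + (off - w.base);
}
```

       After a shift a tag stored as a pointer must be decreased by the shift distance, while a tag stored as
       an offset stays valid as long as the position it refers to is still in the buffer.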

          // re2c $INPUT -o $OUTPUT --tags
          #include <assert.h>
          #include <stddef.h>
          #include <stdio.h>
          #include <string.h>
          #include <vector>

          #define BUFSIZE 4095

          struct Input {
              FILE *file;
              char buf[BUFSIZE + 1], *lim, *cur, *mar, *tok;
              // Tag variables must be part of the lexer state passed to YYFILL.
              // They don't correspond one-to-one to tags and should be autogenerated by re2c.
              /*!stags:re2c format = 'const char *@@;'; */
              bool eof;
          };

          struct SemVer { int major, minor, patch; };

          static bool operator==(const SemVer &x, const SemVer &y) {
              return x.major == y.major && x.minor == y.minor && x.patch == y.patch;
          }

          static int s2n(const char *s, const char *e) { // pre-parsed string to number
              int n = 0;
              for (; s < e; ++s) n = n * 10 + (*s - '0');
              return n;
          }

          static int fill(Input &in) {
              if (in.eof) return 1;

              const size_t shift = in.tok - in.buf;
              const size_t used = in.lim - in.tok;

              // Error: lexeme too long. In real life could reallocate a larger buffer.
              if (shift < 1) return 2;

              // Shift buffer contents (discard everything up to the current token).
              memmove(in.buf, in.tok, used);
              in.lim -= shift;
              in.cur -= shift;
              in.mar -= shift;
              in.tok -= shift;
              // Tag variables need to be shifted like other input positions. The check
              // for non-NULL is only needed if some tags are nested inside of alternative
              // or repetition, so that they can have NULL value.
              /*!stags:re2c format = "if (in.@@) in.@@ -= shift;\n"; */

              // Fill free space at the end of buffer with new data from file.
              in.lim += fread(in.lim, 1, BUFSIZE - used, in.file);
              in.lim[0] = 0;
              in.eof = in.lim < in.buf + BUFSIZE;
              return 0;
          }

          static bool lex(Input &in, std::vector<SemVer> &vers) {
              // User-defined local variables that store final tag values.
              // They are different from tag variables autogenerated with `stags:re2c`,
              // as they are set at the end of match and used only in semantic actions.
              const char *t1, *t2, *t3, *t4;
              for (;;) {
                  in.tok = in.cur;
              /*!re2c
                  re2c:eof = 0;
                  re2c:api:style = free-form;
                  re2c:define:YYCTYPE  = char;
                  re2c:define:YYCURSOR = in.cur;
                  re2c:define:YYMARKER = in.mar;
                  re2c:define:YYLIMIT  = in.lim;
                  re2c:define:YYFILL   = "fill(in) == 0";
                  re2c:tags:expression = "in.@@";

                  num = [0-9]+;

                  num @t1 "." @t2 num @t3 ("." @t4 num)? [\n] {
                      int major = s2n(in.tok, t1);
                      int minor = s2n(t2, t3);
                      int patch = t4 != NULL ? s2n(t4, in.cur - 1) : 0;
                      SemVer ver = {major, minor, patch};
                      vers.push_back(ver);
                      continue;
                  }
                  $ { return true; }
                  * { return false; }
              */}
          }

          int main() {
              const char *fname = "input";
              const SemVer semver = {1, 22, 333};
              std::vector<SemVer> expect(BUFSIZE, semver), actual;

              // Prepare input file (make sure it exceeds buffer size).
              FILE *f = fopen(fname, "w");
              for (int i = 0; i < BUFSIZE; ++i) fprintf(f, "1.22.333\n");
              fclose(f);

              // Reopen input file for reading.
              f = fopen(fname, "r");

              // Initialize lexer state: all pointers are at the end of buffer.
              Input in;
              in.file = f;
              in.cur = in.mar = in.tok = in.lim = in.buf + BUFSIZE;
              /*!stags:re2c format = "in.@@ = in.lim;\n"; */
              in.eof = false;
              // Sentinel (at YYLIMIT pointer) is set to zero, which triggers YYFILL.
              *in.lim = 0;

              // Run the lexer and check results.
              assert(lex(in, actual) && expect == actual);

              // Cleanup: remove input file.
              fclose(f);
              remove(fname);
              return 0;
          }
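
       The pointer arithmetic in fill above can be tried without re2c. The following is a minimal sketch, not
       generated code: MiniInput, init_input and shift_buffer are hypothetical names, the buffer is tiny, and
       the two tag variables yyt1 and yyt2 are assumed (the real set of tag variables depends on the grammar).
       It shows that a tag holding a position moves together with the other pointers, while a NULL tag is left
       alone.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const size_t MINI_BUFSIZE = 16; // assumed size, for illustration only

// Hypothetical miniature of the lexer state (YYMARKER omitted for brevity).
struct MiniInput {
    char buf[MINI_BUFSIZE + 1];
    char *lim, *cur, *tok;
    char *yyt1, *yyt2; // assumed tag variable names
};

// Put `text` into the buffer, start the current token at `tok_off`,
// set yyt1 at `tag_off` and leave yyt2 unset (NULL).
static void init_input(MiniInput &in, const char *text, size_t tok_off, size_t tag_off) {
    const size_t len = strlen(text);
    assert(len <= MINI_BUFSIZE && tok_off <= tag_off && tag_off <= len);
    memcpy(in.buf, text, len + 1);
    in.lim = in.buf + len;
    in.cur = in.lim;
    in.tok = in.buf + tok_off;
    in.yyt1 = in.buf + tag_off;
    in.yyt2 = NULL;
}

// Discard everything before the current token and move every position left
// by the same distance. Returns false if there is nothing to discard.
static bool shift_buffer(MiniInput &in) {
    const size_t shift = in.tok - in.buf;
    const size_t used = in.lim - in.tok;
    if (shift < 1) return false;

    memmove(in.buf, in.tok, used);
    in.lim -= shift;
    in.cur -= shift;
    in.tok -= shift;
    // Roughly what a `stags:re2c` directive with the format
    // "if (in.@@) in.@@ -= shift;\n" would expand to: one line per tag.
    if (in.yyt1) in.yyt1 -= shift;
    if (in.yyt2) in.yyt2 -= shift;
    in.lim[0] = 0; // keep a sentinel at the new limit
    return true;
}
```

       With the text "xyz1.22", a token starting at offset 3 and yyt1 at offset 4, a call to shift_buffer
       leaves "1.22" at the start of the buffer, the token at offset 0, yyt1 at offset 1 and yyt2 still NULL.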

       Here is an example of using POSIX capturing groups to parse semantic versions.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stddef.h>

          // Maximum number of capturing groups among all rules.
          /*!maxnmatch:re2c*/

          struct SemVer { int major, minor, patch; };

          static int s2n(const char *s, const char *e) { // pre-parsed string to number
              int n = 0;
              for (; s < e; ++s) n = n * 10 + (*s - '0');
              return n;
          }

          static bool lex(const char *str, SemVer &ver) {
              const char *YYCURSOR = str, *YYMARKER;

              // Allocate memory for capturing parentheses (twice the number of groups).
              const char *yypmatch[YYMAXNMATCH * 2];
              size_t yynmatch;

              // Autogenerated tag variables used by the lexer to track tag values.
              /*!stags:re2c format = 'const char *@@;\n'; */

              /*!re2c
                  re2c:yyfill:enable = 0;
                  re2c:define:YYCTYPE = char;
                  re2c:posix-captures = 1;

                  num = [0-9]+;

                  (num) "." (num) ("." num)? [\x00] {
                      // `yynmatch` is the number of capturing groups
                      assert(yynmatch == 4);
                      // Even `yypmatch` values are for opening parentheses, odd values
                      // are for closing parentheses, the first group is the whole match.
                      ver.major = s2n(yypmatch[2], yypmatch[3]);
                      ver.minor = s2n(yypmatch[4], yypmatch[5]);
                      ver.patch = yypmatch[6] ? s2n(yypmatch[6] + 1, yypmatch[7]) : 0;
                      return true;
                  }
                  * { return false; }
              */
          }

          int main() {
              SemVer v;
              assert(lex("23.34", v) && v.major == 23 && v.minor == 34 && v.patch == 0);
              assert(lex("1.2.999", v) && v.major == 1 && v.minor == 2 && v.patch == 999);
              assert(!lex("1.a", v));
              return 0;
          }
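
       The layout of yypmatch described in the comments above (two pointers per group, group 0 being the whole
       match) can be wrapped in a small accessor. This is only a sketch: the helper name group is hypothetical
       and it is not part of re2c. A group that did not take part in the match is assumed to have NULL
       pointers, as the check on yypmatch[6] in the example suggests.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string>

// Return the text of capturing group `i`, or an empty string if the group
// did not participate in the match (its pointers are NULL).
// `yypmatch` holds 2 * yynmatch pointers: opening position at index 2 * i,
// closing position at index 2 * i + 1.
static std::string group(const char *const *yypmatch, size_t yynmatch, size_t i) {
    assert(i < yynmatch);
    const char *open = yypmatch[2 * i];
    const char *close = yypmatch[2 * i + 1];
    if (open == NULL || close == NULL) return std::string();
    return std::string(open, close);
}
```

       For the input "1.2.999" (ignoring the terminating NUL that the rule above also consumes) and an array
       filled in by hand with the positions of the three groups, group 1 is "1", group 2 is "2" and group 3 is
       ".999", which is why the example skips one character before converting the patch number.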

       Here is an example of using m-tags to parse a version with a variable number of components. Tag variables
       are stored in a trie.

          // re2c $INPUT -o $OUTPUT
          #include <assert.h>
          #include <stddef.h>
          #include <vector>

          static const int MTAG_ROOT = -1;

          // An m-tag tree is a way to store histories with an O(1) copy operation.
          // Histories naturally form a tree, as they have common start and fork at some
          // point. The tree is stored as an array of pairs (tag value, link to parent).
          // An m-tag is represented with a single link in the tree (array index).
          struct Mtag {
              const char *elem; // tag value
              int pred; // index of the predecessor node or root
          };
          typedef std::vector<Mtag> MtagTrie;

          typedef std::vector<int> Ver; // unbounded number of version components

          static int s2n(const char *s, const char *e) { // pre-parsed string to number
              int n = 0;
              for (; s < e; ++s) n = n * 10 + (*s - '0');
              return n;
          }

          // Append a single value to an m-tag history.
          static void add_mtag(MtagTrie &trie, int &mtag, const char *value) {
              Mtag m = {value, mtag};
              mtag = (int)trie.size();
              trie.push_back(m);
          }

          // Recursively unwind tag histories and collect version components.
          static void unfold(const MtagTrie &trie, int x, int y, Ver &ver) {
              // Reached the root of the m-tag tree, stop recursion.
              if (x == MTAG_ROOT && y == MTAG_ROOT) return;

              // Tag histories must have equal length (check this before indexing the trie).
              assert(x != MTAG_ROOT && y != MTAG_ROOT);

              // Unwind history further.
              unfold(trie, trie[x].pred, trie[y].pred, ver);

              // Get tag values.
              const char *ex = trie[x].elem, *ey = trie[y].elem;

              if (ex != NULL && ey != NULL) {
                  // Both tags are valid pointers, extract component.
                  ver.push_back(s2n(ex, ey));
              } else {
                  // Both tags are NULL (this corresponds to zero repetitions).
                  assert(ex == NULL && ey == NULL);
              }
          }

          static bool parse(const char *str, Ver &ver) {
              const char *YYCURSOR = str, *YYMARKER;
              MtagTrie mt;

              // User-defined tag variables that are available in semantic action.
              const char *t1, *t2;
              int t3, t4;

              // Autogenerated tag variables used by the lexer to track tag values.
              /*!stags:re2c format = 'const char *@@ = NULL;'; */
              /*!mtags:re2c format = 'int @@ = MTAG_ROOT;'; */

              /*!re2c
                  re2c:api:style = free-form;
                  re2c:define:YYCTYPE = char;
                  re2c:define:YYSTAGP = "@@ = YYCURSOR;";
                  re2c:define:YYSTAGN = "@@ = NULL;";
                  re2c:define:YYMTAGP = "add_mtag(mt, @@, YYCURSOR);";
                  re2c:define:YYMTAGN = "add_mtag(mt, @@, NULL);";
                  re2c:yyfill:enable = 0;
                  re2c:tags = 1;

                  num = [0-9]+;

                  @t1 num @t2 ("." #t3 num #t4)* [\x00] {
                      ver.clear();
                      ver.push_back(s2n(t1, t2));
                      unfold(mt, t3, t4, ver);
                      return true;
                  }
                  * { return false; }
              */
          }

          int main() {
              Ver v;
              assert(parse("1", v) && v == Ver({1}));
              assert(parse("1.2.3.4.5.6.7", v) && v == Ver({1, 2, 3, 4, 5, 6, 7}));
              assert(!parse("1.2.", v));
              return 0;
          }
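
       The reason for storing histories in a trie is that copying a history is just copying an index, and two
       histories that fork share their common prefix. The sketch below reuses Mtag, MtagTrie, MTAG_ROOT and
       add_mtag from the example above and drives them by hand, without a lexer; history and fork_demo are
       hypothetical helpers added for illustration.

```cpp
#include <assert.h>
#include <stddef.h>
#include <vector>

static const int MTAG_ROOT = -1; // the empty history

struct Mtag {
    const char *elem; // tag value
    int pred;         // index of the predecessor node or root
};
typedef std::vector<Mtag> MtagTrie;

// Append a single value to an m-tag history (same as in the example above).
static void add_mtag(MtagTrie &trie, int &mtag, const char *value) {
    Mtag m = {value, mtag};
    mtag = (int)trie.size();
    trie.push_back(m);
}

// Collect a history in chronological order by following predecessor links
// from the last node back to the root and then reversing the result.
static std::vector<const char *> history(const MtagTrie &trie, int mtag) {
    std::vector<const char *> h;
    for (; mtag != MTAG_ROOT; mtag = trie[mtag].pred) h.push_back(trie[mtag].elem);
    return std::vector<const char *>(h.rbegin(), h.rend());
}

// Fork one history into two and check that the shared prefix is stored once.
bool fork_demo() {
    const char *s = "1.2.3";
    MtagTrie trie;

    int a = MTAG_ROOT;
    add_mtag(trie, a, s + 2); // history a: [s+2]
    int b = a;                // O(1) copy: b shares the node for s+2
    add_mtag(trie, a, s + 4); // history a: [s+2, s+4]
    add_mtag(trie, b, NULL);  // history b: [s+2, NULL]

    const std::vector<const char *> ha = history(trie, a), hb = history(trie, b);
    return trie.size() == 3 // three nodes in total, not four
        && ha.size() == 2 && ha[0] == s + 2 && ha[1] == s + 4
        && hb.size() == 2 && hb[0] == s + 2 && hb[1] == NULL;
}
```

       Neither history is copied element by element when b is created; the two only diverge at the nodes
       appended after the fork.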

ENCODING SUPPORT

       It is necessary to understand the difference between code points and code units. A code point is a
       numeric identifier of a symbol. A code unit is the smallest unit of storage in the encoded text. A single
       code point may be represented with one or more code units. In a fixed-length encoding all code points are
       represented with the same number of code units. In a variable-length encoding code points may be
       represented with a different number of code units. Note that the "any" rule [^] matches any code point,
       but not necessarily any code unit (the only way to match any code unit regardless of the encoding is the
       default rule *).

       The generated lexer works with a stream of code units: yych stores a code unit, and YYCTYPE is the code
       unit type. Regular expressions, on the other hand, are specified in terms of code points. When re2c
       compiles regular expressions to automata it translates code points to code units. This is generally not
       a simple mapping: in variable-length encodings a single code point range may get translated to a complex
       code unit graph.

       The following encodings are supported:

       • ASCII  (enabled  by  default).  It  is  a fixed-length encoding with code space [0-255] and 1-byte code
         points and code units.

       • EBCDIC (enabled with --ebcdic or re2c:encoding:ebcdic). It is a fixed-length encoding with  code  space
         [0-255] and 1-byte code points and code units.

       • UCS2  (enabled  with  --ucs2  or  re2c:encoding:ucs2).  It  is  a fixed-length encoding with code space
         [0-0xFFFF] and 2-byte code points and code units.

       • UTF8 (enabled with --utf8 or re2c:encoding:utf8). It is a variable-length Unicode encoding. Code unit
         size is 1 byte. Code points are represented with 1 to 4 code units.

       • UTF16 (enabled with --utf16 or re2c:encoding:utf16). It is a variable-length Unicode encoding. Code
         unit size is 2 bytes. Code points are represented with 1 to 2 code units.

       • UTF32 (enabled with --utf32 or re2c:encoding:utf32). It is a fixed-length Unicode  encoding  with  code
         space [0-0x10FFFF] and 4-byte code points and code units.
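
       To make the distinction between code points and code units concrete, here is a small sketch that encodes
       a single code point as UTF8 and UTF16 code units. It is standalone C++ and is not how re2c performs the
       translation internally; the function names to_utf8 and to_utf16 are hypothetical, and the input is
       assumed to be a valid code point (at most 0x10FFFF and not a surrogate).

```cpp
#include <assert.h>
#include <stdint.h>
#include <vector>

// Encode one code point as UTF8: 1 to 4 code units of 1 byte each.
static std::vector<uint8_t> to_utf8(uint32_t cp) {
    std::vector<uint8_t> u;
    if (cp < 0x80) {
        u.push_back(static_cast<uint8_t>(cp));
    } else if (cp < 0x800) {
        u.push_back(static_cast<uint8_t>(0xC0 | (cp >> 6)));
        u.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        u.push_back(static_cast<uint8_t>(0xE0 | (cp >> 12)));
        u.push_back(static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        u.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    } else {
        u.push_back(static_cast<uint8_t>(0xF0 | (cp >> 18)));
        u.push_back(static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        u.push_back(static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        u.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    }
    return u;
}

// Encode one code point as UTF16: 1 or 2 code units of 2 bytes each.
static std::vector<uint16_t> to_utf16(uint32_t cp) {
    std::vector<uint16_t> u;
    if (cp < 0x10000) {
        u.push_back(static_cast<uint16_t>(cp));
    } else {
        cp -= 0x10000; // 20 bits left, split into two surrogate code units
        u.push_back(static_cast<uint16_t>(0xD800 | (cp >> 10)));
        u.push_back(static_cast<uint16_t>(0xDC00 | (cp & 0x3FF)));
    }
    return u;
}
```

       For example, the code point 0x42B (the Cyrillic letter Ы) is a single code unit in UTF16 but two code
       units (0xD0 0xAB) in UTF8, so a rule that matches this one code point has to match a two-byte sequence
       in a UTF8 lexer.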

       Include file include/unicode_categories.re provides re2c definitions for the standard Unicode categories.

       Option  --input-encoding  specifies source file encoding, which can be used to enable Unicode literals in
       regular expressions. For example --input-encoding utf8 tells re2c that the source file  is  in  UTF8  (it
       differs  from  --utf8  which  sets  input text encoding). Option --encoding-policy specifies the way re2c
       handles Unicode surrogates (code points in range [0xD800-0xDFFF]).

       Below is an example of a lexer for UTF8 encoded Unicode identifiers.

          // re2c $INPUT -o $OUTPUT -8 --case-ranges -i
          #include <assert.h>
          #include <stdint.h>

          /*!include:re2c "unicode_categories.re" */

          static int lex(const char *s) {
              const char *YYCURSOR = s, *YYMARKER;
              /*!re2c
                  re2c:define:YYCTYPE = 'unsigned char';
                  re2c:yyfill:enable = 0;

                  // Simplified "Unicode Identifier and Pattern Syntax"
                  // (see https://unicode.org/reports/tr31)
                  id_start    = L | Nl | [$_];
                  id_continue = id_start | Mn | Mc | Nd | Pc | [\u200D\u05F3];
                  identifier  = id_start id_continue*;

                  identifier { return 0; }
                  *          { return 1; }
              */
          }

          int main() {
              assert(lex("_Ыдентификатор") == 0);
              return 0;
          }

INCLUDE FILES

       re2c allows one to include other files using directive /*!include:re2c FILE */ or !include FILE ;,  where
       FILE is a path to the file to be included.  The first form should be used outside of re2c blocks, and the
       second form allows one to include a file in the middle of a re2c block. re2c looks for included files  in
       the  directory  of  the  including  file and in include locations, which can be specified with -I option.
       Include directives in re2c work in the same way as C/C++ #include: the contents of FILE  are  copy-pasted
       verbatim  in  place of the directive. Include files may have further includes of their own. Use --depfile
       option to track build dependencies of the output file on include files.  re2c  provides  some  predefined
       include  files  that  can  be  found  in  the  include/  subdirectory of the project. These files contain
       definitions that can be useful to other projects (such as Unicode categories) and form something  like  a
       standard library for re2c.  Below is an example of using the include directive.

   Include file 1 (definitions.h)
          typedef enum { OK, FAIL } Result;

          /*!re2c
              number = [1-9][0-9]*;
          */

   Include file 2 (extra_rules.re.inc)
          // floating-point numbers
          frac  = [0-9]* "." [0-9]+ | [0-9]+ ".";
          exp   = 'e' [+-]? [0-9]+;
          float = frac exp? | [0-9]+ exp;

          float { return OK; }

   Input file
          // re2c $INPUT -o $OUTPUT -i
          #include <assert.h>
          /*!include:re2c "definitions.h" */

          Result lex(const char *s) {
              const char *YYCURSOR = s, *YYMARKER;
              /*!re2c
                  re2c:define:YYCTYPE = char;
                  re2c:yyfill:enable = 0;

                  *      { return FAIL; }
                  number { return OK; }
                  !include "extra_rules.re.inc";
              */
          }

          int main() {
              assert(lex("123") == OK);
              assert(lex("123.4567") == OK);
              return 0;
          }

HEADER FILES

       re2c allows one to generate a header file from the input .re file using option -t, --type-header
       (spelled --header in the example below) or configuration re2c:flags:type-header (re2c:header in the
       example below) and directives /*!header:re2c:on*/ and /*!header:re2c:off*/. The first directive marks
       the beginning of the header file, and the second directive marks the end of it. Everything between
       these directives is processed by re2c, and the generated code is written to the file specified by the
       -t, --type-header option (or stdout if this option was not used). An autogenerated header file may be
       needed in cases when re2c is used to generate definitions of constants, variables and structs that
       must be visible from other translation units.

       Here is an example of generating a header file that contains the definition of the lexer state with tag
       variables (the number of variables depends on the regular grammar and is unknown to the programmer).

   Input file
          // re2c $INPUT -o $OUTPUT -i --header lexer/state.h
          #include <assert.h>
          #include <stddef.h>
          #include "lexer/state.h" // the header is generated by re2c

          /*!header:re2c:on*/
          struct LexerState {
              const char *str, *cur;
              /*!stags:re2c format = "const char *@@;"; */
          };
          /*!header:re2c:off*/

          long lex(LexerState& st) {
              const char *t;
              /*!re2c
                  re2c:header = "lexer/state.h";
                  re2c:yyfill:enable = 0;
                  re2c:define:YYCTYPE = char;
                  re2c:define:YYCURSOR = "st.cur";
                  re2c:tags = 1;
                  re2c:tags:expression = "st.@@";

                  [a]* @t [b]* { return t - st.str; }
              */
          }

          int main() {
              const char *s = "ab";
              LexerState st = { s, s /*!stags:re2c format = ", NULL"; */ };
              assert(lex(st) == 1);
              return 0;
          }

   Header file
          /* Generated by re2c */

          struct LexerState {
              const char *str, *cur;
              const char *yyt1;
          };

SKELETON PROGRAMS

       With the -S, --skeleton option, re2c ignores all non-re2c code and generates a self-contained  C  program
       that  can  be  further compiled and executed. The program consists of lexer code and input data. For each
       constructed DFA (block or condition) re2c generates a standalone lexer and two files: an .input file with
       strings derived from the DFA and a .keys file with expected match results. The program runs each lexer on
       the corresponding .input file and compares results with the expectations.   Skeleton  programs  are  very
       useful for a number of reasons:

       • They  can  check correctness of various re2c optimizations (the data is generated early in the process,
         before any DFA transformations have taken place).

       • Generating a set of input data with good coverage may be useful for both testing and benchmarking.

       • Generating self-contained executable programs allows one to get minimized test cases (the original code
         may be large or have a lot of dependencies).

       The difficulty with generating input data is that for all but the most trivial cases the number of
       possible input strings is too large (even if the string length is limited). re2c solves this difficulty
       by generating sufficiently many strings to cover almost all DFA transitions. It uses the following
       algorithm. First, it constructs a skeleton of the DFA. For encodings with 1-byte code unit size (such as
       ASCII, UTF-8 and EBCDIC) the skeleton is an exact copy of the original DFA. For encodings with multibyte
       code units the skeleton is a copy of the DFA with certain transitions omitted: namely, re2c takes at
       most 256 code units for each disjoint continuous range that corresponds to a DFA transition. The chosen
       values are evenly distributed and include range bounds. Then, instead of trying to cover all possible
       paths in the skeleton (which is infeasible), re2c generates sufficiently many paths to cover all
       skeleton transitions, and thus trigger the corresponding conditional jumps in the lexer. The
       implementation of the algorithm is limited to ~1Gb of transitions and consumes a constant amount of
       memory (re2c writes data to file as soon as it is generated).
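
       The selection of at most 256 evenly distributed code units that include the range bounds can be
       pictured with the following sketch. It is one possible way to pick such values and is not taken from
       the re2c sources; the function name sample_range and its parameters are hypothetical.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// Pick at most `max_count` values from the closed range [lo, hi], evenly
// spaced and always including both bounds.
// Requires lo <= hi and max_count >= 2.
static std::vector<uint32_t> sample_range(uint32_t lo, uint32_t hi, uint32_t max_count) {
    assert(lo <= hi && max_count >= 2);
    std::vector<uint32_t> out;
    const uint64_t width = (uint64_t)hi - lo; // distance between the bounds

    if (width < max_count) {
        // Small range (at most max_count values): take every value.
        for (uint64_t v = lo; v <= hi; ++v) out.push_back((uint32_t)v);
        return out;
    }

    // Large range: the i-th of max_count points is lo + width * i / (max_count - 1),
    // so the first point is lo and the last one is hi.
    for (uint32_t i = 0; i < max_count; ++i) {
        out.push_back((uint32_t)(lo + width * i / (max_count - 1)));
    }
    return out;
}
```

       With a limit of 256, a transition on the 26-value range [0x61-0x7A] keeps all of its code units, while
       a transition on a 2-byte range [0-0xFFFF] is reduced to 256 of them, starting with 0 and ending with
       0xFFFF.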

VISUALIZATION AND DEBUG

       With the -D, --emit-dot option, re2c does not generate code. Instead, it dumps the generated DFA  in  DOT
       format.   One  can convert this dump to an image of the DFA using Graphviz or another library.  Note that
       this option shows the final DFA after it has gone through a number of optimizations and  transformations.
       Earlier stages can be dumped with various debug options, such as --dump-nfa, --dump-dfa-raw etc. (see the
       full list of options).

SEE ALSO

       You can find more information about re2c at the official website: http://re2c.org.  Similar programs  are
       flex(1), lex(1), quex (http://quex.sourceforge.net).

AUTHORS

       re2c was originally written by Peter Bumbulis in 1993.  Since then it has been developed and maintained by
       multiple volunteers; most notably, Brian Young, Marcus Boerger, Dan Nuffer and Ulya Trofimovich.

                                                                                                         RE2C(1)