Ubuntu Manpage: ctags-optlib - Universal Ctags parser definition language

Provided by: universal-ctags_6.2.1-1_amd64

Name

       ctags-optlib - Universal Ctags parser definition language

SYNOPSIS

       ctags [options] [file(s)]
       etags [options] [file(s)]

DESCRIPTION

       Exuberant  Ctags,  the  ancestor  of  Universal  Ctags,  has provided the way to define a new parser from
       command line.  Universal Ctags extends and refines this feature. optlib  parser  is  the  name  for  such
       parser  in  Universal  Ctags. "opt" intends a parser is defined with combination of command line options.
       "lib" intends an optlib parser can be more than ad-hoc personal configuration.

       This man page is for people who want to define an optlib parser. The  readers  should  read  ctags(1)  of
       Universal Ctags first.

       Following options are for defining (or customizing) a parser:

       • --langdef=<name>

       • --map-<LANG>=[+|-]<extension>|<pattern>

       • --kinddef-<LANG>=<letter>,<name>,<description>

       • --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

       • --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/{mgroup=<N>}[<flags>]

       Following options are for controlling loading parser definition:

       • --options=<pathname>

       • --options-maybe=<pathname>

       • --optlib-dir=[+]<directory>

       The  design  of  options and notations for defining a parser in Exuberant Ctags may focus on reducing the
       number of typing by user.  Reducing the number of typing is important for users who want  to  define  (or
       customize) a parser quickly.

       On  the  other  hand, the design in Universal Ctags focuses on maintainability. The notation of Universal
       Ctags is redundant than that of Exuberant Ctags; the newly introduced kind should be declared explicitly,
       (long) names are approved than one-letter flags specifying kinds, and naming rules are stricter.

       This man page explains only stable options and  flags.   Universal  Ctags  also  introduces  experimental
       options  and  flags which have names starting with _. For documentation on these options and flags, visit
       Universal Ctags web site at <https://ctags.io/>.

   Storing a parser definition to a file
       Though it is possible to define a parser from command line, you don't want to type the same command  line
       each time when you need the parser.  You can store options for defining a parser into a file.

       ctags  loads  files (preload files) listed in "FILES" section of ctags(1) at program starting up. You can
       put your parser definition needed usually to the files.

       --options=<pathname>, --options-maybe=<pathname>, and --optlib-dir=[+]<directory> are for loading  optlib
       files you need occasionally. See "Option File Options" section of ctags(1) for these options.

       As  explained  in  "FILES"  section  of ctags(1), options for defining a parser listed line by line in an
       optlib file. Prefixed white spaces are ignored. A line  starting  with  '#'  is  treated  as  a  comment.
       Escaping shell meta character is not needed.

       Use .ctags as file extension for optlib file. You can define multiple parsers in an optlib file but it is
       better to make a file for each parser definition.

       --_echo=<msg> and --_force-quit=<num> options are for debugging optlib parser.

   Overview for defining a parser
       1. Design the parser

          You  need  know  both  the target language and the ctags' concepts (definition, reference, kind, role,
          field, extra). About the concepts, ctags(1) of Universal Ctags may help you.

       2. Give a name to the parser

          Use --langdef=<name> option. <name> is referred as <LANG> in the later steps.

       3. Give a file pattern or file extension for activating the parser

          Use --map-<LANG>=[+|-]<extension>|<pattern>.

       4. Define kinds

          Use --kinddef-<LANG>=<letter>,<name>,<description> option.  Universal Ctags  introduces  this  option.
          Exuberant  Ctags  doesn't  have.  In Exuberant Ctags, a kind is defined as a side effect of specifying
          --regex-<LANG>= option. So user doesn't have a chance to recognize how  important  the  definition  of
          kind.

       5. Define patterns

          Use   --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]  option  for  a  single-line
          regular              expression.              You              can              also               use
          --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/{mgroup=<N>}[<flags>]   option  for  a
          multi-line regular expression.

          As     <kind-spec>,     you     can      use      the      one-letter      flag      defined      with
          --kinddef-<LANG>=<letter>,<name>,<description> option.

OPTIONS

       --langdef=<name>
              Defines  a new user-defined language, <name>, to be parsed with regular expressions. Once defined,
              <name> may be used in other options taking language names.

              <name> must consist  of  alphanumeric  characters,  '#',  or  '+'  ('[a-zA-Z0-9#+]+').  The  graph
              characters  other  than  '#'  and  '+'  are  disallowed  (or reserved). Some of them ([-=:{.]) are
              disallowed because they can make the command line parser of ctags confused. The rest of  them  are
              just reserved for future extending ctags.

              all  is an exception.  all as <name> is not acceptable. It is a reserved word. See the description
              of --kinds-(<LANG>|all)=[+|-](<kinds>|*) option in ctags(1) about how the reserved word is used.

              NONE is another exception. NONE as <name> is not acceptable.

              The names of built-in parsers are capitalized. When ctags evaluates an option in a  command  line,
              and chooses a parser, ctags uses the names of parsers in a case-insensitive way. Therefore, giving
              a  name  started from a lowercase character doesn't help you to avoid the parser name confliction.
              However, in a tags file, ctags prints parser names in a case-sensitive way;  it  prints  a  parser
              name  as specified in --langdef=<name> option.  Therefore, we recommend you to give a name started
              from a lowercase character to your private optlib parser. With this convention,  people  can  know
              where a tag entry in a tag file comes from a built-in parser or a private optlib parser.

       --kinddef-<LANG>=<letter>,<name>,<description>
              Define a kind for <LANG>.  Be not confused this with --kinds-<LANG>.

              <letter>  must  be an alphabetical character ('[a-zA-EG-Z]') other than "F". "F" has been reserved
              for representing a file since Exuberant Ctags.

              <name>  must  start  with  an  alphabetic  character,  and  the   rest   must    be   alphanumeric
              ('[a-zA-Z][a-zA-Z0-9]*').  Do  not  use  "file" as <name>. It has been reserved for representing a
              file since Exuberant Ctags.

              Note that using a number character in a <name> violates the version 2 of tags file  format  though
              ctags accepts it. For more detail, see tags(5).

              <description>  comes  from any printable ASCII characters. The exception is { and \. { is reserved
              for adding flags this option in the future. So put \ before { to include { to  a  description.  To
              include \ itself to a description, put \ before \.

              Both <letter>, <name> and their combination must be unique in a <LANG>.

              This  option  is  newly  introduced in Universal Ctags.  This option reduces the typing defining a
              regex pattern with --regex-<LANG>=, and keeps the consistency of kind definitions in a language.

              The <letter> can be used as an argument for --kinds-<LANG> option to enable or disable  the  kind.
              Unless  K  field  is  enabled, the <letter> is used as value in the "kind" extension field in tags
              output.

              The <name> surrounded by braces can be used as an argument for --kind-<LANG> option. If K field is
              enabled, the <name> is used as value in the "kind" extension field in tags output.

              The <description> and <letter> are listed in  --list-kinds  output.  All  three  elements  of  the
              kind-spec are listed in --list-kinds-full output. Don't use braces in the <description>. They will
              be used meta characters in the future.

       --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a single-line regular expression.

              The /<line_pattern>/<name_pattern>/ pair defines a regular expression replacement pattern, similar
              in  style  to  sed  substitution commands, s/regexp/replacement/, with which to generate tags from
              source files mapped to the  named  language,  <LANG>,  (case-insensitive;  either  a  built-in  or
              user-defined language).

              The  regular expression, <line_pattern>, defines an extended regular expression (roughly that used
              by egrep(1)), which is used to locate a single source line containing a tag and  may  specify  tab
              characters using \t.

              When  a  matching  line  is found, a tag will be generated for the name defined by <name_pattern>,
              which generally will contain the special back-references  \1  through  \9  to  refer  to  matching
              sub-expression groups within <line_pattern>.

              The  '/' separator characters shown in the parameter to the option can actually be replaced by any
              character. Note that whichever separator character  is  used  will  have  to  be  escaped  with  a
              backslash  ('\')  character  wherever  it  is  used  in  the  parameter  as something other than a
              separator. The regular expression defined by this option is added to the current list  of  regular
              expressions  for the specified language unless the parameter is omitted, in which case the current
              list is cleared.

              Unless modified by <flags>, <line_pattern> is interpreted as a POSIX extended regular  expression.
              The  <name_pattern> should expand for all matching lines to a non-empty string of characters, or a
              warning message will be reported unless {placeholder} regex flag is specified.

              A kind specifier (<kind-spec>) for tags matching regexp  may  follow  <name_pattern>,  which  will
              determine what kind of tag is reported in the kind extension field (see tags(5)).

              <kind-spec> has two forms: one-letter form and full form.

              The      one-letter  form  in  the  form  of <letter>. It just refers a kind <letter> defined with
              --kinddef-<LANG>. This form is recommended in Universal Ctags.

              The full form of <kind-spec> is in the form  of  <letter>,<name>,<description>.  Either  the  kind
              <name>    and/or    the    <description>    can    be    omitted.    See    the   description   of
              --kinddef-<LANG>=<letter>,<name>,<description> option about the elements.

              The full form is supported only for keeping the compatibility with Exuberant Ctags which does  not
              have  --kinddef-<LANG>  option.  Supporting  the  form will be removed from Universal Ctags in the
              future.

              About <flags>, see "FLAGS FOR --regex-<LANG> OPTION".

              For more information on the regular expressions used by ctags, see either the regex(5,7) man page,
              or the GNU info documentation for regex (e.g. "info regex").

       --list-regex-flags
              Lists the flags that can be used in --regex-<LANG> option.

       --list-mline-regex-flags
              Lists the flags that can be used in --mline-regex-<LANG> option.

       --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/{mgroup=<N>}[<flags>]
              Define a multi-line regular expression.

              This option is similar to --regex-<LANG> option except the pattern is applied to the whole  file’s
              contents, not line by line.

              See "FLAGS FOR --mline-regex-<LANG> OPTION" about {mgroup=<N>}.  {mgroup=<N>} flag is a must.

       --_echo=<message>
              Print  <message>  to  the standard error stream.  This is helpful to understand (and debug) optlib
              loading feature of Universal Ctags.

       --_force-quit[=<num>]
              Exits immediately when this option is processed.  If <num> is used as exit status. The default  is
              0.  This is helpful to debug optlib loading feature of Universal Ctags.

   FLAGS FOR --regex-<LANG> OPTION
       You  can  specify  more  than  one  flag,  <letter>|{<name>}, at the end of --regex-<LANG> to control how
       Universal Ctags uses the pattern.

       Exuberant Ctags uses a <letter> to represent a flag. In Universal Ctags, a <name>  surrounded  by  braces
       (name  form)  can  be  used  in  addition  to <letter>. The name form makes a user reading an optlib file
       easier.

       The most of all flags newly added in Universal Ctags don't have the  one-letter  representation.  All  of
       them have only the name representation. --list-regex-flags lists all the flags.

       basic (one-letter form b)
              The pattern is interpreted as a POSIX basic regular expression.

       exclusive (one-letter form x)
              Skip  testing  the  other  patterns  if a line is matched to this pattern. This is useful to avoid
              using CPU to parse line comments.

       extend (one-letter form e)
              The pattern is interpreted as a POSIX extended regular expression (default).

       pcre2 (one-letter form p, experimental)
              The pattern is interpreted as a PCRE2 regular expression explained in pcre2syntax(3).   This  flag
              is  available  only  if  the  ctags is built with pcre2 library. See the output of --list-features
              option to know whether your ctags is built-with pcre2 or not.

       icase (one-letter form i)
              The regular expression is to be applied in a case-insensitive manner.

       placeholder
              Don't emit a tag captured with a regex pattern.  The replacement can be an empty string.  See  the
              following description of scope=... flag about how this is useful.

       postrun
              Match the pattern at the end of all the parsing processes, including:

              • running the built-in code for <LANG>,

              • applying --mline-regex-<LANG> patterns,

              • applying --_mtable-regex-<LANG> patterns, and

              • applying non-postrun --regex-<LANG>= patterns.

              This flag is helpful when combined with scope=intervaltab.

              The  built-in  code  processes source files line-by-line delivered by the main part of ctags.  The
              main part applies non-postrun --regex-<LANG>= patterns to a line just after delivering the line to
              the code of built-in code. Thus, non-postrun --regex-<LANG>= patterns cannot  refer  to  the  tags
              information finally extracted by the built-in code.

              This  is  where  the  postrun comes into play. The main part never applies postrun --regex-<LANG>=
              patterns when delivering lines to the code of built-in  code.  Instead,  it  applies  the  postrun
              patterns  in batch after delivering all lines to the built-in code. The postrun patterns can refer
              to the tags information extracted by the built-in code.

              --mline-regex-<LANG> and --_mtable-regex-<LANG> have no  {postrun}  flag  because  the  main  part
              always applies the patterns specified with the options after running the built-in code for <LANG>.

              See also the description of scope=intervaltab flag.

       scope=intervaltab
              Use  the  interval  table  maintained by the main part of ctags to fill in the scope: field.  This
              flag is useful for extending a built-in parser with the --regex-<LANG>= option with postrun flag.

              The interval table holds tag entries having both line: and end:  fields.  These  tag  entries  are
              stored  in  a  table  keyed  by  their line: and end: field pairs. Therefore, the table can answer
              queries like, "Is there a tag entry that includes this line?" or "Which tag  entry  contains  this
              line?"

              The  source  line,  where postrun --regex-<LANG> pattern finds a language object, can be a key for
              such queries. The tag entry returned by the table is set in the scope: field of the newly  created
              tag entry for the language object.

              postrun  flag  is  needed  for running the built-in parser that stores tag entries before applying
              patterns specified with --regex-<LANG>.

              See also the example in "Using the interval table".

       scope=(ref|push|pop|clear|set|replace)
              Specify what to do with the internal scope stack.

              A parser programmed with --regex-<LANG> has a stack (scope stack) internally. You can use  it  for
              tracking scope information. The scope=... flag is for manipulating and utilizing the scope stack.

              If  {scope=push}  is  specified,  a  tag  captured  with  --regex-<LANG>  is  pushed to the stack.
              {scope=push} implies {scope=ref}.

              You can fill the scope field (scope:) of captured tag with {scope=ref}.  If  {scope=ref}  flag  is
              given,  ctags attaches the tag at the top to the tag captured with --regex-<LANG> as the value for
              the scope: field.

              ctags pops the tag at the top of the stack when --regex-<LANG> with {scope=pop} is matched to  the
              input line.

              Specifying  {scope=clear}  removes  all the tags in the scope.  Specifying {scope=set} removes all
              the tags in the scope, and then pushes the captured tag as {scope=push} does.

              {scope=replace} does the three things sequentially. First it does the same  as  {scope=pop},  then
              fills  the  scope:  field of the tag captured with --regex-<LANG>, and pushes the tag to the scope
              stack as if {scope=push} was given finally.  You cannot specify another scope action together with
              {scope=replace}.

              You  don't  want  to  specify  {scope=pop}{scope=push}  as  an  alternative  to   {scope=replace};
              {scope=pop}{scope=push} fills the scope: field of the tag captured with --regex-<LANG> first, then
              pops  the tag at the top of the stack, and pushes the captured tag to the scope stack finally. The
              timing   when   filling   the   end   field   is    different    between    {scope=replace}    and
              {scope=pop}{scope=push}.

              In  some  cases,  you  may  want to use --regex-<LANG> only for its side effects: using it only to
              manipulate the stack but not for capturing a tag. In such a case, make <name_pattern> component of
              --regex-<LANG> option empty while specifying  {placeholder}  as  a  regex  flag.  For  example,  a
              non-named tag can be put on the stack by giving a regex flag "{scope=push}{placeholder}".

              You may wonder what happens if a regex pattern with {scope=ref} flag matches an input line but the
              stack is empty, or a non-named tag is at the top. If the regex pattern contains a {scope=ref} flag
              and  the  stack  is  empty,  the {scope=ref} flag is ignored and nothing is attached to the scope:
              field.

              If the top of the stack contains an unnamed tag, ctags searches deeper into the stack to find  the
              top-most  named  tag.  If  it  reaches  the  bottom  of the stack without finding a named tag, the
              {scope=ref} flag is ignored and nothing is attached to the scope: field.

              When a named tag on the stack is popped or cleared as the side effect of a pattern matching, ctags
              attaches the line number of the match to the end: field of the named tag.

              ctags clears all of the tags on the stack when it reaches the end of the input  source  file.  The
              line number of the end is attached to the end: field of the cleared tags.

       warning=<message>
              print the given <message> at WARNING level

       fatal=<message>
              print the given <message> and exit

   FLAGS FOR --mline-regex-<LANG> OPTION
       mgroup=<N>
              decide the location of the tag extracted with --mline-regex-<LANG> option.

              <N>  is  the  number  of  a  capture group in the pattern, which is used to record the line number
              location of the tag. mgroup=<N> flag is not an optional. You must add an mgroup=<N> flag, even  if
              the <N> is 0 (meaning the start position of the whole regex pattern).

       scope=intervaltab
              See the description for the flag in "FLAGS FOR --regex-<LANG> OPTION".

              Unlike --regex-<LANG> option, you don't have to specify postrun flag.

EXAMPLES

   Perl Pod
       This  is  the  definition  (pod.ctags)  used  in ctags for parsing Pod (<https://perldoc.perl.org/perlpod
       .html>) file.

          --langdef=pod
          --map-pod=+.pod

          --kinddef-pod=c,chapter,chapters
          --kinddef-pod=s,section,sections
          --kinddef-pod=S,subsection,subsections
          --kinddef-pod=t,subsubsection,subsubsections

          --regex-pod=/^=head1[ \t]+(.+)/\1/c/
          --regex-pod=/^=head2[ \t]+(.+)/\1/s/
          --regex-pod=/^=head3[ \t]+(.+)/\1/S/
          --regex-pod=/^=head4[ \t]+(.+)/\1/t/

   Using scope regex flags
       Let's think about writing a parser for a very small subset of the Ruby language.

       input source file (input.srb):

          class Example
            def methodA
                  puts "in class_method"
            end
            def methodB
                  puts "in class_method"
            end
          end

       The parser for the input should capture Example with class kind, methodA, and methodB with  method  kind.
       methodA  and  methodB  should  have  Example  as  their scope. end: fields of each tag should have proper
       values.

       optlib file (sub-ruby.ctags):

          --langdef=subRuby
          --map-subRuby=.srb
          --kinddef-subRuby=c,class,classes
          --kinddef-subRuby=m,method,methods
          --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
          --regex-subRuby=/^end///{scope=pop}{placeholder}
          --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
          --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

       command line and output:

          $ ctags --quiet --fields=+eK \
          --options=./sub-ruby.ctags -o - input.srb
          Example input.srb       /^class Example$/;"     class   end:8
          methodA input.srb       /^  def methodA$/;"     method  class:Example   end:4
          methodB input.srb       /^  def methodB$/;"     method  class:Example   end:7

   Using the interval table
       Let's try to extract kernel threads defined in Linux kernel.

       kthread_run is the function for creating and starting a kernel thread. For example, kswapd kernel  thread
       is created with the function like (quoted from linux/mm/vmscan.c):

          void kswapd_run(int nid)
          {
                          /* ... */
                          pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
                          /* ... */

       kthread-v1.ctags illustrates the way to extract the name of kernel threads appeared at the third argument
       of kthread_run.

       kthread-v1.ctags:

          --kinddef-C=K,kernelThread,the name of kernel thread in Linux kernel
          --regex-C=/kthread_run\([^"]+"([^"]+)"/\1/K/

       With kthread-v1.ctags, ctags emits the following tags:

          $ ctags --options=linux/kthread-v1.ctags  -o - vmscan.c
          ...
          kswapd%d        vmscan.c        /^... = kthread_run(kswapd, pgdat, "kswapd%d"...$/;"    K
          ...

       Using the interval table, you can attach the name of the function where the kernel thread is defined.

       kthread-v2.ctags:

          --kinddef-C=K,kernelThread,the name of kernel thread in Linux kernel
          --regex-C=/kthread_run\([^"]+"([^"]+)"/\1/K/{postrun}{scope=intervaltab}

       With kthread-v2.ctags, ctags emits the following tags:

          $ ctags --options=linux/kthread-v2.ctags  -o - vmscan.c
          ...
          kswapd%d        vmscan.c        /^... = kthread_run(kswapd, pgdat, "kswapd%d"...$/;"    K       function:kswapd_run
          ...

       With the new .ctags file, function:kswapd_run is attach to the tag entry as its scope field.

AUTHOR

       Universal   Ctags  project  <https://ctags.io/>  (This  man  page  partially  derived  from  ctags(1)  of
       Executable-ctags)

       Darren Hiebert <<dhiebert@users.sourceforge.net>> <http://DarrenHiebert.com/>

6.2.1                                                                                            ctags-optlib(7)

Name

SYNOPSIS

DESCRIPTION

OPTIONS

EXAMPLES

SEE ALSO

AUTHOR