Ubuntu Manpage: ctags-optlib - Universal Ctags parser definition language

Provided by: universal-ctags_5.9.20210829.0-1_amd64

NAME

       ctags-optlib - Universal Ctags parser definition language

SYNOPSIS

       ctags [options] [file(s)]
       etags [options] [file(s)]

DESCRIPTION

       Exuberant  Ctags,  the  ancestor  of  Universal  Ctags,  has provided the way to define a new parser from
       command line.  Universal Ctags extends and refines this feature. optlib  parser  is  the  name  for  such
       parser  in  Universal  Ctags. "opt" intends a parser is defined with combination of command line options.
       "lib" intends an optlib parser can be more than ad-hoc personal configuration.

       This man page is for people who want to define an optlib parser. The  readers  should  read  ctags(1)  of
       Universal Ctags first.

       Following options are for defining (or customizing) a parser:

       • --langdef=<name>

       • --map-<LANG>=[+|-]<extension>|<pattern>

       • --kinddef-<LANG>=<letter>,<name>,<description>

       • --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

       • --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

       Following options are for controlling loading parser definition:

       • --options=<pathname>

       • --options-maybe=<pathname>

       • --optlib-dir=[+]<directory>

       The  design  of  options and notations for defining a parser in Exuberant Ctags may focus on reducing the
       number of typing by user.  Reducing the number of typing is important for users who want  to  define  (or
       customize) a parser quickly.

       On  the  other  hand, the design in Universal Ctags focuses on maintainability. The notation of Universal
       Ctags is redundant than that of Exuberant Ctags; the newly introduced kind should be declared explicitly,
       (long) names are approved than one-letter flags specifying kinds, and naming rules are stricter.

       This man page explains only stable options and  flags.   Universal  Ctags  also  introduces  experimental
       options  and  flags which have names starting with _. For documentation on these options and flags, visit
       Universal Ctags web site at https://ctags.io/.

   Storing a parser definition to a file
       Though it is possible to define a parser from command line, you don't want to type the same command  line
       each time when you need the parser.  You can store options for defining a parser into a file.

       ctags  loads  files (preload files) listed in "FILES" section of ctags(1) at program starting up. You can
       put your parser definition needed usually to the files.

       --options=<pathname>, --options-maybe=<pathname>, and --optlib-dir=[+]<directory> are for loading  optlib
       files you need occasionally. See "Option File Options" section of ctags(1) for these options.

       As  explained  in  "FILES"  section  of ctags(1), options for defining a parser listed line by line in an
       optlib file. Prefixed white spaces are ignored. A line  starting  with  '#'  is  treated  as  a  comment.
       Escaping shell meta character is not needed.

       Use .ctags as file extension for optlib file. You can define multiple parsers in an optlib file but it is
       better to make a file for each parser definition.

       --_echo=<msg> and --_force-quit=<num> options are for debugging optlib parser.

   Overview for defining a parser
       1. Design the parser

          You  need  know  both  the target language and the ctags' concepts (definition, reference, kind, role,
          field, extra). About the concepts, ctags(1) of Universal Ctags may help you.

       2. Give a name to the parser

          Use --langdef=<name> option. <name> is referred as <LANG> in the later steps.

       3. Give a file pattern or file extension for activating the parser

          Use --map-<LANG>=[+|-]<extension>|<pattern>.

       4. Define kinds

          Use --kinddef-<LANG>=<letter>,<name>,<description> option.  Universal Ctags  introduces  this  option.
          Exuberant  Ctags  doesn't  have.  In Exuberant Ctags, a kind is defined as a side effect of specifying
          --regex-<LANG>= option. So user doesn't have a chance to recognize how  important  the  definition  of
          kind.

       5. Define patterns

          Use   --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]  option  for  a  single-line
          regular              expression.              You              can              also               use
          --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]   option  for  a  multi-line
          regular expression.

          As     <kind-spec>,     you     can      use      the      one-letter      flag      defined      with
          --kinddef-<LANG>=<letter>,<name>,<description> option.

OPTIONS

       --langdef=<name>
              Defines  a new user-defined language, <name>, to be parsed with regular expressions. Once defined,
              <name> may be used in other options taking language names.

              <name> must consist  of  alphanumeric  characters,  '#',  or  '+'  ('[a-zA-Z0-9#+]+').  The  graph
              characters  other  than  '#'  and  '+'  are  disallowed  (or reserved). Some of them ([-=:{.]) are
              disallowed because they can make the command line parser of ctags confused. The rest of  them  are
              just reserved for future extending ctags.

              all  is an exception.  all as <name> is not acceptable. It is a reserved word. See the description
              of --kinds-(<LANG>|all)=[+|-](<kinds>|*) option in ctags(1) about how the reserved word is used.

              The names of built-in parsers are capitalized. When ctags evaluates an option in a  command  line,
              and chooses a parser, ctags uses the names of parsers in a case-insensitive way. Therefore, giving
              a  name  started from a lowercase character doesn't help you to avoid the parser name confliction.
              However, in a tags file, ctags prints parser names in a case-sensitive way;  it  prints  a  parser
              name  as specified in --langdef=<name> option.  Therefore, we recommend you to give a name started
              from a lowercase character to your private optlib parser. With this convention,  people  can  know
              where a tag entry in a tag file comes from a built-in parser or a private optlib parser.

       --kinddef-<LANG>=<letter>,<name>,<description>
              Define a kind for <LANG>.  Be not confused this with --kinds-<LANG>.

              <letter>  must  be an alphabetical character ('[a-zA-EG-Z]') other than "F". "F" has been reserved
              for representing a file since Exuberant Ctags.

              <name>  must  start  with  an  alphabetic  character,  and  the   rest   must    be   alphanumeric
              ('[a-zA-Z][a-zA-Z0-9]*').  Do  not  use  "file" as <name>. It has been reserved for representing a
              file since Exuberant Ctags.

              Note that using a number character in a <name> violates the version 2 of tags file  format  though
              ctags accepts it. For more detail, see tags(5).

              <description>  comes  from any printable ASCII characters. The exception is { and \. { is reserved
              for adding flags this option in the future. So put \ before { to include { to  a  description.  To
              include \ itself to a description, put \ before \.

              Both <letter>, <name> and their combination must be unique in a <LANG>.

              This  option  is  newly  introduced in Universal Ctags.  This option reduces the typing defining a
              regex pattern with --regex-<LANG>=, and keeps the consistency of kind definitions in a language.

              The <letter> can be used as an argument for --kinds-<LANG> option to enable or disable  the  kind.
              Unless  K  field  is  enabled, the <letter> is used as value in the "kind" extension field in tags
              output.

              The <name> surrounded by braces can be used as an argument for --kind-<LANG> option. If K field is
              enabled, the <name> is used as value in the "kind" extension field in tags output.

              The <description> and <letter> are listed in  --list-kinds  output.  All  three  elements  of  the
              kind-spec are listed in --list-kinds-full output. Don't use braces in the <description>. They will
              be used meta characters in the future.

       --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a single-line regular expression.

              The /<line_pattern>/<name_pattern>/ pair defines a regular expression replacement pattern, similar
              in  style  to  sed  substitution commands, s/regexp/replacement/, with which to generate tags from
              source files mapped to the  named  language,  <LANG>,  (case-insensitive;  either  a  built-in  or
              user-defined language).

              The  regular expression, <line_pattern>, defines an extended regular expression (roughly that used
              by egrep(1)), which is used to locate a single source line containing a tag and  may  specify  tab
              characters using \t.

              When  a  matching  line  is found, a tag will be generated for the name defined by <name_pattern>,
              which generally will contain the special back-references  \1  through  \9  to  refer  to  matching
              sub-expression groups within <line_pattern>.

              The  '/' separator characters shown in the parameter to the option can actually be replaced by any
              character. Note that whichever separator character  is  used  will  have  to  be  escaped  with  a
              backslash  ('\')  character  wherever  it  is  used  in  the  parameter  as something other than a
              separator. The regular expression defined by this option is added to the current list  of  regular
              expressions  for the specified language unless the parameter is omitted, in which case the current
              list is cleared.

              Unless modified by <flags>, <line_pattern> is interpreted as a POSIX extended regular  expression.
              The  <name_pattern> should expand for all matching lines to a non-empty string of characters, or a
              warning message will be reported unless {placeholder} regex flag is specified.

              A kind specifier (<kind-spec>) for tags matching regexp  may  follow  <name_pattern>,  which  will
              determine what kind of tag is reported in the kind extension field (see tags(5)).

              <kind-spec> has two forms: one-letter form and full form.

              The      one-letter  form  in  the  form  of <letter>. It just refers a kind <letter> defined with
              --kinddef-<LANG>. This form is recommended in Universal Ctags.

              The full form of <kind-spec> is in the form of <letter>,<name>,<description>.      Either the kind
              <name>   and/or   the    <description>    can    be    omitted.    See    the    description    of
              --kinddef-<LANG>=<letter>,<name>,<description> option about the elements.

              The  full form is supported only for keeping the compatibility with Exuberant Ctags which does not
              have --kinddef-<LANG> option. Supporting the form will be removed  from  Universal  Ctags  in  the
              future.

              About <flags>, see "FLAGS FOR --regex-<LANG> OPTION".

              For more information on the regular expressions used by ctags, see either the regex(5,7) man page,
              or the GNU info documentation for regex (e.g. "info regex").

       --list-regex-flags
              Lists the flags that can be used in --regex-<LANG> option.

       --list-mline-regex-flags
              Lists the flags that can be used in --mline-regex-<LANG> option.

       --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a multi-line regular expression.

              This  option is similar to --regex-<LANG> option except the pattern is applied to the whole file’s
              contents, not line by line.

       --_echo=<message>
              Print <message> to the standard error stream.  This is helpful to understand  (and  debug)  optlib
              loading feature of Universal Ctags.

       --_force-quit[=<num>]
              Exits  immediately when this option is processed.  If <num> is used as exit status. The default is
              0.  This is helpful to debug optlib loading feature of Universal Ctags.

   FLAGS FOR --regex-<LANG> OPTION
       You can specify more than one flag, <letter>|{<name>}, at  the  end  of  --regex-<LANG>  to  control  how
       Universal Ctags uses the pattern.

       Exuberant  Ctags  uses  a <letter> to represent a flag. In Universal Ctags, a <name> surrounded by braces
       (name form) can be used in addition to <letter>. The name form  makes  a  user  reading  an  optlib  file
       easier.

       The  most  of  all  flags newly added in Universal Ctags don't have the one-letter representation. All of
       them have only the name representation. --list-regex-flags lists all the flags.

       basic (one-letter form b)
              The pattern is interpreted as a POSIX basic regular expression.

       exclusive (one-letter form x)
              Skip testing the other patterns if a line is matched to this pattern.  This  is  useful  to  avoid
              using CPU to parse line comments.

       extend (one-letter form e)
              The pattern is interpreted as a POSIX extended regular expression (default).

       icase (one-letter form i)
              The regular expression is to be applied in a case-insensitive manner.

       placeholder
              Don't  emit a tag captured with a regex pattern.  The replacement can be an empty string.  See the
              following description of scope=... flag about how this is useful.

       scope=(ref|push|pop|clear|set)
          Specify what to do with the internal scope stack.

          A parser programmed with --regex-<LANG> has a stack (scope stack)  internally.  You  can  use  it  for
          tracking scope information. The scope=... flag is for manipulating and utilizing the scope stack.

          If  {scope=push} is specified, a tag captured with --regex-<LANG> is pushed to the stack. {scope=push}
          implies {scope=ref}.

          You can fill the scope field of captured tag with {scope=ref}. If {scope=ref}  flag  is  given,  ctags
          attaches the tag at the top to the tag captured with --regex-<LANG> as the value for the scope: field.

          ctags  pops  the  tag  at  the top of the stack when --regex-<LANG> with {scope=pop} is matched to the
          input line.

          Specifying {scope=clear} removes all the tags in the scope.  Specifying {scope=set}  removes  all  the
          tags in the scope, and then pushes the captured tag as {scope=push} does.

          In  some  cases,  you  may  want  to  use  --regex-<LANG>  only for its side effects: using it only to
          manipulate the stack but not for capturing a tag. In such a case,  make  <name_pattern>  component  of
          --regex-<LANG>  option  empty while specifying {placeholder} as a regex flag. For example, a non-named
          tag can be put on the stack by giving a regex flag "{scope=push}{placeholder}".

          You may wonder what happens if a regex pattern with {scope=ref} flag matches an  input  line  but  the
          stack is empty, or a non-named tag is at the top. If the regex pattern contains a {scope=ref} flag and
          the stack is empty, the {scope=ref} flag is ignored and nothing is attached to the scope: field.

          If  the  top  of  the  stack contains an unnamed tag, ctags searches deeper into the stack to find the
          top-most named tag. If it reaches the bottom of the stack without finding a named tag, the {scope=ref}
          flag is ignored and nothing is attached to the scope: field.

          When a named tag on the stack is popped or cleared as the side effect of  a  pattern  matching,  ctags
          attaches the line number of the match to the end: field of the named tag.

          ctags  clears  all of the tags on the stack when it reaches the end of the input source file. The line
          number of the end is attached to the end: field of the cleared tags.

       warning=<message>
              print the given <message> at WARNING level

       fatal=<message>
              print the given <message> and exit

EXAMPLES

   Perl Pod
       This is the definition (pod.ctags) used in ctags for parsing Pod  (https://perldoc.perl.org/perlpod.html)
       file.

          --langdef=pod
          --map-pod=+.pod

          --kinddef-pod=c,chapter,chapters
          --kinddef-pod=s,section,sections
          --kinddef-pod=S,subsection,subsections
          --kinddef-pod=t,subsubsection,subsubsections

          --regex-pod=/^=head1[ \t]+(.+)/\1/c/
          --regex-pod=/^=head2[ \t]+(.+)/\1/s/
          --regex-pod=/^=head3[ \t]+(.+)/\1/S/
          --regex-pod=/^=head4[ \t]+(.+)/\1/t/

   Using scope regex flags
       Let's think about writing a parser for a very small subset of the Ruby language.

       input source file (input.srb):

          class Example
            def methodA
                  puts "in class_method"
            end
            def methodB
                  puts "in class_method"
            end
          end

       The  parser  for the input should capture Example with class kind, methodA, and methodB with method kind.
       methodA and methodB should have Example as their scope. end:  fields  of  each  tag  should  have  proper
       values.

       optlib file (sub-ruby.ctags):

          --langdef=subRuby
          --map-subRuby=.srb
          --kinddef-subRuby=c,class,classes
          --kinddef-subRuby=m,method,methods
          --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
          --regex-subRuby=/^end///{scope=pop}{placeholder}
          --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
          --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

       command line and output:

          $ ctags --quiet --fields=+eK \
          --options=./sub-ruby.ctags -o - input.srb
          Example input.srb       /^class Example$/;"     class   end:8
          methodA input.srb       /^  def methodA$/;"     method  class:Example   end:4
          methodB input.srb       /^  def methodB$/;"     method  class:Example   end:7

AUTHOR

       Universal   Ctags   project   https://ctags.io/  (This  man  page  partially  derived  from  ctags(1)  of
       Executable-ctags)

       Darren Hiebert <dhiebert@users.sourceforge.net> http://DarrenHiebert.com/

5.9.0                                                                                            CTAGS-OPTLIB(7)

NAME

SYNOPSIS

DESCRIPTION

OPTIONS

EXAMPLES

SEE ALSO

AUTHOR