Provided by: universal-ctags_5.9.20210829.0-1_amd64

NAME

       ctags-optlib - Universal Ctags parser definition language

SYNOPSIS

       ctags [options] [file(s)]
       etags [options] [file(s)]

DESCRIPTION

       Exuberant Ctags, the ancestor of Universal Ctags, provides a way to define a new parser
       from the command line. Universal Ctags extends and refines this feature. A parser defined
       this way is called an optlib parser in Universal Ctags. "opt" indicates that the parser
       is defined with a combination of command-line options. "lib" indicates that an optlib
       parser can be more than an ad-hoc personal configuration.

       This man page is for people who want to define an optlib parser. Readers should read
       ctags(1) of Universal Ctags first.

       The following options are for defining (or customizing) a parser:

       • --langdef=<name>

       • --map-<LANG>=[+|-]<extension>|<pattern>

       • --kinddef-<LANG>=<letter>,<name>,<description>

       • --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

       • --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]

       The following options control the loading of parser definitions:

       • --options=<pathname>

       • --options-maybe=<pathname>

       • --optlib-dir=[+]<directory>

       The design of options and notations for defining a parser in Exuberant Ctags appears to
       focus on reducing the amount of typing required of the user. Less typing is important for
       users who want to define (or customize) a parser quickly.

       On the other hand, the design in Universal Ctags focuses on maintainability. The notation
       of Universal Ctags is more verbose than that of Exuberant Ctags: a newly introduced kind
       should be declared explicitly, (long) names are preferred over one-letter flags for
       specifying kinds, and naming rules are stricter.

       This man page explains only stable options and flags. Universal Ctags also introduces
       experimental options and flags whose names start with _. For documentation on these
       options and flags, visit the Universal Ctags web site at https://ctags.io/.

   Storing a parser definition to a file
       Though it is possible to define a parser from the command line, you don't want to type
       the same command line each time you need the parser. You can store the options for
       defining a parser in a file.

       ctags loads the files (preload files) listed in the "FILES" section of ctags(1) at
       program start-up. You can put the parser definitions you need regularly into those files.

       --options=<pathname>, --options-maybe=<pathname>, and --optlib-dir=[+]<directory> are for
       loading optlib files you need only occasionally. See the "Option File Options" section of
       ctags(1) for these options.

       As explained in the "FILES" section of ctags(1), options for defining a parser are listed
       line by line in an optlib file. Leading white space is ignored. A line starting with '#'
       is treated as a comment. Shell meta characters do not need to be escaped.

       Use .ctags as the file extension for an optlib file. You can define multiple parsers in
       one optlib file, but it is better to make a separate file for each parser definition.
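
       As a minimal sketch of these rules, an optlib file might look like the following. The
       file name mylang.ctags, the language name mylang, and the "section" syntax are
       hypothetical and chosen only for illustration:

          # mylang.ctags: lines starting with '#' are comments.
          --langdef=mylang
          --map-mylang=+.mylang
          --kinddef-mylang=s,section,sections
          # The quotes, brackets, and parentheses below would need quoting in a
          # shell, but they are written as-is in an optlib file.
            --regex-mylang=/^section[ \t]+"([^"]+)"/\1/s/

       The last option line is indented only to show that leading white space is ignored. Such
       a file can be loaded with --options=./mylang.ctags.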

       The --_echo=<msg> and --_force-quit=<num> options are for debugging an optlib parser.

   Overview for defining a parser
       1. Design the parser

          You need to know both the target language and the concepts of ctags (definition,
          reference, kind, role, field, extra). For the concepts, ctags(1) of Universal Ctags may
          help you.

       2. Give a name to the parser

          Use the --langdef=<name> option. <name> is referred to as <LANG> in the later steps.

       3. Give a file pattern or file extension for activating the parser

          Use --map-<LANG>=[+|-]<extension>|<pattern>.

       4. Define kinds

          Use the --kinddef-<LANG>=<letter>,<name>,<description> option. Universal Ctags
          introduces this option; Exuberant Ctags doesn't have it. In Exuberant Ctags, a kind is
          defined as a side effect of specifying the --regex-<LANG>= option, so the user has no
          chance to recognize how important the definition of a kind is.

       5. Define patterns

          Use the --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>] option for
          a single-line regular expression. You can also use the
          --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>] option for a
          multi-line regular expression.

          As <kind-spec>, you can use the one-letter flag defined with the
          --kinddef-<LANG>=<letter>,<name>,<description> option.
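
       Put together, the five steps might produce an optlib file like this sketch. The language
       name mylang, its .mylang extension, and the "def" keyword are assumptions made up for
       illustration, not part of any real parser:

          # Step 2: name the parser.
          --langdef=mylang
          # Step 3: activate it for files ending in .mylang.
          --map-mylang=+.mylang
          # Step 4: declare the kinds the parser emits.
          --kinddef-mylang=f,function,functions
          # Step 5: tag lines such as "def name" with the kind declared above.
          --regex-mylang=/^def[ \t]+([a-zA-Z_][a-zA-Z0-9_]*)/\1/f/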

OPTIONS

       --langdef=<name>
              Defines a new user-defined language, <name>, to be parsed with regular expressions.
              Once defined, <name> may be used in other options taking language names.

              <name> must consist of alphanumeric characters, '#', or '+' ('[a-zA-Z0-9#+]+').
              Punctuation characters other than '#' and '+' are disallowed (or reserved). Some of
              them ([-=:{.]) are disallowed because they can confuse the command line parser of
              ctags. The rest are reserved for future extensions of ctags.

              all is an exception: it is a reserved word and is not acceptable as <name>. See
              the description of the --kinds-(<LANG>|all)=[+|-](<kinds>|*) option in ctags(1) for
              how the reserved word is used.

              The names of built-in parsers are capitalized. When ctags evaluates an option on
              the command line and chooses a parser, it compares parser names case-insensitively.
              Therefore, giving a name that starts with a lowercase character does not help you
              avoid a parser name conflict. However, in a tags file, ctags prints parser names
              case-sensitively; it prints a parser name exactly as specified in the
              --langdef=<name> option. Therefore, we recommend giving your private optlib parser
              a name that starts with a lowercase character. With this convention, people can
              tell whether a tag entry in a tags file comes from a built-in parser or a private
              optlib parser.

       --kinddef-<LANG>=<letter>,<name>,<description>
              Define a kind for <LANG>. Do not confuse this option with --kinds-<LANG>.

              <letter> must be an alphabetic character other than "F" ('[a-zA-EG-Z]'). "F" has
              been reserved for representing a file since Exuberant Ctags.

              <name>  must start with an alphabetic character, and the rest must  be alphanumeric
              ('[a-zA-Z][a-zA-Z0-9]*'). Do not use "file" as <name>. It  has  been  reserved  for
              representing a file since Exuberant Ctags.

              Note that using a digit in a <name> violates version 2 of the tags file format,
              though ctags accepts it. For more detail, see tags(5).

              <description> may consist of any printable ASCII characters, with the exception of
              { and \. { is reserved for adding flags to this option in the future, so put \
              before { to include { in a description. To include \ itself in a description, put
              \ before \.

              <letter>, <name>, and their combination must each be unique within a <LANG>.

              This option is newly introduced in Universal Ctags. It reduces the typing needed to
              define a regex pattern with --regex-<LANG>= and keeps the kind definitions in a
              language consistent.

              The <letter> can be used as an argument for the --kinds-<LANG> option to enable or
              disable the kind. Unless the K field is enabled, the <letter> is used as the value
              of the "kind" extension field in tags output.

              The <name> surrounded by braces can also be used as an argument for the
              --kinds-<LANG> option. If the K field is enabled, the <name> is used as the value
              of the "kind" extension field in tags output.

              The <description> and <letter> are listed in --list-kinds output. All three
              elements of the kind-spec are listed in --list-kinds-full output. Don't use braces
              in the <description>; they will be used as meta characters in the future.
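
              For illustration, the sketch below declares two kinds for a hypothetical language
              mylang and then disables one of them. The language, the kinds, and the patterns
              are assumptions:

                 --langdef=mylang
                 --kinddef-mylang=f,function,functions
                 --kinddef-mylang=v,variable,variables
                 --regex-mylang=/^func[ \t]+([a-zA-Z_]+)/\1/f/
                 --regex-mylang=/^var[ \t]+([a-zA-Z_]+)/\1/v/
                 # Disable the variable kind by its letter.
                 --kinds-mylang=-v
                 # The same by its name would be: --kinds-mylang=-{variable}

              With these definitions, a tag for a function should carry f in the "kind" field,
              or function when the K field is enabled.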

       --regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a single-line regular expression.

              The /<line_pattern>/<name_pattern>/ pair defines a regular  expression  replacement
              pattern, similar in style to sed substitution commands, s/regexp/replacement/, with
              which to generate tags from source files mapped  to  the  named  language,  <LANG>,
              (case-insensitive; either a built-in or user-defined language).

              The  regular  expression,  <line_pattern>,  defines  an extended regular expression
              (roughly that used by egrep(1)), which is used  to  locate  a  single  source  line
              containing a tag and may specify tab characters using \t.

              When  a  matching  line  is  found, a tag will be generated for the name defined by
              <name_pattern>, which generally will contain the special back-references \1 through
              \9 to refer to matching sub-expression groups within <line_pattern>.

              The  '/'  separator characters shown in the parameter to the option can actually be
              replaced by any character. Note that whichever separator  character  is  used  will
              have  to  be  escaped  with  a backslash ('\') character wherever it is used in the
              parameter as something other than a separator. The regular  expression  defined  by
              this  option  is added to the current list of regular expressions for the specified
              language unless the parameter is  omitted,  in  which  case  the  current  list  is
              cleared.

              Unless modified by <flags>, <line_pattern> is interpreted as a POSIX extended
              regular expression. The <name_pattern> should expand for all matching lines to a
              non-empty string of characters, or a warning message will be reported unless the
              {placeholder} regex flag is specified.

              A kind specifier (<kind-spec>) for tags matching the regular expression may follow
              <name_pattern>; it determines what kind of tag is reported in the "kind" extension
              field (see tags(5)).

              <kind-spec> has two forms: one-letter form and full form.

              The one-letter form is written as <letter>. It simply refers to a kind <letter>
              defined with --kinddef-<LANG>. This form is recommended in Universal Ctags.

              The full form of <kind-spec> is written as <letter>,<name>,<description>. The kind
              <name>, the <description>, or both can be omitted. See the description of the
              --kinddef-<LANG>=<letter>,<name>,<description> option about the elements.

              The full form is supported only for compatibility with Exuberant Ctags, which does
              not have the --kinddef-<LANG> option. Support for the full form will be removed
              from Universal Ctags in the future.

              About <flags>, see "FLAGS FOR --regex-<LANG> OPTION".

              For  more  information  on  the  regular  expressions used by ctags, see either the
              regex(5,7) man page, or the GNU info documentation for regex (e.g. "info regex").
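
              As a sketch of the separator rule, the following hypothetical definition tags
              included paths that contain '/'. Using '%' as the separator avoids escaping '/'
              in the pattern. The language myinc, its extension, and the "include" syntax are
              assumptions:

                 --langdef=myinc
                 --map-myinc=+.minc
                 --kinddef-myinc=h,header,included headers
                 --regex-myinc=%^include[ \t]+<([a-z]+/[a-z]+\.h)>%\1%h%
                 # With '/' as the separator, the same pattern needs a backslash:
                 # --regex-myinc=/^include[ \t]+<([a-z]+\/[a-z]+\.h)>/\1/h/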

       --list-regex-flags
              Lists the flags that can be used in --regex-<LANG> option.

       --list-mline-regex-flags
              Lists the flags that can be used in --mline-regex-<LANG> option.

       --mline-regex-<LANG>=/<line_pattern>/<name_pattern>/<kind-spec>/[<flags>]
              Define a multi-line regular expression.

              This option is similar to --regex-<LANG> option except the pattern  is  applied  to
              the whole file’s contents, not line by line.

       --_echo=<message>
              Print <message> to the standard error stream. This is helpful for understanding
              (and debugging) the optlib loading feature of Universal Ctags.

       --_force-quit[=<num>]
              Exit immediately when this option is processed. If given, <num> is used as the
              exit status. The default is 0. This is helpful for debugging the optlib loading
              feature of Universal Ctags.
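
              As a sketch of how the two options can be combined, the hypothetical optlib file
              below reports its progress and stops before the patterns are defined. The
              messages and the language name mylang are assumptions:

                 --_echo=mylang.ctags: loading
                 --langdef=mylang
                 --map-mylang=+.mylang
                 --_echo=mylang.ctags: language and map defined
                 # ctags exits here with status 0; the lines below are not processed.
                 --_force-quit=0
                 --kinddef-mylang=f,function,functions
                 --regex-mylang=/^def[ \t]+([a-zA-Z_]+)/\1/f/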

   FLAGS FOR --regex-<LANG> OPTION
       You can specify more than one flag, <letter>|{<name>}, at the  end  of  --regex-<LANG>  to
       control how Universal Ctags uses the pattern.

       Exuberant Ctags uses a <letter> to represent a flag. In Universal Ctags, a <name>
       surrounded by braces (name form) can be used in addition to <letter>. The name form makes
       an optlib file easier to read.

       Most of the flags newly added in Universal Ctags have no one-letter representation; they
       have only the name form. --list-regex-flags lists all the flags.

       basic (one-letter form b)
              The pattern is interpreted as a POSIX basic regular expression.

       exclusive (one-letter form x)
              Skip testing the other patterns if a line matches this pattern. This is useful to
              avoid spending CPU time on parsing line comments.

       extend (one-letter form e)
              The pattern is interpreted as a POSIX extended regular expression (default).

       icase (one-letter form i)
              The regular expression is to be applied in a case-insensitive manner.

       placeholder
              Don't emit a tag captured with a regex pattern.  The replacement can  be  an  empty
              string.  See the following description of scope=... flag about how this is useful.
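
              A brief sketch of these flags in use, for a hypothetical language mylang whose
              kind f is assumed to be defined already:

                 # Skip comment lines without emitting a tag or testing later patterns.
                 --regex-mylang=/^[ \t]*#///{exclusive}{placeholder}
                 # Match "sub", "Sub", "SUB", ... by using the name form of the icase flag.
                 --regex-mylang=/^sub[ \t]+([a-z_]+)/\1/f/{icase}
                 # The one-letter form of the same flag would be:
                 # --regex-mylang=/^sub[ \t]+([a-z_]+)/\1/f/i

              The comment pattern is listed first on the assumption that patterns are tested
              in the order they are defined.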

       scope=(ref|push|pop|clear|set)
          Specify what to do with the internal scope stack.

          A  parser  programmed with --regex-<LANG> has a stack (scope stack) internally. You can
          use it for tracking scope information. The  scope=...  flag  is  for  manipulating  and
          utilizing the scope stack.

          If  {scope=push}  is  specified,  a  tag  captured with --regex-<LANG> is pushed to the
          stack. {scope=push} implies {scope=ref}.

          You can fill the scope field of a captured tag with {scope=ref}. If the {scope=ref}
          flag is given, ctags attaches the tag at the top of the stack to the tag captured with
          --regex-<LANG> as the value for the scope: field.

          ctags pops the tag at the top of the stack when a --regex-<LANG> pattern with
          {scope=pop} matches the input line.

          Specifying {scope=clear} removes all the tags from the scope stack. Specifying
          {scope=set} removes all the tags from the scope stack, and then pushes the captured tag
          as {scope=push} does.

          In  some  cases, you may want to use --regex-<LANG> only for its side effects: using it
          only to manipulate the stack but not  for  capturing  a  tag.  In  such  a  case,  make
          <name_pattern>  component of --regex-<LANG> option empty while specifying {placeholder}
          as a regex flag. For example, a non-named tag can be put on the stack by giving a regex
          flag "{scope=push}{placeholder}".

          You  may  wonder what happens if a regex pattern with {scope=ref} flag matches an input
          line but the stack is empty, or a non-named tag is at the top.  If  the  regex  pattern
          contains a {scope=ref} flag and the stack is empty, the {scope=ref} flag is ignored and
          nothing is attached to the scope: field.

          If the top of the stack contains an unnamed tag, ctags searches deeper into  the  stack
          to find the top-most named tag. If it reaches the bottom of the stack without finding a
          named tag, the {scope=ref} flag is ignored and nothing is attached to the scope: field.

          When a named tag on the stack is popped or cleared as the  side  effect  of  a  pattern
          matching,  ctags  attaches  the line number of the match to the end: field of the named
          tag.

          ctags clears all of the tags on the stack when it reaches the end of the  input  source
          file. The line number of the end is attached to the end: field of the cleared tags.
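
          The lookup through unnamed tags can be sketched with a hypothetical block language.
          The language blocks, its keywords, and the input are assumptions for illustration:

             --langdef=blocks
             --map-blocks=+.blk
             --kinddef-blocks=m,module,modules
             --kinddef-blocks=v,variable,variables
             --regex-blocks=/^module[ \t]+([a-zA-Z]+)/\1/m/{scope=push}
             --regex-blocks=/^[ \t]+begin$///{scope=push}{placeholder}
             --regex-blocks=/^[ \t]+var[ \t]+([a-zA-Z]+)/\1/v/{scope=ref}
             --regex-blocks=/^[ \t]*end$///{scope=pop}{placeholder}

          For an input such as

             module Outer
               begin
                 var counter
               end
             end

          the begin line pushes an unnamed tag, so by the rules above counter should get Outer
          as its scope, the first end pops the unnamed tag, and the last end pops Outer and
          sets its end: field.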

       warning=<message>
              Print the given <message> at WARNING level.

       fatal=<message>
              Print the given <message> and exit.

EXAMPLES

   Perl Pod
       This is the definition (pod.ctags) used in ctags for parsing Pod
       (https://perldoc.perl.org/perlpod.html) files.

          --langdef=pod
          --map-pod=+.pod

          --kinddef-pod=c,chapter,chapters
          --kinddef-pod=s,section,sections
          --kinddef-pod=S,subsection,subsections
          --kinddef-pod=t,subsubsection,subsubsections

          --regex-pod=/^=head1[ \t]+(.+)/\1/c/
          --regex-pod=/^=head2[ \t]+(.+)/\1/s/
          --regex-pod=/^=head3[ \t]+(.+)/\1/S/
          --regex-pod=/^=head4[ \t]+(.+)/\1/t/

   Using scope regex flags
       Let's think about writing a parser for a very small subset of the Ruby language.

       input source file (input.srb):

          class Example
            def methodA
                  puts "in class_method"
            end
            def methodB
                  puts "in class_method"
            end
          end

       The parser for the input should capture Example with the class kind, and methodA and
       methodB with the method kind. methodA and methodB should have Example as their scope. The
       end: field of each tag should have the proper value.

       optlib file (sub-ruby.ctags):

          --langdef=subRuby
          --map-subRuby=.srb
          --kinddef-subRuby=c,class,classes
          --kinddef-subRuby=m,method,methods
          --regex-subRuby=/^class[ \t]+([a-zA-Z][a-zA-Z0-9]+)/\1/c/{scope=push}
          --regex-subRuby=/^end///{scope=pop}{placeholder}
          --regex-subRuby=/^[ \t]+def[ \t]+([a-zA-Z][a-zA-Z0-9_]+)/\1/m/{scope=push}
          --regex-subRuby=/^[ \t]+end///{scope=pop}{placeholder}

       command line and output:

          $ ctags --quiet --fields=+eK \
          --options=./sub-ruby.ctags -o - input.srb
          Example input.srb       /^class Example$/;"     class   end:8
          methodA input.srb       /^  def methodA$/;"     method  class:Example   end:4
          methodB input.srb       /^  def methodB$/;"     method  class:Example   end:7

SEE ALSO

       The official Universal Ctags web site at:

       https://ctags.io/

       ctags(1), tags(5), regex(3), regex(7), egrep(1)

AUTHOR

       Universal Ctags project https://ctags.io/ (This man page is partially derived from
       ctags(1) of Exuberant Ctags)

       Darren Hiebert <dhiebert@users.sourceforge.net> http://DarrenHiebert.com/