oracular (7) ctags-universal-client-tools.7.gz

Provided by: universal-ctags_5.9.20210829.0-1_amd64 bug

NAME

       ctags-client-tools - Hints for developing a tool using ctags command and tags output

SYNOPSIS

       ctags [options] [file(s)]
       etags [options] [file(s)]

DESCRIPTION

       Client tool means a tool running the ctags command and/or reading a tags file generated by
       ctags command.  This man page gathers hints for people who develop client tools.

PSEUDO-TAGS

       Pseudo-tags, stored in a tag file, indicate how ctags generated the tags file: whether the
       tags  file  is  sorted or not, which version of tags file format is used, the name of tags
       generator, and so on. The opposite term for pseudo-tags is regular-tags. A regular-tag  is
       for  a  language object in an input file. A pseudo-tag is for the tags file itself. Client
       tools may use pseudo-tags as reference for processing regular-tags.

       A pseudo-tag is stored in a tags file in the same format as regular-tags as  described  in
       tags(5),  except that pseudo-tag names are prefixed with "!_". For the general information
       about pseudo-tags, see "TAG FILE INFORMATION" in tags(5).

       An example of a pseudo tag:

          !_TAG_PROGRAM_NAME      Universal Ctags /Derived from Exuberant Ctags/

       The value, "2", associated with the pseudo tag "TAG_PROGRAM_NAME", is used  in  the  field
       for  input file. The description, "Derived from Exuberant Ctags", is used in the field for
       pattern.

       Universal Ctags extends the naming  scheme  of  the  classical  pseudo-tags  available  in
       Exuberant Ctags for emitting language specific information as pseudo tags:

          !_{pseudo-tag-name}!{language-name}     {associated-value}      /{description}/

       The language-name is appended to the pseudo-tag name with a separator, "!".

       An example of pseudo tag with a language suffix:

          !_TAG_KIND_DESCRIPTION!C        f,function      /function definitions/

       This pseudo-tag says "the function kind of C language is enabled when generating this tags
       file." --pseudo-tags is the option for  enabling/disabling  individual  pseudo-tags.  When
       enabling/disabling   a   pseudo   tag   with   the  option,  specify  the  tag  name  only
       "TAG_KIND_DESCRIPTION", without the prefix ("!_") or the suffix ("!C").

   Options for Pseudo-tags
       --extras=+p (or --extras=+{pseudo})
              Forces writing pseudo-tags.

              ctags emits pseudo-tags by default when  writing  tags  to  a  regular  file  (e.g.
              "tags'.) However, when specifying -o - or -f - for writing tags to standard output,
              ctags doesn't  emit  pseudo-tags.  --extras=+p  or  --extras=+{pseudo}  will  force
              pseudo-tags to be written.

       --list-pseudo-tags
              Lists  available  types  of  pseudo-tags  and  shows  whether  they  are enabled or
              disabled.

              Running ctags with --list-pseudo-tags option lists available pseudo-tags.  Some  of
              pseudo-tags  newly  introduced  in Universal Ctags project are disabled by default.
              Use --pseudo-tags=... to enable them.

       --pseudo-tags=[+|-]names|*
              Specifies a list of pseudo-tag types to include in the output.

              The parameters are a set of pseudo tag names. Valid pseudo tag names can be  listed
              with   --list-pseudo-tags.  Surround  each  name  in  the  set  with  braces,  like
              "{TAG_PROGRAM_AUTHOR}". You don't have to include the "!_" pseudo tag  prefix  when
              specifying a name in the option argument for --pseudo-tags= option.

              pseudo-tags don't have a notation using one-letter flags.

              If  a  name is preceded by either the '+' or '-' characters, that tags's effect has
              been added or removed. Otherwise  the  names  replace  any  current  settings.  All
              entries are included if '*' is given.

       --fields=+E (or --fields=+{extras})
              Attach "extras:pseudo" field to pseudo-tags.

              An example of pseudo tags with the field:

                 !_TAG_PROGRAM_NAME      Universal Ctags /Derived from Exuberant Ctags/  extras:pseudo

              If  the  name  of a normal tag in a tag file starts with "!_", a client tool cannot
              distinguish whether the tag is a regular-tag or pseudo-tag.   The  fields  attached
              with this option help the tool distinguish them.

   List of notable pseudo-tags
       Running  ctags  with  --list-pseudo-tags  option lists available types of pseudo-tags with
       short descriptions. This subsection shows hints for using notable ones.

       TAG_EXTRA_DESCRIPTION (new in Universal Ctags)
              Indicates the names and descriptions of enabled extras:

                 !_TAG_EXTRA_DESCRIPTION       {extra-name}    /description/
                 !_TAG_EXTRA_DESCRIPTION!{language-name}       {extra-name}    /description/

              If your tool relies on some extra tags (extras), refer to the pseudo-tags  of  this
              type.  A  tool  can  reject the tags file that doesn't include expected extras, and
              raise an error in an early stage of processing.

              An example of the pseudo-tags:

                 $ ctags --extras=+p --pseudo-tags='{TAG_EXTRA_DESCRIPTION}' -o - input.c
                 !_TAG_EXTRA_DESCRIPTION       anonymous       /Include tags for non-named objects like lambda/
                 !_TAG_EXTRA_DESCRIPTION       fileScope       /Include tags of file scope/
                 !_TAG_EXTRA_DESCRIPTION       pseudo  /Include pseudo tags/
                 !_TAG_EXTRA_DESCRIPTION       subparser       /Include tags generated by subparsers/
                 ...

              A client tool can know "{anonymous}", "{fileScope}", "{pseudo}", and  "{subparser}"
              extras are enabled from the output.

       TAG_FIELD_DESCRIPTION (new in Universal Ctags)
              Indicates the names and descriptions of enabled fields:

                 !_TAG_FIELD_DESCRIPTION       {field-name}    /description/
                 !_TAG_FIELD_DESCRIPTION!{language-name}       {field-name}    /description/

              If  your tool relies on some fields, refer to the pseudo-tags of this type.  A tool
              can reject a tags file that doesn't include expected fields, and raise an error  in
              an early stage of processing.

              An example of the pseudo-tags:

                 $ ctags --fields-C=+'{macrodef}' --extras=+p --pseudo-tags='{TAG_FIELD_DESCRIPTION}' -o - input.c
                 !_TAG_FIELD_DESCRIPTION       file    /File-restricted scoping/
                 !_TAG_FIELD_DESCRIPTION       input   /input file/
                 !_TAG_FIELD_DESCRIPTION       name    /tag name/
                 !_TAG_FIELD_DESCRIPTION       pattern /pattern/
                 !_TAG_FIELD_DESCRIPTION       typeref /Type and name of a variable or typedef/
                 !_TAG_FIELD_DESCRIPTION!C     macrodef        /macro definition/
                 ...

              A  client tool can know "{file}", "{input}", "{name}", "{pattern}", and "{typeref}"
              fields are enabled from the  output.   The  fields  are  common  in  languages.  In
              addition  to the common fields, the tool can known "{macrodef}" field of C language
              is also enabled.

       TAG_FILE_ENCODING (new in Universal Ctags)
              TBW

       TAG_FILE_FORMAT
              See also tags(5).

       TAG_FILE_SORTED
              See also tags(5).

       TAG_KIND_DESCRIPTION (new in Universal Ctags)
              Indicates the names and descriptions of enabled kinds:

                 !_TAG_KIND_DESCRIPTION!{language-name}        {kind-letter},{kind-name}       /description/

              If your tool relies on some kinds, refer to the pseudo-tags of this type.   A  tool
              can reject the tags file that doesn't include expected kinds, and raise an error in
              an early stage of processing.

              Kinds are language specific, so a language name is  always appended to the tag name
              as suffix.

              An example of the pseudo-tags:

                 $ ctags --extras=+p --kinds-C=vfm --pseudo-tags='{TAG_KIND_DESCRIPTION}' -o - input.c
                 !_TAG_KIND_DESCRIPTION!C      f,function      /function definitions/
                 !_TAG_KIND_DESCRIPTION!C      m,member        /struct, and union members/
                 !_TAG_KIND_DESCRIPTION!C      v,variable      /variable definitions/
                 ...

              A  client  tool  can  know  "{function}",  "{member}",  and "{variable}" kinds of C
              language are enabled from the output.

       TAG_KIND_SEPARATOR (new in Universal Ctags)
              TBW

       TAG_OUTPUT_EXCMD (new in Universal Ctags)
              Indicates the specified type of EX command with --excmd option.

       TAG_OUTPUT_FILESEP (new in Universal Ctags)
              TBW

       TAG_OUTPUT_MODE (new in Universal Ctags)
              TBW

       TAG_PATTERN_LENGTH_LIMIT (new in Universal Ctags)
              TBW

       TAG_PROC_CWD (new in Universal Ctags)
              Indicates the working directory of ctags during processing.

              This pseudo-tag helps a client tool solve the absolute paths for  the  input  files
              for tag entries even when they are tagged with relative paths.

              An example of the pseudo-tags:

                 $ cat tags
                 !_TAG_PROC_CWD        /tmp/   //
                 main  input.c /^int main (void) { return 0; }$/;"     f       typeref:typename:int
                 ...

              From  the  regular  tag  for  "main",  the  client  tool  can know the "main" is at
              "input.c".  However, it is a relative path. So if the directory where ctags run and
              the directory where the client tool runs are different, the client tool cannot find
              "input.c" from the file system. In that case, TAG_PROC_CWD gives the tool  a  hint;
              "input.c" may be at "/tmp".

       TAG_PROGRAM_NAME
              TBW

       TAG_ROLE_DESCRIPTION (new in Universal Ctags)
              Indicates the names and descriptions of enabled roles:

                 !_TAG_ROLE_DESCRIPTION!{language-name}!{kind-name}    {role-name}     /description/

              If your tool relies on some roles, refer to the pseudo-tags of this type. Note that
              a role owned by a disabled kind is not listed even if the role itself is enabled.

REDUNDANT-KINDS

       TBW

MULTIPLE-LANGUAGES FOR AN INPUT FILE

       Universal ctags can run multiple parsers.  That means a parser,  which  supports  multiple
       parsers,  may  output  tags for different languages.  language/l field can be used to show
       the language for each tag.

          $ cat /tmp/foo.html
          <html>
          <script>var x = 1</script>
          <h1>title</h1>
          </html>
          $ ./ctags -o - --extras=+g /tmp/foo.html
          title   /tmp/foo.html   /^  <h1>title<\/h1>$/;" h
          x       /tmp/foo.html   /var x = 1/;"   v
          $ ./ctags -o - --extras=+g --fields=+l /tmp/foo.html
          title   /tmp/foo.html   /^  <h1>title<\/h1>$/;" h       language:HTML
          x       /tmp/foo.html   /var x = 1/;"   v       language:JavaScript

UTILIZING READTAGS

       See readtags(1) to know how to use readtags. This section is for discussing  some  notable
       topics for client tools.

   Build Filter/Sorter Expressions
       Certain  escape  sequences  in  expressions  are recognized by readtags. For example, when
       searching for a tag that matches a\?b, if using  a  filter  expression  like  '(eq?  $name
       "a\?b")',  since  \?  is  translated into a single ? by readtags, it actually searches for
       a?b.

       Another problem is if a single quote appear in filter expressions (which is  also  wrapped
       by  single  quotes),  it  terminates the expression, producing broken expressions, and may
       even cause unintended shell injection. Single quotes can be escaped using '"'"'.

       So, client tools need to:

       • Replace \ by \\

       • Replace ' by '"'"'

       inside the expressions. If the expression also contains strings, " in the strings needs to
       be replaced by \".

       Client tools written in Lisp could build the expression using lists. prin1 (in Common Lisp
       style Lisps) and write (in Scheme style Lisps) can translate the list into a  string  that
       can be directly used. For example, in EmacsLisp:

          (let ((name "hi"))
            (prin1 `(eq? $name ,name)))
          => "(eq\\? $name "hi")"

       The  "?"  is  escaped,  and  readtags  can  handle it. Scheme style Lisps should do proper
       escaping so the expression readtags gets is just the expression passed into write.  Common
       Lisp  style Lisps may produce unrecognized escape sequences by readtags, like \#. Readtags
       provides some aliases for these Lisps:

       • Use true for #t.

       • Use false for #f.

       • Use nil or () for ().

       • Use (string->regexp "PATTERN") for #/PATTERN/. Use (string->regexp "PATTERN"  :case-fold
         true)  for  #/PATTERN/i.  Notice that string->regexp doesn't require escaping "/" in the
         pattern.

       Notice that even when the client tool uses this method, ' still needs to  be  replaced  by
       '"'"' to prevent broken expressions and shell injection.

       Another  thing to notice is that missing fields are represented by #f, and applying string
       operators to them will produce an error. You should always check if  a  field  is  missing
       before  applying  string operators. See the "Filtering" section in readtags(1) to know how
       to do this. Run "readtags -H filter" to see which operators take string arguments.

   Parse Readtags Output
       In the output of readtags, tabs can appear in all field values (e.g., the tag name  itself
       could  contain  tabs),  which  makes  it  hard to split the line into fields. Client tools
       should use the -E option, which keeps the escape sequences in the tags file, so  the  only
       field that could contain tabs is the pattern field.

       The pattern field could:

       • Use a line number. It will look like number;" (e.g. 10;").

       • Use  a  search  pattern.  It will look like /pattern/;" or ?pattern?;".  Notice that the
         search pattern could contain tabs.

       • Combine these two, like number;/pattern/;" or number;?pattern?;".

       These are true for tags files using extended format, which is the default one.  The legacy
       format  (i.e.  --format=1) doesn't include the semicolons. It's old and barely used, so we
       won't discuss it here.

       Client tools could split the line using the following steps:

       • Find the first 2 tabs in the line, so we get the name and input field.

       • From the 2nd tab:

         • If a / follows, then the pattern delimiter is /.

         • If a ? follows, then the pattern delimiter is ?.

         • If a number follows, then:

           • If a ;/ follows the number, then the delimiter is /.

           • If a ;? follows the number, then the delimiter is ?.

           • If a ;" follows the number, then the field uses only line  number,  and  there's  no
             pattern  delimiter  (since there's no regex pattern). In this case the pattern field
             ends at the 3rd tab.

       • After the opening delimiter, find the next unescaped pattern delimiter, and  that's  the
         closing  delimiter.  It  will  be  followed by ;" and then a tab.  That's the end of the
         pattern field. By  "unescaped  pattern  delimiter",  we  mean  there's  an  even  number
         (including 0) of backslashes before it.

       • From here, split the rest of the line into fields by tabs.

       Then,  the  escape  sequences in fields other than the pattern field should be translated.
       See "Proposal" in tags(5) to know about all the escape sequences.

   Make Use of the Pattern Field
       The pattern field specifies how to find a tag in its source file. The code generating this
       field  seems  to  have  a  long history, so there are some pitfalls and it's a bit hard to
       handle. A client tool could simply require the  line:  field  and  jump  to  the  line  it
       specifies,  to  avoid  using  the pattern field. But anyway, we'll discuss how to make the
       best use of it here.

       You should take the words here merely as suggestions, and not  standards.  A  client  tool
       could definitely develop better (or simpler) ways to use the pattern field.

       From  the last section, we know the pattern field could contain a line number and a search
       pattern. When it only contains the line number, handling it is easy: you simply go to that
       line.

       The  search  pattern  resembles an EX command, but as we'll see later, it's actually not a
       valid one, so some manual work are required to process it.

       The search pattern could look like /pat/,  called  "forward  search  pattern",  or  ?pat?,
       called  "backward search pattern". Using a search pattern means even if the source file is
       updated, as long as the part containing the tag doesn't change, we could still locate  the
       tag correctly by searching.

       When  the  pattern  field  only  contains  the search pattern, you just search for it. The
       search direction (forward/backward) doesn't matter, as it's decided solely by whether  the
       -B  option  is enabled, and not the actual context. You could always start the search from
       say the beginning of the file.

       When both the search pattern and the line number are presented, you could make good use of
       the  line number, by going to the line first, then searching for the nearest occurrence of
       the pattern. A way to do this is to search both forward and backward for the pattern,  and
       when there is a occurrence on both sides, go to the nearer one.

       What's good about this is when there are multiple identical lines in the source file (e.g.
       the COMMON block in Fortran), this could help us find the  correct  one,  even  after  the
       source file is updated and the tag position is shifted by a few lines.

       Now  let's discuss how to search for the pattern. After you trim the / or ? around it, the
       pattern resembles a regex pattern. It should be a regex pattern, as required  by  being  a
       valid EX command, but it's actually not, as you'll see below.

       It  could  begin with a ^, which means the pattern starts from the beginning of a line. It
       could also end with an unescaped $ which means the pattern ends at  the  end  of  a  line.
       Let's keep this information, and trim them too.

       Now  the  remaining  part  is  the  actual  string containing the tag. Some characters are
       escaped:

       • \.

       • $, but only at the end of the string.

       • /, but only in forward search patterns.

       • ?, but only in backward search patterns.

       You need to unescape these to get the literal string. Now you could convert  this  literal
       string  to a regexp that matches it (by escaping, like re.escape in Python or regexp-quote
       in Elisp), and assemble it with ^ or $ if the  pattern  originally  has  it,  and  finally
       search for the tag using this regexp.

   Remark: About a Previous Format of the Pattern Field
       In  some  earlier versions of Universal Ctags, the line number in the pattern field is the
       actual line number minus one, for forward search  patterns;  or  plus  one,  for  backward
       search  patterns.  The  idea is to resemble an EX command: you go to the line, then search
       forward/backward for the pattern, and you can always find the correct one. But this denies
       the  purpose  of using a search pattern: to tolerate file updates. For example, the tag is
       at line 50, according to this scheme, the pattern field should be:

          49;/pat/;"

       Then let's assume that some code above are removed, and the tag is now at line 45. Now you
       can't find it if you search forward from line 49.

       Due  to  this  reason,  Universal Ctags turns to use the actual line number. A client tool
       could distinguish them by the TAG_OUTPUT_EXCMD pseudo tag,  it's  "combine"  for  the  old
       scheme, and "combineV2" for the present scheme. But probably there's no need to treat them
       differently, since "search for the nearest occurrence from the line" gives good result  on
       both schemes.

JSON OUTPUT

       Universal  Ctags  supports  JSON (strictly speaking JSON Lines) output format if the ctags
       executable is built with libjansson.  JSON output goes to standard output by default.

   Format
       Each JSON line represents a tag.

          $ ctags --extras=+p --output-format=json --fields=-s input.py
          {"_type": "ptag", "name": "JSON_OUTPUT_VERSION", "path": "0.0", "pattern": "in development"}
          {"_type": "ptag", "name": "TAG_FILE_SORTED", "path": "1", "pattern": "0=unsorted, 1=sorted, 2=foldcase"}
          ...
          {"_type": "tag", "name": "Klass", "path": "/tmp/input.py", "pattern": "/^class Klass:$/", "language": "Python", "kind": "class"}
          {"_type": "tag", "name": "method", "path": "/tmp/input.py", "pattern": "/^    def method(self):$/", "language": "Python", "kind": "member", "scope": "Klass", "scopeKind": "class"}
          ...

       A key not  starting  with  _  is  mapped  to  a  field  of  ctags.   "--output-format=json
       --list-fields" options list the fields.

       A  key starting with _ represents meta information of the JSON line.  Currently only _type
       key is used. If the value for the key is tag, the JSON line represents a  normal  tag.  If
       the value is ptag, the line represents a pseudo-tag.

       The  output format can be changed in the future. JSON_OUTPUT_VERSION pseudo-tag provides a
       change client-tools to handle the changes.  Current version is "0.0".  A  client-tool  can
       extract the version with path key from the pseudo-tag.

       The  JSON  output format is newly designed and has no limitation found in the default tags
       file format.

       • The values for kind key are represented in long-name flags.  No one-letter is here.

       • Scope names and scope kinds have distinguished keys:  scope  and  scopeKind.   They  are
         combined in the default tags file format.

   Data type used in a field
       Values  for  the  most  of all keys are represented in JSON string type.  However, some of
       them are represented in string, integer, and/or boolean type.

       "--output-format=json --list-fields" options show What kind of data type used in  a  field
       of JSON.

          $ ctags --output-format=json --list-fields
          #LETTER NAME           ENABLED LANGUAGE         JSTYPE FIXED DESCRIPTION
          F       input          yes     NONE             s--    no    input file
          ...
          P       pattern        yes     NONE             s-b    no    pattern
          ...
          f       file           yes     NONE             --b    no    File-restricted scoping
          ...
          e       end            no      NONE             -i-    no    end lines of various items
          ...

       JSTYPE column shows the data types.

       's'    string

       'i'    integer

       'b'    boolean (true or false)

       For an example, the value for pattern field of ctags takes a string or a boolean value.

SEE ALSO

       ctags(1), ctags-lang-python(7), ctags-incompatibilities(7), tags(5), readtags(1)