Provided by: html-xml-utils_7.7-1.1_amd64

NAME

       hxindex - insert an index into an HTML document

SYNOPSIS

       hxindex  [-t]  [-x]  [-n|-N]  [-f]  [-r]  [-c class[,class...]] [-b base] [-i indexdb] [-s
       template]  [-u  phrase]   [-O   element[,element...]]   [-X   element[,element...]]   [--]
       [file-or-URL]

DESCRIPTION

       The hxindex command looks for terms to be indexed in a document, collects them, turns them into
       target anchors and creates a sorted index as an HTML list, which is inserted at the  place
       of a placeholder in the document. The resulting document is written to standard output.

       The index is inserted at the place of a comment of the form

           <!--index-->

       or between two comments of the form

           <!--begin-index-->
           ...
           <!--end-index-->

       In the latter case, all existing content between the two comments is removed first.

       Index  terms  are  either  elements  of  type  <dfn> or elements with a class attribute of
       "index". (For backward compatibility, also class attributes "index-inst"  and  "index-def"
       are recognized.) <dfn> elements (and class "index-def") are considered more important than
       elements with class "index" and will appear in bold in the generated index.

       The option -c adds additional classes that are aliases for "index".

       By default, the contents of the element are  taken  as  the  index  term.   Here  are  two
       examples of occurrences of the index term "shoe":

           A <dfn>shoe</dfn> is a piece of clothing that...
           completed by a leather <span class="index">shoe</span>...

       If the term to be indexed is not equal to the contents of the element, the title attribute
       can be used to give the correct term:

           ... <dfn title="shoe">Shoes</dfn> are pieces of clothing that...
           ... with two leather <span class="index" title="shoe">shoes</span>...

       The title attribute must also be used when  the  index  term  is  a  subterm  of  another.
       Subterms  appear  indented in the index, under their head term. To define a subterm, use a
       title attribute with two exclamation marks ("!!") between the term and the  subterm,  like
       this:

           <dfn title="shoe!!leather">...</dfn>
           <dfn title="shoe!!invention of">...</dfn>
           <em class="index" title="shoe!!protective!!steel nosed">...</em>

       As the last example above shows, there can be multiple levels of sub-subterms.

       The  title  attribute  also  allows  multiple  index  terms to be associated with a single
       occurrence. The multiple terms are separated  with  a  vertical  bar  ("|").  Compare  the
       following examples with the ones above:

           <dfn title="shoe|boot">...</dfn>
           <dfn title="shoe!!invention of|inventions!!shoe">...</dfn>

       These  two  elements  both  insert  two terms into the index. Note that the second example
       above combines subterms and multiple terms.
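
       The title syntax described above can be illustrated with a short Python sketch. It is not
       taken from the hxindex source; the function name parse_index_title is hypothetical and the
       code only mirrors the documented rules: "|" separates independent entries and "!!"
       separates a term from its subterms.

```python
def parse_index_title(title):
    """Split a title attribute into index entries.

    Each entry is a list [term, subterm, sub-subterm, ...].
    Hypothetical helper: it follows the documented syntax only.
    """
    entries = []
    for entry in title.split("|"):        # "|" separates independent entries
        entries.append(entry.split("!!"))  # "!!" separates the levels of one entry
    return entries


print(parse_index_title("shoe!!invention of|inventions!!shoe"))
# prints [['shoe', 'invention of'], ['inventions', 'shoe']]
```

       Under these assumptions, a title without "!!" or "|" yields a single entry with a single
       level, which matches the plain title="shoe" case.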

       It is possible to run hxindex on a file that already has an index. The old target anchors
       and the old index will be removed before being re-generated.

OPTIONS

       The following options are supported:

       -t        By  default,  hxindex  adds  an  ID  attribute  to the element that contains the
                 occurrence of a term and also inserts an <a>  element  inside  it  with  a  name
                 attribute  equal  to  the  ID.  This  is  to  allow  old browsers that ignore ID
                 attributes, such as Netscape 4, to find  the  target  as  well.  The  -t  option
                 suppresses the <a> element.

       -x        This  option  turns  on  XML  syntax  conventions: empty elements will end in />
                 instead of > as in HTML.  -x implies -t.

       -i indexdb
                 hxindex can read an initial index from a file and write the merged collection of
                 index  terms  back to that file. This allows an index to span several documents.
                 The -i option is used to give the name of the file that contains the index.

       -b base   This option is useful in combination with -i to give the base URL  reference  of
                 the document. By default, hxindex will store links to occurrences in the indexdb
                 file in the form #anchor, but when  -b  is  given,  the  links  will  look  like
                 base#anchor instead.

                 When used in combination with -n, the title attributes of the links will contain
                 the title of the document that contains the term. The title is  inserted  before
                 the  template  (see  option  -s) and separated from it with a comma and a space.
                 E.g., if hxindex is called with

                     hxindex -i termdb -n -b myfile.html myfile.html

                 and the termdb already contains an entry for "foo" in section "3.1" of a
                 document  called  "file2.html"  with  title "The foos", then the generated index
                 will contain an entry such as this:

                     foo, <a href="file2.html#foo"
                       title="The foos, section 3.1">3.1</a>

       -c class,class,...
                 Normal index terms are recognized because they have a class of "index".  The  -c
                 option  adds  additional,  comma-separated  class  names that will be considered
                 aliases  for  "index".  E.g.,  -c   instance   will   make   sure   that   <span
                 class="instance">term</span> is recognized as a term for the index.

       -n        By default, the index consists of links with "#" as the anchor text. Option -n
                 causes the link text to consist of the section numbers of the sections in which
                 the terms occur, falling back to a placeholder phrase (see option -u below) if
                 no section number could be found. Section numbers are found by looking for the
                 nearest preceding start tag with a class of "secno" or "no-num". In the case of
                 "secno", the contents of that element are taken as the section number. In the
                 case of "no-num", the section is assumed to have no number and hxindex prints
                 the placeholder phrase instead. These classes are also used by hxnum(1), so it
                 is useful to run hxindex after hxnum, e.g.,

                     hxnum myfile.html | hxindex -n >mynewfile.html
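
                 The section lookup described for -n can be sketched in Python with the
                 standard html.parser module. This is an illustration of the documented rule,
                 not the hxindex implementation: the names SectionTracker and index_sections
                 are hypothetical, markup nested inside a term or a "secno" element is ignored
                 for brevity, and "??" is used as the fallback phrase (the documented default
                 of -u).

```python
from html.parser import HTMLParser


class SectionTracker(HTMLParser):
    """Pair each index term with the section number in effect where it occurs."""

    def __init__(self, unknown="??"):
        super().__init__()
        self.unknown = unknown   # phrase for terms without a section number (cf. -u)
        self.current = None      # number from the nearest preceding marker, if any
        self.capture = None      # [kind, tag, text parts] while inside a marker or term
        self.found = []          # collected (term, link text) pairs

    def handle_starttag(self, tag, attrs):
        if self.capture is not None:
            return               # simplification: ignore markup nested in a captured element
        classes = (dict(attrs).get("class") or "").split()
        if "secno" in classes:
            self.capture = ["secno", tag, []]
        elif "no-num" in classes:
            self.current = None  # this section explicitly has no number
        elif tag == "dfn" or "index" in classes:
            self.capture = ["term", tag, []]

    def handle_data(self, data):
        if self.capture is not None:
            self.capture[2].append(data)

    def handle_endtag(self, tag):
        if self.capture is None or tag != self.capture[1]:
            return
        kind, _, parts = self.capture
        text = "".join(parts).strip()
        self.capture = None
        if kind == "secno":
            self.current = text
        else:
            self.found.append((text, self.current or self.unknown))


def index_sections(html, unknown="??"):
    """Return (term, link text) pairs for a document, in document order."""
    parser = SectionTracker(unknown)
    parser.feed(html)
    parser.close()
    return parser.found


doc = ('<h2><span class="secno">3.1</span> Shoes</h2><p>A <dfn>shoe</dfn>.</p>'
       '<h2 class="no-num">Notes</h2><p>A <span class="index">boot</span>.</p>')
print(index_sections(doc))
# prints [('shoe', '3.1'), ('boot', '??')]
```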

       -N        With this option, the anchor text of the links in the index is the full title of
                 the section in which the term occurs. The title of the section  is  the  nearest
                 preceding  H1, H2, H3, H4, H5 or H6 element, or the document's title if there is
                 no preceding H* element. This option cannot be used together with -n.   If  both
                 are used, the last one specified wins.

       -s template
                 When  option  -n  is used, the link will have a title attribute and the template
                 determines what it contains. The default is "section %s",  where  the  %s  is  a
                 placeholder  for  the  section  number.  In  other words, the index will contain
                 entries like this:

                     term, <a href="#term" title="section 7.8">7.8</a>

                 Some examples:

                     hxindex -n -s 'chapter %s'
                     hxindex -n -s 'part %s'
                     hxindex -n -s 'hoofdstuk %s' -u 'zonder nummer'

                 This option is only useful in combination with -n.
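
                 How the template and, with -b, the document title combine into the title
                 attribute can be shown with a small Python sketch. The function link_title is
                 hypothetical and assumes the template contains exactly one %s; it only
                 reproduces the documented format "title, template".

```python
def link_title(secno, template="section %s", doc_title=None):
    """Build the title attribute text of an index link (hypothetical helper)."""
    text = template % secno            # fill the section number into the template
    if doc_title is not None:          # with -b and -n the document title comes first,
        text = doc_title + ", " + text  # separated by a comma and a space
    return text


print(link_title("7.8"))
# prints section 7.8
print(link_title("3.1", doc_title="The foos"))
# prints The foos, section 3.1
```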

       -u phrase When option -n is used to display  section  numbers,  references  for  which  no
                 section number can be found are shown as phrase instead. The default is "??".

                 This option is only useful in combination with -n.

       -f        Remove title attributes that were used for the index as well as the comments
                 that delimit the inserted index. This prevents browsers from displaying these
                 attributes. Note that hxindex cannot be run again on its own output if this
                 option is used. (Mnemonic: "freeze" or "final".)

       -r        Do not ignore trailing punctuation when sorting index terms. E.g., if two  terms
                 are written as

                     <dfn>foo,</dfn>... <span class=index>foo</span>

                 hxindex will normally ignore the comma and treat them as the same term, but with
                 -r, they are treated as different. This affects trailing commas (,),  semicolons
                 (;), colons (:), exclamation marks (!), question marks (?) and full stops (.).
                 A final full stop is never ignored if there are two or more in the term, to
                 protect abbreviations ("B.C.") and ellipses ("more..."). This does not affect
                 how the index term is printed (it is always printed as it appears in the  text),
                 only how it is compared to similar terms. (Mnemonic: "raw".)
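
                 The comparison rule can be sketched in Python as follows. The function
                 comparison_key is hypothetical and assumes that at most one trailing
                 character is dropped; the text above does not say how runs of punctuation
                 are handled, so treat that detail as an assumption.

```python
TRAILING = ",;:!?."


def comparison_key(term, raw=False):
    """Return the form of a term used for comparison (hypothetical helper).

    raw=True models option -r: the term is compared exactly as written.
    """
    if raw or not term or term[-1] not in TRAILING:
        return term
    if term[-1] == "." and term.count(".") >= 2:
        return term                    # keep "B.C." and "more..." intact
    return term[:-1]                   # assumption: drop a single trailing character


print(comparison_key("foo,") == comparison_key("foo"))
# prints True
print(comparison_key("foo,", raw=True) == comparison_key("foo", raw=True))
# prints False
```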

       -O element,element,...
                 If -O is present, only elements with the given names will be indexed. E.g.,

                     hxindex -O span,i,em

                 means  that  hxindex  will  only  look  for  class="index"  (and  other classes,
                 according to -c) on the elements span, i and em.  The argument of -O must  be  a
                 comma-separated  list  of  element  names.   Note  that this does not affect the
                 element dfn.  It will always be indexed as a defining instance.

       -X element,element,...
                 The option -X excludes the given elements from being indexed. E.g.,

                     hxindex -X ul,ol

                 makes sure that ul and ol  elements  are  not  indexed,  even  if  they  have  a
                 class="index"  attribute.  This  does  not  exclude  their  children  from being
                 indexed. E.g.,

                     <ul class=index>
                      <li class=index>foo
                      <li class=index>bar
                      <li>baz
                     </ul>

                 will add foo and bar to the index, but not the whole content of the  ul  element
                 (foo  bar  baz).   If  both  -O  and  -X are given and an element occurs in both
                 options, it will be excluded. E.g.,

                     hxindex -X p,h1,ul -O em,span,h1,h2

                 will cause hxindex to only look for class attributes on em, span and h2, because
                 h1 is excluded.
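
                 The interaction of -O and -X can be summarized in a short Python sketch. The
                 function class_is_considered is hypothetical; it models only the decision
                 whether class attributes on an element are looked at, and leaves out the
                 special treatment of dfn.

```python
def class_is_considered(element, only=None, exclude=()):
    """Decide whether class attributes on an element are examined.

    only    models the -O list (None when -O is absent).
    exclude models the -X list.
    """
    if element in exclude:             # -X wins when an element is in both lists
        return False
    if only is not None and element not in only:
        return False                   # -O given and element not listed
    return True


only = ["em", "span", "h1", "h2"]      # -O em,span,h1,h2
exclude = ["p", "h1", "ul"]            # -X p,h1,ul
print([e for e in ["em", "span", "h1", "h2", "p", "li"]
       if class_is_considered(e, only, exclude)])
# prints ['em', 'span', 'h2']
```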

OPERANDS

       The following operand is supported:

       file-or-URL
                 The  name of an HTML or XML file or the URL of one. If absent, or if the file is
                 "-", standard input is read instead.

EXIT STATUS

       The following exit values are returned:

       0         Successful completion.

       >0        An error occurred in parsing the HTML file.

ENVIRONMENT

       The input is assumed to be in UTF-8, but the current  locale  is  used  to  determine  the
       sorting  order  of  the  index  terms.  I.e.,  hxindex  looks  at  the LANG, LC_ALL and/or
       LC_COLLATE environment variables. See locale(1).

       To use a proxy to retrieve remote files,  set  the  environment  variables  http_proxy  or
       ftp_proxy.  E.g., http_proxy="http://localhost:8080/"

BUGS

       Assumes  UTF-8  as input. Doesn't expand character entities (apart from the standard ones:
       "&amp;", "&lt;", "&gt;" and "&quot;"). Instead, pipe the input through hxunent(1) and, if
       needed, asc2xml(1) to convert it to UTF-8.

       Remote  files  (specified  with  a  URL)  are currently only supported for HTTP. Password-
       protected files or files that depend on HTTP "cookies" are not handled. (You can use tools
       such as curl(1) or wget(1) to retrieve such files.)

       The accessibility of an index, even when generated with option -n, is poor.

SEE ALSO

       asc2xml(1),   hxnormalize(1),  hxnum(1),  hxprune(1),  hxtoc(1),  hxunent(1),  xml2asc(1),
       locale(1), UTF-8 (RFC 2279)