Provided by: lookup_1.08b-11build1_amd64

NAME

   lookup - interactive file search and display

SYNOPSIS

   lookup [ args ] [ file ...  ]

DESCRIPTION

   Lookup allows the quick interactive search of text files. It supports ASCII, JIS-ROMAN, and Japanese EUC
   Packed formatted text, and has an integrated romaji→kana converter.

THIS MANUAL

   Lookup is flexible for a variety of applications. This manual will, however,  focus  on  the  application  of
   searching  Jim Breen's edict (Japanese-English dictionary) and kanjidic (kanji database). Being familiar with
   the content and format of these files would be helpful. See the INFO section near the end of this manual  for
   information on how to obtain these files and their documentation.

OVERVIEW OF MAJOR FEATURES

   The following just mentions some major features to whet your appetite to actually read the whole manual (-:

   Romaji-to-Kana Converter
      Lookup can convert romaji to kana for you, even “on the fly” as you type.

   Fuzzy Searching
      Searches can be a bit “vague” or “fuzzy”, so that you'll be able to find “東京” even if you try to search
      for “ときょ” (the proper yomikata being “とうきょう”).

   Regular Expressions
      Uses powerful and expressive regular expressions for searching. One can easily specify complex searches
      to the effect of “I want lines that look like such-and-such, but not like this-and-that, but that also
      have this particular characteristic....”

   Wildcard ``Glob'' Patterns
      Optionally, can use well-known filename wildcard patterns instead of full-fledged regular expressions.

   Filters
      You can have lookup not list certain lines that would otherwise match your search, yet can optionally save
      them  for  quick review. For example, you could have all name-only entries from edict filtered from normal
      output.

   Automatic Modifications
      Similarly, you can do a standard search-and-replace on lines just before they  print,  perhaps  to  remove
      information  you  don't  care  to see on most searches. For example, if you're generally not interested in
      kanjidic's info on Chinese readings, you can have them removed from lines before printing.

   Smart Word-Preference Mode
      You can have lookup list only entries with whole words that match your search (as opposed to an embedded
      match, such as finding “the” inside “them”), but if no whole-word matches exist, it will go ahead and
      list any entry that matches the search.

   Handy Features
      Other handy features include a dynamically settable and parameterized prompt, automatic highlighting of
      that part of the line that matches your search, an output pager, readline-like input with horizontal
      scrolling for long input lines, a “.lookup” startup file, automated programmability, and much more. Read
      on!

REGULAR EXPRESSIONS

   Lookup  makes  liberal  use of regular expressions (or regex for short) in controlling various aspects of the
   searches. If you are not familiar with the important concepts of regexes, read the tutorial appendix of  this
   manual before continuing.

JAPANESE CHARACTER ENCODING METHODS

   Internally,  lookup works with Japanese packed-format EUC, and all files loaded must be encoded similarly. If
   you have files encoded in JIS or Shift-JIS, you must first convert them to EUC before loading (see  the  INFO
   section for programs that can do this).

   Interactive input and output encoding, however, may be selected via the -jis, -sjis, and -euc invocation
   flags (default is -euc), or by various commands to the program (described later).

   Make sure to use the encoding appropriate for your system. If you're using kterm under the X Window System,
   you can use lookup's -jis flag to match kterm's default JIS encoding. Or, you might use kterm's
   “-km euc” startup option (or menu selection) to put kterm into EUC mode. Also, I have found kterm's
   scrollbar (“-sb -sl 500”) to be quite useful.

   With many “English” fonts in Japan, the character that normally prints as a backslash (the half-width
   version of ＼) in the States appears as a yen symbol (the half-width version of ￥). How it will appear on
   your system is a function of what font you use and what output encoding method you choose, which may be
   different from the font and method that was used to print this manual (both of which may be different from
   what's printed on your keyboard's appropriate key). Make sure to keep this in mind while reading.

STARTUP

   Let's assume that your copy of edict is in ~/lib/edict. You can start the program simply with

           lookup ~/lib/edict

   You'll note that lookup spends some time building an index before the default “lookup> ” prompt appears.

   Lookup gains much of its search speed by constructing an index of the file(s) to be searched. Since building
   the index can be time consuming itself, you can have lookup write the built index to a file that can be
   quickly loaded the next time you run the program. Index files will be given a “.jin” (Jeffrey's Index)
   ending.

   Let's build the indices for edict and kanjidic now:

           lookup -write ~/lib/edict ~/lib/kanjidic

   This will create the index files
          ~/lib/edict.jin
          ~/lib/kanjidic.jin
   and exit.

   You can now restart lookup, automatically using the pre-computed index files, as:

          lookup ~/lib/edict ~/lib/kanjidic

   You should then be presented with the prompt without having to wait for the index to be constructed (but  see
   the section on Operating System concerns for possible reasons of delay).

INPUT

   There  are  basically  two  types of input: searches and commands.  Commands do such things as tell lookup to
   load more files or set flags. Searches report lines of a file that match some search specifier  (where  lines
   to search for are specified by one or more regular expressions).

   The  input  syntax  may perhaps at first seem odd, but has been designed to be powerful and concise. A bit of
   time invested to learn it well will pay off greatly when you need it.

BRIEF EXAMPLE

   Assuming you've started lookup with edict and kanjidic as noted above, let's try a few searches. In these
   examples, the
       “search [edict]> ”
   is the prompt. Note that the space after the ‘>’ is part of the prompt.

   Given the input:

     search [edict]> tranquil

   lookup will report all lines with the string “tranquil” in them. There are currently about a dozen such
   lines, two of which look like:

     安らか [やすらか] /peaceful (an)/tranquil/calm/restful/
     安らぎ [やすらぎ] /peace/tranquility/

   Notice that lines with “tranquil” and “tranquility” matched? This is because “tranquil” was embedded in the
   word “tranquility”. You could restrict the search to only the word “tranquil” by prepending the
   special “start of word” symbol ‘<’ and appending the special “end of word” symbol ‘>’ to the regex, as in:

     search [edict]> <tranquil>

   This is the regular expression that says “the beginning of a word, followed by a ‘t’, ‘r’, ..., ‘l’, which
   is at the end of a word.” The current version of edict has just three matching entries.
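
   As a rough illustration outside of lookup, the embedded-versus-whole-word distinction can be sketched in
   Python. This is only an analogy: Python's regex dialect spells both word-boundary markers as \b rather than
   ‘<’ and ‘>’, the helper name matches is made up for this sketch, and the two sample lines are the edict
   entries quoted above.

```python
import re

# Two edict entries quoted earlier in this section.
LINES = [
    "安らか [やすらか] /peaceful (an)/tranquil/calm/restful/",
    "安らぎ [やすらぎ] /peace/tranquility/",
]

def matches(pattern, lines):
    """Return the lines containing a case-insensitive match of pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if rx.search(line)]

embedded = matches(r"tranquil", LINES)        # also finds "tranquility"
whole_word = matches(r"\btranquil\b", LINES)  # Python analogue of <tranquil>
```

   Here embedded keeps both lines, while whole_word keeps only the first.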

   Let's try another:

     search [edict]> fukushima

   This is a search for the “English” fukushima -- ways to search for kana or kanji will be explored later.
   Note that among the several lines selected and printed are:

     副島 [ふくしま] /Fukushima (pn,pl)/
     木曽福島 [きそふくしま] /Kisofukushima (pl)/

   By default, searches are done in a case-insensitive manner -- ‘F’ and ‘f’ are treated the same by lookup, at
   least so far as the matching goes. This is called case folding.

   Let's give a command to turn this option off, so that ‘f’ and ‘F’ won't be considered the same. Here's an
   odd point about lookup's input syntax: the default setting is that all command lines must begin with a space.
   The space is the (default) command-introduction character and tells the input parser to expect a command
   rather than a search regular expression. It is a common mistake at first to forget the leading space when
   issuing a command. Be careful.

   Try the command “ fold” to report the current status of case-folding. Notice that as soon as you type the
   space, the prompt changes to
     “lookup command> ”
   as a reminder that now you're typing a command rather than a search specification.

     lookup command>  fold

   The reply should be “file #0's case folding is on”.

   You can actually turn it off with “ fold off”. Now try the search for “fukushima” again. Notice that this
   time the entries with “Fukushima” aren't listed? Now try the search string “Fukushima” and see that the
   entries with “fukushima” aren't listed.

   Case folding is usually very convenient (it also makes corresponding katakana and hiragana match  the  same),
   so don't forget to turn it back on:

     lookup command>  fold on
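
   To make the idea concrete, here is a minimal Python sketch of what case folding amounts to for both ASCII
   and kana. It is not lookup's implementation (which works on EUC bytes); it assumes Unicode text and relies
   on the fact that the katakana block sits a fixed distance above the hiragana block. The function name fold
   is chosen only to echo the command.

```python
def fold(text):
    """Lower-case ASCII letters and map katakana onto hiragana."""
    out = []
    for ch in text.lower():
        code = ord(ch)
        # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana twins.
        if 0x30A1 <= code <= 0x30F6:
            ch = chr(code - 0x60)
        out.append(ch)
    return "".join(out)
```

   Two strings then match under folding when their folded forms are equal, for example fold("Fukushima") and
   fold("fukushima"), or a katakana word and its hiragana spelling.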

JAPANESE INPUT

   Lookup has an automatic romaji→kana converter. A leading ‘/’ indicates that romaji is to follow. Try
   typing “/tokyo” and you'll see it convert to “/ときょ” as you type. When you hit return, lookup will list all
   lines that have a “ときょ” somewhere in them. Well, sort of. Look carefully at the lines which match. Among
   them (if you had case folding back on) you'll see:

     キリスト教 [キリストきょう] /Christianity/
     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     凸鏡 [とっきょう] /convex lens/

   The first one has “ときょ” in it (as “トきょ”, where the katakana “ト” matches in a case-insensitive manner to
   the hiragana “と”), but you might consider the others unexpected, since they don't have “ときょ” in them.
   They're close (“とうきょ” and “とっきょ”), but not exact. This is the result of lookup's “fuzzification”.
   Try the command “ fuzz” (again, don't forget the command-introduction space). You'll see that fuzzification
   is turned on. Turn it off with “ fuzz off” and try “/tokyo” (which will convert as you type) again. This
   time you only get the lines which have “ときょ” exactly (well, case folding is still on, so it might match
   katakana as well).

   In a fuzzy search, length of vowels is ignored -- “と” is considered the same as “とう”, for example. Also,
   the presence or absence of any “っ” character is ignored, and the pairs じ ぢ, ず づ, え ゑ, and お を are
   considered identical in a fuzzy search.
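
   The equivalences just listed can be illustrated with a small Python sketch. Note the swapped-in technique:
   lookup itself expands the search pattern into a looser regex, whereas this sketch reduces each string to a
   “pronunciation key” and compares keys. The names vowel_of and pronunciation_key are hypothetical, only
   hiragana is handled, and the long-vowel rule is a simplification assumed for the example.

```python
import unicodedata

# Pairs the manual says are treated as identical in a fuzzy search.
EQUIVALENT = str.maketrans({"ぢ": "じ", "づ": "ず", "ゑ": "え", "を": "お"})

def vowel_of(kana):
    """Final vowel of a kana, read off its Unicode name ('' if none)."""
    name = unicodedata.name(kana, "")
    return name[-1] if name and name[-1] in "AIUEO" else ""

def pronunciation_key(kana):
    """Reduce a hiragana string to a key that ignores vowel length and small tsu."""
    out = []
    for ch in kana.translate(EQUIVALENT):
        if ch in "っー":
            continue  # small tsu and the long-vowel bar are ignored
        if out and ch in "あいうえお":
            prev = vowel_of(out[-1])
            # A vowel repeating the previous sound, or う after an o-sound,
            # only lengthens that sound, so it is dropped.
            if prev == vowel_of(ch) or (prev == "O" and ch == "う"):
                continue
        out.append(ch)
    return "".join(out)
```

   Under this sketch, とうきょう, とっきょう, and ときょ all reduce to the same key, which is why a search typed
   as “/tokyo” can find 東京.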

   It might be convenient to consider a fuzzy search to be a “pronunciation search”. Special note:
   fuzzification will not be performed if a regular expression “*”, “+”, or “?” modifies a non-ASCII character.
   This is not an issue when input patterns are filename-like wildcard patterns (discussed below).

   In addition to kana fuzziness, there's one special case for kanji when fuzziness is on. The kanji repeater
   mark “々” will be recognized such that “時々” and “時時” will match each other.

   Turn fuzzification back on (“fuzz on”), and search for all whole words which sound like “tokyo”. That
   search would be specified as:

     search [edict]> /<tokyo>

   (again, the “tokyo” will be converted to “ときょ” as you type). My copy of edict has the three lines

     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     特許 [とっきょ] /special permission/patent/
     凸鏡 [とっきょう] /convex lens/

   This kind of whole-word romaji-to-kana search is so common, there's a special shortcut. Instead of
   typing “/<tokyo>”, you can type “[tokyo]”. The leading ‘[’ means “start romaji” and “start of word”.
   Were you to type “<tokyo>” instead (without a leading ‘/’ or ‘[’ to indicate romaji-to-kana conversion), you
   would get all lines with the English whole-word “tokyo” in them. That would be a reasonable request as well,
   but not what we want at the moment.

   Besides the kana conversion, you can use any cut-and-paste that your windowing system might provide to get
   Japanese text onto the search line. Cut “ときょ” from somewhere and paste onto the search line. When hitting
   enter to run the search, you'll notice that it is done without fuzzification (even if the fuzzification flag
   was “on”). That's because there's no leading ‘/’. Not only does a leading ‘/’ indicate that you want the
   romaji-to-kana conversion, but that you want it done fuzzily.

   So, if you'd like fuzzy cut-and-paste, just type a leading ‘/’ before pasting (or go back and prepend one
   after pasting).

   These examples have all been pretty simple, but you can use all the power that regexes have to offer. As a
   slightly more complex example, the search “<gr[ea]y>” would look for all lines with the
   words “grey” or “gray” in them. Since the ‘[’ isn't the first character of the line, it doesn't mean what
   was mentioned above (start-of-word romaji). In this case, it's just the regular-expression “class”
   indicator.

   If you feel more comfortable using filename-like “*.txt” wildcard patterns, you can use the
   “wildcard on” command to have patterns be considered this way.

   This has been a quick introduction to the basics of lookup.

   It can be very powerful and much more complex. Below is a detailed  description  of  its  various  parts  and
   features.

READLINE INPUT

   The  actual keystrokes are read by a readline-ish package that is pretty standard. In addition to just typing
   away, the following keystrokes are available:

     ^B  / ^F     move left/right one character on the line
     ^A  / ^E     move to the start/end of the line
     ^H  / ^G     delete one character to the left/right of the cursor
     ^U  / ^K     delete all characters to the left/right of the cursor
     ^P  / ^N     previous/next lines on the history list
     ^L or ^R     redraw the line
     ^D           delete char under the cursor, or EOF if line is empty
     ^space       force romaji conversion (^@ on some systems)

   If automatic romaji-to-kana conversion is turned on (as it is by default), there are certain situations where
   the  conversion  will be done, as we saw above. Lower-case romaji will be converted to hiragana, while upper-
   case romaji to katakana.  This usually won't matter, though, as case folding will treat hiragana and katakana
   the same in the searches.

   In  exactly what situations the automatic conversion will be done is intended to be rather intuitive once the
   basic idea is learned.  However, at any time, one can use control-space to convert the ASCII to the  left  of
   the  cursor to kana. This can be particularly useful when needing to enter kana on a command line (where auto
   conversion is never done; see below)

ROMAJI FLAVOR

   Most flavors of romaji are recognized. Special or  non-obvious  items  are  mentioned  below.  Lowercase  are
   converted to hiragana, uppercase to katakana.

   Long vowels can be entered by repeating the vowel, or with ‘-’ or ‘^’.

   In situations where an “n” could be vague, as in “na” being な or んあ, use a single quote to force ん.
   Therefore, 「kenichi」→けにち while 「ken'ichi」→けんいち.

   The romaji has been richly extended with many non-standard combinations such as ふぁ or ちぇ, which are
   represented in intuitive ways: 「fa」→ふぁ, 「che」→ちぇ, etc.

   Various other mappings of interest:

     wo →を     we→ゑ      wi→ゐ
     VA →ヴァ   VI→ヴィ    VU→ヴ      VE→ヴェ    VO→ヴォ
     di →ぢ     dzi→ぢ     dya→ぢゃ   dyu→ぢゅ   dyo→ぢょ
     du →づ     tzu→づ     dzu→づ

   (the following kana are all smaller versions of the regular kana)

     xa →ぁ     xi→ぃ      xu→ぅ      xe→ぇ      xo→ぉ
     xu →ぅ     xtu→っ     xwa→ゎ     xka→ヵ     xke→ヶ
     xya→ゃ     xyu→ゅ     xyo→ょ
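
   The conversion rules in this section can be sketched as a greedy longest-match table lookup. The Python
   below is an illustration under stated assumptions, not lookup's converter: the table ROMAJI is a tiny
   hand-picked subset, the function name to_kana is hypothetical, and upper-case handling simply shifts
   hiragana into the katakana block.

```python
# Tiny illustrative subset of a romaji table; a real one is far larger.
ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ke": "け", "ni": "に", "chi": "ち", "to": "と", "kyo": "きょ",
    "fa": "ふぁ", "che": "ちぇ", "wo": "を", "xtu": "っ",
    "n'": "ん", "n": "ん",
}

def to_kana(romaji):
    """Greedy longest-match conversion; unknown characters pass through."""
    out, i = [], 0
    while i < len(romaji):
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            kana = ROMAJI.get(chunk.lower())
            if kana and len(chunk) == size:
                if chunk.isupper():
                    # Upper-case romaji yields katakana (hiragana + 0x60).
                    kana = "".join(chr(ord(k) + 0x60) for k in kana)
                out.append(kana)
                i += size
                break
        else:
            out.append(romaji[i])  # no table entry: copy the character as-is
            i += 1
    return "".join(out)
```

   With this table, “kenichi” becomes けにち because “ni” is consumed as one unit, while the single quote in
   “ken'ichi” forces ん and yields けんいち.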

INPUT SYNTAX

   Any  input  line beginning with a space (or whichever character is set as the command-introduction character)
   is processed as a command to lookup rather than a search spec.  Automatic kana conversion is  never  done  on
   these lines (but forced conversion with control-space may be done at any time).

   Other lines are taken as search regular expressions, with the following special cases:

   ?  A line consisting of a single question mark will report the current command-introduction character (the
      default is a space, but can be changed with the “cmdchar” command).

   =  If a line begins with ‘=’, the line (without the ‘=’) is taken as a search regular expression, and no
      automatic (or internal -- see below) kana conversion is done anywhere on the line (although again,
      conversion can always be forced with control-space). This can be used to initiate a search where the
      beginning of the regex is the command-introduction character, or in certain situations where automatic
      kana conversion is temporarily not desired.

   /  A line beginning with ‘/’ indicates romaji input for the whole line. If automatic kana conversion is
      turned on, the conversion will be done in real-time, as the romaji is typed. Otherwise it will be done
      internally once the line is entered. Regardless, the presence of the leading ‘/’ indicates that any kana
      (either converted or cut-and-pasted in) should be “fuzzified” if fuzzification is turned on.

      As an addition to the above, if the line doesn't begin with ‘=’ or the command-introduction character (and
      automatic conversion is turned on), ‘/’ anywhere on the line initiates automatic conversion for the
      following word.

   [  A line beginning with ‘[’ is taken to be romaji (just as a line beginning with ‘/’ is), and the converted
      romaji is subject to fuzzification (if turned on). However, if ‘[’ is used rather than ‘/’, an
      implied ‘<’ “beginning of word” is prepended to the resulting kana regex. Also, any ending ‘]’ on such a
      line is converted to the “ending of word” specifier ‘>’ in the resulting regex.

   In addition to the above, lines may have certain prefixes and suffixes to control aspects of  the  search  or
   command:

   !  Various flags can be toggled for the duration of a particular search by prepending a “!!” sequence to the
      input line.

      Sequences are shown below, along with commands related to each:

       !F! …  Filtration is toggled for this line (filter)
       !M! …  Modification is toggled for this line (modify)
       !w! …  Word-preference mode is toggled for this line (word)
       !c! …  Case folding is toggled for this line (fold)
       !f! …  Fuzzification is toggled for this line (fuzz)
       !W! …  Wildcard-pattern mode is toggled for this line (wildcard)
       !r! …  Raw. Force fuzzification off for this line
       !h! …  Highlighting is toggled for this line (highlight)
       !t! …  Tagging is toggled for this line (tag)
       !d! …  Displaying is on for this line (display)

      The letters can be combined, as in “!cf!”.

      The final ‘!’ can be omitted if the first character after the sequence is not an ASCII letter.

      If no letters are given (“!!”), “!f!” is the default.

      These last two points can be conveniently combined in the common case of “!/romaji”, which would be the
      same as “!f!/romaji”.

      The special sequence “!?” lists the above, and also indicates which are currently turned on.

      Note that the letters accepted in a “!!” sequence are many of the indicators shown by the “files” command.

   +  A ‘+’ prepended to anything above will cause the final search regex to be printed. This can be useful to
      see when and what kind of fuzzification and/or internal kana conversion is happening. Consider:

        search [edict]> +/わかる
        a match is “わ[ぁあー]*っ?か[ぁあー]*る[ぅうおぉー]*”

      Due to the leading ‘/’, the kana is fuzzified, which explains the somewhat complex resulting regex. For
      comparison, note:

        search [edict]> +わかる
        a match is “わかる”
        search [edict]> +!/わかる
        a match is “わかる”

      As the ‘+’ shows, these are not fuzzified. The first one has no leading ‘/’ or ‘[’ to induce
      fuzzification, while the second has the ‘!’ line prefix (which is the default version of “!f!”), which
      toggles fuzzification mode to “off” for that line.

   ,  The default of all searches and most commands is to work with the first file loaded (edict in these
      examples). One can change this default (see the “select” command) or, by appending a comma+digit sequence
      at the end of an input line, force that line to work with another previously-loaded file. An
      appended “,1” works with the first extra file loaded (in these examples, kanjidic). An appended “,2” works
      with the 2nd extra file loaded, etc.

      An appended “,0” works with the original first file (and can be useful if the default file has been
      changed via the “select” command).

      The following sequence shows a common usage:

        search [edict]> [ときょと]
        東京都 [とうきょうと] /Tokyo Metropolitan area/

      cutting and pasting the 都 from above, and adding a “,1” to search kanjidic:

        search [edict]> 都,1
        都 4554 N4769 S11  ..... ト ツ みやこ {metropolis} {capital}
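
   The rules of this section can be summarized as a small parser. The Python sketch below is one possible
   reading of the text, not lookup's actual parser: the function name parse_line, the dictionary keys, and the
   order in which prefixes are peeled off (‘+’, then the ‘!’ sequence, then ‘=’, ‘/’, or ‘[’) are assumptions
   made for the example, and romaji conversion itself is left out.

```python
import re

TOGGLES = set("FMwcfWrhtd")  # flag letters from the table in this section

def parse_line(line, cmdchar=" "):
    """Split one input line into the pieces described in this section."""
    info = {"kind": "search", "show_regex": False, "toggles": "", "slot": None}
    # A trailing ",N" selects a previously loaded file for this line only.
    m = re.search(r",(\d)$", line)
    if m:
        info["slot"] = int(m.group(1))
        line = line[:m.start()]
    if line == "?":
        info["kind"] = "report-cmdchar"
    elif line.startswith(cmdchar):
        info["kind"] = "command"
        line = line[len(cmdchar):]
    else:
        if line.startswith("+"):          # print the final regex
            info["show_regex"] = True
            line = line[1:]
        if line.startswith("!"):          # per-line flag toggles
            i = 1
            while i < len(line) and line[i] in TOGGLES:
                i += 1
            info["toggles"] = line[1:i] or "f"   # "!!" defaults to "!f!"
            if i < len(line) and line[i] == "!":
                i += 1                    # the closing "!" is optional
            line = line[i:]
        if line.startswith("="):
            info["kind"] = "raw-search"
            line = line[1:]
        elif line.startswith("/"):
            info["kind"] = "romaji-search"
            line = line[1:]
        elif line.startswith("["):
            info["kind"] = "romaji-word-search"
            line = "<" + line[1:]         # implied start-of-word
            if line.endswith("]"):
                line = line[:-1] + ">"    # trailing "]" becomes end-of-word
    info["body"] = line
    return info
```

   For instance, parse_line("+!/tokyo") reports a romaji search with the regex echoed and fuzzification
   toggled, and parse_line("[tokyo]") yields the body “<tokyo>”.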

FILENAME-LIKE WILDCARD MATCHING

   When wildcard-pattern mode is selected, patterns are considered as extended “*.txt”-like patterns. This
   is often more convenient for users not familiar with regular expressions. To have this mode selected by
   default, put

      default wildcard on

   into your “.lookup” file (see “STARTUP FILE” below).

   When wildcard mode is on, only “*”, “?”, “+”, and “.” are affected. See the entry for the
   “wildcard” command below for details.

   Other features, such as the multiple-pattern searches (described below) and other regular-expression
   metacharacters, are available.

MULTIPLE-PATTERN SEARCHES

   You can put multiple patterns in a single search specifier.  For example consider

     search [edict]> china||japan

   The first part (“china”) will select all lines that have “china” in them. Then, from among those lines, the
   second part will select lines that have “japan” in them. The “||” is not part of any pattern -- it is
   lookup's “pipe” mechanism.

   The above example is very different from the single pattern “china|japan”, which would select any line that
   had either “china” or “japan”. With “china||japan”, you get lines that have “china” and then also
   have “japan” as well.

   Note that it is also different from the regular expression “china.*japan” (or the wildcard
   pattern “china*japan”), which would select lines having “china, then maybe some stuff, then japan”. But
   consider the case when “japan” comes on the line before “china”. Just for your comparison, the
   multiple-pattern specifier “china||japan” is pretty much the same as the single regular
   expression “china.*japan|japan.*china”.

   If you use “|!|” instead of “||”, it will mean “...and then lines not matching...”.

   Consider a way to find all lines of kanjidic that do have a Halpern number, but don't have a Nelson number:

       search [edict]> <H\d+>|!|<N\d+>

   If you then wanted to restrict the listing to those that also had a “jinmeiyou” marking
   (kanjidic's “G9” field) and had a reading of あき, you could make it:

       search [edict]> <H\d+>|!|<N\d+>||<G9>||<あき>

   A prepended ‘+’ would explain:

       a match is “<H\d+>”
       and not “<N\d+>”
       and “<G9>”
       and “<あき>”

   The “|!|” and “||” can be used to make up to ten separate regular expressions in any one search
   specification.

   Again, it is important to stress that “||” does not mean “or” (as it does in a C program, or as ‘|’ does
   within a regular expression). You might find it convenient to read “||” as “and also”, while
   reading “|!|” as “but not”.

   It is also important to stress that any whitespace around the “||” and “|!|” constructs is not ignored, but
   kept as part of the regex on either side.
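
   The “and also” / “but not” reading can be sketched as a chain of filters. The Python below is an
   illustration only: the names SEP, parse_spec, and search are invented for the sketch, patterns use Python's
   regex dialect (so \b stands in for ‘<’ and ‘>’), the ten-expression limit is not enforced, and whitespace
   around the separators is kept as part of each pattern, as described above.

```python
import re

# "|!|" must be tried before "||" so that it is not split in the wrong place.
SEP = re.compile(r"(\|!\||\|\|)")

def parse_spec(spec):
    """Split a search spec into (must_match, pattern) stages."""
    parts = SEP.split(spec)
    stages = [(True, parts[0])]
    for sep, pattern in zip(parts[1::2], parts[2::2]):
        stages.append((sep == "||", pattern))   # "|!|" means "but not"
    return stages

def search(lines, spec, fold=True):
    """Keep the lines that satisfy every stage of the spec."""
    flags = re.IGNORECASE if fold else 0
    stages = [(want, re.compile(p, flags)) for want, p in parse_spec(spec)]
    return [
        line for line in lines
        if all(bool(rx.search(line)) == want for want, rx in stages)
    ]
```

   A line such as “japan then china” passes search(lines, "china||japan") even though it would not match the
   single regex “china.*japan”.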

COMBINATION SLOTS

   Each file, when loaded, is assigned to a “slot” via which subsequent references to the file are then made.
   The slot may then be searched, have filters and flags set, etc.

   A special kind of slot, called a “combination slot”, rather than representing a single file, can represent
   multiple previously-loaded slots. Searches against a combination slot (or “combo slot” for short) search all
   those previously-loaded slots associated with it (called “component slots”). Combo slots are set up with
   the combine command.

   A combo slot has no filter or modify spec, but can have a local prompt and flags just like normal file slots.
   The flags, however, have special meanings with combo slots. Most combo-slot flags act as a mask against the
   component-slot flags; when acted upon as a member of the combo, a component slot's flag will be disabled if
   the corresponding combo-slot flag is disabled.

   Exceptions to this are the autokana, fuzz, and tag flags.

   The autokana and fuzz flags govern a combo slot exactly the same as a regular file slot. When a slot is
   searched as a component of a combination slot, the component slot's fuzz (and autokana) flags, or lack
   thereof, are ignored.

   The tag flag is quite different altogether; see the tag command for complete information.

   Consider the following output from the files command:

     ┏━━┳━━━━━━━━┯━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     ┃ 0┃F wcfh d│a I ┃ 2762k┃/usr/jfriedl/lib/edict
     ┃ 1┃FM cf  d│a I ┃  705k┃/usr/jfriedl/lib/kanjidic
     ┃ 2┃F  cfh@d│a   ┃    1k┃/usr/jfriedl/lib/local.words
     ┃*3┃FM cfhtd│a   ┃ combo┃kotoba (#2, #0)
     ┗━━┻━━━━━━━━┷━━━━┻━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━

   See the discussion of the files command below for basic explanation of the output.

   As can be seen, slot #3 is a combination slot with the name “kotoba” with component slots two and zero. When
   a search is initiated on this slot, first slot #2 “local.words” will be searched, then slot #0 “edict”.
   Because the combo slot's filter flag is on, the component slots' filter flag will remain on during the
   search. The combo slot's word flag is off, however, so slot #0's word flag will be forced off during the
   search.
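
   The masking behaviour can be expressed compactly. The Python sketch below is an assumption-laden
   illustration: the text only says that “most” flags are masked, so the exact membership of MASKED is a
   guess based on the flag names used in this manual, the tag flag is left out because it behaves
   differently, and the function name effective_flags is hypothetical.

```python
# Flags assumed (for this sketch) to be masked by the combo slot.
MASKED = ("filter", "modify", "word", "fold", "highlight", "display")
# Flags the manual says are taken from the combo slot alone.
COMBO_ONLY = ("autokana", "fuzz")

def effective_flags(component, combo):
    """Flags in force while `component` is searched as part of `combo`."""
    flags = {
        name: component.get(name, False) and combo.get(name, False)
        for name in MASKED
    }
    # The component slot's own autokana/fuzz settings are ignored.
    flags.update({name: combo.get(name, False) for name in COMBO_ONLY})
    return flags

# Mirrors the example above: edict has word on, the combo has it off.
edict = {"filter": True, "word": True, "fold": True, "fuzz": False}
kotoba = {"filter": True, "word": False, "fold": True, "fuzz": True}
in_force = effective_flags(edict, kotoba)
```

   In this example in_force has filter on, word forced off, and fuzz taken from the combo slot.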

   See the combine command for information about creating combo slots.

PAGER

   Lookup has a built-in pager (à la more). Upon filling a screen with text, the string
       --MORE [space,return,c,q]--
   is shown. A space will allow another screen of text; a return will allow one more line. A ‘c’ will allow
   output text to continue unpaged until the next command. A ‘q’ will flush output of the current command.

   If supported by the OS, lookup's idea of the screen size is automatically set upon startup and window resize.
   Lookup must know the width of the screen both for the horizontal input-line scrolling and for knowing
   when a long line wraps on the screen.

   The pager parameters can be set manually with the “pager” command.

COMMANDS

   Any line intended to be a command must begin with the command-introduction character (the default is a space,
   but can be set via the “cmdchar” command). However, that character is not part of the command itself and
   won't be shown in the following list of commands.

   There are a number of commands that work with the selected file or selected slot (both meaning the same
   thing). The selected file is the one indicated by an appended comma+digit, as mentioned above. If no such
   indication is given, the default selected file is used (usually the first file loaded, but can be changed
   with the “select” command).

   Some commands accept a boolean argument, such as to turn a flag on or off. In all such cases,
   a “1” or “on” means to turn the flag on, while a “0” or “off” is used to turn it off. Some flags are
   per-file (“fuzz”, “fold”, etc.), and a command to set such a flag normally sets the flag for the selected
   file only. However, the default value inherited by subsequently loaded files can be set by
   prepending “default” to the command. This is particularly useful in the startup file before any files are
   loaded (see the section STARTUP FILE).

   Items separated by ‘|’ are mutually exclusive possibilities (i.e. a boolean argument is “1|on|0|off”).

   Items shown in brackets (‘[’ and ‘]’) are optional. All commands that accept a boolean argument to set a
   flag or mode do so optionally -- with no argument the command will report the current status of the mode or
   flag.

   Any command that allows an argument in quotes (such as load, etc.) allows the use of single or double quotes.

   The commands:

   [default] autokana [boolean]
      Automatic romaji → kana conversion for the selected file is turned on or off (default is on). However,
      if “default” is specified, the value to be inherited as the default by subsequently-loaded files is set
      (or reported).

      Can be temporarily disabled by a prepended ‘=’, as described in the INPUT SYNTAX section.

   clear|cls
      Attempts to clear the screen. If you're using a kterm it'll just output the appropriate tty control
      sequence. Otherwise it'll try to run the “clear” command.

   cmdchar ['one-byte-char']
      The default command-introduction character is a space, but it may be changed via this command. The  single
      quotes surrounding the character are required. If no argument is given, the current value is printed.

      An  input line consisting of a single question mark will also print the current value (useful for when you
      don't know the current value).

      Woe to the one that sets the command-introduction character to one of the other special input-line
      characters, such as ‘+’, ‘/’, etc.

   combine ["name"] [ num += ] slotnum ...
      Creates or adds file slots to a combination slot (see the COMBINATION SLOTS section for general
      information). Note that “combo” may be used as the command as well.

      Assuming for this example that slots 0-2 are loaded with the files curly, moe, and larry, we can create  a
      combination slot that will reference all three:

        combo "three stooges" 2, 0, 1

      The command will report

        creating combo slot #3 (three stooges): 2 0 1

      The name is optional, and will appear in the files list, and may also be used to specify the slot as an
      argument to the select command.

      A search via the newly created combo slot would search in the order specified on the combo  command  line:
      first larry, then curly, and finally moe.

      If  you  later  load  another  file  (say, jeffrey to slot #4), you can then add it to the previously made
      combo:

        combo 3 += 4

      (the “+=” wording comes from the C programming language where it means “add on to”). Adding to a
      combination always adds slots to the end of the list.

      You can take the opportunity of adding the slot to also change the name, if you like:

        combo "four stooges" 3 += 4

      The reply would be
        adding to combo slot #3(four stooges): 4

      A file slot can be a component of any particular combo slot only once. When reporting the created or
      added slot numbers, a number will appear in parentheses if it had already been a member of the list.

      Furthermore, only file slots can be component members of combo slots. Attempting to add combo slot X
      to combo slot Y will result in X's component file slots (rather than the combo slot itself) being added
      to Y.
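
      The membership rules above can be modelled in a few lines of Python. This is only an illustrative
      sketch: the Slots class and its method names are invented here and say nothing about how lookup
      stores slots internally.

```python
class Slots:
    """Toy model of file slots and combination slots (hypothetical, not lookup's code)."""

    def __init__(self):
        self.files = {}    # slot number -> file name
        self.combos = {}   # slot number -> ordered list of component file slots
        self.names = {}    # combo slot number -> optional name

    def _next_slot(self):
        return len(self.files) + len(self.combos)

    def load(self, filename):
        num = self._next_slot()
        self.files[num] = filename
        return num

    def _expand(self, slotnums):
        """Replace any combo slot in the list by its component file slots."""
        out = []
        for n in slotnums:
            out.extend(self.combos.get(n, [n]))
        return out

    def combine(self, slotnums, name=None, target=None):
        """Create a combo (target is None) or add to an existing one ("target +=")."""
        if target is None:
            target = self._next_slot()
            self.combos[target] = []
        members = self.combos[target]
        for n in self._expand(slotnums):
            if n not in members:      # a file slot may be a member only once
                members.append(n)     # additions always go at the end
        if name is not None:
            self.names[target] = name
        return target

    def search_order(self, combo):
        return [self.files[n] for n in self.combos[combo]]


s = Slots()
for f in ("curly", "moe", "larry"):
    s.load(f)
combo = s.combine([2, 0, 1], name="three stooges")
print(combo, s.search_order(combo))
```

      Adding slot 4 later with combine([4], target=3) appends it to the end, and combining a combo slot
      into another one copies its component file slots rather than the combo itself.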

   command debug [boolean]
      Sets the internal command parser debugging flag on or off (default is off).

   debug [boolean]
      Sets the internal general-debugging flag on or off (default is off).

   describe specifier
      This command will tell you how a character (or each character in a  string)  is  encoded  in  the  various
      encoding methods:

          lookup command>  describe "気"
          “気” as  EUC  is 0xb5a4 (181 164; 265 \244)
               as  JIS  is 0x3524 ( 53  36;  65 \044 "5$")
               as KUTEN is   2104 ( 0x1504;  25 \004)
               as S-JIS is 0x8b1f (139  31; 213 \037)

      The quotes surrounding the character or string to describe are optional. You can also give a regular
      ASCII character and have the double-width version of the character described; specifying “A”, for
      example, would describe “Ａ”. Specifier can also be a four-digit kuten value, in which case the
      character with that kuten will be described.

      If a four-digit specifier has a hex digit in it, or if it is preceded by “0x”, the value is taken as a
      JIS code. You can precede the value by “jis”, “sjis”, “euc”, or “kuten” to force interpretation as the
      requested code.

      Finally, specifier can be a string of stripped JIS (JIS without the kanji-in and kanji-out codes, or with
      the codes but without the escape characters in them). For example, “F|K\” would describe the two
      characters 日 and 本.
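
      The relationship between the EUC, JIS, and kuten values reported by describe, and the meaning of
      “stripped JIS”, can be sketched in Python as follows. The function names are made up for this
      illustration; only the standard euc_jp codec is assumed. Shift-JIS is left out of the sketch.

```python
def euc_to_jis(euc):
    """Two EUC bytes -> two JIS bytes: clear the high bit of each byte."""
    return bytes(b & 0x7F for b in euc)


def jis_to_kuten(jis):
    """Two JIS bytes -> (row, cell): subtract 0x20 from each byte."""
    return tuple(b - 0x20 for b in jis)


def stripped_jis_to_text(s):
    """Decode stripped JIS (pairs of printable ASCII) by setting the high
    bit of each byte and decoding the result as EUC-JP."""
    euc = bytes(ord(ch) | 0x80 for ch in s)
    return euc.decode("euc_jp")


euc = "気".encode("euc_jp")
jis = euc_to_jis(euc)
row, cell = jis_to_kuten(jis)
print(euc.hex(), jis.hex(), "%02d%02d" % (row, cell))
print(stripped_jis_to_text("F|K\\"))
```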

   encoding [euc|sjis|jis]
      The same as the -euc, -jis, and -sjis command-line options: sets the encoding method for interactive
      input and output (or reports the current status). Finer control over the output encoding is available
      through the output encoding command. A separate encoding for input can be set with the input encoding
      command.

   files [ - | long ]
      Lists what files are loaded in what slots, and some status information about them, as with:

        ┏━━┳━━━━━━━━━━┯━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        ┃ 0┃F wcf h d │a I ┃ 2762k┃/usr/jfriedl/lib/edict
        ┃ 1┃FM cf   d │a I ┃  705k┃/usr/jfriedl/lib/kanjidic
        ┃ 2┃F  cfWh@d │a   ┃    1k┃/usr/jfriedl/lib/local.words
        ┃*3┃FM cf htd │a   ┃ combo┃kotoba (#2, #0)
        ┃ 4┃   cf   d │a   ┃  205k┃/usr/dict/words
        ┗━━┻━━━━━━━━━━┷━━━━┻━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

      The first section is the slot number, with a “*” beside the default slot (as set by the select command).

      The second section shows per-slot flags and status. Letters are shown if the flag is on, omitted  if  off.
      In the list below, related commands are given for each item:

        F … if there is a filter {but '#' if disabled}. (filter)
        M … if there is a modify spec {but '%' if disabled}. (modify)
        w … if word-preference mode is turned on. (word)
        c … if case folding is turned on. (fold)
        f … if fuzzification is turned on. (fuzz)
        W … if wildcard-pattern mode is turned on. (wildcard)
        h … if highlighting is turned on. (highlight)
        t … if there is a tag {but '@' if disabled}. (tag)
        d … if found lines should be displayed. (display)
        ──────────────────────────────────────────────────────────────────
        a … if autokana is turned on. (autokana)
        P … if there is a file-specific local prompt. (prompt)
        I … if the file is loaded with a precomputed index. (load)
        d … if the display flag is on. (display)

      Note that the letters in the upper section directly correspond to the “!!” sequence characters
      described in the INPUT SYNTAX section.

      If there is a digit at the end of the flag section, it indicates that only #/10 of the  file  is  actually
      loaded  into  memory (as opposed to the file having been completely loaded). Unloaded files will be loaded
      while lookup is idle, or when first used.

      If the slot is a combination slot (as slot #3 is in the  example  above),  that  is  noted  in  the  third
      section,  and  the  combination  name  and  component  slot  numbers  are  noted  in the fourth. Also, for
      combination slots (which have no filter or modify specifications, only the flags), F and/or M are shown if
      the corresponding mode is allowed during searches via the combo slot. See the tag command for info about t
      with respect to combination slots.

      If an argument (either “-” or “long” will work) is given to the command, a short message about what the
      flags mean is also printed.

   filter ["label"] [!] /regex/[i]
      Sets the filter for the selected slot (which must contain a file and not a combination). If a filter is
      set and active for a file, any line matching the given regex is filtered from the output (if the ‘!’ is
      put before the regex, any line not matching the regex is filtered). The label, which isn't required,
      merely acts as documentation in various diagnostics.

      As an example, consider that edict lines often have “(pn)” on them to indicate that the given English is
      a place name. Often these place names can be a bother, so it would be nice to elide them from the output
      unless specifically requested. Consider the example:

        lookup command>  filter "name" /(pn)/
        search [edict]> [きの]
        機能 [きのう] /function/faculty/
        帰納 [きのう] /inductive/
        昨日 [きのう] /yesterday/
        ≪3 "name" lines filtered≫

      In the example, ‘/’ characters are used to delimit the start and stop of the regex (as is common with
      many programs). However, any character can be used. A final ‘i’, if present, indicates that the regex
      should be applied in a case-insensitive manner.

      The filter, once set, can be enabled or disabled with the other form of the “filter” command (described
      below). It can also be temporarily turned off (or, if disabled, temporarily turned on) by the “!F!” line
      prefix.

      Filtered lines can optionally be saved and then displayed if you so desire. See the “saved list size”
      and “show” commands.

      Note that if you have saving enabled and only one line would be filtered, it is simply printed at the end
      (rather than printing a one-line message saying that one line was filtered).

      By the way, a better “name” filter for edict would be:

        filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#

      as it would filter all entries that had only one English section, that section being a name. It is also
      an example of using something other than ‘/’ to delimit a regex, as it makes things a bit easier to read.
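
      To see what this filter lets through, here is a rough Python equivalent. It is a sketch under
      assumptions: Python's re module has no “<” and “>” word anchors, so \b stands in for both; the
      apply_filter helper and the sample lines are invented for the illustration.

```python
import re

# Approximation of  #^[^/]+/[^/]*<p[ln]>[^/]*/$#  with \b replacing < and >.
NAME_ONLY = re.compile(r"^[^/]+/[^/]*\bp[ln]\b[^/]*/$")


def apply_filter(lines, regex, invert=False):
    """Split lines into (shown, filtered); invert=True mimics the '!' form,
    where lines NOT matching the regex are the ones filtered."""
    shown, filtered = [], []
    for line in lines:
        hit = regex.search(line) is not None
        (filtered if hit != invert else shown).append(line)
    return shown, filtered


# Hypothetical edict-style lines (romanized for readability).
lines = [
    "kinou [kinou] /function/faculty/",
    "kinou [kinou] /Kinou (pn)/",
    "kino [kino] /Kino (pn)/tree bark/",
]
shown, filtered = apply_filter(lines, NAME_ONLY)
print(len(shown), "shown,", len(filtered), "filtered")
```

      Only the second line is filtered: the third also carries “(pn)”, but it has a second English section,
      so the pattern anchored at both ends does not match it.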

   filter [boolean]
      Enables  or  disables  the  filter  for  the selected slot.  If no argument is given, displays the current
      filter and status.

   [default] fold [boolean]
      The selected slot's case folding is turned on or off (default is on), or reported if no argument is
      given. However, if “default” is specified, the value to be inherited as the default by
      subsequently-loaded files is set (or reported).

      Can be temporarily toggled by the “!c!” line prefix.

   [default] fuzz [boolean]
      The selected slot's fuzzification is turned on or off (default is on), or reported if no argument is
      given. However, if “default” is specified, the value to be inherited as the default by
      subsequently-loaded files is set (or reported).

      Can be temporarily toggled by the “!f!” line prefix.

   help [regex]
      Without an argument gives a short help list. With an argument, lists only commands whose  help  string  is
      picked up by the given regex.

   [default] highlight [boolean]
      Sets matched-string highlighting on or off for the selected slot (default off), or reports the current
      status if no argument is given. However, if “default” is specified, the value to be inherited as the
      default by subsequently-loaded files is set (or reported).

      If on, shows in bold or reverse video (see below) that part of the line which was matched by the search
      regex. If multiple regexes were given, the part matched by the first regex is shown.

      Note that a regex might match a portion of a line which is later removed by a modify parameter. In this
      case, no highlighting is done.

      Can be temporarily toggled by the “!h!” line prefix.

   highlight style [bold | inverse | standout | <___>]
      Sets the style of highlighting for when highlighting is done. Inverse (inverse video) and standout are
      the same. The default is bold. You can also give an HTML tag, such as “<BOLD>”, and items will be wrapped
      by <BOLD>...</BOLD>. This would be particularly useful when the output is going to a CGI, as when lookup
      has been built in a server configuration.

      Note that the highlighting is effected by using raw VT100/xterm control sequences. This isn't
      particularly nice if your terminal doesn't understand them. Sorry.

   if {expression} command...

      If the evaluated expression is non-zero, the command will be executed.

      Note that {} rather than () surround the expression.

      Expression may be composed of numbers, operators, parentheses, etc. In addition to the normal +, -, *,
      and /, there are:

         x && y …
         !x     … ‘not’ Yields 1 if x is zero, 0 if non-zero.
         x & y  … ‘and’ Yields 1 if both x and y are non-zero, 0 otherwise.
         x | y  … ‘or’  Yields 1 if x or y (or both) is non-zero, 0 otherwise.

      The special tokens true and false are also available; they are 1 and 0 respectively.

      There are also checked, matched, printed, nonword, and filtered which correspond to the values printed  by
      the stats command.

      An example use might be the following kind of thing in a computer-generated script:

        !d!expect this line
        if {!printed} msg Oops! couldn't find "expect this line"

   input encoding [ euc | sjis ]
      Used  to  set  (or  report)  what encoding to use when 8-bit bytes are found in the interactive input (all
      flavors of JIS are always recognized).  Also see the encoding and output encoding commands.

   limit [value]
      Sets the number of lines to print during any search before aborting (or reports the current number  if  no
      value given). Default is 100.

      Output limiting is disabled if set to zero.

   log [ to [+] file ]
      Begins logging the program output to file (the Japanese encoding method being the same as for screen
      output). If “+” is given, the log is appended to any text that might have previously been in file, in
      which case a leading dashed line is inserted into the file.

      If no arguments are given, reports the current logging status.

   log  - | off
      If only “-” or off is given, any currently-opened log file is closed.

   load [-now|-whenneeded] "filename"
      Loads the named file to the next available slot. If a precomputed index is found (as “filename.jin”), it
      is loaded as well. Otherwise, an index is generated internally.

      The file to be loaded (and the index, if loaded) will be loaded during idle times. This allows a startup
      file to list many files to be loaded, but not have to wait for each of them to load in turn. Using the
      “-now” flag causes the load to happen immediately, while using the “-whenneeded” option (can be
      shortened to “-wn”) causes the load to happen only when the slot is first accessed.

      Invoke lookup as
         % lookup -writeindex filename
      to generate and write an index file, which will then be automatically used in the future.

      If  the file has already been loaded, the file is not re-read, but the previously-read file is shared. The
      new slot will, however, have its own separate flags, prompt, filter, etc.

   modify /regex/replace/[ig]
      Sets the modify parameter for the selected file.  If a file has a modify  parameter  associated  with  it,
      each  line  selected during a search will have that part of the line which matches regex (if any) replaced
      by the replacement string before being printed.

      Like the filter command, the delimiter need not be ‘/’; any non-space character is fine. If a final ‘i’
      is given, the regex is applied in a case-insensitive manner. If a final ‘g’ is given, the replacement is
      done to all matches in the line, not just the first part that might match regex.

      The replacement may have embedded “\1”, etc. in it to refer to parts of the matched text (see the
      tutorial on regular expressions).

      The modify parameter, once set, may be enabled or disabled with the other form of the modify command
      (described below). It may also be temporarily toggled via the “!m!” line prefix.

      A silly example for the ultra-nationalist might be:

        modify /<Japan>/Dainippon Teikoku/g

      so that a line such as

        日銀 [にちぎん] /Bank of Japan/

      would come out as

        日銀 [にちぎん] /Bank of Dainippon Teikoku/

      As a real example of the modify command with kanjidic,  consider  that  it  is  likely  that  one  is  not
      interested  in all the various fields each entry has.  The following can be used to remove the info on the
      U, N, Q, M, E, B, C, and Y fields from the output:

        modify /( [UNQMECBY]\S+)+//g,1

      It's sort of complex, but works. Note that here the replacement part is empty, meaning to just remove
      those parts which matched. The result of such a search of 日 would normally print

          日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 ＼
          MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}

      but with the above modify spec, appears more simply as

          日 467c S4 G1 H3027 F1 P3-3-1 ニチ ジツ ひ -び -か {day}
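
      The effect of the ‘g’ flag on this modify spec can be reproduced with Python's re.sub. This is a
      sketch only: the modify helper below is invented for the illustration, and Python's regex dialect
      is assumed to be close enough to lookup's for this particular pattern.

```python
import re


def modify(line, regex, replacement, flags=""):
    """Emulate "modify /regex/replace/[ig]" on a single output line."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1   # for re.sub, count=0 means "replace all"
    return re.sub(regex, replacement, line, count=count, flags=re_flags)


entry = ("日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 "
         "MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}")

print(modify(entry, r"( [UNQMECBY]\S+)+", "", "g"))
print(modify(entry, r"( [UNQMECBY]\S+)+", ""))   # without 'g': first run only
```

      With ‘g’, both runs of unwanted fields disappear; without it, only the first run (U65e5 through B73)
      is removed and the Q, M, E, and Y fields remain.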

   modify [boolean]
      Enables or disables the modify parameter for the selected file, or reports the current status if no
      argument is given.

   msg string
      The given string is printed.

      Most likely used in a script as the target command of an if command.

   output encoding [ euc | sjis | jis...]
      Used to set exactly what kind of encoding should be used for program output (also see the  input  encoding
      command). Used when the encoding command is not detailed enough for one's needs.

      If  no  argument  is  given, reports the current output encoding.  Otherwise, arguments can usually be any
      reasonable dash-separated combination of:

        euc
           Selects EUC for the output encoding.

        sjis
           Selects Shift-JIS for the output encoding.

        jis[78|83|90][-ascii|-roman]
           Selects JIS for the output encoding. If no year (78, 83, or 90) is given, 78 is used. Can optionally
           specify that “English” should be encoded as regular ASCII (the default when JIS is selected) or as
           JIS-ROMAN.

        212
           Indicates that JIS X0212-1990 should be supported (ignored for Shift-JIS output).

        no212
           Indicates that JIS X0212-1990 should not be supported (default setting). This places JIS
           X0212-1990 characters under the domain of disp, nodisp, code, or mark (described below).

        hwk
           Indicates that half width kana should be left as-is (default setting).

        nohwk
           Indicates that half width kana should be stripped from the output.  (not yet implemented).

        foldhwk
           Indicates  that  half  width  kana  should  be  folded  to  their  full-width counterparts.  (not yet
           implemented).

        disp
           Indicates that non-displayable characters (such as JIS X0212-1990 while the output encoding method is
           Shift-JIS) should be passed along anyway (most likely resulting in screen garbage).

        nodisp
           Indicates that non-displayable characters should be quietly stripped from the output.

        code
           Indicates that non-displayable characters should be printed as their octal codes (default setting).

        mark
           Indicates that non-displayable characters should be printed as “★”.

        Of course, not all options make sense in all combinations, or at all times. When the current (or new)
        output encoding is reported, a complete and exact specifier representing the selected output encoding
        is shown. An example might be “jis78-ascii-no212-hwk-code”.

   pager [ boolean | size ]
      Turns on or off an output pager, sets its idea of the screen size, or reports the current status.

      Size can be a single number indicating the number of lines to be printed between “MORE?” prompts
      (usually a few lines less than the total screen height, the default being 20 lines). It can also be two
      numbers in the form “#x#” where the first number is the width (in half-width characters; default 80) and
      the second is the lines-per-page as above.

      If the pager is on, every page of output will result in a “MORE?” prompt, at which there are four
      possible responses. A space will allow one more full page to print. A return will allow one more line.
      A ‘c’ (for “continue”) will allow all the rest of the output (for the current command) to proceed
      without pause, while a ‘q’ (for “quit”) will flush the output for the current command.

      If supported by the OS, the pager size parameters are set appropriately from the window size upon startup
      or window resize.

      The default pager status is “off”.

   [local] prompt "string"
      Sets the prompt string. If “local” is indicated, sets the prompt string for the selected slot only.
      Otherwise, sets the global default prompt string.

      Prompt strings may have the special %-sequences shown below, with related commands given in parentheses:

         %N … the default slot's file or combo name.
         %n … like %N, but any leading path is not shown if a filename.
         %# … the default slot's number.
         %S … the “command-introduction” character (cmdchar)
         %0 … the running program's name
         %F='string' … string shown if filtering enabled (filter)
         %M='string' … string shown if modification enabled (modify)
         %w='string' … string shown if word mode on (word)
         %c='string' … string shown if case folding on (fold)
         %f='string' … string shown if fuzzification on (fuzz).
         %W='string' … string shown if wildcard-pat. mode on (wildcard).
         %d='string' … string shown if displaying on (display).
         %C='string' … string shown if currently entering a command.
         %l='string' … string shown if logging is on (log).
         %L … the name of the current output log, if any (log)

      For the tests (%f, etc.), you can put ‘!’ just after the ‘%’ to reverse the sense of the test (e.g.
      %!f="no fuzz"). The reverse of %F is if a filter is installed but disabled (i.e. string will never be
      shown if there is no filter for the default file). The modify %M works comparably.

      Also,  you  can  use  an alternative form for the items that take an argument string. Replacing the quotes
      with parentheses will treat string as a recursive prompt specifier. For example, the specifier

           %C='command'%!C(%f='fuzzy 'search:)

      would result in a “command” prompt if entering a command, while it would result in either a “fuzzy
      search:” or a “search:” prompt if not entering a command. The parenthesized constructs may be nested.

      Note that the letters of the test constructs are the same as the letters for the “!!” sequences
      described in INPUT SYNTAX.

      An example of a nice prompt command might be:

              prompt "%C(%0 command)%!C(%w'*'%!f'raw '%n)> "

      With this prompt specification, the prompt would normally appear as “filename> ”, but when
      fuzzification is turned off, as “raw filename> ”. And if word-preference mode is on, the whole thing has
      a “*” prepended. However, if a command is being entered, the prompt would then become “name command”,
      where name is the program's name (system dependent, but most likely “lookup”).

      The default prompt format string is “%C(%0 command)%!C(search [%n])> ”.

   regex debug [boolean]
      Sets  the  internal  regex  debugging  flag (turn on if you want billions of lines of stuff spewed to your
      screen).

   saved list size [value]
      During a search, lines that match might be elided from the output due to filters or word-preference  mode.
      This  command sets the number of such lines to remember during any one search, such that they may be later
      displayed (before the next search) by the show command.

      The default is 100.

   select [ num | name | . ]
      If num is given, sets the default slot to that slot number. If name is given, sets the default slot to
      the first slot found with a file (or combination) loaded with that name. The incantation “select .”
      merely sets the default slot to itself, which can be useful in script files where you want to indicate
      that any subsequent flag changes should work with whatever file was the default at the time the script
      was sourced.

      If no argument is given, simply reports the current default slot (also see the files command).

      In command files loaded via the source command, or as the startup file,  commands  dealing  with  per-slot
      items  (flags,  local  prompt,  filters,  etc.)   work with the file or slot last selected.  The last such
      selected slot remains selected once the load is complete.

      Interactively, the default slot will become the selected slot for subsequent searches and commands that
      aren't augmented with an appended “,#” (as described in the INPUT SYNTAX section).

   show
      Shows any lines elided from the previous search (either due to a filter or word-preference mode).

      Will apply any modifications (see the “modify” command) if modifications are enabled for the file. You
      can use the “!m!” line prefix as well with this command (in this case, put the “!m!” before the
      command-indicator character).

      The length of the list is controlled by the “saved list size” command.

   source "filename"
      Commands are read from filename and executed.

      In the file, all lines beginning with “#” are ignored as comments (note that comments must appear on a
      line by themselves, as “#” is a reasonable character to have within commands).

      Lines whose first non-blank character is “=”, “!”, or “+” are considered searches, while all other
      non-blank lines are considered lookup commands. Therefore, there is no need for lines to begin with the
      command-introduction character. However, leading whitespace is always OK.

      For search lines, take care that any trailing whitespace is deleted if undesired, as  trailing  whitespace
      (like all non-leading whitespace) is kept as part of the regular expression.
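
      The line-classification rules above can be summarized in a short Python sketch. The parse_line
      function is hypothetical, and it assumes that a comment's “#” may be preceded by whitespace, which
      the text above does not state explicitly.

```python
def parse_line(line):
    """Classify one line of a command file.

    Returns (kind, text) where kind is 'blank', 'comment', 'search' or
    'command'. Leading whitespace is dropped; trailing whitespace is kept,
    since for searches it is part of the regular expression.
    """
    text = line.rstrip("\n").lstrip()
    if not text.strip():
        return ("blank", "")
    if text.startswith("#"):
        # Assumption: whitespace before '#' is tolerated.
        return ("comment", text)
    if text[0] in "=!+":
        return ("search", text)
    return ("command", text)


for raw in ['## a comment\n', '  load "my.word.list"\n', '!d!expect this line \n']:
    print(parse_line(raw))
```

      Note how the third line keeps its trailing space: a search line with stray whitespace at the end
      will look for that whitespace too.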

      Within  a  command  file,  commands that modify per-file flags and such always work with the most-recently
      loaded (or selected) file. Therefore, something along the lines of

        load "my.word.list"
        set word on

        load "my.kanji.list"
        set word off
        set local prompt "enter kanji> "

      would work as might make intuitive sense.

      Since a script file must have a load or select before any per-slot flag is set, one can use “select .”
      to facilitate command scripts that are to work with “the current slot”.

   spinner [value]
      Sets the value of the spinner (a silly little feature). If set to a non-zero value, will cause a spinner
      to spin while a file is being checked, one increment per value lines in the file actually checked
      against the search specifier. Default is off (i.e. zero).

   stats
      Shows  information  about  how many lines of the text file were checked against the last search specifier,
      and how many lines matched and were printed.

   tag [boolean] ["string"]
      Enable, disable, or set the tag for the selected slot.

      If the slot is not a combination slot, a tag string may be set (the quotes are required).

      If a tag string is set and enabled for a file, the string  is  prepended  to  each  matching  output  line
      printed.

      Unlike the filter and modify commands, which automatically enable the function when a parameter is set,
      a tag is not automatically enabled when set. It can be enabled while being set via “tag on "string"”, or
      could be enabled subsequently via just “tag on”.

      If the selected slot is a combination slot, only the enable/disable status may be changed (on by
      default). No tag string may be set.

      The reason for the special treatment lies in the special nature of  how  tags  work  in  conjunction  with
      combination files.

      During a search when the selected slot is a combination slot, each file which is a member of the
      combination has its per-file flags disabled if the corresponding flag is disabled in the original
      combination slot. This allows the combination slot's flags to act as a “mask” to blot out each component
      file's per-file flags.

      The tag flag, however, is special in that the component file's tag flag is turned on  if  the  combination
      slot's tag flag is turned on (and, of course, the component file has a tag string registered).

      The  intended  use  of  this is that one might set a (disabled) tag to a file, yet direct searches against
      that file will have no prepended tag.  However, if the file is searched as part of a combination slot (and
      the  combination slot's tag flag is on), the tag will be prepended, allowing one to easily understand from
      which file an output line comes.
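
      The masking rule and the special case for tags can be written down as a small Python sketch. The
      FileSlot and ComboSlot classes and the effective_flags function are invented for this illustration,
      and only the filter, modify, and tag flags are modelled.

```python
from dataclasses import dataclass


@dataclass
class FileSlot:
    filter_on: bool = False
    modify_on: bool = False
    tag: str = ""          # tag string; empty means none registered
    tag_on: bool = False   # a tag is not enabled automatically when set


@dataclass
class ComboSlot:
    filter_on: bool = True
    modify_on: bool = True
    tag_on: bool = True    # on by default for combination slots


def effective_flags(f, combo=None):
    """Flags in effect when searching file slot f directly (combo is None)
    or as a member of the given combination slot."""
    if combo is None:
        return {"filter": f.filter_on, "modify": f.modify_on,
                "tag": f.tag_on and bool(f.tag)}
    return {
        # Ordinary flags: the combo flag can only mask a file flag off.
        "filter": f.filter_on and combo.filter_on,
        "modify": f.modify_on and combo.modify_on,
        # Tag: the combo flag turns the file's tag on if a string is registered.
        "tag": combo.tag_on and bool(f.tag),
    }


local = FileSlot(filter_on=True, tag=">>", tag_on=False)
print(effective_flags(local))
print(effective_flags(local, ComboSlot()))
```

      Searched directly, the file shows no tag; searched through a combo slot whose tag flag is on, the
      tag appears, which is the intended use described above.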

   verbose [boolean]
      Sets verbose mode on or off, or reports the current status (default  on).   Many  commands  reply  with  a
      confirmation if verbose mode is turned on.

   version
      Reports the current version of the program.

   [default] wildcard [boolean]
      The selected slot's patterns are considered wildcard patterns if turned on, regular expressions if turned
      off. The current status is reported if no argument is given. However, if “default” is specified, the
      pattern type to be inherited as the default by subsequently-loaded files is set (or reported).

      Can be temporarily toggled by the “!W!” line prefix.

      When wildcard patterns are selected, the changed metacharacters are: “*” means “any stuff”, “?” means
      “any one character”, while “+” and “.” become unspecial. Other regex items such as “|”, “(”, “[”, etc.
      are unchanged.

      What “*” and “?” will actually match depends upon the status of word-mode, as well as on the pattern
      itself. If word-mode is on, or if the pattern begins with the start-of-word “<” or “[”, only non-spaces
      will be matched. Otherwise, any character will be matched.

      In summary, when wildcard mode is on, the input pattern is affected in the following ways:

         * is changed to the regular expression .* or \S*
         ? is changed to the regular expression .  or \S
         + is changed to the regular expression \+
         . is changed to the regular expression \.
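
      A minimal Python sketch of this translation follows. It rests on the statement above that only
      non-spaces are matched when word-mode is on or the pattern begins with “<” or “[”; the function
      name is made up, and the sketch ignores escaped characters and characters inside bracket
      expressions, which a real implementation would have to handle.

```python
def wildcard_to_regex(pattern, word_mode=False):
    """Translate a wildcard ("glob") pattern into a regular expression string."""
    nonspace = word_mode or pattern.startswith(("<", "["))
    star, quest = (r"\S*", r"\S") if nonspace else (".*", ".")
    table = {
        "*": star,     # "any stuff"
        "?": quest,    # "any one character"
        "+": r"\+",    # becomes unspecial
        ".": r"\.",    # becomes unspecial
    }
    # Everything else (|, (, [, etc.) passes through unchanged.
    return "".join(table.get(ch, ch) for ch in pattern)


print(wildcard_to_regex("a*b?.c+"))
print(wildcard_to_regex("<a*"))
```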

      Because filename patterns are often called “filename globs”, the command “glob” can be used in place of
      “wildcard”.

   [default] word|wordpreference [boolean]
      The selected file's word-preference mode is turned on or off (default is off), or the current setting is
      reported if no argument is specified. However, if “default” is specified, the value to be inherited as
      the default by subsequently-loaded files is set (or reported).

      In word-preference mode, entries are searched for as if the search regex had a leading ‘<’ and a
      trailing ‘>’, resulting in a list of entries with a whole-word match of the regex. However, if there are
      none, but there are non-word entries, the non-word entries are shown (the “saved list” is used for this;
      see that command). This makes it an “if there are whole words like this, show me, otherwise show me
      whatever you've got” mode.

      If there are both word and non-word entries, the non-word entries are remembered in the saved list (rather
      than any possible filtered entries being remembered there).

      One caveat: if a search matches a line in more than one place, and the first is not a whole word, while
      one of the others is, the line will be considered non-whole-word. For example, the search 「japan」 with
      word-preference mode on will not list an entry such as “/Japanese/language in Japan/”, as the first
      “Japan” is part of “Japanese” and not a whole word. If you really need just whole-word entries, use the
      ‘<’ and ‘>’ yourself.
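
      The behavior described here, including the caveat, can be imitated with the following Python
      sketch. It is a guess at the observable behavior rather than at lookup's implementation: the
      function names are invented, case folding is assumed to be on, and only the first match on each
      line is examined, which is what produces the caveat.

```python
import re


def is_word_char(ch):
    return ch.isalnum() or ch == "_"


def word_preference_search(lines, pattern):
    """Return (shown, saved). Whole-word matches are preferred; when there
    are any, non-word matches go to the saved list instead of the output."""
    regex = re.compile(pattern, re.IGNORECASE)
    whole, partial = [], []
    for line in lines:
        m = regex.search(line)           # only the first match is examined
        if m is None:
            continue
        before = line[m.start() - 1] if m.start() > 0 else " "
        after = line[m.end()] if m.end() < len(line) else " "
        if is_word_char(before) or is_word_char(after):
            partial.append(line)
        else:
            whole.append(line)
    if whole:
        return whole, partial
    return partial, []                   # nothing whole-word: show what we have


entries = ["nihon /Japan/", "nihongo /Japanese/language in Japan/", "kuruma /car/"]
print(word_preference_search(entries, "japan"))
```

      The second entry lands in the saved list even though it contains the whole word “Japan”, because
      its first match is inside “Japanese”.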

      The mode may be temporarily toggled via the “!w!” line prefix.

      The  rules  defining  what  lines  are  filtered, remembered, discarded, and shown for each permutation of
      search are rather complex, but the end result is rather intuitive.

   quit | leave | bye  | exit
      Exits the program.

STARTUP FILE

   If the file “~/.lookup” is present, commands are read from it during lookup startup.

   The file is read in the same way as the source command reads files (see that entry for more information on
   file format, etc.).

   However, if there had been files loaded via command-line arguments, commands within the startup file to load
   files (and their associated commands such as to set per-file flags) are ignored.

   Similarly, any use of the command-line flags -euc, -jis, or -sjis will disable the commands in the startup
   file that deal with setting the input and/or output encodings.

   The  special treatment mentioned in the above two paragraphs only applies to commands within the startup file
   itself, and does not apply to commands in command-files that might be sourced from within the startup file.

   The following is a reasonable example of a startup file:
     ## turn verbose mode off during startup file processing
     verbose off

     prompt "%C([%#]%0)%!C(%w'*'%!f'raw '%n)> "
     spinner 200
     pager on

     ## The filter for edict will hit for entries that
     ## have only one English part, and that English part
     ## having a pl or pn designation.
     load ~/lib/edict
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     highlight on
     word on

     ## The filter for kanjidic will hit for entries without a
     ## frequency-of-use number.  The modify spec will remove
     ## fields with the named initial code (U,N,Q,M,E, and Y)
     load ~/lib/kanjidic
     filter "uncommon" !/<F\d+>/
     modify /( [UNQMEY]\S+)+//g

     ## Use the same filter for my local word file,
     ## but turn off by default.
     load ~/lib/local.words
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     filter off
     highlight on
     word on
     ## Want a tag for my local words, but only when
     ## accessed via the combo below
     tag off "》"

     combine "words" 2 0
     select words

     ## turn verbosity back on for interactive use.
     verbose on

COMMAND-LINE ARGUMENTS

   With the use of a startup file, command-line arguments are rarely needed.  In practical use,  they  are  only
   needed to create an index file, as in:

       lookup -write textfile

   Any  command  line arguments that aren't flags are taken to be files which are loaded in turn during startup.
   In this case, any “load”, “filter”, etc. commands in the startup file are ignored.

   The following flags are supported:

   -help
      Reports a short help message and exits.

   -write
      Creates index files for the named files and exits. No startup file is read.

   -euc
      Sets the input and output encoding method to EUC (currently the default). Exactly the same as the
      “encoding euc” command.

   -jis
      Sets the input and output encoding method to JIS. Exactly the same as the “encoding jis” command.

   -sjis
      Sets the input and output encoding method to Shift-JIS. Exactly the same as the “encoding sjis” command.

   -v -version
      Prints the version string and exits.

   -norc
      Indicates that the startup file should not be read.

   -rc file
      The named file is used as the startup file, rather than the default “~/.lookup”. It is an error for the
      file not to exist.

   -percent num
      When an index is built, letters that appear on more than num percent (default 50) of the lines are elided
      from the index. The thought is that if a search will have to check most of the lines in a file anyway,
      one may as well save the large amount of index-file space needed to represent that information: indexing
      oft-occurring letters provides a diminishing return in the time/space tradeoff.

      Smaller indexes can be made by using a smaller number.

   -noindex
      Indicates that any files loaded via the command line should not be loaded with a precomputed index;
      the index is instead recalculated on the fly.

   -verbose
      Has metric tons of stats spewed whenever an index is created.

   -port ###
      For the (undocumented) server configuration only, tells which port to listen on.
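
   The -percent rule described earlier in this list can be illustrated with a small Python sketch. It is not
   lookup's indexing code or index format; the function name elided_letters and the sample lines are made up
   for illustration. It only shows the idea of dropping characters that occur on more than a given percentage
   of lines.

```python
from collections import Counter


def elided_letters(lines, percent=50):
    """Return the characters that occur on more than `percent` percent of lines."""
    if not lines:
        return set()
    line_counts = Counter()
    for line in lines:
        # Count each character once per line, however often it repeats there.
        line_counts.update(set(line))
    threshold = len(lines) * percent / 100
    return {ch for ch, n in line_counts.items() if n > threshold}


# Hypothetical four-line "file".
sample = ["gray sky", "grey sky", "blue sea", "red sun"]
print(sorted(elided_letters(sample, percent=50)))
print(sorted(elided_letters(sample, percent=75)))
```

   With a smaller percentage more characters cross the threshold and are left out, which is why a smaller
   number gives a smaller index.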

OPERATING SYSTEM CONSIDERATIONS

   I/O primitives and behaviors vary with the operating system. On my operating system, I can “read” a file by
   mapping  it into memory, which is a pretty much instant procedure regardless of the size of the file.  When I
   later access that memory, the appropriate sections of the file are automatically  read  into  memory  by  the
   operating system as needed.

   This  results  in  lookup starting up and presenting a prompt very quickly, but causes the first few searches
   that need to check a lot of lines in the file to go more slowly (as lots of the file will  need  to  be  read
   in).  However,  once  the bulk of the file is in, searches will go very fast. The win here is that the rather
   long file-load times are amortized over the first few (or few dozen, depending upon the  situation)  searches
   rather than always faced right at command startup time.

   On  the  other hand, on an operating system without the mapping ability, lookup would start up very slowly as
   all the files and indexes are read into memory, but would then search quickly from  the  beginning,  all  the
   file already having been read.

   To  get around the slow startup, particularly when many files are loaded, lookup uses lazy loading if it can:
   a file is not actually read into memory at the time the load command is given. Rather, it will be  read  when
   first  actually  accessed.  Furthermore, files are loaded while lookup is idle, such as when waiting for user
   input. See the files command for more information.
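
   The memory-mapping behavior described above can be tried out with a short Python sketch. This is only an
   illustration using Python's standard mmap module, not lookup's own code; the helper name
   find_in_mapped_file and the throwaway file contents are assumptions made for the example.

```python
import mmap
import os
import tempfile


def find_in_mapped_file(path, needle):
    """Map a file into memory and return the byte offset of `needle`, or -1."""
    with open(path, "rb") as f:
        # Creating the mapping is cheap; the operating system reads pages
        # from disk only when the mapped memory is actually touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped.find(needle)


# Build a small throwaway file so the sketch is self-contained.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"first line\nsecond line\nthird line\n")

offset = find_in_mapped_file(demo_path, b"second")
os.remove(demo_path)
print(offset)
```

   On a large file the mapping step returns almost immediately, and the cost of reading is paid during the
   search, which mirrors the startup-versus-first-search tradeoff described above.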

REGULAR EXPRESSIONS, A BRIEF TUTORIAL

   Regular expressions (“regex” for short) are a “code” used to indicate what kind of text you're looking for.
   They're how one searches for things in the editors “vi”, “stevie”, “mifes”, etc., or with the grep commands.
   There are differences among the various regex flavors in use -- I'll describe the flavor used by lookup here.
   Also, in order to be clear for the common case, I might tell a few lies, but nothing too heinous.

   The regex 「a」 means “any line with an ‘a’ in it.” Simple enough.

   The regex 「ab」 means “any line with an ‘a’ immediately followed by a ‘b’”. So the line
       I am feeling flabby
   would “match” the regex 「ab」 because, indeed, there's an “ab” on that line. But it wouldn't match the line

       this line has no a followed _immediately_ by a b

   because, well, what the line says is true.

   In  most cases, letters and numbers in a regex just mean that you're looking for those letters and numbers in
   the order given. However, there are some special characters used within a regex.

   A simple example would be a period. Rather than indicate that you're looking for a period, it means “any
   character”. So the silly regex 「.」 would mean “any line that has any character on it.” Well, maybe not so
   silly... you can use it to find non-blank lines.

   But more commonly it's used as part of a larger regex. Consider the regex 「gray」. It wouldn't match the line

       The sky was grey and cloudy.

   because of the different spelling (grey vs. gray). But the regex 「gr.y」 asks for “any line with a ‘g’,
   ‘r’, some character, and then a ‘y’”. So this would get “grey” and “gray”.

   A special construct somewhat similar to ‘.’ would be the character class. A character class starts with a
   ‘[’ and ends with a ‘]’, and will match any character given in between. An example might be

       gr[ea]y

   which would match lines with a ‘g’, ‘r’, an ‘e’ or an ‘a’, and then a ‘y’. Inside a character class you
   can list as many characters as you want to.
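
   To see the difference between the period and a character class in practice, here is a small sketch using
   Python's re module. Python's regex flavor is not identical to lookup's, but these two constructs behave
   the same way in both; the sample lines are invented for the example.

```python
import re

lines = [
    "The sky was grey and cloudy.",
    "A gray cat sat there.",
    "The groyne broke.",
    "Nothing to see here.",
]

dot = re.compile(r"gr.y")      # 'g', 'r', any one character, then 'y'
cls = re.compile(r"gr[ea]y")   # 'g', 'r', an 'e' or an 'a', then 'y'

# Keep the lines that contain a match anywhere, as a line search would.
dot_hits = [line for line in lines if dot.search(line)]
cls_hits = [line for line in lines if cls.search(line)]

print(dot_hits)
print(cls_hits)
```

   The period also accepts the “groy” in “groyne”, while the character class accepts only the two spellings
   listed inside the brackets.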

   For example the simple regex 「x[0123456789]y」 would match any line with a digit sandwiched between an ‘x’
   and a ‘y’.

   The order of the characters within the character class doesn't really matter... 「[5134670289]」 would be
   the same as 「[0123456789]」.

   But as a short cut, you could put 「[0-9]」 instead of 「[0123456789]」. So the character class 「[a-z]」 would
   match any lower-case letter, while the character class 「[a-zA-Z0-9]」 would match any letter or digit.

   The character ‘-’ is special within a character class, but only if it's not the first thing. Another
   character that's special in a character class is ‘^’, if it is the first thing. It “inverts” the class so
   that it will match any character not listed. The class 「[^a-zA-Z0-9]」 would match any line with spaces or
   punctuation on it.

   There are some special short-hand sequences for some common character classes. The sequence 「\d」 means
   “digit”, and is the same as 「[0-9]」. 「\w」 means “word element” and is the same as 「[0-9a-zA-Z_]」.
   「\s」 means “space-type thing” and is the same as 「[ \t]」 (「\t」 means tab).

   You can also use 「\D」, 「\W」, and 「\S」 to mean things not a digit, word element, or space-type thing.
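
   The short-hand sequences can be checked against their spelled-out classes with a brief Python sketch.
   Python's re module is used here as a stand-in for lookup's engine, so details may differ: in Python the
   re.ASCII flag is needed to restrict 「\d」 and 「\w」 to the ASCII sets given above, and Python's 「\s」 covers
   a few more whitespace characters than space and tab. The sample line is made up.

```python
import re

line = "room 42\tfloor 7"

# re.ASCII keeps \d, \w and \s limited to ASCII characters.
digits = re.findall(r"\d", line, re.ASCII)
digits_spelled_out = re.findall(r"[0-9]", line)

spaces = re.findall(r"\s", line, re.ASCII)
non_word = re.findall(r"\W", line, re.ASCII)
negated_class = re.findall(r"[^a-zA-Z0-9]", line)

print(digits, digits_spelled_out)
print(spaces, non_word, negated_class)
```

   On this particular line the space-type characters, the non-word characters, and the characters matched by
   the inverted class happen to coincide, since the line has no punctuation or underscores.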

   Another special character would be ‘?’. This means “maybe one of whatever was just before it, but none is
   fine too”. In the regex 「bikes? for rent」, the “whatever” would be the ‘s’, so this would match lines with
   either “bikes for rent” or “bike for rent”.

   Parentheses are also special, and can group things together.  In the regex

       big (fat harry)? deal

   the “whatever” for the ‘?’ would be “fat harry”. But be careful to pay attention to details... this regex
   would match
       I don't see what the big fat harry deal is!
   but not
       I don't see what the big deal is!

   That's because if you take away the “whatever” of the ‘?’, you end up with
       big  deal
   Notice  that  there  are two spaces between the words, and the regex didn't allow for that.  The regex to get
   either line above would be
       big (fat harry )?deal
   or
       big( fat harry)? deal
   Do you see how they're essentially the same?
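
   The space pitfall above is easy to verify. The following sketch uses Python's re module, which treats ‘?’
   and parentheses the same way for this purpose, though it is not lookup's own regex engine.

```python
import re

with_harry = "I don't see what the big fat harry deal is!"
without_harry = "I don't see what the big deal is!"

naive = re.compile(r"big (fat harry)? deal")    # leaves two spaces when the group is absent
fixed_a = re.compile(r"big (fat harry )?deal")  # trailing space moved inside the group
fixed_b = re.compile(r"big( fat harry)? deal")  # leading space moved inside the group

for name, regex in [("naive", naive), ("fixed_a", fixed_a), ("fixed_b", fixed_b)]:
    print(name, bool(regex.search(with_harry)), bool(regex.search(without_harry)))
```

   Only the two corrected forms match both lines; the naive form fails on the line without “fat harry”.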

   Similar to ‘?’ is ‘*’, which means “any number, including none, of whatever's right in front”. It more or
   less means that whatever is tagged with ‘*’ is allowed, but not required, so something like
       I (really )*hate peas
   would match “I hate peas”, “I really hate peas!”, “I really really hate peas”, etc.

   Similar to both ‘?’ and ‘*’ is ‘+’, which means “at least one of whatever just in front, but more is fine
   too”. The regex 「mis+pelling」 would match “mispelling”, “misspelling”, “missspelling”, etc. Actually,
   it's just the same as 「miss*pelling」 but simpler to type. The regex 「ss*」 means “an ‘s’, followed by
   zero or more ‘s’”, while 「s+」 means “one or more ‘s’”. Both are really the same.
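
   The equivalence of the two spellings can be checked with a short Python sketch (again using Python's re
   module as a stand-in for lookup's engine; the candidate words are chosen for illustration).

```python
import re

plus = re.compile(r"mis+pelling")    # one or more 's'
star = re.compile(r"miss*pelling")   # an 's', followed by zero or more 's'

candidates = ["mipelling", "mispelling", "misspelling", "missspelling"]
plus_hits = [word for word in candidates if plus.search(word)]
star_hits = [word for word in candidates if star.search(word)]
print(plus_hits == star_hits, plus_hits)

# The starred group may be absent entirely or repeated any number of times.
peas = re.compile(r"I (really )*hate peas")
print(bool(peas.search("I hate peas")), bool(peas.search("I really really hate peas")))
```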

   The special character ‘|’ means “or”. Unlike ‘+’, ‘*’, and ‘?’, which act on the thing immediately before,
   the ‘|’ is more “global”.
       give me (this|that) one
   would match lines that had “give me this one” or “give me that one” in them.

   You can even combine more than two:
       give me (this|that|the other) one

   How about:
       [Ii]t is a (nice |sunny |bright |clear )*day

   Here, the “whatever” immediately before the ‘*’ is
       (nice |sunny |bright |clear )
   So this regex would match all the following lines:
      It is a day.
      I think it is a nice day.
      It is a clear sunny day today.
      If it is a clear sunny nice sunny sunny sunny bright day then....
   Notice how the 「[Ii]t」 matches either “It” or “it”?
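
   The four example lines can be run through the same pattern in Python's re module, which handles ‘|’, ‘*’,
   and character classes the same way for this pattern. The fifth sample line is an invented non-match added
   for contrast.

```python
import re

day = re.compile(r"[Ii]t is a (nice |sunny |bright |clear )*day")

samples = [
    "It is a day.",
    "I think it is a nice day.",
    "It is a clear sunny day today.",
    "If it is a clear sunny nice sunny sunny sunny bright day then....",
    "It is a rainy day.",  # 'rainy ' is not one of the alternatives
]

matched = [s for s in samples if day.search(s)]
for s in samples:
    print(s in matched, s)
```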

   Note that the above regex would also match
      fruit is a day
   because it indeed fulfills all requirements of the regex, even though the “it” is really part of the
   word “fruit”. To address concerns like this, which are common, there are ‘<’ and ‘>’, which mean “word
   break”. The regex 「<it」 would match any line with “it” beginning a word, while 「it>」 would match any line
   with “it” ending a word. And, of course, 「<it>」 would match any line with the word “it” in it.

   Going back to the regex to find grey/gray, that would make more sense, then, as
       <gr[ae]y>
   which would match only the words “grey” and “gray”.

   Somewhat similar are ‘^’ and ‘$’, which mean “beginning of line” and “end of line”, respectively (but not
   in a character class, of course). So the regex 「^fun」 would find any line that begins with the letters
   “fun”, while 「^fun>」 would find any line that begins with the word “fun”. 「^fun$」 would find any line
   that was exactly “fun”.

   Finally, 「^\s*fun\s*$」 would match any line that was “fun” exactly, but perhaps also had leading and/or
   trailing whitespace.
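
   Word breaks and line anchors can be explored with the following Python sketch. Note the flavor difference:
   Python's re module does not treat ‘<’ and ‘>’ as word breaks, so the sketch uses 「\b」 in their place; the
   ‘^’ and ‘$’ anchors are written the same way. The sample strings are invented for the example.

```python
import re

loose = re.compile(r"it is a day")
strict = re.compile(r"\bit is a day")  # \b stands in for lookup's '<'

print(bool(loose.search("fruit is a day")))   # the 'it' inside 'fruit' is accepted
print(bool(strict.search("fruit is a day")))  # rejected: no word break before 'it'
print(bool(strict.search("it is a day")))

grey = re.compile(r"\bgr[ae]y\b")  # lookup would write <gr[ae]y>
print(bool(grey.search("a grey hound")), bool(grey.search("a greyhound")))

exact = re.compile(r"^\s*fun\s*$")  # 'fun' alone, with optional surrounding whitespace
print(bool(exact.search("  fun \t")), bool(exact.search("funny")))
```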

   That's pretty much it. There are more complex things, some of which I'll mention in the list below, but  even
   with these few simple constructs one can specify very detailed and complex patterns.

   Let's summarize some of the special things in regular expressions:

   Items that are basic units:
     char      any non-special character matches itself.
     \char     special chars, when preceded by \, become non-special.
     .         Matches any one character (except \n).
     \n        Newline
     \t        Tab.
     \r        Carriage Return.
     \f        Formfeed.
     \d        Digit. Just a short-hand for [0-9].
     \w        Word element. Just a short-hand for [0-9a-zA-Z_].
     \s        Whitespace. Just a short-hand for [\t \n\r\f].
     \## \###  Two or three digit octal number indicating a single byte.
     [chars]   Matches a character if it's one of the characters listed.
     [^chars]  Matches a character if it's not one of the ones listed.

     The \char items above can be used within a character class,
     but not the items below.

     \D        Anything not \d.
     \W        Anything not \w.
     \S        Anything not \s.
     \a        Any ASCII character.
     \A        Any multibyte character.
     \k        Any (not half-width) katakana character (including ー).
     \K        Any character not \k (except \n).
     \h        Any hiragana character.
     \H        Any character not \h (except \n).
     (regex)   Parens make the regex one unit.
     (?:regex)   [from perl5] Grouping-only parens -- can't use for \# (below)
     \c        Any JISX0208 kanji (kuten rows 16-84)
     \C        Any character not \c (except \n).
     \#        Match whatever was matched by the #th paren from the left.

   With “☆” to indicate one “unit” as above, the following may be used:

     ☆?       A ☆ allowed, but not required.
     ☆+       At least one ☆ required, but more ok.
     ☆*       Any number of ☆ ok, but none required.

   There are also ways to match “situations”:

     \b        A word boundary.
     <         Same as \b.
     >         Same as \b.
     ^         Matches the beginning of the line.
     $         Matches the end of the line.

   Finally, the “or” is

     reg1|reg2 Match if either reg1 or reg2 match.

   Note that “\k” and the like aren't allowed in character classes, so
   something such as 「[\k\h]」 to try to get all kana won't work.
   Use 「(\k|\h)」 instead.

BUGS

   Needs full support for half-width katakana and JIS X 0212-1990.
   Non-EUC (JIS & SJIS) items not tested well.
   Probably won't work on non-UNIX systems.
   Screen control codes (for clear and highlight commands) are hard-coded for ANSI/VT100/kterm.

AUTHOR

   Jeffrey Friedl (jfriedl@nff.ncl.omron.co.jp)

INFO

   Jim Breen's text files edict and kanjidic and their documentation can be found in “pub/nihongo” on
   ftp.cc.monash.edu.au (130.194.1.106).

   Information on input and output encoding and codes can be found in Ken Lunde's Understanding Japanese
   Information Processing (日本語情報処理) published by O'Reilly and Associates. ISBN 1-56592-043-0. There is
   also a Japanese edition published by SoftBank.

   A program to convert files among the various encoding methods is Dr. Ken Lunde's jconv, which can also be
   found on ftp.cc.monash.edu.au. Jconv is also useful for converting halfwidth katakana (which lookup doesn't
   yet support well) to full-width.

                                                                                                       LOOKUP(1)