Provided by: lookup_1.08b-11_amd64

NAME

   lookup - interactive file search and display

SYNOPSIS

   lookup [ args ] [ file ...  ]

DESCRIPTION

   Lookup allows the quick interactive search of text files. It supports ASCII, JIS-ROMAN, and
   Japanese EUC Packed formatted text, and has an integrated romaji→kana converter.

THIS MANUAL

   Lookup is flexible for a variety of applications. This manual  will,  however,  focus  on  the
   application  of  searching Jim Breen's edict (Japanese-English dictionary) and kanjidic (kanji
   database). Being familiar with the content and format of these files would be helpful. See the
   INFO  section  near  the  end  of this manual for information on how to obtain these files and
   their documentation.

OVERVIEW OF MAJOR FEATURES

   The following just mentions some major features to whet your appetite  to  actually  read  the
   whole manual (-:

   Romaji-to-Kana Converter
      Lookup can convert romaji to kana for you, even “on the fly” as you type.

   Fuzzy Searching
      Searches can be a bit “vague” or “fuzzy”, so that you'll be able to find “東京” even if
      you try to search for “ときょ” (the proper yomikata being “とうきょう”).

   Regular Expressions
      Lookup uses powerful and expressive regular expressions for searching. One can easily
      specify complex searches to the effect of “I want lines that look like such-and-such, but
      not like this-and-that, but that also have this particular characteristic....”

   Wildcard ``Glob'' Patterns
      Optionally, you can use well-known filename wildcard patterns instead of full-fledged
      regular expressions.

   Filters
      You  can have lookup not list certain lines that would otherwise match your search, yet can
      optionally save them for quick review. For example, you could have  all  name-only  entries
      from edict filtered from normal output.

   Automatic Modifications
      Similarly,  you  can  do  a  standard  search-and-replace  on lines just before they print,
      perhaps to remove information you don't care to see  on  most  searches.  For  example,  if
      you're  generally  not interested in kanjidic's info on Chinese readings, you can have them
      removed from lines before printing.

   Smart Word-Preference Mode
      You can have lookup list only entries with whole words that match your search (as opposed
      to an embedded match, such as finding “the” inside “them”), but if no whole-word matches
      exist, it will go ahead and list any entry that matches the search.

   Handy Features
      Other handy features include a dynamically settable and parameterized prompt, automatic
      highlighting of the part of the line that matches your search, an output pager,
      readline-like input with horizontal scrolling for long input lines, a “.lookup” startup
      file, automated programmability, and much more. Read on!

REGULAR EXPRESSIONS

   Lookup  makes  liberal  use of regular expressions (or regex for short) in controlling various
   aspects of the searches. If you are not familiar with the important concepts of regexes,  read
   the tutorial appendix of this manual before continuing.

JAPANESE CHARACTER ENCODING METHODS

   Internally, lookup works with Japanese packed-format EUC, and all files loaded must be encoded
   similarly. If you have files encoded in JIS or Shift-JIS, you must first convert them  to  EUC
   before loading (see the INFO section for programs that can do this).
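
   As a minimal sketch of that conversion step, the following Python re-encodes Shift-JIS or
   JIS (ISO-2022-JP) bytes as EUC using Python's standard codecs. The function name to_euc is
   hypothetical and Python is only one possible tool; this is not one of the programs the INFO
   section refers to.

```python
def to_euc(data: bytes, source_encoding: str) -> bytes:
    """Re-encode Japanese text from source_encoding to packed-format EUC."""
    # "shift_jis" and "iso2022_jp" are Python's codec names for Shift-JIS
    # and JIS; "euc_jp" is the EUC encoding that lookup expects in its files.
    return data.decode(source_encoding).encode("euc_jp")


# Example: the same two kanji, starting from Shift-JIS and from JIS.
from_sjis = to_euc("東京".encode("shift_jis"), "shift_jis")
from_jis = to_euc("東京".encode("iso2022_jp"), "iso2022_jp")
print(from_sjis.hex(), from_jis.hex())
```

   For a whole file, the same function could be applied to the bytes read from the source file,
   with the result written to the file that lookup will load.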

   Interactive input and output encoding, however, may be selected via the -jis, -sjis, and
   -euc invocation flags (default is -euc), or by various commands to the program (described
   later).

   Make sure to use the encoding appropriate for your system. If you're using kterm under the X
   Window System, you can use lookup's -jis flag to match kterm's default JIS encoding. Or, you
   might use kterm's “-km euc” startup option (or menu selection) to put kterm into EUC mode.
   Also, I have found kterm's scrollbar (“-sb -sl 500”) to be quite useful.

   With many “English” fonts in Japan, the character that normally prints as a backslash
   (half-width version of ＼) in The States appears as a yen symbol (the half-width version of
   ￥). How it will appear on your system is a function of what font you use and what output
   encoding method you choose, which may be different from the font and method that was used to
   print this manual (both of which may be different from what's printed on your keyboard's
   appropriate key). Make sure to keep this in mind while reading.

STARTUP

   Let's assume that your copy of edict is in ~/lib/edict. You can start the program simply with

           lookup ~/lib/edict

   You'll note that lookup spends some time building an index before the default “lookup> ”
   prompt appears.

   Lookup gains much of its search speed by constructing an index of the file(s) to be searched.
   Since building the index can itself be time-consuming, you can have lookup write the built
   index to a file that can be quickly loaded the next time you run the program. Index files
   will be given a “.jin” (Jeffrey's Index) ending.

   Let's build the indices for edict and kanjidic now:

           lookup -write ~/lib/edict ~/lib/kanjidic

   This will create the index files
          ~/lib/edict.jin
          ~/lib/kanjidic.jin
   and exit.

   You can now restart lookup, which will automatically use the pre-computed index files:

          lookup ~/lib/edict ~/lib/kanjidic

   You  should  then  be  presented  with  the  prompt without having to wait for the index to be
   constructed (but see the section on Operating System concerns for possible reasons of delay).

INPUT

   There are basically two types of input: searches and commands.  Commands  do  such  things  as
   tell  lookup  to load more files or set flags. Searches report lines of a file that match some
   search specifier (where lines to search for are specified by one or more regular expressions).

   The input syntax may perhaps at first seem odd, but has  been  designed  to  be  powerful  and
   concise. A bit of time invested to learn it well will pay off greatly when you need it.

BRIEF EXAMPLE

   Assuming you've started lookup with edict and kanjidic as noted above, let's try a few
   searches. In these examples, the
       “search [edict]> ”
   is the prompt. Note that the space after the ‘>’ is part of the prompt.

   Given the input:

     search [edict]> tranquil

   lookup will report all lines with the string “tranquil” in them. There are currently about a
   dozen such lines, two of which look like:

     安らか [やすらか] /peaceful (an)/tranquil/calm/restful/
     安らぎ [やすらぎ] /peace/tranquility/

   Notice that lines with “tranquil” and “tranquility” matched? This is because “tranquil” was
   embedded in the word “tranquility”. You could restrict the search to only the word
   “tranquil” by prepending the special “start of word” symbol ‘<’ and appending the special
   “end of word” symbol ‘>’ to the regex, as in:

     search [edict]> <tranquil>

   This is the regular expression that says “the beginning of a word, followed by a ‘t’, ‘r’,
   ..., ‘l’, which is at the end of a word.” The current version of edict has just three
   matching entries.
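
   For readers who know regexes from other tools, the difference between an embedded match and
   a whole-word match can be sketched in Python. Note the assumptions: Python has no ‘<’ and
   ‘>’ word symbols, so \b stands in for both here, and lookup's idea of where a word begins
   and ends may not be identical to Python's.

```python
import re

lines = [
    "安らか [やすらか] /peaceful (an)/tranquil/calm/restful/",
    "安らぎ [やすらぎ] /peace/tranquility/",
]

# Embedded match: "tranquil" may appear inside a longer word.
embedded = [ln for ln in lines if re.search(r"tranquil", ln, re.IGNORECASE)]

# Whole-word match: \b plays the role of lookup's '<' and '>' in this sketch.
whole = [ln for ln in lines if re.search(r"\btranquil\b", ln, re.IGNORECASE)]

print(len(embedded), len(whole))
```

   The first search keeps both lines; the second keeps only the line where “tranquil” stands
   alone between the ‘/’ separators.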

   Let's try another:

     search [edict]> fukushima

   This is a search for the “English” fukushima -- ways to search for kana or kanji will be
   explored later. Note that among the several lines selected and printed are:

     副島 [ふくしま] /Fukushima (pn,pl)/
     木曽福島 [きそふくしま] /Kisofukushima (pl)/

   By default, searches are done in a case-insensitive manner -- ‘F’ and ‘f’ are treated the same
   by lookup, at least so far as the matching goes. This is called case folding.

   Let's give a command to turn this option off, so that ‘f’ and ‘F’ won't be considered the
   same. Here's an odd point about lookup's input syntax: the default setting is that all
   command lines must begin with a space. The space is the (default) command-introduction
   character and tells the input parser to expect a command rather than a search regular
   expression. It is a common mistake at first to forget the leading space when issuing a
   command. Be careful.

   Try the command “ fold” to report the current status of case-folding. Notice that as soon as
   you type the space, the prompt changes to
     “lookup command> ”
   as a reminder that now you're typing a command rather than a search specification.

     lookup command>  fold

   The reply should be “file #0's case folding is on”.

   You can actually turn it off with “ fold off”. Now try the search for “fukushima” again.
   Notice that this time the entries with “Fukushima” aren't listed? Now try the search
   string “Fukushima” and see that the entries with “fukushima” aren't listed.

   Case folding is usually very convenient (it also makes  corresponding  katakana  and  hiragana
   match the same), so don't forget to turn it back on:

     lookup command>  fold on

JAPANESE INPUT

   Lookup has an automatic romaji→kana converter. A leading ‘/’ indicates that romaji is to
   follow. Try typing “/tokyo” and you'll see it convert to “/ときょ” as you type. When you hit
   return, lookup will list all lines that have a “ときょ” somewhere in them. Well, sort of. Look
   carefully at the lines which match. Among them (if you had case folding back on) you'll see:

     キリスト教 [キリストきょう] /Christianity/
     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     凸鏡 [とっきょう] /convex lens/

   The first one has “ときょ” in it (as “トきょ”, where the katakana “ト” matches in a
   case-insensitive manner to the hiragana “と”), but you might consider the others unexpected,
   since they don't have “ときょ” in them. They're close (“とうきょ” and “とっきょ”), but not
   exact. This is the result of lookup's “fuzzification”. Try the command “ fuzz” (again, don't
   forget the command-introduction space). You'll see that fuzzification is turned on. Turn it
   off with “ fuzz off” and try “/tokyo” (which will convert as you type) again. This time you
   only get the lines which have “ときょ” exactly (well, case folding is still on, so it might
   match katakana as well).

   In a fuzzy search, length of vowels is ignored -- “と” is considered the same as “とう”, for
   example. Also, the presence or absence of any “っ” character is ignored, and the pairs じ ぢ,
   ず づ, え ゑ, and お を are considered identical in a fuzzy search.

   It might be convenient to consider a fuzzy search to be a “pronunciation search”. Special
   note: fuzzification will not be performed if a regular expression “*”, “+”, or “?” modifies a
   non-ASCII character. This is not an issue when input patterns are filename-like wildcard
   patterns (discussed below).

   In addition to kana fuzziness, there's one special case for kanji when fuzziness is on. The
   kanji repeater mark “々” will be recognized such that “時々” and “時時” will match each other.
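
   The kana rules above can be sketched as a small pattern builder in Python. This is an
   illustration under stated assumptions, not lookup's actual algorithm: the function names and
   the EXTENDERS table (which characters count as lengthening each vowel) are hypothetical, only
   hiragana is handled, and the kanji repeater rule is left out.

```python
import re
import unicodedata

# Assumed set of characters that lengthen each vowel (illustrative only).
EXTENDERS = {"A": "ぁあー", "I": "ぃいー", "U": "ぅうー", "E": "ぇえー", "O": "ぉおうー"}

# Pairs treated as identical in a fuzzy search, as listed in the text.
EQUIVALENT = {}
for a, b in ("じぢ", "ずづ", "えゑ", "おを"):
    EQUIVALENT[a] = EQUIVALENT[b] = a + b


def vowel_of(ch):
    """Return the vowel (A, I, U, E, O) a hiragana character ends in, if any."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA LETTER ") and name[-1] in "AIUEO":
        return name[-1]
    return None


def fuzzify(kana):
    """Build a regex that ignores vowel length, small tsu, and the pairs above."""
    parts = []
    prev_vowel = None
    for ch in kana:
        if ch in "っー":
            continue  # presence or absence of these is ignored
        if prev_vowel and ch in EXTENDERS[prev_vowel]:
            continue  # a long vowel in the input; the previous class covers it
        alternatives = EQUIVALENT.get(ch, ch)
        atom = f"[{alternatives}]" if len(alternatives) > 1 else re.escape(ch)
        vowel = vowel_of(ch)
        extension = f"[{EXTENDERS[vowel]}]*" if vowel else ""
        parts.append(f"っ?{atom}{extension}")
        prev_vowel = vowel
    return "".join(parts)


pattern = re.compile(fuzzify("ときょ"))
print(bool(pattern.search("とうきょう")), bool(pattern.search("とかい")))
```

   Under these assumptions the pattern built from “ときょ” also matches “とうきょう”,
   “とっきょ”, and “とっきょう”, which is the behavior the examples above describe.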

   Turn fuzzification back on (“fuzz on”), and search for all whole words which sound
   like “tokyo”. That search would be specified as:

     search [edict]> /<tokyo>

   (again, the “tokyo” will be converted to “ときょ” as you type). My copy of edict has the three
   lines

     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     特許 [とっきょ] /special permission/patent/
     凸鏡 [とっきょう] /convex lens/

   This kind of whole-word romaji-to-kana search is so common, there's a special shortcut.
   Instead of typing “/<tokyo>”, you can type “[tokyo]”. The leading ‘[’ means “start
   romaji” and “start of word”. Were you to type “<tokyo>” instead (without a leading ‘/’ or
   ‘[’ to indicate romaji-to-kana conversion), you would get all lines with the English
   whole-word “tokyo” in them. That would be a reasonable request as well, but not what we
   want at the moment.

   Besides the kana conversion, you can use any cut-and-paste that your windowing system might
   provide to get Japanese text onto the search line. Cut “ときょ” from somewhere and paste it
   onto the search line. When hitting enter to run the search, you'll notice that it is done
   without fuzzification (even if the fuzzification flag was “on”). That's because there's no
   leading ‘/’. Not only does a leading ‘/’ indicate that you want the romaji-to-kana
   conversion, but also that you want it done fuzzily.

   So, if you'd like fuzzy cut-and-paste, just type a leading ‘/’ before pasting (or go back and
   prepend one after pasting).

   These examples have all been pretty simple, but you can use all the power that regexes have to
   offer. As a slightly more complex example, the search “<gr[ea]y>” would look for all lines
   with the words “grey” or “gray” in them. Since the ‘[’ isn't the first character of the line,
   it doesn't mean what was mentioned above (start-of-word romaji). In this case, it's just the
   regular-expression “class” indicator.

   If you feel more comfortable using filename-like “*.txt” wildcard patterns, you can use
   the “wildcard on” command to have patterns be considered this way.

   This has been a quick introduction to the basics of lookup.

   It  can be very powerful and much more complex. Below is a detailed description of its various
   parts and features.

READLINE INPUT

   The actual keystrokes are read by a readline-ish package that is pretty standard. In  addition
   to just typing away, the following keystrokes are available:

     ^B  / ^F     move left/right one character on the line
     ^A  / ^E     move to the start/end of the line
     ^H  / ^G     delete one character to the left/right of the cursor
     ^U  / ^K     delete all characters to the left/right of the cursor
     ^P  / ^N     previous/next lines on the history list
     ^L or ^R     redraw the line
     ^D           delete char under the cursor, or EOF if line is empty
     ^space       force romaji conversion (^@ on some systems)

   If  automatic  romaji-to-kana conversion is turned on (as it is by default), there are certain
   situations where the conversion will be done, as we  saw  above.  Lower-case  romaji  will  be
   converted  to  hiragana,  while  upper-case  romaji  to  katakana.  This usually won't matter,
   though, as case folding will treat hiragana and katakana the same in the searches.

   In exactly what situations the automatic conversion will be done  is  intended  to  be  rather
   intuitive  once the basic idea is learned.  However, at any time, one can use control-space to
   convert the ASCII to the left of the cursor to kana. This can be particularly useful when
   needing to enter kana on a command line (where auto conversion is never done; see below).

ROMAJI FLAVOR

   Most  flavors  of  romaji  are  recognized.  Special or non-obvious items are mentioned below.
   Lowercase are converted to hiragana, uppercase to katakana.

   Long vowels can be entered by repeating the vowel, or with ‘-’ or ‘^’.

   In situations where an “n” could be vague, as in “na” being な or んあ, use a single quote to
   force ん. Therefore, 「kenichi」→けにち while 「ken'ichi」→けんいち.

   The romaji has been richly extended with many non-standard combinations such as ふぁ or ちぇ,
   which are represented in intuitive ways: 「fa」→ふぁ, 「che」→ちぇ, etc.

   Various other mappings of interest:

     wo → を      we → ゑ      wi → ゐ
     VA → ヴァ    VI → ヴィ    VU → ヴ      VE → ヴェ    VO → ヴォ
     di → ぢ      dzi → ぢ     dya → ぢゃ   dyu → ぢゅ   dyo → ぢょ
     du → づ      tzu → づ     dzu → づ

   (the following kana are all smaller versions of the regular kana)

     xa → ぁ      xi → ぃ      xu → ぅ      xe → ぇ      xo → ぉ
     xu → ぅ      xtu → っ     xwa → ゎ     xka → ヵ     xke → ヶ
     xya → ゃ     xyu → ゅ     xyo → ょ
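
   To make the conversion rules concrete, here is a minimal longest-match converter sketched in
   Python. It is not lookup's converter: the ROMAJI table holds only a handful of entries taken
   from this section plus a few ordinary syllables, and the names ROMAJI, to_katakana, and
   romaji_to_kana are hypothetical.

```python
# A deliberately tiny table; a real converter would need the full syllabary.
ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ke": "け", "ni": "に", "chi": "ち", "to": "と", "kyo": "きょ",
    "fa": "ふぁ", "che": "ちぇ", "wo": "を", "xtu": "っ",
    "n'": "ん", "n": "ん",
}
MAX_LEN = max(len(key) for key in ROMAJI)


def to_katakana(hiragana):
    """Shift hiragana to katakana; the two Unicode blocks are 0x60 apart."""
    return "".join(chr(ord(c) + 0x60) if "ぁ" <= c <= "ゖ" else c for c in hiragana)


def romaji_to_kana(text):
    """Convert romaji by always taking the longest table entry that fits."""
    out = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            kana = ROMAJI.get(chunk.lower())
            if kana is not None:
                # Uppercase romaji yields katakana, lowercase yields hiragana.
                out.append(to_katakana(kana) if chunk.isupper() else kana)
                i += size
                break
        else:
            out.append(text[i])  # not romaji: pass the character through
            i += 1
    return "".join(out)


print(romaji_to_kana("kenichi"), romaji_to_kana("ken'ichi"), romaji_to_kana("TOKYO"))
```

   Because “ni” is tried before the bare “n”, “kenichi” comes out as けにち, while the quote in
   “ken'ichi” forces ん first and gives けんいち.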

INPUT SYNTAX

   Any input line beginning with  a  space  (or  whichever  character  is  set  as  the  command-
   introduction  character)  is  processed  as  a  command  to  lookup rather than a search spec.
   Automatic kana conversion is never done on these lines (but forced  conversion  with  control-
   space may be done at any time).

   Other lines are taken as search regular expressions, with the following special cases:

   ?  A line consisting of a single question mark will report the current command-introduction
      character (the default is a space, but can be changed with the “cmdchar” command).

   =  If a line begins with ‘=’, the line (without the ‘=’) is taken as a search regular
      expression, and no automatic (or internal -- see below) kana conversion is done anywhere on
      the line (although again, conversion can always be forced with control-space). This can be
      used to initiate a search where the beginning of the regex is the command-introduction
      character, or in certain situations where automatic kana conversion is temporarily not
      desired.

   /  A line beginning with ‘/’ indicates romaji input for the whole line. If automatic kana
      conversion is turned on, the conversion will be done in real-time, as the romaji is typed.
      Otherwise it will be done internally once the line is entered. Regardless, the presence of
      the leading ‘/’ indicates that any kana (either converted or cut-and-pasted in) should
      be “fuzzified” if fuzzification is turned on.

      As an addition to the above, if the line doesn't begin with ‘=’ or the command-introduction
      character (and automatic conversion is turned on), ‘/’ anywhere on the line initiates
      automatic conversion for the following word.

   [  A line beginning with ‘[’ is taken to be romaji (just as a line beginning with ‘/’), and
      the converted romaji is subject to fuzzification (if turned on). However, if ‘[’ is used
      rather than ‘/’, an implied ‘<’ “beginning of word” is prepended to the resulting kana
      regex. Also, any ending ‘]’ on such a line is converted to the “ending of word”
      specifier ‘>’ in the resulting regex.

   In  addition  to the above, lines may have certain prefixes and suffixes to control aspects of
   the search or command:

   !  Various flags can be toggled for the duration of a particular search by prepending
      a “!!” sequence to the input line.

      Sequences are shown below, along with commands related to each:

       !F! …  Filtration is toggled for this line (filter)
       !M! …  Modification is toggled for this line (modify)
       !w! …  Word-preference mode is toggled for this line (word)
       !c! …  Case folding is toggled for this line (fold)
       !f! …  Fuzzification is toggled for this line (fuzz)
       !W! …  Wildcard-pattern mode is toggled for this line (wildcard)
       !r! …  Raw. Force fuzzification off for this line
       !h! …  Highlighting is toggled for this line (highlight)
       !t! …  Tagging is toggled for this line (tag)
       !d! …  Displaying is on for this line (display)

      The letters can be combined, as in “!cf!”.

      The final ‘!’ can be omitted if the first character after the sequence is not an ASCII
      letter.

      If no letters are given (“!!”), “!f!” is the default.

      These last two points can be conveniently combined in the common case of “!/romaji”, which
      would be the same as “!f!/romaji”.

      The special sequence “!?” lists the above, and also indicates which are currently turned
      on.

      Note that the letters accepted in a “!!” sequence are many of the indicators shown by
      the “files” command.

   +  A ‘+’ prepended to anything above will cause the final search regex to be printed. This can
      be useful to see when and what kind of fuzzification and/or internal kana conversion is
      happening. Consider:

        search [edict]> +/わかる
        a match is “わ[ぁあー]*っ?か[ぁあー]*る[ぅうおぉー]*”

      Due to the leading ‘/’, the kana is fuzzified, which explains the somewhat complex
      resulting regex. For comparison, note:

        search [edict]> +わかる
        a match is “わかる”
        search [edict]> +!/わかる
        a match is “わかる”

      As the ‘+’ shows, these are not fuzzified. The first one has no leading ‘/’ or ‘[’ to
      induce fuzzification, while the second has the ‘!’ line prefix (which is the default
      version of “!f!”), which toggles fuzzification mode to “off” for that line.

   ,  The default of all searches and most commands is to work with the first file loaded (edict
      in these examples). One can change this default (see the “select” command) or, by appending
      a comma+digit sequence at the end of an input line, force that line to work with another
      previously-loaded file. An appended “,1” works with the first extra file loaded (in these
      examples, kanjidic). An appended “,2” works with the second extra file loaded, etc.

      An appended “,0” works with the original first file (and can be useful if the default file
      has been changed via the “select” command).

      The following sequence shows a common usage:

        search [edict]> [ときょと]
        東京都 [とうきょうと] /Tokyo Metropolitan area/

      Cutting and pasting the 都 from above, and adding a “,1” to search kanjidic:

        search [edict]> 都,1
        都 4554 N4769 S11  ..... ト ツ みやこ {metropolis} {capital}
FILENAME-LIKE WILDCARD MATCHING

   When wildcard-pattern mode is selected, patterns are considered as extended “*.txt”-like
   patterns. This is often more convenient for users not familiar with regular expressions. To
   have this mode selected by default, put

      default wildcard on

   into your “.lookup” file (see “STARTUP FILE” below).

   When wildcard mode is on, only “*”, “?”, “+”, and “.” are affected. See the entry for the
   “wildcard” command below for details.

   Other features, such as the multiple-pattern searches (described below) and other
   regular-expression metacharacters, remain available.

MULTIPLE-PATTERN SEARCHES

   You can put multiple patterns in a single search specifier. For example, consider

     search [edict]> china||japan

   The first part (“china”) will select all lines that have “china” in them. Then, from among
   those lines, the second part will select lines that have “japan” in them. The “||” is not
   part of any pattern -- it is lookup's “pipe” mechanism.

   The above example is very different from the single pattern “china|japan”, which would select
   any line that had either “china” or “japan”. With “china||japan”, you get lines that
   have “china” and then also have “japan” as well.

   Note that it is also different from the regular expression “china.*japan” (or the wildcard
   pattern “china*japan”), which would select lines having “china, then maybe some stuff, then
   japan”. But consider the case when “japan” comes on the line before “china”. Just for your
   comparison, the multiple-pattern specifier “china||japan” is pretty much the same as the
   single regular expression “china.*japan|japan.*china”.

   If you use “|!|” instead of “||”, it will mean “...and then lines not matching...”.

   Consider  a  way to find all lines of kanjidic that do have a Halpern number, but don't have a
   Nelson number:

       search [edict]> <H\d+>|!|<N\d+>

   If you then wanted to restrict the listing to those that also had a “jinmeiyou” marking
   (kanjidic's “G9” field) and had a reading of あき, you could make it:

       search [edict]> <H\d+>|!|<N\d+>||<G9>||<あき>

   A prepended ‘+’ would explain:

       a match is “<H\d+>”
       and not “<N\d+>”
       and “<G9>”
       and “<あき>”

   The “|!|” and “||” can be used to make up to ten separate regular expressions in any one
   search specification.

   Again, it is important to stress that “||” does not mean “or” (as it does in a C program, or
   as ‘|’ does within a regular expression). You might find it convenient to read “||” as “and
   also”, while reading “|!|” as “but not”.

   It is also important to stress that any whitespace around the “||” and “|!|” constructs is
   not ignored, but kept as part of the regex on either side.
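
   The “and also” / “but not” reading can be sketched in Python as a chain of filters. This is
   only a model of the behavior described in this section: parse_spec and search are
   hypothetical names, Python's regex syntax stands in for lookup's, and the ten-expression
   limit is not enforced.

```python
import re


def parse_spec(spec):
    """Split a search specifier into (keep_if_match, pattern) stages."""
    # The capturing group makes re.split return the separators as well.
    tokens = re.split(r"(\|\||\|!\|)", spec)
    stages = [(True, tokens[0])]
    for separator, pattern in zip(tokens[1::2], tokens[2::2]):
        # "||" keeps matching lines, "|!|" keeps non-matching lines.
        # Whitespace around the separators stays part of each pattern.
        stages.append((separator == "||", pattern))
    return stages


def search(lines, spec):
    """Return the lines that pass every stage of the specifier."""
    stages = [(keep, re.compile(p, re.IGNORECASE)) for keep, p in parse_spec(spec)]
    return [ln for ln in lines
            if all((rx.search(ln) is not None) == keep for keep, rx in stages)]


lines = ["China and Japan", "Japan, then China", "China only", "Japan only"]
print(search(lines, "china||japan"))
print(search(lines, "china|!|japan"))
```

   With these sample lines, “china||japan” keeps both lines that mention the two countries in
   either order, whereas “china.*japan” would keep only the first.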

COMBINATION SLOTS

   Each file, when loaded, is assigned to a “slot” via which subsequent references to the file
   are then made. The slot may then be searched, have filters and flags set, etc.

   A special kind of slot, called a “combination slot”, rather than representing a single file,
   can represent multiple previously-loaded slots. Searches against a combination slot (or
   “combo slot” for short) search all those previously-loaded slots associated with it
   (called “component slots”). Combo slots are set up with the combine command.

   A combo slot has no filter or modify spec, but can have a local prompt and flags just like
   normal file slots. The flags, however, have special meanings with combo slots. Most
   combo-slot flags act as a mask against the component-slot flags; when acted upon as a member
   of the combo, a component slot's flag will be disabled if the corresponding combo-slot flag
   is disabled.

   Exceptions to this are the autokana, fuzz, and tag flags.

   The autokana and fuzz flags govern a combo slot exactly the same as a regular file slot.
   When a slot is searched as a component of a combination slot, the component slot's fuzz (and
   autokana) flags, or lack thereof, are ignored.

   The tag flag is quite different altogether; see the tag command for complete information.

   Consider the following output from the files command:

     ┏━━┳━━━━━━━━┯━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━
     ┃ 0┃F wcfh d│a I ┃ 2762k┃/usr/jfriedl/lib/edict
     ┃ 1┃FM cf  d│a I ┃  705k┃/usr/jfriedl/lib/kanjidic
     ┃ 2┃F  cfh@d│a   ┃    1k┃/usr/jfriedl/lib/local.words
     ┃*3┃FM cfhtd│a   ┃ combo┃kotoba (#2, #0)
     ┗━━┻━━━━━━━━┷━━━━┻━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━

   See the discussion of the files command below for basic explanation of the output.

   As can be seen, slot #3 is a combination slot with the name “kotoba” with component slots two
   and zero. When a search is initiated on this slot, first slot #2 “local.words” will be
   searched, then slot #0 “edict”. Because the combo slot's filter flag is on, the component
   slots' filter flag will remain on during the search. The combo slot's word flag is off,
   however, so slot #0's word flag will be forced off during the search.
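
   The masking rule can be modeled with plain sets, as in the Python sketch below. The flag
   names are spelled out for readability, the function effective_flags is hypothetical, and
   treating the “a” column of the files listing as autokana is an assumption. The tag flag is
   left out entirely, since it follows different rules.

```python
# Flags the combo slot supplies itself rather than masking (per the text above).
COMBO_GOVERNED = {"autokana", "fuzz"}
# The tag flag follows different rules and is not modeled in this sketch.
NOT_MODELED = {"tag"}


def effective_flags(component, combo):
    """Flags in effect when a component slot is searched through a combo slot."""
    masked = {flag for flag in component
              if flag not in COMBO_GOVERNED | NOT_MODELED and flag in combo}
    governed = COMBO_GOVERNED & combo
    return masked | governed


# Flag sets loosely modeled on slots #0 (edict) and #3 (kotoba) shown above.
edict = {"filter", "word", "fold", "fuzz", "highlight", "display", "autokana"}
kotoba = {"filter", "modify", "fold", "fuzz", "highlight", "tag", "display", "autokana"}

print(sorted(effective_flags(edict, kotoba)))
```

   In this model the filter flag survives because it is on in both slots, while the word flag
   is dropped because the combo slot has it off. The modify flag stays off as well, since a
   mask can only disable a component's flag, never enable one.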

   See the combine command for information about creating combo slots.

PAGER

   Lookup has a built-in pager (à la more). Upon filling a screen with text, the string
       --MORE [space,return,c,q]--
   is shown. A space will allow another screen of text; a return will allow one more line. A ‘c’
   will allow output text to continue unpaged until the next command. A ‘q’ will flush output of
   the current command.

   If supported by the OS, lookup's idea of the screen size is automatically set upon startup and
   window resize.  Lookup must know the width of the screen in doing both the  horizontal  input-
   line scrolling, and for knowing when a long line wraps on the screen.

   The pager parameters can be set manually with the “pager” command.

COMMANDS

   Any line intended to be a command must begin with the command-introduction character (the
   default is a space, but can be set via the “cmdchar” command). However, that character is not
   part of the command itself and won't be shown in the following list of commands.

   There are a number of commands that work with the selected file or selected slot (both meaning
   the same thing). The selected file is the one indicated by an appended comma+digit, as
   mentioned above. If no such indication is given, the default selected file is used (usually
   the first file loaded, but can be changed with the “select” command).

   Some commands accept a boolean argument, such as to turn a flag on or off. In all such cases,
   a “1” or “on” means to turn the flag on, while a “0” or “off” is used to turn it off. Some
   flags are per-file (“fuzz”, “fold”, etc.), and a command to set such a flag normally sets
   the flag for the selected file only. However, the default value inherited by subsequently
   loaded files can be set by prepending “default” to the command. This is particularly useful in
   the startup file before any files are loaded (see the section STARTUP FILE).

   Items separated by ‘|’ are mutually exclusive possibilities (i.e. a boolean argument
   is “1|on|0|off”).

   Items shown in brackets (‘[’ and ‘]’) are optional. All commands that accept a boolean
   argument to set a flag or mode do so optionally -- with no argument the command will report
   the current status of the mode or flag.

   Any command that allows an argument in quotes (such as load, etc.) allows the use of single or
   double quotes.

   The commands:

   [default] autokana [boolean]
      Automatic romaji → kana conversion for the selected file is turned on or off (default is
      on). However, if “default” is specified, the value to be inherited as the default by
      subsequently-loaded files is set (or reported).

      Can be temporarily disabled by a prepended ‘=’, as described in the INPUT SYNTAX section.

   clear|cls
      Attempts to clear the screen. If you're using a kterm it'll just output the appropriate tty
      control sequence. Otherwise it'll try to run the “clear” command.

   cmdchar ['one-byte-char']
      The default command-introduction character is a space, but  it  may  be  changed  via  this
      command. The single quotes surrounding the character are required. If no argument is given,
      the current value is printed.

      An input line consisting of a single question  mark  will  also  print  the  current  value
      (useful for when you don't know the current value).

      Woe to the one that sets the command-introduction character to one of the other special
      input-line characters, such as ‘+’, ‘/’, etc.

   combine ["name"] [ num += ] slotnum ...
      Creates or adds file slots to a combination slot (see the COMBINATION SLOTS section for
      general information). Note that “combo” may be used as the command as well.

      Assuming  for  this example that slots 0-2 are loaded with the files curly, moe, and larry,
      we can create a combination slot that will reference all three:

        combo "three stooges" 2, 0, 1

      The command will report

        creating combo slot #3 (three stooges): 2 0 1

      The name is optional, and will appear in the files list. It may also be used to specify
      the slot as an argument to the select command.

      A  search via the newly created combo slot would search in the order specified on the combo
      command line: first larry, then curly, and finally moe.

      If you later load another file (say, jeffrey to slot #4),  you  can  then  add  it  to  the
      previously made combo:

        combo 3 += 4

      (the “+=” wording comes from the C programming language where it means “add on to”).
      Adding to a combination always adds slots to the end of the list.

      You can take the opportunity of adding the slot to also change the name, if you like:

        combo "four stooges" 3 += 4

      The reply would be

        adding to combo slot #3(four stooges): 4

      A file slot can be a component of any particular combo slot only once.  When reporting the
      created or added slot numbers, a number will appear in parentheses if it had already been
      a member of the list.

      Furthermore, only file slots can be component members of combo slots. Attempting to combine
      combo slot X into combo slot Y will result in X's component file slots (rather than the
      combo slot itself) being added to Y.
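
      The membership rules above can be pictured with a small Python sketch. It is not taken from
      lookup itself: the function name add_to_combo and the dictionary of existing combos are
      made up for illustration, and the report format only imitates the replies shown above.

```python
def add_to_combo(combo, new_slots, combos):
    """Append slots to `combo` (a list of file-slot numbers) and return a report.

    `combos` maps combo-slot numbers to their component file slots
    (a hypothetical structure used only for this sketch).
    """
    report = []
    for slot in new_slots:
        # A combo slot contributes its component file slots, not itself.
        for member in combos.get(slot, [slot]):
            if member in combo:
                report.append("(%d)" % member)  # already a member: parenthesized
            else:
                combo.append(member)            # always added to the end
                report.append(str(member))
    return " ".join(report)


stooges = [2, 0, 1]
print(add_to_combo(stooges, [4, 0], {}))
print(stooges)
```

      Here slot 4 is appended, while slot 0 is only reported in parentheses because it was already
      a member.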

   command debug [boolean]
      Sets the internal command parser debugging flag on or off (default is off).

   debug [boolean]
      Sets the internal general-debugging flag on or off (default is off).

   describe specifier
      This command will tell you how a character (or each character in a string)  is  encoded  in
      the various encoding methods:

          lookup command>  describe "気"
          “気” as  EUC  is 0xb5a4 (181 164; 265 \244)
                as  JIS  is 0x3524 ( 53  36;  65 \044 "5$")
                as KUTEN is   2104 ( 0x1504;  25 \004)
                as S-JIS is 0x8b1f (139  31; 213 \037)

      The quotes surrounding the character or string to describe are optional.  You can also give
      a regular ASCII character and have the double-width version of the character described:
      giving “A”, for example, would describe “Ａ”.  Specifier can also be a four-digit
      kuten value, in which case the character with that kuten will be described.

      If a four-digit specifier has a hex digit in it, or if it is preceded by “0x”, the value
      is taken as a JIS code. You can precede the value by “jis”, “sjis”, “euc”, or “kuten” to
      force interpretation to the requested code.

      Finally, specifier can be a string of stripped JIS (JIS without the kanji-in and kanji-out
      codes, or with the codes but without the escape characters in them).  For example, “F|K\”
      would describe the two characters 日 and 本.
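
      How the EUC, JIS, and kuten values relate to each other can be sketched in Python. This is
      only an illustration that leans on Python's euc_jp codec; the function names are made up,
      Shift-JIS is left out, and it is not how lookup itself computes the values.

```python
def describe(ch):
    """Return the EUC, JIS and kuten values of one JIS X 0208 character."""
    euc = ch.encode("euc_jp")                 # e.g. b'\xb5\xa4'
    if len(euc) != 2 or euc[0] < 0xA1:
        raise ValueError("expected a single JIS X 0208 character")
    jis = bytes(b & 0x7F for b in euc)        # EUC is JIS with the high bit set
    ku, ten = jis[0] - 0x20, jis[1] - 0x20    # row and cell, each 1..94
    return {
        "euc": euc.hex(),
        "jis": jis.hex(),
        "kuten": "%02d%02d" % (ku, ten),
    }


def from_stripped_jis(text):
    """Decode stripped JIS: 7-bit byte pairs without the escape sequences."""
    raw = text.encode("ascii")
    if len(raw) % 2:
        raise ValueError("stripped JIS comes in two-byte pairs")
    return bytes(b | 0x80 for b in raw).decode("euc_jp")


print(describe("気"))
print(from_stripped_jis("F|K\\"))
```

      The first call reproduces the EUC, JIS, and KUTEN figures of the describe example, and the
      second decodes the stripped-JIS string from the last paragraph.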

   encoding [euc|sjis|jis]
      The same as the -euc, -jis, and -sjis command-line options, sets the  encoding  method  for
      interactive  input and output (or reports the current status).  More detail over the output
      encoding can be achieved with the output encoding command. A separate  encoding  for  input
      can be set with the input encoding command.

   files [ - | long ]
      Lists what files are loaded in what slots, and some status information about them, as with:

      ┃*0┃F wcfh d│a I ┃ 3749k┃/usr/jeff/lib/edict
      ┃ 1┃FM cf  d│a I ┃  754k┃/usr/jeff/lib/kanjidic

        ┏━━┳━━━━━━━━━━┯━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        ┃ 0┃F wcf h d │a I ┃ 2762k┃/usr/jfriedl/lib/edict
        ┃ 1┃FM cf   d │a I ┃  705k┃/usr/jfriedl/lib/kanjidic
        ┃ 2┃F  cfWh@d │a   ┃    1k┃/usr/jfriedl/lib/local.words
        ┃*3┃FM cf htd │a   ┃ combo┃kotoba (#2, #0)
        ┃ 4┃   cf   d │a   ┃  205k┃/usr/dict/words
        ┗━━┻━━━━━━━━━━┷━━━━┻━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━

      The first section is the slot number, with a “*” beside the default slot (as set by the
      select command).

      The second section shows per-slot flags and status. Letters are shown if the flag is on,
      omitted if off. In the list below, related commands are given for each item:

        F … if there is a filter {but '#' if disabled}. (filter)
        M … if there is a modify spec {but '%' if disabled}. (modify)
        w … if word-preference mode is turned on. (word)
        c … if case folding is turned on. (fold)
        f … if fuzzification is turned on. (fuzz)
        W … if wildcard-pattern mode is turned on. (wildcard)
        h … if highlighting is turned on. (highlight)
        t … if there is a tag {but '@' if disabled}. (tag)
        d … if found lines should be displayed. (display)
        ──────────────────────────────────
        a … if autokana is turned on. (autokana)
        P … if there is a file-specific local prompt. (prompt)
        I … if the file is loaded with a precomputed index. (load)

      Note that the letters in the upper section directly correspond to the “!!” sequence
      characters described in the INPUT SYNTAX section.

      If there is a digit at the end of the flag section, it indicates that only #/10 of the file
      is  actually  loaded  into  memory  (as opposed to the file having been completely loaded).
      Unloaded files will be loaded while lookup is idle, or when first used.

      If the slot is a combination slot (as slot #3 is in the example above), that  is  noted  in
      the  third  section,  and  the combination name and component slot numbers are noted in the
      fourth. Also, for combination slots (which have no filter or  modify  specifications,  only
      the  flags),  F and/or M are shown if the corresponding mode is allowed during searches via
      the combo slot. See the tag command for info about t with respect to combination slots.

      If an argument (either “-” or “long” will work) is given to the command, a short message
      about what the flags mean is also printed.

   filter ["label"] [!] /regex/[i]
      Sets the filter for the selected slot (which must contain a file and not a combination).
      If a filter is set and active for a file, any line matching the given regex is filtered
      from the output (if the ‘!’ is put before the regex, any line not matching the regex is
      filtered).  The label, which isn't required, merely acts as documentation in various
      diagnostics.

      As an example, consider that edict lines often have “(pn)” on them to indicate that the
      given English is a place name. Often these place names can be a bother, so it would be nice
      to elide them from the output unless specifically requested.  Consider the example:

        lookup command>  filter "name" /(pn)/
        search [edict]> [きの]
        機能 [きのう] /function/faculty/
        帰納 [きのう] /inductive/
        昨日 [きのう] /yesterday/
        ≪3 "name" lines filtered≫

      In the example, ‘/’ characters are used to delimit the start and stop of the regex (as is
      common with many programs). However, any character can be used. A final ‘i’, if present,
      indicates that the regex should be applied in a case-insensitive manner.

      The filter, once set, can be enabled or disabled with the other form of the “filter”
      command (described below). It can also be temporarily turned off (or, if disabled,
      temporarily turned on) by the “!F!” line prefix.

      Filtered lines can optionally be saved and then displayed if you so desire.  See the “saved
      list size” and “show” commands.

      Note that if you have saving enabled and only one line would be filtered, it is simply
      printed at the end (rather than printing a one-line message about how one line was filtered).

      By the way, a better “name” filter for edict would be:

        filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#

      as it would filter all entries that had only one English section, that section being a
      name.  It is also an example of using something other than ‘/’ to delimit a regex, as it
      makes things a bit easier to read.
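
      To see what the second filter does and does not catch, here is a rough Python equivalent.
      It is a sketch only: lookup's < and > word markers are approximated with Python's \b, the
      helper apply_filter is hypothetical, and the sample lines are made-up romanized stand-ins
      for edict entries.

```python
import re

# lookup's < and > (start/end of word) are approximated here with \b.
NAME_ONLY = re.compile(r"^[^/]+/[^/]*\bp[ln]\b[^/]*/$")


def apply_filter(lines, regex, negate=False):
    """Split lines into (shown, filtered); negate mimics the '!' form."""
    shown, filtered = [], []
    for line in lines:
        hit = regex.search(line) is not None
        (filtered if hit != negate else shown).append(line)
    return shown, filtered


sample = [
    "Tokyo [toukyou] /Tokyo (pn)/",           # one English section, a name
    "kinou [kinou] /yesterday/",              # no name marker at all
    "miyako [miyako] /capital/Miyako (pn)/",  # name, but not the only section
]
shown, filtered = apply_filter(sample, NAME_ONLY)
print(shown)
print(filtered)
```

      Only the first sample line is filtered; the third survives because the name is not its only
      English section, which is the point of the longer regex.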

   filter [boolean]
      Enables  or  disables  the filter for the selected slot.  If no argument is given, displays
      the current filter and status.

   [default] fold [boolean]
      The selected slot's case folding is turned on or off (default is on), or reported if no
      argument is given.  However, if “default” is specified, the value to be inherited as the
      default by subsequently-loaded files is set (or reported).

      Can be temporarily toggled by the “!c!” line prefix.

   [default] fuzz [boolean]
      The selected slot's fuzzification is turned on or off (default is on), or reported if no
      argument is given.  However, if “default” is specified, the value to be inherited as the
      default by subsequently-loaded files is set (or reported).

      Can be temporarily toggled by the “!f!” line prefix.

   help [regex]
      Without an argument gives a short help list. With an argument, lists  only  commands  whose
      help string is picked up by the given regex.

   [default] highlight [boolean]
      Sets matched-string highlighting on or off for the selected slot (default off), or reports
      the current status if no argument is given.  However, if “default” is specified, the value
      to be inherited as the default by subsequently-loaded files is set (or reported).

      If on, shows in bold or reverse video (see below) that part of the line which was matched
      by the search regex.  If multiple regexes were given, that part matched by the first regex
      is shown.

      Note that a regex might match a portion of a line which is later removed by a modify
      parameter. In this case, no highlighting is done.

      Can be temporarily toggled by the “!h!” line prefix.

   highlight style [bold | inverse | standout | <___>]
      Sets the style of highlighting for when highlighting is done.  Inverse (inverse video) and
      standout are the same. The default is bold.  You can also give an HTML tag, such as
      “<BOLD>”, and items will be wrapped by <BOLD>...</BOLD>. This would be particularly
      useful when the output is going to a CGI, as when lookup has been built in a server
      configuration.

      Note that the highlighting is effected by using raw VT100/xterm control sequences. This
      isn't particularly nice if your terminal doesn't understand them. Sorry.

   if {expression} command...

      If the evaluated expression is non-zero, the command will be executed.

      Note that {} rather than () surround the expression.

      Expression may be composed of numbers, operators, parentheses, etc.  In addition to the
      normal +, -, *, and /, there are:

         x && y …
         !x    … ‘not’ Yields 1 if x is zero, 0 if non-zero.
         x & y … ‘and’ Yields 1 if both x and y are non-zero, 0 otherwise.
         x | y … ‘or’  Yields 1 if x or y (or both) is non-zero, 0 otherwise.

      There may also be the special tokens true and false which are 1 and 0 respectively.

      There are also checked, matched, printed, nonword, and filtered  which  correspond  to  the
      values printed by the stats command.

      An example use might be the following kind of thing in a computer-generated script:

        !d!expect this line
        if {!printed} msg Oops! couldn't find "expect this line"

   input encoding [ euc | sjis ]
      Used  to set (or report) what encoding to use when 8-bit bytes are found in the interactive
      input (all flavors of JIS are  always  recognized).   Also  see  the  encoding  and  output
      encoding commands.

   limit [value]
      Sets the number of lines to print during any search before aborting (or reports the current
      number if no value given). Default is 100.

      Output limiting is disabled if set to zero.

   log [ to [+] file ]
      Begins logging the program output to file (the Japanese encoding method being the same as
      for screen output).  If “+” is given, the log is appended to any text that might have
      previously been in file, in which case a leading dashed line is inserted into the file.

      If no arguments are given, reports the current logging status.

   log  - | off
      If only “-” or off is given, any currently-opened log file is closed.

   load [-now|-whenneeded] "filename"
      Loads the named file to the next available slot.  If a precomputed index is found (as
      “filename.jin”) it is loaded as well.  Otherwise, an index is generated internally.

      The file to be loaded (and the index, if loaded) will be loaded during idle times. This
      allows a startup file to list many files to be loaded, but not have to wait for each of
      them to load in turn. Using the “-now” flag causes the load to happen immediately, while
      using the “-whenneeded” option (can be shortened to “-wn”) causes the load to happen only
      when the slot is first accessed.

      Invoke lookup as
         % lookup -writeindex filename
      to generate and write an index file, which will then be automatically used in the future.

      If  the file has already been loaded, the file is not re-read, but the previously-read file
      is shared. The new slot will, however, have its own separate flags, prompt, filter, etc.

   modify /regex/replace/[ig]
      Sets the modify parameter for the  selected  file.   If  a  file  has  a  modify  parameter
      associated  with  it,  each  line  selected during a search will have that part of the line
      which matches regex (if any) replaced by the replacement string before being printed.

      Like the filter command, the delimiter need not be ‘/’; any non-space character is fine.
      If a final ‘i’ is given, the regex is applied in a case-insensitive manner. If a final
      ‘g’ is given, the replacement is done to all matches in the line, not just the first
      part that might match regex.

      The replacement may have embedded “\1”, etc. in it to refer to parts of the matched text
      (see the tutorial on regular expressions).

      The modify parameter, once set, may be enabled or disabled with the other form of the
      modify command (described below). It may also be temporarily toggled via the “!m!” line
      prefix.

      A silly example for the ultra-nationalist might be:
        modify /<Japan>/Dainippon Teikoku/g
      So that a line such as
        日銀 [にちぎん] /Bank of Japan/
      would come out as
        日銀 [にちぎん] /Bank of Dainippon Teikoku/

      As a real example of the modify command with kanjidic, consider that it is likely that  one
      is  not  interested in all the various fields each entry has.  The following can be used to
      remove the info on the U, N, Q, M, E, B, C, and Y fields from the output:

        modify /( [UNQMECBY]\S+)+//g,1

      It's sort of complex, but works.  Note that here the replacement part is empty, meaning to
      just remove those parts which matched.  A search for 日 would normally print

          日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 \
          MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}

      but with the above modify spec, appears more simply as

          日 467c S4 G1 H3027 F1 P3-3-1 ニチ ジツ ひ -び -か {day}
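
      The two modify examples can be tried out with Python's re.sub, which behaves much like the
      search-and-replace described here. This is only a sketch: the modify function and its flags
      argument are invented for the illustration, and lookup's <Japan> is written as \bJapan\b.

```python
import re


def modify(line, regex, replacement, flags=""):
    """Apply a modify-style substitution; flags may contain 'i' and/or 'g'."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1   # for re.sub, count=0 means all matches
    return re.sub(regex, replacement, line, count=count, flags=re_flags)


entry = ("日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 "
         "MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}")

# Empty replacement: the matched fields are simply removed.
print(modify(entry, r"( [UNQMECBY]\S+)+", "", "g"))

# Word-bounded replacement, as in the Bank of Japan example.
print(modify("日銀 [にちぎん] /Bank of Japan/", r"\bJapan\b",
             "Dainippon Teikoku", "g"))
```

      Without the ‘g’ flag only the first run of matching fields would be removed, so the Q, M,
      E, and Y fields would remain in the kanjidic line.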

   modify [boolean]
      Enables or disables the modify parameter for the selected file, or reports the current
      status if no argument is given.

   msg string
      The given string is printed.

      Most likely used in a script as the target command of an if command.

   output encoding [ euc | sjis | jis...]
      Used  to  set exactly what kind of encoding should be used for program output (also see the
      input encoding command). Used when the encoding command is not detailed  enough  for  one's
      needs.

      If  no  argument  is  given, reports the current output encoding.  Otherwise, arguments can
      usually be any reasonable dash-separated combination of:

        euc
           Selects EUC for the output encoding.

        sjis
           Selects Shift-JIS for the output encoding.

        jis[78|83|90][-ascii|-roman]
           Selects JIS for the output encoding.  If no year (78, 83, or 90) is given, 78 is used.
           Can optionally specify that “English” should be encoded as regular ASCII (the default
           when JIS is selected) or as JIS-ROMAN.

        212
           Indicates that JIS X0212-1990 should be supported (ignored for Shift-JIS output).

        no212
           Indicates that JIS X0212-1990 should not be supported (default setting).  This
           places JIS X0212-1990 characters under the domain of disp, nodisp, code, or mark
           (described below).

        hwk
           Indicates that half width kana should be left as-is (default setting).

        nohwk
           Indicates that half  width  kana  should  be  stripped  from  the  output.   (not  yet
           implemented).

        foldhwk
           Indicates  that  half  width  kana  should be folded to their full-width counterparts.
           (not yet implemented).

        disp
           Indicates that non-displayable characters (such as JIS  X0212-1990  while  the  output
           encoding  method is Shift-JIS) should be passed along anyway (most likely resulting in
           screen garbage).

        nodisp
           Indicates that non-displayable characters should be quietly stripped from the output.

        code
           Indicates that non-displayable characters should  be  printed  as  their  octal  codes
           (default setting).

        mark
           Indicates that non-displayable characters should be printed as “★”.

        Of course, not all options make sense in all combinations, or at all times.  When the
        current (or new) output encoding is reported, a complete and exact specifier representing
        the selected output encoding is shown.  An example might be “jis78-ascii-no212-hwk-code”.

   pager [ boolean | size ]
      Turns on or off an output pager, sets its idea of the screen size, or reports the current
      status.

      Size can be a single number indicating the number of lines to be printed between “MORE?”
      prompts (usually a few lines less than the total screen height, the default being 20
      lines). It can also be two numbers in the form “#x#”, where the first number is the
      width (in half-width characters; default 80) and the second is the lines-per-page as above.

      If the pager is on, every page of output will result in a “MORE?” prompt, at which there
      are four possible responses. A space will allow one more full page to print. A return will
      allow one more line.  A ‘c’ (for “continue”) will allow all the rest of the output (for the
      current command) to proceed without pause, while a ‘q’ (for “quit”) will flush the output
      for the current command.

      If supported by the OS, the pager size parameters are set appropriately from the window
      size upon startup or window resize.

      The default pager status is “off”.

   [local] prompt "string"
      Sets the prompt string.  If “local” is indicated, sets the prompt string for the selected
      slot only. Otherwise, sets the global default prompt string.

      Prompt strings may have the special %-sequences shown below, with related commands given in
      parentheses:

         %N … the default slot's file or combo name.
         %n … like %N, but any leading path is not shown if a filename.
         %# … the default slot's number.
         %S … the “command-introduction” character (cmdchar)
         %0 … the running program's name
         %F='string' … string shown if filtering enabled (filter)
         %M='string' … string shown if modification enabled (modify)
         %w='string' … string shown if word mode on (word)
         %c='string' … string shown if case folding on (fold)
         %f='string' … string shown if fuzzification on (fuzz).
         %W='string' … string shown if wildcard-pat. mode on (wildcard).
         %d='string' … string shown if displaying on (display).
         %C='string' … string shown if currently entering a command.
         %l='string' … string shown if logging is on (log).
         %L … the name of the current output log, if any (log)

      For the tests (%f, etc), you can put ‘!’ just after the ‘%’ to reverse the sense of the
      test (i.e.  %!f="no fuzz").  The reverse of %F is if a filter is installed but disabled
      (i.e.  string will never be shown if there is no filter for the default file).  The modify
      %M works comparably.

      Also, you can use an alternative form for the items that take an argument string. Replacing
      the quotes with parentheses will treat string as a recursive prompt specifier. For example,
      the specifier

           %C='command'%!C(%f='fuzzy 'search:)

      would result in a “command” prompt if entering a command, while it would result in either
      a “fuzzy search:” or a “search:” prompt if not entering a command.  The parenthesized
      constructs may be nested.

      Note that the letters of the test constructs are the same as the letters for the “!!”
      sequences described in INPUT SYNTAX.

      An example of a nice prompt command might be:

              prompt "%C(%0 command)%!C(%w'*'%!f'raw '%n)> "

      With this prompt specification, the prompt would normally appear as “filename> ” but when
      fuzzification is turned off as “raw filename> ”.  And if word-preference mode is on, the
      whole thing has a “*” prepended.  However, if a command is being entered, the prompt would
      then become “name command”, where name is the program's name (system dependent, but most
      likely “lookup”).

      The default prompt format string is “%C(%0 command)%!C(search [%n])> ”.

   regex debug [boolean]
      Sets the internal regex debugging flag (turn on if you want  billions  of  lines  of  stuff
      spewed to your screen).

   saved list size [value]
      During  a  search, lines that match might be elided from the output due to filters or word-
      preference mode.  This command sets the number of such lines to  remember  during  any  one
      search, such that they may be later displayed (before the next search) by the show command.

      The default is 100.

   select [ num | name | . ]
      If num is given, sets the default slot to that slot number.  If name is given, sets the
      default slot to the first slot found with a file (or combination) loaded with that name.
      The incantation “select .” merely sets the default slot to itself, which can be useful in
      script files where you want to indicate that any subsequent flag changes should work with
      whatever file was the default at the time the script was sourced.

      If  no  argument  is  given,  simply  reports  the current default slot (also see the files
      command).

      In command files loaded via the source command, or as the startup  file,  commands  dealing
      with  per-slot  items (flags, local prompt, filters, etc.)  work with the file or slot last
      selected.  The last such selected slot remains selected once the load is complete.

      Interactively, the default slot will become the selected slot for subsequent searches and
      commands that aren't augmented with an appended “,#” (as described in the INPUT SYNTAX
      section).

   show
      Shows any lines elided from the previous search (either due to a filter or  word-preference
      mode).

      Will apply any modifications (see the “modify” command) if modifications are enabled for
      the file. You can use the “!m!” line prefix as well with this command (in this case, put
      the “!m!” before the command-indicator character).

      The length of the list is controlled by the “saved list size” command.

   source "filename"
      Commands are read from filename and executed.

      In the file, all lines beginning with “#” are ignored as comments (note that comments must
      appear on a line by themselves, as “#” is a reasonable character to have within commands).

      Lines whose first non-blank character is “=”, “!”, or “+” are considered searches, while
      all other non-blank lines are considered lookup commands.  Therefore, there is no need for
      lines to begin with the command-introduction character. However, leading whitespace is
      always OK.

      For  search  lines,  take  care  that  any  trailing whitespace is deleted if undesired, as
      trailing whitespace (like all non-leading whitespace)  is  kept  as  part  of  the  regular
      expression.

      Within  a  command  file, commands that modify per-file flags and such always work with the
      most-recently loaded (or selected) file. Therefore, something along the lines of

        load "my.word.list"
        set word on

        load "my.kanji.list"
        set word off
        set local prompt "enter kanji> "

      would work as might make intuitive sense.

      Since a script file must have a load or select before any per-slot flag is set, one can
      use “select .” to facilitate command scripts that are to work with “the current slot”.

   spinner [value]
      Set  the  value  of the spinner (A silly little feature).  If set to a non-zero value, will
      cause a spinner to spin while a file is being checked, one increment per value lines in the
      file actually checked against the search specifier.  Default is off (i.e. zero).

   stats
      Shows  information  about  how  many  lines  of the text file were checked against the last
      search specifier, and how many lines matched and were printed.

   tag [boolean] ["string"]
      Enable, disable, or set the tag for the selected slot.

      If the slot is not a combination slot, a tag string may be set (the quotes are required).

      If a tag string is set and enabled for a file, the string is  prepended  to  each  matching
      output line printed.

      Unlike the filter and modify commands, which automatically enable the function when a
      parameter is set, a tag is not automatically enabled when set.  It can be enabled while
      being set via “tag on "string"”, or could be enabled subsequently via just “tag on”.

      If the selected slot is a combination slot, only the enable/disable status may be changed
      (on by default).  No tag string may be set.

      The  reason  for  the  special  treatment  lies  in  the special nature of how tags work in
      conjunction with combination files.

      During a search when the selected slot is a combination slot, each file which is a member
      of the combination has its per-file flags disabled if their corresponding flag is disabled
      in the original combination slot. This allows the combination slot's flags to act as
      a “mask” to blot out each component file's per-file flags.

      The tag flag, however, is special in that the component file's tag flag is turned on if the
      combination slot's tag flag is turned on (and, of course, the  component  file  has  a  tag
      string registered).

      The  intended  use  of  this  is  that one might set a (disabled) tag to a file, yet direct
      searches against that file will have no prepended tag.  However, if the file is searched as
      part  of  a  combination  slot (and the combination slot's tag flag is on), the tag will be
      prepended, allowing one to easily understand from which file an output line comes.
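
      The masking rule and the tag exception can be summarized in a few lines of Python. This is
      a sketch of the behavior as described above, not lookup's code: the Slot class, its field
      names, and the choice of fold and fuzz as sample flags are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    fold: bool = True      # case folding flag
    fuzz: bool = True      # fuzzification flag
    tag_on: bool = False   # tag enable flag
    tag: str = ""          # tag string; empty means none registered


def effective_flags(component, combo):
    """Flags in effect for `component` while searching through `combo`."""
    return {
        # Ordinary flags: the combo's flag masks the component's flag.
        "fold": component.fold and combo.fold,
        "fuzz": component.fuzz and combo.fuzz,
        # Tag: turned on by the combo, provided a tag string is registered,
        # even if the component's own tag flag is off.
        "tag": combo.tag_on and bool(component.tag),
    }


local_words = Slot(tag_on=False, tag=">> ")
words_combo = Slot(fuzz=False, tag_on=True)
print(effective_flags(local_words, words_combo))
```

      In this example fuzzification is masked off by the combo, while the tag becomes active even
      though it is disabled for direct searches of the file.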

   verbose [boolean]
      Sets verbose mode on or off, or reports the current status  (default  on).   Many  commands
      reply with a confirmation if verbose mode is turned on.

   version
      Reports the current version of the program.

   [default] wildcard [boolean]
      The selected slot's patterns are considered wildcard patterns if turned on, regular
      expressions if turned off. The current status is reported if no argument is given.  However,
      if “default” is specified, the pattern-type to be inherited as the default by
      subsequently-loaded files is set (or reported).

      Can be temporarily toggled by the “!W!” line prefix.

      When wildcard patterns are selected, the changed metacharacters are: “*” means “any
      stuff”, “?” means “any one character”, while “+” and “.” become unspecial. Other regex
      items such as “|”, “(”, “[”, etc. are unchanged.

      What “*” and “?” will actually match depends upon the status of word-mode, as well as on
      the pattern itself.  If word-mode is on, or if the pattern begins with the start-of-word
      “<” or “[”, only non-spaces will be matched. Otherwise, any character will be matched.

      In summary, when wildcard mode is on, the input pattern is affected in the following ways:

         * is changed to the regular expression .* or \S*
         ? is changed to the regular expression .  or \S
         + is changed to the regular expression \+
         . is changed to the regular expression \.

      Because filename patterns are often called “filename globs”, the command “glob” can be used
      in place of “wildcard”.
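
      The translation described in this entry can be sketched in Python. This is a simplified
      illustration under stated assumptions: the function name is made up, “only non-spaces” is
      rendered as \S, and backslash escapes and character classes inside the pattern get no
      special handling.

```python
import re


def glob_to_regex(pattern, word_mode=False):
    """Translate a wildcard pattern into a regular expression string."""
    # Only non-spaces are matched in word mode, or when the pattern
    # begins with a start-of-word < or [.
    any_char = r"\S" if word_mode or pattern[:1] in ("<", "[") else "."
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(any_char + "*")   # "any stuff"
        elif ch == "?":
            out.append(any_char)         # "any one character"
        elif ch in "+.":
            out.append("\\" + ch)        # these lose their special meaning
        else:
            out.append(ch)               # |, ( and friends pass through
    return "".join(out)


print(glob_to_regex("f*.txt"))
print(glob_to_regex("f*.txt", word_mode=True))
print(bool(re.fullmatch(glob_to_regex("f*.txt"), "foo.txt")))
```

      The first two calls show the same pattern with and without word mode, and the last line
      checks the result against an ordinary filename.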

   [default] word|wordpreference [boolean]
      The selected file's word-preference mode is turned on or off (default is off), or the
      current setting is reported if no argument is specified.  However, if “default” is
      specified, the value to be inherited as the default by subsequently-loaded files is set
      (or reported).

      In word-preference mode, entries are searched for as if the search regex had a leading
      ‘<’ and a trailing ‘>’, resulting in a list of entries with a whole-word match of
      the regex.  However, if there are none, but there are non-word entries, the non-word
      entries are shown (the “saved list” is used for this -- see that command). This makes it
      an “if there are whole words like this, show me, otherwise show me whatever you've got”
      mode.

      If there are both word and non-word entries, the non-word entries are remembered in the
      saved list (rather than any possible filtered entries being remembered there).

      One caveat: if a search matches a line in more than one place, and the first is not a
      whole word, while one of the others is, the line will be considered non-whole-word.
      For example, the search “japan” with word-preference mode on will not list an entry such
      as “/Japanese/language in Japan/”, as the first “Japan” is part of “Japanese” and not a
      whole word.  If you really need just whole-word entries, use the ‘<’ and ‘>’ yourself.

      The mode may be temporarily toggled via the “!w!” line prefix.

      The  rules  defining  what  lines  are  filtered, remembered, discarded, and shown for each
      permutation of search are rather complex, but the end result is rather intuitive.
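
      A rough Python model of word-preference mode, including the caveat about the first match,
      might look like the following. It is a sketch under assumptions: the function name is
      invented, case folding is taken to be on, “word” characters are taken to be alphanumerics,
      and filters are ignored entirely.

```python
import re


def word_preference_search(pattern, lines):
    """Return (shown, saved) for a word-preference search over lines."""
    regex = re.compile(pattern, re.IGNORECASE)
    whole, partial = [], []
    for line in lines:
        m = regex.search(line)            # only the first match is examined
        if m is None:
            continue
        before = line[m.start() - 1] if m.start() > 0 else " "
        after = line[m.end()] if m.end() < len(line) else " "
        is_whole = not (before.isalnum() or after.isalnum())
        (whole if is_whole else partial).append(line)
    if whole:
        return whole, partial             # non-word matches go to the saved list
    return partial, []                    # nothing whole: show the non-word matches


entries = ["/Japanese/language in Japan/", "/Bank of Japan/"]
print(word_preference_search("japan", entries))
print(word_preference_search("japan", entries[:1]))
```

      With both entries present only “/Bank of Japan/” is shown and the other is saved; with the
      first entry alone, it is shown because there is no whole-word match to prefer.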

   quit | leave | bye  | exit
      Exits the program.

STARTUP FILE

   If the file “~/.lookup” is present, commands are read from it during lookup startup.

   The file is read in the same way as the source command reads files (see that entry for more
   information on file format, etc.).

   However,  if  there  had  been  files  loaded  via command-line arguments, commands within the
   startup file to load files (and their associated commands such as to set per-file  flags)  are
   ignored.

   Similarly, any use of the command-line flags -euc, -jis, or -sjis will disable the commands
   in the startup file that deal with setting the input and/or output encodings.

   The special treatment mentioned in the above two paragraphs only applies  to  commands  within
   the startup file itself, and does not apply to commands in command-files that might be sourced
   from within the startup file.

   The following is a reasonable example of a startup file:
     ## turn verbose mode off during startup file processing
     verbose off

     prompt "%C([%#]%0)%!C(%w'*'%!f'raw '%n)> "
     spinner 200
     pager on

     ## The filter for edict will hit for entries that
     ## have only one English part, and that English part
     ## having a pl or pn designation.
     load ~/lib/edict
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     highlight on
     word on

     ## The filter for kanjidic will hit for entries without a
     ## frequency-of-use number.  The modify spec will remove
     ## fields with the named initial code (U,N,Q,M,E, and Y)
     load ~/lib/kanjidic
     filter "uncommon" !/<F\d+>/
     modify /( [UNQMEY])+//g

     ## Use the same filter for my local word file,
     ## but turn off by default.
     load ~/lib/local.words
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     filter off
     highlight on
     word on
     ## Want a tag for my local words, but only when
     ## accessed via the combo below
     tag off "》"

     combine "words" 2 0
     select words

     ## turn verbosity back on for interactive use.
     verbose on

COMMAND-LINE ARGUMENTS

   With the use of a startup file, command-line arguments are rarely needed.  In  practical  use,
   they are only needed to create an index file, as in:

       lookup -write textfile

   Any command-line arguments that aren't flags are taken to be files, which are loaded in turn
   during startup.  In this case, any “load”, “filter”, etc. commands in the startup file are
   ignored.

   The following flags are supported:

   -help
      Reports a short help message and exits.

   -write
      Creates index files for the named files and exits. No startup file is read.

   -euc
      Sets the input and output encoding method to EUC (currently the default).  Exactly the same
      as the “encoding euc” command.

   -jis
      Sets the input and output encoding method to JIS.  Exactly the same as the “encoding jis”
      command.

   -sjis
      Sets the input and output encoding method to Shift-JIS.  Exactly the same as the “encoding
      sjis” command.

   -v -version
      Prints the version string and exits.

   -norc
      Indicates that the startup file should not be read.

   -rc file
      The named file is used as the startup file, rather than the default “~/.lookup”.  It is an
      error for the file not to exist.

   -percent num
      When  an  index  is built, letters that appear on more than num percent (default 50) of the
      lines are elided from the index.  The thought is that if a search will have to  check  most
      of  the lines in a file anyway, one may as well save the large amount of space in the index
      file needed to represent that information, and  the  time/space  tradeoff  shifts,  as  the
      indexing of oft-occurring letters provides a diminishing return.

      Smaller indexes can be made by using a smaller number.
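
      As a rough illustration of the idea (not lookup's actual index builder), the elision rule
      can be sketched in Python. The function name letters_to_index and the sample lines are
      hypothetical; only the "more than num percent of the lines" rule comes from the text above.

```python
from collections import Counter

def letters_to_index(lines, percent=50):
    """Return the letters kept in the index.

    A letter that appears on more than `percent` percent of the lines
    is elided, mirroring the description of the -percent flag.
    """
    counts = Counter()
    for line in lines:
        counts.update(set(line))      # count each letter once per line
    limit = len(lines) * percent / 100
    return {ch for ch, n in counts.items() if n <= limit}

lines = ["abc", "abd", "aef", "ghi"]
kept_default = letters_to_index(lines)        # 'a' is on 3 of 4 lines, so it is elided
kept_small = letters_to_index(lines, 25)      # a smaller number elides 'b' as well
print(sorted(kept_default))
print(sorted(kept_small))
```

      A smaller percentage keeps fewer letters, which is why it yields a smaller index.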

   -noindex
      Indicates  that  any  files  loaded  via  the  command  line  should not be loaded with any
      precomputed index, but recalculated on the fly.

   -verbose
      Has metric tons of stats spewed whenever an index is created.

   -port ###
      For the (undocumented) server configuration only, tells which port to listen on.

OPERATING SYSTEM CONSIDERATIONS

   I/O primitives and behaviors vary with the operating system. On my operating system, I can
   “read” a file by mapping it into memory, which is a pretty much instant procedure
   regardless of the size of the file.  When I later access that memory, the appropriate sections
   of the file are automatically read into memory by the operating system as needed.

   This  results in lookup starting up and presenting a prompt very quickly, but causes the first
   few searches that need to check a lot of lines in the file to go more slowly (as lots  of  the
   file will need to be read in). However, once the bulk of the file is in, searches will go very
   fast. The win here is that the rather long file-load times are amortized over  the  first  few
   (or  few  dozen,  depending  upon  the  situation)  searches rather than always faced right at
   command startup time.

   On the other hand, on an operating system without the mapping ability, lookup would  start  up
   very  slowly  as all the files and indexes are read into memory, but would then search quickly
   from the beginning, all the file already having been read.

   To get around the slow startup, particularly when many files  are  loaded,  lookup  uses  lazy
   loading  if  it  can:  a file is not actually read into memory at the time the load command is
   given. Rather, it will be read when first actually accessed.  Furthermore,  files  are  loaded
   while  lookup  is  idle,  such  as when waiting for user input. See the files command for more
   information.
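
   The memory-mapping behavior described above can be tried out with a short Python sketch. It
   uses Python's standard mmap module rather than anything from lookup itself; the helper name
   open_mapped and the temporary file standing in for a dictionary are assumptions made for the
   example.

```python
import mmap
import os
import tempfile

def open_mapped(path):
    """Map a file read-only. Mapping is quick; the operating system
    reads the contents from disk only when the memory is touched."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# A small temporary file stands in for a large dictionary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\nsecond line\n")
    path = tmp.name

data = open_mapped(path)          # returns without reading the file
offset = data.find(b"second")     # the search touches the mapped pages
print(offset)
data.close()
os.remove(path)
```

   On a system without such a facility, the equivalent program would have to read the whole
   file before the first search could begin.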

REGULAR EXPRESSIONS, A BRIEF TUTORIAL

   Regular expressions (“regex” for short) are a “code” used to indicate what kind of text
   you're looking for.  They're how one searches for things in the editors “vi”, “stevie”,
   “mifes”, etc., or with the grep commands.  There are differences among the various regex
   flavors in use -- I'll describe the flavor used by lookup here. Also, in order to be clear
   for the common case, I might tell a few lies, but nothing too heinous.

   The regex 「a」 means “any line with an ‘a’ in it.” Simple enough.

   The regex 「ab」 means “any line with an ‘a’ immediately followed by a ‘b’”.  So the line
       I am feeling flabby
   would “match” the regex 「ab」 because, indeed, there's an “ab” on that line.  But it wouldn't
   match the line

       this line has no a followed _immediately_ by a b

   because, well, what the line says is true.

   In  most cases, letters and numbers in a regex just mean that you're looking for those letters
   and numbers in the order given. However, there are  some  special  characters  used  within  a
   regex.

   A simple example would be a period. Rather than indicate that you're looking for a period, it
   means “any character”.  So the silly regex 「.」 would mean “any line that has any character on
   it.” Well, maybe not so silly... you can use it to find non-blank lines.

   But more commonly it's used as part of a larger regex. Consider the regex 「gray」. It wouldn't
   match the line

       The sky was grey and cloudy.

   because of the different spelling (grey vs. gray).  But the regex 「gr.y」 asks for “any line
   with a ‘g’, ‘r’, some character, and then a ‘y’”.  So this would get “grey” and “gray”.

   A special construct somewhat similar to ‘.’ would be the character class.  A character class
   starts with a ‘[’ and ends with a ‘]’, and will match any character given in between. An
   example might be

       gr[ea]y

   which would match lines with a ‘g’, ‘r’, an ‘e’ or an ‘a’, and then a ‘y’.  Inside a
   character class you can list as many characters as you want to.

   For example, the simple regex 「x[0123456789]y」 would match any line with a digit sandwiched
   between an ‘x’ and a ‘y’.

   The order of the characters within the character class doesn't really matter...
   「[5130467289]」 would be the same as 「[0123456789]」.

   But as a shortcut, you could put 「[0-9]」 instead of 「[0123456789]」.  So the character
   class 「[a-z]」 would match any lower-case letter, while the character class 「[a-zA-Z0-9]」
   would match any letter or digit.
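
   The period and character-class examples so far can be tried with Python's re module, which
   is assumed here to treat these basic constructs the same way lookup does. The sample lines
   are made up for the illustration.

```python
import re

lines = [
    "The sky was grey and cloudy.",
    "a gray cat",
    "gr8y",
    "x7y",
    "xay",
]

# '.' accepts any single character between "gr" and "y".
dot = [s for s in lines if re.search(r"gr.y", s)]

# A character class accepts only the characters listed.
cls = [s for s in lines if re.search(r"gr[ea]y", s)]

# A range inside a class is shorthand for listing every digit.
digit = [s for s in lines if re.search(r"x[0-9]y", s)]

print(dot)
print(cls)
print(digit)
```

   Note that the period also accepts “gr8y”, while the character class does not.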

   The character ‘-’ is special within a character class, but only if it's not the first thing.
   Another character that's special in a character class is ‘^’, if it is the first thing.
   It “inverts” the class so that it will match any character not listed. The class
   「[^a-zA-Z0-9]」 would match any line with spaces or punctuation in it.

   There are some special short-hand sequences for some common character classes.  The
   sequence 「\d」 means “digit”, and is the same as 「[0-9]」.  「\w」 means “word element” and is
   the same as 「[0-9a-zA-Z_]」.  「\s」 means “space-type thing” and is the same as 「[ \t]」
   (「\t」 means tab).

   You can also use 「\D」, 「\W」, and 「\S」 to mean things not a digit, word element, or
   space-type thing.

   Another special character would be ‘?’. This means “maybe one of whatever was just before it,
   but none is fine too”.  In the regex 「bikes? for rent」, the “whatever” would be the ‘s’, so
   this would match lines with either “bikes for rent” or “bike for rent”.

   Parentheses are also special, and can group things together.  In the regex

       big (fat harry)? deal

   the “whatever” for the ‘?’ would be “fat harry”.  But be careful to pay attention to
   details... this regex would match
       I don't see what the big fat harry deal is!
   but not
       I don't see what the big deal is!

   That's because if you take away the “whatever” of the ‘?’, you end up with
       big  deal
   Notice  that there are two spaces between the words, and the regex didn't allow for that.  The
   regex to get either line above would be
       big (fat harry )?deal
   or
       big( fat harry)? deal
   Do you see how they're essentially the same?
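
   The spacing pitfall is easy to check for yourself. The following sketch uses Python's re
   module as a stand-in for lookup's regex engine (an assumption; the constructs used here
   behave the same in both as far as this tutorial describes them).

```python
import re

with_harry = "I don't see what the big fat harry deal is!"
without = "I don't see what the big deal is!"

naive = r"big (fat harry)? deal"     # leaves two spaces when the group is absent
fixed_a = r"big (fat harry )?deal"   # trailing space moved inside the group
fixed_b = r"big( fat harry)? deal"   # leading space moved inside the group

for regex in (naive, fixed_a, fixed_b):
    hits = [bool(re.search(regex, s)) for s in (with_harry, without)]
    print(regex, hits)
```

   Only the first regex fails on the second line, because it insists on a space on both sides
   of the optional group.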

   Similar to ‘?’ is ‘*’, which means “any number, including none, of whatever's right in
   front”.  It more or less means that whatever is tagged with ‘*’ is allowed, but not required,
   so something like
       I (really )*hate peas
   would match “I hate peas”, “I really hate peas!”, “I really really hate peas”, etc.

   Similar to both ‘?’ and ‘*’ is ‘+’, which means “at least one of whatever is just in front,
   but more is fine too”.  The regex 「mis+pelling」 would match “mispelling”, “misspelling”,
   “missspelling”, etc. Actually, it's just the same as 「miss*pelling」 but simpler to type.
   The regex 「ss*」 means “an ‘s’, followed by zero or more ‘s’”, while 「s+」 means “one or
   more ‘s’”.  Both are really the same.
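
   A quick way to convince yourself that the two spellings are equivalent is to run both over
   the same words. This sketch again uses Python's re module as a stand-in for lookup; the word
   and sentence lists are made up for the illustration.

```python
import re

words = ["mipelling", "mispelling", "misspelling", "missspelling"]

# fullmatch requires the whole word to fit the regex.
plus = [w for w in words if re.fullmatch(r"mis+pelling", w)]
star = [w for w in words if re.fullmatch(r"miss*pelling", w)]
print(plus == star, plus)

sentences = [
    "I hate peas",
    "I really hate peas!",
    "I really really hate peas",
    "I truly hate peas",
]

# '*' applies to the whole parenthesized group, including its trailing space.
peas = [s for s in sentences if re.search(r"I (really )*hate peas", s)]
print(peas)
```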

   The special character ‘|’ means “or”.  Unlike ‘+’, ‘*’, and ‘?’, which act on the thing
   immediately before, the ‘|’ is more “global”.
       give me (this|that) one
   would match lines that had “give me this one” or “give me that one” in them.

   You can even combine more than two:
       give me (this|that|the other) one

   How about:
       [Ii]t is a (nice |sunny |bright |clear )*day

   Here, the “whatever” immediately before the ‘*’ is
       (nice |sunny |bright |clear )
   So this regex would match all the following lines:
      It is a day.
      I think it is a nice day.
      It is a clear sunny day today.
      If it is a clear sunny nice sunny sunny sunny bright day then....
   Notice how the 「[Ii]t」 matches either “It” or “it”?

   Note that the above regex would also match
      fruit is a day
   because it indeed fulfills all requirements of the regex, even though the “it” is really part
   of the word “fruit”.  To address concerns like this, which are common, there are ‘<’ and ‘>’,
   which mean “word break”.  The regex 「<it」 would match any line with “it” beginning a word,
   while 「it>」 would match any line with “it” ending a word.  And, of course, 「<it>」 would
   match any line with the word “it” in it.
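
   Python's re module has no '<' or '>' word-break markers, but its \b behaves much the same
   way, so the examples above can be approximated as below. The helper to_python_regex is a
   hypothetical and deliberately naive translation (it would mangle a literal '<' or '>'), and
   is not part of lookup.

```python
import re

def to_python_regex(lookup_regex):
    """Naive, hypothetical translation: '<' and '>' become Python's \\b."""
    return lookup_regex.replace("<", r"\b").replace(">", r"\b")

def matches(lookup_regex, line):
    return re.search(to_python_regex(lookup_regex), line) is not None

print(matches("it", "fruit is a day"))              # plain "it" hits inside "fruit"
print(matches("<it>", "fruit is a day"))            # whole word "it" is absent
print(matches("<it>", "I think it is a nice day.")) # whole word "it" is present
print(matches("it>", "fruit is a day"))             # "it" ends the word "fruit"
print(matches("<it", "fruit is a day"))             # no word begins with "it"
```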

   Going back to the regex to find grey/gray, that would make more sense, then, as
       <gr[ae]y>
   which would match only the words “grey” and “gray”.

   Somewhat similar are ‘^’ and ‘$’, which mean “beginning of line” and “end of line”,
   respectively (but not in a character class, of course).  So the regex 「^fun」 would find any
   line that begins with the letters “fun”, while 「^fun>」 would find any line that begins with
   the word “fun”.  「^fun$」 would find any line that was exactly “fun”.

   Finally, 「^\s*fun\s*$」 would match any line that was exactly “fun”, but perhaps also had
   leading and/or trailing whitespace.
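
   The anchor examples can be compared side by side with a short Python sketch. Python's re
   module is used as a stand-in for lookup, with \b written in place of lookup's '>' (an
   assumption about equivalence); the sample lines are invented.

```python
import re

lines = ["fun", "funny business", "fun times", "  fun  ", "no fun"]

def hits(regex):
    """Return the sample lines that the regex matches somewhere."""
    return [s for s in lines if re.search(regex, s)]

starts_with_letters = hits(r"^fun")       # line begins with the letters "fun"
starts_with_word = hits(r"^fun\b")        # line begins with the word "fun"
exactly = hits(r"^fun$")                  # line is exactly "fun"
padded = hits(r"^\s*fun\s*$")             # "fun" with optional surrounding whitespace

print(starts_with_letters)
print(starts_with_word)
print(exactly)
print(padded)
```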

   That's pretty much it. There are more complex things, some of which I'll mention in  the  list
   below,  but  even  with  these few simple constructs one can specify very detailed and complex
   patterns.

   Let's summarize some of the special things in regular expressions:

   Items that are basic units:
     char      any non-special character matches itself.
     \char     special chars, when preceded by \, become non-special.
     .         Matches any one character (except \n).
     \n        Newline
     \t        Tab.
     \r        Carriage Return.
     \f        Formfeed.
     \d        Digit. Just a short-hand for [0-9].
     \w        Word element. Just a short-hand for [0-9a-zA-Z_].
     \s        Whitespace. Just a short-hand for [\t \n\r\f].
     \## \###  Two or three digit octal number indicating a single byte.
     [chars]   Matches a character if it's one of the characters listed.
     [^chars]  Matches a character if it's not one of the ones listed.

     The \char items above can be used within a character class,
     but not the items below.

     \D        Anything not \d.
     \W        Anything not \w.
     \S        Anything not \s.
     \a        Any ASCII character.
     \A        Any multibyte character.
     \k        Any (not half-width) katakana character (including ー).
     \K        Any character not \k (except \n).
     \h        Any hiragana character.
     \H        Any character not \h (except \n).
     (regex)   Parens make the regex one unit.
     (?:regex)   [from perl5] Grouping-only parens -- can't use for \# (below)
     \c        Any JISX0208 kanji (kuten rows 16-84)
     \C        Any character not \c (except \n).
     \#        Match whatever was matched by the #th paren from the left.

   With “☆” to indicate one “unit” as above, the following may be used:

     ☆?       A ☆ allowed, but not required.
     ☆+       At least one ☆ required, but more ok.
     ☆*       Any number of ☆ ok, but none required.

   There are also ways to match “situations”:

     \b        A word boundary.
     <         Same as \b.
     >         Same as \b.
     ^         Matches the beginning of the line.
     $         Matches the end of the line.

   Finally, the “or” is

     reg1|reg2 Match if either reg1 or reg2 match.

   Note that “\k” and the like aren't allowed in character classes, so
   something such as 「[\k\h]」 to try to get all kana won't work.
   Use 「(\k|\h)」 instead.

BUGS

   Needs full support for half-width katakana and JIS X 0212-1990.
   Non-EUC (JIS & SJIS) items not tested well.
   Probably won't work on non-UNIX systems.
   Screen control codes (for clear and highlight commands) are hard-coded for ANSI/VT100/kterm.

AUTHOR

   Jeffrey Friedl (jfriedl@nff.ncl.omron.co.jp)

INFO

   Jim Breen's text files edict and kanjidic and their documentation can be found in
   “pub/nihongo” on ftp.cc.monash.edu.au (130.194.1.106).

   Information on input and output encoding and codes can be found in Ken Lunde's Understanding
   Japanese Information Processing (日本語情報処理) published by O'Reilly and Associates.  ISBN
   1-56592-043-0.  There is also a Japanese edition published by SoftBank.

   A program to convert files among the various encoding methods is Dr. Ken Lunde's jconv, which
   can also be found on ftp.cc.monash.edu.au.  Jconv is also useful for converting halfwidth
   katakana (which lookup doesn't yet support well) to full-width.

                                                                                        LOOKUP(1)