Provided by: remembrance-agent_2.12-7.2_amd64

NAME

       ra-index - index files for use with remembrance agent software

SYNOPSIS

       ra-index   [--version]   [-v]  [-d]  [-s]  <base-dir>  <source1>  [<source2>]  [...]   [-e
       <excludee1> [<excludee2>] [...]]

DESCRIPTION

       ra-index and ra-retrieve make up the Savant search engine, an information retrieval engine
       designed  as  a back-end for the Remembrance Agent (RA).  Given a collection of the user's
       accumulated email, usenet news articles, papers, saved HTML files and  other  text  notes,
       the  RA  attempts  to  find  those documents which are most relevant to the user's current
       context.  That is, it searches this collection of text for the documents  which  bear  the
       highest  word-for-word  similarity  to the text the user is currently editing, in the hope
       that they will also bear high conceptual similarity and  thus  be  useful  to  the  user's
       current work.  With the Emacs front-end, these suggestions are continuously displayed in a
       small buffer at the bottom of the user's window.  If a suggestion looks useful,  the  full
       text can be retrieved with a single command.

       The Remembrance Agent works in two stages.  First, the user's collection of text documents
       is indexed into a database saved in a vector format.  After the database is  created,  the
       second stage of the Remembrance Agent is run from Emacs, where it periodically takes a
       sample of text from the working buffer and finds those documents from the collection that
       are most similar.  It summarizes the top documents in a small Emacs window and allows you
       to retrieve the entire text of any  one  with  a  keystroke.   See  the  README  file  for
       information on using the Emacs front-end.

       At its core Savant is a text-retrieval search engine that uses a standard TF-IDF
       algorithm, but it also uses a template system to recognize different  kinds  of  documents
       and  extract various field information.  For example, ra-index can recognize subject lines
       and address information from email files and file this  information  separately.   It  can
       also  pull  apart  file  archives into separate documents, e.g. RMAIL files are indexed as
       separate email documents.  Finally, there are filters defined for many document  types  to
       remove  extraneous  information  like  HTML  tags  that  might otherwise cause problems in
       retrieval.  These are all precompiled in a template structure.  It is not currently well
       documented, though if anyone wants to play with it, it is all defined in the source file
       templates/conftemplates.c.
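
       To make the TF-IDF idea concrete, here is a minimal Python sketch of ranking
       documents by word overlap with a query text.  It is not Savant's code: the
       tokenizer, the exact weighting formula, and the function names (tokenize,
       build_index, cosine, rank) are assumptions made for illustration, and none of
       the template or field handling described above is modelled.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs. Savant's own rules may differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Turn {name: text} into TF-IDF vectors plus the IDF table used to build them."""
    term_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each document counts once per term
    n_docs = len(docs)
    # One common smoothed IDF variant; an assumption, not Savant's formula.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = {
        name: {term: count * idf[term] for term, count in counts.items()}
        for name, counts in term_counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_text, vectors, idf):
    """Return (score, name) pairs, most similar document first."""
    counts = Counter(tokenize(query_text))
    # Terms never seen at indexing time carry no weight and are dropped.
    query = {term: c * idf[term] for term, c in counts.items() if term in idf}
    scored = [(cosine(query, vec), name) for name, vec in vectors.items()]
    return sorted(scored, reverse=True)
```

       In this sketch, build_index plays the role of the indexing stage and rank plays
       the role of a retrieval query against the text currently being edited.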

       The RA is primarily designed as a proactive information provider  that  continually  gives
       you information that might be relevant to your current environment, but Savant can also be
       used as a standard text and information retrieval search engine.

   USAGE
       To index, you must have a set of  source  text-files,  and  a  directory  Savant  can  put
       database  files into.  The <source> arguments may be files or directories.  If a directory
       is in the list, Savant will use all its contents, recursing into all subdirectories.
       Non-text files and backup files (those whose names end with ~ or begin with #) are
       ignored, as are dot-files (those starting with .) and, unless -s is given, symbolic
       links.  Any files or
       directories  specified  after  the optional -e flag will be excluded.  Savant will use any
       files it finds to create a database in the specified base directory,  which  must  already
       exist.   The  optional -v argument (verbose) will direct Savant to keep you updated on its
       progress.  So for example,

            ra-index -v ~/RA-indexes/mail ~/RMAIL ~/Rmail-files -e ~/Rmail-files/Old-files
       will build a database in the ~/RA-indexes/mail directory, made up of emails from my  RMAIL
       file  plus  all files and subdirectories of ~/Rmail-files, excluding files and directories
       in ~/Rmail-files/Old-files.
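
       The file-selection rules described above can be sketched in Python as follows.
       This is an illustration, not ra-index's implementation: the function names
       (is_skipped_name, collect_sources) are hypothetical, the detection of non-text
       files is left out because this page does not say how it is done, and applying
       the name rules only to entries found while recursing (not to sources named
       explicitly) is an assumption.

```python
import os


def is_skipped_name(name):
    # Dot-files, and backup files ending in ~ or beginning with #.
    return name.startswith(".") or name.startswith("#") or name.endswith("~")


def collect_sources(sources, excludes=(), follow_symlinks=False):
    """Return the absolute paths of files that would be considered for indexing."""
    excluded = {os.path.abspath(p) for p in excludes}
    found = []

    def visit(path, top_level):
        full = os.path.abspath(path)
        if full in excluded:
            return  # listed after -e
        if not top_level and is_skipped_name(os.path.basename(full)):
            return
        if os.path.islink(path) and not follow_symlinks:
            return  # symbolic links are ignored unless -s is given
        if os.path.isdir(path):
            # Recurse into every subdirectory. With follow_symlinks=True a
            # symlink cycle would loop forever; this sketch does not guard
            # against that.
            for entry in sorted(os.listdir(path)):
                visit(os.path.join(path, entry), False)
        elif os.path.isfile(path):
            found.append(full)

    for source in sources:
        visit(source, True)
    return found
```

       Calling collect_sources(["Rmail-files"], excludes=["Rmail-files/Old-files"])
       mirrors the source and -e arguments of the command shown above.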

       ra-index can build databases in any directory you like, but the Emacs interface for the
       Remembrance Agent expects a particular structure.  For each database you want to make, you
       should create a directory, and all these  directories  should  live  in  the  same  parent
       directory.   For example, for my own use I have a directory ~/RA-indexes/, and within that
       are the directories ~/RA-indexes/mail/, ~/RA-indexes/papers/, etc. which actually  contain
       the database files.
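
       As a sketch of how this layout might be set up, the Python helper below creates
       one base directory per database under a common parent and builds the matching
       ra-index argument lists from the SYNOPSIS.  The helper name
       (plan_index_commands) and the shape of its input are hypothetical; it only
       constructs the commands and does not run them.

```python
import os


def plan_index_commands(parent_dir, databases, verbose=True):
    """Create <parent_dir>/<name> for each database and return ra-index argv lists.

    databases maps a database name to a (sources, excludes) pair of path lists.
    """
    commands = []
    for name, (sources, excludes) in databases.items():
        base_dir = os.path.join(parent_dir, name)
        # The base directory must already exist before ra-index is run.
        os.makedirs(base_dir, exist_ok=True)
        argv = ["ra-index"]
        if verbose:
            argv.append("-v")
        argv.append(base_dir)
        argv.extend(sources)
        if excludes:
            argv.append("-e")
            argv.extend(excludes)
        commands.append(argv)
    return commands
```

       Each returned list can be passed to subprocess.run(argv, check=True) on a
       machine where ra-index is installed.  Paths containing ~ should be expanded
       first with os.path.expanduser, since no shell is involved.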

   OPTIONS
       -v     Verbose mode.  Print useful information.

       -d     Debug mode.  Print not-so-useful information.

       -e     Exclude all files and directories which follow.

       -s     Follow symbolic links when indexing.

       --version
              Print version information.

SEE ALSO

       ra-retrieve(1)

AUTHOR

       Bradley Rhodes, MIT Media Lab.  Please send comments and questions to
       ra-bugs@media.mit.edu.  New versions and updates can be found at
       http://www.media.mit.edu/~rhodes/RA/

       All code included in versions up to and including 2.09:
          Copyright (C) 2001 Massachusetts Institute of Technology.

       All  modifications  subsequent  to  version  2.09  are  copyright  Bradley Rhodes or their
       respective authors.

       Developed by Bradley Rhodes at the Media Laboratory, MIT, Cambridge,  Massachusetts,  with
       support from British Telecom and Merrill Lynch.

       This program is free software; you can redistribute it and/or modify it under the terms of
       the GNU General Public License as  published  by  the  Free  Software  Foundation;  either
       version 2 of the License, or (at your option) any later version.  For commercial licensing
       under other terms, please consult the MIT Technology Licensing Office.

       This program may be subject to the following US and/or foreign patents (pending):  "Method
       and  Apparatus  for  Automated,  Context-Dependent Retrieval of Information," MIT Case No.
       7870TS. If any of these  patents  are  granted,  royalty-free  license  to  use  this  and
       derivative programs under the GNU General Public License are hereby granted.

       This  program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
       without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR  PURPOSE.
       See the GNU General Public License for more details.

       You should have received a copy of the GNU General Public License along with this program;
       if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,  Boston,
       MA 02111-1307, USA.

BUGS

       Dates are not currently indexed, so a query by date returns no suggestions.

       Requires GNU make to compile.

       The template structure isn't documented.