Provided by: libchatbot-eliza-perl_1.08-1_all

NAME

       Chatbot::Eliza - A clone of the classic Eliza program

SYNOPSIS

         use Chatbot::Eliza;

         $mybot = new Chatbot::Eliza;
         $mybot->command_interface;

         # see below for details

DESCRIPTION

       This module implements the classic Eliza algorithm.  The original Eliza program was
       written by Joseph Weizenbaum and described in the Communications of the ACM in 1966.
       Eliza is a mock Rogerian psychotherapist.  It prompts for user input, and uses a simple
       transformation algorithm to change user input into a follow-up question.  The program is
       designed to give the appearance of understanding.

       This program is a faithful implementation of the program described by Weizenbaum.  It uses
       a simplified script language (devised by Charles Hayden).  The content of the script is
       the same as Weizenbaum's.

       This module encapsulates the Eliza algorithm in the form of an object.  This should make
       the functionality easy to incorporate in larger programs.

INSTALLATION

       The current version of Chatbot::Eliza.pm is available on CPAN:

         http://www.perl.com/CPAN/modules/by-module/Chatbot/

       To install this package, just change to the directory which you created by untarring the
       package, and type the following:

               perl Makefile.PL
               make
               make test
               make install

       This will copy Eliza.pm to your perl library directory for use by all perl scripts.  You
       will probably need to be root to do this, unless you have installed a personal copy of
       perl.

USAGE

       This is all you need to do to launch a simple Eliza session:

               use Chatbot::Eliza;

               $mybot = new Chatbot::Eliza;
               $mybot->command_interface;

       You can also customize certain features of the session:

               $myotherbot = new Chatbot::Eliza;

               $myotherbot->name( "Hortense" );
               $myotherbot->debug( 1 );

               $myotherbot->command_interface;

       These lines set the name of the bot to be "Hortense" and turn on the debugging output.

       When creating an Eliza object, you can specify a name and an alternative scriptfile:

               $bot = new Chatbot::Eliza "Brian", "myscript.txt";

       You can also use an anonymous hash to set these parameters.  Any of the fields can be
       initialized using this syntax:

               $bot = new Chatbot::Eliza {
                       name       => "Brian",
                       scriptfile => "myscript.txt",
                       debug      => 1,
                       prompts_on => 1,
                       memory_on  => 0,
                       myrand     =>
                               sub { my $N = defined $_[0] ? $_[0] : 1;  rand($N); },
               };

       If you don't specify a script file, then the new object will be initialized with a default
       script.  The module contains this script within itself.

       You can use any of the internal functions in a calling program.  The code below takes an
       arbitrary string and retrieves the reply from the Eliza object:

               my $string = "I have too many problems.";
               my $reply  = $mybot->transform( $string );

       You can easily create two bots, each with a different script, and see how they interact:

               use Chatbot::Eliza;

               my ($harry, $sally, $he_says, $she_says);

               $sally = new Chatbot::Eliza "Sally", "histext.txt";
               $harry = new Chatbot::Eliza "Harry", "hertext.txt";

               $he_says  = "I am sad.";

               # Seed the random number generator.
               srand( time ^ ($$ + ($$ << 15)) );

               while (1) {
                       $she_says = $sally->transform( $he_says );
                       print $sally->name, ": $she_says \n";

                       $he_says  = $harry->transform( $she_says );
                       print $harry->name, ": $he_says \n";
               }

       Mechanically, this works well.  However, it critically depends on the actual script data.
       Having two mock Rogerian therapists talk to each other usually does not produce any
       sensible conversation, of course.

       After each call to the transform() method, the debugging output for that transformation is
       stored in a variable called $debug_text.

               my $reply      = $mybot->transform( "My foot hurts" );
               my $debugging  = $mybot->debug_text;

       This feature is always available, even if the instance's $debug variable is set to 0.

       Calling programs can specify their own random-number generators.  Use this syntax:

               $chatbot = new Chatbot::Eliza;
               $chatbot->myrand(
                       sub {
                               #function goes here!
                       }
               );

       The custom random function should have the same prototype as perl's built-in rand()
       function.  That is, it should take a single (numeric) expression as a parameter, and it
       should return a floating-point value between 0 and that number.

       What this code actually does is pass a reference to an anonymous subroutine ("code
       reference").  Make sure you've read the perlref manpage for details on how code references
       actually work.

       If you don't specify any custom rand function, then the Eliza object will just use the
       built-in rand() function.
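
       As an illustration of the contract described above, the sketch below builds a seeded
       generator that can stand in for rand().  The helper name make_seeded_rand() and the choice
       of the Park-Miller generator are assumptions made for this example; neither is part of
       Chatbot::Eliza.  The arithmetic assumes a perl built with 64-bit integers.

```perl
use strict;
use warnings;

# make_seeded_rand() is a hypothetical helper, not part of Chatbot::Eliza.
# It returns a code reference that behaves like rand(): called with N,
# it returns a floating-point value in the range [0, N).
sub make_seeded_rand {
    my ($seed) = @_;
    my $state = $seed % 2147483647;
    $state = 1 if $state == 0;    # the generator must not start at zero
    return sub {
        my $n = defined $_[0] ? $_[0] : 1;
        # Park-Miller "minimal standard" step (assumes 64-bit integers).
        $state = ( $state * 16807 ) % 2147483647;
        return $n * ( $state / 2147483647 );
    };
}

my $myrand = make_seeded_rand(42);
my @picks  = map { int $myrand->(5) } 1 .. 10;
print "@picks\n";

# With the module installed, the generator would be passed in as shown
# earlier in this section:
#   $chatbot->myrand( $myrand );
```

       A repeatable generator of this kind may be useful in test suites, because the same seed
       should lead to the same sequence of choices.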

MAIN DATA MEMBERS

       Each Eliza object uses the following data structures to hold the script data in memory:

   %decomplist
       Keys: the set of keywords;  Values: strings containing the decomposition rules.

   %reasmblist
       Keys: strings which are each the join of a keyword and a corresponding decomposition
       rule; Values: the set of possible reassembly statements for that keyword and
       decomposition rule.

   %reasmblist_for_memory
       This structure is identical to %reasmblist, except that these rules are only invoked when
       a user comment is being retrieved from memory. These contain comments such as "Earlier you
       mentioned that...," which are only appropriate for remembered comments.  Rules in the
       script must be specially marked in order to be included in this list rather than
       %reasmblist. The default script only has a few of these rules.

   @memory
       A list of user comments which an Eliza instance is remembering for future use.  Eliza does
       not remember everything, only some things.  In this implementation, Eliza will only
       remember comments which match a decomposition rule which actually has reassembly rules
       that are marked with the keyword "reasm_for_memory" rather than the normal "reasmb".  The
       default script only has a few of these.

   %keyranks
       Keys: the set of keywords;  Values: the ranks for each keyword

   @quit
       "quit" words -- that is, words the user might use to try to exit the program.

   @initial
       Possible greetings for the beginning of the program.

   @final
       Possible farewells for the end of the program.

   %pre
       Keys: words which are replaced before any transformations; Values: the respective
       replacement words.

   %post
       Keys: words which are replaced after the transformations and after the reply is
       constructed;  Values: the respective replacement words.

   %synon
       Keys: words which are found in decomposition rules; Values: words which are treated just
       like their corresponding synonyms during matching of decomposition rules.

   Other data members
       There are several other internal data members.  Hopefully these are sufficiently obvious
       that you can learn about them just by reading the source code.

METHODS

   new()
           my $chatterbot = new Chatbot::Eliza;

       new() creates a new Eliza object.  This method also calls the internal _initialize()
       method, which in turn calls the parse_script_data() method, which initializes the script
       data.

           my $chatterbot = new Chatbot::Eliza 'Ahmad', 'myfile.txt';

       The Eliza object defaults to the name "Eliza", and it contains default script data within
       itself.  However, using the syntax above, you can specify an alternative name and an
       alternative script file.

       See the parse_script_data() method for a description of the format of the script file.

   command_interface()
           $chatterbot->command_interface;

       command_interface() opens an interactive session with the Eliza object, just like the
       original Eliza program.

       If you want to design your own session format, then you can write your own while loop and
       your own functions for prompting for and reading user input, and use the transform()
       method to generate Eliza's responses.  (Note: you do not need to invoke preprocess() and
       postprocess() directly, because these are invoked from within the transform() method.)

       But if you're lazy and you want to skip all that, then just use command_interface().  It's
       all done for you.

       During an interactive session invoked using command_interface(), you can enter the word
       "debug" to toggle debug mode on and off.  You can also enter the keyword "memory" to
       invoke the _debug_memory() method and print out the contents of the Eliza instance's
       memory.
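
       A hand-written session loop of the kind mentioned above might look like the following
       sketch.  The helper run_session() and the exit words it checks are assumptions made for
       this example; only transform() comes from the module.  The interactive part runs only if
       Chatbot::Eliza is installed and standard input is a terminal.

```perl
use strict;
use warnings;

# run_session() is a hypothetical helper, not part of Chatbot::Eliza.
# $bot is any object with a transform() method; $in and $out are filehandles.
sub run_session {
    my ( $bot, $in, $out ) = @_;
    print {$out} "> ";
    while ( defined( my $line = <$in> ) ) {
        chomp $line;
        last if $line =~ /^\s*(?:quit|bye)\b/i;    # assumed exit words
        print {$out} $bot->transform($line), "\n> ";
    }
    print {$out} "Goodbye.\n";
    return;
}

# Start an interactive session only when the module is available.
if ( -t STDIN && eval { require Chatbot::Eliza; 1 } ) {
    run_session( Chatbot::Eliza->new, \*STDIN, \*STDOUT );
}
```

       Passing the filehandles as parameters keeps the loop independent of the terminal, so the
       same code could read from a file or a socket.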

   preprocess()
           $string = preprocess($string);

       preprocess() applies simple substitution rules to the input string.  Mostly this is to
       catch varieties in spelling, misspellings, contractions and the like.

       preprocess() is called from within the transform() method.  It is applied to user-input
       text, BEFORE any processing, and before a reassembly statement has been selected.

       It uses the hash %pre, which is created during the parse of the script.

   postprocess()
           $string = postprocess($string);

       postprocess() applies simple substitution rules to the reassembly rule.  This is where all
       the "I"'s and "you"'s are exchanged.  postprocess() is called from within the transform()
       function.

       It uses the hash %post, created during the parse of the script.
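
       A minimal sketch of this kind of word exchange follows.  The table contents and the helper
       swap_words() are hypothetical and are not taken from the module or its default script.
       Each word is looked up exactly once, so a word that has just been changed from "my" to
       "your" is not changed back again.

```perl
use strict;
use warnings;

# Hypothetical replacement table in the spirit of %post.
my %post = (
    'i'    => 'you',
    'my'   => 'your',
    'me'   => 'you',
    'am'   => 'are',
    'you'  => 'I',
    'your' => 'my',
);

# swap_words() is an illustrative helper, not a method of the module.
# It replaces each whitespace-separated word that appears in the table.
sub swap_words {
    my ( $text, $table ) = @_;
    my @out;
    for my $word ( split ' ', $text ) {
        my $key = lc $word;
        push @out, exists $table->{$key} ? $table->{$key} : $word;
    }
    return join ' ', @out;
}

print swap_words( 'my boss hates me', \%post ), "\n";
```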

   _testquit()
            if ($self->_testquit($user_input) ) { ... }

       _testquit() detects words like "bye" and "quit" and returns true if it finds one of them
       as the first word in the sentence.

       These words are listed in the script, under the keyword "quit".

   _debug_memory()
            $self->_debug_memory()

       _debug_memory() is a special function which returns the contents of Eliza's memory stack.

   transform()
           $reply = $chatterbot->transform( $string, $use_memory );

       transform() applies transformation rules to the user input string.  It invokes
       preprocess(), does transformations, then invokes postprocess().  It returns the
       transformed output string, called $reasmb.

       The algorithm embedded in the transform() method has three main parts:

       1.  Search the input string for a keyword.

       2.  If we find a keyword, use the list of decomposition rules for that keyword, and
           pattern-match the input string against each rule.

       3.  If the input string matches any of the decomposition rules, then randomly select one
           of the reassembly rules for that decomposition rule, and use it to construct the
           reply.
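
       The three steps above can be sketched in a few lines of Perl.  This is an illustration
       only, not the module's code: the table names, the "|" separator in the reassembly keys,
       the rank ordering, the helper names and the fallback reply are all assumptions, and the
       preprocess() and postprocess() stages are left out.

```perl
use strict;
use warnings;

# Hypothetical miniature rule tables.  Their names and layout are
# assumptions for this sketch, not the module's internal structures.
my %keyranks = ( remember => 5, sorry => 0 );
my %decomp   = (
    remember => [ '* i remember *', '* do you remember *' ],
    sorry    => ['*'],
);
my %reasmb = (
    'remember|* i remember *'      => ['Do you often think of (2) ?'],
    'remember|* do you remember *' => ['What about (2) ?'],
    'sorry|*'                      => ["Please don't apologise."],
);

# Convert a decomposition rule to a regex; each "*" captures any text.
sub rule_to_regex {
    my ($rule) = @_;
    my @parts;
    for my $token ( split ' ', $rule ) {
        push @parts, $token eq '*' ? '(.*)' : '\b' . quotemeta($token) . '\b';
    }
    my $body = join '\s*', @parts;
    return qr/^\s*$body\s*$/i;
}

sub reply_for {
    my ($input) = @_;

    # Step 1: collect the keywords found in the input, highest rank first.
    my @found = sort { $keyranks{$b} <=> $keyranks{$a} || $a cmp $b }
        grep { $input =~ /\b\Q$_\E\b/i } keys %keyranks;

    for my $key (@found) {
        # Step 2: try each decomposition rule for that keyword.
        for my $rule ( @{ $decomp{$key} } ) {
            my @caps = $input =~ rule_to_regex($rule);
            next unless @caps;
            s/^\s+|\s+$//g for @caps;    # trim the captured text

            # Step 3: pick a reassembly rule at random and fill in (N).
            my $choices = $reasmb{"$key|$rule"};
            my $reply   = $choices->[ int rand scalar @$choices ];
            $reply =~ s/\((\d+)\)/defined $caps[$1 - 1] ? $caps[$1 - 1] : ''/ge;
            return $reply;
        }
    }
    return 'Please go on.';    # hypothetical fallback reply
}

print reply_for('I remember my first bicycle'), "\n";
print reply_for('Do you remember the rain'),    "\n";
```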

       transform() takes two parameters.  The first is the string we want to transform.  The
       second is a flag which indicates where this string came from.  If the flag is set, then the
       string has been pulled from memory, and we should use reassembly rules appropriate for
       that.  If the flag is not set, then the string is the most recent user input, and we can
       use the ordinary reassembly rules.

       The memory flag is only set when the transform() function is called recursively.  The
       mechanism for setting this parameter is embedded in the transform() method itself.  If the
       flag is set inappropriately, it is ignored.

   How memory is used
       In the script, some reassembly rules are special.  They are marked with the keyword
       "reasm_for_memory", rather than just "reasmb".  Eliza "remembers" any comment when it
       matches a decomposition rule for which there are any reassembly rules for memory.  An
       Eliza object remembers up to $max_memory_size (default: 5) user input strings.

       If, during a subsequent run, the transform() method fails to find any appropriate
       decomposition rule for a user's comment, and if there are any comments inside the memory
       array, then Eliza may elect to ignore the most recent comment and instead pull out one of
       the strings from memory.  In this case, the transform method is called recursively with
       the memory flag.
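
       One way to picture a bounded memory of this kind is the sketch below.  It is not the
       module's implementation: the helpers remember() and recall() are hypothetical, and the
       policy shown (drop the oldest comment when full, recall the oldest first) is an assumption,
       since the text above does not say which comment is discarded or retrieved.

```perl
use strict;
use warnings;

my $max_memory_size = 5;    # the default quoted in the text above
my @memory;

# remember() is a hypothetical helper: store a comment and keep only
# the newest $max_memory_size entries.
sub remember {
    my ($comment) = @_;
    push @memory, $comment;
    shift @memory while @memory > $max_memory_size;
    return scalar @memory;
}

# recall() removes and returns the oldest remembered comment, or undef
# when the memory is empty.
sub recall {
    return shift @memory;
}

remember("my dog is ill, day $_") for 1 .. 7;
print scalar(@memory), " comments kept\n";
print "recalled: ", recall(), "\n";
```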

       Honestly, I am not sure exactly how this memory functionality was implemented in the
       original Eliza program.  Hopefully this implementation is not too far from Weizenbaum's.

       If you don't want to use the memory functionality at all, then you can disable it:

               $mybot->memory_on(0);

       You can also achieve the same effect by making sure that the script data does not contain
       any reassembly rules marked with the keyword "reasm_for_memory".  The default script data
       only has 4 such items.

   parse_script_data()
           $self->parse_script_data;
           $self->parse_script_data( $script_file );

       parse_script_data() is invoked from the _initialize() method, which is called from the
       new() function.  However, you can also call this method at any time against an already-
       instantiated Eliza instance.  In that case, the new script data is added to the old script
       data.  The old script data is not deleted.

       You can pass a parameter to this function, which is the name of the script file, and it
       will read in and parse that file.  If you do not pass any parameter to this method, then
       it will read the data embedded at the end of the module as its default script data.

       If you pass the name of a script file to parse_script_data(), and that file is not
       available for reading, then the module dies.

Format of the script file

       This module includes a default script file within itself, so it is not necessary to
       explicitly specify a script file when instantiating an Eliza object.

       Each line in the script file can specify a key, a decomposition rule, or a reassembly
       rule.

         key: remember 5
           decomp: * i remember *
             reasmb: Do you often think of (2) ?
             reasmb: Does thinking of (2) bring anything else to mind ?
           decomp: * do you remember *
             reasmb: Did you think I would forget (2) ?
             reasmb: What about (2) ?
             reasmb: goto what
         pre: equivalent alike
         synon: belief feel think believe wish

       The number after the key specifies the rank.  If a user's input contains the keyword, then
       the transform() function will try to match one of the decomposition rules for that
       keyword.  If one matches, then it will select one of the reassembly rules at random.  The
       number (2) here means "use whatever set of words matched the second asterisk in the
       decomposition rule."

       If you specify a list of synonyms for a word, then you should use a "@" when you use that
       word in a decomposition rule:

         decomp: * i @belief i *
           reasmb: Do you really think so ?
           reasmb: But you are not sure you (3).

       Otherwise, the script will never check to see if there are any synonyms for that keyword.
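
       To show what the "@" marker implies for matching, the sketch below expands "@belief" into
       an alternation of its synonyms.  The %synon table layout and the helper
       rule_with_synonyms() are assumptions for this example, not the module's code.  The synonym
       group is captured, which is why the final asterisk in the rule above is referred to as (3).

```perl
use strict;
use warnings;

# Hypothetical table built from "synon: belief feel think believe wish".
my %synon = ( belief => [qw(belief feel think believe wish)] );

# rule_with_synonyms() is illustrative only.  It turns a decomposition
# rule into a regex, expanding "@word" into a captured alternation.
sub rule_with_synonyms {
    my ($rule) = @_;
    my @parts;
    for my $token ( split ' ', $rule ) {
        my ($name) = $token =~ /^\@(\w+)$/;
        if ( $token eq '*' ) {
            push @parts, '(.*)';
        }
        elsif ( defined $name && $synon{$name} ) {
            # The synonyms are assumed to be plain words.
            push @parts, '\b(' . join( '|', @{ $synon{$name} } ) . ')\b';
        }
        else {
            push @parts, '\b' . quotemeta($token) . '\b';
        }
    }
    my $body = join '\s*', @parts;
    return qr/^\s*$body\s*$/i;
}

my $regex = rule_with_synonyms('* i @belief i *');
for my $input ( 'I think I am lost', 'I know I am lost' ) {
    my @caps = $input =~ $regex;
    print "$input: ", ( @caps ? "matched, (3) = '$caps[2]'" : 'no match' ), "\n";
}
```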

       Reassembly rules should be marked with reasm_for_memory rather than reasmb when they are
       intended for use on a user's comment that has been extracted from memory.

         key: my 2
           decomp: * my *
             reasm_for_memory: Let's discuss further why your (2).
             reasm_for_memory: Earlier you said your (2).
             reasm_for_memory: But your (2).
             reasm_for_memory: Does that have anything to do with the fact that your (2) ?

How the script file is parsed

       Each line in the script file contains an "entrytype" (such as key, decomp, or synon) and
       an "entry", separated by a colon.  In turn, each "entry" can itself be composed of a "key"
       and a "value", separated by a space.  The parse_script_data() function parses each line
       and splits it into two variables, $entrytype and $entry.

       Next, it uses the string $entrytype to determine what sort of stuff to expect in the
       $entry variable, if anything, and parses it accordingly.  In some cases, there is no
       second level of key-value pair, so the function does not even bother to isolate or create
       $key and $value.

       $key is always a single word.  $value can be null, or one single word, or a string
       composed of several words, or an array of words.

       Based on all these entries and keys and values, the function creates two giant hashes:
       %decomplist, which holds the decomposition rules for each keyword, and %reasmblist, which
       holds the reassembly phrases for each decomposition rule.  It also creates %keyranks,
       which holds the ranks for each key.

       Six other data structures are created: %reasmblist_for_memory, %pre, %post, %synon,
       @initial, and @final.
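
       The parsing scheme described above can be sketched as follows.  This is a simplified
       illustration, not the code of parse_script_data(): it handles only a few entry types, and
       the "|" used to join a keyword to its decomposition rule is an assumption.

```perl
use strict;
use warnings;

my ( %keyranks, %decomplist, %reasmblist, %pre, %synon );
my ( $thiskey, $thisdecomp ) = ( '', '' );

# A few lines in the script format shown earlier in this page.
my @script_lines = (
    'key: remember 5',
    '  decomp: * i remember *',
    '    reasmb: Do you often think of (2) ?',
    '    reasmb: Does thinking of (2) bring anything else to mind ?',
    'pre: equivalent alike',
    'synon: belief feel think believe wish',
);

for my $line (@script_lines) {
    # Split "entrytype: entry"; skip lines that do not fit the pattern.
    next unless $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/;
    my ( $entrytype, $entry ) = ( $1, $2 );

    if ( $entrytype eq 'key' ) {
        my ( $key, $rank ) = split ' ', $entry;
        $thiskey = $key;
        $keyranks{$key} = defined $rank ? $rank : 0;
    }
    elsif ( $entrytype eq 'decomp' ) {
        $thisdecomp = "$thiskey|$entry";    # assumed join of key and rule
        push @{ $decomplist{$thiskey} }, $entry;
    }
    elsif ( $entrytype eq 'reasmb' ) {
        push @{ $reasmblist{$thisdecomp} }, $entry;
    }
    elsif ( $entrytype eq 'pre' ) {
        my ( $from, @to ) = split ' ', $entry;
        $pre{$from} = "@to";
    }
    elsif ( $entrytype eq 'synon' ) {
        my ( $word, @rest ) = split ' ', $entry;
        $synon{$word} = [ $word, @rest ];
    }
}

print "$_ has rank $keyranks{$_}\n" for sort keys %keyranks;
```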

COPYRIGHT AND LICENSE

       This software is copyright (c) 2003 by John Nolan  <jpnolan@sonic.net>.

       This is free software; you can redistribute it and/or modify it under the same terms as
       the Perl 5 programming language system itself.

AUTHOR

       John Nolan  jpnolan@sonic.net  January 2003.

       Implements the classic Eliza algorithm by Prof. Joseph Weizenbaum.  Script format devised
       by Charles Hayden.